pypi.org "warc" keyword
View the packages on the pypi.org package registry that are tagged with the "warc" keyword.
otmt 1.0.5
Tools for determining if web archive collections are Off-Topic
9 versions - Latest release: almost 4 years ago - 1 dependent repository - 1.39 thousand downloads last month - 9 stars on GitHub - 1 maintainer
ipwb 0.2024.10.24.1853
InterPlanetary Wayback (ipwb): Web Archive integration with IPFS
244 versions - Latest release: about 1 year ago - 2 dependent repositories - 1.25 thousand downloads last month - 606 stars on GitHub - 2 maintainers
mseep-mcp-server-webcrawl 1.0.0
MCP server for search and retrieval of web crawler content
2 versions - Latest release: 5 months ago - 9 downloads last month - 0 stars on GitHub - 1 maintainer
Top 8.7% on pypi.org
cdxj-indexer 1.4.6 💰
CDXJ Indexer for WARC and ARC files
13 versions - Latest release: 11 months ago - 2 dependent packages - 6 dependent repositories - 5.22 thousand downloads last month - 21 stars on GitHub - 1 maintainer
warcbench 0.1.0
A tool for exploring, analyzing, transforming, recombining, and extracting data from WARC (Web AR...
1 version - Latest release: 4 months ago - 22 downloads last month - 2 stars on GitHub - 1 maintainer
cocrawler 0.1.14
A modern web crawler framework for Python
8 versions - Latest release: over 4 years ago - 1 dependent repository - 22 downloads last month - 190 stars on GitHub - 1 maintainer
Top 7.2% on pypi.org
cdx-toolkit 0.9.38
A toolkit for working with CDX indices
33 versions - Latest release: 8 days ago - 4 dependent repositories - 6.33 thousand downloads last month - 157 stars on GitHub - 1 maintainer
mcp-server-webcrawl 0.14.3
MCP server for search and retrieval of web crawler content
35 versions - Latest release: about 2 months ago - 148 downloads last month - 26 stars on GitHub - 1 maintainer
warcdb 0.2.2 💰
WarcDB: Web crawl data as SQLite databases
4 versions - Latest release: about 2 years ago - 40 downloads last month - 406 stars on GitHub - 1 maintainer
forum-dl 0.3.0 💰
Scrape posts and threads from forums, news aggregators, mail archives
3 versions - Latest release: over 2 years ago - 86 downloads last month - 66 stars on GitHub - 1 maintainer
warcreader 0.4.3
Library for reading HTTP responses from WARC (Web ARChive) files
9 versions - Latest release: about 9 years ago - 2 dependent repositories - 8 downloads last month - 1 maintainer
scrapy-warcio 0.0.8
Scrapy WARC I/O
8 versions - Latest release: almost 6 years ago - 1 dependent repository - 67 downloads last month - 22 stars on GitHub - 1 maintainer
warc2zim 2.2.2 💰
Convert WARC to ZIM
33 versions - Latest release: 9 months ago - 1 dependent repository - 281 downloads last month - 44 stars on GitHub - 1 maintainer
Top 4.3% on pypi.org
resiliparse 0.15.2
A collection of robust and fast processing tools for parsing and analyzing (not only) web archive...
71 versions - Latest release: 8 months ago - 2 dependent packages - 4 dependent repositories - 314 thousand downloads last month - 119 stars on GitHub - 1 maintainer
Top 3.3% on pypi.org
fastwarc 0.15.2
A high-performance WARC parsing library for Python written in C++/Cython.
77 versions - Latest release: 8 months ago - 6 dependent packages - 5 dependent repositories - 328 thousand downloads last month - 119 stars on GitHub - 1 maintainer
Top 4.0% on pypi.org
archivebox 0.7.2 💰
Self-hosted internet archiving solution.
76 versions - Latest release: almost 2 years ago - 4 dependent repositories - 3.32 thousand downloads last month - 24,925 stars on GitHub - 1 maintainer
metawarc 1.1.1
metawarc: a command-line tool for data extraction from WARC files (web archives)
3 versions - Latest release: about 3 years ago - 1 dependent repository - 14 downloads last month - 34 stars on GitHub - 1 maintainer
scrapy-webarchive 0.4.0
A webarchive extension for Scrapy
4 versions - Latest release: 9 months ago - 16 downloads last month - 7 stars on GitHub - 1 maintainer
archivebox-likn 0.6.3 💰
The decentralized hosted internet archive.
4 versions - Latest release: over 2 years ago - 48 downloads last month - 19,808 stars on GitHub - 1 maintainer
basc-warc 0.0.1
Create and manage WARC files. Currently in planning / pre-alpha stage.
1 version - Latest release: about 2 years ago - 3 stars on GitHub - 1 maintainer
internet-archive-extractor 0.0.7
Tool for extracting archived web sites from the Internet Archive and saving them as WARC files.
7 versions - Latest release: 24 days ago - 436 downloads last month - 0 stars on GitHub - 1 maintainer
warc-extractor 0.1.1
A simple tool for extracting warc files.
2 versions - Latest release: over 3 years ago - 56 downloads last month - 78 stars on GitHub - 1 maintainer
Top 2.2% on pypi.org
warcio 1.7.5 💰
Streaming WARC (and ARC) IO library
23 versions - Latest release: 11 months ago - 20 dependent packages - 150 dependent repositories - 2.23 million downloads last month - 326 stars on GitHub - 1 maintainer
archivebox-example 0.8.6rc2 💰
Self-hosted internet archiving solution.
1 version - Latest release: 12 months ago - 25,010 stars on GitHub
cdxsummary 0.1.1b5
Summarize web archive capture index (CDX) files
5 versions - Latest release: about 4 years ago - 1 dependent repository - 70 downloads last month - 75 stars on GitHub - 1 maintainer
Related Keywords
python (10)
web-archiving (8)
web (5)
wget (4)
archive (4)
webarchive (4)
digipres (3)
internet-archiving (3)
archivebox (3)
backups (3)
memento (2)
mcp (2)
mcp-server (2)
mcp-servers (2)
web-archives (2)
pinboard (2)
pocket (2)
web-archive (2)
cdx (2)
rss (2)
self-hosted (2)
singlefile (2)
youtube-dl (2)
wayback-machine (2)
scrapy (2)
scraper (2)
bigdata (2)
cpp (2)
cython (2)
extraction (2)
htmlparser (2)
bookmark-archiver (2)
browser-bookmarks (2)
chromium (2)
archiving (2)
firefox (2)
knowledgebase (2)
headless-browser (2)
zim (1)
openzim (1)
webcomponents (1)
summary (1)
statistics (1)
report (1)
nodejs (1)
collection (1)
download (1)
browser (1)
puppeteer (1)
bookmarks (1)
preservation (1)
web archiving (1)
internet archiving (1)
pywb (1)
internet-archive (1)
webarchive-data-scraping (1)
wacz (1)
WACZ (1)
WARC (1)
Webarchive (1)
Scrapy (1)
webarchiving (1)
warc-files (1)
osint-python (1)
osint (1)
metadata (1)
offline (1)
web-crawling (1)
parsing (1)
library (1)
harvard (1)
analysis (1)
llm (1)
mseep (1)
service-worker (1)
memento-rfc (1)
docker (1)
wayback (1)
odu (1)
distributed (1)
ipfs (1)
archives (1)
http (1)
topic (1)
timemap (1)
simhash (1)
measure (1)
cosine (1)
offtopic (1)
similarity (1)
webarchives (1)
simplemachines (1)
phpbb (1)
forum (1)
discourse (1)
data-fetching (1)
web-data (1)
sqlite (1)
database (1)
crawling (1)