An open API service providing package, version, and dependency metadata for many open source software ecosystems and registries.

pypi.org "warc" keyword

View the packages on the pypi.org package registry that are tagged with the "warc" keyword.

otmt 1.0.5
Tools for determining if web archive collections are Off-Topic
9 versions - Latest release: almost 4 years ago - 1 dependent repository - 1.39 thousand downloads last month - 9 stars on GitHub - 1 maintainer
ipwb 0.2024.10.24.1853
InterPlanetary Wayback (ipwb): Web Archive integration with IPFS
244 versions - Latest release: about 1 year ago - 2 dependent repositories - 1.25 thousand downloads last month - 606 stars on GitHub - 2 maintainers
mseep-mcp-server-webcrawl 1.0.0
MCP server for search and retrieval of web crawler content
2 versions - Latest release: 5 months ago - 9 downloads last month - 0 stars on GitHub - 1 maintainer
Top 8.7% on pypi.org
cdxj-indexer 1.4.6 💰
CDXJ Indexer for WARC and ARC files
13 versions - Latest release: 11 months ago - 2 dependent packages - 6 dependent repositories - 5.22 thousand downloads last month - 21 stars on GitHub - 1 maintainer
warcbench 0.1.0
A tool for exploring, analyzing, transforming, recombining, and extracting data from WARC (Web AR...
1 version - Latest release: 4 months ago - 22 downloads last month - 2 stars on GitHub - 1 maintainer
cocrawler 0.1.14
A modern web crawler framework for Python
8 versions - Latest release: over 4 years ago - 1 dependent repository - 22 downloads last month - 190 stars on GitHub - 1 maintainer
Top 7.2% on pypi.org
cdx-toolkit 0.9.38
A toolkit for working with CDX indices
33 versions - Latest release: 8 days ago - 4 dependent repositories - 6.33 thousand downloads last month - 157 stars on GitHub - 1 maintainer
mcp-server-webcrawl 0.14.3
MCP server for search and retrieval of web crawler content
35 versions - Latest release: about 2 months ago - 148 downloads last month - 26 stars on GitHub - 1 maintainer
warcdb 0.2.2 💰
WarcDB: Web crawl data as SQLite databases
4 versions - Latest release: about 2 years ago - 40 downloads last month - 406 stars on GitHub - 1 maintainer
forum-dl 0.3.0 💰
Scrape posts and threads from forums, news aggregators, mail archives
3 versions - Latest release: over 2 years ago - 86 downloads last month - 66 stars on GitHub - 1 maintainer
warcreader 0.4.3
Library for reading HTTP responses from WARC (Web ARChive) files
9 versions - Latest release: about 9 years ago - 1 dependent repository - 8 downloads last month - 1 maintainer
scrapy-warcio 0.0.8
Scrapy WARC I/O
8 versions - Latest release: almost 6 years ago - 1 dependent repository - 67 downloads last month - 22 stars on GitHub - 1 maintainer
warc2zim 2.2.2 💰
Convert WARC to ZIM
33 versions - Latest release: 9 months ago - 1 dependent repository - 281 downloads last month - 44 stars on GitHub - 1 maintainer
Top 4.3% on pypi.org
resiliparse 0.15.2
A collection of robust and fast processing tools for parsing and analyzing (not only) web archive...
71 versions - Latest release: 8 months ago - 2 dependent packages - 4 dependent repositories - 314 thousand downloads last month - 119 stars on GitHub - 1 maintainer
Top 3.3% on pypi.org
fastwarc 0.15.2
A high-performance WARC parsing library for Python written in C++/Cython.
77 versions - Latest release: 8 months ago - 6 dependent packages - 5 dependent repositories - 328 thousand downloads last month - 119 stars on GitHub - 1 maintainer
Top 4.0% on pypi.org
archivebox 0.7.2 💰
Self-hosted internet archiving solution.
76 versions - Latest release: almost 2 years ago - 4 dependent repositories - 3.32 thousand downloads last month - 24,925 stars on GitHub - 1 maintainer
metawarc 1.1.1
metawarc: a command-line tool for data extraction from WARC files (web archives)
3 versions - Latest release: about 3 years ago - 1 dependent repository - 14 downloads last month - 34 stars on GitHub - 1 maintainer
scrapy-webarchive 0.4.0
A webarchive extension for Scrapy
4 versions - Latest release: 9 months ago - 16 downloads last month - 7 stars on GitHub - 1 maintainer
archivebox-likn 0.6.3 💰
The decentralized hosted internet archive.
4 versions - Latest release: over 2 years ago - 48 downloads last month - 19,808 stars on GitHub - 1 maintainer
basc-warc 0.0.1
Create and manage WARC files. Currently in planning / pre-alpha stage.
1 version - Latest release: about 2 years ago - 3 stars on GitHub - 1 maintainer
internet-archive-extractor 0.0.7
Tool for extracting archived web sites from the Internet Archive saving as WARC files.
7 versions - Latest release: 24 days ago - 436 downloads last month - 0 stars on GitHub - 1 maintainer
warc-extractor 0.1.1
A simple tool for extracting warc files.
2 versions - Latest release: over 3 years ago - 56 downloads last month - 78 stars on GitHub - 1 maintainer
Top 2.2% on pypi.org
warcio 1.7.5 💰
Streaming WARC (and ARC) IO library
23 versions - Latest release: 11 months ago - 20 dependent packages - 150 dependent repositories - 2.23 million downloads last month - 326 stars on GitHub - 1 maintainer
archivebox-example 0.8.6rc2 💰
Self-hosted internet archiving solution.
1 version - Latest release: 12 months ago - 25,010 stars on GitHub
cdxsummary 0.1.1b5
Summarize web archive capture index (CDX) files
5 versions - Latest release: about 4 years ago - 1 dependent repository - 70 downloads last month - 75 stars on GitHub - 1 maintainer