Ecosyste.ms: Packages
An open API service providing package, version and dependency metadata of many open source software ecosystems and registries.
Top 0.7% downloads on pypi.org
Top 0.2% dependent packages on pypi.org
Top 1.9% dependent repos on pypi.org
Top 3.8% forks on pypi.org
Top 2.4% docker downloads on pypi.org
pypi.org : trafilatura
Python package and command-line tool designed to gather text on the Web, includes all necessary discovery and text processing components to perform web crawling, downloads, scraping, and extraction of main texts, metadata and comments.
purl: pkg:pypi/trafilatura
Keywords: corpus, html2text, news-crawler, natural-language-processing, scraper, tei-xml, text-extraction, webscraping, web-scraping, article-extractor, corpus-builder, corpus-tools, crawler, html-to-markdown, news, news-aggregator, nlp, readability, rss-feed, scraping, tei, text-cleaning, text-mining, text-preprocessing
License: Apache-2.0
Latest release: about 1 month ago
First release: almost 5 years ago
Dependent packages: 71
Dependent repositories: 63
Downloads: 434,104 last month
Stars: 2,965 on GitHub
Forks: 228 on GitHub
Docker dependents: 13
Docker downloads: 9,651
Total Commits: 1395
Committers: 39
Average commits per author: 35.769
Development Distribution Score (DDS): 0.102
More commit stats: commits.ecosyste.ms
See more repository details: repos.ecosyste.ms
Funding links: https://ko-fi.com/adbarbaresi
Last synced: about 7 hours ago
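The commit figures above can be cross-checked with a little arithmetic. The sketch below recomputes the average commits per author from the listed totals. It also assumes that the Development Distribution Score is defined as one minus the top committer's share of all commits; that definition is an assumption here, not something this page states. The function names are illustrative only.

```python
def average_commits_per_author(total_commits: int, committers: int) -> float:
    """Mean number of commits per committer."""
    return total_commits / committers


def development_distribution_score(top_author_commits: int, total_commits: int) -> float:
    """Assumed definition: 1 - (commits by the most active author / total commits)."""
    return 1 - top_author_commits / total_commits


# Figures taken from the listing above.
total_commits = 1395
committers = 39
listed_dds = 0.102

avg = average_commits_per_author(total_commits, committers)
print(f"Average commits per author: {avg:.3f}")

# Under the assumed DDS definition, the listed score implies the share of
# commits made by the single most active committer (an estimate only).
implied_top_share = 1 - listed_dds
print(f"Implied top-committer share: {implied_top_share:.1%}")
```

Under that assumed definition, a low score such as 0.102 would indicate that most commits come from a single author.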
opsci-toolbox 0.0.3
a complete toolbox
4 versions - Latest release: 27 days ago - 321 downloads last month - 1 maintainer
thirdai 0.8.2
A faster cpu machine learning library
101 versions - Latest release: 27 days ago - 1 dependent repository - 11.1 thousand downloads last month - 2 maintainers
langroid 0.1.245 💰
Harness LLMs with Multi-Agent Programming
237 versions - Latest release: 27 days ago - 1 dependent repository - 7.57 thousand downloads last month - 1,779 stars on GitHub - 1 maintainer
griptape 0.25.1
Modular Python framework for LLM workflows, tools, memory, and data.
70 versions - Latest release: 28 days ago - 4 dependent packages - 5 dependent repositories - 3.93 thousand downloads last month - 1,668 stars on GitHub - 6 maintainers
marvin 2.3.4
A lightweight AI engineering toolkit for building natural language interfaces that are reliable, ...
62 versions - Latest release: 28 days ago - 14 dependent packages - 22 dependent repositories - 13 thousand downloads last month - 4,705 stars on GitHub - 2 maintainers
edubot 0.7.6
Basic Edubot module
26 versions - Latest release: 29 days ago - 1 dependent package - 1 dependent repository - 318 downloads last month - 4 stars on GitHub - 2 maintainers
scrapework 0.5.4
simple scraping framework
26 versions - Latest release: about 1 month ago - 161 downloads last month - 1 star on GitHub - 1 maintainer
mediacloud-metadata 1.0.1
Media Cloud news article metadata extraction
39 versions - Latest release: about 1 month ago - 1 dependent package - 4 dependent repositories - 316 downloads last month - 11 stars on GitHub - 2 maintainers
minet 2.0.3
A webmining CLI tool & library for python.
264 versions - Latest release: about 2 months ago - 1 dependent package - 6 dependent repositories - 8.39 thousand downloads last month - 251 stars on GitHub - 1 maintainer
datatrove 0.2.0
HuggingFace library to process and filter large amounts of webdata
3 versions - Latest release: about 2 months ago - 11.7 thousand downloads last month - 1,632 stars on GitHub - 3 maintainers
testing-datatrove 4.0.1 removed
HuggingFace library to process and filter large amounts of webdata
1 version - Latest release: about 2 months ago - 1 maintainer
dolma 1.0.3
Data filters
17 versions - Latest release: 2 months ago - 17.7 thousand downloads last month - 800 stars on GitHub - 2 maintainers
dataverse 1.0.5
An open-source simplifies ETL workflow with Python based on Spark
14 versions - Latest release: 2 months ago - 2 dependent repositories - 356 downloads last month - 1 maintainer
hollarek 0.9.1 removed
Collection of general python utilities for future projects
10 versions - Latest release: 3 months ago - 1 dependent package - 52 downloads last month - 0 stars on GitHub - 1 maintainer
contentmap 0.4.0
5 versions - Latest release: 3 months ago - 71 downloads last month - 1 maintainer
opendatagen 0.0.35
Data preparation system to build controllable AI system
33 versions - Latest release: 4 months ago - 201 downloads last month - 15 stars on GitHub - 1 maintainer
delphai-ml-utils 1.0.17
A Python package to manage delphai machine learning operations.
17 versions - Latest release: 4 months ago - 1 dependent repository - 144 downloads last month - 1 maintainer
yvestest 0.1.3
An open-source simplifies ETL workflow with Python based on Spark
5 versions - Latest release: 4 months ago - 81 downloads last month - 1 maintainer
docop-tasks-restricted 0.3.3
Tasks for docop that have more restrictive open source licensing
4 versions - Latest release: 4 months ago - 26 downloads last month - 0 stars on GitHub - 1 maintainer
obsei 0.0.15
Obsei is an automation tool for text analysis need
14 versions - Latest release: 5 months ago - 3 dependent repositories - 138 downloads last month - 1,146 stars on GitHub - 1 maintainer
wasc 1.1.0
Web Accessibility Simple Checker
12 versions - Latest release: 6 months ago - 34 downloads last month - 2 stars on GitHub - 1 maintainer
maincontentextractor 0.0.4
A library to extract the main content from html. Developed for information on LLM and for feeding...
4 versions - Latest release: 6 months ago - 694 downloads last month - 4 stars on GitHub - 1 maintainer
openams 0.1.24
2 versions - Latest release: 6 months ago - 18 downloads last month - 1 maintainer
nextpy-ai 0.1.24
1 version - Latest release: 6 months ago - 10 downloads last month - 1 maintainer
agentdb 0.1.23
1 version - Latest release: 7 months ago - 30 downloads last month - 1 maintainer
code-context 0.1.22
1 version - Latest release: 7 months ago - 15 downloads last month - 1 maintainer
agent-context 0.1.22
1 version - Latest release: 7 months ago - 28 downloads last month - 1 maintainer
openagent-py 0.1.22
2 versions - Latest release: 7 months ago - 32 downloads last month - 2 maintainers
openagent-dev 0.2.1
Web apps in pure Python with all the flexibility and speed of nextjs.
2 versions - Latest release: 7 months ago - 18 downloads last month - 2,124 stars on GitHub - 1 maintainer
agent.ngo 0.1.22
1 version - Latest release: 7 months ago - 27 downloads last month - 1 maintainer
incognitogpt 0.1.22
1 version - Latest release: 7 months ago - 14 downloads last month - 1 maintainer
openlora 0.1.22
1 version - Latest release: 7 months ago - 13 downloads last month - 1 maintainer
llm-server 0.1.22
1 version - Latest release: 7 months ago - 12 downloads last month - 1 maintainer
llmproxy 0.1.22
1 version - Latest release: 7 months ago - 22 downloads last month - 1 maintainer
nextapi 0.1.22
1 version - Latest release: 7 months ago - 10 downloads last month - 1 maintainer
agent-system 0.1.22
1 version - Latest release: 7 months ago - 13 downloads last month - 1 maintainer
appnext 0.1.22
1 version - Latest release: 7 months ago - 7 downloads last month - 1 maintainer
next-llm 0.1.22
1 version - Latest release: 7 months ago - 11 downloads last month - 1 maintainer
next-ams 0.1.22
1 version - Latest release: 7 months ago - 10 downloads last month - 1 maintainer
namas 0.1.22
1 version - Latest release: 7 months ago - 23 downloads last month - 1 maintainer
dotnext 0.1.22
1 version - Latest release: 7 months ago - 15 downloads last month - 1 maintainer
nextagent 0.1.22
1 version - Latest release: 7 months ago - 13 downloads last month - 1 maintainer
auto-ams 0.1.22
1 version - Latest release: 7 months ago - 9 downloads last month - 1 maintainer
open-ams 0.1.22 removed
1 version - Latest release: 7 months ago - 1 maintainer
codegraph-agent 0.1.22
1 version - Latest release: 7 months ago - 14 downloads last month - 1 maintainer
dotagent 0.1.211
9 versions - Latest release: 7 months ago - 2 dependent repositories - 63 downloads last month - 2 maintainers
dotagent-dev 0.1.7
5 versions - Latest release: 8 months ago - 36 downloads last month - 1 maintainer
galactic-ai 0.2.16
Curate, annotate, and clean massive unstructured text datasets for machine learning and AI systems.
27 versions - Latest release: 8 months ago - 213 downloads last month - 305 stars on GitHub - 1 maintainer
python-switch-case-nguyenquoctuan02011992-2 0.0.1 removed
A sample python package to start sharing your code with the world
1 version - Latest release: 8 months ago - 0 stars on GitHub - 1 maintainer
ams-core 0.1.0
1 version - Latest release: 8 months ago - 12 downloads last month - 1 maintainer
ams-python 0.1.0
1 version - Latest release: 8 months ago - 9 downloads last month - 1 maintainer
dotams 0.1.0
1 version - Latest release: 8 months ago - 4 downloads last month - 1 maintainer
agent-management-system 0.1.0
1 version - Latest release: 8 months ago - 32 downloads last month - 1 maintainer
agentvm 0.1.0
1 version - Latest release: 9 months ago - 13 downloads last month - 2 maintainers
agentbox 0.1.0
1 version - Latest release: 9 months ago - 12 downloads last month - 1 maintainer
agent-cloud 0.1.0
1 version - Latest release: 9 months ago - 25 downloads last month - 1 maintainer
agent-cloud-os 0.1.0
1 version - Latest release: 9 months ago - 18 downloads last month - 1 maintainer
agent-vm 0.1.0 removed
1 version - Latest release: 9 months ago - 1 maintainer
openagentos 0.1.0
1 version - Latest release: 9 months ago - 7 downloads last month - 1 maintainer
deva 1.2.3
data eval in future
33 versions - Latest release: 9 months ago - 1 dependent repository - 196 downloads last month - 9 stars on GitHub - 1 maintainer
opencopilot-ai 0.3.8
OpenCopilot Backend
11 versions - Latest release: 9 months ago - 3 dependent repositories - 35 downloads last month - 532 stars on GitHub - 1 maintainer
aicompleter 0.0.1rc5 removed
Interactive AI program framework for Python
2 versions - Latest release: 10 months ago - 79 downloads last month - 1 maintainer
atradebot 0.1.0
atradebot package
1 version - Latest release: 10 months ago - 31 downloads last month - 3,835 stars on GitHub - 1 maintainer
oneai 0.9.89
NLP as a Service
119 versions - Latest release: 10 months ago - 2 dependent repositories - 1.63 thousand downloads last month - 34 stars on GitHub - 1 maintainer
genia 0.4.0
Your Engineering Gen AI Team member 🧬🤖💻
3 versions - Latest release: 10 months ago - 21 downloads last month - 344 stars on GitHub - 1 maintainer
readthis 0.1.1
readthis - A command line tool to read a text file aloud
9 versions - Latest release: 11 months ago - 90 downloads last month - 4 stars on GitHub - 1 maintainer
newsfeedback 0.1.0
Tool for extracting and saving news article metadata at regular intervals.
1 version - Latest release: about 1 year ago - 10 downloads last month - 1 maintainer
skatepark-lib 0.10.0
Python framework for AI workflows and pipelines.
4 versions - Latest release: about 1 year ago - 14 downloads last month - 1,547 stars on GitHub - 1 maintainer
nlp-toolbox 0.0.3 💰
Natural Language Processing Tools
3 versions - Latest release: about 1 year ago - 17 downloads last month - 2 stars on GitHub - 1 maintainer
ur-gadget 0.0.4 💰
Useful gadgets for your python projects
4 versions - Latest release: over 1 year ago - 3 dependent packages - 1 dependent repository - 67 downloads last month - 5 stars on GitHub - 1 maintainer
pydata-master 0.0.7 💰
All frequently used functions in one package for the data operation in a daily basis.
5 versions - Latest release: over 1 year ago - 1 dependent repository - 18 downloads last month - 0 stars on GitHub - 1 maintainer