Ecosyste.ms: Packages

An open API service providing package, version and dependency metadata of many open source software ecosystems and registries.

Top 1.7% on pypi.org
Top 0.7% downloads on pypi.org
Top 0.2% dependent packages on pypi.org
Top 1.9% dependent repos on pypi.org
Top 3.8% forks on pypi.org
Top 2.4% docker downloads on pypi.org

pypi.org : trafilatura

Python package and command-line tool designed to gather text on the Web. It includes all necessary discovery and text-processing components to perform web crawling, downloads, scraping, and extraction of main texts, metadata, and comments.

purl: pkg:pypi/trafilatura
Keywords: corpus, html2text, news-crawler, natural-language-processing, scraper, tei-xml, text-extraction, webscraping, web-scraping, article-extractor, corpus-builder, corpus-tools, crawler, html-to-markdown, news, news-aggregator, nlp, readability, rss-feed, scraping, tei, text-cleaning, text-mining, text-preprocessing
License: Apache-2.0
Latest release: 18 days ago
First release: almost 5 years ago
Dependent packages: 71
Dependent repositories: 63
Downloads: 476,459 last month
Stars: 2,688 on GitHub
Forks: 205 on GitHub
Docker dependents: 13
Docker downloads: 9,651
Total Commits: 1395
Committers: 39
Average commits per author: 35.769
Development Distribution Score (DDS): 0.102
More commit stats: commits.ecosyste.ms
See more repository details: repos.ecosyste.ms
Funding links: https://ko-fi.com/adbarbaresi
Last synced: 9 days ago
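
The commit figures above can be cross-checked with a little arithmetic. The sketch below recomputes the average commits per author from the listed totals and, as an illustration only, works out what the DDS value would imply about the most active committer. It assumes DDS is defined as 1 minus the top committer's share of all commits; that definition is an assumption here and is not stated on this page. The variable names are illustrative.

```python
# Figures copied from the listing above.
total_commits = 1395
committers = 39
dds = 0.102

# Average commits per author is the total divided by the number of committers.
average = total_commits / committers
print(f"Average commits per author: {average:.3f}")

# Assumption (not stated in the listing):
#   DDS = 1 - (commits by the most active committer / total commits)
# Under that assumption, the top committer's share is 1 - DDS.
top_share = 1 - dds
approx_top_commits = round(total_commits * top_share)
print(f"Implied share of top committer: {top_share:.1%}")
print(f"Implied commits by top committer: about {approx_top_commits}")
```

If that definition holds, a DDS this close to zero suggests that one author wrote the large majority of the commits, with the remaining committers sharing the rest.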
