Ecosyste.ms: Packages

An open API service providing package, version, and dependency metadata for many open source software ecosystems and registries.

pypi.org "text-extraction" keyword

Top 1.7% on pypi.org
trafilatura 1.9.0 💰
Python package and command-line tool designed to gather text on the Web, includes all necessary d...
44 versions - Latest release: 14 days ago - 71 dependent packages - 63 dependent repositories - 476 thousand downloads last month - 2,688 stars on GitHub - 1 maintainer
Top 1.2% on pypi.org
tika 2.6.0 💰
Apache Tika Python library
35 versions - Latest release: over 1 year ago - 33 dependent packages - 528 dependent repositories - 351 thousand downloads last month - 1,420 stars on GitHub - 1 maintainer
Top 1.6% on pypi.org
sumy 0.11.0 💰
Module for automatic summarization of text documents and HTML pages.
16 versions - Latest release: over 1 year ago - 9 dependent packages - 413 dependent repositories - 379 thousand downloads last month - 3,421 stars on GitHub - 1 maintainer
Top 2.1% on pypi.org
srt 3.5.3
A tiny library for parsing, modifying, and composing SRT files.
30 versions - Latest release: about 1 year ago - 26 dependent packages - 205 dependent repositories - 93.4 thousand downloads last month - 422 stars on GitHub - 1 maintainer
Top 6.2% on pypi.org
slate3k 0.5.3
Extract text from PDF documents easily.
1 version - Latest release: about 7 years ago - 2 dependent packages - 29 dependent repositories - 20.6 thousand downloads last month - 30 stars on GitHub - 1 maintainer
Top 2.7% on pypi.org
justext 3.0.1 💰
Heuristic based boilerplate removal tool
7 versions - Latest release: 7 days ago - 7 dependent packages - 43 dependent repositories - 538 thousand downloads last month - 682 stars on GitHub - 1 maintainer
Top 4.2% on pypi.org
boilerpy3 1.0.7
Python port of Boilerpipe, for HTML boilerplate removal and text extraction
7 versions - Latest release: 7 months ago - 12 dependent packages - 25 dependent repositories - 256 thousand downloads last month - 70 stars on GitHub - 1 maintainer
yt2text 1.0.2
Extract text from a YouTube video in a single command, using OpenAI's Whisper speech recognition ...
3 versions - Latest release: 7 months ago - 26 downloads last month - 1 star on GitHub - 1 maintainer
balena-cpu 1.0.0 💰
BALanced Execution through Natural Activation : a human-computer interaction methodology for code...
1 version - Latest release: 4 months ago - 35 downloads last month - 5 stars on GitHub - 1 maintainer
yirabot 1.0.9
YiraBot: Simplifying Web Scraping for All. A user-friendly tool for developers and enthusiasts, o...
20 versions - Latest release: 2 months ago - 172 downloads last month - 11 stars on GitHub - 1 maintainer
deboiler 2023.46.150
Deboiler is an open-source package to clean HTML pages across an entire domain
2 versions - Latest release: 6 months ago - 248 downloads last month - 7 stars on GitHub - 1 maintainer
fundus 0.3.1
A very simple news crawler
6 versions - Latest release: 3 days ago - 800 downloads last month - 38 stars on GitHub - 1 maintainer
ammico 0.2.0
AI Media and Misinformation Content Analysis Tool
2 versions - Latest release: 8 months ago - 22 downloads last month - 4 stars on GitHub - 1 maintainer
arachnio 0.0.0
Client library for interacting with Arachnio API
1 version - Latest release: about 1 year ago - 11 downloads last month - 0 stars on GitHub - 1 maintainer
wikipedia-ner 0.0.24
Python package for creating labeled examples from wiki dumps
22 versions - Latest release: about 9 years ago - 3 dependent repositories - 93 downloads last month - 68 stars on GitHub - 1 maintainer
wagtail-textract 1.2
Allow searching for text in Documents in the Wagtail content management system
8 versions - Latest release: over 4 years ago - 1 dependent repository - 52 downloads last month - 31 stars on GitHub - 2 maintainers
util-ds 0.5.3 💰
This project is a convenient part of the NLP project, including several already exposed projects ...
22 versions - Latest release: almost 2 years ago - 1 dependent repository - 31 downloads last month - 3,421 stars on GitHub - 1 maintainer
pdf-layout-scanner 1.3.3
A more complete example of programming with PDFMiner, which continues where the default documenta...
7 versions - Latest release: over 4 years ago - 1 dependent repository - 78 downloads last month - 8 stars on GitHub - 1 maintainer
newsman 1.1.0
A tool for web news scraping.
2 versions - Latest release: over 4 years ago - 1 dependent repository - 18 downloads last month - 0 stars on GitHub - 1 maintainer
Top 6.5% on pypi.org
mobi 0.3.3
unpack unencrypted mobi files
6 versions - Latest release: about 2 years ago - 8 dependent packages - 13 dependent repositories - 1.6 thousand downloads last month - 55 stars on GitHub - 1 maintainer
hnlp 0.0.1
Humanly Deeplearning NLP.
2 versions - Latest release: almost 4 years ago - 1 dependent repository - 14 downloads last month - 27 stars on GitHub - 1 maintainer
extracteur-de-fou-malade-pour-charles-le-charlo 0.0.1 💰
PDF data parser
1 version - Latest release: over 3 years ago - 1 dependent repository - 15 downloads last month - 3,421 stars on GitHub - 1 maintainer
Top 3.4% on pypi.org
breadability 0.1.20
Port of Readability HTML parser in Python
21 versions - Latest release: about 10 years ago - 2 dependent packages - 212 dependent repositories - 225 thousand downloads last month - 203 stars on GitHub - 1 maintainer
articleparse 0.2.1 💰
Heuristic text extraction from news articles
3 versions - Latest release: over 6 years ago - 12 downloads last month - 9 stars on GitHub - 1 maintainer
galeodes 0.7 💰
Browsers options
7 versions - Latest release: about 2 years ago - 8 dependent repositories - 1.91 thousand downloads last month - 0 stars on GitHub - 1 maintainer
pd3f 0.4.0
Reconstruct the original continuous text from PDFs with language models
5 versions - Latest release: about 3 years ago - 1 dependent repository - 68 downloads last month - 32 stars on GitHub - 1 maintainer
aiopytesseract 0.14.0 💰
asyncio tesseract wrapper for Tesseract-OCR
15 versions - Latest release: 3 months ago - 1 dependent repository - 468 downloads last month - 15 stars on GitHub - 1 maintainer
apple-ocr 1.0.8 💰
An OCR (Optical Character Recognition) utility for text extraction from images.
9 versions - Latest release: 4 months ago - 99 downloads last month - 68 stars on GitHub - 1 maintainer
Top 5.2% on pypi.org
slate 0.5.2 💰
Extract text from PDF documents easily.
4 versions - Latest release: 9 months ago - 46 dependent repositories - 1.28 thousand downloads last month - 422 stars on GitHub - 1 maintainer
pnlp 0.4.10
A pre/post-processing tool for NLP.
23 versions - Latest release: 4 months ago - 1 dependent package - 2 dependent repositories - 170 downloads last month - 27 stars on GitHub - 1 maintainer
hotpdf 0.5.2
Fast PDF Data Extraction library
27 versions - Latest release: 3 months ago - 1.89 thousand downloads last month - 164 stars on GitHub - 1 maintainer
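
Each entry above ends with a statistics line whose fields are separated by " - ". For readers who want to compare packages programmatically, the following is a minimal sketch of how such a line could be turned into a dictionary. The function name `parse_stats` and the regular expression are illustrative assumptions based only on the line format shown in this listing; they are not part of the Ecosyste.ms API.

```python
import re

# Multipliers for the magnitude words used in the listing (e.g. "476 thousand").
# "million" is included as an assumption; only "thousand" appears above.
_SCALE = {"thousand": 1_000, "million": 1_000_000}

# One field looks like "<number> [thousand|million] <label>",
# e.g. "2,688 stars on GitHub" or "93.4 thousand downloads last month".
_FIELD = re.compile(r"^([\d,.]+)\s+(?:(thousand|million)\s+)?(.+)$")


def parse_stats(line: str) -> dict:
    """Parse one statistics line from the listing (hypothetical helper)."""
    stats = {}
    for part in line.split(" - "):
        part = part.strip()
        # The release age is free text, so it is stored as a string.
        if part.startswith("Latest release:"):
            stats["latest_release"] = part.split(":", 1)[1].strip()
            continue
        match = _FIELD.match(part)
        if not match:
            continue  # skip anything that does not fit the expected pattern
        number, scale, label = match.groups()
        value = float(number.replace(",", "")) * _SCALE.get(scale, 1)
        stats[label] = round(value)
    return stats


if __name__ == "__main__":
    example = (
        "44 versions - Latest release: 14 days ago - 71 dependent packages - "
        "63 dependent repositories - 476 thousand downloads last month - "
        "2,688 stars on GitHub - 1 maintainer"
    )
    print(parse_stats(example))
```

Labels are kept exactly as they appear in the line, so singular and plural forms such as "maintainer" and "maintainers" produce different keys; a caller comparing many entries would probably want to normalize them first.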