An open API service providing package, version and dependency metadata of many open source software ecosystems and registries.

pypi.org "text extraction" keyword

View the packages on the pypi.org package registry that are tagged with the "text extraction" keyword.

bocr 0.2.0
A Python package for OCR using Vision LLMs
4 versions - Latest release: 7 months ago - 35 downloads last month - 0 stars on GitHub - 1 maintainer
flashgeotext 1.0.0
Extract and count countries and cities (+their synonyms) from text
15 versions - Latest release: 2 months ago - 2 dependent repositories - 2.06 thousand downloads last month - 57 stars on GitHub - 1 maintainer
pysin 1.6.1
PySin is a toolbox for text retrieval in unstructured document datasets. It contains both a mult...
16 versions - Latest release: about 5 years ago - 21 downloads last month - 1 maintainer
easyocr-itgn 1.2.3
Modified EasyOCR by IntoThatGoodNight
3 versions - Latest release: about 2 years ago - 38 downloads last month - 20,429 stars on GitHub - 1 maintainer
parsethisio 0.2.3
A Python library to extract text from various sources for LLM preprocessing.
11 versions - Latest release: about 2 months ago - 65 downloads last month - 1 maintainer
rainbow-pdf-processor 0.1.0
A powerful PDF processing tool with text extraction, table recognition, and image extraction capa...
1 version - Latest release: 5 months ago - 11 downloads last month - 0 stars on GitHub - 1 maintainer
picturetextcrop 0.6.1
Interactive extraction of selected text from images and batch processing of stored image files.
3 versions - Latest release: over 1 year ago - 15 downloads last month - 0 stars on GitHub - 1 maintainer
spectrepdf 0.2.1
A tool for processing and redacting PDFs based on target words using OCR.
3 versions - Latest release: about 2 months ago - 32 downloads last month - 1 star on GitHub - 1 maintainer
pdf-word-counter 0.3.3
Search PDF files for specific words and generate frequency statistics.
4 versions - Latest release: 4 months ago - 19 downloads last month - 0 stars on GitHub - 1 maintainer
hotpdf 0.5.2
Fast PDF Data Extraction library
27 versions - Latest release: over 1 year ago - 617 downloads last month - 196 stars on GitHub - 1 maintainer
codebase-to-text 1.0.7
A Python package to convert codebase to text
9 versions - Latest release: 12 months ago - 387 downloads last month - 80 stars on GitHub - 1 maintainer
paper2txt 0.0.4
Simple tool to extract text from scientific PDFs
4 versions - Latest release: over 2 years ago - 1 dependent package - 1 dependent repository - 25 downloads last month - 0 stars on GitHub - 1 maintainer
text-extra 0.1.4
A simple tool for text extraction from pdf, epub, txt, and docx files
3 versions - Latest release: over 1 year ago - 21 downloads last month - 2 stars on GitHub - 1 maintainer
pyshotter 1.0.0
PyShotter: Smart, annotated, and shareable screenshots for Python.
2 versions - Latest release: about 2 months ago - 0 stars on GitHub - 1 maintainer
Top 4.2% on pypi.org
boilerpy3 1.0.7
Python port of Boilerpipe, for HTML boilerplate removal and text extraction
7 versions - Latest release: almost 2 years ago - 12 dependent packages - 25 dependent repositories - 283 thousand downloads last month - 88 stars on GitHub - 1 maintainer
unstructured-platform 0.4.3
Python SDK for the Unstructured Platform API
4 versions - Latest release: 8 months ago - 22 downloads last month - 1 maintainer
gotext 0.9.5
GoText is a universal text extraction and preprocessing tool for Python which supports a wide vari...
2 versions - Latest release: over 3 years ago - 1 dependent repository - 8 downloads last month - 0 stars on GitHub - 1 maintainer
ripit 1.0.2
Python port of Boilerpipe, Boilerplate Removal and Fulltext Extraction from HTML pages
2 versions - Latest release: 12 months ago - 21 downloads last month - 1 star on GitHub - 1 maintainer
doc23 0.1.2
Powerful Python library to convert documents (PDF, DOCX, TXT) into structured JSON trees for lega...
3 versions - Latest release: 4 months ago - 141 downloads last month - 0 stars on GitHub - 1 maintainer
webpage-to-text 0.1.0
LlamaIndex-powered web content extractor for RAG applications
1 version - Latest release: 2 months ago - 1 maintainer
textextraction 0.1.4
Extract and process text from images and PDFs
5 versions - Latest release: 5 months ago - 37 downloads last month - 0 stars on GitHub - 1 maintainer
ocr-json-processor 0.1.0 (removed)
A Python package for OCR response processing and JSON updates.
1 version - Latest release: 7 months ago - 1 maintainer
unstructured-platform-sdk 0.1.0 (removed)
Python SDK for the Unstructured Platform API
1 version - Latest release: 8 months ago - 1 maintainer