pypi.org "text extraction" keyword
View the packages on the pypi.org package registry that are tagged with the "text extraction" keyword.
bocr 0.2.0
A Python package for OCR using Vision LLMs
4 versions - Latest release: 7 months ago - 35 downloads last month - 0 stars on GitHub - 1 maintainer
flashgeotext 1.0.0
Extract and count countries and cities (+their synonyms) from text
15 versions - Latest release: 2 months ago - 2 dependent repositories - 2.06 thousand downloads last month - 57 stars on GitHub - 1 maintainer
pysin 1.6.1
PySin is a toolbox for text retrieval in unstructured documents datasets. It contains both a mult...
16 versions - Latest release: about 5 years ago - 21 downloads last month - 1 maintainer
easyocr-itgn 1.2.3
Modified EasyOCR by IntoThatGoodNight
3 versions - Latest release: about 2 years ago - 38 downloads last month - 20,429 stars on GitHub - 1 maintainer
parsethisio 0.2.3
A Python library to extract text from various sources for LLM preprocessing.
11 versions - Latest release: about 2 months ago - 65 downloads last month - 1 maintainer
rainbow-pdf-processor 0.1.0
A powerful PDF processing tool with text extraction, table recognition, and image extraction capa...
1 version - Latest release: 5 months ago - 11 downloads last month - 0 stars on GitHub - 1 maintainer
picturetextcrop 0.6.1
Interactive extraction of selected text from images and batch processing of stored image files.
3 versions - Latest release: over 1 year ago - 15 downloads last month - 0 stars on GitHub - 1 maintainer
spectrepdf 0.2.1
A tool for processing and redacting PDFs based on target words using OCR.
3 versions - Latest release: about 2 months ago - 32 downloads last month - 1 star on GitHub - 1 maintainer
pdf-word-counter 0.3.3
Search PDF files for specific words and generate frequency statistics.
4 versions - Latest release: 4 months ago - 19 downloads last month - 0 stars on GitHub - 1 maintainer
hotpdf 0.5.2
Fast PDF Data Extraction library
27 versions - Latest release: over 1 year ago - 617 downloads last month - 196 stars on GitHub - 1 maintainer
codebase-to-text 1.0.7
A Python package to convert codebase to text
9 versions - Latest release: 12 months ago - 387 downloads last month - 80 stars on GitHub - 1 maintainer
paper2txt 0.0.4
Simple tool to extract text from scientific PDFs
4 versions - Latest release: over 2 years ago - 1 dependent package - 1 dependent repository - 25 downloads last month - 0 stars on GitHub - 1 maintainer
text-extra 0.1.4
A simple tool for text extraction from pdf, epub, txt, and docx files
3 versions - Latest release: over 1 year ago - 21 downloads last month - 2 stars on GitHub - 1 maintainer
pyshotter 1.0.0
PyShotter: Smart, annotated, and shareable screenshots for Python.
2 versions - Latest release: about 2 months ago - 0 stars on GitHub - 1 maintainer
boilerpy3 1.0.7
Python port of Boilerpipe, for HTML boilerplate removal and text extraction
7 versions - Latest release: almost 2 years ago - 12 dependent packages - 25 dependent repositories - 283 thousand downloads last month - 88 stars on GitHub - 1 maintainer
Top 4.2% on pypi.org
unstructured-platform 0.4.3
Python SDK for the Unstructured Platform API
4 versions - Latest release: 8 months ago - 22 downloads last month - 1 maintainer
gotext 0.9.5
GoText is a universal text extraction and preprocessing tool for Python which supports wide vari...
2 versions - Latest release: over 3 years ago - 1 dependent repository - 8 downloads last month - 0 stars on GitHub - 1 maintainer
ripit 1.0.2
Python port of Boilerpipe, Boilerplate Removal and Fulltext Extraction from HTML pages
2 versions - Latest release: 12 months ago - 21 downloads last month - 1 star on GitHub - 1 maintainer
doc23 0.1.2
Powerful Python library to convert documents (PDF, DOCX, TXT) into structured JSON trees for lega...
3 versions - Latest release: 4 months ago - 141 downloads last month - 0 stars on GitHub - 1 maintainer
webpage-to-text 0.1.0
LlamaIndex-powered web content extractor for RAG applications
1 version - Latest release: 2 months ago - 1 maintainer
textextraction 0.1.4
Extract and process text from images and PDFs
5 versions - Latest release: 5 months ago - 37 downloads last month - 0 stars on GitHub - 1 maintainer
ocr-json-processor 0.1.0 removed
A Python package for OCR response processing and JSON updates.
1 version - Latest release: 7 months ago - 1 maintainer
unstructured-platform-sdk 0.1.0 removed
Python SDK for the Unstructured Platform API
1 version - Latest release: 8 months ago - 1 maintainer
Related Keywords
pdf (8)
ocr (6)
python (6)
text-extraction (3)
full text extraction (2)
html text extraction (2)
api (2)
documents (2)
boilerpy (2)
sdk (2)
unstructured (2)
document processing (2)
text (2)
image processing (2)
boilerpipe (2)
screenshot (2)
redaction (2)
OCR (2)
nlp (2)
confidence scores (1)
Python package (1)
GitHub repository (1)
command-line tool (1)
code analysis (1)
sharing (1)
file parsing (1)
code documentation (1)
formatting preservation (1)
readability (1)
screen capture (1)
panorama (1)
paper (1)
hotkeys (1)
epub (1)
docx (1)
history (1)
txt (1)
cross-platform (1)
change detection (1)
annotation (1)
JSON (1)
table detection (1)
machine learning (1)
RAG (1)
llama-index (1)
web scraping (1)
open-source (1)
json (1)
honduras (1)
document-parsing (1)
AI (1)
structure (1)
PDF to JSON (1)
NLP (1)
legaltech (1)
document parsing (1)
text-preprocessing (1)
similarity-score (1)
datacleaning (1)
data-preprocessing (1)
text utils (1)
document extraction (1)
text preprocessing (1)
html-text-extraction (1)
full-text-extraction (1)
smart detection (1)
lstm (1)
information-retrieval (1)
image-processing (1)
easyocr (1)
deep-learning (1)
data-mining (1)
crnn (1)
cnn (1)
character recognition (1)
medical (1)
dataset generator (1)
search engine (1)
text retrieval (1)
arkhn (1)
search-in-text (1)
search (1)
named-entity-extraction (1)
geotext (1)
flashtext (1)
geonames (1)
ollama (1)
phi-vision (1)
llama-vision (1)
qwen-vl (1)
bocr (1)
vllm (1)
llm (1)
vision (1)
document conversion (1)
file contents (1)
folder structure (1)
text conversion (1)
code conversion (1)
codebase (1)