pypi.org "document-parsing" keyword
View the packages on the pypi.org package registry that are tagged with the "document-parsing" keyword.
unstructured-cpu 0.15.1
A library that prepares raw documents for downstream ML tasks.
13 versions - Latest release: 8 months ago - 368 downloads last month - 10,877 stars on GitHub - 1 maintainer
llama-cloud-services 0.6.12
Tailored SDK clients for LlamaCloud services.
13 versions - Latest release: 8 days ago - 2.54 million downloads last month - 3,878 stars on GitHub - 1 maintainer
flexidata 0.0.19
FlexiData is an open-source Python package designed for processing unstructured data.
19 versions - Latest release: 11 months ago - 734 downloads last month - 0 stars on GitHub - 1 maintainer
llama-index-readers-docling 0.3.2
llama-index readers docling integration
5 versions - Latest release: about 1 month ago - 5.73 thousand downloads last month - 27,013 stars on GitHub - 1 maintainer
contextgem 0.1.1
Easier and faster way to build LLM extraction workflows through powerful abstractions
3 versions - Latest release: 12 days ago - 338 downloads last month - 35 stars on GitHub - 1 maintainer
llm-parse 0.1.4
Parse data from documents optimised for downstream llm tasks.
5 versions - Latest release: 7 months ago - 258 downloads last month - 3,859 stars on GitHub - 1 maintainer
extended-docling 2.12.1
SDK and CLI for parsing PDF, DOCX, HTML, and more, to a unified document representation for power...
1 version - Latest release: 4 months ago - 74 downloads last month - 26,056 stars on GitHub - 1 maintainer
docling-google-ocr 2.13.1
SDK and CLI for parsing PDF, DOCX, HTML, and more, to a unified document representation for power...
2 versions - Latest release: 3 months ago - 117 downloads last month - 26,056 stars on GitHub - 1 maintainer
llama-index-node-parser-docling 0.3.1
llama-index node_parser docling integration
4 versions - Latest release: 2 months ago - 3.86 thousand downloads last month - 26,056 stars on GitHub - 1 maintainer
llama-index-readers-llama-parse 0.4.0
llama-index readers llama-parse integration
8 versions - Latest release: 5 months ago - 6 dependent packages - 2.04 million downloads last month - 3,827 stars on GitHub - 1 maintainer
tikara 0.1.6
The metadata and text content extractor for almost every file type.
6 versions - Latest release: 3 months ago - 214 downloads last month - 1 star on GitHub - 1 maintainer
anyparser-crewai 0.0.2
Anyparser CrewAI Integration
2 versions - Latest release: 2 months ago - 112 downloads last month - 1 star on GitHub - 1 maintainer
pdf-bank-statement-parser 0.1.1
Command-line tool for converting PDF bank statements into CSV
2 versions - Latest release: 6 months ago - 94 downloads last month - 1 star on GitHub - 1 maintainer
pyreparse 0.0.3
Build Efficient RegExp Parsing Engines.
1 version - Latest release: over 2 years ago - 51 downloads last month - 0 stars on GitHub - 1 maintainer
unstructured 0.17.0
Top 1.5% on pypi.org
A library that prepares raw documents for downstream ML tasks.
181 versions - Latest release: about 1 month ago - 113 dependent packages - 3,374 dependent repositories - 2.7 million downloads last month - 9,368 stars on GitHub - 1 maintainer
Related Keywords
pdf (11)
pdf-to-text (10)
document-parser (10)
pdf-to-json (9)
tables (7)
docx (7)
pptx (7)
document (6)
structured-data (5)
parsing (5)
ai (5)
PDF (4)
llm (4)
xlsx (4)
convert (4)
documents (4)
html (4)
markdown (4)
pdf-converter (4)
ppt-to-json (3)
ppt-to-markdown (3)
pdf-to-markdown (3)
pdf-to-excel (3)
pdf-document-processor (3)
docx-to-markdown (3)
ocr (3)
nlp (3)
natural-language-processing (3)
ml (3)
machine-learning (3)
document-intelligence (2)
data-extraction (2)
information-extraction (2)
CV (2)
document-understanding (2)
text-processing (2)
content-extraction (2)
unstructured-data (2)
docling (2)
layout model (2)
segmentation (2)
table structure (2)
table former (2)
pdf-parsing (2)
retrieval-augmented-generation (2)
python (2)
NLP (2)
HTML (2)
document-image-processing (2)
donut (2)
information-retrieval (2)
langchain (2)
document-image-analysis (2)
deep-learning (2)
data-pipelines (2)
preprocessing (2)
XML (2)
document-processing (2)
word-documents (1)
tika (1)
text-recognition (1)
text-parsing (1)
text-mining (1)
text-extraction (1)
report-parsing (1)
text-analytics (1)
powerpoint (1)
office-documents (1)
mime-type (1)
metadata (1)
language-detection (1)
image-extraction (1)
format-identification (1)
format-detection (1)
file-type (1)
file-reader (1)
file-processing (1)
file-parsing (1)
file-identification (1)
file-format (1)
file-conversion (1)
file-analysis (1)
excel (1)
regexp-group (1)
regexp (1)
python3 (1)
formatted-text-parsing (1)
pdf-parser (1)
fnb (1)
first-national-bank (1)
financial-analysis (1)
banking (1)
bank (1)
typescript (1)
rag (1)
knowledge-graph (1)
kag (1)
crewai-rag (1)
crewai (1)
crew-ai-rag (1)