pypi.org "document-parser" keyword
pdf-oxide 0.3.17 💰
The fastest Python PDF library: 0.8ms mean, 5× faster than PyMuPDF. Text extraction, markdown con...
26 versions - Latest release: about 10 hours ago - 3.76 thousand downloads last month - 154 stars on GitHub - 1 maintainer
unstructured 0.18.24
Top 1.5% on pypi.org
A library that prepares raw documents for downstream ML tasks.
197 versions - Latest release: 2 months ago - 113 dependent packages - 3,374 dependent repositories - 3.2 million downloads last month - 13,122 stars on GitHub - 1 maintainer
mseep-docling 2.64.1
SDK and CLI for parsing PDF, DOCX, HTML, and more, to a unified document representation for power...
1 version - Latest release: 3 months ago - 17 downloads last month - 55,111 stars on GitHub - 1 maintainer
docling-enhanced 2.32.0
SDK and CLI for parsing PDF, DOCX, HTML, and more, to a unified document representation for power...
1 version - Latest release: 10 months ago - 26 downloads last month - 55,111 stars on GitHub - 1 maintainer
llama-index-readers-llama-parse 0.5.1
llama-index readers llama-parse integration
10 versions - Latest release: 6 months ago - 6 dependent packages - 2.59 million downloads last month - 3,956 stars on GitHub - 1 maintainer
llama-cloud-services 0.6.94
Tailored SDK clients for LlamaCloud services.
93 versions - Latest release: 24 days ago - 29.3 million downloads last month - 3,956 stars on GitHub - 1 maintainer
rfdetr-doclayout 0.1.0
Add your description here
1 version - Latest release: 4 months ago - 22 downloads last month - 0 stars on GitHub - 1 maintainer
opendataloader-pdf 1.10.1
A Python wrapper for the opendataloader-pdf Java CLI.
45 versions - Latest release: about 1 month ago - 3.25 thousand downloads last month - 758 stars on GitHub - 1 maintainer
semantic-ai 0.0.6
Semantic AI RAG System
8 versions - Latest release: over 1 year ago - 84 downloads last month - 18 stars on GitHub - 1 maintainer
docling-google-ocr 2.13.1
SDK and CLI for parsing PDF, DOCX, HTML, and more, to a unified document representation for power...
2 versions - Latest release: about 1 year ago - 52 downloads last month - 53,206 stars on GitHub - 1 maintainer
vision-parse 0.1.13
Parse PDF documents into markdown formatted content using Vision LLMs
14 versions - Latest release: about 1 year ago - 1.27 thousand downloads last month - 430 stars on GitHub - 1 maintainer
unstructured-cpu 0.15.1
A library that prepares raw documents for downstream ML tasks.
13 versions - Latest release: over 1 year ago - 245 downloads last month - 12,775 stars on GitHub - 1 maintainer
openparse 0.7.0
Streamlines the process of preparing documents for LLM's.
17 versions - Latest release: over 1 year ago - 5.12 thousand downloads last month - 3,116 stars on GitHub - 1 maintainer
python-docparser 1.1.0
Extract text from your docx document.
3 versions - Latest release: about 3 years ago - 29 downloads last month - 11 stars on GitHub - 1 maintainer
llamarker 1.0.2
A universal GenAI-based local parser for complex documents of all types.
3 versions - Latest release: about 1 year ago - 39 downloads last month - 1 star on GitHub - 1 maintainer
extended-docling 2.12.1
SDK and CLI for parsing PDF, DOCX, HTML, and more, to a unified document representation for power...
1 version - Latest release: about 1 year ago - 26 downloads last month - 42,147 stars on GitHub - 1 maintainer
langchain-document-parser 0.1.0
A document parser built with LangChain
1 version - Latest release: 4 months ago
crabparser 0.1.1
🦀 Blazingly fast text parsing library with Rust backend - 10x faster than pure Python with suppor...
2 versions - Latest release: 6 months ago - 24 downloads last month - 0 stars on GitHub - 1 maintainer
automated-document-parser 0.1.6
A powerful and automated document parser built with LangChain for intelligent document processing
5 versions - Latest release: 4 months ago - 145 downloads last month - 0 stars on GitHub - 1 maintainer
autorag 0.3.21 💰
Automatically Evaluate RAG pipelines with your own data. Find optimal structure for new RAG product.
73 versions - Latest release: 4 months ago - 3.65 thousand downloads last month - 4,395 stars on GitHub - 1 maintainer
llm-parse 0.1.5
Parse data from documents optimised for downstream llm tasks.
6 versions - Latest release: 9 months ago - 41 downloads last month - 3,859 stars on GitHub - 1 maintainer
marie-ai 3.0.29
Python library to Integrate AI-powered features into your applications
7 versions - Latest release: about 2 years ago - 37 downloads last month - 76 stars on GitHub - 1 maintainer
deepdoctection 1.0.7
Top 4.6% on pypi.org
Repository for Document AI - server/inference core package
66 versions - Latest release: 24 days ago - 14 dependent repositories - 3.68 thousand downloads last month - 2,954 stars on GitHub - 1 maintainer
stg609-dots-ocr 0.0.1
dots.ocr: Multilingual Document Layout Parsing in one Vision-Language Model
1 version - Latest release: 5 months ago - 42 downloads last month - 4,901 stars on GitHub - 1 maintainer
graphlit-client 1.0.20260215001
Graphlit API Python Client
253 versions - Latest release: 21 days ago - 52.6 thousand downloads last month - 5 stars on GitHub - 1 maintainer
df-extract 0.0.2
DecisionFacts Extraction Library extracts content from PDF, PPTX, Docx, png, jpg., and convert as...
3 versions - Latest release: over 2 years ago - 1 dependent package - 1 dependent repository - 43 downloads last month - 14 stars on GitHub - 1 maintainer
clearedge 0.1.17
Build a RAG preprocessing pipeline
18 versions - Latest release: almost 2 years ago - 56 downloads last month - 12 stars on GitHub - 1 maintainer
docstrange 1.1.8
Extract and Convert PDF, Word, PowerPoint, Excel, images, URLs into multiple formats (Markdown, J...
19 versions - Latest release: 4 months ago - 3.9 thousand downloads last month - 935 stars on GitHub - 1 maintainer
anyparser-crewai 0.0.2
Anyparser CrewAI Integration
2 versions - Latest release: about 1 year ago - 34 downloads last month - 1 star on GitHub - 1 maintainer
novalad 0.1.16
Novalad: AI-powered platform for transforming unstructured documents like PDFs and PowerPoints in...
17 versions - Latest release: 6 months ago - 51 downloads last month - 17 stars on GitHub - 1 maintainer
mixedbread-ai-langchain 1.0.2
The official Mixedbread AI integration for LangChain.
2 versions - Latest release: 8 months ago - 15 downloads last month - 0 stars on GitHub - 1 maintainer
charlie-dots-ocr 0.0.3
0.0.3 support aparser in async way. dots.ocr: Multilingual Document Layout Parsing in one Vision-...
3 versions - Latest release: 5 months ago - 105 downloads last month - 4,901 stars on GitHub
llama-index-node-parser-docling 0.4.2
llama-index node_parser docling integration
8 versions - Latest release: 3 months ago - 16.8 thousand downloads last month - 28,777 stars on GitHub - 1 maintainer
llama-index-readers-docling 0.4.2
llama-index readers docling integration
9 versions - Latest release: 3 months ago - 14.5 thousand downloads last month - 27,013 stars on GitHub - 1 maintainer
Related Keywords
pdf: 20
pdf-to-json: 15
pdf-to-text: 14
document-parsing: 14
tables: 11
pptx: 11
ocr: 11
markdown: 11
docx: 11
ai: 9
llm: 8
document: 8
nlp: 8
pdf-converter: 7
documents: 7
html: 7
rag: 7
pdf-to-markdown: 7
langchain: 6
python: 6
convert: 6
xlsx: 6
parsing: 5
machine-learning: 4
docling: 4
structured-data: 4
layout model: 4
segmentation: 4
table structure: 4
table former: 4
retrieval-augmented-generation: 4
table-detection: 4
PDF: 4
docx-to-markdown: 3
pdf-document-processor: 3
text-extraction: 3
pdf-parser: 3
pdf-to-excel: 3
ppt-to-json: 3
ppt-to-markdown: 3
document-layout-analysis: 3
document-image-analysis: 3
deep-learning: 3
layout-detection: 3
table-recognition: 3
embedding: 2
document-ai: 2
llama: 2
document-understanding: 2
vision-language-model: 2
document parsing: 2
AI: 2
genai: 2
NLP: 2
rust: 2
pytorch: 2
XML: 2
preprocessing: 2
HTML: 2
data-pipelines: 2
document-image-processing: 2
pubtabnet: 2
publaynet: 2
donut: 2
information-retrieval: 2
ml: 2
natural-language-processing: 2
CV: 2
document-processing: 2
python3: 1
extraction: 1
haystack: 1
llamaindex: 1
asyncio: 1
jpeg: 1
pdf-ocr-extraction: 1
rag-pipeline: 1
document-conversion: 1
image-processing: 1
intelligent-document-processing: 1
ai-training-data: 1
unstructured-alternative: 1
docling-alternative: 1
iwr: 1
omr: 1
optical-character-recognition: 1
optical-mark-recognition: 1
layoutlm: 1
tensorflow: 1
api-client: 1
api-client-python: 1
chatbot: 1
copilot: 1
graphlit: 1
df: 1
extract: 1
content: 1
ppt: 1
doc: 1
png: 1