pypi.org "document-parsing" keyword
View the packages on the pypi.org package registry that are tagged with the "document-parsing" keyword.
docling-enhanced 2.32.0
SDK and CLI for parsing PDF, DOCX, HTML, and more, to a unified document representation for power...
1 version - Latest release: 5 months ago - 41 downloads last month - 40,160 stars on GitHub - 1 maintainer
contextgem 0.19.2
Effortless LLM extraction from documents
40 versions - Latest release: 2 days ago - 2.93 thousand downloads last month - 1,511 stars on GitHub - 1 maintainer
langchain-opendataloader-pdf 0.0.1
A LangChain integration for OpenDataLoader PDF
1 version - Latest release: 2 days ago - 2 stars on GitHub
opendataloader-pdf 1.1.0
A Python wrapper for the opendataloader-pdf Java CLI.
21 versions - Latest release: 2 days ago - 2.02 thousand downloads last month - 650 stars on GitHub - 1 maintainer
docstrange 1.1.6
Extract and Convert PDF, Word, PowerPoint, Excel, images, URLs into multiple formats (Markdown, J...
17 versions - Latest release: 22 days ago - 2.23 thousand downloads last month - 625 stars on GitHub - 1 maintainer
je-paddleocr 2.9.1
Awesome OCR toolkits based on PaddlePaddle (8.6M ultra-lightweight pre-trained model, support trai...
1 version - Latest release: 7 months ago - 324 downloads last month - 54,973 stars on GitHub - 1 maintainer
ppocrlabel-japan 0.0.2
PPOCRLabelv2 is a semi-automatic graphic annotation tool suitable for OCR field, with built-in PP...
2 versions - Latest release: over 2 years ago - 26 downloads last month - 56,061 stars on GitHub - 1 maintainer
paddleocrwordleveldetection 2.6.1.0
Awesome OCR toolkits based on PaddlePaddle (8.6M ultra-lightweight pre-trained model, support tra...
1 version - Latest release: over 2 years ago - 17 downloads last month - 56,061 stars on GitHub - 1 maintainer
Top 1.5% on pypi.org
unstructured 0.18.15
A library that prepares raw documents for downstream ML tasks.
193 versions - Latest release: 15 days ago - 113 dependent packages - 3,374 dependent repositories - 3.94 million downloads last month - 12,775 stars on GitHub - 1 maintainer
unstructured-cpu 0.15.1
A library that prepares raw documents for downstream ML tasks.
13 versions - Latest release: about 1 year ago - 73 downloads last month - 12,775 stars on GitHub - 1 maintainer
doculyzer 0.41.0
Universal, Searchable, Structured Document Manager
37 versions - Latest release: 4 months ago - 202 downloads last month - 1 star on GitHub - 1 maintainer
tikara 0.1.6
The metadata and text content extractor for almost every file type.
6 versions - Latest release: 8 months ago - 55 downloads last month - 4 stars on GitHub - 1 maintainer
doc23 0.1.2
Powerful Python library to convert documents (PDF, DOCX, TXT) into structured JSON trees for lega...
3 versions - Latest release: 5 months ago - 90 downloads last month - 0 stars on GitHub - 1 maintainer
llm_etl_pipeline 0.1.0
LLM extraction from documents
1 version - Latest release: 3 months ago - 17 downloads last month - 1 star on GitHub - 1 maintainer
flexidata 0.0.19
FlexiData is an open-source Python package designed for processing unstructured data.
19 versions - Latest release: over 1 year ago - 61 downloads last month - 0 stars on GitHub - 1 maintainer
docling-google-ocr 2.13.1
SDK and CLI for parsing PDF, DOCX, HTML, and more, to a unified document representation for power...
2 versions - Latest release: 8 months ago - 72 downloads last month - 38,808 stars on GitHub - 1 maintainer
invaro 0.0.4
Python SDK for Invaro's document parsing and unified accounting APIs
4 versions - Latest release: 4 months ago - 23 downloads last month - 1 maintainer
pdf-bank-statement-parser 0.1.1
Command-line tool for converting PDF bank statements into CSV
2 versions - Latest release: 11 months ago - 19 downloads last month - 4 stars on GitHub - 1 maintainer
anyparser-crewai 0.0.2
Anyparser CrewAI Integration
2 versions - Latest release: 8 months ago - 29 downloads last month - 1 star on GitHub - 1 maintainer
extended-docling 2.12.1
SDK and CLI for parsing PDF, DOCX, HTML, and more, to a unified document representation for power...
1 version - Latest release: 9 months ago - 19 downloads last month - 38,808 stars on GitHub - 1 maintainer
pyreparse 0.0.3
Build Efficient RegExp Parsing Engines.
1 version - Latest release: about 3 years ago - 8 downloads last month - 0 stars on GitHub - 1 maintainer
llama-cloud-services 0.6.69
Tailored SDK clients for LlamaCloud services.
68 versions - Latest release: 10 days ago - 12.5 million downloads last month - 3,956 stars on GitHub - 1 maintainer
llama-index-node-parser-docling 0.4.1
llama-index node_parser docling integration
7 versions - Latest release: 24 days ago - 7.64 thousand downloads last month - 28,777 stars on GitHub - 1 maintainer
llama-index-readers-docling 0.4.1
llama-index readers docling integration
8 versions - Latest release: 24 days ago - 13.9 thousand downloads last month - 27,013 stars on GitHub - 1 maintainer
llama-index-readers-llama-parse 0.5.0
llama-index readers llama-parse integration
9 versions - Latest release: 2 months ago - 6 dependent packages - 2.22 million downloads last month - 3,956 stars on GitHub - 1 maintainer
llm-parse 0.1.5
Parse data from documents optimised for downstream llm tasks.
6 versions - Latest release: 3 months ago - 103 downloads last month - 3,859 stars on GitHub - 1 maintainer
documiner 0.8.2
Advanced tool designed for text analysis and data mining in documents
1 version - Latest release: 3 months ago - 1 maintainer
Related Keywords
pdf (16)
document-parser (13)
pdf-to-json (12)
pdf-to-text (11)
document (10)
docx (10)
tables (10)
ai (9)
pptx (8)
ocr (8)
llm (7)
markdown (7)
structured-data (7)
documents (6)
pdf-converter (6)
html (6)
machine-learning (5)
document-understanding (5)
parsing (5)
pdf-to-markdown (5)
document-processing (5)
xlsx (5)
nlp (5)
convert (5)
artificial-intelligence (4)
information-extraction (4)
PDF (4)
document-intelligence (4)
rag (4)
document-extraction (4)
pdf-parser (4)
document-analysis (4)
data-extraction (4)
textrecognition (3)
ppt-to-markdown (3)
paddleocr (3)
east (3)
crnn (3)
textdetection (3)
text-extraction (3)
langchain (3)
zero-shot (3)
unstructured-data (3)
text-processing (3)
structured-data-extraction (3)
ppt-to-json (3)
pdf-to-excel (3)
pdf-document-processor (3)
docx-to-markdown (3)
python (3)
natural-language-processing (3)
ml (3)
NLP (3)
pp-structure (3)
pp-ocr (3)
pdf2markdown (3)
kie (3)
document-translation (3)
chinesetextrecognition (3)
chinesetextdetection (3)
chineseocr (3)
db (3)
ocrlite (3)
rosetta (3)
star-net (3)
table former (3)
content-extraction (3)
table structure (3)
segmentation (3)
document-qa (3)
layout model (3)
llm-library (3)
multilingual (3)
docling (3)
document-pipeline (3)
llm-reasoning (3)
generative-ai (3)
multimodal (3)
neural-segmentation (3)
llm-framework (3)
insights-extraction (3)
automated-prompting (3)
knowledge-extraction (3)
large-language-models (3)
legaltech (3)
llm-extraction (3)
contract-management (2)
contract-intelligence (2)
contract-automation (2)
pdf-parsing (2)
contract-parsing (2)
contract-review (2)
entity-extraction (2)
extraction-justifications (2)
extraction-pipeline (2)
fintech (2)
retrieval-augmented-generation (2)
aspect-extraction (2)
document-management (2)
concept-extraction (2)