An open API service providing package, version and dependency metadata of many open source software ecosystems and registries.

pypi.org "document-parsing" keyword

View the packages on the pypi.org package registry that are tagged with the "document-parsing" keyword.

docling-enhanced 2.32.0
SDK and CLI for parsing PDF, DOCX, HTML, and more, to a unified document representation for power...
1 version - Latest release: 5 months ago - 41 downloads last month - 40,160 stars on GitHub - 1 maintainer
contextgem 0.19.2
Effortless LLM extraction from documents
40 versions - Latest release: 2 days ago - 2.93 thousand downloads last month - 1,511 stars on GitHub - 1 maintainer
langchain-opendataloader-pdf 0.0.1
A LangChain integration for OpenDataLoader PDF
1 version - Latest release: 2 days ago - 2 stars on GitHub
opendataloader-pdf 1.1.0
A Python wrapper for the opendataloader-pdf Java CLI.
21 versions - Latest release: 2 days ago - 2.02 thousand downloads last month - 650 stars on GitHub - 1 maintainer
docstrange 1.1.6
Extract and Convert PDF, Word, PowerPoint, Excel, images, URLs into multiple formats (Markdown, J...
17 versions - Latest release: 22 days ago - 2.23 thousand downloads last month - 625 stars on GitHub - 1 maintainer
je-paddleocr 2.9.1
Awesome OCR toolkits based on PaddlePaddle (8.6M ultra-lightweight pre-trained model, support trai...
1 version - Latest release: 7 months ago - 324 downloads last month - 54,973 stars on GitHub - 1 maintainer
ppocrlabel-japan 0.0.2
PPOCRLabelv2 is a semi-automatic graphic annotation tool suitable for OCR field, with built-in PP...
2 versions - Latest release: over 2 years ago - 26 downloads last month - 56,061 stars on GitHub - 1 maintainer
paddleocrwordleveldetection 2.6.1.0
Awesome OCR toolkits based on PaddlePaddle (8.6M ultra-lightweight pre-trained model, support tra...
1 version - Latest release: over 2 years ago - 17 downloads last month - 56,061 stars on GitHub - 1 maintainer
Top 1.5% on pypi.org
unstructured 0.18.15
A library that prepares raw documents for downstream ML tasks.
193 versions - Latest release: 15 days ago - 113 dependent packages - 3,374 dependent repositories - 3.94 million downloads last month - 12,775 stars on GitHub - 1 maintainer
unstructured-cpu 0.15.1
A library that prepares raw documents for downstream ML tasks.
13 versions - Latest release: about 1 year ago - 73 downloads last month - 12,775 stars on GitHub - 1 maintainer
doculyzer 0.41.0
Universal, Searchable, Structured Document Manager
37 versions - Latest release: 4 months ago - 202 downloads last month - 1 star on GitHub - 1 maintainer
tikara 0.1.6
The metadata and text content extractor for almost every file type.
6 versions - Latest release: 8 months ago - 55 downloads last month - 4 stars on GitHub - 1 maintainer
doc23 0.1.2
Powerful Python library to convert documents (PDF, DOCX, TXT) into structured JSON trees for lega...
3 versions - Latest release: 5 months ago - 90 downloads last month - 0 stars on GitHub - 1 maintainer
llm_etl_pipeline 0.1.0
LLM extraction from documents
1 version - Latest release: 3 months ago - 17 downloads last month - 1 star on GitHub - 1 maintainer
flexidata 0.0.19
FlexiData is an open-source Python package designed for processing unstructured data.
19 versions - Latest release: over 1 year ago - 61 downloads last month - 0 stars on GitHub - 1 maintainer
docling-google-ocr 2.13.1
SDK and CLI for parsing PDF, DOCX, HTML, and more, to a unified document representation for power...
2 versions - Latest release: 8 months ago - 72 downloads last month - 38,808 stars on GitHub - 1 maintainer
invaro 0.0.4
Python SDK for Invaro's document parsing and unified accounting APIs
4 versions - Latest release: 4 months ago - 23 downloads last month - 1 maintainer
pdf-bank-statement-parser 0.1.1
Command-line tool for converting PDF bank statements into CSV
2 versions - Latest release: 11 months ago - 19 downloads last month - 4 stars on GitHub - 1 maintainer
anyparser-crewai 0.0.2
Anyparser CrewAI Integration
2 versions - Latest release: 8 months ago - 29 downloads last month - 1 star on GitHub - 1 maintainer
extended-docling 2.12.1
SDK and CLI for parsing PDF, DOCX, HTML, and more, to a unified document representation for power...
1 version - Latest release: 9 months ago - 19 downloads last month - 38,808 stars on GitHub - 1 maintainer
pyreparse 0.0.3
Build Efficient RegExp Parsing Engines.
1 version - Latest release: about 3 years ago - 8 downloads last month - 0 stars on GitHub - 1 maintainer
llama-cloud-services 0.6.69
Tailored SDK clients for LlamaCloud services.
68 versions - Latest release: 10 days ago - 12.5 million downloads last month - 3,956 stars on GitHub - 1 maintainer
llama-index-node-parser-docling 0.4.1
llama-index node_parser docling integration
7 versions - Latest release: 24 days ago - 7.64 thousand downloads last month - 28,777 stars on GitHub - 1 maintainer
llama-index-readers-docling 0.4.1
llama-index readers docling integration
8 versions - Latest release: 24 days ago - 13.9 thousand downloads last month - 27,013 stars on GitHub - 1 maintainer
llama-index-readers-llama-parse 0.5.0
llama-index readers llama-parse integration
9 versions - Latest release: 2 months ago - 6 dependent packages - 2.22 million downloads last month - 3,956 stars on GitHub - 1 maintainer
llm-parse 0.1.5
Parse data from documents optimised for downstream llm tasks.
6 versions - Latest release: 3 months ago - 103 downloads last month - 3,859 stars on GitHub - 1 maintainer
documiner 0.8.2
Advanced tool designed for text analysis and data mining in documents
1 version - Latest release: 3 months ago - 1 maintainer
Related Keywords
pdf (16), document-parser (13), pdf-to-json (12), pdf-to-text (11), document (10), docx (10), tables (10), ai (9), pptx (8), ocr (8), llm (7), markdown (7), structured-data (7), documents (6), pdf-converter (6), html (6), machine-learning (5), document-understanding (5), parsing (5), pdf-to-markdown (5), document-processing (5), xlsx (5), nlp (5), convert (5), artificial-intelligence (4), information-extraction (4), PDF (4), document-intelligence (4), rag (4), document-extraction (4), pdf-parser (4), document-analysis (4), data-extraction (4), textrecognition (3), ppt-to-markdown (3), paddleocr (3), east (3), crnn (3), textdetection (3), text-extraction (3), langchain (3), zero-shot (3), unstructured-data (3), text-processing (3), structured-data-extraction (3), ppt-to-json (3), pdf-to-excel (3), pdf-document-processor (3), docx-to-markdown (3), python (3), natural-language-processing (3), ml (3), NLP (3), pp-structure (3), pp-ocr (3), pdf2markdown (3), kie (3), document-translation (3), chinesetextrecognition (3), chinesetextdetection (3), chineseocr (3), db (3), ocrlite (3), rosetta (3), star-net (3), table former (3), content-extraction (3), table structure (3), segmentation (3), document-qa (3), layout model (3), llm-library (3), multilingual (3), docling (3), document-pipeline (3), llm-reasoning (3), generative-ai (3), multimodal (3), neural-segmentation (3), llm-framework (3), insights-extraction (3), automated-prompting (3), knowledge-extraction (3), large-language-models (3), legaltech (3), llm-extraction (3), contract-management (2), contract-intelligence (2), contract-automation (2), pdf-parsing (2), contract-parsing (2), contract-review (2), entity-extraction (2), extraction-justifications (2), extraction-pipeline (2), fintech (2), retrieval-augmented-generation (2), aspect-extraction (2), document-management (2), concept-extraction (2)