An open API service providing package, version, and dependency metadata for many open source software ecosystems and registries.

pypi.org "pdf-to-text" keyword

View the packages on the pypi.org package registry that are tagged with the "pdf-to-text" keyword.

unstructured-cpu 0.15.1
A library that prepares raw documents for downstream ML tasks.
13 versions - Latest release: 8 months ago - 368 downloads last month - 10,877 stars on GitHub - 1 maintainer
llama-cloud-services 0.6.12
Tailored SDK clients for LlamaCloud services.
13 versions - Latest release: 8 days ago - 2.54 million downloads last month - 3,878 stars on GitHub - 1 maintainer
llama-index-readers-docling 0.3.2
llama-index readers docling integration
5 versions - Latest release: about 1 month ago - 5.73 thousand downloads last month - 27,013 stars on GitHub - 1 maintainer
bangla-pdf-ocr 0.1.1
A package to extract Bengali text from PDFs using OCR
2 versions - Latest release: 6 months ago - 137 downloads last month - 8 stars on GitHub - 1 maintainer
llm-parse 0.1.4
Parse data from documents optimised for downstream llm tasks.
5 versions - Latest release: 7 months ago - 258 downloads last month - 3,859 stars on GitHub - 1 maintainer
extended-docling 2.12.1
SDK and CLI for parsing PDF, DOCX, HTML, and more, to a unified document representation for power...
1 version - Latest release: 4 months ago - 74 downloads last month - 26,056 stars on GitHub - 1 maintainer
docling-google-ocr 2.13.1
SDK and CLI for parsing PDF, DOCX, HTML, and more, to a unified document representation for power...
2 versions - Latest release: 3 months ago - 117 downloads last month - 26,056 stars on GitHub - 1 maintainer
llama-index-node-parser-docling 0.3.1
llama-index node_parser docling integration
4 versions - Latest release: 2 months ago - 3.86 thousand downloads last month - 26,056 stars on GitHub - 1 maintainer
llama-index-readers-llama-parse 0.4.0
llama-index readers llama-parse integration
8 versions - Latest release: 5 months ago - 6 dependent packages - 2.04 million downloads last month - 3,827 stars on GitHub - 1 maintainer
tikara 0.1.6
The metadata and text content extractor for almost every file type.
6 versions - Latest release: 3 months ago - 214 downloads last month - 1 star on GitHub - 1 maintainer
markdrop 0.3.1
A comprehensive PDF processing toolkit that converts PDFs to markdown with advanced AI-powered fe...
19 versions - Latest release: 3 months ago - 886 downloads last month - 84 stars on GitHub - 1 maintainer
pcu-io 1.2.2
IO management for PCU project
2 versions - Latest release: over 6 years ago - 1 dependent repository - 89 downloads last month - 0 stars on GitHub - 1 maintainer
pcu-pdf 1.2.2
PDF parser component (Apache Tika) for PCU project
2 versions - Latest release: over 6 years ago - 1 dependent repository - 111 downloads last month - 1 star on GitHub - 1 maintainer
Top 1.5% on pypi.org
unstructured 0.17.0
A library that prepares raw documents for downstream ML tasks.
181 versions - Latest release: about 1 month ago - 113 dependent packages - 3,374 dependent repositories - 2.7 million downloads last month - 9,368 stars on GitHub - 1 maintainer
clearedge 0.1.17
Build a RAG preprocessing pipeline
18 versions - Latest release: about 1 year ago - 473 downloads last month - 11 stars on GitHub - 1 maintainer