An open API service providing package, version and dependency metadata of many open source software ecosystems and registries.

pypi.org "table-extraction" keyword

View the packages on the pypi.org package registry that are tagged with the "table-extraction" keyword.

Top 1.4% on pypi.org
pymupdf 1.26.4
A high performance Python library for data extraction, analysis, conversion & manipulation of PDF...
135 versions - Latest release: 13 days ago - 206 dependent packages - 1,798 dependent repositories - 15.1 million downloads last month - 7,909 stars on GitHub - 1 maintainer
Top 0.8% on pypi.org
pdfplumber 0.11.7
Plumb a PDF for detailed information about each char, rectangle, and line.
73 versions - Latest release: 3 months ago - 118 dependent packages - 1,210 dependent repositories - 5.9 million downloads last month - 8,213 stars on GitHub - 1 maintainer
quipucamayoc 0.1.2
Tools to extract information from digitized historical documents
2 versions - Latest release: over 3 years ago - 1 dependent repository - 8 downloads last month - 30 stars on GitHub - 1 maintainer
kreuzberg 3.13.0
Document intelligence framework for Python - Extract text, metadata, and structured data from div...
46 versions - Latest release: 3 days ago - 179 thousand downloads last month - 2,329 stars on GitHub - 1 maintainer
tablecv 0.1.1
Table extraction from image.
2 versions - Latest release: almost 2 years ago - 175 downloads last month - 10 stars on GitHub - 1 maintainer
docstrange 1.1.5
Extract and Convert PDF, Word, PowerPoint, Excel, images, URLs into multiple formats (Markdown, J...
16 versions - Latest release: 5 days ago - 1.93 thousand downloads last month - 493 stars on GitHub - 1 maintainer
Top 1.8% on pypi.org
pymupdfb 1.24.10
MuPDF shared libraries for PyMuPDF.
18 versions - Latest release: about 1 year ago - 4 dependent packages - 133 dependent repositories - 1.62 million downloads last month - 7,854 stars on GitHub - 1 maintainer
aqpymupdf 1.23.7
A high performance Python library for data extraction, analysis, conversion & manipulation of PDF...
1 version - Latest release: over 1 year ago - 44 downloads last month - 7,854 stars on GitHub - 1 maintainer
pdfplumber-aemc 0.11.3
Plumb a PDF for detailed information about each char, rectangle, and line.
16 versions - Latest release: over 1 year ago - 1 dependent repository - 133 downloads last month - 8,179 stars on GitHub - 1 maintainer
llm-data-converter 2.2.0
Best open-source document to markdown converter for LLM training data. Convert PDF, Word, PowerPo...
23 versions - Latest release: about 1 month ago - 297 downloads last month - 3 stars on GitHub - 1 maintainer
document-data-extractor 1.0.4
Best open-source document to markdown extractor for LLM training data. Convert PDF, Word, PowerPo...
5 versions - Latest release: about 1 month ago - 78 downloads last month - 3 stars on GitHub - 1 maintainer
Top 7.1% on pypi.org
img2table 1.4.2
img2table is a table identification and extraction Python Library for PDF and images, based on Op...
57 versions - Latest release: 28 days ago - 1 dependent package - 4 dependent repositories - 35.1 thousand downloads last month - 788 stars on GitHub - 1 maintainer
table-transformer 1.0.6
Table Transformer
5 versions - Latest release: 12 months ago - 499 downloads last month - 2,682 stars on GitHub - 1 maintainer
extractable 1.0.2
Extract tables from PDFs
124 versions - Latest release: over 1 year ago - 1 dependent repository - 1.3 thousand downloads last month - 30 stars on GitHub - 1 maintainer
pyany2json 0.1.3
Python binding to Any2Json
4 versions - Latest release: over 1 year ago - 12 downloads last month - 0 stars on GitHub - 1 maintainer
pdfmod 0.1.5
A tool for PDF file manipulation.
1 version - Latest release: 10 months ago - 20 downloads last month - 7,114 stars on GitHub - 1 maintainer
docext 0.1.14
Onprem information extraction from documents
12 versions - Latest release: 2 months ago - 914 downloads last month - 1,660 stars on GitHub - 1 maintainer
mseep-kreuzberg 3.8.2
Document intelligence framework for Python - Extract text, metadata, and structured data from div...
1 version - Latest release: about 2 months ago
depdf 0.2.2
PDF table & paragraph extractor
4 versions - Latest release: about 5 years ago - 1 dependent repository - 63 downloads last month - 11 stars on GitHub - 1 maintainer
extracttable 2.4.0
Extract table data from images and scanned PDFs. Easily convert image to excel, convert pdf to table
16 versions - Latest release: about 3 years ago - 1 dependent repository - 1.24 thousand downloads last month - 279 stars on GitHub - 1 maintainer
pdftablr 0.1.0
Python3 implementation of Kyle Cronan's pdftable module, with unit tests
1 version - Latest release: almost 8 years ago - 1 dependent repository - 47 downloads last month - 2 stars on GitHub - 1 maintainer
markdrop 3.5.0
A comprehensive PDF processing toolkit that converts PDFs to markdown with advanced AI-powered fe...
20 versions - Latest release: 2 months ago - 365 downloads last month - 116 stars on GitHub - 2 maintainers
krank 0.0.1
Fetch psychology datasets from remote sources.
2 versions - Latest release: almost 3 years ago - 1 dependent repository - 13 downloads last month - 0 stars on GitHub - 1 maintainer
Related Keywords
pdf (13), ocr (12), python (9), rag (6), document-processing (6), tesseract (6), text-extraction (5), llm (4), markdown (4), image-processing (4), pdf-to-markdown (4), xps (4), data-science (4), epub (4), extract-data (4), font (4), mupdf (4), pdf-documents (4), pymupdf (4), text-shaping (4), text-processing (4), document-conversion (3), batch-document-processing (3), word-to-markdown (3), powerpoint-to-markdown (3), intelligent-document-processing (3), document-understanding (3), ai-training-data (3), unstructured-alternative (3), docling-alternative (3), marker-alternative (3), markitdown-alternative (3), mineru-alternative (3), paddleocr-alternative (3), excel-to-markdown (3), tesseract-alternative (3), document-to-markdown (3), html-to-markdown (3), local-document-processing (3), structured-data-extraction (3), layout-detection (3), llm-ready-data (3), document-ai (3), document-analysis (3), pdf-parsing (3), structured-data (3), async (2), document-intelligence (2), extensible (2), information-extraction (2), mcp (2), metadata-extraction (2), model-context-protocol (2), pandoc (2), pdf-extraction (2), pdfium (2), plugin-architecture (2), retrieval-augmented-generation (2), opencv (2), tables (2), offline-document-extractor (2), ppt-to-markdown (2), ai (2), offline-document-converter (2), image-table-recognition (1), extracttable (1), pdftk (1), pdf-table-extract (1), tabular-data (1), python3 (1), csv (1), liwc (1), paragraph-extraction (1), pdf-to-html (1), mseep (1), vlms (1), unstructured-data (1), onpremise (1), onprem-vision (1), onprem-ocr (1), onprem (1), ocr-onpremise (1), ocr-benchmark (1), nlp (1), meta-analysis (1), LIWC (1), text (1), datasets (1), data (1), table-to-text (1), pypi-package (1), pdf-to-text (1), open-source (1), markitdown (1), marker (1), markdrop (1), image-to-text (1), docling (1), agents (1), openai (1)