An open API service providing package, version and dependency metadata of many open source software ecosystems and registries.

Top 0.5% on pypi.org
Top 0.2% downloads on pypi.org
Top 0.2% dependent packages on pypi.org
Top 0.2% dependent repos on pypi.org
Top 1.5% forks on pypi.org
Top 0.3% docker downloads on pypi.org

pypi.org: pdfminer.six

PDF parser and analyzer

purl: pkg:pypi/pdfminer.six
Keywords: layout analysis, pdf converter, pdf parser, text mining, parser, pdf, python
License: MIT
Latest release: 3 days ago
First release: over 10 years ago
Dependent packages: 162
Dependent repositories: 2,496
Downloads: 7,750,129 last month
Stars: 5,784 on GitHub
Forks: 916 on GitHub
Docker dependents: 222
Docker downloads: 985,186,589
Total Commits: 833
Committers: 131
Average commits per author: 6.359
Development Distribution Score (DDS): 0.661
More commit stats: commits.ecosyste.ms
See more repository details: repos.ecosyste.ms
Last synced: about 15 hours ago
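
The purl and the commit averages listed above can be checked with a few lines of Python. This is a minimal sketch, not the service's own implementation: `parse_purl` and `average_commits_per_author` are hypothetical helper names, and the parser only handles the simple `pkg:<type>/<name>[@<version>]` form (the full purl specification also allows namespaces, qualifiers and subpaths, which are ignored here).

```python
def parse_purl(purl: str) -> dict:
    """Split a simple purl of the form pkg:<type>/<name>[@<version>].

    Hypothetical helper for illustration; namespaces, qualifiers and
    subpaths from the full purl specification are not handled.
    """
    scheme, _, rest = purl.partition(":")
    if scheme != "pkg":
        raise ValueError(f"not a purl: {purl!r}")
    pkg_type, _, name_version = rest.partition("/")
    name, _, version = name_version.partition("@")
    return {"type": pkg_type, "name": name, "version": version or None}


def average_commits_per_author(total_commits: int, committers: int) -> float:
    """Total commits divided by committers, rounded to three decimals."""
    return round(total_commits / committers, 3)


print(parse_purl("pkg:pypi/pdfminer.six"))
# {'type': 'pypi', 'name': 'pdfminer.six', 'version': None}

print(average_commits_per_author(833, 131))
# 6.359
```

With the figures listed on this page (833 total commits, 131 committers), the computed average matches the reported 6.359.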

doctableflow removed
Fetch Tables from the documents
6 versions - 730 downloads last month - 1 maintainer
fastagi 0.0.27
Private AutoGPT Robot - Your private task assistant with GPT!
1 version - Latest release: 10 months ago - 53 downloads last month - 157 stars on GitHub - 1 maintainer
allm 1.0.9
A simple and efficient python library for fast inference of GGUF Large Language Models.
10 versions - Latest release: 10 months ago - 209 downloads last month - 1 maintainer
h2ogpt 0.2.1
Private chat with local GPT with document, images, video, etc. 100% private, Apache 2.0. Supports...
2 versions - Latest release: 11 months ago - 91 downloads last month - 11,767 stars on GitHub - 1 maintainer
getpaper 0.5.0
getpaper - papers download made easy!
41 versions - Latest release: 11 months ago - 1 dependent repository - 1.82 thousand downloads last month - 1 maintainer
pyresumeparser 0.0.9
A package for parsing resume and extracting entities.
8 versions - Latest release: 11 months ago - 344 downloads last month - 1 maintainer
dp-pdf-crawler 1.0.3
A custom Flask package with PDF processing tools
7 versions - Latest release: 11 months ago - 275 downloads last month - 1 maintainer
copy-spotter 0.1.16
Make plagiarism detection easier. This package will find similar sentences between given files an...
16 versions - Latest release: 11 months ago - 334 downloads last month - 38 stars on GitHub - 1 maintainer
platform-gen-ai 0.1.1
This is pipeline code for accelerating solution accelerators
2 versions - Latest release: 12 months ago - 109 downloads last month - 1 maintainer
ant-fin-agent-framework 0.0.2
AntFinAgentFramework is a framework for developing applications powered by multi-agent base on la...
3 versions - Latest release: about 1 year ago - 146 downloads last month - 1 maintainer
reviewboardpowerpack 5.2.3
Enhances Review Board with PDF review and diffing, reports and analytics, new source code managem...
35 versions - Latest release: about 1 year ago - 1 dependent repository - 1.09 thousand downloads last month - 2 maintainers
yosemite 0.1.15
yosemite
17 versions - Latest release: about 1 year ago - 629 downloads last month - 1 maintainer
preprocess-docs 0.0.6
An open source document preprocessor for AI.
6 versions - Latest release: about 1 year ago - 199 downloads last month - 0 stars on GitHub - 1 maintainer
os-copilot 0.1.0
A self-improving embodied conversational agent seamlessly integrated into the operating system ...
1 version - Latest release: about 1 year ago - 51 downloads last month - 1,425 stars on GitHub - 1 maintainer
lamatic-airbyte-cdk 0.1.0
A framework for writing Airbyte Connectors.
1 version - Latest release: about 1 year ago - 53 downloads last month - 16,769 stars on GitHub - 1 maintainer
suzuka 0.0.1
ML
1 version - Latest release: about 1 year ago - 2 dependent packages - 44 downloads last month - 1 maintainer
hammadml 0.1.10
ML
16 versions - Latest release: about 1 year ago - 550 downloads last month - 2 stars on GitHub - 1 maintainer
datarxiv 0.0.0
Tools for data in Python
1 version - Latest release: about 1 year ago - 51 downloads last month - 1 maintainer
friday-agent 0.1.0
A self-improving embodied conversational agent seamlessly integrated into the operating system t...
1 version - Latest release: about 1 year ago - 47 downloads last month - 1,631 stars on GitHub - 1 maintainer
Top 8.1% on pypi.org
pdf2doi 1.5.1
A python library/command-line tool to extract the DOI or other identifiers of a scientific paper...
50 versions - Latest release: about 1 year ago - 5 dependent packages - 3 dependent repositories - 3.21 thousand downloads last month - 90 stars on GitHub - 1 maintainer
hotpdf 0.5.2
Fast PDF Data Extraction library
27 versions - Latest release: about 1 year ago - 2.44 thousand downloads last month - 186 stars on GitHub - 1 maintainer
ptol 0.1.2
A Pipeline for Obtaining Relevant Literature Based on Given Keywords
3 versions - Latest release: about 1 year ago - 144 downloads last month - 0 stars on GitHub - 1 maintainer
readpdffiles 1.0.5
A Python package for reading PDF files.
6 versions - Latest release: about 1 year ago - 248 downloads last month - 1 maintainer
fetchit 0.0.2
Tools for data in Python
2 versions - Latest release: about 1 year ago - 98 downloads last month - 1 maintainer
officeparserpy 1.0.10
A Python library to parse text out of any office file. Currently supports docx, pptx, xlsx, odt, ...
5 versions - Latest release: about 1 year ago - 678 downloads last month - 3 stars on GitHub - 1 maintainer
pdfscraper 1.1.9
PDF text and table search
23 versions - Latest release: about 1 year ago - 1 dependent repository - 768 downloads last month - 1,722 stars on GitHub - 1 maintainer
indegreeparser 1.0.1
A modified resume parser built on the pyresparse library used for extracting information from res...
2 versions - Latest release: about 1 year ago - 66 downloads last month - 1 maintainer
indegeparser 1.0.0 removed
A modified resume parser built on the pyresparse library used for extracting information from res...
1 version - Latest release: about 1 year ago - 1 maintainer
indegparser 1.0.0 removed
A modified resume parser built on the pyresparse library used for extracting information from res...
1 version - Latest release: about 1 year ago - 1 maintainer
pydoxtools 0.8.0
This library contains a set of tools in order to extract and synthesize structured information fr...
12 versions - Latest release: about 1 year ago - 251 downloads last month - 55 stars on GitHub - 1 maintainer
analysta-indexer 0.1.0 removed
Extension of Langchain loaders, llms and retrievers for Analysta
1 version - Latest release: about 1 year ago - 1 maintainer
Top 9.1% on pypi.org
autollm 0.1.10
Ship RAG based LLM Web API's, in seconds.
33 versions - Latest release: about 1 year ago - 1 dependent repository - 1.16 thousand downloads last month - 971 stars on GitHub - 1 maintainer
ragchain 0.2.6
Build advanced RAG workflows with LLM, compatible with Langchain
13 versions - Latest release: over 1 year ago - 462 downloads last month - 279 stars on GitHub - 1 maintainer
beancount-ce 1.2.0
Beancount statements (pdf and csv) importer for Caisse d'Epargne bank
10 versions - Latest release: over 1 year ago - 1 dependent repository - 341 downloads last month - 5 stars on GitHub - 1 maintainer
toto-pdfquery 0.1.3
Concise and friendly PDF scraper using JQuery or XPath selectors. Forked from Jack Cushman's pdfq...
4 versions - Latest release: over 1 year ago - 150 downloads last month - 0 stars on GitHub - 1 maintainer
pydparser 1.0.4
A simple resume and job description parser used for extracting information from resumes and job d...
2 versions - Latest release: over 1 year ago - 151 downloads last month - 8 stars on GitHub - 1 maintainer
pdftextsplitter 2.1.4
This package can read PDF documents and automatically recognise chapter-titles, enumerations and...
23 versions - Latest release: over 1 year ago - 2 dependent packages - 809 downloads last month - 1 maintainer
ocrversion2 0.2.0
1 version - Latest release: over 1 year ago - 54 downloads last month - 1 maintainer
ocrversion1 0.1.0
1 version - Latest release: over 1 year ago - 52 downloads last month - 1 maintainer
ocr-v1 0.1.0 removed
1 version - Latest release: over 1 year ago - 1 maintainer
uniloaders 0.2.1
Loading everything from URLs to powerpoints
17 versions - Latest release: over 1 year ago - 529 downloads last month - 2 maintainers
bustercp 0.0.2
Buster Chunking Pipeline 🤖✂️
2 versions - Latest release: over 1 year ago - 82 downloads last month - 1 maintainer
textract-edited-dependencies 0.0.2
extract text from any document. no muss. no fuss.
2 versions - Latest release: over 1 year ago - 98 downloads last month - 4,072 stars on GitHub - 1 maintainer
sciassist 0.1.4
A toolkit for Scientific Document Processing
47 versions - Latest release: over 1 year ago - 364 downloads last month - 17 stars on GitHub - 2 maintainers
Top 2.1% on pypi.org
invoice2data 0.4.5
Python parser to extract data from pdf invoice
102 versions - Latest release: over 1 year ago - 5 dependent packages - 28 dependent repositories - 8.85 thousand downloads last month - 1,826 stars on GitHub - 2 maintainers
Top 6.8% on pypi.org
evadb 0.3.9
EvaDB AI-Relational Database System
44 versions - Latest release: over 1 year ago - 1 dependent repository - 1.26 thousand downloads last month - 2,631 stars on GitHub - 1 maintainer
patternfork-nosql-fix 3.6
Web mining module for Python.
1 version - Latest release: over 1 year ago - 1 dependent package - 70 downloads last month - 1 maintainer
txd 0.0.1
Tools for data in Python
1 version - Latest release: over 1 year ago - 65 downloads last month - 1 maintainer
paguro 0.0.1
Tools for data in Python
1 version - Latest release: over 1 year ago - 64 downloads last month - 1 maintainer
arpoon 0.0.1
Tools for data in Python
1 version - Latest release: over 1 year ago - 61 downloads last month - 1 maintainer
pydrad 0.0.1
Tools for data in Python
1 version - Latest release: over 1 year ago - 57 downloads last month - 1 maintainer
gpt-pdf-md 0.3
A Python package that utilizes GPT-4V and other tools to convert PDFs into Markdown files.
3 versions - Latest release: over 1 year ago - 148 downloads last month - 50 stars on GitHub - 1 maintainer
gpt-pdf-reader 1.5
A Python package that utilizes GPT-4V and other tools to extract and process information from PDF...
15 versions - Latest release: over 1 year ago - 768 downloads last month - 1 maintainer
pair-ai 0.5.16
20 versions - Latest release: over 1 year ago - 609 downloads last month - 1 maintainer
vsslite 0.6.1
A vector similarity search engine for humans 🥳
11 versions - Latest release: over 1 year ago - 137 downloads last month - 15 stars on GitHub - 1 maintainer
readyocr 0.0.43
A nice package OCR for Amazon Textract and Google Document AI
39 versions - Latest release: over 1 year ago - 902 downloads last month - 0 stars on GitHub - 1 maintainer
nifigator 0.2.1
Nifigator is a pure Python package for working with NLP in RDF/NIF
20 versions - Latest release: over 1 year ago - 313 downloads last month - 0 stars on GitHub - 1 maintainer
Top 7.1% on pypi.org
nougat-ocr 0.1.17
Nougat: Neural Optical Understanding for Academic Documents
19 versions - Latest release: over 1 year ago - 9 dependent packages - 1 dependent repository - 21.3 thousand downloads last month - 9,367 stars on GitHub - 1 maintainer
lucidtech-synthetic 0.6.1
PDF anonymizer/synthesizer for Cradl
25 versions - Latest release: over 1 year ago - 2 dependent repositories - 465 downloads last month - 2 stars on GitHub - 1 maintainer
pdf-miner-parser 0.0.1 removed
Additional parser functionality for the pdf_miner library.
1 version - Latest release: over 1 year ago - 1 maintainer
django-marion 0.6.0
The documents factory
10 versions - Latest release: over 1 year ago - 1 dependent package - 1 dependent repository - 244 downloads last month - 17 stars on GitHub - 1 maintainer
Top 9.7% on pypi.org
chatdocs 0.2.6
Chat with your documents offline using AI.
11 versions - Latest release: over 1 year ago - 1 dependent repository - 368 downloads last month - 720 stars on GitHub - 1 maintainer
gorpy 2.0.4
Grep tool with extensions for reading files in many different ways
9 versions - Latest release: over 1 year ago - 1 dependent repository - 243 downloads last month - 0 stars on GitHub - 1 maintainer
nafigator 0.1.64
Python package to convert spaCy and Stanza documents to NLP Annotation Format (NAF)
65 versions - Latest release: over 1 year ago - 4 dependent repositories - 1.02 thousand downloads last month - 2 stars on GitHub - 3 maintainers
pdferli 0.11
Convert PDFs into pandas DataFrames, remove restrictions, put/crack PDF passwords
2 versions - Latest release: over 1 year ago - 362 downloads last month - 7 stars on GitHub - 1 maintainer
pdf2md 0.1.0 removed
Convert PDF files into markdown files
1 version - Latest release: over 1 year ago - 1 maintainer
ruppell 1.0.1
Ruppell is a Python package to help in text extraction from documents.
7 versions - Latest release: almost 2 years ago - 1 dependent repository - 192 downloads last month - 12 stars on GitHub - 1 maintainer
linkedinpdfextractor 1.3.0 💰
Add a short description here!
15 versions - Latest release: almost 2 years ago - 366 downloads last month - 2,100 stars on GitHub - 1 maintainer
Top 6.7% on pypi.org
pdf2txt 0.7.14
A better pdf to text extraction toolkit
61 versions - Latest release: almost 2 years ago - 1 dependent package - 3 dependent repositories - 2.25 thousand downloads last month - 1 maintainer
pautobot 0.0.27
Private AutoGPT Robot - Your private task assistant with GPT!
22 versions - Latest release: almost 2 years ago - 465 downloads last month - 156 stars on GitHub - 1 maintainer
anyllm 0.0.26
Private AutoGPT Robot - Your private task assistant with GPT!
1 version - Latest release: almost 2 years ago - 76 downloads last month - 149 stars on GitHub - 1 maintainer
localai 0.0.26
Private AutoGPT Robot - Your private task assistant with GPT!
1 version - Latest release: almost 2 years ago - 144 downloads last month - 157 stars on GitHub - 1 maintainer
localgpt 0.0.26
Private AutoGPT Robot - Your private task assistant with GPT!
1 version - Latest release: almost 2 years ago - 67 downloads last month - 157 stars on GitHub - 1 maintainer
superagi 0.0.26
Private AutoGPT Robot - Your private task assistant with GPT!
1 version - Latest release: almost 2 years ago - 129 downloads last month - 156 stars on GitHub - 2 maintainers
privategpt 0.0.26
Private AutoGPT Robot - Your private task assistant with GPT!
1 version - Latest release: almost 2 years ago - 137 downloads last month - 156 stars on GitHub - 1 maintainer
documentinsightsgenerator 0.1
A package to generate comprehensive insights from documents using NLP techniques.
1 version - Latest release: almost 2 years ago - 56 downloads last month - 0 stars on GitHub - 1 maintainer
screenplay-pdf-to-json-storia 0.1.0
1 version - Latest release: almost 2 years ago - 38 downloads last month - 1 maintainer
polygon-finance 1.0.1 removed
The real-time market analysis library that utilizes the real-time rest apis and websockets from p...
2 versions - Latest release: almost 2 years ago - 1 maintainer
chemdataextractor-c 1.0.0
A toolkit for extracting chemical information from the scientific literature.
1 version - Latest release: almost 2 years ago - 66 downloads last month - 322 stars on GitHub - 1 maintainer
polygon-sdk 1.1.0
Analyze, query, and fetch market data utilizing Polygon.io's suite of services for simulated and ...
3 versions - Latest release: almost 2 years ago - 84 downloads last month - 26 stars on GitHub - 1 maintainer
pdftotree-mercurial 1.3
Convert PDF into hOCR with text, tables, and figures being recognized and preserved. (Without skl...
4 versions - Latest release: almost 2 years ago - 159 downloads last month - 441 stars on GitHub - 1 maintainer
paper2cmap 0.1.3
A package that automatically generates a concept map for a PDF document using LLM.
4 versions - Latest release: almost 2 years ago - 145 downloads last month - 13 stars on GitHub - 1 maintainer
megabots 0.0.11
🤖 Megabots provides State-of-the-art, production ready bots made mega-easy, so you don't have to ...
5 versions - Latest release: almost 2 years ago - 1 dependent repository - 274 downloads last month - 341 stars on GitHub - 1 maintainer
formfyxer 0.2.0
A tool for learning about and pre-processing pdf forms.
14 versions - Latest release: almost 2 years ago - 1 dependent package - 1 dependent repository - 559 downloads last month - 11 stars on GitHub - 2 maintainers
linkedin-pdf-extractor 1.0.0 💰
Add a short description here!
2 versions - Latest release: almost 2 years ago - 1,767 stars on GitHub
saltgang 0.1.17 💰
Add a short description here!
16 versions - Latest release: about 2 years ago - 1 dependent repository - 504 downloads last month - 2,100 stars on GitHub - 1 maintainer
qnabot 0.0.6
Create a question answering over docs bot with one line of code.
6 versions - Latest release: about 2 years ago - 1 dependent repository - 190 downloads last month - 349 stars on GitHub - 1 maintainer
knowledgegpt 0.0.7b0
A package for extracting and querying knowledge using GPT models
7 versions - Latest release: about 2 years ago - 196 downloads last month - 2 maintainers
tungsten-sds 0.8.0
An MSDS parser.
10 versions - Latest release: about 2 years ago - 1 dependent repository - 414 downloads last month - 6 stars on GitHub - 1 maintainer
pyresumize 0.2.2
Resume Parser Written in Python3. The module supports .pdf and .docx files
21 versions - Latest release: about 2 years ago - 467 downloads last month - 2 stars on GitHub - 1 maintainer
easypdfheading 2.0.3
PDF subheadings finder with text. A package for finding subheadings in a PDF.
1 version - Latest release: about 2 years ago - 41 downloads last month - 1 maintainer
pdf-subheadings 2.0
1 version - Latest release: about 2 years ago - 57 downloads last month - 1 maintainer
paperview 0.1.0
1 version - Latest release: over 2 years ago - 42 downloads last month - 1 maintainer
Top 0.9% on pypi.org
textract 1.6.5
extract text from any document. no muss. no fuss.
18 versions - Latest release: about 3 years ago - 23 dependent packages - 739 dependent repositories - 207 thousand downloads last month - 3,754 stars on GitHub - 1 maintainer
Top 3.4% on pypi.org
pdfx 1.4.1 💰
Extract metadata and URLs from PDF files, and download all referenced PDFs
11 versions - Latest release: about 4 years ago - 1 dependent package - 40 dependent repositories - 3.46 thousand downloads last month - 1,051 stars on GitHub - 1 maintainer