pypi.org "documents" keyword
View the packages on the pypi.org package registry that are tagged with the "documents" keyword.
Top 1.7% on pypi.org
svglib 1.6.0
A pure-Python library for reading and converting SVG
23 versions - Latest release: about 2 months ago - 93 dependent packages - 1,712 dependent repositories - 3.25 million downloads last month - 315 stars on GitHub - 2 maintainers
pyugt 1.0.10
Universal Game Translator from on-screen text in Python
32 versions - Latest release: almost 2 years ago - 1 dependent repository - 151 downloads last month - 32 stars on GitHub - 1 maintainer
Top 1.6% on pypi.org
pyocr 0.8.5
A Python wrapper for OCR engines (Tesseract, Cuneiform, etc)
30 versions - Latest release: about 2 years ago - 5 dependent packages - 255 dependent repositories - 29.1 thousand downloads last month - 7,831 stars on GitHub - 1 maintainer
docler 1.0.2 💰
Abstractions & Tools for OCR / document processing
30 versions - Latest release: about 1 month ago - 160 downloads last month - 2 stars on GitHub - 1 maintainer
Top 1.8% on pypi.org
jsonpath-ng 1.7.0
A final implementation of JSONPath for Python that aims to be standard compliant, including arith...
10 versions - Latest release: about 1 year ago - 214 dependent packages - 1,548 dependent repositories - 60 million downloads last month - 702 stars on GitHub - 2 maintainers
Top 1.0% on pypi.org
cerberus 1.3.8 💰
Lightweight, extensible schema and data validation tool for Python dictionaries.
30 versions - Latest release: 9 days ago - 140 dependent packages - 2,028 dependent repositories - 5.3 million downloads last month - 3,114 stars on GitHub - 2 maintainers
opendataloader-pdf 1.2.0
A Python wrapper for the opendataloader-pdf Java CLI.
25 versions - Latest release: about 23 hours ago - 1.37 thousand downloads last month - 702 stars on GitHub - 1 maintainer
llama-index-readers-docling 0.4.1
llama-index readers docling integration
8 versions - Latest release: 2 months ago - 13.9 thousand downloads last month - 27,013 stars on GitHub - 1 maintainer
llama-index-node-parser-docling 0.4.1
llama-index node_parser docling integration
7 versions - Latest release: 2 months ago - 10.2 thousand downloads last month - 28,777 stars on GitHub - 1 maintainer
unoserver-appx 1.0.1
A server for file conversions with Libre Office
2 versions - Latest release: over 1 year ago - 8 downloads last month - 778 stars on GitHub - 1 maintainer
docdocgo 0.1.0
Insert pandas DataFrames as tables into Microsoft Word documents
1 version - Latest release: about 2 months ago - 28 downloads last month - 1 maintainer
unstructured-platform 0.4.3
Python SDK for the Unstructured Platform API
4 versions - Latest release: 10 months ago - 13 downloads last month - 1 maintainer
backboard-sdk 1.4.3
Python SDK for the Backboard API - Build conversational AI applications with persistent memory an...
17 versions - Latest release: 5 days ago - 373 downloads last month - 1 maintainer
pr2md 1.0.10 💰
Pull Request Markdown Generator
9 versions - Latest release: 5 days ago - 747 downloads last month - 0 stars on GitHub - 1 maintainer
docsumm-ai 0.1.0
Audience-aware document summarizer for PDF/DOCX/TXT, optimized for context retention, not token ...
1 version - Latest release: about 1 month ago - 56 downloads last month - 0 stars on GitHub - 1 maintainer
whakerkit 2.0
The implementation of a WhakerKit project website.
4 versions - Latest release: about 1 month ago - 83 downloads last month - 1 maintainer
expose-text 0.1.6
A Python module that exposes text for modification in multiple file types.
4 versions - Latest release: about 5 years ago - 1 dependent repository - 24 downloads last month - 18 stars on GitHub - 1 maintainer
esg-classification 0.1.3
Pipeline for classifying sustainability reports with heuristics and LLM backends.
4 versions - Latest release: about 1 month ago - 73 downloads last month - 1 maintainer
nr-documents-app 1.0.8
Application package for NR documents site
9 versions - Latest release: over 3 years ago - 1 dependent repository - 16 downloads last month - 0 stars on GitHub - 1 maintainer
frat 2.0.7
Fast Rectangle Annotation Tool
6 versions - Latest release: over 2 years ago - 1 dependent repository - 21 downloads last month - 9 stars on GitHub - 1 maintainer
gdoc-down 0.0.10
Download Google documents to files
5 versions - Latest release: over 5 years ago - 1 dependent repository - 37 downloads last month - 15 stars on GitHub - 2 maintainers
docling-google-ocr 2.13.1
SDK and CLI for parsing PDF, DOCX, HTML, and more, to a unified document representation for power...
2 versions - Latest release: 10 months ago - 109 downloads last month - 38,808 stars on GitHub - 1 maintainer
gradio-test2 0.0.2
This is a test component
1 version - Latest release: about 2 years ago - 13 downloads last month - 1 maintainer
pwr 2.6
This program helps you write html pages like documents in word processors
3 versions - Latest release: about 12 years ago - 4 dependent repositories - 42 downloads last month - 0 stars on GitHub - 1 maintainer
akoma2md 2.0.8
Converter from Akoma Ntoso XML to Markdown format with automatic download of cited laws and ...
33 versions - Latest release: 10 days ago - 30.5 thousand downloads last month
dsw-tdk 4.24.0 💰
Data Stewardship Wizard Template Development Toolkit
121 versions - Latest release: 11 days ago - 1 dependent repository - 1.03 thousand downloads last month - 4 stars on GitHub - 1 maintainer
peslac 0.1.4
A Python package for the Peslac API
5 versions - Latest release: 10 months ago - 22 downloads last month - 0 stars on GitHub - 1 maintainer
scipreprocess 0.1.2
A modular pipeline for preprocessing scientific documents (PDF, DOCX, TEX, XML, TXT)
3 versions - Latest release: about 1 month ago - 180 downloads last month - 1 maintainer
llama-index-readers-preprocess 0.4.1
llama-index readers preprocess integration
10 versions - Latest release: 2 months ago - 40 downloads last month - 4 stars on GitHub - 1 maintainer
modoboa-pdfcredentials 1.5.0 💰
Generate PDF documents containing user credentials
15 versions - Latest release: over 3 years ago - 2 dependent repositories - 130 downloads last month - 8 stars on GitHub - 1 maintainer
scrapontologies 1.1.0 💰
Library for extracting schemas and building ontologies from documents using LLM
2 versions - Latest release: about 1 year ago - 11 downloads last month - 18,931 stars on GitHub - 2 maintainers
django-docmgr 0.6.2
A pluggable Django app to reference documents (files) in models.
2 versions - Latest release: about 2 months ago - 35 downloads last month - 1 maintainer
ffmulticonverter 1.7.1
GUI File Format Converter
13 versions - Latest release: about 2 years ago - 2 dependent repositories - 8 downloads last month - 1 maintainer
docman 0.0.6
Document Manager
6 versions - Latest release: over 1 year ago - 33 downloads last month - 0 stars on GitHub - 1 maintainer
adamsnrc 2.0.13
A robust Python client for the Nuclear Regulatory Commission's ADAMS API
1 version - Latest release: about 2 months ago - 1 maintainer
docsvault 0.1.4
Web app used to securely version your documents on git
6 versions - Latest release: about 5 years ago - 1 dependent repository - 31 downloads last month - 1 maintainer
snaketex 0.1.3
A LaTeX template system for large and multi-user projects.
5 versions - Latest release: over 9 years ago - 1 dependent repository - 14 downloads last month - 1 star on GitHub - 1 maintainer
cognify-sdk 0.1.0
Python SDK for Cognify AI platform
1 version - Latest release: 5 months ago - 22 downloads last month - 1 maintainer
docling 2.59.0
SDK and CLI for parsing PDF, DOCX, HTML, and more, to a unified document representation for power...
134 versions - Latest release: 16 days ago - 1.52 million downloads last month - 78 stars on GitHub - 1 maintainer
docling-enhanced 2.32.0
SDK and CLI for parsing PDF, DOCX, HTML, and more, to a unified document representation for power...
1 version - Latest release: 6 months ago - 34 downloads last month - 42,147 stars on GitHub - 1 maintainer
unoserver-fork 1.3.1
A server for file conversions with Libre Office. With function to update index
2 versions - Latest release: about 3 years ago - 9 downloads last month - 0 stars on GitHub - 1 maintainer
docling-sdg 0.4.0
Docling for Synthetic Data Generation (SDG) provides a set of tools to create artificial data fro...
6 versions - Latest release: 3 months ago - 183 downloads last month - 31 stars on GitHub - 1 maintainer
smartchunkllm 0.1.7
Advanced Legal Document Semantic Chunking System
8 versions - Latest release: 3 months ago - 39 downloads last month - 1 maintainer
doxx 0.9.4
A Simple, Flexible Text Templating, Build, & Project Distribution System
5 versions - Latest release: over 10 years ago - 2 dependent repositories - 50 downloads last month - 1 maintainer
ffconverter 2.4.6
File Format Converter with Qt GUI
22 versions - Latest release: about 1 year ago - 196 downloads last month - 5 stars on GitHub - 1 maintainer
pluggedinkit 1.0.1
Official Python SDK for Plugged.in Library API
2 versions - Latest release: about 2 months ago - 33 downloads last month - 0 stars on GitHub - 1 maintainer
dict-curation 0.0.3
A package for curating dictionaries (esp in babylon and stardict formats).
2 versions - Latest release: almost 5 years ago - 1 dependent repository - 16 downloads last month - 1 maintainer
sleepyconvert 1.0.5
Converts data files, images and documents to different formats
9 versions - Latest release: 7 months ago - 35 downloads last month - 0 stars on GitHub - 1 maintainer
contextf 0.0.1
Efficient context builder
1 version - Latest release: 18 days ago - 0 stars on GitHub - 1 maintainer
Top 8.1% on pypi.org
stencila 2.0.0b2
Python SDK for Stencila
12 versions - Latest release: over 1 year ago - 1 dependent repository - 76 downloads last month - 845 stars on GitHub - 1 maintainer
stencila_types 2.0.0b2
Python types for Stencila
7 versions - Latest release: over 1 year ago - 53 downloads last month - 845 stars on GitHub - 1 maintainer
stencila_plugin 2.0.0b3
Library for building Stencila Plugins
12 versions - Latest release: over 1 year ago - 57 downloads last month - 845 stars on GitHub - 1 maintainer
ashtadhyayi-data 0.0.3 💰
A package for curating doc file collections, with ability to sync with youtube and archive.org do...
2 versions - Latest release: about 3 years ago - 20 downloads last month - 0 stars on GitHub - 1 maintainer
papermerge-core 2.1.5
Open source document management system for digital archives
77 versions - Latest release: almost 3 years ago - 4 dependent repositories - 280 downloads last month - 388 stars on GitHub - 1 maintainer
mimeogram 1.5
Exchange of file collections with LLMs.
19 versions - Latest release: 4 months ago - 69 downloads last month - 4 stars on GitHub - 1 maintainer
svglibwheel 0.1
A pure-Python library for reading and converting SVG
1 version - Latest release: over 1 year ago - 9 downloads last month - 347 stars on GitHub - 1 maintainer
stencila-pyla 0.3.1
Python interpreter for executable documents
9 versions - Latest release: almost 5 years ago - 1 dependent repository - 28 downloads last month - 2 stars on GitHub - 1 maintainer
docowling 1.0.17
SDK and CLI for parsing PDF, DOCX, HTML, and more, to a unified document representation for power...
17 versions - Latest release: 10 months ago - 73 downloads last month - 3 stars on GitHub - 1 maintainer
wdoc 4.1.1
A perfect AI powered RAG for document query and summary. Supports ~all LLM and ~all filetypes (ur...
112 versions - Latest release: 19 days ago - 1.11 thousand downloads last month - 46 stars on GitHub - 1 maintainer
similar-documents 0.1.4
Generate similarity scores for documents from cli
6 versions - Latest release: over 4 years ago - 1 dependent repository - 22 downloads last month - 8 stars on GitHub - 1 maintainer
nr-documents-records-model-builder 1.0.0
OARepo model builder extension for NR document records
2 versions - Latest release: about 3 years ago - 15 downloads last month - 1 maintainer
libreserver 0.1.1
A server for file conversions with Libre Office
2 versions - Latest release: almost 2 years ago - 106 downloads last month - 0 stars on GitHub - 1 maintainer
doc-curation 0.1.20 💰
A package for curating doc file collections, with ability to sync with youtube and archive.org do...
28 versions - Latest release: almost 2 years ago - 1 dependent package - 2 dependent repositories - 967 downloads last month - 7 stars on GitHub - 1 maintainer
doctoolsllm 0.99.0
(Now winston_doc) A perfect AI powered RAG for document query and summary. Supports ~all LLM and ...
21 versions - Latest release: over 1 year ago - 128 downloads last month - 480 stars on GitHub - 1 maintainer
unstructured-expanded 0.17.2
Expansion to the unstructured package, adding support for image extraction.
10 versions - Latest release: 6 months ago - 147 downloads last month - 0 stars on GitHub - 1 maintainer
redacted-py 1.0.8
Redacting classified documents
4 versions - Latest release: 10 months ago - 32 downloads last month - 3 stars on GitHub - 1 maintainer
cdpdumpingutils 0.2.1
Download all your course files from cahier de prepa
2 versions - Latest release: over 2 years ago - 24 downloads last month - 18 stars on GitHub - 1 maintainer
docdump 1.0.4
A package to extract text from common document types.
5 versions - Latest release: almost 5 years ago - 1 dependent repository - 45 downloads last month - 0 stars on GitHub - 1 maintainer
tei-chunker 0.1.0
Hierarchical document chunking for TEI XML documents
1 version - Latest release: 9 months ago - 7 downloads last month - 1 maintainer
classify-bills 1.0.1
Automatically sort and archive PDF bills and statements
2 versions - Latest release: over 6 years ago - 1 dependent repository - 10 downloads last month - 3 stars on GitHub - 1 maintainer
docuver 0.1.0
A meta tool for version control of Office documents (docx, xlsx, pptx, odt, ods, odp)
1 version - Latest release: 3 months ago - 15 downloads last month
extended-docling 2.12.1
SDK and CLI for parsing PDF, DOCX, HTML, and more, to a unified document representation for power...
1 version - Latest release: 11 months ago - 22 downloads last month - 42,147 stars on GitHub - 1 maintainer
docx-footer-extractor 1.0.0
A Python library for extracting metadata from DOCX file footers using parallel processing
1 version - Latest release: 4 months ago - 83 downloads last month - 0 stars on GitHub - 1 maintainer
quickdocs 1.6.3
Creates HTML docs from a project's readme and sphinx-apidoc.
10 versions - Latest release: over 4 years ago - 1 dependent package - 1 dependent repository - 38 downloads last month - 1 star on GitHub - 1 maintainer
dedoc 2.3.2
Extract content and logical tree structure from textual documents
32 versions - Latest release: 11 months ago - 1.3 thousand downloads last month - 592 stars on GitHub - 1 maintainer
jsonpath-ig 1.5.3
A final implementation of JSONPath for Python that aims to be standard compliant, including arith...
2 versions - Latest release: over 4 years ago - 1 dependent repository - 20 downloads last month - 589 stars on GitHub - 1 maintainer
jsonpath-ng-i 1.0.3
A final implementation of JSONPath for Python that aims to be standard compliant, including arith...
4 versions - Latest release: over 4 years ago - 1 dependent repository - 25 downloads last month - 590 stars on GitHub - 1 maintainer
nr-documents-records 1.0.0
NR documents data model
2 versions - Latest release: about 3 years ago - 14 downloads last month - 1 maintainer
docoskin 0.1.0
"Onion-skin" visual differences between a reference document image and a scanned copy
1 version - Latest release: almost 6 years ago - 1 dependent repository - 14 downloads last month - 2 stars on GitHub - 1 maintainer
blazingdocs 1.0.1
BlazingDocs Python client
1 version - Latest release: about 4 years ago - 1 dependent repository - 24 downloads last month - 2 stars on GitHub - 1 maintainer
vecsync 0.7.0
A simple command-line utility for synchronizing documents to vector storage for LLM interaction.
8 versions - Latest release: 4 months ago - 41 downloads last month - 0 stars on GitHub - 1 maintainer
oldp 0.8.0
Open Legal Data Platform
2 versions - Latest release: over 6 years ago - 1 dependent repository - 16 downloads last month - 121 stars on GitHub - 1 maintainer
egnyte-langchain-connector 0.0.4
LangChain retriever integration for Egnyte document search and retrieval
4 versions - Latest release: 24 days ago - 269 downloads last month - 1 maintainer
edocuments 1.1.0
eDocuments - a simple and productive personal documents library
10 versions - Latest release: over 7 years ago - 2 dependent repositories - 58 downloads last month - 2 stars on GitHub - 1 maintainer
zen_document_parser 0.11
A library for parsing various government documents as well as general PDFs
1 version - Latest release: over 9 years ago - 2 dependent repositories - 10 downloads last month - 2 stars on GitHub - 1 maintainer
Top 6.3% on pypi.org
boxdetect 1.0.2 💰
boxdetect is a Python package based on OpenCV which allows you to easily detect rectangular shape...
5 versions - Latest release: almost 3 years ago - 1 dependent package - 3 dependent repositories - 8.27 thousand downloads last month - 111 stars on GitHub - 1 maintainer
invenio-documents 0.1.0.post1
Invenio module that adds filesystem abstraction.
3 versions - Latest release: about 2 years ago - 1 dependent package - 2 dependent repositories - 18 downloads last month - 1 star on GitHub - 4 maintainers
draftable-compare-api 1.4.3
Client library for the Draftable document comparison API
20 versions - Latest release: 3 months ago - 2 dependent repositories - 2.95 thousand downloads last month - 9 stars on GitHub - 2 maintainers
smog 0.0.4 removed
simple media organizer
4 versions - Latest release: about 3 years ago - 19 downloads last month - 1 star on GitHub - 1 maintainer
marshmallow-br 0.1.1
An unofficial extension to Marshmallow fields and validators for Brazilian documents
3 versions - Latest release: almost 3 years ago - 20 downloads last month - 0 stars on GitHub - 1 maintainer
mongoengine_fuel 1.0.3
Factory for MongoDB documents created with mongoengine
4 versions - Latest release: about 2 years ago - 2 dependent repositories - 4 downloads last month - 25 stars on GitHub - 1 maintainer
srsparser 1.4.9
A library that translates semi-structured documents into a structured form and contains natural l...
64 versions - Latest release: over 3 years ago - 1 dependent repository - 73 downloads last month - 2 stars on GitHub - 1 maintainer
rustpy-toolkit 0.1.1
High-performance Polars expressions for Brazilian document validation and text processing
2 versions - Latest release: 5 months ago - 32 downloads last month - 1 maintainer
ragdex 0.2.3
RAG-powered document indexing and search for MCP (Model Context Protocol)
8 versions - Latest release: 24 days ago - 804 downloads last month - 0 stars on GitHub
iddt 0.1.13
Internet Document Discovery Tool
12 versions - Latest release: almost 10 years ago - 2 dependent repositories - 15 downloads last month - 0 stars on GitHub - 1 maintainer
dotloop 1.2.0
Python wrapper for Dotloop API - Real estate transaction management and document handling
4 versions - Latest release: 3 months ago - 113 downloads last month - 1 maintainer
kindlepy 1.0.0
CLI tool for mailing your documents to your kindle device.
1 version - Latest release: almost 9 years ago - 1 dependent repository - 7 downloads last month - 2 stars on GitHub - 1 maintainer
epaper 0.0.0
A simple and productive personal documents library
1 version - Latest release: about 2 years ago - 2 dependent repositories - 2 stars on GitHub - 1 maintainer
word2html 0.3.0
A quick and dirty script to convert a Word (docx) document to html.
5 versions - Latest release: over 4 years ago - 1 dependent repository - 27 downloads last month - 53 stars on GitHub - 1 maintainer
bundestag-api 1.3.0
Python wrapper for the official Bundestag-API
10 versions - Latest release: 26 days ago - 82 downloads last month - 3 stars on GitHub - 1 maintainer
Related Keywords
pdf (28)
python (23)
docx (21)
ai (20)
html (16)
document (15)
llm (14)
rag (12)
api (11)
markdown (10)
search (10)
nlp (10)
convert (9)
tables (9)
pptx (7)
sdk (7)
openai (7)
cli (6)
conversion (6)
unoconv (6)
xlsx (6)
ocr (6)
pdf-to-json (6)
pdf-converter (6)
document-parsing (6)
document-parser (6)
pdf-to-text (5)
table former (5)
langchain (5)
table structure (5)
doc (5)
data (5)
question-answering (5)
segmentation (5)
layout model (5)
json (5)
text (5)
images (5)
docling (5)
machine learning (4)
template (4)
libreoffice (4)
executable (4)
office (4)
uno (4)
natural language processing (4)
library (4)
files (4)
chunking (4)
converter (4)
embeddings (4)
AI (4)
command-line (3)
xml (3)
scanned-documents (3)
legal (3)
artificial intelligence (3)
OCR (3)
SDK (3)
interactive (3)
python3 (3)
reproducible (3)
books (3)
programmable (3)
internet-archive (3)
LLM (3)
opencv (3)
scan (3)
filter (3)
jsonpath (3)
odt (3)
path (3)
query (3)
xpath (3)
validation (3)
PDF (3)
extraction (3)
parser (3)
word (3)
unstructured (3)
compare (2)
web scraping (2)
web scraping library (2)
web scraping tool (2)
webscraping (2)
automated-scraper (2)
open-data (2)
scrapingweb (2)
scraping-python (2)
sc (2)
gpt-3 (2)
gpt-4 (2)
llama3 (2)
machine-learning (2)
Invenio (2)
unstructured data (2)
structured data (2)
scraping (2)
scrapegraphai (2)
scrapegraph (2)