An open API service providing package, version and dependency metadata of many open source software ecosystems and registries.

pypi.org "document-processing" keyword

pdf-oxide 0.3.17 💰
The fastest Python PDF library: 0.8ms mean, 5× faster than PyMuPDF. Text extraction, markdown con...
26 versions - Latest release: about 24 hours ago - 3.76 thousand downloads last month - 154 stars on GitHub - 1 maintainer
optical-context-mcp 0.1.4
MCP server that compresses OCR-heavy PDFs into dense packed images for AI agent workflows.
4 versions - Latest release: 2 days ago - 167 downloads last month - 0 stars on GitHub - 1 maintainer
pdf-mcp 1.3.0
Production-ready MCP server for PDF processing with intelligent caching. Extract text, search, an...
6 versions - Latest release: 1 day ago - 681 downloads last month - 1 maintainer
spanish-pdf-parser 0.1.0
A Python package for processing PDFs with header and footer detection
1 version - Latest release: about 1 year ago - 9 downloads last month - 1 maintainer
rossum-agent-client 1.1.0
Python client for Rossum Agent API - AI-powered document processing assistant
3 versions - Latest release: about 1 month ago - 73 downloads last month - 10 stars on GitHub - 1 maintainer
rossum-agent 1.3.6
AI agent toolkit for Rossum: document workflows conversationally, debug pipelines automatically, ...
22 versions - Latest release: 8 days ago - 1.53 thousand downloads last month - 10 stars on GitHub - 1 maintainer
quanta-pdf 1.0.5
Advanced PDF layout analysis engine for extracting figures, tables, and structured content
5 versions - Latest release: 3 months ago - 73 downloads last month - 2 stars on GitHub - 1 maintainer
caas-core 0.3.0
A pure, logic-only library for routing context, handling RAG fallacies, and managing context wind...
2 versions - Latest release: about 1 month ago - 50 downloads last month - 1 maintainer
chunking-strategy 0.4.1
A comprehensive chunking library for text, documents, audio, video, and data streams (Linux and m...
4 versions - Latest release: 5 months ago - 85 downloads last month - 2 stars on GitHub - 1 maintainer
contextifier 0.2.2
Convert raw documents into AI-understandable context with intelligent text extraction, table dete...
9 versions - Latest release: about 2 months ago - 305 downloads last month - 2 stars on GitHub - 1 maintainer
kreuzberg 4.4.2
High-performance document intelligence library for Python. Extract text, metadata, and structured...
131 versions - Latest release: 6 days ago - 81.5 thousand downloads last month - 6,130 stars on GitHub - 1 maintainer
docs2db 0.4.3
Repository of docling documents for RAG
8 versions - Latest release: about 2 months ago - 186 downloads last month - 1 star on GitHub - 1 maintainer
extract-monster 0.1.0
Python SDK for Extract Monster - Extract structured data from files and text using AI
1 version - Latest release: 5 months ago - 15 downloads last month - 1 maintainer
docforge 0.1.0
Forge perfect documents from any format with precision, power, and simplicity
1 version - Latest release: 8 months ago - 24 downloads last month - 0 stars on GitHub - 1 maintainer
deepseek-ocr 0.3.0
A simple and efficient Python SDK for DeepSeek-OCR API
4 versions - Latest release: 3 months ago - 666 downloads last month - 2 stars on GitHub - 1 maintainer
kallia 0.1.6
Semantic Document Processing Library
7 versions - Latest release: 3 months ago - 47 downloads last month - 0 stars on GitHub - 1 maintainer
Top 7.3% on pypi.org
pdf-structify 0.1.18
Extract structured data from PDFs using LLMs with sklearn-like API
19 versions - Latest release: about 2 months ago - 433 downloads last month - 1 maintainer
docuglean 1.1.0
An SDK for intelligent document processing using SOTA VLLM models
1 version - Latest release: 4 months ago - 27 downloads last month - 1 maintainer
parselabs 0.1.2
Extract structured lab test results from medical documents with AI precision
1 version - Latest release: 9 days ago - 86 downloads last month - 1 maintainer
parsemedicalexams 0.1.3
Extract and summarize medical exam reports (X-rays, MRIs, ultrasounds, etc.) with AI precision
1 version - Latest release: 9 days ago - 85 downloads last month - 1 maintainer
docdigitizer 0.1.0
Official Python SDK for the DocDigitizer document processing API
1 version - Latest release: 9 days ago - 1 maintainer
pdf-splitter-cli 0.1.3
A modern command-line tool to split PDF files into smaller chunks with progress bars and automati...
4 versions - Latest release: 8 months ago - 42 downloads last month - 1 maintainer
ragger-python-sdk 0.1.3
Python SDK for ragger.ai RAG API
4 versions - Latest release: 5 months ago - 46 downloads last month - 1 maintainer
multi-ocr-sdk 0.6.1
A simple and efficient Python SDK for multi OCR API, such as deepseek OCR, VLM(qwenvl).
6 versions - Latest release: 12 days ago - 320 downloads last month - 8 stars on GitHub - 1 maintainer
smartloop 1.3.3
Smartloop Command Line interface to process documents using LLM
33 versions - Latest release: 4 months ago - 118 downloads last month - 2 stars on GitHub - 2 maintainers
kiss-ai-stack-core 0.1.0
KISS AI Stack's RAG builder core
26 versions - Latest release: over 1 year ago - 31 downloads last month - 1 star on GitHub - 1 maintainer
fitz-ai 0.10.1
Intelligent, honest knowledge retrieval in 5 minutes. No infrastructure. No boilerplate.
19 versions - Latest release: 10 days ago - 908 downloads last month - 7 stars on GitHub - 1 maintainer
byteit 0.1.2
AI-powered document intelligence platform - Turn your data into structured data with a single lin...
3 versions - Latest release: about 1 month ago - 273 downloads last month - 1 maintainer
qdrant-loader-mcp-server 0.7.6
A Model Context Protocol (MCP) server that provides RAG capabilities to Cursor using Qdrant.
24 versions - Latest release: about 2 months ago - 378 downloads last month - 30 stars on GitHub - 1 maintainer
qdrant-loader-core 0.7.6
Shared core for provider-agnostic LLM support and configuration mapping for qdrant-loader ecosystem
5 versions - Latest release: about 2 months ago - 367 downloads last month - 20 stars on GitHub - 1 maintainer
qdrant-loader 0.7.6
A tool for collecting and vectorizing technical content from multiple sources and storing it in a...
31 versions - Latest release: about 2 months ago - 406 downloads last month - 20 stars on GitHub - 1 maintainer
julee 0.1.9
Julee - Clean architecture for accountable and transparent digital supply chains
10 versions - Latest release: 14 days ago - 305 downloads last month - 0 stars on GitHub - 1 maintainer
ocrxdoc 1.0.0
Python Framework for OCR using Qwen3-VL Models
1 version - Latest release: 4 months ago - 22 downloads last month - 0 stars on GitHub - 1 maintainer
docling-enhanced-onnx 1.0.0
Enhanced Docling Models with ONNX Auto-Detection and Air-Gapped Support
1 version - Latest release: 6 months ago - 45 downloads last month - 0 stars on GitHub - 1 maintainer
freecrawl-mcp 0.1.2
FreeCrawl MCP Server - Self-hosted web scraping and document processing as a Firecrawl replacement
3 versions - Latest release: 7 months ago - 55 downloads last month - 1 star on GitHub - 1 maintainer
pdflinkcheck 1.3.44
A purpose-built PDF link analysis and reporting tool with GUI and CLI.
157 versions - Latest release: about 1 month ago - 3.96 thousand downloads last month - 1 star on GitHub - 1 maintainer
kiss-ai-stack-server 0.1.0a17
KISS AI Stack's Server stub - Simplify AI Agent Development
18 versions - Latest release: about 1 year ago - 70 downloads last month - 1 star on GitHub - 1 maintainer
docling-analysis-framework 2.0.0
AI-ready analysis framework for PDF and Office documents using Docling for content extraction - p...
4 versions - Latest release: 4 months ago - 38 downloads last month - 0 stars on GitHub - 1 maintainer
atai-gemma3-tool 0.0.3
CLI tool for generating text from images using the Gemma 3 model.
3 versions - Latest release: 12 months ago - 87 downloads last month - 0 stars on GitHub - 1 maintainer
contextgem 0.21.0
Effortless LLM extraction from documents
44 versions - Latest release: 15 days ago - 2.02 thousand downloads last month - 1,697 stars on GitHub - 1 maintainer
kita 1.1.0
Official Python SDK for Kita Document Processing API
3 versions - Latest release: about 2 months ago - 177 downloads last month - 0 stars on GitHub - 1 maintainer
purrfectkit 0.2.6
PurrfectKit is a Python library for effortless Retrieval-Augmented Generation (RAG) workflows.
6 versions - Latest release: 2 months ago - 295 downloads last month - 1 star on GitHub
many-ocr-sdk 0.4.0
A simple and efficient Python SDK for DeepSeek-OCR API
1 version - Latest release: 3 months ago - 16 downloads last month - 2 stars on GitHub - 1 maintainer
raggy 0.3.5 💰
scraping stuff
25 versions - Latest release: 7 months ago - 1 dependent package - 9.05 thousand downloads last month - 24 stars on GitHub - 1 maintainer
ai-chunking 0.1.9
A powerful Python library for semantic document chunking and enrichment using AI
8 versions - Latest release: 12 months ago - 291 downloads last month - 120 stars on GitHub - 1 maintainer
docx-mcp 0.1.8
DOCX MCP processor - a complete Word document processing tool with support for image editing and table operations
9 versions - Latest release: 4 months ago - 1.07 thousand downloads last month - 12 stars on GitHub - 1 maintainer
doc-extraction 2.5.0
Multi-format document extraction library for EPUB, PDF, HTML, Markdown, and JSON documents
2 versions - Latest release: about 1 month ago - 148 downloads last month - 1 maintainer
smart-llm-loader 0.1.0
A powerful PDF processing toolkit that seamlessly integrates with LLMs for intelligent document c...
1 version - Latest release: about 1 year ago - 22 downloads last month - 66 stars on GitHub - 1 maintainer
eless 1.0.3
Evolving Low-resource Embedding and Storage System - A resilient RAG data processing pipeline wit...
4 versions - Latest release: 4 months ago - 41 downloads last month - 1 maintainer
magicconvert 0.1.3
MagicConvert is a Python library that converts various document formats (PDF, DOCX, XLSX, PPTX, H...
3 versions - Latest release: 10 months ago - 15 downloads last month - 2 stars on GitHub - 1 maintainer
pdfrest 1.0.1
Python client library for interacting with the PDFRest API
2 versions - Latest release: 17 days ago - 150 downloads last month - 1 maintainer
deeplightrag 1.0.22
DeepLightRAG: High-performance Document Indexing and Retrieval System (use with any LLM)
20 versions - Latest release: 3 months ago - 285 downloads last month - 0 stars on GitHub - 1 maintainer
mac-letterhead 0.14.0
A macOS utility to merge letterhead with PDF documents using a drag-and-drop interface
100 versions - Latest release: 5 months ago - 704 downloads last month - 0 stars on GitHub - 1 maintainer
kiss-ai-stack-types 0.1.0a4
KISS AI Stack's common object types
4 versions - Latest release: about 1 year ago - 21 downloads last month - 0 stars on GitHub - 1 maintainer
arag 0.1.0
A CLI tool for creating, managing, and querying .arag files for RAG applications
1 version - Latest release: about 1 year ago - 37 downloads last month - 1 star on GitHub - 1 maintainer
documiner 0.8.2
Advanced tool designed for text analysis and data mining in documents
1 version - Latest release: 8 months ago - 1 maintainer
document-data-extractor 1.0.4
Best open-source document to markdown extractor for LLM training data. Convert PDF, Word, PowerPo...
5 versions - Latest release: 7 months ago - 68 downloads last month - 3 stars on GitHub - 1 maintainer
stache-ai-documents 0.1.0
Document format loaders for Stache AI (EPUB, DOCX, PPTX)
1 version - Latest release: 2 months ago - 27 downloads last month - 1 maintainer
mcp-gosling 0.1.0
MCP Gosling - Advanced document processing server for Goose AI using IBM's Docling library
1 version - Latest release: 7 months ago - 6 downloads last month - 1 maintainer
wizarddocx 1.0.0
Text extraction from Microsoft Word files. Parses Word documents natively and can optionally run ...
1 version - Latest release: 6 months ago - 25 downloads last month - 1 star on GitHub - 1 maintainer
pdf-ocr-processor 2.0.3
Advanced PDF OCR processing with AI-powered text extraction and selectable text overlays
1 version - Latest release: 8 months ago - 17 downloads last month - 1 maintainer
saara-ai 1.6.4
🧠 SAARA - Autonomous Document-to-LLM Data Engine with Pre-training, Cloud Runtime & AI Tokenizer
26 versions - Latest release: about 1 month ago - 680 downloads last month - 1 maintainer
smart-ocr 0.2.7
Multi-engine document OCR with cascading fallback
11 versions - Latest release: about 1 month ago - 762 downloads last month - 1 maintainer
powerrag-sdk 0.3.0
A Python SDK for PowerRAG API, providing easy-to-use interfaces for knowledge base management, do...
2 versions - Latest release: about 2 months ago - 61 downloads last month
doclayer-cli 1.2.1
Doclayer Command-Line Interface - Document Intelligence Platform CLI
7 versions - Latest release: 3 months ago - 106 downloads last month - 1 maintainer
docling-onnx-models 0.1.3
ONNX Runtime implementations for Docling AI models
3 versions - Latest release: 6 months ago - 126 downloads last month - 0 stars on GitHub - 1 maintainer
pdfmcp-tools 0.1.1
MCP server for comprehensive PDF processing with 18 specialized tools
2 versions - Latest release: 6 months ago - 11 downloads last month - 1 star on GitHub - 1 maintainer
mseep-kreuzberg 3.13.5
Document intelligence framework for Python - Extract text, metadata, and structured data from div...
4 versions - Latest release: 6 months ago - 43 downloads last month - 2,454 stars on GitHub - 1 maintainer
unifydoc 0.1.0
Unified document processing with AI-powered OCR
1 version - Latest release: 9 months ago
docuglean-ocr 1.0.0
An SDK for intelligent document processing using SOTA VLLM models
1 version - Latest release: 6 months ago - 19 downloads last month - 1 star on GitHub - 1 maintainer
kiss-ai-stack-client 0.1.0a2
KISS AI Stack's Python Client SDK - Simplify AI Agent Development
3 versions - Latest release: about 1 year ago - 18 downloads last month - 1 star on GitHub - 1 maintainer
llm_etl_pipeline 0.1.0
LLM extraction from documents
1 version - Latest release: 9 months ago - 6 downloads last month - 1 star on GitHub - 1 maintainer
deepseek-ocr-cli 0.3.2
CLI tool for OCR using DeepSeek-OCR model via Ollama
9 versions - Latest release: about 2 months ago - 258 downloads last month - 5 stars on GitHub - 1 maintainer
gemini-ocr-cli 0.2.1
CLI tool for OCR processing using Google Gemini's vision capabilities
2 versions - Latest release: 2 months ago - 42 downloads last month - 1 maintainer
atai-ebook-tool 0.0.6
A command-line tool for parsing ebooks (such as EPUB and MOBI) and converting them into a structu...
5 versions - Latest release: 12 months ago - 129 downloads last month - 0 stars on GitHub - 1 maintainer
rag-prep 0.1.4
A minimal, extensible framework for preparing documents for RAG/LLM workflows
3 versions - Latest release: 4 months ago - 25 downloads last month - 1 maintainer
asset-aware-mcp 0.2.10
Medical RAG with Asset-Aware MCP - Precise PDF asset retrieval (tables, figures, sections) for AI...
8 versions - Latest release: 29 days ago - 1 maintainer
omnidocs 0.2.8
Unified Python toolkit for visual document processing - think Transformers for document AI
11 versions - Latest release: 28 days ago - 807 downloads last month - 48,861 stars on GitHub - 1 maintainer
pyrhubarb-mcp 0.1.3
MCP server for Rhubarb document and video understanding capabilities
3 versions - Latest release: 4 months ago - 21 downloads last month - 97 stars on GitHub - 1 maintainer
bookbridge-mcp 1.0.2
A powerful Model Context Protocol (MCP) server for Chinese-to-English book translation and docume...
3 versions - Latest release: 5 months ago - 61 downloads last month - 0 stars on GitHub - 1 maintainer
sharepoint-to-text 0.9.0
Text extraction library for typical file formats found in SharePoint repositories
11 versions - Latest release: about 1 month ago - 2.67 thousand downloads last month - 0 stars on GitHub - 1 maintainer
unstructured-ingest-clickzetta 1.3.5
ClickZetta connector for Unstructured data pipeline - Enhanced ETL with SQL and Volume support
8 versions - Latest release: 6 months ago - 41 downloads last month - 0 stars on GitHub - 1 maintainer
unsiloed-sdk 0.1.4
Python SDK for Unsiloed Vision API - Parse, Extract, Classify, and Split documents
5 versions - Latest release: about 2 months ago - 209 downloads last month - 1 maintainer
chunkana 0.1.6
Intelligent Markdown chunking library for RAG systems
7 versions - Latest release: about 2 months ago - 914 downloads last month - 1 maintainer
toprint 0.1.32
2print/toprint: Python library for printing and converting between HTML, PDF, ZPL, and image form...
6 versions - Latest release: 9 months ago - 50 downloads last month - 0 stars on GitHub - 1 maintainer
mseep-pptagent 0.2.15
PPTAgent, a tool for utilizing LLMs to generate PowerPoint presentations from documents.
1 version - Latest release: 3 months ago - 33 downloads last month - 3,340 stars on GitHub - 1 maintainer
dddocr-py 0.1.0
Python client for the 3DOCR.com OCR API
1 version - Latest release: 6 months ago - 22 downloads last month - 1 maintainer
pyrhubarb 0.0.7
A Python framework for multi-modal document understanding with generative AI
7 versions - Latest release: 9 months ago - 43.5 thousand downloads last month - 98 stars on GitHub - 1 maintainer
doc2mark 0.4.3
Unified document processing with AI-powered OCR
20 versions - Latest release: 3 months ago - 636 downloads last month - 39 stars on GitHub - 1 maintainer
stache-ai-ocr 0.1.2
OCR support for Stache AI document loaders
3 versions - Latest release: about 2 months ago - 98 downloads last month - 1 maintainer
llm-data-converter 2.2.0
Best open-source document to markdown converter for LLM training data. Convert PDF, Word, PowerPo...
23 versions - Latest release: 8 months ago - 133 downloads last month - 5 stars on GitHub - 1 maintainer
shrutiai 1.0.1
Python SDK for interacting with the shrutiAI API - your AI-powered assistant
2 versions - Latest release: 6 months ago - 19 downloads last month - 1 maintainer
deepcompress 1.4.5
Production-ready document compression library reducing LLM costs by 96% with DeepSeek-OCR integra...
39 versions - Latest release: 4 months ago - 324 downloads last month - 1 maintainer
iflow-mcp_docx-mcp 0.1.6
DOCX MCP processor - a complete Word document processing tool with support for image editing and table operations
1 version - Latest release: 4 months ago - 57 downloads last month - 15 stars on GitHub - 1 maintainer
signlib 1.0.2
Automated signature placement for synthetic data generation - designed for creating ML training d...
2 versions - Latest release: 21 days ago - 188 downloads last month - 1 maintainer
science-ocr 0.3.0
Extract clean, structured text from scientific papers in PDF format
3 versions - Latest release: about 2 months ago - 57 downloads last month - 1 maintainer
markdrop 3.5.0
A comprehensive PDF processing toolkit that converts PDFs to markdown with advanced AI-powered fe...
20 versions - Latest release: 8 months ago - 409 downloads last month - 116 stars on GitHub - 2 maintainers
aimq 0.1.2
A robust message queue processor for Supabase pgmq with AI-powered document processing capabilities
3 versions - Latest release: 5 months ago - 112 downloads last month - 1 star on GitHub - 1 maintainer
document-ai-toolkit 0.1.0
Comprehensive document processing toolkit for AI/ML applications
1 version - Latest release: 2 months ago - 36 downloads last month - 1 maintainer
flockparser 1.0.9
Distributed document RAG system with intelligent GPU/CPU orchestration
8 versions - Latest release: 4 months ago - 96 downloads last month - 3 stars on GitHub - 1 maintainer
Related Keywords
pdf (72), ocr (69), ai (66), llm (56), rag (53), nlp (33), text-extraction (33), machine-learning (28), chunking (19), markdown (18), mcp (18), semantic-search (17), python (16), openai (15), docx (15), embeddings (14), api (12), sdk (12), document (12), data-extraction (12), cli (11), vector-database (11), text-processing (10), table-extraction (10), document-extraction (10), image-processing (9), retrieval-augmented-generation (9), pdf-to-markdown (9), langchain (9), document-analysis (9), structured-data (9), document-intelligence (8), artificial-intelligence (8), extraction (8), gemini (8), document-ai (8), file-conversion (7), deepseek (7), docling (7), knowledge-base (7), document-understanding (7), document-parsing (7), pdf-processing (6), image-to-text (6), document-conversion (6), unstructured-data (6), xlsx (6), generative-ai (6), layout-analysis (6), information-extraction (6), agent (6), html (5), pdf-parser (5), word (5), pptx (5), fastmcp (5), tesseract (5), python3 (5), pdf-extraction (5), structured-data-extraction (5), ppt (5), html-to-markdown (5), vector-search (5), intelligent-document-processing (5), ai-agent (5), mcp-server (5), vision (4), pdf-parsing (4), document-classification (4), developer-tools (4), powerpoint (4), multilingual (4), chromadb (4), retrieval (4), pdf-to-text (4), ml (4), semantic-analysis (4), model-context-protocol (4), claude (4), entity-extraction (4), llms (4), cli-tool (3), multi-project (3), confluence-integration (3), layout-detection (3), insights-extraction (3), unstructured-alternative (3), qdrant (3), local-document-processing (3), presentation (3), docling-alternative (3), marker-alternative (3), document-to-markdown (3), tesseract-alternative (3), paddleocr-alternative (3), mineru-alternative (3), gen-ai (3), boilerplate-application (3), ai-agents-framework (3), markitdown-alternative (3)