pypi.org "document-processing" keyword
docflow-sdk 1.0.0
Docflow Python SDK
1 version - Latest release: 1 day ago - 95 downloads last month - 1 maintainer
parselabs 0.1.2
Extract structured lab test results from medical documents with AI precision
1 version - Latest release: about 1 month ago - 33 downloads last month - 1 maintainer
Top 7.3% on pypi.org
pdf-structify 0.1.18
Extract structured data from PDFs using LLMs with sklearn-like API
19 versions - Latest release: 2 months ago - 248 downloads last month - 0 stars on GitHub - 1 maintainer
docdigitizer 0.2.0
Official Python SDK for the DocDigitizer document processing API
2 versions - Latest release: about 1 month ago - 275 downloads last month - 1 maintainer
docuglean 1.1.0
An SDK for intelligent document processing using SOTA VLLM models
1 version - Latest release: 5 months ago - 13 downloads last month - 1 maintainer
parsemedicalexams 0.1.3
Extract and summarize medical exam reports (X-rays, MRIs, ultrasounds, etc.) with AI precision
1 version - Latest release: about 1 month ago - 35 downloads last month - 1 maintainer
kreuzberg 4.6.3
High-performance document intelligence library for Python. Extract text, metadata, and structured...
143 versions - Latest release: 7 days ago - 119 thousand downloads last month - 6,130 stars on GitHub - 1 maintainer
pyrhubarb 0.0.8
A Python framework for multi-modal document understanding with generative AI
8 versions - Latest release: 11 days ago - 43.5 thousand downloads last month - 98 stars on GitHub - 1 maintainer
pyrhubarb-mcp 0.1.3
MCP server for Rhubarb document and video understanding capabilities
3 versions - Latest release: 5 months ago - 22 downloads last month - 97 stars on GitHub - 1 maintainer
pptagent 1.1.32
An Agentic Framework for Reflective PowerPoint Generation
50 versions - Latest release: 4 days ago - 6.85 thousand downloads last month - 3,737 stars on GitHub - 1 maintainer
mseep-pptagent 0.2.15
PPTAgent, a tool for utilizing LLMs to generate PowerPoint presentations from documents.
1 version - Latest release: 4 months ago - 33 downloads last month - 3,340 stars on GitHub - 1 maintainer
iflow-mcp_icip-cas-pptagent 0.2.22
PPTAgent, a tool for utilizing LLMs to generate PowerPoint presentations from documents.
5 versions - Latest release: about 2 months ago - 1 maintainer
pdf-splitter-cli 0.1.3
A modern command-line tool to split PDF files into smaller chunks with progress bars and automati...
4 versions - Latest release: 9 months ago - 22 downloads last month - 1 maintainer
bank-statement-separator 2.0.0
AI-powered tool for separating multi-statement PDF files using LangChain and LangGraph
6 versions - Latest release: 7 months ago - 34 downloads last month - 0 stars on GitHub - 1 maintainer
rossum-agent 1.8.3
AI agent toolkit for Rossum: document workflows conversationally, debug pipelines automatically, ...
43 versions - Latest release: 3 days ago - 2.58 thousand downloads last month - 10 stars on GitHub - 1 maintainer
rossum-agent-client 1.1.0
Python client for Rossum Agent API - AI-powered document processing assistant
3 versions - Latest release: 2 months ago - 48 downloads last month - 10 stars on GitHub - 1 maintainer
leapocr 2.0.1
Official Python SDK for LeapOCR - Transform documents into structured data using AI-powered OCR
6 versions - Latest release: 12 days ago - 231 downloads last month - 1 maintainer
liteparse 1.2.1
Python wrapper for LiteParse - fast, lightweight PDF and document parsing
5 versions - Latest release: 6 days ago - 7.02 thousand downloads last month - 2,392 stars on GitHub - 1 maintainer
qdrant-loader 0.9.0
A tool for collecting and vectorizing technical content from multiple sources and storing it in a...
34 versions - Latest release: 7 days ago - 1.25 thousand downloads last month - 20 stars on GitHub - 1 maintainer
qdrant-loader-mcp-server 0.9.0
A Model Context Protocol (MCP) server that provides RAG capabilities to Cursor using Qdrant.
27 versions - Latest release: 7 days ago - 1.14 thousand downloads last month - 30 stars on GitHub - 1 maintainer
qdrant-loader-core 0.9.0
Shared core for provider-agnostic LLM support and configuration mapping for qdrant-loader ecosystem
8 versions - Latest release: 7 days ago - 1.24 thousand downloads last month - 20 stars on GitHub - 1 maintainer
asset-aware-mcp 0.6.3
Medical RAG with Asset-Aware MCP - Precise PDF asset retrieval (tables, figures, sections) for AI...
15 versions - Latest release: 11 days ago - 987 downloads last month - 0 stars on GitHub - 1 maintainer
ragger-python-sdk 0.1.3
Python SDK for ragger.ai RAG API
4 versions - Latest release: 6 months ago - 46 downloads last month - 1 maintainer
multi-ocr-sdk 0.6.1
A simple and efficient Python SDK for multi OCR API, such as deepseek OCR, VLM(qwenvl).
6 versions - Latest release: about 1 month ago - 320 downloads last month - 8 stars on GitHub - 1 maintainer
smartloop 1.3.3
Smartloop Command Line interface to process documents using LLM
33 versions - Latest release: 4 months ago - 194 downloads last month - 2 stars on GitHub - 2 maintainers
kiss-ai-stack-core 0.1.0
KISS AI Stack's RAG builder core
26 versions - Latest release: over 1 year ago - 115 downloads last month - 1 star on GitHub - 1 maintainer
docling-analysis-framework 2.0.0
AI-ready analysis framework for PDF and Office documents using Docling for content extraction - p...
4 versions - Latest release: 5 months ago - 38 downloads last month - 0 stars on GitHub - 1 maintainer
byteit 1.0.0
AI-powered document intelligence platform - Turn your data into structured data with a single lin...
4 versions - Latest release: 9 days ago - 355 downloads last month - 1 maintainer
smart-llm-loader 0.1.0
A powerful PDF processing toolkit that seamlessly integrates with LLMs for intelligent document c...
1 version - Latest release: about 1 year ago - 15 downloads last month - 66 stars on GitHub - 1 maintainer
gemini-ocr-cli 0.3.1
CLI tool for OCR processing using Google Gemini's vision capabilities
4 versions - Latest release: 20 days ago - 248 downloads last month - 1 maintainer
ocrxdoc 1.0.0
Python Framework for OCR using Qwen3-VL Models
1 version - Latest release: 5 months ago - 9 downloads last month - 0 stars on GitHub - 1 maintainer
unifydoc 0.1.0
Unified document processing with AI-powered OCR
1 version - Latest release: 10 months ago
smart-ocr 0.2.7
Multi-engine document OCR with cascading fallback
11 versions - Latest release: 2 months ago - 211 downloads last month - 1 maintainer
julee 0.1.15
Julee - Clean architecture for accountable and transparent digital supply chains
16 versions - Latest release: 9 days ago - 305 downloads last month - 0 stars on GitHub - 1 maintainer
mac-letterhead 0.15.8
A macOS utility to merge letterhead with PDF documents using a drag-and-drop interface
112 versions - Latest release: 10 days ago - 1.86 thousand downloads last month - 0 stars on GitHub - 1 maintainer
fitz-ai 0.11.0
Intelligent, honest knowledge retrieval in 5 minutes. No infrastructure. No boilerplate.
23 versions - Latest release: 13 days ago - 1.03 thousand downloads last month - 7 stars on GitHub - 1 maintainer
saara-ai 1.6.7
🧠 SAARA - Autonomous Document-to-LLM Data Engine with Pre-training, Cloud Runtime & AI Tokenizer
29 versions - Latest release: 13 days ago - 680 downloads last month - 1 maintainer
docling-enhanced-onnx 1.0.0
Enhanced Docling Models with ONNX Auto-Detection and Air-Gapped Support
1 version - Latest release: 7 months ago - 93 downloads last month - 0 stars on GitHub - 1 maintainer
pdf-ocr-processor 2.0.3
Advanced PDF OCR processing with AI-powered text extraction and selectable text overlays
1 version - Latest release: 9 months ago - 16 downloads last month - 1 maintainer
kita 2.0.0
Official Python SDK for Kita Document Processing API
8 versions - Latest release: about 2 months ago - 130 downloads last month - 0 stars on GitHub - 1 maintainer
contextgem 0.22.0
Effortless LLM extraction from documents
45 versions - Latest release: 19 days ago - 2.02 thousand downloads last month - 1,697 stars on GitHub - 1 maintainer
eless 1.0.3
Evolving Low-resource Embedding and Storage System - A resilient RAG data processing pipeline wit...
4 versions - Latest release: 5 months ago - 27 downloads last month - 1 maintainer
raggy 0.3.5
scraping stuff
25 versions - Latest release: 8 months ago - 1 dependent package - 92 downloads last month - 24 stars on GitHub - 1 maintainer
docuglean-ocr 1.0.0
An SDK for intelligent document processing using SOTA VLLM models
1 version - Latest release: 7 months ago - 9 downloads last month - 6 stars on GitHub - 1 maintainer
unsiloed-sdk 0.1.7
Python SDK for Unsiloed Vision API - Parse, Extract, Classify, and Split documents
8 versions - Latest release: 24 days ago - 645 downloads last month - 1 maintainer
docx-mcp 0.1.8
DOCX MCP processor - a complete Word document processing tool with support for image editing and table operations
9 versions - Latest release: 5 months ago - 3.55 thousand downloads last month - 12 stars on GitHub - 1 maintainer
credeed-pdf-to-markdown 0.1.0
Convert PDF to Markdown using Azure AI Document Intelligence and upload to S3. Provided by the Cr...
1 version - Latest release: 11 months ago - 14 downloads last month - 0 stars on GitHub - 1 maintainer
akilan 0.0.3
Adaptive Knowledge Ingestion Layer for Analytics Nodes – PyMuPDF wrapper for AI-ready document pr...
3 versions - Latest release: 22 days ago - 1 dependent repository - 188 downloads last month - 1 maintainer
atai-gemma3-tool 0.0.3
CLI tool for generating text from images using the Gemma 3 model.
3 versions - Latest release: about 1 year ago - 27 downloads last month - 0 stars on GitHub - 1 maintainer
pdflinkcheck 1.3.46
A purpose-built PDF link analysis and reporting tool with GUI and CLI.
159 versions - Latest release: 23 days ago - 3.96 thousand downloads last month - 1 star on GitHub - 1 maintainer
freecrawl-mcp 0.1.2
FreeCrawl MCP Server - Self-hosted web scraping and document processing as a Firecrawl replacement
3 versions - Latest release: 8 months ago - 55 downloads last month - 1 star on GitHub - 1 maintainer
kiss-ai-stack-server 0.1.0a17
KISS AI Stack's Server stub - Simplify AI Agent Development
18 versions - Latest release: over 1 year ago - 70 downloads last month - 1 star on GitHub - 1 maintainer
stache-ai-documents 0.1.0
Document format loaders for Stache AI (EPUB, DOCX, PPTX)
1 version - Latest release: 3 months ago - 19 downloads last month - 5 stars on GitHub - 1 maintainer
chunkana 0.1.6
Intelligent Markdown chunking library for RAG systems
7 versions - Latest release: 3 months ago - 742 downloads last month - 0 stars on GitHub - 1 maintainer
ai-chunking 0.1.9
A powerful Python library for semantic document chunking and enrichment using AI
8 versions - Latest release: about 1 year ago - 121 downloads last month - 120 stars on GitHub - 1 maintainer
document-ai-toolkit 0.1.0
Comprehensive document processing toolkit for AI/ML applications
1 version - Latest release: 3 months ago - 36 downloads last month - 1 maintainer
pdfrest 1.0.3
Python client library for interacting with the pdfRest API
4 versions - Latest release: about 1 month ago - 259 downloads last month - 1 maintainer
documiner 0.8.2
Advanced tool designed for text analysis and data mining in documents
1 version - Latest release: 9 months ago - 1 maintainer
purrfectkit 0.2.8
PurrfectKit is a Python library for effortless Retrieval-Augmented Generation (RAG) workflows.
8 versions - Latest release: about 1 month ago - 117 downloads last month - 1 star on GitHub - 1 maintainer
kiss-ai-stack-client 0.1.0a2
KISS AI Stack's Python Client SDK - Simplify AI Agent Development
3 versions - Latest release: over 1 year ago - 18 downloads last month - 1 star on GitHub - 1 maintainer
deepseek-ocr-cli 0.4.3
CLI tool for OCR using DeepSeek-OCR model via Ollama
13 versions - Latest release: 20 days ago - 258 downloads last month - 5 stars on GitHub - 1 maintainer
kiss-ai-stack-types 0.1.0a4
KISS AI Stack's common object types
4 versions - Latest release: over 1 year ago - 25 downloads last month - 0 stars on GitHub - 1 maintainer
atai-ebook-tool 0.0.6
A command-line tool for parsing ebooks (such as EPUB and MOBI) and converting them into a structu...
5 versions - Latest release: about 1 year ago - 129 downloads last month - 0 stars on GitHub - 1 maintainer
toprint 0.1.32
2print/toprint: Python library for printing and converting between HTML, PDF, ZPL, and image form...
6 versions - Latest release: 10 months ago - 31 downloads last month - 0 stars on GitHub - 1 maintainer
doc-extraction 2.5.0
Multi-format document extraction library for EPUB, PDF, HTML, Markdown, and JSON documents
2 versions - Latest release: 2 months ago - 148 downloads last month - 1 maintainer
ragwire 1.2.1
RAGWire — Production-grade RAG toolkit for document ingestion and retrieval with hybrid search su...
11 versions - Latest release: 10 days ago - 1 maintainer
medical-ocr 0.1.1
Medical document OCR pipeline: extract, structure, and export text from medical/legal PDFs.
2 versions - Latest release: 11 days ago - 164 downloads last month - 1 maintainer
llm_etl_pipeline 0.1.0
LLM extraction from documents
1 version - Latest release: 10 months ago - 10 downloads last month - 1 star on GitHub - 1 maintainer
powerrag-sdk 0.3.0
A Python SDK for PowerRAG API, providing easy-to-use interfaces for knowledge base management, do...
2 versions - Latest release: 3 months ago - 61 downloads last month
rnsr 0.3.3
Recursive Neural-Symbolic Retriever - Hierarchical document retrieval with font-based structure a...
10 versions - Latest release: 11 days ago - 348 downloads last month - 12 stars on GitHub - 1 maintainer
docling-onnx-models 0.1.3
ONNX Runtime implementations for Docling AI models
3 versions - Latest release: 7 months ago - 308 downloads last month - 0 stars on GitHub - 1 maintainer
many-ocr-sdk 0.4.0
A simple and efficient Python SDK for DeepSeek-OCR API
1 version - Latest release: 4 months ago - 10 downloads last month - 2 stars on GitHub - 1 maintainer
mseep-kreuzberg 3.13.5
Document intelligence framework for Python - Extract text, metadata, and structured data from div...
4 versions - Latest release: 7 months ago - 17 downloads last month - 2,454 stars on GitHub - 1 maintainer
bookbridge-mcp 1.0.2
A powerful Model Context Protocol (MCP) server for Chinese-to-English book translation and docume...
3 versions - Latest release: 6 months ago - 58 downloads last month - 0 stars on GitHub - 1 maintainer
magicconvert 0.1.3
MagicConvert is a Python library that converts various document formats (PDF, DOCX, XLSX, PPTX, H...
3 versions - Latest release: 10 months ago - 57 downloads last month - 2 stars on GitHub - 1 maintainer
dddocr-py 0.1.0
Python client for the 3DOCR.com OCR API
1 version - Latest release: 7 months ago - 27 downloads last month - 0 stars on GitHub - 1 maintainer
wizarddocx 1.0.0
Text extraction from Microsoft Word files. Parses Word documents natively and can optionally run ...
1 version - Latest release: 7 months ago - 30 downloads last month - 1 star on GitHub - 1 maintainer
arag 0.1.0
A CLI tool for creating, managing, and querying .arag files for RAG applications
1 version - Latest release: about 1 year ago - 40 downloads last month - 1 star on GitHub - 1 maintainer
deeplightrag 1.0.22
DeepLightRAG: High-performance Document Indexing and Retrieval System (use with any LLM)
20 versions - Latest release: 4 months ago - 158 downloads last month - 0 stars on GitHub - 1 maintainer
mcp-gosling 0.1.0
MCP Gosling - Advanced document processing server for Goose AI using IBM's Docling library
1 version - Latest release: 7 months ago - 6 downloads last month - 1 maintainer
unstructured-ingest-clickzetta 1.3.5
ClickZetta connector for Unstructured data pipeline - Enhanced ETL with SQL and Volume support
8 versions - Latest release: 6 months ago - 41 downloads last month - 0 stars on GitHub - 1 maintainer
rag-prep 0.1.4
A minimal, extensible framework for preparing documents for RAG/LLM workflows
3 versions - Latest release: 4 months ago - 30 downloads last month - 1 maintainer
doclayer-cli 1.2.1
Doclayer Command-Line Interface - Document Intelligence Platform CLI
7 versions - Latest release: 4 months ago - 106 downloads last month - 1 maintainer
pdfmcp-tools 0.1.1
MCP server for comprehensive PDF processing with 18 specialized tools
2 versions - Latest release: 6 months ago - 11 downloads last month - 1 star on GitHub - 1 maintainer
sharepoint-to-text 1.0.0
Text extraction library for typical file formats found in SharePoint repositories
12 versions - Latest release: about 1 month ago - 2.67 thousand downloads last month - 0 stars on GitHub - 1 maintainer
pdftwin 0.1.0
Turn PDFs into editable JSON and visually matching replica PDFs.
1 version - Latest release: 16 days ago - 102 downloads last month - 1 maintainer
stache-ai-ocr 0.1.2
OCR support for Stache AI document loaders
3 versions - Latest release: 3 months ago - 47 downloads last month - 1 maintainer
iflow-mcp_docx-mcp 0.1.6
DOCX MCP processor - a complete Word document processing tool with support for image editing and table operations
1 version - Latest release: 4 months ago - 31 downloads last month - 15 stars on GitHub - 1 maintainer
preocr 1.8.1
A fast, layout-aware OCR decision engine for document processing pipelines. Detects whether files...
31 versions - Latest release: about 1 month ago - 1 maintainer
aimq 0.1.2
A robust message queue processor for Supabase pgmq with AI-powered document processing capabilities
3 versions - Latest release: 5 months ago - 44 downloads last month - 1 star on GitHub - 1 maintainer
justpdf 0.0.2
A lightweight, high-performance PDF text extraction library with a pandas-style API.
2 versions - Latest release: 17 days ago - 189 downloads last month - 1 maintainer
entr-adapter-core 1.3.3
ENTR Adapter Core - shared handler framework for tenant adapters
6 versions - Latest release: 16 days ago - 475 downloads last month - 1 maintainer
isotope-rag 0.1.0
Reverse RAG database - index questions, not chunks
1 version - Latest release: 3 months ago - 17 downloads last month - 1 star on GitHub - 1 maintainer
shrutiai 1.0.1
Python SDK for interacting with the shrutiAI API - your AI-powered assistant
2 versions - Latest release: 7 months ago - 19 downloads last month - 1 maintainer
science-ocr 0.3.0
Extract clean, structured text from scientific papers in PDF format
3 versions - Latest release: 3 months ago - 57 downloads last month - 1 maintainer
doc2mark 0.5.1
Unified document processing with AI-powered OCR
22 versions - Latest release: 17 days ago - 636 downloads last month - 39 stars on GitHub - 1 maintainer
tikara 0.1.6
The metadata and text content extractor for almost every file type.
6 versions - Latest release: about 1 year ago - 244 downloads last month - 4 stars on GitHub - 1 maintainer
deepcompress 1.4.5
Production-ready document compression library reducing LLM costs by 96% with DeepSeek-OCR integra...
39 versions - Latest release: 5 months ago - 358 downloads last month - 1 maintainer
signlib 1.0.2
Automated signature placement for synthetic data generation - designed for creating ML training d...
2 versions - Latest release: about 2 months ago - 261 downloads last month - 1 maintainer
flockparser 1.0.9
Distributed document RAG system with intelligent GPU/CPU orchestration
8 versions - Latest release: 5 months ago - 96 downloads last month - 3 stars on GitHub - 1 maintainer
Related Keywords
pdf (79)
ocr (76)
ai (67)
llm (58)
rag (54)
text-extraction (36)
nlp (35)
machine-learning (28)
python (20)
chunking (19)
mcp (19)
semantic-search (18)
markdown (17)
docx (16)
embeddings (16)
openai (15)
document (13)
sdk (13)
data-extraction (12)
api (12)
cli (12)
vector-database (12)
table-extraction (10)
document-extraction (10)
text-processing (10)
retrieval-augmented-generation (9)
image-processing (9)
langchain (9)
pdf-to-markdown (9)
document-analysis (9)
structured-data (9)
gemini (9)
document-intelligence (8)
artificial-intelligence (8)
extraction (8)
deepseek (8)
document-ai (8)
agent (7)
document-understanding (7)
file-conversion (7)
knowledge-base (7)
document-parsing (7)
docling (7)
layout-analysis (7)
document-conversion (6)
pdf-parser (6)
mcp-server (6)
generative-ai (6)
image-to-text (6)
pdf-processing (6)
xlsx (6)
tesseract (6)
information-extraction (6)
unstructured-data (6)
chromadb (5)
fastmcp (5)
python3 (5)
html (5)
html-to-markdown (5)
pdf-extraction (5)
pptx (5)
retrieval (5)
vector-search (5)
ai-agent (5)
claude (5)
word (5)
ppt (5)
intelligent-document-processing (5)
structured-data-extraction (5)
llms (4)
etl (4)
qdrant (4)
semantic-analysis (4)
model-context-protocol (4)
pdf-parsing (4)
powerpoint (4)
medical (4)
vision (4)
developer-tools (4)
question-answering (4)
multilingual (4)
anthropic (4)
document-classification (4)
pdf-to-text (4)
entity-extraction (4)
ml (4)
zero-shot (3)
gen-ai (3)
llm-reasoning (3)
llm-library (3)
llm-framework (3)
neural-segmentation (3)
boilerplate-application (3)
ai-agents-framework (3)
text-analysis (3)
batch-document-processing (3)
word-to-markdown (3)
powerpoint-to-markdown (3)
excel-to-markdown (3)
llm-ready-data (3)