An open API service providing package, version and dependency metadata of many open source software ecosystems and registries.

pypi.org "document-processing" keyword

View the packages on the pypi.org package registry that are tagged with the "document-processing" keyword.

rossum-agent 1.1.2
AI agent toolkit for Rossum: document workflows conversationally, debug pipelines automatically, ...
11 versions - Latest release: about 8 hours ago - 1 maintainer
rossum-agent-client 1.1.0
Python client for Rossum Agent API - AI-powered document processing assistant
3 versions - Latest release: 15 days ago - 283 downloads last month - 1 maintainer
roset 0.1.2
Roset Python SDK -- Unstructured-to-Structured Transformation Engine
3 versions - Latest release: 3 days ago - 171 downloads last month - 1 maintainer
llm-text-splitter 0.2.0
A lightweight, rule-based text splitter for LLM context window management, handles multiple file ...
2 versions - Latest release: 7 months ago - 19 downloads last month - 0 stars on GitHub - 1 maintainer
kraken-captia 0.1.2
Official Python SDK for Kraken document processing API
1 version - Latest release: about 1 month ago - 99 downloads last month - 1 maintainer
neuraparse 0.3.6
Production-grade agentic document-to-dataset pipeline with GraphRAG support.
10 versions - Latest release: about 1 year ago - 31 downloads last month - 1 maintainer
fileseek 0.1.3
FileSeek – AI-Powered Local Document Archive & Search
3 versions - Latest release: about 1 year ago - 23 downloads last month - 1 maintainer
nemotron-table-structure-v1 1.0.0
Nemotron Table Structure v1 - A specialized object detection model for table structure extraction
1 version - Latest release: about 2 months ago - 196 downloads last month - 1 maintainer
ragctl 0.1.5
ragctl - Production-ready RAG toolkit with advanced OCR, semantic chunking, and intelligent docum...
5 versions - Latest release: about 1 month ago - 60 downloads last month - 0 stars on GitHub - 1 maintainer
xgen-doc2chunk 0.2.13
Convert raw documents into AI-understandable context with intelligent text extraction, table dete...
14 versions - Latest release: 2 days ago
vi-rag 0.1.4
Vietnamese Retrieval-Augmented Generation (RAG) Framework
5 versions - Latest release: 12 days ago
rnsr 0.2.0
Recursive Neural-Symbolic Retriever - Hierarchical document retrieval with font-based structure a...
5 versions - Latest release: 6 days ago - 1 maintainer
pdf-mcp 1.1.2
Production-ready MCP server for PDF processing with intelligent caching. Extract text, search, an...
4 versions - Latest release: 5 days ago - 1 maintainer
jw-my-rag 0.1.1
OCR Vector Database - Document parsing, semantic segmentation, and vector search
2 versions - Latest release: 17 days ago - 185 downloads last month - 1 maintainer
iflow-mcp_rockcj-docx_mcp_cj 0.1.6
DOCX MCP Processor - a complete Word document processing tool with support for image editing and table operations
1 version - Latest release: 5 days ago - 1 maintainer
iflow-mcp_icip-cas-pptagent 0.2.22
PPTAgent, a tool for utilizing LLMs to generate PowerPoint presentations from documents.
5 versions - Latest release: 4 days ago - 1 maintainer
iflow-mcp_anuragb7-mcp-rag 0.1.1
MCP-RAG system built with the Model Context Protocol (MCP) that handles large files (up to 200MB)...
2 versions - Latest release: 5 days ago - 1 maintainer
harvestor 0.0.1
Harvest intelligence from any document - AI-powered data extraction and validation
1 version - Latest release: 12 days ago - 110 downloads last month
elizaos-plugin-pdf 2.0.0a4
elizaOS PDF Plugin - PDF reading and text extraction
1 version - Latest release: 5 days ago
cosmic-chunker 1.1.0
COSMIC: Concept-aware Semantic Meta-chunking with Intelligent Classification
1 version - Latest release: 12 days ago - 159 downloads last month
content-understanding-sdk 0.0.2
Python SDK for Azure Content Understanding API
2 versions - Latest release: 15 days ago - 1 maintainer
clox-client 1.0.0
Python client SDK for Clox Document Processing API
1 version - Latest release: 8 days ago
agentone 0.1.0
Smart utilities for AI agents infrastructure.
1 version - Latest release: 15 days ago - 1 maintainer
mistral-ocr-cli 1.0.2
A clean command-line tool for OCR processing using Mistral AI's API
3 versions - Latest release: about 1 month ago - 111 downloads last month - 1 maintainer
blocknote-py 0.3.1 💰
🚀 BlockNote Python library - Convert BlockNote.js blocks to HTML, Markdown, PDF & JSON. Type-safe...
3 versions - Latest release: 4 months ago - 44 downloads last month - 1 maintainer
spanish-pdf-parser 0.1.0
A Python package for processing PDFs with header and footer detection
1 version - Latest release: about 1 year ago - 61 downloads last month - 1 maintainer
kallia 0.1.6
Semantic Document Processing Library
7 versions - Latest release: about 2 months ago - 96 downloads last month - 0 stars on GitHub - 1 maintainer
pdf-structify 0.1.18
Extract structured data from PDFs using LLMs with sklearn-like API
19 versions - Latest release: 19 days ago - 1.63 thousand downloads last month
contextifier 0.2.2
Convert raw documents into AI-understandable context with intelligent text extraction, table dete...
9 versions - Latest release: 20 days ago - 954 downloads last month
caas-core 0.3.0
A pure, logic-only library for routing context, handling RAG fallacies, and managing context wind...
2 versions - Latest release: 16 days ago - 205 downloads last month
docuglean 1.1.0
An SDK for intelligent document processing using SOTA VLLM models
1 version - Latest release: 3 months ago - 27 downloads last month - 1 maintainer
docs2db 0.4.3
Repository of docling documents for RAG
8 versions - Latest release: about 1 month ago - 186 downloads last month - 1 star on GitHub - 1 maintainer
pdf-splitter-cli 0.1.3
A modern command-line tool to split PDF files into smaller chunks with progress bars and automati...
4 versions - Latest release: 7 months ago - 42 downloads last month - 1 maintainer
chunking-strategy 0.4.1
A comprehensive chunking library for text, documents, audio, video, and data streams (Linux and m...
4 versions - Latest release: 4 months ago - 52 downloads last month - 2 stars on GitHub - 1 maintainer
extract-monster 0.1.0
Python SDK for Extract Monster - Extract structured data from files and text using AI
1 version - Latest release: 4 months ago - 15 downloads last month - 1 maintainer
smartloop 1.3.3
Smartloop Command Line interface to process documents using LLM
33 versions - Latest release: 3 months ago - 246 downloads last month - 2 stars on GitHub - 2 maintainers
ragger-python-sdk 0.1.3
Python SDK for ragger.ai RAG API
4 versions - Latest release: 4 months ago - 46 downloads last month - 1 maintainer
deepseek-ocr 0.3.0
A simple and efficient Python SDK for DeepSeek-OCR API
4 versions - Latest release: 2 months ago - 666 downloads last month - 2 stars on GitHub - 1 maintainer
quanta-pdf 1.0.5
Advanced PDF layout analysis engine for extracting figures, tables, and structured content
5 versions - Latest release: 2 months ago - 73 downloads last month - 2 stars on GitHub - 1 maintainer
multi-ocr-sdk 0.5.1
A simple and efficient Python SDK for multi OCR API, such as deepseek OCR, VLM(qwenvl).
5 versions - Latest release: about 2 months ago - 93 downloads last month - 1 maintainer
kreuzberg 4.2.3 💰
High-performance document intelligence library for Python. Extract text, metadata, and structured...
107 versions - Latest release: 14 days ago - 18.6 thousand downloads last month - 3,123 stars on GitHub - 1 maintainer
kiss-ai-stack-core 0.1.0
KISS AI Stack's RAG builder core
26 versions - Latest release: about 1 year ago - 31 downloads last month - 1 star on GitHub - 1 maintainer
byteit 0.1.1
AI-powered document intelligence platform - Turn your data into structured data with a single lin...
2 versions - Latest release: 19 days ago - 215 downloads last month
qdrant-loader-mcp-server 0.7.6
A Model Context Protocol (MCP) server that provides RAG capabilities to Cursor using Qdrant.
24 versions - Latest release: 21 days ago - 1.02 thousand downloads last month - 20 stars on GitHub - 1 maintainer
qdrant-loader 0.7.6
A tool for collecting and vectorizing technical content from multiple sources and storing it in a...
31 versions - Latest release: 21 days ago - 812 downloads last month - 20 stars on GitHub - 1 maintainer
qdrant-loader-core 0.7.6
Shared core for provider-agnostic LLM support and configuration mapping for qdrant-loader ecosystem
5 versions - Latest release: 21 days ago - 984 downloads last month - 20 stars on GitHub - 1 maintainer
julee 0.1.7
Julee - Clean architecture for accountable and transparent digital supply chains
8 versions - Latest release: about 2 months ago - 807 downloads last month - 0 stars on GitHub - 1 maintainer
ocrxdoc 1.0.0
Python Framework for OCR using Qwen3-VL Models
1 version - Latest release: 3 months ago - 26 downloads last month - 1 maintainer
freecrawl-mcp 0.1.2
FreeCrawl MCP Server - Self-hosted web scraping and document processing as a Firecrawl replacement
3 versions - Latest release: 6 months ago - 40 downloads last month - 1 star on GitHub - 1 maintainer
pdflinkcheck 1.3.36
A purpose-built PDF link analysis and reporting tool with GUI and CLI.
150 versions - Latest release: 18 days ago - 8.63 thousand downloads last month - 1 star on GitHub - 1 maintainer
docling-enhanced-onnx 1.0.0
Enhanced Docling Models with ONNX Auto-Detection and Air-Gapped Support
1 version - Latest release: 5 months ago - 19 downloads last month - 0 stars on GitHub - 1 maintainer
kiss-ai-stack-server 0.1.0a17
KISS AI Stack's Server stub - Simplify AI Agent Development
18 versions - Latest release: about 1 year ago - 149 downloads last month - 1 star on GitHub - 1 maintainer
fitz-ai 0.6.2
A modular, production-ready knowledge engine platform with clean architecture and multi-paradigm ...
14 versions - Latest release: 19 days ago - 1.38 thousand downloads last month - 7 stars on GitHub - 1 maintainer
docling-analysis-framework 2.0.0
AI-ready analysis framework for PDF and Office documents using Docling for content extraction - p...
4 versions - Latest release: 4 months ago - 38 downloads last month - 0 stars on GitHub - 1 maintainer
atai-gemma3-tool 0.0.3
CLI tool for generating text from images using the Gemma 3 model.
3 versions - Latest release: 11 months ago - 37 downloads last month - 0 stars on GitHub - 1 maintainer
contextgem 0.19.4
Effortless LLM extraction from documents
42 versions - Latest release: about 2 months ago - 2.02 thousand downloads last month - 1,697 stars on GitHub - 1 maintainer
raggy 0.3.5 💰
scraping stuff
25 versions - Latest release: 6 months ago - 1 dependent package - 219 downloads last month - 24 stars on GitHub - 1 maintainer
smart-llm-loader 0.1.0
A powerful PDF processing toolkit that seamlessly integrates with LLMs for intelligent document c...
1 version - Latest release: 12 months ago - 22 downloads last month - 66 stars on GitHub - 1 maintainer
magicconvert 0.1.3
MagicConvert is a Python library that converts various document formats (PDF, DOCX, XLSX, PPTX, H...
3 versions - Latest release: 9 months ago - 121 downloads last month - 2 stars on GitHub - 1 maintainer
many-ocr-sdk 0.4.0
A simple and efficient Python SDK for DeepSeek-OCR API
1 version - Latest release: 2 months ago - 16 downloads last month - 2 stars on GitHub - 1 maintainer
ai-chunking 0.1.9
A powerful Python library for semantic document chunking and enrichment using AI
8 versions - Latest release: 11 months ago - 85 downloads last month - 120 stars on GitHub - 1 maintainer
doc-extraction 0.0.1
Multi-format document extraction library for EPUB, PDF, HTML, Markdown, and JSON documents
1 version - Latest release: 29 days ago - 102 downloads last month
eless 1.0.3
Evolving Low-resource Embedding and Storage System - A resilient RAG data processing pipeline wit...
4 versions - Latest release: 4 months ago - 41 downloads last month - 1 maintainer
deeplightrag 1.0.22
DeepLightRAG: High-performance Document Indexing and Retrieval System (use with any LLM)
20 versions - Latest release: about 2 months ago - 285 downloads last month - 0 stars on GitHub - 1 maintainer
powerrag-sdk 0.3.0
A Python SDK for PowerRAG API, providing easy-to-use interfaces for knowledge base management, do...
2 versions - Latest release: 30 days ago - 228 downloads last month
mcp-gosling 0.1.0
MCP Gosling - Advanced document processing server for Goose AI using IBM's Docling library
1 version - Latest release: 6 months ago - 17 downloads last month - 1 maintainer
docling-onnx-models 0.1.3
ONNX Runtime implementations for Docling AI models
3 versions - Latest release: 5 months ago - 56 downloads last month - 0 stars on GitHub - 1 maintainer
smart-ocr 0.1.2
Multi-engine document OCR with cascading fallback
3 versions - Latest release: about 1 month ago - 366 downloads last month - 1 maintainer
kita 1.1.0
Official Python SDK for Kita Document Processing API
3 versions - Latest release: 25 days ago - 177 downloads last month - 0 stars on GitHub - 1 maintainer
deepcompress 1.4.5
Production-ready document compression library reducing LLM costs by 96% with DeepSeek-OCR integra...
39 versions - Latest release: 3 months ago - 341 downloads last month - 1 maintainer
docuglean-ocr 1.0.0
An SDK for intelligent document processing using SOTA VLLM models
1 version - Latest release: 5 months ago - 19 downloads last month - 1 star on GitHub - 1 maintainer
doc2mark 0.4.3
Unified document processing with AI-powered OCR
20 versions - Latest release: 2 months ago - 1.04 thousand downloads last month - 39 stars on GitHub - 1 maintainer
chunkana 0.1.6
Intelligent Markdown chunking library for RAG systems
7 versions - Latest release: about 1 month ago - 856 downloads last month - 1 maintainer
llm_etl_pipeline 0.1.0
LLM extraction from documents
1 version - Latest release: 8 months ago - 17 downloads last month - 1 star on GitHub - 1 maintainer
bookbridge-mcp 1.0.2
A powerful Model Context Protocol (MCP) server for Chinese-to-English book translation and docume...
3 versions - Latest release: 4 months ago - 39 downloads last month - 0 stars on GitHub - 1 maintainer
science-ocr 0.3.0
Extract clean, structured text from scientific papers in PDF format
3 versions - Latest release: 30 days ago - 148 downloads last month - 1 maintainer
rag-prep 0.1.4
A minimal, extensible framework for preparing documents for RAG/LLM workflows
3 versions - Latest release: 3 months ago - 42 downloads last month - 1 maintainer
atai-ebook-tool 0.0.6
A command-line tool for parsing ebooks (such as EPUB and MOBI) and converting them into a structu...
5 versions - Latest release: 11 months ago - 55 downloads last month - 0 stars on GitHub - 1 maintainer
markdrop 3.5.0
A comprehensive PDF processing toolkit that converts PDFs to markdown with advanced AI-powered fe...
20 versions - Latest release: 7 months ago - 409 downloads last month - 116 stars on GitHub - 2 maintainers
stache-ai-ocr 0.1.2
OCR support for Stache AI document loaders
3 versions - Latest release: 27 days ago - 333 downloads last month - 1 maintainer
stache-ai-documents 0.1.0
Document format loaders for Stache AI (EPUB, DOCX, PPTX)
1 version - Latest release: about 1 month ago - 135 downloads last month - 1 maintainer
flockparser 1.0.9
Distributed document RAG system with intelligent GPU/CPU orchestration
8 versions - Latest release: 3 months ago - 96 downloads last month - 3 stars on GitHub - 1 maintainer
deepseek-ocr-cli 0.3.2
CLI tool for OCR using DeepSeek-OCR model via Ollama
9 versions - Latest release: 28 days ago - 374 downloads last month - 1 maintainer
sharepoint-to-text 0.8.1
Text extraction library for typical file formats found in SharePoint repositories
10 versions - Latest release: about 1 month ago - 1.66 thousand downloads last month - 0 stars on GitHub - 1 maintainer
chandra-parser 0.1.0
PDF to Markdown parser using Datalab's Marker OCR API with optional GPT-based figure filtering
1 version - Latest release: 2 months ago - 27 downloads last month - 1 maintainer
mseep-kreuzberg 3.13.5
Document intelligence framework for Python - Extract text, metadata, and structured data from div...
4 versions - Latest release: 5 months ago - 43 downloads last month - 2,454 stars on GitHub - 1 maintainer
ingest-cli 1.0.2
High-quality document processing for RAG pipelines, supporting multiple formats and processing ba...
1 version - Latest release: 2 months ago - 28 downloads last month - 1 maintainer
mac-letterhead 0.14.0
A macOS utility to merge letterhead with PDF documents using a drag-and-drop interface
100 versions - Latest release: 4 months ago - 704 downloads last month - 0 stars on GitHub - 1 maintainer
pdfsegmenter 0.1
This library builds a graph-representation of the content of PDFs. The graph is then clustered, r...
1 version - Latest release: over 5 years ago - 1 dependent repository - 18 downloads last month - 23 stars on GitHub - 1 maintainer
peslac 0.1.4
A Python package for the Peslac API
5 versions - Latest release: about 1 year ago - 19 downloads last month - 0 stars on GitHub - 1 maintainer
unstructured-ingest-clickzetta 1.3.5
ClickZetta connector for Unstructured data pipeline - Enhanced ETL with SQL and Volume support
8 versions - Latest release: 5 months ago - 52 downloads last month - 0 stars on GitHub - 1 maintainer
doclayer-cli 1.2.1
Doclayer Command-Line Interface - Document Intelligence Platform CLI
7 versions - Latest release: 2 months ago - 73 downloads last month - 1 maintainer
qagen 0.1.1
A powerful Chinese document QA pairs generation and validation tool with multiple LLM support
2 versions - Latest release: 6 months ago - 11 downloads last month - 0 stars on GitHub - 1 maintainer
pdf2llm 0.1.1
Extract PDF content optimized for Large Language Model (LLM) consumption
2 versions - Latest release: 6 months ago - 17 downloads last month - 1 maintainer
pdf2markdown 0.3.0
Python library and CLI tool that leverages LLMs to convert technical PDF documents to well-struct...
2 versions - Latest release: 5 months ago - 231 downloads last month - 0 stars on GitHub - 1 maintainer
docstrange 1.1.8
Extract and Convert PDF, Word, PowerPoint, Excel, images, URLs into multiple formats (Markdown, J...
19 versions - Latest release: 3 months ago - 1.49 thousand downloads last month - 935 stars on GitHub - 1 maintainer
isotope-rag 0.1.0
Reverse RAG database - index questions, not chunks
1 version - Latest release: about 1 month ago - 90 downloads last month
docslicer 0.1.1
SDK for the DocSlicer document processing API - transform HTML documents into structured chunks f...
1 version - Latest release: about 1 month ago - 109 downloads last month
kiss-ai-stack-types 0.1.0a4
KISS AI Stack's common object types
4 versions - Latest release: about 1 year ago - 22 downloads last month - 0 stars on GitHub - 1 maintainer
indoxminer 0.1.5
Indox Data Extraction
19 versions - Latest release: about 1 year ago - 63 downloads last month - 20 stars on GitHub - 2 maintainers
Related Keywords
pdf (66), ocr (64), ai (62), llm (56), rag (54), nlp (32), text-extraction (32), machine-learning (27), chunking (19), markdown (18), mcp (17), semantic-search (16), openai (16), embeddings (15), python (15), docx (14), document (12), vector-database (11), data-extraction (11), text-processing (10), api (10), cli (10), document-extraction (10), table-extraction (10), sdk (10), document-analysis (9), artificial-intelligence (9), structured-data (9), langchain (9), retrieval-augmented-generation (9), deepseek (8), gemini (8), document-intelligence (8), pdf-to-markdown (8), image-processing (8), document-ai (7), knowledge-base (7), document-parsing (7), file-conversion (7), document-understanding (7), docling (7), xlsx (6), image-to-text (6), agent (6), document-conversion (6), generative-ai (6), information-extraction (6), unstructured-data (6), extraction (6), intelligent-document-processing (5), ppt (5), html (5), structured-data-extraction (5), python3 (5), word (5), vector-search (5), pdf-processing (5), mcp-server (5), ai-agent (5), pptx (5), tesseract (5), html-to-markdown (5), powerpoint (4), ml (4), fastmcp (4), multilingual (4), pdf-parser (4), agents (4), llms (4), document-classification (4), semantic-analysis (4), pdf-extraction (4), model-context-protocol (4), chromadb (4), qdrant (4), developer-tools (4), retrieval (4), pdf-parsing (4), transformers (3), powerpoint-to-markdown (3), search (3), layout-analysis (3), conversion (3), vision (3), metadata-extraction (3), excel-to-markdown (3), pdf-to-text (3), ollama (3), llm-ready-data (3), layout-detection (3), pdf-tools (3), local-document-processing (3), document-to-markdown (3), tesseract-alternative (3), paddleocr-alternative (3), mineru-alternative (3), markitdown-alternative (3), framework (3), marker-alternative (3), docling-alternative (3)