pypi.org "document-processing" keyword
View the packages on the pypi.org package registry that are tagged with the "document-processing" keyword.
rossum-agent 1.1.2
AI agent toolkit for Rossum: document workflows conversationally, debug pipelines automatically, ...
11 versions - Latest release: about 8 hours ago - 1 maintainer
rossum-agent-client 1.1.0
Python client for Rossum Agent API - AI-powered document processing assistant
3 versions - Latest release: 15 days ago - 283 downloads last month - 1 maintainer
roset 0.1.2
Roset Python SDK -- Unstructured-to-Structured Transformation Engine
3 versions - Latest release: 3 days ago - 171 downloads last month - 1 maintainer
llm-text-splitter 0.2.0
A lightweight, rule-based text splitter for LLM context window management, handles multiple file ...
2 versions - Latest release: 7 months ago - 19 downloads last month - 0 stars on GitHub - 1 maintainer
kraken-captia 0.1.2
Official Python SDK for Kraken document processing API
1 version - Latest release: about 1 month ago - 99 downloads last month - 1 maintainer
neuraparse 0.3.6
Production-grade agentic document-to-dataset pipeline with GraphRAG support.
10 versions - Latest release: about 1 year ago - 31 downloads last month - 1 maintainer
fileseek 0.1.3
FileSeek – AI-Powered Local Document Archive & Search
3 versions - Latest release: about 1 year ago - 23 downloads last month - 1 maintainer
nemotron-table-structure-v1 1.0.0
Nemotron Table Structure v1 - A specialized object detection model for table structure extraction
1 version - Latest release: about 2 months ago - 196 downloads last month - 1 maintainer
ragctl 0.1.5
ragctl - Production-ready RAG toolkit with advanced OCR, semantic chunking, and intelligent docum...
5 versions - Latest release: about 1 month ago - 60 downloads last month - 0 stars on GitHub - 1 maintainer
xgen-doc2chunk 0.2.13
Convert raw documents into AI-understandable context with intelligent text extraction, table dete...
14 versions - Latest release: 2 days ago
vi-rag 0.1.4
Vietnamese Retrieval-Augmented Generation (RAG) Framework
5 versions - Latest release: 12 days ago
rnsr 0.2.0
Recursive Neural-Symbolic Retriever - Hierarchical document retrieval with font-based structure a...
5 versions - Latest release: 6 days ago - 1 maintainer
pdf-mcp 1.1.2
Production-ready MCP server for PDF processing with intelligent caching. Extract text, search, an...
4 versions - Latest release: 5 days ago - 1 maintainer
jw-my-rag 0.1.1
OCR Vector Database - Document parsing, semantic segmentation, and vector search
2 versions - Latest release: 17 days ago - 185 downloads last month - 1 maintainer
iflow-mcp_rockcj-docx_mcp_cj 0.1.6
DOCX MCP processor - A complete Word document processing tool with support for image editing and table operations
1 version - Latest release: 5 days ago - 1 maintainer
iflow-mcp_icip-cas-pptagent 0.2.22
PPTAgent, a tool for utilizing LLMs to generate PowerPoint presentations from documents.
5 versions - Latest release: 4 days ago - 1 maintainer
iflow-mcp_anuragb7-mcp-rag 0.1.1
MCP-RAG system built with the Model Context Protocol (MCP) that handles large files (up to 200MB)...
2 versions - Latest release: 5 days ago - 1 maintainer
harvestor 0.0.1
Harvest intelligence from any document - AI-powered data extraction and validation
1 version - Latest release: 12 days ago - 110 downloads last month
elizaos-plugin-pdf 2.0.0a4
elizaOS PDF Plugin - PDF reading and text extraction
1 version - Latest release: 5 days ago
cosmic-chunker 1.1.0
COSMIC: Concept-aware Semantic Meta-chunking with Intelligent Classification
1 version - Latest release: 12 days ago - 159 downloads last month
content-understanding-sdk 0.0.2
Python SDK for Azure Content Understanding API
2 versions - Latest release: 15 days ago - 1 maintainer
clox-client 1.0.0
Python client SDK for Clox Document Processing API
1 version - Latest release: 8 days ago
agentone 0.1.0
Smart utilities for AI agents infrastructure.
1 version - Latest release: 15 days ago - 1 maintainer
mistral-ocr-cli 1.0.2
A clean command-line tool for OCR processing using Mistral AI's API
3 versions - Latest release: about 1 month ago - 111 downloads last month - 1 maintainer
blocknote-py 0.3.1
🚀 BlockNote Python library - Convert BlockNote.js blocks to HTML, Markdown, PDF & JSON. Type-safe...
3 versions - Latest release: 4 months ago - 44 downloads last month - 1 maintainer
spanish-pdf-parser 0.1.0
A Python package for processing PDFs with header and footer detection
1 version - Latest release: about 1 year ago - 61 downloads last month - 1 maintainer
kallia 0.1.6
Semantic Document Processing Library
7 versions - Latest release: about 2 months ago - 96 downloads last month - 0 stars on GitHub - 1 maintainer
pdf-structify 0.1.18
Extract structured data from PDFs using LLMs with sklearn-like API
19 versions - Latest release: 19 days ago - 1.63 thousand downloads last month
contextifier 0.2.2
Convert raw documents into AI-understandable context with intelligent text extraction, table dete...
9 versions - Latest release: 20 days ago - 954 downloads last month
caas-core 0.3.0
A pure, logic-only library for routing context, handling RAG fallacies, and managing context wind...
2 versions - Latest release: 16 days ago - 205 downloads last month
docuglean 1.1.0
An SDK for intelligent document processing using SOTA VLLM models
1 version - Latest release: 3 months ago - 27 downloads last month - 1 maintainer
docs2db 0.4.3
Repository of docling documents for RAG
8 versions - Latest release: about 1 month ago - 186 downloads last month - 1 star on GitHub - 1 maintainer
pdf-splitter-cli 0.1.3
A modern command-line tool to split PDF files into smaller chunks with progress bars and automati...
4 versions - Latest release: 7 months ago - 42 downloads last month - 1 maintainer
chunking-strategy 0.4.1
A comprehensive chunking library for text, documents, audio, video, and data streams (Linux and m...
4 versions - Latest release: 4 months ago - 52 downloads last month - 2 stars on GitHub - 1 maintainer
extract-monster 0.1.0
Python SDK for Extract Monster - Extract structured data from files and text using AI
1 version - Latest release: 4 months ago - 15 downloads last month - 1 maintainer
smartloop 1.3.3
Smartloop Command Line interface to process documents using LLM
33 versions - Latest release: 3 months ago - 246 downloads last month - 2 stars on GitHub - 2 maintainers
ragger-python-sdk 0.1.3
Python SDK for ragger.ai RAG API
4 versions - Latest release: 4 months ago - 46 downloads last month - 1 maintainer
deepseek-ocr 0.3.0
A simple and efficient Python SDK for DeepSeek-OCR API
4 versions - Latest release: 2 months ago - 666 downloads last month - 2 stars on GitHub - 1 maintainer
quanta-pdf 1.0.5
Advanced PDF layout analysis engine for extracting figures, tables, and structured content
5 versions - Latest release: 2 months ago - 73 downloads last month - 2 stars on GitHub - 1 maintainer
multi-ocr-sdk 0.5.1
A simple and efficient Python SDK for multi OCR API, such as deepseek OCR, VLM(qwenvl).
5 versions - Latest release: about 2 months ago - 93 downloads last month - 1 maintainer
kreuzberg 4.2.3
High-performance document intelligence library for Python. Extract text, metadata, and structured...
107 versions - Latest release: 14 days ago - 18.6 thousand downloads last month - 3,123 stars on GitHub - 1 maintainer
kiss-ai-stack-core 0.1.0
KISS AI Stack's RAG builder core
26 versions - Latest release: about 1 year ago - 31 downloads last month - 1 star on GitHub - 1 maintainer
byteit 0.1.1
AI-powered document intelligence platform - Turn your data into structured data with a single lin...
2 versions - Latest release: 19 days ago - 215 downloads last month
qdrant-loader-mcp-server 0.7.6
A Model Context Protocol (MCP) server that provides RAG capabilities to Cursor using Qdrant.
24 versions - Latest release: 21 days ago - 1.02 thousand downloads last month - 20 stars on GitHub - 1 maintainer
qdrant-loader 0.7.6
A tool for collecting and vectorizing technical content from multiple sources and storing it in a...
31 versions - Latest release: 21 days ago - 812 downloads last month - 20 stars on GitHub - 1 maintainer
qdrant-loader-core 0.7.6
Shared core for provider-agnostic LLM support and configuration mapping for qdrant-loader ecosystem
5 versions - Latest release: 21 days ago - 984 downloads last month - 20 stars on GitHub - 1 maintainer
julee 0.1.7
Julee - Clean architecture for accountable and transparent digital supply chains
8 versions - Latest release: about 2 months ago - 807 downloads last month - 0 stars on GitHub - 1 maintainer
ocrxdoc 1.0.0
Python Framework for OCR using Qwen3-VL Models
1 version - Latest release: 3 months ago - 26 downloads last month - 1 maintainer
freecrawl-mcp 0.1.2
FreeCrawl MCP Server - Self-hosted web scraping and document processing as a Firecrawl replacement
3 versions - Latest release: 6 months ago - 40 downloads last month - 1 star on GitHub - 1 maintainer
pdflinkcheck 1.3.36
A purpose-built PDF link analysis and reporting tool with GUI and CLI.
150 versions - Latest release: 18 days ago - 8.63 thousand downloads last month - 1 star on GitHub - 1 maintainer
docling-enhanced-onnx 1.0.0
Enhanced Docling Models with ONNX Auto-Detection and Air-Gapped Support
1 version - Latest release: 5 months ago - 19 downloads last month - 0 stars on GitHub - 1 maintainer
kiss-ai-stack-server 0.1.0a17
KISS AI Stack's Server stub - Simplify AI Agent Development
18 versions - Latest release: about 1 year ago - 149 downloads last month - 1 star on GitHub - 1 maintainer
fitz-ai 0.6.2
A modular, production-ready knowledge engine platform with clean architecture and multi-paradigm ...
14 versions - Latest release: 19 days ago - 1.38 thousand downloads last month - 7 stars on GitHub - 1 maintainer
docling-analysis-framework 2.0.0
AI-ready analysis framework for PDF and Office documents using Docling for content extraction - p...
4 versions - Latest release: 4 months ago - 38 downloads last month - 0 stars on GitHub - 1 maintainer
atai-gemma3-tool 0.0.3
CLI tool for generating text from images using the Gemma 3 model.
3 versions - Latest release: 11 months ago - 37 downloads last month - 0 stars on GitHub - 1 maintainer
contextgem 0.19.4
Effortless LLM extraction from documents
42 versions - Latest release: about 2 months ago - 2.02 thousand downloads last month - 1,697 stars on GitHub - 1 maintainer
raggy 0.3.5
scraping stuff
25 versions - Latest release: 6 months ago - 1 dependent package - 219 downloads last month - 24 stars on GitHub - 1 maintainer
smart-llm-loader 0.1.0
A powerful PDF processing toolkit that seamlessly integrates with LLMs for intelligent document c...
1 version - Latest release: 12 months ago - 22 downloads last month - 66 stars on GitHub - 1 maintainer
magicconvert 0.1.3
MagicConvert is a Python library that converts various document formats (PDF, DOCX, XLSX, PPTX, H...
3 versions - Latest release: 9 months ago - 121 downloads last month - 2 stars on GitHub - 1 maintainer
many-ocr-sdk 0.4.0
A simple and efficient Python SDK for DeepSeek-OCR API
1 version - Latest release: 2 months ago - 16 downloads last month - 2 stars on GitHub - 1 maintainer
ai-chunking 0.1.9
A powerful Python library for semantic document chunking and enrichment using AI
8 versions - Latest release: 11 months ago - 85 downloads last month - 120 stars on GitHub - 1 maintainer
doc-extraction 0.0.1
Multi-format document extraction library for EPUB, PDF, HTML, Markdown, and JSON documents
1 version - Latest release: 29 days ago - 102 downloads last month
eless 1.0.3
Evolving Low-resource Embedding and Storage System - A resilient RAG data processing pipeline wit...
4 versions - Latest release: 4 months ago - 41 downloads last month - 1 maintainer
deeplightrag 1.0.22
DeepLightRAG: High-performance Document Indexing and Retrieval System (use with any LLM)
20 versions - Latest release: about 2 months ago - 285 downloads last month - 0 stars on GitHub - 1 maintainer
powerrag-sdk 0.3.0
A Python SDK for PowerRAG API, providing easy-to-use interfaces for knowledge base management, do...
2 versions - Latest release: 30 days ago - 228 downloads last month
mcp-gosling 0.1.0
MCP Gosling - Advanced document processing server for Goose AI using IBM's Docling library
1 version - Latest release: 6 months ago - 17 downloads last month - 1 maintainer
docling-onnx-models 0.1.3
ONNX Runtime implementations for Docling AI models
3 versions - Latest release: 5 months ago - 56 downloads last month - 0 stars on GitHub - 1 maintainer
smart-ocr 0.1.2
Multi-engine document OCR with cascading fallback
3 versions - Latest release: about 1 month ago - 366 downloads last month - 1 maintainer
kita 1.1.0
Official Python SDK for Kita Document Processing API
3 versions - Latest release: 25 days ago - 177 downloads last month - 0 stars on GitHub - 1 maintainer
deepcompress 1.4.5
Production-ready document compression library reducing LLM costs by 96% with DeepSeek-OCR integra...
39 versions - Latest release: 3 months ago - 341 downloads last month - 1 maintainer
docuglean-ocr 1.0.0
An SDK for intelligent document processing using SOTA VLLM models
1 version - Latest release: 5 months ago - 19 downloads last month - 1 star on GitHub - 1 maintainer
doc2mark 0.4.3
Unified document processing with AI-powered OCR
20 versions - Latest release: 2 months ago - 1.04 thousand downloads last month - 39 stars on GitHub - 1 maintainer
chunkana 0.1.6
Intelligent Markdown chunking library for RAG systems
7 versions - Latest release: about 1 month ago - 856 downloads last month - 1 maintainer
llm_etl_pipeline 0.1.0
LLM extraction from documents
1 version - Latest release: 8 months ago - 17 downloads last month - 1 star on GitHub - 1 maintainer
bookbridge-mcp 1.0.2
A powerful Model Context Protocol (MCP) server for Chinese-to-English book translation and docume...
3 versions - Latest release: 4 months ago - 39 downloads last month - 0 stars on GitHub - 1 maintainer
science-ocr 0.3.0
Extract clean, structured text from scientific papers in PDF format
3 versions - Latest release: 30 days ago - 148 downloads last month - 1 maintainer
rag-prep 0.1.4
A minimal, extensible framework for preparing documents for RAG/LLM workflows
3 versions - Latest release: 3 months ago - 42 downloads last month - 1 maintainer
atai-ebook-tool 0.0.6
A command-line tool for parsing ebooks (such as EPUB and MOBI) and converting them into a structu...
5 versions - Latest release: 11 months ago - 55 downloads last month - 0 stars on GitHub - 1 maintainer
markdrop 3.5.0
A comprehensive PDF processing toolkit that converts PDFs to markdown with advanced AI-powered fe...
20 versions - Latest release: 7 months ago - 409 downloads last month - 116 stars on GitHub - 2 maintainers
stache-ai-ocr 0.1.2
OCR support for Stache AI document loaders
3 versions - Latest release: 27 days ago - 333 downloads last month - 1 maintainer
stache-ai-documents 0.1.0
Document format loaders for Stache AI (EPUB, DOCX, PPTX)
1 version - Latest release: about 1 month ago - 135 downloads last month - 1 maintainer
flockparser 1.0.9
Distributed document RAG system with intelligent GPU/CPU orchestration
8 versions - Latest release: 3 months ago - 96 downloads last month - 3 stars on GitHub - 1 maintainer
deepseek-ocr-cli 0.3.2
CLI tool for OCR using DeepSeek-OCR model via Ollama
9 versions - Latest release: 28 days ago - 374 downloads last month - 1 maintainer
sharepoint-to-text 0.8.1
Text extraction library for typical file formats found in SharePoint repositories
10 versions - Latest release: about 1 month ago - 1.66 thousand downloads last month - 0 stars on GitHub - 1 maintainer
chandra-parser 0.1.0
PDF to Markdown parser using Datalab's Marker OCR API with optional GPT-based figure filtering
1 version - Latest release: 2 months ago - 27 downloads last month - 1 maintainer
mseep-kreuzberg 3.13.5
Document intelligence framework for Python - Extract text, metadata, and structured data from div...
4 versions - Latest release: 5 months ago - 43 downloads last month - 2,454 stars on GitHub - 1 maintainer
ingest-cli 1.0.2
High-quality document processing for RAG pipelines, supporting multiple formats and processing ba...
1 version - Latest release: 2 months ago - 28 downloads last month - 1 maintainer
mac-letterhead 0.14.0
A macOS utility to merge letterhead with PDF documents using a drag-and-drop interface
100 versions - Latest release: 4 months ago - 704 downloads last month - 0 stars on GitHub - 1 maintainer
pdfsegmenter 0.1
This library builds a graph-representation of the content of PDFs. The graph is then clustered, r...
1 version - Latest release: over 5 years ago - 1 dependent repository - 18 downloads last month - 23 stars on GitHub - 1 maintainer
peslac 0.1.4
A Python package for the Peslac API
5 versions - Latest release: about 1 year ago - 19 downloads last month - 0 stars on GitHub - 1 maintainer
unstructured-ingest-clickzetta 1.3.5
ClickZetta connector for Unstructured data pipeline - Enhanced ETL with SQL and Volume support
8 versions - Latest release: 5 months ago - 52 downloads last month - 0 stars on GitHub - 1 maintainer
doclayer-cli 1.2.1
Doclayer Command-Line Interface - Document Intelligence Platform CLI
7 versions - Latest release: 2 months ago - 73 downloads last month - 1 maintainer
qagen 0.1.1
A powerful Chinese document QA pairs generation and validation tool with multiple LLM support
2 versions - Latest release: 6 months ago - 11 downloads last month - 0 stars on GitHub - 1 maintainer
pdf2llm 0.1.1
Extract PDF content optimized for Large Language Model (LLM) consumption
2 versions - Latest release: 6 months ago - 17 downloads last month - 1 maintainer
pdf2markdown 0.3.0
Python library and CLI tool that leverages LLMs to convert technical PDF documents to well-struct...
2 versions - Latest release: 5 months ago - 231 downloads last month - 0 stars on GitHub - 1 maintainer
docstrange 1.1.8
Extract and Convert PDF, Word, PowerPoint, Excel, images, URLs into multiple formats (Markdown, J...
19 versions - Latest release: 3 months ago - 1.49 thousand downloads last month - 935 stars on GitHub - 1 maintainer
isotope-rag 0.1.0
Reverse RAG database - index questions, not chunks
1 version - Latest release: about 1 month ago - 90 downloads last month
docslicer 0.1.1
SDK for the DocSlicer document processing API - transform HTML documents into structured chunks f...
1 version - Latest release: about 1 month ago - 109 downloads last month
kiss-ai-stack-types 0.1.0a4
KISS AI Stack's common object types
4 versions - Latest release: about 1 year ago - 22 downloads last month - 0 stars on GitHub - 1 maintainer
indoxminer 0.1.5
Indox Data Extraction
19 versions - Latest release: about 1 year ago - 63 downloads last month - 20 stars on GitHub - 2 maintainers
Related Keywords
pdf (66)
ocr (64)
ai (62)
llm (56)
rag (54)
nlp (32)
text-extraction (32)
machine-learning (27)
chunking (19)
markdown (18)
mcp (17)
semantic-search (16)
openai (16)
embeddings (15)
python (15)
docx (14)
document (12)
vector-database (11)
data-extraction (11)
text-processing (10)
api (10)
cli (10)
document-extraction (10)
table-extraction (10)
sdk (10)
document-analysis (9)
artificial-intelligence (9)
structured-data (9)
langchain (9)
retrieval-augmented-generation (9)
deepseek (8)
gemini (8)
document-intelligence (8)
pdf-to-markdown (8)
image-processing (8)
document-ai (7)
knowledge-base (7)
document-parsing (7)
file-conversion (7)
document-understanding (7)
docling (7)
xlsx (6)
image-to-text (6)
agent (6)
document-conversion (6)
generative-ai (6)
information-extraction (6)
unstructured-data (6)
extraction (6)
intelligent-document-processing (5)
ppt (5)
html (5)
structured-data-extraction (5)
python3 (5)
word (5)
vector-search (5)
pdf-processing (5)
mcp-server (5)
ai-agent (5)
pptx (5)
tesseract (5)
html-to-markdown (5)
powerpoint (4)
ml (4)
fastmcp (4)
multilingual (4)
pdf-parser (4)
agents (4)
llms (4)
document-classification (4)
semantic-analysis (4)
pdf-extraction (4)
model-context-protocol (4)
chromadb (4)
qdrant (4)
developer-tools (4)
retrieval (4)
pdf-parsing (4)
transformers (3)
powerpoint-to-markdown (3)
search (3)
layout-analysis (3)
conversion (3)
vision (3)
metadata-extraction (3)
excel-to-markdown (3)
pdf-to-text (3)
ollama (3)
llm-ready-data (3)
layout-detection (3)
pdf-tools (3)
local-document-processing (3)
document-to-markdown (3)
tesseract-alternative (3)
paddleocr-alternative (3)
mineru-alternative (3)
markitdown-alternative (3)
framework (3)
marker-alternative (3)
docling-alternative (3)