An open API service providing package, version and dependency metadata of many open source software ecosystems and registries.

pypi.org "document processing" keyword

rainbow-pdf-processor 0.1.0
A powerful PDF processing tool with text extraction, table recognition, and image extraction capa...
1 version - Latest release: about 1 year ago - 10 downloads last month - 0 stars on GitHub - 1 maintainer
detect-row 2.0.5
Complete table, row, and column extraction system with AI and GPU support
13 versions - Latest release: 10 months ago - 30 downloads last month - 1 maintainer
doctr-labeler 0.3.1
A Python package for labeling and annotating documents
12 versions - Latest release: about 2 months ago - 168 downloads last month - 15 stars on GitHub - 1 maintainer
docu-devs-api-client 1.9.0
A client library for accessing DocuDevs API
38 versions - Latest release: 19 days ago - 665 downloads last month - 1 maintainer
llmgraphtransformer 0.1.0
A powerful tool for transforming documents into graph-based structures using Large Language Model...
3 versions - Latest release: about 1 year ago - 670 downloads last month - 6 stars on GitHub - 1 maintainer
bbox-align 0.2.8
A Python library that reorders bounding boxes generated by OCR engines into the correct reading o...
11 versions - Latest release: 10 months ago - 578 downloads last month - 11 stars on GitHub - 1 maintainer
pydocai 0.1.0
Extract text from PDFs using pypdfium2 with OCR fallback via pytesseract
1 version - Latest release: 3 months ago - 24 downloads last month - 1 maintainer
docutray 0.2.0
Python library for the DocuTray API
2 versions - Latest release: 19 days ago - 1 maintainer
docling-agent 0.1.0
A Python library to simplify agentic operations on documents, such as writing, editing, summarizi...
1 version - Latest release: 16 days ago
intelisys 0.5.6
Intelligence/AI services for the Lifsys Enterprise with enhanced max_history_words, efficient his...
37 versions - Latest release: over 1 year ago - 143 downloads last month - 0 stars on GitHub - 1 maintainer
docling-ocr-onnxtr 0.2.1 💰
Onnx Text Recognition (OnnxTR) OCR plugin for docling
6 versions - Latest release: 3 months ago - 27 thousand downloads last month - 11 stars on GitHub - 1 maintainer
chunklet-py 2.2.0
High-fidelity context-aware chunking and interactive visualization for RAG. Advanced segmentation...
8 versions - Latest release: 2 months ago - 283 downloads last month - 62 stars on GitHub - 1 maintainer
onnxtr 0.8.1 💰
Onnx Text Recognition (OnnxTR): docTR Onnx-Wrapper for high-performance OCR on documents.
18 versions - Latest release: 3 months ago - 83.2 thousand downloads last month - 159 stars on GitHub - 1 maintainer
arkeo 0.2.6
markdown archiver betasaurus
8 versions - Latest release: 5 months ago - 53 downloads last month - 1 maintainer
docnav 1.0.1
AI-powered document querying with citations
2 versions - Latest release: 3 months ago - 53 downloads last month - 1 maintainer
pdfsmith 0.2.0
PDF to Markdown conversion with multiple backend support
1 version - Latest release: 5 months ago - 23 downloads last month - 1 star on GitHub - 1 maintainer
file-parse-by-bajirao 0.1.0
Universal Document Processor for LLM Processing - extracts text, tables, numeric data, and metada...
1 version - Latest release: 5 months ago - 15 downloads last month - 1 maintainer
textextraction 0.1.4
Extract and process text from images and PDFs
5 versions - Latest release: about 1 year ago - 43 downloads last month - 0 stars on GitHub - 1 maintainer
llama-index-readers-layoutir 0.1.1
llama-index readers LayoutIR integration
1 version - Latest release: 2 months ago - 135 downloads last month - 1 maintainer
lex-pdftotext 1.0.0
Extract and structure text from Brazilian legal PDF documents (PJe format)
1 version - Latest release: 4 months ago - 19 downloads last month - 1 maintainer
chunklet 1.4.0
A smart multilingual text chunker for LLMs, RAG, and beyond.
19 versions - Latest release: 8 months ago - 162 downloads last month - 23 stars on GitHub - 1 maintainer
doc2data 0.2.0
Integrated document processing with machine learning.
3 versions - Latest release: over 3 years ago - 1 dependent repository - 188 downloads last month - 10 stars on GitHub - 1 maintainer
py-document-chunker 0.3.0 removed
A state-of-the-art Python package for advanced text segmentation (chunking).
2 versions - Latest release: 7 months ago - 265 downloads last month - 0 stars on GitHub - 1 maintainer
Related Keywords
ocr (9), pdf (6), text extraction (6), rag (5), OCR (5), natural language processing (4), ai (4), llm (4), markdown (3), computer vision (3), chunking (3), machine learning (3), nlp (3), deep learning (3), docTR (3), document analysis (2), text recognition (2), text detection (2), onnx (2), openai (2), docling (2), RAG (2), document AI (2), deep-learning (2), onnxruntime (2), text-detection (2), text-recognition (2), text-splitting (2), multilingual (2), text processing (2), data processing (2), information retrieval (2), semantic search (2), text-detection-recognition (2), onnxtr (2), table extraction (2), image processing (2), claude (1), layout analysis (1), layoutir (1), chunks-processing (1), legal documents (1), chunks-algorithm (1), brazilian law (1), pje (1), lex-intelligentia (1), pdf parsing (1), text chunking (1), text splitting (1), chunk-visualization (1), Retrieval-Augmented Generation (1), NLP (1), langchain (1), llamaindex (1), document (1), code structure (1), programming languages (1), source code analysis (1), gemini (1), citations (1), text analysis (1), document management (1), retrieval augmented generation (1), query (1), search (1), pdf-markdown-pdf-parser-ocr (1), corpus (1), docx (1), xlsx (1), indexing (1), archiving (1), optical-character-recognition (1), document-recognition (1), csv (1), visualization (1), table detection (1), natural-language-processing (1), document-chunking (1), code-structure (1), code-chunking (1), IR (1), PDF (1), code-chunker (1), lines (1), bounding boxes (1), document to graph (1), text to graph (1), graph transformation (1), LLM (1), document to data (1), docudevs client (1), ai document processing (1), labeling-tool (1), doctr (1), automation (1), OnnxTR (1), annotation (1), labeling (1), automated extraction (1), GPU (1)