An open API service providing package, version, and dependency metadata for many open source software ecosystems and registries.

npmjs.org "document-processing" keyword

View the packages on the npmjs.org package registry that are tagged with the "document-processing" keyword.

@promptbook/pdf 0.102.0
Promptbook: Turn your company's scattered knowledge into AI ready books
352 versions - Latest release: 20 days ago - 5.25 thousand downloads last month - 134 stars on GitHub - 1 maintainer
@dooor-ai/cortexdb 0.3.10
Official TypeScript/JavaScript SDK for CortexDB - Multi-modal RAG Platform with advanced document...
15 versions - Latest release: about 1 hour ago - 446 downloads last month - 1 maintainer
n8n-nodes-vector-store-processor 1.6.5
n8n node for intelligent document chunking and processing for vector store ingestion with Smart Q...
20 versions - Latest release: about 2 hours ago - 597 downloads last month - 1 maintainer
docstrange 1.0.7
Official Node.js client for Docstrange API - Extract data from PDFs, images, and documents in mul...
8 versions - Latest release: about 1 month ago - 117 downloads last month - 1 maintainer
markdoc-traverse 1.1.1
A simple and tiny traversal library for MarkDoc AST
4 versions - Latest release: about 1 year ago - 39 downloads last month - 0 stars on GitHub - 1 maintainer
@promptbook/legacy-documents 0.102.0
Promptbook: Turn your company's scattered knowledge into AI ready books
353 versions - Latest release: 20 days ago - 5.26 thousand downloads last month - 134 stars on GitHub - 1 maintainer
@promptbook/documents 0.102.0
Promptbook: Turn your company's scattered knowledge into AI ready books
356 versions - Latest release: 20 days ago - 6.02 thousand downloads last month - 134 stars on GitHub - 1 maintainer
n8n-nodes-solar 0.3.25
Solar LLM and Embeddings nodes for n8n
49 versions - Latest release: about 12 hours ago - 302 downloads last month - 0 stars on GitHub - 1 maintainer
@ninjadoc-ai/sdk 1.0.9
TypeScript SDK for document processing with zero-friction framework adapters. Features intelligen...
10 versions - Latest release: about 1 month ago - 67 downloads last month - 1 maintainer
@trsdn/mistraldocai-mcp-server 1.0.4
MCP server for document-to-Markdown conversion using Mistral AI OCR
5 versions - Latest release: about 2 months ago - 59 downloads last month - 2 stars on GitHub - 1 maintainer
@ozritesh/queue-agnostic 1.0.2
Universal queue abstraction library supporting RabbitMQ, AWS SQS, Azure Service Bus, and GCP Pub/...
3 versions - Latest release: about 1 month ago - 53 downloads last month - 1 maintainer
n8n-nodes-docx-converter-enhanced 1.0.0
Enhanced n8n community node for DOCX to text conversion with RAG capabilities, page-aware chunkin...
1 version - Latest release: 2 months ago - 20 downloads last month - 1 maintainer
rag-system-pgvector 2.3.1
A complete Retrieval-Augmented Generation system using pgvector, LangChain, and LangGraph for Nod...
10 versions - Latest release: 1 day ago - 472 downloads last month - 1 maintainer
Top 1.9% on npmjs.org
stopword 3.1.5
A module for node.js and the browser that takes in text and returns text that is stripped of stop...
76 versions - Latest release: 5 months ago - 96 dependent packages - 582 dependent repositories - 366 thousand downloads last month - 222 stars on GitHub - 2 maintainers
@bonginkan/maria 4.4.8
🚀 MARIA v4.4.8 - Enterprise AI Development Platform with identity system and character voice impl...
227 versions - Latest release: 1 day ago - 3.11 thousand downloads last month - 2 stars on GitHub - 1 maintainer
@lumina-ai-inc/chunkr-ai 0.0.1
Node.js client for Chunkr API
1 version - Latest release: 10 months ago - 230 downloads last month - 2,864 stars on GitHub - 1 maintainer
linchat 1.0.0
An intelligent AI-powered command-line chat assistant with document processing, code review, and ...
1 version - Latest release: 2 days ago - 1 maintainer
@mastra/rag 1.3.3
The Retrieval-Augmented Generation (RAG) module contains document processing and embedding utilit...
459 versions - Latest release: 9 days ago - 112 thousand downloads last month - 17,766 stars on GitHub - 11 maintainers
@iflow-mcp/pdftotext-mcp 1.0.0
A reliable Model Context Protocol server for PDF text extraction using pdftotext from poppler-utils
1 version - Latest release: 4 days ago - 0 stars on GitHub - 2 maintainers
pdftotext-mcp 1.0.0
A reliable Model Context Protocol server for PDF text extraction using pdftotext from poppler-utils
1 version - Latest release: 4 months ago - 10 downloads last month - 0 stars on GitHub - 1 maintainer
pageindex-mcp 1.6.3
MCP server for PageIndex
29 versions - Latest release: 4 days ago - 718 downloads last month - 1 maintainer
n8n-nodes-docx-genie-pro 0.1.2
n8n node package for DOCX document manipulation and processing
3 versions - Latest release: 5 months ago - 23 downloads last month - 0 stars on GitHub - 1 maintainer
n8n-nodes-mistral-ocr 1.0.0
n8n node for Mistral OCR API integration with structured annotations
18 versions - Latest release: 5 months ago - 358 downloads last month - 2 stars on GitHub - 1 maintainer
chatbot-test-william 1.0.16
A flexible and customizable React chat component that supports context-aware conversations and do...
17 versions - Latest release: 5 months ago - 82 downloads last month - 1 maintainer
n8n-nodes-upstage 0.1.0
Upstage LLM and Embeddings nodes for n8n
1 version - Latest release: 8 days ago
@praxlannister/mdexport-core 2.0.0
Core processing engine for MDExport
1 version - Latest release: 6 days ago - 1 maintainer
static-research-engine 1.0.0
Transform documents into structured, queryable span artifacts with intelligent search and ranking
1 version - Latest release: 7 days ago
create-autollama 0.0.1
Placeholder scaffolder for AutoLlama. Creates a new folder (default: autollama) and points to the...
1 version - Latest release: 2 months ago - 9 downloads last month - 24 stars on GitHub - 1 maintainer
passport-ocr-api 1.1.5 💰
Passport OCR API client for extracting passport data from images and PDF files using OCR technology.
1 version - Latest release: about 1 month ago - 83 downloads last month - 3,143 stars on GitHub - 1 maintainer
asciidoctor-html-to-markdown 1.0.0-beta.1
A modular document processing system for converting HTML to Markdown
1 version - Latest release: 3 months ago - 4 downloads last month - 0 stars on GitHub - 1 maintainer
@wdelhagen/textprep 0.1.0
Document text extraction with pluggable extractors. Supports PDF, DOCX, DOC, RTF, TXT, and image ...
1 version - Latest release: 10 days ago - 1 maintainer
langchain-chatbot-react-app 1.0.0
A flexible and customizable React chat component that supports context-aware conversations and do...
1 version - Latest release: 9 months ago - 5 downloads last month - 1 star on GitHub - 1 maintainer
n8n-nodes-extract-pdf 1.0.26
n8n node to extract text, images and tables from PDF with multilingual support, language detectio...
24 versions - Latest release: 7 months ago - 194 downloads last month - 1 maintainer
parze 0.1.0
TypeScript SDK for the Parze API
1 version - Latest release: 10 days ago - 1 maintainer
koncile-js 0.1.4
JavaScript SDK for the Koncile Intelligent Document Processing API
5 versions - Latest release: 4 months ago - 18 downloads last month - 1 maintainer
n8n-nodes-inner-batched-chain-summarization 0.1.2
n8n community node with intelligent batched chain summarization for processing large documents ef...
3 versions - Latest release: about 1 month ago - 58 downloads last month - 1 maintainer
@jmndao/mongoose-ai 1.4.0
AI-powered Mongoose plugin for intelligent document processing with auto-summarization, semantic ...
10 versions - Latest release: 4 months ago - 172 downloads last month - 3 stars on GitHub - 1 maintainer
n8n-nodes-extract-monster 1.0.1
AI-powered data extraction from PDFs, images, documents, audio, and video. Extract invoices, rece...
1 version - Latest release: 11 days ago - 1 maintainer
@lucianaib/word-cloud-mcp 1.3.1
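An MCP tool focused on generating word clouds from document content, with intelligent text extraction from PDF, Word, TXT, MD, and other formats and an attractive spiral layout algorithm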
5 versions - Latest release: 3 months ago - 18 downloads last month - 0 stars on GitHub - 1 maintainer
autollama 3.0.10
Modern JavaScript-first RAG framework with contextual embeddings, professional CLI, and one-comma...
15 versions - Latest release: 2 months ago - 216 downloads last month - 24 stars on GitHub - 1 maintainer
sensible-api 0.0.12
JavaScript SDK for Sensible, the developer-first platform for extracting structured data from doc...
12 versions - Latest release: 2 months ago - 3.63 thousand downloads last month - 0 stars on GitHub - 1 maintainer
ppu-paddle-ocr 3.1.1
Blazing-fast and lightweight PaddleOCR library for Node.js and Bun. Perform accurate text detecti...
21 versions - Latest release: 4 months ago - 175 downloads last month - 19 stars on GitHub - 1 maintainer
ignidor-idp-mcp 1.0.3
MCP server for Ignidor IDP B2B API integration - enables Claude to process documents through Igni...
4 versions - Latest release: about 1 month ago - 61 downloads last month - 1 maintainer
stopword-trainer 1.1.1
A module for creating stopword lists for any language, based on a set of documents.
29 versions - Latest release: over 3 years ago - 1 dependent package - 1 dependent repository - 243 downloads last month - 15 stars on GitHub - 1 maintainer
n8n-nodes-docutray 0.5.2
n8n community nodes for Docutray OCR, document identification, and knowledge base search services
9 versions - Latest release: 13 days ago - 613 downloads last month - 0 stars on GitHub - 1 maintainer
n8n-nodes-google-gemini-embeddings-extended 0.2.3
n8n community sub-node for Google Gemini Embeddings with extended features like output dimensions...
7 versions - Latest release: 13 days ago - 105 downloads last month - 8 stars on GitHub - 2 maintainers
n8n-nodes-google-vertex-embeddings-extended 0.8.1
n8n community sub-node for Google Vertex AI Embeddings with output dimensions and configurable ba...
15 versions - Latest release: 13 days ago - 98 downloads last month - 8 stars on GitHub - 2 maintainers
n8n-nodes-semantic-splitter-with-context 0.5.0
Semantic Splitter with Context for n8n with LangChain integration
8 versions - Latest release: 13 days ago - 464 downloads last month - 8 stars on GitHub - 2 maintainers
peslac 1.1.3
A Node.js package to interact with the Peslac API for document processing.
14 versions - Latest release: 9 months ago - 85 downloads last month - 0 stars on GitHub - 1 maintainer
mastra-browser-rag 0.0.9
The Retrieval-Augmented Generation (RAG) module contains document processing and embedding utilit...
9 versions - Latest release: 7 months ago - 29 downloads last month - 1 maintainer
context1000 0.1.8
context1000 is a documentation format for software systems, designed for integration with art...
14 versions - Latest release: 3 months ago - 139 downloads last month - 1 maintainer
hashub-docapp-js 1.0.0
JavaScript/TypeScript SDK for Hashub Document Processing API
1 version - Latest release: 3 months ago - 3 downloads last month - 0 stars on GitHub - 1 maintainer
@nutrient-sdk/document-engine-mcp-server 0.0.1
MCP server for Nutrient Document Engine
1 version - Latest release: 4 months ago - 39 downloads last month - 56 stars on GitHub - 5 maintainers
treechunk 1.1.0
Hierarchical markdown chunking for RAG systems with AI-powered context summarization
5 versions - Latest release: 3 months ago - 30 downloads last month - 0 stars on GitHub - 1 maintainer
@docrouter/mcp 0.2.0
TypeScript MCP server for DocRouter API
4 versions - Latest release: 15 days ago - 232 downloads last month - 1 maintainer
@docrouter/sdk 0.2.0
TypeScript SDK for DocRouter API
2 versions - Latest release: 15 days ago - 110 downloads last month - 1 maintainer
n8n-nodes-unstract 0.4.2
n8n nodes for Unstract services including LLMWhisperer and Unstract API
14 versions - Latest release: 2 months ago - 224 downloads last month - 0 stars on GitHub - 2 maintainers
@entro314labs/mdd 0.0.10
Semantic document layer for AI-to-Office pipeline. Transform markdown into professional PDF/DOCX ...
4 versions - Latest release: 16 days ago - 244 downloads last month - 1 maintainer
lcp-nodes 1.0.43
Node RED Custom Nodes for LCP
1 version - Latest release: 5 months ago - 14 downloads last month - 1 maintainer
n8n-nodes-n8ntools-graphrag-agent 1.5.1 unpublished
Complete GraphRAG integration with both AI Agent and Tool Node - N8N Tools proprietary GraphRAG i...
13 versions - Latest release: about 2 months ago - 1.11 thousand downloads last month - 1 maintainer
n8n-nodes-google-vertex-embeddings-plus 0.3.3 unpublished
n8n community sub-node for Google Vertex AI Embeddings with output dimensions support - use with ...
1 version - Latest release: 5 months ago - 8 stars on GitHub - 1 maintainer
n8n-nodes-sse-trigger-extended 0.1.0
Extended SSE trigger node for n8n with custom headers support
1 version - Latest release: 5 months ago - 23 downloads last month - 8 stars on GitHub - 1 maintainer
@resettech/n8n-nodes-semantic-text-splitter 0.2.2 removed
n8n community sub-node for Semantic Double-Pass Merging text splitting with embeddings-based chun...
1 version - Latest release: 5 months ago - 8 stars on GitHub - 1 maintainer
n8n-nodes-contextual-document-loader 0.4.0
⚠️ DEPRECATED: Use n8n-nodes-semantic-splitter-with-context instead. This package has known issue...
12 versions - Latest release: 5 months ago - 27 downloads last month - 8 stars on GitHub - 1 maintainer
n8n-nodes-query-retriever-rerank 0.4.1
Advanced n8n community node for intelligent document retrieval with multi-step reasoning, reranki...
2 versions - Latest release: 5 months ago - 843 downloads last month - 8 stars on GitHub - 2 maintainers
n8n-nodes-ninjadoc 1.0.0
Actions to access Ninjadoc AI API
1 version - Latest release: 17 days ago - 1 maintainer
@ninjadoc-ai/n8n 1.0.0
Actions to access Ninjadoc AI API
1 version - Latest release: 17 days ago - 1 maintainer
llm-gen 1.0.3
A CLI tool to extract text from a static Next.js export and generate llm.txt for LLM ingestion.
2 versions - Latest release: 3 months ago - 30 downloads last month - 3 stars on GitHub - 1 maintainer
pdf-utils-rust 0.1.1
PDF and image processing utilities compiled to WebAssembly - Fast, secure, client-side file proce...
1 version - Latest release: 18 days ago - 0 stars on GitHub - 1 maintainer
n8n-nodes-puter-ai 2.0.4
Advanced n8n node for Puter.js AI with RAG agentic capabilities, document processing, audio trans...
8 versions - Latest release: 3 months ago - 59 downloads last month - 1 maintainer
typeless-mcp-server 1.0.0
MCP server for Typeless.ai - Build APIs from documents in your IDE
1 version - Latest release: 18 days ago - 1 maintainer
@paulmeller/docflow 0.0.28
A developer-friendly transformation engine for programmatic document manipulation
26 versions - Latest release: 18 days ago - 2.72 thousand downloads last month - 1 maintainer
nanonets 2.0.1
Node.js SDK for the Nanonets API: OCR, document extraction, and workflow automation.
14 versions - Latest release: 5 months ago - 2 dependent packages - 36 downloads last month - 0 stars on GitHub - 3 maintainers
n8n-nodes-agentic-rag-supabase 1.0.0
Advanced n8n node for Agentic RAG with Supabase pgvector - handles structured/unstructured docume...
1 version - Latest release: 3 months ago - 10 downloads last month - 1 maintainer
@abhi-arya1/mastra-minirag 1.0.0
Minimal recursive text chunking functionality extracted from @mastra/rag for edge deployments
1 version - Latest release: 21 days ago
docrouter-mcp 0.1.0
TypeScript MCP server for DocRouter API
1 version - Latest release: 21 days ago
@project-lakechain/polly-synthesizer 0.10.0
Synthesizes text into speech using Amazon Polly.
7 versions - Latest release: about 1 year ago - 12 downloads last month - 186 stars on GitHub - 1 maintainer
@project-lakechain/layers 0.10.0
Lambda layer library used by Project Lakechain.
7 versions - Latest release: about 1 year ago - 16 downloads last month - 186 stars on GitHub - 1 maintainer
@project-lakechain/newspaper3k 0.10.0
Extracts text and metadata from HTML documents.
7 versions - Latest release: about 1 year ago - 8 downloads last month - 186 stars on GitHub - 1 maintainer
@project-lakechain/ollama-processor 0.10.0
Processes documents using models supported by Ollama.
7 versions - Latest release: about 1 year ago - 5 downloads last month - 186 stars on GitHub - 1 maintainer
@project-lakechain/bedrock-image-generators 0.10.0
Image generation using Amazon Bedrock models.
7 versions - Latest release: about 1 year ago - 10 downloads last month - 186 stars on GitHub - 1 maintainer
@project-lakechain/recursive-character-text-splitter 0.10.0
Transforms text into chunks of tokens using Langchain's recursive character text splitter.
7 versions - Latest release: about 1 year ago - 33 downloads last month - 186 stars on GitHub - 1 maintainer
@project-lakechain/subtitle-processor 0.10.0
Parses subtitle documents into text and structured data.
5 versions - Latest release: about 1 year ago - 5 downloads last month - 186 stars on GitHub - 1 maintainer
@project-lakechain/image-layer-processor 0.10.0
Applies layer operations on images.
7 versions - Latest release: about 1 year ago - 10 downloads last month - 186 stars on GitHub - 1 maintainer
@project-lakechain/translate-text-processor 0.10.0
Translates text documents asynchronously using Amazon Translate.
7 versions - Latest release: about 1 year ago - 15 downloads last month - 186 stars on GitHub - 1 maintainer
@project-lakechain/blip2-image-processor 0.10.0
A middleware extracting image captioning information from images using the Blip2 model.
7 versions - Latest release: about 1 year ago - 9 downloads last month - 186 stars on GitHub - 1 maintainer
@project-lakechain/canny-edge-detector 0.10.0
Creates a new image with the edges detected using the Canny edge detector algorithm.
3 versions - Latest release: about 1 year ago - 10 downloads last month - 186 stars on GitHub - 1 maintainer
@project-lakechain/semantic-ontology-extractor 0.10.0
Extracts semantic ontology from processed documents.
3 versions - Latest release: about 1 year ago - 6 downloads last month - 186 stars on GitHub - 1 maintainer
@project-lakechain/opensearch-index 0.10.0
Creates an OpenSearch index using AWS CDK.
7 versions - Latest release: about 1 year ago - 8 downloads last month - 186 stars on GitHub - 1 maintainer
@project-lakechain/pdf-text-converter 0.10.0
Converts PDF documents into different formats.
7 versions - Latest release: about 1 year ago - 9 downloads last month - 186 stars on GitHub - 1 maintainer
@project-lakechain/rembg-image-processor 0.10.0
Automatically remove background from images using Rembg.
5 versions - Latest release: about 1 year ago - 14 downloads last month - 186 stars on GitHub - 1 maintainer
@project-lakechain/jmespath-processor 0.10.0
Applies JMESPath expressions to JSON documents.
7 versions - Latest release: about 1 year ago - 5 downloads last month - 186 stars on GitHub - 1 maintainer
@project-lakechain/opensearch-saved-object 0.10.0
Uploads a saved object to OpenSearch using AWS CDK.
7 versions - Latest release: about 1 year ago - 9 downloads last month - 186 stars on GitHub - 1 maintainer
@project-lakechain/sentence-transformers 0.10.0
Creates embeddings from text-oriented documents using Sentence Transformers models.
7 versions - Latest release: about 1 year ago - 11 downloads last month - 186 stars on GitHub - 1 maintainer
@project-lakechain/delay 0.10.0
A middleware inserting a time delay within a pipeline.
7 versions - Latest release: about 1 year ago - 6 downloads last month - 186 stars on GitHub - 1 maintainer
@project-lakechain/opensearch-collection 0.10.0
Creates an OpenSearch Serverless collection in a VPC.
7 versions - Latest release: about 1 year ago - 23 downloads last month - 186 stars on GitHub - 1 maintainer
@project-lakechain/tar-inflate-processor 0.5.0 removed
Inflates tarballs from a source to a destination bucket.
3 versions - Latest release: over 1 year ago - 62 downloads last month - 186 stars on GitHub - 1 maintainer
@project-lakechain/transformers-text-summarizer 0.4.0 removed
Provides abstractive text summarization using the HuggingFace transformers library.
2 versions - Latest release: over 1 year ago - 11 downloads last month - 186 stars on GitHub - 1 maintainer
@project-lakechain/zip-inflate-processor 0.5.0 removed
Inflates Zip archives from a source to a destination bucket.
3 versions - Latest release: over 1 year ago - 65 downloads last month - 186 stars on GitHub - 1 maintainer
@project-lakechain/character-text-splitter 0.10.0
Transforms text into chunks of tokens using Langchain's character text splitter.
7 versions - Latest release: about 1 year ago - 6 downloads last month - 186 stars on GitHub - 1 maintainer