An open API service providing package, version and dependency metadata of many open source software ecosystems and registries.

npmjs.org "document-processing" keyword

blazedocs 2.0.3
Agent-first CLI for PDF → Markdown. JSON everywhere, --raw, structured errors. Humans get the pol...
7 versions - Latest release: about 14 hours ago - 0 stars on GitHub - 1 maintainer
autollama 3.0.10
Modern JavaScript-first RAG framework with contextual embeddings, professional CLI, and one-comma...
15 versions - Latest release: 8 months ago - 216 downloads last month - 24 stars on GitHub - 1 maintainer
@project-lakechain/scheduler-event-trigger 0.10.0
Triggers pipelines upon scheduling events.
7 versions - Latest release: over 1 year ago - 1 download last month - 186 stars on GitHub - 1 maintainer
@mastra/rag 2.2.1
The Retrieval-Augmented Generation (RAG) module contains document processing and embedding utilit...
769 versions - Latest release: 3 days ago - 213 thousand downloads last month - 23,026 stars on GitHub - 9 maintainers
n8n-nodes-sse-trigger-extended 0.1.0
Extended SSE trigger node for n8n with custom headers support
1 version - Latest release: 10 months ago - 23 downloads last month - 8 stars on GitHub - 1 maintainer
@project-lakechain/delay 0.10.0
A middleware inserting a time delay within a pipeline.
7 versions - Latest release: over 1 year ago - 6 downloads last month - 186 stars on GitHub - 1 maintainer
n8n-nodes-mistral-ocr 1.0.0
n8n node for Mistral OCR API integration with structured annotations
18 versions - Latest release: 10 months ago - 166 downloads last month - 2 stars on GitHub - 1 maintainer
@project-lakechain/nlp-text-processor 0.10.0
Extracts features from text documents using natural language processing.
7 versions - Latest release: over 1 year ago - 13 downloads last month - 186 stars on GitHub - 1 maintainer
document-photo-to-text-ai 1.0.0
A universal document processor that extracts text from various file formats including PDFs, image...
1 version - Latest release: 6 months ago - 4 downloads last month - 1 maintainer
@project-lakechain/laplacian-image-processor 0.10.0
Computes the Laplacian variance of images.
3 versions - Latest release: over 1 year ago - 4 downloads last month - 186 stars on GitHub - 1 maintainer
@promptbook/documents 0.110.0
Promptbook: Create persistent AI agents that turn your company's scattered knowledge into action
495 versions - Latest release: 2 months ago - 6.39 thousand downloads last month - 152 stars on GitHub - 1 maintainer
@project-lakechain/semantic-ontology-extractor 0.10.0
Extracts semantic ontology from processed documents.
3 versions - Latest release: over 1 year ago - 7 downloads last month - 186 stars on GitHub - 1 maintainer
pdf-image-extractor 2.0.8
pdf
4 versions - Latest release: about 2 years ago - 304 downloads last month - 5 stars on GitHub - 1 maintainer
n8n-nodes-pdf-excel 0.0.7
N8N nodes for processing PDF and Excel files
6 versions - Latest release: about 1 year ago - 397 downloads last month - 1 maintainer
@project-lakechain/syndication-feed-processor 0.10.0
Parses RSS and Atom syndication feeds.
7 versions - Latest release: over 1 year ago - 8 downloads last month - 186 stars on GitHub - 1 maintainer
ainative-zerodb-memory-mcp 1.0.2
AINative ZeroDB Memory MCP Server - 6 optimized tools for agent memory with smart context managem...
3 versions - Latest release: about 1 month ago - 48 downloads last month - 0 stars on GitHub - 1 maintainer
hashub-docapp-js 1.0.0
JavaScript/TypeScript SDK for Hashub Document Processing API
1 version - Latest release: 8 months ago - 3 downloads last month - 0 stars on GitHub - 1 maintainer
Top 1.9% on npmjs.org
stopword 3.1.5
A module for node.js and the browser that takes in text and returns text that is stripped of stop...
76 versions - Latest release: 11 months ago - 96 dependent packages - 582 dependent repositories - 800 thousand downloads last month - 222 stars on GitHub - 2 maintainers
@project-lakechain/transform 0.10.0
A middleware allowing to transform documents on-the-fly.
5 versions - Latest release: over 1 year ago - 7 downloads last month - 186 stars on GitHub - 1 maintainer
@promptbook/pdf 0.110.0
Promptbook: Create persistent AI agents that turn your company's scattered knowledge into action
488 versions - Latest release: 2 months ago - 5.3 thousand downloads last month - 152 stars on GitHub - 1 maintainer
unsiloed-sdk 0.2.0
JavaScript/TypeScript SDK for Unsiloed Vision API - Parse, Extract, Classify, and Split documents
3 versions - Latest release: about 2 months ago - 102 downloads last month - 1 maintainer
@iflow-mcp/doc-ops-mcp 0.3.8
MCP Document Converter Server — A Model Context Protocol server for seamless document format conv...
1 version - Latest release: 5 months ago - 2 maintainers
@project-lakechain/cli 0.10.0
📟 The official CLI for Project Lakechain.
3 versions - Latest release: over 1 year ago - 6 downloads last month - 186 stars on GitHub - 1 maintainer
n8n-nodes-pdf-accessibility 3.0.0
AI-powered PDF accessibility automation for N8N - comprehensive WCAG compliance analysis, intelli...
24 versions - Latest release: 11 months ago - 135 downloads last month - 0 stars on GitHub - 1 maintainer
asciidoctor-html-to-markdown 1.0.0-beta.1
A modular document processing system for converting HTML to Markdown
1 version - Latest release: 9 months ago - 7 downloads last month - 0 stars on GitHub - 1 maintainer
@ninjadoc-ai/sdk 1.0.9
TypeScript SDK for document processing with zero-friction framework adapters. Features intelligen...
10 versions - Latest release: 7 months ago - 43 downloads last month - 1 maintainer
n8n-nodes-power-document-extractor 0.13.7
Power Document Extractor – universal local document parser for n8n
11 versions - Latest release: 5 months ago - 117 downloads last month - 1 maintainer
n8n-nodes-puter-ai 2.0.4
Advanced n8n node for Puter.js AI with RAG agentic capabilities, document processing, audio trans...
8 versions - Latest release: 9 months ago - 59 downloads last month - 1 maintainer
pdf-oxide-wasm 0.3.37 💰
Fast, zero-dependency PDF toolkit for Node.js, browsers, and edge runtimes — text extraction, mar...
25 versions - Latest release: 4 days ago - 2.5 thousand downloads last month - 580 stars on GitHub - 1 maintainer
@project-lakechain/whisper-transcriber 0.10.0
An audio document transcription middleware based on OpenAI Whisper.
7 versions - Latest release: over 1 year ago - 8 downloads last month - 186 stars on GitHub - 1 maintainer
@project-lakechain/bark-synthesizer 0.10.0
Synthesize text to audio using the Bark model.
7 versions - Latest release: over 1 year ago - 17 downloads last month - 186 stars on GitHub - 1 maintainer
docuglean-ocr 1.0.0
An SDK for intelligent document processing using State of the Art AI models.
1 version - Latest release: 8 months ago - 9 downloads last month - 5 stars on GitHub - 1 maintainer
@project-lakechain/ollama-processor 0.10.0
Processes documents using models supported by Ollama.
7 versions - Latest release: over 1 year ago - 5 downloads last month - 186 stars on GitHub - 1 maintainer
@ozritesh/queue-agnostic 1.0.2
Universal queue abstraction library supporting RabbitMQ, AWS SQS, Azure Service Bus, and GCP Pub/...
3 versions - Latest release: 7 months ago - 53 downloads last month - 1 maintainer
@project-lakechain/video-metadata-extractor 0.10.0
Extracts the metadata of video files.
7 versions - Latest release: over 1 year ago - 7 downloads last month - 186 stars on GitHub - 1 maintainer
@ninjadoc-ai/n8n 1.0.2
Actions to access Ninjadoc AI API
3 versions - Latest release: 6 months ago - 1 maintainer
@project-lakechain/clip-image-processor 0.10.0
A document processor generating embeddings for images using the OpenAI Clip model.
7 versions - Latest release: over 1 year ago - 6 downloads last month - 186 stars on GitHub - 1 maintainer
n8n-nodes-upstage 0.4.1
Upstage LLM and Embeddings nodes for n8n
11 versions - Latest release: about 1 month ago - 584 downloads last month - 1 star on GitHub - 3 maintainers
yq-pdf 0.0.2
High-performance PDF manipulation library with native processing capabilities. Supports encryptio...
2 versions - Latest release: 9 months ago - 100 downloads last month - 1 maintainer
rag-system-pgvector 2.4.9
A complete Retrieval-Augmented Generation system using pgvector, LangChain, and LangGraph for Nod...
21 versions - Latest release: 4 months ago - 219 downloads last month - 1 maintainer
@project-lakechain/tar-processor 0.10.0
Inflates and deflates Tar documents from a source to a destination bucket.
5 versions - Latest release: over 1 year ago - 17 downloads last month - 186 stars on GitHub - 1 maintainer
flux-vector 1.0.1
Lightweight browser-based semantic search library with HNSW vector index and transformer embeddings
2 versions - Latest release: 6 months ago - 11 downloads last month - 1 maintainer
@timangames/vector-grounding-service 1.1.5
A REST wrapper for SAP AI Core Vector API with document grounding capabilities
8 versions - Latest release: 7 months ago - 201 downloads last month - 1 maintainer
@project-lakechain/canny-edge-detector 0.10.0
Creates a new image with the edges detected using the Canny edge detector algorithm.
3 versions - Latest release: over 1 year ago - 10 downloads last month - 186 stars on GitHub - 1 maintainer
@tfw.in/structura-sdk 0.1.0
TypeScript SDK for Saral Structura, providing Zod schemas and validation for document processing ...
1 version - Latest release: 12 months ago - 165 downloads last month - 1 maintainer
@project-lakechain/structured-entity-extractor 0.10.0
Extracts structured entities from processed documents.
3 versions - Latest release: over 1 year ago - 7 downloads last month - 186 stars on GitHub - 1 maintainer
n8n-nodes-docx-genie 0.1.0
n8n node package for DOCX document manipulation and processing
1 version - Latest release: 11 months ago - 8 downloads last month - 0 stars on GitHub - 1 maintainer
llm-gen 1.0.3
A CLI tool to extract text from a static Next.js export and generate llm.txt for LLM ingestion.
2 versions - Latest release: 9 months ago - 19 downloads last month - 3 stars on GitHub - 1 maintainer
n8n-nodes-agentic-rag-supabase 1.0.0
Advanced n8n node for Agentic RAG with Supabase pgvector - handles structured/unstructured docume...
1 version - Latest release: 9 months ago - 10 downloads last month - 1 maintainer
@project-lakechain/image-metadata-extractor 0.10.0
Extracts the metadata of images.
7 versions - Latest release: over 1 year ago - 5 downloads last month - 186 stars on GitHub - 1 maintainer
n8n-nodes-inner-batched-chain-summarization 0.1.2
n8n community node with intelligent batched chain summarization for processing large documents ef...
3 versions - Latest release: 7 months ago - 113 downloads last month - 1 maintainer
n8n-nodes-doclayer 0.1.0
n8n community node for Doclayer - AI-powered document processing, extraction, and search
1 version - Latest release: 4 months ago - 1 maintainer
@transloadit/mcp-server 0.3.17
Transloadit MCP server
25 versions - Latest release: 7 days ago - 1.08 thousand downloads last month - 71 stars on GitHub - 3 maintainers
hybrid-form-ai 0.1.4
Hybrid paper + digital form collection powered by multimodal LLMs
5 versions - Latest release: 8 days ago - 1 maintainer
@23blocks/block-rag 2.1.0
RAG block for 23blocks SDK - vector search, document processing, image search, product identifica...
4 versions - Latest release: about 2 months ago - 94 downloads last month - 4 stars on GitHub - 1 maintainer
stopword-trainer 1.1.1
A module for creating stopword lists for any language, based on a set of documents.
29 versions - Latest release: over 3 years ago - 1 dependent package - 1 dependent repository - 243 downloads last month - 15 stars on GitHub - 1 maintainer
n8n-nodes-extract-pdf 1.0.26
n8n node to extract text, images and tables from PDF with multilingual support, language detectio...
24 versions - Latest release: about 1 year ago - 116 downloads last month - 1 maintainer
@mixpeek/n8n-nodes-mixpeek 1.0.11
n8n community node for Mixpeek - multimodal data processing and semantic search API
12 versions - Latest release: about 1 month ago - 1 star on GitHub - 2 maintainers
n8n-nodes-semantic-splitter-with-context 0.6.2
Semantic Splitter with Context for n8n with LangChain integration
13 versions - Latest release: 6 months ago - 310 downloads last month - 9 stars on GitHub - 2 maintainers
static-research-engine 1.0.2
Transform documents into structured, queryable span artifacts with intelligent search and ranking
2 versions - Latest release: 6 months ago - 7 downloads last month - 0 stars on GitHub - 1 maintainer
chatbot-test-william 1.0.16
A flexible and customizable React chat component that supports context-aware conversations and do...
17 versions - Latest release: 11 months ago - 7 downloads last month - 1 maintainer
@piwi.ai/business-schema-configurations 1.0.4
JSON schema configurations for intelligent document processing — document types, entity types, en...
5 versions - Latest release: 2 months ago - 1 maintainer
context1000 0.1.8
context1000 is a documentation format for software systems, designed for integration with art...
14 versions - Latest release: 8 months ago - 139 downloads last month - 1 maintainer
@docrouter/sdk 1.0.0
TypeScript SDK for DocRouter API
5 versions - Latest release: 3 months ago - 110 downloads last month - 1 maintainer
@project-lakechain/elevenlabs-synthesizer 0.10.0
Synthesizes text into speech using the Elevenlabs API.
3 versions - Latest release: over 1 year ago - 3 downloads last month - 186 stars on GitHub - 1 maintainer
@base64ai/n8n-nodes-base64ai 2.0.1
Official Base64.ai community node for n8n
4 versions - Latest release: 11 days ago - 130 downloads last month - 2 maintainers
@lucianaib/word-cloud-mcp 3.0.0
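An MCP tool focused on generating word clouds from document content, supporting intelligent text extraction from PDF, Word, TXT, MD, and other formats, with an optimized spiral layout algorithm and multiple output formats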
12 versions - Latest release: 5 months ago - 586 downloads last month - 0 stars on GitHub - 1 maintainer
@project-lakechain/audio-metadata-extractor 0.10.0
A middleware extracting metadata from audio documents.
7 versions - Latest release: over 1 year ago - 8 downloads last month - 186 stars on GitHub - 1 maintainer
@gabsterai/ocr-client 1.2.1
TypeScript/JavaScript client SDK for OCR Service API
4 versions - Latest release: 3 months ago - 1 maintainer
n8n-nodes-tika 0.1.2
n8n community node for Apache Tika — extract text, metadata, detect MIME types and languages from...
3 versions - Latest release: 13 days ago - 1 maintainer
mcp-server-docpipe 1.0.0
MCP server for document processing - PDF extract/merge/split, DOCX to Markdown, image resize/comp...
1 version - Latest release: 13 days ago - 1 maintainer
n8n-nodes-docutray 0.5.2
n8n community nodes for Docutray OCR, document identification, and knowledge base search services
9 versions - Latest release: 6 months ago - 613 downloads last month - 0 stars on GitHub - 1 maintainer
@project-lakechain/sqs-event-trigger 0.10.0
Triggers pipelines upon events being emitted from an SQS queue.
7 versions - Latest release: over 1 year ago - 13 downloads last month - 186 stars on GitHub - 1 maintainer
@aidalinfo/office-to-markdown 1.0.2
Modern TypeScript library for converting Office documents (DOCX) to Markdown format, optimized fo...
3 versions - Latest release: 8 months ago - 18 downloads last month - 6 stars on GitHub - 1 maintainer
knowledge-mgmt-mcp 1.1.0
Production-ready MCP server for document ingestion and knowledge management with vector search. S...
6 versions - Latest release: 7 months ago - 28 downloads last month - 1 maintainer
@portablecore/document-processing 0.1.1
Shared document processing types, category detection, status machine, provider definitions, and c...
2 versions - Latest release: about 2 months ago - 1 maintainer
@songlingqing/mcp-batch-processing 0.0.1
MCP server for batch processing that reviews documents such as feasibility study reports step by step
1 version - Latest release: 10 months ago - 3 downloads last month - 1 maintainer
@project-lakechain/ecs-cluster 0.10.0
A managed ECS cluster construct with quick autoscaling and EFS storage.
7 versions - Latest release: over 1 year ago - 17 downloads last month - 186 stars on GitHub - 1 maintainer
ppu-paddle-ocr 5.1.1 💰
Blazing-fast and lightweight PaddleOCR library for Web, Node.js and Bun. Perform accurate text de...
37 versions - Latest release: 13 days ago - 641 downloads last month - 34 stars on GitHub - 1 maintainer
@sybil-studio-devs/sdk 0.2.0
Official SDK for Sybil AI - Embeddable wiki (Atlas), file storage (Nexus), document processing, Y...
3 versions - Latest release: 3 months ago - 1 maintainer
n8n-nodes-doctr 0.1.7
Extract text from images using docTR OCR in n8n workflows
8 versions - Latest release: 5 months ago - 176 downloads last month - 5,497 stars on GitHub - 1 maintainer
@sea-dev/widget-ng 0.1.25
Native Angular widget for Sea.dev document processing
15 versions - Latest release: about 2 months ago - 74 downloads last month - 2 maintainers
@project-lakechain/email-text-processor 0.10.0
Parse e-mail documents into different output formats.
7 versions - Latest release: over 1 year ago - 44 downloads last month - 186 stars on GitHub - 1 maintainer
lichat 1.0.0
A configurable chatbot and email processor that collects user-defined data points from both chat ...
1 version - Latest release: 3 months ago - 8 downloads last month - 1 maintainer
ppu-doc-correction 1.0.1
Lightweight, type-safe document image correction for Node.js, Bun, and browsers. Provides orienta...
2 versions - Latest release: 16 days ago - 1 maintainer
n8n-nodes-structura 0.2.2
n8n community node for Structura — Transform documents (PDF, images) into structured JSON, Excel,...
4 versions - Latest release: 16 days ago - 1 maintainer
docuprox-mcp 1.0.0
MCP server for DocuProx document processing API
1 version - Latest release: 16 days ago - 1 maintainer
@project-lakechain/hashing-image-processor 0.10.0
Computes the hashes of images using different algorithms.
3 versions - Latest release: over 1 year ago - 2 downloads last month - 186 stars on GitHub - 1 maintainer
n8n-nodes-ninjadoc 1.0.1
Actions to access Ninjadoc AI API
2 versions - Latest release: 6 months ago - 1 maintainer
@promptbook/legacy-documents 0.110.0
Promptbook: Turn your company's scattered knowledge into AI ready books
476 versions - Latest release: 2 months ago - 5.75 thousand downloads last month - 152 stars on GitHub - 1 maintainer
rag-lite-ts 2.3.1
Local-first TypeScript retrieval engine with Chameleon Multimodal Architecture for semantic searc...
14 versions - Latest release: 3 months ago - 333 downloads last month - 3 stars on GitHub - 1 maintainer
n8n-nodes-condoc 0.3.13
n8n community node for ConDoc — a multi-tenant document processing and OCR platform. Automate doc...
26 versions - Latest release: 16 days ago - 2.2 thousand downloads last month - 0 stars on GitHub - 1 maintainer
lcp-nodes 1.0.43
Node RED Custom Nodes for LCP
1 version - Latest release: 10 months ago - 14 downloads last month - 1 maintainer
doc-ops-mcp 0.3.8
MCP Document Converter Server — A Model Context Protocol server for seamless document format conv...
32 versions - Latest release: 8 months ago - 198 downloads last month - 31 stars on GitHub - 1 maintainer
@paulmeller/docflow 0.0.28
A developer-friendly transformation engine for programmatic document manipulation
26 versions - Latest release: 6 months ago - 12 downloads last month - 1 maintainer
devabase-sdk 0.5.6
Official Node.js SDK for Devabase - Backend for RAG/LLM Applications
11 versions - Latest release: about 2 months ago - 925 downloads last month - 0 stars on GitHub - 1 maintainer
n8n-nodes-contextual-document-loader 0.4.0
⚠️ DEPRECATED: Use n8n-nodes-semantic-splitter-with-context instead. This package has known issue...
12 versions - Latest release: 11 months ago - 22 downloads last month - 8 stars on GitHub - 1 maintainer
n8n-nodes-deep-ocr 1.5.8
n8n community node for Deep-OCR document processing API
24 versions - Latest release: 17 days ago - 2.18 thousand downloads last month - 0 stars on GitHub - 1 maintainer
@project-lakechain/bedrock-image-generators 0.10.0
Image generation using Amazon Bedrock models.
7 versions - Latest release: over 1 year ago - 18 downloads last month - 186 stars on GitHub - 1 maintainer
n8n-nodes-adverant-nexus 0.2.0
n8n community nodes for Adverant Nexus AI platform - GraphRAG knowledge management, MageAgent orc...
1 version - Latest release: about 2 months ago - 93 downloads last month - 1 maintainer