npmjs.org "document-processing" keyword
@project-lakechain/lancedb-storage-connector 0.10.0
A data store connector for LanceDB.
3 versions - Latest release: over 1 year ago - 8 downloads last month - 186 stars on GitHub - 1 maintainer
md-anything 0.2.1
Local-first Markdown conversion for files, webpages, and media (CLI and MCP)
3 versions - Latest release: 9 days ago - 336 downloads last month - 0 stars on GitHub - 1 maintainer
stopword 3.1.5
A module for node.js and the browser that takes in text and returns text that is stripped of stop...
76 versions - Latest release: 10 months ago - 96 dependent packages - 582 dependent repositories - 705 thousand downloads last month - 222 stars on GitHub - 2 maintainers - Top 1.9% on npmjs.org
@llamaindex/liteparse 1.4.2
Open-source PDF parsing with spatial text extraction and OCR processing
10 versions - Latest release: 3 days ago - 19.2 thousand downloads last month - 2,392 stars on GitHub - 8 maintainers
@mediaproc/cli 1.0.1
Universal media processing CLI with plugin architecture
17 versions - Latest release: 26 days ago - 190 downloads last month - 0 stars on GitHub - 1 maintainer
docqa-mcp 1.0.2
MCP server for DocQA: AI document verification, PDF extraction, OCR, and format conversion via a...
3 versions - Latest release: about 22 hours ago - 1 maintainer
@ansonlai/docx-redline-js 0.1.4
Host-independent OOXML reconciliation engine for .docx manipulation with track changes
3 versions - Latest release: about 1 month ago - 197 downloads last month - 0 stars on GitHub - 1 maintainer
@project-lakechain/jmespath-processor 0.10.0
Applies JMESPath expressions to JSON documents.
7 versions - Latest release: over 1 year ago - 3 downloads last month - 186 stars on GitHub - 1 maintainer
@nlptools/splitter 0.0.2
Text splitting utilities - LangChain.js text splitters wrapper for NLPTools
2 versions - Latest release: 5 months ago - 13 downloads last month - 0 stars on GitHub - 1 maintainer
secure-redact 1.0.4
Client-side PII detection and redaction React component. Upload documents, automatically detect s...
5 versions - Latest release: about 1 month ago - 536 downloads last month - 0 stars on GitHub - 2 maintainers
@project-lakechain/character-text-splitter 0.10.0
Transforms text into chunks of tokens using Langchain's character text splitter.
7 versions - Latest release: over 1 year ago - 6 downloads last month - 186 stars on GitHub - 1 maintainer
@getaide/sdk 0.1.0
AIDE SDK, CLI & MCP Server - Extract structured data from documents using AI
1 version - Latest release: 4 days ago - 1 maintainer
n8n-nodes-extract-monster 1.0.3
AI-powered data extraction from PDFs, images, documents, audio, and video. Extract invoices, rece...
3 versions - Latest release: 5 months ago - 163 downloads last month - 1 maintainer
@project-lakechain/trafilatura 0.10.0
Extracts text and metadata from HTML documents using Trafilatura.
3 versions - Latest release: over 1 year ago - 9 downloads last month - 186 stars on GitHub - 1 maintainer
@project-lakechain/translate-text-processor 0.10.0
Translates text documents asynchronously using Amazon Translate.
7 versions - Latest release: over 1 year ago - 15 downloads last month - 186 stars on GitHub - 1 maintainer
@project-lakechain/bert-extractive-summarizer 0.10.0
Provides text summarization using the Bert Extractive Summarizer model.
7 versions - Latest release: over 1 year ago - 17 downloads last month - 186 stars on GitHub - 1 maintainer
@project-lakechain/bedrock-embedding-processors 0.10.0
Creates embeddings from documents using Amazon Bedrock models.
7 versions - Latest release: over 1 year ago - 16 downloads last month - 186 stars on GitHub - 1 maintainer
@caleblawson/rag 1.0.0
The Retrieval-Augmented Generation (RAG) module contains document processing and embedding utilit...
1 version - Latest release: 9 months ago - 12 downloads last month - 1 maintainer
lekana-gemini 1.0.0
A shared TypeScript library for Lekana microservices that provides AI-powered document processing...
1 version - Latest release: 4 months ago - 1 maintainer
ignidor-idp-mcp 1.0.4
MCP server for Ignidor IDP B2B API integration - enables Claude to process documents through Igni...
5 versions - Latest release: 5 months ago - 19 downloads last month - 1 maintainer
@buildel/ocr 0.1.1
Document processing application with CLI and API interfaces
11 versions - Latest release: 8 months ago - 41 downloads last month - 3 maintainers
@aivue/chatbot 2.5.5
AI-powered chat components for Vue.js with RAG (Retrieval-Augmented Generation) support
51 versions - Latest release: 3 months ago - 389 downloads last month - 7 stars on GitHub - 1 maintainer
n8n-nodes-docx-converter-enhanced 1.0.0
Enhanced n8n community node for DOCX to text conversion with RAG capabilities, page-aware chunkin...
1 version - Latest release: 7 months ago - 20 downloads last month - 1 maintainer
@project-lakechain/opensearch-domain 0.10.0
Creates an OpenSearch domain with Cognito authentication.
7 versions - Latest release: over 1 year ago - 2 downloads last month - 186 stars on GitHub - 1 maintainer
doc-to-readable 1.5.3
Universal document-to-markdown and section splitter for HTML, URLs, and PDFs.
22 versions - Latest release: 9 months ago - 127 downloads last month - 6 stars on GitHub - 1 maintainer
n8n-nodes-google-vertex-embeddings-extended 0.8.1
n8n community sub-node for Google Vertex AI Embeddings with output dimensions and configurable ba...
15 versions - Latest release: 5 months ago - 116 downloads last month - 8 stars on GitHub - 2 maintainers
@project-lakechain/layers 0.10.0
Lambda layer library used by Project Lakechain.
7 versions - Latest release: over 1 year ago - 4 downloads last month - 186 stars on GitHub - 1 maintainer
@nyazkhan/react-pdf-viewer 1.1.1
A comprehensive React TypeScript component library for viewing and interacting with PDF files usi...
5 versions - Latest release: 8 months ago - 86 downloads last month - 0 stars on GitHub - 1 maintainer
@base64ai/n8n-nodes-base64ai 2.0.0
Official Base64.ai community node for n8n
3 versions - Latest release: 2 months ago - 130 downloads last month - 2 maintainers
n8n-nodes-graphorlm 0.1.18
n8n community nodes for Graphor - Intelligent document processing, RAG pipelines, and document ch...
15 versions - Latest release: 5 days ago - 512 downloads last month - 0 stars on GitHub - 1 maintainer
@project-lakechain/opensearch-saved-object 0.10.0
Uploads a saved object to OpenSearch using AWS CDK.
7 versions - Latest release: over 1 year ago - 2 downloads last month - 186 stars on GitHub - 1 maintainer
@casemark/thurgood 2.0.1
Thurgood CLI - Legal engineer AI agent powered by Case.dev. Build legal applications with documen...
8 versions - Latest release: 4 months ago - 667 downloads last month - 3 maintainers
openclaw-docproc-mcp 1.0.0
MCP server for OpenClaw Document Processing APIs: PDF extraction, OCR, and format conversion pai...
1 version - Latest release: 7 days ago - 1 maintainer
peslac 1.1.3
A Node.js package to interact with the Peslac API for document processing.
14 versions - Latest release: about 1 year ago - 85 downloads last month - 0 stars on GitHub - 1 maintainer
@project-lakechain/reducer 0.10.0
A middleware allowing to reduce multiple events into a single event.
5 versions - Latest release: over 1 year ago - 13 downloads last month - 186 stars on GitHub - 1 maintainer
@project-lakechain/rekognition-image-processor 0.10.0
Processes images using Amazon Rekognition.
7 versions - Latest release: over 1 year ago - 2 downloads last month - 186 stars on GitHub - 1 maintainer
@storepress/llm-md-text-splitter 0.0.1
High-performance streaming Markdown text splitter for LLM pipelines and RAG systems. Zero sequenc...
1 version - Latest release: about 2 months ago - 1 maintainer
markdoc-traverse 1.1.1
A simple and tiny traversal library for MarkDoc AST
4 versions - Latest release: over 1 year ago - 19 downloads last month - 0 stars on GitHub - 1 maintainer
@easyrag/sdk 0.1.1
Official JavaScript SDK for EasyRAG.com API
2 versions - Latest release: 4 months ago - 7 downloads last month - 0 stars on GitHub - 1 maintainer
@wdelhagen/textprep 0.3.2
Document text extraction with pluggable extractors. Supports PDF, DOCX, DOC, RTF, TXT, and image ...
7 versions - Latest release: about 1 month ago - 301 downloads last month - 1 maintainer
@project-lakechain/panns-embedding-processor 0.10.0
A processor generating embeddings for audio documents using Pretrained Audio Neural Networks.
7 versions - Latest release: over 1 year ago - 8 downloads last month - 186 stars on GitHub - 1 maintainer
invoicify-json-craft 1.0.0
AI-powered invoice to JSON converter using Mistral AI with dynamic field detection and master sch...
1 version - Latest release: 10 months ago - 4 downloads last month - 1 maintainer
@equus-ai/sdk 1.1.0
Official JavaScript/TypeScript SDK for EQUUS AI Infrastructure Platform - Document Processing, RA...
2 versions - Latest release: 3 months ago - 147 downloads last month - 1 maintainer
treechunk 1.1.0
Hierarchical markdown chunking for RAG systems with AI-powered context summarization
5 versions - Latest release: 8 months ago - 8 downloads last month - 0 stars on GitHub - 1 maintainer
@project-lakechain/rembg-image-processor 0.10.0
Automatically remove background from images using Rembg.
5 versions - Latest release: over 1 year ago - 2 downloads last month - 186 stars on GitHub - 1 maintainer
@project-lakechain/ollama-embedding-processor 0.10.0
Creates embeddings from documents using Ollama models.
3 versions - Latest release: over 1 year ago - 11 downloads last month - 186 stars on GitHub - 1 maintainer
nanonets 2.0.1
Node.js SDK for the Nanonets API: OCR, document extraction, and workflow automation.
14 versions - Latest release: 10 months ago - 2 dependent packages - 41 downloads last month - 0 stars on GitHub - 3 maintainers
@project-lakechain/s3-storage-connector 0.10.0
Stores documents and their metadata in an S3 Bucket.
7 versions - Latest release: over 1 year ago - 3 downloads last month - 186 stars on GitHub - 1 maintainer
@project-lakechain/core 0.10.0
Core package for building middlewares with Project Lakechain.
7 versions - Latest release: over 1 year ago - 5 downloads last month - 184 stars on GitHub - 1 maintainer
pdf-utils-rust 0.1.1
PDF and image processing utilities compiled to WebAssembly - Fast, secure, client-side file proce...
1 version - Latest release: 6 months ago - 22 downloads last month - 0 stars on GitHub - 1 maintainer
@project-lakechain/sentence-transformers 0.10.0
Creates embeddings from text-oriented documents using Sentence Transformers models.
7 versions - Latest release: over 1 year ago - 12 downloads last month - 185 stars on GitHub - 1 maintainer
@nlptools/nlptools 0.0.2
Main NLPTools package - Complete suite of NLP algorithms, text distance, similarity, splitting, a...
16 versions - Latest release: 5 months ago - 7 downloads last month - 0 stars on GitHub - 1 maintainer
@project-lakechain/passthrough 0.10.0
A middleware acting as a passthrough logging received events.
7 versions - Latest release: over 1 year ago - 5 downloads last month - 186 stars on GitHub - 1 maintainer
@project-lakechain/keybert-text-processor 0.10.0
Extracts the main keywords from a text document using the KeyBERT model.
7 versions - Latest release: over 1 year ago - 6 downloads last month - 186 stars on GitHub - 1 maintainer
@project-lakechain/regexp-text-splitter 0.10.0
Transforms text into chunks of tokens based on regular expressions.
5 versions - Latest release: over 1 year ago - 5 downloads last month - 186 stars on GitHub - 1 maintainer
n8n-nodes-deep-ocr 1.4.1
n8n community node for Deep-OCR document processing API
13 versions - Latest release: 6 days ago - 173 downloads last month - 0 stars on GitHub - 1 maintainer
@project-lakechain/polly-synthesizer 0.10.0
Synthesizes text into speech using Amazon Polly.
7 versions - Latest release: over 1 year ago - 12 downloads last month - 186 stars on GitHub - 1 maintainer
@dooor-ai/cortexdb 0.9.12
Official TypeScript/JavaScript SDK for CortexDB - Multi-modal RAG Platform with advanced document...
68 versions - Latest release: about 2 months ago - 973 downloads last month - 1 maintainer
@project-lakechain/pinecone-storage-connector 0.10.0
A data store connector for Pinecone.
7 versions - Latest release: over 1 year ago - 8 downloads last month - 186 stars on GitHub - 1 maintainer
@instafill.ai/instafill 0.3.2
Instafill AI Node.js library for automating PDF form filling using AI-powered technology.
3 versions - Latest release: about 1 year ago - 46 downloads last month - 1 maintainer
@mastra/rag 2.1.2
The Retrieval-Augmented Generation (RAG) module contains document processing and embedding utilit...
697 versions - Latest release: about 1 month ago - 213 thousand downloads last month - 21,839 stars on GitHub - 11 maintainers
@project-lakechain/sqs-storage-connector 0.10.0
Stores documents and their metadata in an SQS queue.
7 versions - Latest release: over 1 year ago - 18 downloads last month - 186 stars on GitHub - 1 maintainer
@project-lakechain/zip-processor 0.10.0
Inflates and deflates Zip documents from a source to a destination bucket.
5 versions - Latest release: over 1 year ago - 7 downloads last month - 186 stars on GitHub - 1 maintainer
universal-documents-converter 1.0.1
Universal MCP Server for Multi-Rendering PDF Quality Assurance System with AI-powered optimization
1 version - Latest release: 7 months ago - 12 downloads last month - 1 maintainer
@promptbook/legacy-documents 0.110.0
Promptbook: Turn your company's scattered knowledge into AI ready books
463 versions - Latest release: about 2 months ago - 4.04 thousand downloads last month - 152 stars on GitHub - 1 maintainer
@promptbook/documents 0.110.0
Promptbook: Turn your company's scattered knowledge into AI ready books
466 versions - Latest release: about 2 months ago - 1.77 thousand downloads last month - 152 stars on GitHub - 1 maintainer
@promptbook/pdf 0.110.0
Promptbook: Turn your company's scattered knowledge into AI ready books
461 versions - Latest release: about 2 months ago - 7.12 thousand downloads last month - 152 stars on GitHub - 1 maintainer
koncile-js 0.1.4
JavaScript SDK for the Koncile Intelligent Document Processing API
5 versions - Latest release: 9 months ago - 14 downloads last month - 1 maintainer
@jmndao/mongoose-ai 1.4.0
AI-powered Mongoose plugin for intelligent document processing with auto-summarization, semantic ...
10 versions - Latest release: 9 months ago - 172 downloads last month - 3 stars on GitHub - 1 maintainer
@project-lakechain/sentence-text-splitter 0.10.0
Transforms text into chunks of tokens using a sentence text splitter.
7 versions - Latest release: over 1 year ago - 7 downloads last month - 186 stars on GitHub - 1 maintainer
n8n-nodes-condoc 0.3.9
n8n community node for ConDoc, a multi-tenant document processing and OCR platform. Automate doc...
22 versions - Latest release: 8 days ago - 1.62 thousand downloads last month - 0 stars on GitHub - 1 maintainer
@project-lakechain/opensearch-storage-connector 0.10.0
Stores document metadata in an OpenSearch index.
7 versions - Latest release: over 1 year ago - 4 downloads last month - 186 stars on GitHub - 1 maintainer
n8n-nodes-docx-genie-pro 0.1.2
n8n node package for DOCX document manipulation and processing
3 versions - Latest release: 10 months ago - 4 downloads last month - 0 stars on GitHub - 1 maintainer
n8n-nodes-unicraft 2.1.8
UniCraft N8N custom nodes - Unified AI Model Router with Multi-Modal Support by CloudCraft Labs f...
9 versions - Latest release: 6 months ago - 98 downloads last month - 1 maintainer
docx-edit 0.1.0
A JS library that parses DOCX into a virtual component tree and writes paragraph-level text chang...
1 version - Latest release: 8 days ago - 1 maintainer
docx-vcomponent 0.1.0
A JS library that parses DOCX into a virtual component tree and writes paragraph-level text chang...
1 version - Latest release: 8 days ago - 1 maintainer
@project-lakechain/condition 0.10.0
A middleware allowing to express complex conditions in pipelines.
7 versions - Latest release: over 1 year ago - 3 downloads last month - 185 stars on GitHub - 1 maintainer
@heripo/pdf-parser 0.1.18
PDF parsing library using Docling SDK with OCR support for macOS
19 versions - Latest release: 11 days ago - 1.22 thousand downloads last month - 4 stars on GitHub - 2 maintainers
@heripo/document-processor 0.1.18
Document processor with LLM-based analysis for heripo engine
19 versions - Latest release: 11 days ago - 1.28 thousand downloads last month - 4 stars on GitHub - 2 maintainers
@heripo/model 0.1.18
Document models and type definitions for heripo engine
19 versions - Latest release: 11 days ago - 1.28 thousand downloads last month - 4 stars on GitHub - 2 maintainers
@project-lakechain/ffmpeg-processor 0.10.0
Processes media documents using FFMPEG.
5 versions - Latest release: over 1 year ago - 7 downloads last month - 186 stars on GitHub - 1 maintainer
n8n-nodes-docuprox 1.0.8
An n8n Node for processing files and base64 data using the Docuprox API. Easily integrate automate...
9 versions - Latest release: about 1 month ago - 211 downloads last month - 1 star on GitHub - 1 maintainer
@docrouter/mcp 1.0.0
TypeScript MCP server for DocRouter API
8 versions - Latest release: 2 months ago - 232 downloads last month - 1 maintainer
@aidalinfo/pdf-processor 1.0.18
Powerful PDF data extraction library powered by AI vision models. Transform PDFs into structured,...
18 versions - Latest release: 7 months ago - 195 downloads last month - 6 stars on GitHub - 1 maintainer
n8n-nodes-pdf-api-hub 4.0.16
n8n community node to parse, extract, merge, convert, and lock PDFs using the PDF API Hub
25 versions - Latest release: 9 days ago - 996 downloads last month - 0 stars on GitHub - 1 maintainer
@pageindex/sdk 0.8.0
PageIndex SDK - Document processing for AI applications via REST API and MCP
6 versions - Latest release: 9 days ago - 724 downloads last month - 1 maintainer
@scaly/dazzle 0.1.0
DSSSL processor - CLI wrapper for @scaly/openjade
1 version - Latest release: 4 months ago - 1 maintainer
@project-lakechain/service-linked-role 0.10.0
Creates a service linked role for a given service in an idempotent way.
3 versions - Latest release: over 1 year ago - 6 downloads last month - 186 stars on GitHub - 1 maintainer
mcp-upstage-server 0.5.0
MCP server for Upstage AI document processing - Node.js implementation
11 versions - Latest release: 7 months ago - 1.06 thousand downloads last month - 0 stars on GitHub - 1 maintainer
@project-lakechain/scheduler-event-trigger 0.10.0
Triggers pipelines upon scheduling events.
7 versions - Latest release: over 1 year ago - 1 download last month - 186 stars on GitHub - 1 maintainer
@scaly/openjade 0.2.1
TypeScript port of OpenJade DSSSL engine
3 versions - Latest release: 4 months ago - 1 maintainer
n8n-nodes-vector-store-processor 1.8.15
n8n node for intelligent document chunking and processing for vector store ingestion with Smart Q...
49 versions - Latest release: 5 months ago - 2.54 thousand downloads last month - 1 maintainer
@project-lakechain/sdk 0.10.0
An SDK providing helpers to create Lakechain middlewares in TypeScript.
9 versions - Latest release: over 1 year ago - 13 downloads last month - 186 stars on GitHub - 1 maintainer
sprint-docx-mcp-server 1.0.0
GitHub Copilot agent for processing DOCX Sprint documents with hierarchical structure into Jira-f...
1 version - Latest release: 4 months ago - 1 maintainer
@nutrient-sdk/document-engine-mcp-server 0.0.2
MCP server for Nutrient Document Engine
2 versions - Latest release: 29 days ago - 17 downloads last month - 56 stars on GitHub - 5 maintainers
@abhi-arya1/mastra-minirag 1.0.1
Minimal recursive text chunking functionality extracted from @mastra/rag for edge deployments
2 versions - Latest release: 6 months ago - 1 maintainer
create-autollama 0.0.1
Placeholder scaffolder for AutoLlama. Creates a new folder (default: autollama) and points to the...
1 version - Latest release: 7 months ago - 9 downloads last month - 24 stars on GitHub - 1 maintainer
expo-pdf-text-extract 1.0.1
Native PDF text extraction for React Native and Expo. Extract text content from PDF files using p...
2 versions - Latest release: about 1 month ago - 3 stars on GitHub - 1 maintainer
@project-lakechain/neo4j-storage-connector 0.10.0
A data store connector for Neo4j.
3 versions - Latest release: over 1 year ago - 2 downloads last month - 186 stars on GitHub - 1 maintainer
@project-lakechain/sharp-image-transform 0.10.0
A middleware transforming images using the sharp library.
7 versions - Latest release: over 1 year ago - 2 downloads last month - 186 stars on GitHub - 1 maintainer
Related Keywords
machine-learning (88)
ai (87)
retrieval-augmented-generation (86)
natural-language-processing (82)
generative-ai (79)
computer-vision (78)
serverless (78)
aws (77)
hacktoberfest (76)
aws-cdk (75)
pdf (70)
ocr (63)
typescript (53)
n8n-community-node-package (46)
llm (44)
rag (40)
n8n (38)
mcp (36)
embeddings (36)
model-context-protocol (27)
markdown (26)
text-extraction (26)
docx (22)
semantic-search (21)
sdk (19)
claude (18)
automation (18)
langchain (17)
nlp (16)
openai (15)
workflow (15)
nodejs (15)
vector-search (14)
document (14)
pdf-parser (14)
cli (13)
text-splitting (13)
vector-database (12)
chunking (12)
data-extraction (12)
pdf-processing (11)
knowledge-base (11)
api (11)
chatbot (11)
html (10)
javascript (10)
monorepo (10)
cdk (9)
extraction (9)
web-scraping (9)
react (9)
text-processing (8)
multimodal (8)
structured-data (7)
browser (7)
n8n-community-nodes (7)
document-conversion (7)
lakechain (7)
gemini (7)
developer-tools (7)
pdf-extraction (7)
n8n-node (7)
text-splitter (6)
graphor (6)
office-documents (6)
word (6)
image-processing (6)
ai-sdk (6)
n8n-community-node (5)
tesseract (5)
batch-processing (5)
ai-agents (5)
latex (5)
embedding (5)
document-analysis (5)
sub-node (5)
vector-store (5)
json (5)
documents (5)
file-processing (5)
parser (5)
mcp-server (5)
ai-agent (5)
ai-workflow (5)
agent (5)
image (5)
nextjs (5)
text-analysis (5)
invoice (5)
playwright (4)
markdown-to-pdf (4)
cross-platform (4)
conversational-ai (4)
pipeline (4)
function-calling (4)
language-model (4)
tool (4)
pdf-to-markdown (4)
docdigitizer (4)
opensearch (4)