An open API service providing package, version and dependency metadata of many open source software ecosystems and registries.

pypi.org "text-processing" keyword

catalyst-dkms 0.1.0
Domain Knowledge Management System - preprocessing toolset for heterogeneous text ingestion
1 version - Latest release: 6 months ago - 13 downloads last month - 1 maintainer
ihroteka-converter 1.2.1 💰
A lightweight package for converting Markdown into Steam-compatible markup.
7 versions - Latest release: 10 days ago - 184 downloads last month - 1 star on GitHub - 1 maintainer
lingo-nlp-toolkit 0.2.2
Advanced NLP Toolkit - Lightweight, Fast, and Transformer-Ready
5 versions - Latest release: 8 months ago - 28 downloads last month - 0 stars on GitHub - 1 maintainer
textfission 0.1.1
A powerful Python library for intelligent text processing, question generation, and answer genera...
2 versions - Latest release: 10 months ago - 19 downloads last month - 0 stars on GitHub - 1 maintainer
tregex-tobiasli 1.0.3
Wrapper for more functionality out of regex parse results.
4 versions - Latest release: over 6 years ago - 2 dependent repositories - 64 downloads last month - 0 stars on GitHub - 1 maintainer
pawpaw 1.0.1
High Performance Text Processing & Segmentation Framework
17 versions - Latest release: about 1 year ago - 156 downloads last month - 25 stars on GitHub - 1 maintainer
Top 1.1% on pypi.org
pyparsing 3.3.2 💰
pyparsing - Classes and methods to define and execute parsing grammars
88 versions - Latest release: 3 months ago - 1,663 dependent packages - 264,180 dependent repositories - 324 million downloads last month - 2,216 stars on GitHub - 1 maintainer
Top 1.7% on pypi.org
konoha 5.7.0
Add your description here
30 versions - Latest release: about 1 month ago - 3 dependent packages - 134 dependent repositories - 288 thousand downloads last month - 261 stars on GitHub - 1 maintainer
terraphim-automata 1.0.0
Fast autocomplete and text processing for knowledge graphs
1 version - Latest release: 2 days ago - 1 maintainer
Top 1.9% on pypi.org
pythainlp 5.3.4
Thai Natural Language Processing library
121 versions - Latest release: 6 days ago - 37 dependent packages - 183 dependent repositories - 1.21 million downloads last month - 1,122 stars on GitHub - 2 maintainers
hamtaa-texttools 2.8.4
A high-level NLP toolkit built on top of modern LLMs.
72 versions - Latest release: 5 days ago - 1.99 thousand downloads last month - 2 maintainers
reliq 0.0.47
Python ctypes bindings for reliq
47 versions - Latest release: 2 months ago - 2.2 thousand downloads last month - 13 stars on GitHub - 1 maintainer
ripgrep 15.0.0 💰
ripgrep is a line-oriented search tool that recursively searches the current directory for a rege...
3 versions - Latest release: 6 months ago - 70.5 thousand downloads last month - 60,524 stars on GitHub - 1 maintainer
Top 1.4% on pypi.org
pymupdf 1.27.2
A high performance Python library for data extraction, analysis, conversion & manipulation of PDF...
141 versions - Latest release: 29 days ago - 206 dependent packages - 1,798 dependent repositories - 85.7 million downloads last month - 9,356 stars on GitHub - 1 maintainer
voxera 0.0.1
An Open-Source Persian Language Techs Toolkit with Python
1 version - Latest release: over 3 years ago - 7 downloads last month - 5 stars on GitHub - 1 maintainer
losearch 0.1.1
A high-performance Python search library with intelligent relevance scoring, advanced indexing ca...
3 versions - Latest release: 7 months ago - 21 downloads last month - 2 stars on GitHub - 1 maintainer
Top 1.4% on pypi.org
pyarabic 0.6.15 💰
Arabic text tools for Python
18 versions - Latest release: almost 4 years ago - 7 dependent packages - 34 dependent repositories - 113 thousand downloads last month - 439 stars on GitHub - 1 maintainer
vettu 1.0.4
Multi-level tokenizer for Tamil text — sentence, word, character, and morpheme tokenization
4 versions - Latest release: 3 days ago - 388 downloads last month - 1 maintainer
obsilink 0.3.1
Extract Obsidian-style wikilink targets from text
2 versions - Latest release: 3 days ago - 321 downloads last month - 1 maintainer
chunkmate 0.2.2
Intelligent text and document chunking library with automatic format detection and AI-powered met...
6 versions - Latest release: about 1 month ago - 1.58 thousand downloads last month - 0 stars on GitHub - 1 maintainer
morphify 0.1.2
Lightweight templating and value formatting utilities
1 version - Latest release: 6 months ago - 7 downloads last month - 0 stars on GitHub - 1 maintainer
turkish-twitter-preprocess 0.0.7
A lightweight Python package for preprocessing Turkish Twitter statuses (tweets).
7 versions - Latest release: about 5 years ago - 1 dependent repository - 32 downloads last month - 7 stars on GitHub - 1 maintainer
utilsaxn 0.3.4
A modular set of data science utilities for EDA, cleaning, and more.
1 version - Latest release: 11 months ago - 15 downloads last month - 2 stars on GitHub - 1 maintainer
hangulpy 1.3.1
A comprehensive Python library for Korean language processing, inspired by es-hangul
12 versions - Latest release: 22 days ago - 874 downloads last month - 2 stars on GitHub - 1 maintainer
chunking-strategy 0.4.1
A comprehensive chunking library for text, documents, audio, video, and data streams (Linux and m...
4 versions - Latest release: 6 months ago - 33 downloads last month - 5 stars on GitHub - 1 maintainer
konfuzio-sdk 0.3.51
Konfuzio Software Development Kit
533 versions - Latest release: 20 days ago - 1 dependent repository - 2.19 thousand downloads last month - 54 stars on GitHub - 1 maintainer
vecclean 1.0.0
Production-ready text cleaning, deduplication, and vectorization pipeline with C++ acceleration
3 versions - Latest release: 9 months ago - 18 downloads last month - 1 maintainer
yurenizer 0.2.2
A library for standardizing terms with spelling variations using a synonym dictionary.
18 versions - Latest release: over 1 year ago - 96 downloads last month - 4 stars on GitHub - 1 maintainer
pasban 1.0.1
Pure Persian text processing and foreign word detection library
1 version - Latest release: 6 months ago - 10 downloads last month - 1 maintainer
chaos-box 0.6.0
Collection of handy utils written in Python 3
6 versions - Latest release: about 1 month ago - 115 downloads last month - 1 star on GitHub - 1 maintainer
matterify 0.3.1
Extract and aggregate YAML frontmatter from Markdown files into structured JSON
2 versions - Latest release: 3 days ago - 240 downloads last month - 1 maintainer
akin 1.0.1
Akin is a Python library for detecting near-duplicate texts using min-hashing and locality sensit...
3 versions - Latest release: about 1 year ago - 1 dependent repository - 77 downloads last month - 8 stars on GitHub - 1 maintainer
textcl 1.0.1
Text preprocessing package for use in NLP tasks
6 versions - Latest release: over 1 year ago - 1 dependent repository - 25 downloads last month - 11 stars on GitHub - 1 maintainer
sesdiff 0.3.2
Generates a shortest edit script (Myers' diff algorithm) to indicate how to get from the strings ...
2 versions - Latest release: over 1 year ago - 137 downloads last month - 7 stars on GitHub - 1 maintainer
contexto 0.2.0
Library for text processing and analysis with Python
3 versions - Latest release: over 4 years ago - 1 dependent repository - 24 downloads last month - 48 stars on GitHub - 1 maintainer
pythainlp-rust-modules 0.2.2
pythainlp-rust-modules is now nlpo3
3 versions - Latest release: over 4 years ago - 1 dependent repository - 55 downloads last month - 40 stars on GitHub - 2 maintainers
pylib-summarize 0.1.0
Summarize long text using frequency-based or AI-based summarizers. Perfect for AI agents. Perfect...
1 version - Latest release: 5 months ago - 10 downloads last month - 1 maintainer
cedartl 0.1.1
A lightweight, intuitive templating language designed for interactive use in LLM chat sessions.
13 versions - Latest release: over 1 year ago - 50 downloads last month - 5 stars on GitHub - 1 maintainer
burmese-tokenizer 0.1.3
A simple tokenizer for Burmese text
3 versions - Latest release: 8 months ago - 24 downloads last month - 1 star on GitHub - 1 maintainer
outline-ai 2025.12.22110945
outline-ai converts text into structured summaries, proposals, or outlines, automating publicatio...
1 version - Latest release: 4 months ago - 10 downloads last month - 1 star on GitHub - 1 maintainer
diff-match-patch-cython 20121119
The Diff Match and Patch libraries offer robust algorithms to perform the operations required for...
1 version - Latest release: about 10 years ago - 2 dependent repositories - 7 downloads last month - 7,925 stars on GitHub - 2 maintainers
quotes-convert 1.2.0
Convert matching double-quotes to single-quotes or vice versa in strings and streams. Inspired by...
3 versions - Latest release: about 1 month ago - 136 downloads last month - 1 maintainer
octolingo 0.3.0
A Python package for translating large texts with advanced features including OCR support.
16 versions - Latest release: 12 months ago - 60 downloads last month - 1 maintainer
wordwrap 0.2.4
A simple library for wrapping text to a fixed column width.
6 versions - Latest release: 6 months ago - 64 downloads last month - 1 maintainer
uneff 1.0.1
Remove BOM and problematic Unicode characters from text files
2 versions - Latest release: 8 months ago - 20 downloads last month - 0 stars on GitHub - 1 maintainer
huggingface-text-data-analyzer 1.1.0
A comprehensive tool for analyzing text datasets from HuggingFace's datasets library
3 versions - Latest release: over 1 year ago - 23 downloads last month - 7 stars on GitHub - 1 maintainer
Top 9.4% on pypi.org
puristaa 2022.7.24
Puristaa (Finnish for compress) - shared prefix compression of ordered string sequences.
3 versions - Latest release: over 3 years ago - 1 dependent package - 2 dependent repositories - 29 downloads last month - 2 stars on GitHub - 1 maintainer
kallia 0.1.6
Semantic Document Processing Library
7 versions - Latest release: 4 months ago - 33 downloads last month - 0 stars on GitHub - 1 maintainer
webis-html 1.0.4
Webis HTML extraction tool
5 versions - Latest release: 4 months ago - 59 downloads last month - 1 maintainer
blobify 1.1.0
Package your entire codebase into a single text file for AI consumption
3 versions - Latest release: 8 months ago - 17 downloads last month - 0 stars on GitHub - 1 maintainer
mon-tokenizer 0.1.5
A simple tokenizer for Mon text
6 versions - Latest release: 7 months ago - 26 downloads last month - 1 star on GitHub - 1 maintainer
amharic-text-processor 0.1.3
A modular toolkit for cleaning and normalizing Amharic text.
3 versions - Latest release: 4 months ago - 45 downloads last month - 1 maintainer
texthumanize 0.27.1
Algorithmic text humanization with AI detection, tone analysis, paraphrasing, and spinning — 38-s...
5 versions - Latest release: about 1 month ago - 370 downloads last month - 1 maintainer
linesieve 1.0
An unholy blend of grep, sed, awk, and Python.
13 versions - Latest release: almost 3 years ago - 1 dependent repository - 60 downloads last month - 10 stars on GitHub - 1 maintainer
refinedoc 1.0.1
Python library for post-extraction refinement of text that may be derived from PDF extraction.
4 versions - Latest release: 7 months ago - 268 downloads last month - 18 stars on GitHub - 1 maintainer
pdf2textbox 0.4.3
A PDF-to-text converter based on pdfminer2
14 versions - Latest release: about 7 years ago - 4 dependent repositories - 56 downloads last month - 5 stars on GitHub - 1 maintainer
docxmd-converter 3.0.0
Convert between .docx and .md files with template support and advanced document post-processing
3 versions - Latest release: 8 months ago - 112 downloads last month - 0 stars on GitHub - 1 maintainer
aqpymupdf 1.23.7
A high performance Python library for data extraction, analysis, conversion & manipulation of PDF...
1 version - Latest release: almost 2 years ago - 33 downloads last month - 8,284 stars on GitHub - 1 maintainer
html-to-markdown 3.1.0
High-performance HTML to Markdown converter powered by Rust with a clean Python API
125 versions - Latest release: 7 days ago - 475 thousand downloads last month - 585 stars on GitHub
infoextract-cidoc 0.1.7
LLM-powered CIDOC CRM v7.1.3 entity extraction from unstructured text — Pydantic models, Cypher e...
8 versions - Latest release: about 1 month ago - 167 downloads last month - 1 maintainer
chonkie 1.6.1
🦛 CHONK your texts with Chonkie ✨ - The no-nonsense chunking library
55 versions - Latest release: 21 days ago - 356 thousand downloads last month - 2,478 stars on GitHub - 2 maintainers
jamolib 0.2.1
Utilities for decomposing, composing, and keyboard-mapping Korean Hangul text.
3 versions - Latest release: 21 days ago - 325 downloads last month - 1 maintainer
latincy-preprocess 0.2.0
Latin text preprocessing: U/V normalization, long-s correction, and more
4 versions - Latest release: 24 days ago - 186 downloads last month - 0 stars on GitHub - 1 maintainer
kkltk 1.0
kkltk is a toolkit for processing the Kinyarwanda and Kirundi languages
1 version - Latest release: over 5 years ago - 1 dependent repository - 6 downloads last month - 1 star on GitHub - 1 maintainer
pycoreux 0.1.1
A Python library providing shell-like utilities for file operations, text processing, and subproc...
2 versions - Latest release: 8 months ago - 13 downloads last month - 3 stars on GitHub - 1 maintainer
Top 8.0% on pypi.org
tiny-tokenizer 3.4.0 💰
🌿 An easy-to-use Japanese Text Processing tool, which makes it possible to switch tokenizers with...
19 versions - Latest release: over 5 years ago - 16 dependent repositories - 203 downloads last month - 261 stars on GitHub - 1 maintainer
trunajod 0.1.1
A Python library for readability analysis.
3 versions - Latest release: almost 5 years ago - 1 dependent repository - 63 downloads last month - 30 stars on GitHub - 1 maintainer
dict-fr-wordscapes 2021.9.10
French dictionary of Wordscapes solutions
1 version - Latest release: over 4 years ago - 1 dependent repository - 17 downloads last month - 0 stars on GitHub - 1 maintainer
yosina 1.1.1
Japanese text transliteration library
5 versions - Latest release: 21 days ago - 439 downloads last month - 19 stars on GitHub - 1 maintainer
Top 1.8% on pypi.org
pymupdfb 1.24.10
MuPDF shared libraries for PyMuPDF.
18 versions - Latest release: over 1 year ago - 4 dependent packages - 133 dependent repositories - 1.62 million downloads last month - 8,284 stars on GitHub - 1 maintainer
ngram-polars 0.1.2
High-performance n-gram generation for Polars
2 versions - Latest release: about 1 month ago - 177 downloads last month - 1 maintainer
nahiarhdnlp 1.5.3
Advanced Indonesian Natural Language Processing Library
34 versions - Latest release: 3 months ago - 360 downloads last month - 1 maintainer
forumscraper 0.1.21
A forum scraper library
31 versions - Latest release: 7 months ago - 487 downloads last month - 34 stars on GitHub - 1 maintainer
python-ucto 0.6.10
This is a Python binding to the tokenizer Ucto. Tokenisation is one of the first steps in almost a...
25 versions - Latest release: 2 months ago - 1 dependent package - 4 dependent repositories - 527 downloads last month - 29 stars on GitHub - 1 maintainer
fleetfluid 0.1.6
AI Agent Functions for ETL/Data Processing
6 versions - Latest release: 6 months ago - 59 downloads last month - 0 stars on GitHub - 1 maintainer
stringtools 3.0.1
stringtools provides string operations such as analysing, converting, generating, and validating.
22 versions - Latest release: over 3 years ago - 1 dependent repository - 80 downloads last month - 5 stars on GitHub - 1 maintainer
Top 3.7% on pypi.org
padatious 0.4.8
A neural network intent parser
25 versions - Latest release: almost 6 years ago - 3 dependent packages - 47 dependent repositories - 2.41 thousand downloads last month - 158 stars on GitHub - 1 maintainer
magic-profanity 2.0.1
A Python library for detecting and censoring profanity in text
4 versions - Latest release: 11 months ago - 1.81 thousand downloads last month - 1 star on GitHub - 1 maintainer
javanese-stemmer 1.2.1
Comprehensive Javanese language stemmer with morphophonological rules and dictionary validation
5 versions - Latest release: 5 months ago - 42 downloads last month - 1 maintainer
char-index-mcp 0.2.1
A Model Context Protocol server for character-level index-based string manipulation
4 versions - Latest release: 5 months ago - 63 downloads last month - 1 maintainer
furlanspellchecker 0.1.1
A comprehensive spell checker for the Friulian language with CLI and pipeline service.
2 versions - Latest release: 4 months ago - 7 downloads last month - 1 maintainer
awking 1.1.2
Make it easier to use Python as an AWK replacement
4 versions - Latest release: over 4 years ago - 1 dependent repository - 48 downloads last month - 0 stars on GitHub - 1 maintainer
txt2phrases 1.0.3
A comprehensive library for text processing, keyword extraction, and classification from PDF and ...
7 versions - Latest release: 5 months ago - 65 downloads last month - 0 stars on GitHub - 2 maintainers
wordninja-enhanced 3.1.1
Probabilistically split concatenated words. Now with more functionality and languages!
4 versions - Latest release: 9 months ago - 34.4 thousand downloads last month - 0 stars on GitHub - 1 maintainer
long2short 0.1.4
A flexible text summarization library to summarize long documents supporting multiple LLM providers
5 versions - Latest release: about 1 year ago - 16 downloads last month - 1 maintainer
finglish3 1.4.8
Finglish-to-Persian converter.
1 version - Latest release: almost 8 years ago - 1 dependent repository - 14 downloads last month - 84 stars on GitHub - 1 maintainer
personnamenorm 0.2
Unifies person names written in different notations
1 version - Latest release: over 5 years ago - 1 dependent repository - 5 downloads last month - 2 stars on GitHub - 1 maintainer
dhelp 0.0.5
DH Python tools for scraping web pages, pre-processing data, and performing NLP analysis quickly.
4 versions - Latest release: almost 8 years ago - 1 dependent repository - 30 downloads last month - 5 stars on GitHub - 1 maintainer
title-fix 0.0.3
A Python package for intelligent title case conversion and text formatting
3 versions - Latest release: 6 months ago - 34 downloads last month - 0 stars on GitHub - 1 maintainer
finglish 1.5.1
Finglish-to-Persian converter.
22 versions - Latest release: almost 6 years ago - 1 dependent repository - 159 downloads last month - 84 stars on GitHub - 1 maintainer
block-spinning 1.0.4
A Python module for block spinning
3 versions - Latest release: almost 6 years ago - 24 downloads last month - 1 star on GitHub - 1 maintainer
Top 9.0% on pypi.org
fuzzychinese 0.1.5
A small package to fuzzy-match Chinese words 中文模糊匹配
3 versions - Latest release: almost 7 years ago - 2 dependent repositories - 645 downloads last month - 89 stars on GitHub - 1 maintainer
xia-diff-match-patch 0.0.3
Diff Match Patch is a high-performance library in multiple languages that manipulates plain text.
3 versions - Latest release: over 2 years ago - 1 dependent package - 30 downloads last month - 7,713 stars on GitHub - 1 maintainer
bangla-normalizer 0.1.1
A Python library for normalizing Bengali text, including dates, numbers, currency, etc.
2 versions - Latest release: 10 months ago - 38 downloads last month - 0 stars on GitHub - 3 maintainers
thinkstrip 0.2.2
Think-block filter for LLM streams
4 versions - Latest release: 13 days ago - 280 downloads last month - 1 maintainer
contextgem 0.22.0
Effortless LLM extraction from documents
45 versions - Latest release: 24 days ago - 2.02 thousand downloads last month - 1,697 stars on GitHub - 1 maintainer
text2ics 0.1.2
A Python tool to convert unstructured text into an ICS calendar file using an LLM.
3 versions - Latest release: 9 months ago - 24 downloads last month - 0 stars on GitHub - 1 maintainer
fast-readability 0.0.1
A fast HTML content extractor based on Mozilla's Readability.js
1 version - Latest release: 10 months ago - 71 downloads last month - 1 star on GitHub - 1 maintainer
simple-anonymizer 0.1.18
Privacy-first text anonymization tool with enterprise-grade accuracy for removing PII from documents
11 versions - Latest release: 9 months ago - 35 downloads last month - 1 maintainer
Top 9.4% on pypi.org
wordcloud-fa 0.1.10 💰
A wrapper for the wordcloud module for creating Persian (and other RTL-language) word clouds.
10 versions - Latest release: over 3 years ago - 6 dependent repositories - 146 downloads last month - 145 stars on GitHub - 1 maintainer