An open API service providing package, version and dependency metadata of many open source software ecosystems and registries.

pypi.org "text-processing" keyword

nlp-preprocessing 0.2.0
A Package for text preprocessing
14 versions - Latest release: over 5 years ago - 1 dependent repository - 60 downloads last month - 16 stars on GitHub - 1 maintainer
Top 1.7% on pypi.org
konoha 5.7.0 💰
Add your description here
30 versions - Latest release: 8 days ago - 3 dependent packages - 134 dependent repositories - 215 thousand downloads last month - 261 stars on GitHub - 1 maintainer
Top 1.1% on pypi.org
pyparsing 3.3.2 💰
pyparsing - Classes and methods to define and execute parsing grammars
88 versions - Latest release: about 2 months ago - 1,663 dependent packages - 264,180 dependent repositories - 279 million downloads last month - 2,216 stars on GitHub - 1 maintainer
Top 5.2% on pypi.org
lingua-franca 0.4.3
Mycroft's multilingual text parsing and formatting library
9 versions - Latest release: over 3 years ago - 1 dependent package - 24 dependent repositories - 2.58 thousand downloads last month - 77 stars on GitHub - 1 maintainer
tukuy 0.0.34 💰
A flexible data transformation library with a plugin system
29 versions - Latest release: 1 day ago - 7.98 thousand downloads last month - 3 stars on GitHub - 1 maintainer
lingo-nlp-toolkit 0.2.2
Advanced NLP Toolkit - Lightweight, Fast, and Transformer-Ready
5 versions - Latest release: 7 months ago - 31 downloads last month - 0 stars on GitHub - 1 maintainer
Top 1.4% on pypi.org
pymupdf 1.27.1
A high performance Python library for data extraction, analysis, conversion & manipulation of PDF...
139 versions - Latest release: 26 days ago - 206 dependent packages - 1,798 dependent repositories - 39.7 million downloads last month - 9,151 stars on GitHub - 1 maintainer
strparser 0.1.1
Simple Utilities to Extract Text Substrings in Python
2 versions - Latest release: 3 days ago - 149 downloads last month - 1 maintainer
doc2term 0.1
A fast NLP tokenizer that detects tokens and removes duplicates and punctuation
1 version - Latest release: almost 5 years ago - 1 dependent repository - 18 downloads last month - 2 stars on GitHub - 1 maintainer
mytext 0.0.0
MyText: A Minimal AI-Powered Text Rewriting Tool
6 versions - Latest release: 4 months ago - 177 downloads last month - 1 maintainer
textmining3 1.1.0
Text Mining Utilities for Python 3
2 versions - Latest release: over 7 years ago - 1 dependent repository - 63 downloads last month - 1 star on GitHub - 1 maintainer
Top 1.4% on pypi.org
pyarabic 0.6.15 💰
Arabic text tools for Python
18 versions - Latest release: over 3 years ago - 7 dependent packages - 34 dependent repositories - 113 thousand downloads last month - 439 stars on GitHub - 1 maintainer
nametract 1.1.3
Simple and stupid name extraction
6 versions - Latest release: over 4 years ago - 1 dependent repository - 42 downloads last month - 0 stars on GitHub - 1 maintainer
ihroteka-converter 1.1.3 💰
A lightweight package for converting Markdown into Steam-compatible markup.
5 versions - Latest release: 2 days ago - 195 downloads last month - 1 star on GitHub - 1 maintainer
hamtaa-texttools 2.8.3
A high-level NLP toolkit built on top of modern LLMs.
71 versions - Latest release: 12 days ago - 1.99 thousand downloads last month - 2 maintainers
natsulang 1.0.0b11
A text-processing language based on Python 3.
10 versions - Latest release: over 5 years ago - 1 dependent repository - 90 downloads last month - 8 stars on GitHub - 1 maintainer
reliq 0.0.47
Python ctypes bindings for reliq
47 versions - Latest release: about 1 month ago - 4.35 thousand downloads last month - 12 stars on GitHub - 1 maintainer
gaspra 0.1.0a3
A fast Python tool for searching, diffing, and merging text
2 versions - Latest release: about 2 years ago - 28 downloads last month - 1 star on GitHub - 1 maintainer
catalyst-dkms 0.1.0
Domain Knowledge Management System - preprocessing toolset for heterogeneous text ingestion
1 version - Latest release: 5 months ago - 26 downloads last month - 1 maintainer
Top 1.9% on pypi.org
pythainlp 5.2.0
Thai Natural Language Processing library
116 versions - Latest release: 3 months ago - 37 dependent packages - 183 dependent repositories - 1.15 million downloads last month - 1,111 stars on GitHub - 2 maintainers
ultranlp 1.0.6
Ultra-fast, comprehensive NLP preprocessing library with advanced tokenization
6 versions - Latest release: 7 months ago - 62 downloads last month - 0 stars on GitHub - 1 maintainer
voxera 0.0.1
An Open-Source Persian Language Techs Toolkit with Python
1 version - Latest release: over 3 years ago - 8 downloads last month - 5 stars on GitHub - 1 maintainer
turkish-twitter-preprocess 0.0.7
A lightweight Python package to preprocess Turkish Twitter statuses (tweets).
7 versions - Latest release: about 5 years ago - 1 dependent repository - 32 downloads last month - 7 stars on GitHub - 1 maintainer
hangulpy 1.3.0
A comprehensive Python library for Korean language processing, inspired by es-hangul
11 versions - Latest release: 4 months ago - 874 downloads last month - 2 stars on GitHub - 1 maintainer
yurenizer 0.2.2
A library for standardizing terms with spelling variations using a synonym dictionary.
18 versions - Latest release: about 1 year ago - 154 downloads last month - 4 stars on GitHub - 1 maintainer
ripgrep 15.0.0 💰
ripgrep is a line-oriented search tool that recursively searches the current directory for a rege...
3 versions - Latest release: 5 months ago - 51.4 thousand downloads last month - 60,137 stars on GitHub - 1 maintainer
shekar 1.4.1
Simplifying Persian NLP for Modern Applications
38 versions - Latest release: 15 days ago - 1.09 thousand downloads last month - 52 stars on GitHub - 1 maintainer
chunkmate 0.2.2
Intelligent text and document chunking library with automatic format detection and AI-powered met...
6 versions - Latest release: 4 days ago - 39 downloads last month - 0 stars on GitHub - 1 maintainer
phileas-redact 1.0.0
A Python library to deidentify and redact PII, PHI, and other sensitive information from text
1 version - Latest release: 9 days ago - 105 downloads last month - 1 star on GitHub - 1 maintainer
pasban 1.0.1
Pure Persian text processing and foreign word detection library
1 version - Latest release: 5 months ago - 12 downloads last month - 1 maintainer
vecclean 1.0.0
Production-ready text cleaning, deduplication, and vectorization pipeline with C++ acceleration
3 versions - Latest release: 8 months ago - 9 downloads last month - 1 maintainer
textcl 1.0.1
Text preprocessing package for use in NLP tasks
6 versions - Latest release: over 1 year ago - 1 dependent repository - 52 downloads last month - 11 stars on GitHub - 1 maintainer
konfuzio-sdk 0.3.48
Konfuzio Software Development Kit
531 versions - Latest release: 26 days ago - 1 dependent repository - 2.69 thousand downloads last month - 54 stars on GitHub - 1 maintainer
pomera-ai-commander 1.4.4
Text processing toolkit with 22 MCP tools for AI assistants - case transformation, encoding, hash...
23 versions - Latest release: 9 days ago - 1.17 thousand downloads last month - 0 stars on GitHub - 1 maintainer
fibpetokenizer 0.1.1
A blazing fast Byte Pair Encoding (BPE) tokenizer library with Python bindings
2 versions - Latest release: 5 days ago - 1 maintainer
Top 9.4% on pypi.org
textpipe 0.12.2
textpipe: clean and extract metadata from text
38 versions - Latest release: about 5 years ago - 2 dependent repositories - 208 downloads last month - 302 stars on GitHub - 3 maintainers
utilsaxn 0.3.4
A modular set of data science utilities for EDA, cleaning, and more.
1 version - Latest release: 10 months ago - 4 downloads last month - 2 stars on GitHub - 1 maintainer
akin 1.0.1
Akin is a Python library for detecting near-duplicate texts using min-hashing and locality sensit...
3 versions - Latest release: about 1 year ago - 1 dependent repository - 109 downloads last month - 8 stars on GitHub - 1 maintainer
sesdiff 0.3.2
Generates a shortest edit script (Myers' diff algorithm) to indicate how to get from the strings ...
2 versions - Latest release: over 1 year ago - 137 downloads last month - 7 stars on GitHub - 1 maintainer
markover 0.7
Natural Language Generator with Markov
21 versions - Latest release: over 2 years ago - 1 dependent repository - 46 downloads last month - 27 stars on GitHub - 1 maintainer
chunking-strategy 0.4.1
A comprehensive chunking library for text, documents, audio, video, and data streams (Linux and m...
4 versions - Latest release: 5 months ago - 85 downloads last month - 2 stars on GitHub - 1 maintainer
morphify 0.1.2
Lightweight templating and value formatting utilities
1 version - Latest release: 5 months ago - 18 downloads last month - 0 stars on GitHub - 1 maintainer
pdf2textbox 0.4.3
A PDF-to-text converter based on pdfminer2
14 versions - Latest release: about 7 years ago - 4 dependent repositories - 170 downloads last month - 5 stars on GitHub - 1 maintainer
pythainlp-rust-modules 0.2.2
pythainlp-rust-modules is now nlpo3
3 versions - Latest release: over 4 years ago - 1 dependent repository - 26 downloads last month - 40 stars on GitHub - 2 maintainers
Top 7.5% on pypi.org
piicloak 1.0.4
Enterprise-grade PII detection and anonymization API. Helps achieve GDPR/CCPA compliance. Support...
5 versions - Latest release: about 2 months ago - 399 downloads last month - 1 maintainer
diff-match-patch-cython 20121119
The Diff Match and Patch libraries offer robust algorithms to perform the operations required for...
1 version - Latest release: about 10 years ago - 2 dependent repositories - 6 downloads last month - 7,925 stars on GitHub - 2 maintainers
octolingo 0.3.0
A Python package for translating large texts with advanced features including OCR support.
16 versions - Latest release: 11 months ago - 53 downloads last month - 1 maintainer
wordwrap 0.2.4
A simple library for wrapping text to a fixed column width.
6 versions - Latest release: 5 months ago - 62 downloads last month - 1 maintainer
huggingface-text-data-analyzer 1.1.0
A comprehensive tool for analyzing text datasets from HuggingFace's datasets library
3 versions - Latest release: over 1 year ago - 34 downloads last month - 7 stars on GitHub - 1 maintainer
Top 9.4% on pypi.org
puristaa 2022.7.24
Puristaa (Finnish for compress) - shared prefix compression of ordered string sequences.
3 versions - Latest release: over 3 years ago - 1 dependent package - 2 dependent repositories - 13 downloads last month - 2 stars on GitHub - 1 maintainer
blobify 1.1.0
Package your entire codebase into a single text file for AI consumption
3 versions - Latest release: 7 months ago - 41 downloads last month - 0 stars on GitHub - 1 maintainer
kallia 0.1.6
Semantic Document Processing Library
7 versions - Latest release: 3 months ago - 47 downloads last month - 0 stars on GitHub - 1 maintainer
webis-html 1.0.4
Webis HTML extraction tool
5 versions - Latest release: 3 months ago - 84 downloads last month - 1 maintainer
amharic-text-processor 0.1.3
A modular toolkit for cleaning and normalizing Amharic text.
3 versions - Latest release: 3 months ago - 95 downloads last month - 1 maintainer
texthumanize 0.25.0
Algorithmic text humanization with AI detection, tone analysis, paraphrasing, and spinning — 20-s...
3 versions - Latest release: 8 days ago - 244 downloads last month - 1 maintainer
linesieve 1.0
An unholy blend of grep, sed, awk, and Python.
13 versions - Latest release: almost 3 years ago - 1 dependent repository - 78 downloads last month - 10 stars on GitHub - 1 maintainer
infoextract-cidoc 0.1.7
LLM-powered CIDOC CRM v7.1.3 entity extraction from unstructured text — Pydantic models, Cypher e...
8 versions - Latest release: 8 days ago - 89 downloads last month - 1 maintainer
latincy-preprocess 0.1.2
Latin text preprocessing: U/V normalization, long-s correction, and more
3 versions - Latest release: 8 days ago - 186 downloads last month - 0 stars on GitHub - 1 maintainer
kkltk 1.0
kkltk is a toolkit designed for Kinyarwanda and Kirundi language processing
1 version - Latest release: over 5 years ago - 1 dependent repository - 8 downloads last month - 1 star on GitHub - 1 maintainer
contextf 0.0.6
Efficient context builder
2 versions - Latest release: 4 months ago - 64 downloads last month - 0 stars on GitHub - 1 maintainer
Top 8.0% on pypi.org
tiny-tokenizer 3.4.0 💰
🌿 An easy-to-use Japanese Text Processing tool, which makes it possible to switch tokenizers with...
19 versions - Latest release: over 5 years ago - 16 dependent repositories - 171 downloads last month - 261 stars on GitHub - 1 maintainer
pycoreux 0.1.1
A Python library providing shell-like utilities for file operations, text processing, and subproc...
2 versions - Latest release: 7 months ago - 17 downloads last month - 3 stars on GitHub - 1 maintainer
burmese-tokenizer 0.1.3
A simple tokenizer for Burmese text
3 versions - Latest release: 7 months ago - 83 downloads last month - 1 star on GitHub - 1 maintainer
yosina 1.0.0
Japanese text transliteration library
3 versions - Latest release: 6 months ago - 439 downloads last month - 19 stars on GitHub - 1 maintainer
ngram-polars 0.1.2
High-performance n-gram generation for Polars
2 versions - Latest release: 9 days ago - 177 downloads last month - 1 maintainer
forumscraper 0.1.21
A forum scraper library
31 versions - Latest release: 6 months ago - 388 downloads last month - 34 stars on GitHub - 1 maintainer
stringtools 3.0.1
stringtools provides string operations such as analysing, converting, generating, and validating.
22 versions - Latest release: over 3 years ago - 1 dependent repository - 189 downloads last month - 5 stars on GitHub - 1 maintainer
Top 3.7% on pypi.org
padatious 0.4.8
A neural network intent parser
25 versions - Latest release: almost 6 years ago - 3 dependent packages - 47 dependent repositories - 1.28 thousand downloads last month - 158 stars on GitHub - 1 maintainer
html-to-markdown 2.26.1 💰
High-performance HTML to Markdown converter powered by Rust with a clean Python API
109 versions - Latest release: 10 days ago - 328 thousand downloads last month - 533 stars on GitHub
dhelp 0.0.5
DH Python tools for scraping web pages, pre-processing data, and performing NLP analysis quickly.
4 versions - Latest release: almost 8 years ago - 1 dependent repository - 45 downloads last month - 5 stars on GitHub - 1 maintainer
text2ics 0.1.2
A Python tool to convert unstructured text into an ICS calendar file using an LLM.
3 versions - Latest release: 8 months ago - 24 downloads last month - 0 stars on GitHub - 1 maintainer
furlanspellchecker 0.1.1
A comprehensive spell checker for the Friulian language with CLI and pipeline service.
2 versions - Latest release: 3 months ago - 25 downloads last month - 1 maintainer
pylocalvoice 0.1.1
A professional Python library for local voice and Hmong language processing
2 versions - Latest release: 5 months ago - 24 downloads last month - 0 stars on GitHub - 1 maintainer
morsecode-package 1.0.0
Python package for converting text to Morse code and vice versa
1 version - Latest release: over 1 year ago - 11 downloads last month - 0 stars on GitHub - 1 maintainer
long2short 0.1.4
A flexible text summarization library to summarize long documents supporting multiple LLM providers
5 versions - Latest release: about 1 year ago - 23 downloads last month - 1 maintainer
flashtext2 1.1.0
A package for extracting keywords from large text very quickly (much faster than regex and the or...
5 versions - Latest release: over 1 year ago - 1 dependent package - 2.51 thousand downloads last month - 23 stars on GitHub - 1 maintainer
trunajod 0.1.1
A Python lib for readability analyses.
3 versions - Latest release: almost 5 years ago - 1 dependent repository - 111 downloads last month - 30 stars on GitHub - 1 maintainer
Top 9.4% on pypi.org
wordcloud-fa 0.1.10 💰
A wrapper for the wordcloud module for creating Persian (and other RTL language) word clouds.
10 versions - Latest release: over 3 years ago - 6 dependent repositories - 146 downloads last month - 145 stars on GitHub - 1 maintainer
fleetfluid 0.1.6
AI Agent Functions for ETL/Data Processing
6 versions - Latest release: 5 months ago - 59 downloads last month - 0 stars on GitHub - 1 maintainer
nahiarhdnlp 1.5.3
Advanced Indonesian Natural Language Processing Library
34 versions - Latest release: 2 months ago - 360 downloads last month - 1 maintainer
snipstr 1.0.3
A lightweight library for easy-to-use text truncation with a friendly interface.
7 versions - Latest release: about 2 months ago - 179 downloads last month - 0 stars on GitHub - 1 maintainer
char-index-mcp 0.2.1
A Model Context Protocol server for character-level index-based string manipulation
4 versions - Latest release: 4 months ago - 115 downloads last month - 1 maintainer
txt2phrases 1.0.3
A comprehensive library for text processing, keyword extraction, and classification from PDF and ...
7 versions - Latest release: 4 months ago - 111 downloads last month - 0 stars on GitHub - 2 maintainers
magic-profanity 2.0.1
A Python library for detecting and censoring profanity in text
4 versions - Latest release: 10 months ago - 1.81 thousand downloads last month - 1 star on GitHub - 1 maintainer
contextgem 0.21.0
Effortless LLM extraction from documents
44 versions - Latest release: 15 days ago - 2.02 thousand downloads last month - 1,697 stars on GitHub - 1 maintainer
rb-tocase 1.3.2 💰
RB toCase is a case converter.
2 versions - Latest release: about 4 years ago - 1 dependent repository - 28 downloads last month - 3 stars on GitHub - 1 maintainer
wetextprocessing 1.0.3
WeTextProcessing, including TN & ITN
28 versions - Latest release: over 1 year ago - 2 dependent packages - 97.9 thousand downloads last month - 694 stars on GitHub - 2 maintainers
python-semantic-splitter 0.1.1
Semantic Python code splitter for AI/RAG pipelines
2 versions - Latest release: 3 months ago - 36 downloads last month - 1 maintainer
rs-bpe 0.1.0
A ridiculously fast Python BPE (Byte Pair Encoder) implementation written in Rust
1 version - Latest release: 12 months ago - 11.1 thousand downloads last month - 22 stars on GitHub - 1 maintainer
Top 7.1% on pypi.org
persian 1.1.0
Fast, typed Python library for Persian language localization
10 versions - Latest release: about 2 months ago - 1 dependent package - 7 dependent repositories - 698 downloads last month - 180 stars on GitHub - 1 maintainer
slugify-fr 0.1.0
URL-friendly slug generator optimized for French
1 version - Latest release: 9 months ago - 4 downloads last month - 0 stars on GitHub - 1 maintainer
block-spinning 1.0.4
A Python module for block spinning
3 versions - Latest release: almost 6 years ago - 37 downloads last month - 1 star on GitHub - 1 maintainer
xia-diff-match-patch 0.0.3
Diff Match Patch is a high-performance library in multiple languages that manipulates plain text.
3 versions - Latest release: over 2 years ago - 1 dependent package - 38 downloads last month - 7,713 stars on GitHub - 1 maintainer
zhtw 2.8.7
Simplified/HK Traditional to Taiwan Traditional Chinese Converter
20 versions - Latest release: about 2 months ago - 427 downloads last month - 0 stars on GitHub - 1 maintainer
twat-llm 2.7.5
LLM integration for twat
14 versions - Latest release: about 1 year ago - 79 downloads last month - 1 star on GitHub - 1 maintainer
tigrinya-nlp 0.1.2
Lightweight Tigrinya text preprocessing: normalization, cleaning, tokenization, and stopwords.
3 versions - Latest release: 2 months ago - 62 downloads last month - 0 stars on GitHub - 1 maintainer
clean-text-rhoni 0.1.14
package to clean and normalize text
2 versions - Latest release: over 2 years ago - 32 downloads last month - 0 stars on GitHub - 1 maintainer
magicconvert 0.1.3
MagicConvert is a Python library that converts various document formats (PDF, DOCX, XLSX, PPTX, H...
3 versions - Latest release: 10 months ago - 15 downloads last month - 2 stars on GitHub - 1 maintainer
personnamenorm 0.2
unifying person names in different notations
1 version - Latest release: over 5 years ago - 1 dependent repository - 13 downloads last month - 2 stars on GitHub - 1 maintainer
fulfulde-stopwords 1.0
Stopwords for the Fulfulde language (Adamawa variant)
1 version - Latest release: about 2 months ago - 48 downloads last month - 1 maintainer