An open API service providing package, version and dependency metadata of many open source software ecosystems and registries.

pypi.org "text-processing" keyword

View the packages on the pypi.org package registry that are tagged with the "text-processing" keyword.

mrsnippets 2.0.1
A complete collection of commonly used code Snippets in Python
6 versions - Latest release: almost 5 years ago - 1 dependent repository - 50 downloads last month - 2 stars on GitHub - 1 maintainer
greek-normalisation 0.5.1 💰
Python 3 utilities for validating and normalising Ancient Greek text
6 versions - Latest release: over 5 years ago - 4 dependent repositories - 112 downloads last month - 22 stars on GitHub - 1 maintainer
chiecthuyenngoaixa 0.2.1
A utility library for processing Vietnamese texts
5 versions - Latest release: over 1 year ago - 1 dependent repository - 53 downloads last month - 5 stars on GitHub - 1 maintainer
pymupdf 1.27.1 (Top 1.4% on pypi.org)
A high performance Python library for data extraction, analysis, conversion & manipulation of PDF...
139 versions - Latest release: about 5 hours ago - 206 dependent packages - 1,798 dependent repositories - 41.9 million downloads last month - 8,706 stars on GitHub - 1 maintainer
l3wtransformer 0.3.0
A word hashing method based on vectors of letter n-grams. Currently transforms text into sequence...
2 versions - Latest release: over 7 years ago - 1 dependent repository - 9 downloads last month - 10 stars on GitHub - 1 maintainer
cword 0.1.0
A CLI spelling assistant powered by AI
1 version - Latest release: 8 months ago - 14 downloads last month - 2 stars on GitHub - 1 maintainer
wizardspell 1.0.0
Dictionary-based spell checking with Unicode-aware tokenization and light normalization. Supports...
1 version - Latest release: 6 months ago - 13 downloads last month - 1 star on GitHub - 1 maintainer
reliq 0.0.47
Python ctypes bindings for reliq
47 versions - Latest release: 14 days ago - 4.35 thousand downloads last month - 12 stars on GitHub - 1 maintainer
konoha 5.6.0 (Top 1.7% on pypi.org)
Add your description here
29 versions - Latest release: 10 months ago - 3 dependent packages - 134 dependent repositories - 94.1 thousand downloads last month - 260 stars on GitHub - 1 maintainer
pythainlp 5.2.0 (Top 1.9% on pypi.org)
Thai Natural Language Processing library
116 versions - Latest release: about 2 months ago - 37 dependent packages - 183 dependent repositories - 1.09 million downloads last month - 1,092 stars on GitHub - 2 maintainers
strip-bom 1.0.0
Strip UTF-8 byte order mark (BOM) from strings, bytes, streams, and files. Inspired by the popula...
1 version - Latest release: 11 days ago - 182 downloads last month
ruaccent-predictor 1.2.0
Russian stress accent prediction using Transformer model
2 versions - Latest release: 6 days ago
kyrgyz-normalizer 0.1.1
Kyrgyz text normalizer for NLP tasks
2 versions - Latest release: 5 days ago - 1 maintainer
cosmic-chunker 1.1.0
COSMIC: Concept-aware Semantic Meta-chunking with Intelligent Classification
1 version - Latest release: 12 days ago - 159 downloads last month
strkernel 0.2
Collection of string kernels
1 version - Latest release: over 7 years ago - 1 dependent repository - 18 downloads last month - 17 stars on GitHub - 1 maintainer
matcher-py 0.5.8
A high-performance matcher designed to solve LOGICAL and TEXT VARIATIONS problems in word matchin...
39 versions - Latest release: 6 months ago - 979 downloads last month - 15 stars on GitHub - 1 maintainer
texy 0.1.0
Supercharge text processing
5 versions - Latest release: about 3 years ago - 66 downloads last month - 0 stars on GitHub - 1 maintainer
dict-fr-dela 2021.8.27
French dictionaries from Laboratoire d'Automatique Documentaire et Linguistique (LADL)
1 version - Latest release: over 4 years ago - 1 dependent repository - 45 downloads last month - 0 stars on GitHub - 1 maintainer
pyparsing 3.3.2 💰 (Top 1.1% on pypi.org)
pyparsing - Classes and methods to define and execute parsing grammars
88 versions - Latest release: 22 days ago - 1,663 dependent packages - 264,180 dependent repositories - 262 million downloads last month - 2,216 stars on GitHub - 1 maintainer
rs-document 0.1.8
High-performance Rust implementation of LangChain's Document model and Unstructured.io's text cle...
10 versions - Latest release: 2 months ago - 4.38 thousand downloads last month - 1 star on GitHub - 1 maintainer
transpolibre 0.8.15 💰
Automate translation of gettext PO files using LibreTranslate, Ollama, and local models
9 versions - Latest release: 6 months ago - 52 downloads last month - 107 stars on GitHub - 1 maintainer
html5lib-truncation 0.1.0
Truncating HTML with html5lib filter
1 version - Latest release: almost 11 years ago - 5 dependent repositories - 221 downloads last month - 11 stars on GitHub - 1 maintainer
blabla 0.2.2
Novoic linguistics feature extraction package.
4 versions - Latest release: over 5 years ago - 48 downloads last month - 32 stars on GitHub - 1 maintainer
trelasticext 0.2.1
Elasticsearch Extensions for Hebrew and Multi-language Text Processing
3 versions - Latest release: 28 days ago - 14 downloads last month - 0 stars on GitHub - 1 maintainer
lingua-franca 0.4.3 (Top 5.2% on pypi.org)
Mycroft's multilingual text parsing and formatting library
9 versions - Latest release: over 3 years ago - 1 dependent package - 24 dependent repositories - 1.02 thousand downloads last month - 77 stars on GitHub - 1 maintainer
ultranlp 1.0.6
Ultra-fast, comprehensive NLP preprocessing library with advanced tokenization
6 versions - Latest release: 6 months ago - 62 downloads last month - 0 stars on GitHub - 1 maintainer
voxera 0.0.1
An Open-Source Persian Language Techs Toolkit with Python
1 version - Latest release: over 3 years ago - 25 downloads last month - 5 stars on GitHub - 1 maintainer
markover 0.7
Natural Language Generator with Markov
21 versions - Latest release: over 2 years ago - 1 dependent repository - 51 downloads last month - 27 stars on GitHub - 1 maintainer
textpipe 0.12.2 (Top 9.4% on pypi.org)
textpipe: clean and extract metadata from text
38 versions - Latest release: about 5 years ago - 2 dependent repositories - 424 downloads last month - 302 stars on GitHub - 3 maintainers
pdf2textbox 0.4.3
A PDF-to-text converter based on pdfminer2
14 versions - Latest release: almost 7 years ago - 4 dependent repositories - 118 downloads last month - 5 stars on GitHub - 1 maintainer
doc2term 0.1
A fast NLP tokenizer that detects tokens and removes duplicates and punctuation
1 version - Latest release: over 4 years ago - 1 dependent repository - 11 downloads last month - 2 stars on GitHub - 1 maintainer
catalyst-dkms 0.1.0
Domain Knowledge Management System - preprocessing toolset for heterogeneous text ingestion
1 version - Latest release: 4 months ago - 20 downloads last month - 1 maintainer
hamtaa-texttools 2.3.0
A high-level NLP toolkit built on top of modern LLMs.
57 versions - Latest release: 7 days ago - 650 downloads last month - 2 maintainers
natsulang 1.0.0b11
A text-processing language based on Python 3.
10 versions - Latest release: over 5 years ago - 1 dependent repository - 88 downloads last month - 8 stars on GitHub - 1 maintainer
textmining3 1.1.0
Text Mining Utilities for Python 3
2 versions - Latest release: over 7 years ago - 1 dependent repository - 52 downloads last month - 1 star on GitHub - 1 maintainer
vecclean 1.0.0
Production-ready text cleaning, deduplication, and vectorization pipeline with C++ acceleration
3 versions - Latest release: 7 months ago - 38 downloads last month - 1 maintainer
html-to-markdown 2.24.5
High-performance HTML to Markdown converter powered by Rust with a clean Python API
105 versions - Latest release: 10 days ago - 237 thousand downloads last month - 409 stars on GitHub
pyarabic 0.6.15 💰 (Top 1.4% on pypi.org)
Arabic text tools for Python
18 versions - Latest release: over 3 years ago - 7 dependent packages - 34 dependent repositories - 113 thousand downloads last month - 439 stars on GitHub - 1 maintainer
turkish-twitter-preprocess 0.0.7
A lightweight Python package to pre-process Turkish Twitter statuses (tweets).
7 versions - Latest release: about 5 years ago - 1 dependent repository - 71 downloads last month - 7 stars on GitHub - 1 maintainer
textcl 1.0.1
Text preprocessing package for use in NLP tasks
6 versions - Latest release: over 1 year ago - 1 dependent repository - 52 downloads last month - 11 stars on GitHub - 1 maintainer
trunajod 0.1.1
A python lib for readability analyses.
3 versions - Latest release: almost 5 years ago - 1 dependent repository - 91 downloads last month - 29 stars on GitHub - 1 maintainer
mytext 0.0.0
MyText: A Minimal AI-Powered Text Rewriting Tool
5 versions - Latest release: 3 months ago - 83 downloads last month - 1 maintainer
yurenizer 0.2.2
A library for standardizing terms with spelling variations using a synonym dictionary.
18 versions - Latest release: about 1 year ago - 170 downloads last month - 4 stars on GitHub - 1 maintainer
hdporncomics 0.0.13
A full api for hdporncomics
13 versions - Latest release: 3 months ago - 84 downloads last month - 0 stars on GitHub - 1 maintainer
sesdiff 0.3.2
Generates a shortest edit script (Myers' diff algorithm) to indicate how to get from the strings ...
2 versions - Latest release: over 1 year ago - 202 downloads last month - 7 stars on GitHub - 1 maintainer
utilsaxn 0.3.4
A modular set of data science utilities for EDA, cleaning, and more.
1 version - Latest release: 9 months ago - 20 downloads last month - 2 stars on GitHub - 1 maintainer
kallia 0.1.6
Semantic Document Processing Library
7 versions - Latest release: about 2 months ago - 96 downloads last month - 0 stars on GitHub - 1 maintainer
shekar 1.3.1
Simplifying Persian NLP for Modern Applications
36 versions - Latest release: about 2 months ago - 412 downloads last month - 52 stars on GitHub - 1 maintainer
octolingo 0.3.0
A Python package for translating large texts with advanced features including OCR support.
16 versions - Latest release: 10 months ago - 171 downloads last month - 1 maintainer
nametract 1.1.3
Simple and stupid name extraction
6 versions - Latest release: over 4 years ago - 1 dependent repository - 58 downloads last month - 0 stars on GitHub - 1 maintainer
piicloak 1.0.4
Enterprise-grade PII detection and anonymization API. Helps achieve GDPR/CCPA compliance. Support...
5 versions - Latest release: 20 days ago - 399 downloads last month
lingo-nlp-toolkit 0.2.2
Advanced NLP Toolkit - Lightweight, Fast, and Transformer-Ready
5 versions - Latest release: 6 months ago - 35 downloads last month - 0 stars on GitHub - 1 maintainer
webis-html 1.0.4
Webis HTML extraction tool
5 versions - Latest release: about 2 months ago - 84 downloads last month - 1 maintainer
konfuzio-sdk 0.3.47
Konfuzio Software Development Kit
530 versions - Latest release: 3 months ago - 1 dependent repository - 2.67 thousand downloads last month - 54 stars on GitHub - 1 maintainer
chunking-strategy 0.4.1
A comprehensive chunking library for text, documents, audio, video, and data streams (Linux and m...
4 versions - Latest release: 4 months ago - 52 downloads last month - 2 stars on GitHub - 1 maintainer
pomera-ai-commander 1.3.7
Text processing toolkit with 22 MCP tools for AI assistants - case transformation, encoding, hash...
16 versions - Latest release: 11 days ago - 1.44 thousand downloads last month - 0 stars on GitHub - 1 maintainer
morphify 0.1.2
Lightweight templating and value formatting utilities
1 version - Latest release: 4 months ago - 19 downloads last month - 0 stars on GitHub - 1 maintainer
dict-fr-abu 2021.8.27
French dictionaries from Association des Bibliophiles Universels (ABU)
2 versions - Latest release: over 4 years ago - 1 dependent repository - 41 downloads last month - 0 stars on GitHub - 1 maintainer
linesieve 1.0
An unholy blend of grep, sed, awk, and Python.
13 versions - Latest release: almost 3 years ago - 1 dependent repository - 78 downloads last month - 10 stars on GitHub - 1 maintainer
blobify 1.1.0
Package your entire codebase into a single text file for AI consumption
3 versions - Latest release: 6 months ago - 35 downloads last month - 0 stars on GitHub - 1 maintainer
amharic-text-processor 0.1.3
A modular toolkit for cleaning and normalizing Amharic text.
3 versions - Latest release: about 2 months ago - 35 downloads last month - 1 maintainer
akin 1.0.1
Akin is a Python library for detecting near-duplicate texts using min-hashing and locality sensit...
3 versions - Latest release: 12 months ago - 1 dependent repository - 59 downloads last month - 8 stars on GitHub - 1 maintainer
huggingface-text-data-analyzer 1.1.0
A comprehensive tool for analyzing text datasets from HuggingFace's datasets library
3 versions - Latest release: about 1 year ago - 31 downloads last month - 7 stars on GitHub - 1 maintainer
pasban 1.0.1
Pure Persian text processing and foreign word detection library
1 version - Latest release: 4 months ago - 17 downloads last month - 1 maintainer
gaspra 0.1.0a3
A fast Python tool for searching, diffing, and merging text
2 versions - Latest release: about 2 years ago - 28 downloads last month - 1 star on GitHub - 1 maintainer
pycoreux 0.1.1
A Python library providing shell-like utilities for file operations, text processing, and subproc...
2 versions - Latest release: 7 months ago - 25 downloads last month - 3 stars on GitHub - 1 maintainer
contextf 0.0.6
Efficient context builder
2 versions - Latest release: 4 months ago - 28 downloads last month - 0 stars on GitHub - 1 maintainer
pythainlp-rust-modules 0.2.2
pythainlp-rust-modules is now nlpo3
3 versions - Latest release: over 4 years ago - 1 dependent repository - 100 downloads last month - 36 stars on GitHub - 2 maintainers
puristaa 2022.7.24 (Top 9.4% on pypi.org)
Puristaa (Finnish for compress) - shared prefix compression of ordered string sequences.
3 versions - Latest release: over 3 years ago - 1 dependent package - 2 dependent repositories - 13 downloads last month - 2 stars on GitHub - 1 maintainer
kkltk 1.0
kkltk is a toolkit designed for Kinyarwanda and Kirundi language processing
1 version - Latest release: over 5 years ago - 1 dependent repository - 8 downloads last month - 1 star on GitHub - 1 maintainer
ripgrep 15.0.0 💰
ripgrep is a line-oriented search tool that recursively searches the current directory for a rege...
3 versions - Latest release: 4 months ago - 31.6 thousand downloads last month - 56,925 stars on GitHub - 1 maintainer
diff-match-patch-cython 20121119
The Diff Match and Patch libraries offer robust algorithms to perform the operations required for...
1 version - Latest release: about 10 years ago - 2 dependent repositories - 6 downloads last month - 7,925 stars on GitHub - 2 maintainers
wordwrap 0.2.4
A simple library for wrapping text to a fixed column width.
6 versions - Latest release: 4 months ago - 62 downloads last month - 1 maintainer
burmese-tokenizer 0.1.3
A simple tokenizer for Burmese text
3 versions - Latest release: 6 months ago - 33 downloads last month - 1 star on GitHub - 1 maintainer
stringtools 3.0.1
stringtools provides string operations, such as analysing, converting, generating, and validating.
22 versions - Latest release: about 3 years ago - 1 dependent repository - 189 downloads last month - 5 stars on GitHub - 1 maintainer
padatious 0.4.8 (Top 3.7% on pypi.org)
A neural network intent parser
25 versions - Latest release: over 5 years ago - 3 dependent packages - 47 dependent repositories - 1.28 thousand downloads last month - 158 stars on GitHub - 1 maintainer
dhelp 0.0.5
DH Python tools for scraping web pages, pre-processing data, and performing nlp analysis quickly.
4 versions - Latest release: over 7 years ago - 1 dependent repository - 50 downloads last month - 5 stars on GitHub - 1 maintainer
text2ics 0.1.2
A Python tool to convert unstructured text into an ICS calendar file using an LLM.
3 versions - Latest release: 7 months ago - 55 downloads last month - 0 stars on GitHub - 1 maintainer
furlanspellchecker 0.1.1
A comprehensive spell checker for the Friulian language with CLI and pipeline service.
2 versions - Latest release: 2 months ago - 35 downloads last month - 1 maintainer
pylocalvoice 0.1.1
A professional Python library for local voice and Hmong language processing
2 versions - Latest release: 4 months ago - 26 downloads last month - 0 stars on GitHub - 1 maintainer
morsecode-package 1.0.0
Python package for converting text to Morse code and vice versa
1 version - Latest release: over 1 year ago - 18 downloads last month - 0 stars on GitHub - 1 maintainer
flashtext2 1.1.0
A package for extracting keywords from large text very quickly (much faster than regex and the or...
5 versions - Latest release: over 1 year ago - 1 dependent package - 2.05 thousand downloads last month - 23 stars on GitHub - 1 maintainer
long2short 0.1.4
A flexible text summarization library to summarize long documents supporting multiple LLM providers
5 versions - Latest release: about 1 year ago - 51 downloads last month - 1 maintainer
wordcloud-fa 0.1.10 💰 (Top 9.4% on pypi.org)
A wrapper for the wordcloud module for creating Persian (and other RTL language) word clouds.
10 versions - Latest release: over 3 years ago - 6 dependent repositories - 125 downloads last month - 145 stars on GitHub - 1 maintainer
fleetfluid 0.1.6
AI Agent Functions for ETL/Data Processing
6 versions - Latest release: 5 months ago - 50 downloads last month - 0 stars on GitHub - 1 maintainer
nahiarhdnlp 1.5.3
Advanced Indonesian Natural Language Processing Library
34 versions - Latest release: about 1 month ago - 607 downloads last month - 1 maintainer
snipstr 1.0.3
A lightweight library for easy-to-use text truncation with a friendly interface.
7 versions - Latest release: about 1 month ago - 179 downloads last month - 0 stars on GitHub - 1 maintainer
txt2phrases 1.0.3
A comprehensive library for text processing, keyword extraction, and classification from PDF and ...
7 versions - Latest release: 3 months ago - 114 downloads last month - 0 stars on GitHub - 2 maintainers
char-index-mcp 0.2.1
A Model Context Protocol server for character-level index-based string manipulation
4 versions - Latest release: 3 months ago - 85 downloads last month - 1 maintainer
ihroteka-converter 1.1.2
A lightweight package for converting Markdown into Steam-compatible markup.
4 versions - Latest release: about 1 month ago - 144 downloads last month - 1 maintainer
magic-profanity 2.0.1
A Python library for detecting and censoring profanity in text
4 versions - Latest release: 9 months ago - 1.81 thousand downloads last month - 1 star on GitHub - 1 maintainer
hangulpy 1.3.0
A comprehensive Python library for Korean language processing, inspired by es-hangul
11 versions - Latest release: 3 months ago - 423 downloads last month - 1 star on GitHub - 1 maintainer
contextgem 0.19.4
Effortless LLM extraction from documents
42 versions - Latest release: about 2 months ago - 2.02 thousand downloads last month - 1,697 stars on GitHub - 1 maintainer
wetextprocessing 1.0.3
WeTextProcessing, including TN & ITN
28 versions - Latest release: over 1 year ago - 2 dependent packages - 97.9 thousand downloads last month - 694 stars on GitHub - 2 maintainers
rb-tocase 1.3.2 💰
RB toCase is a Case converter.
2 versions - Latest release: about 4 years ago - 1 dependent repository - 28 downloads last month - 3 stars on GitHub - 1 maintainer
rs-bpe 0.1.0
A ridiculously fast Python BPE (Byte Pair Encoder) implementation written in Rust
1 version - Latest release: 11 months ago - 5.68 thousand downloads last month - 22 stars on GitHub - 1 maintainer
persian 1.1.0 (Top 7.1% on pypi.org)
Fast, typed Python library for Persian language localization
10 versions - Latest release: 28 days ago - 1 dependent package - 7 dependent repositories - 1.14 thousand downloads last month - 180 stars on GitHub - 1 maintainer
twat-llm 2.7.5
LLM integration for twat
14 versions - Latest release: 11 months ago - 155 downloads last month - 1 star on GitHub - 1 maintainer
magicconvert 0.1.3
MagicConvert is a Python library that converts various document formats (PDF, DOCX, XLSX, PPTX, H...
3 versions - Latest release: 9 months ago - 121 downloads last month - 2 stars on GitHub - 1 maintainer
xia-diff-match-patch 0.0.3
Diff Match Patch is a high-performance library in multiple languages that manipulates plain text.
3 versions - Latest release: over 2 years ago - 1 dependent package - 57 downloads last month - 7,713 stars on GitHub - 1 maintainer