An open API service providing package, version and dependency metadata of many open source software ecosystems and registries.

pypi.org "text processing" keyword

Top 1.9% on pypi.org
pythainlp 5.3.0
Thai Natural Language Processing library
117 versions - Latest release: about 23 hours ago - 37 dependent packages - 183 dependent repositories - 1.13 million downloads last month - 1,114 stars on GitHub - 2 maintainers
doti18n 0.9.1
Python library for loading localizations with dot access and pluralization.
22 versions - Latest release: about 8 hours ago - 884 downloads last month - 3 stars on GitHub - 1 maintainer
token-distance 0.2.5
Python library designed to perform fuzzy token matching within text documents. Utilizing advanced...
6 versions - Latest release: 12 months ago - 20 downloads last month - 0 stars on gitlab.com - 1 maintainer
codexcollector 0.2.0
A wrapper for common Python packages to build a text corpus with a single line of code.
2 versions - Latest release: about 2 months ago - 66 downloads last month - 1 maintainer
thaitextaug 0.0.4 💰
Thai Text Augmentation
15 versions - Latest release: almost 5 years ago - 1 dependent repository - 86 downloads last month - 5 stars on GitHub - 1 maintainer
Top 1.0% on pypi.org
textacy 0.13.0
NLP, before and after spaCy
32 versions - Latest release: almost 3 years ago - 18 dependent packages - 436 dependent repositories - 67.8 thousand downloads last month - 2,214 stars on GitHub - 1 maintainer
Top 7.9% on pypi.org
nemo-text-processing 1.1.0
NeMo text processing for ASR and TTS
14 versions - Latest release: over 1 year ago - 2 dependent packages - 1 dependent repository - 108 thousand downloads last month - 274 stars on GitHub - 1 maintainer
mathstring 0.1.0
English:
1 version - Latest release: 9 months ago - 14 downloads last month - 1 maintainer
fkscore 2.0.1
Flesch Kincaid readability scoring algorithm
7 versions - Latest release: about 2 years ago - 1 dependent repository - 604 downloads last month - 2 stars on GitHub - 1 maintainer
thai2transformers 0.1.2
Pretraining transformer based Thai language models
8 versions - Latest release: almost 5 years ago - 1 dependent repository - 104 downloads last month - 114 stars on GitHub - 1 maintainer
cleantextkit 0.1.1
A preprocessor which performs operations of lowering text, removing special characters and removi...
2 versions - Latest release: over 2 years ago - 30 downloads last month - 1 maintainer
Top 3.0% on pypi.org
quantulum3 0.9.2 💰
Extract quantities from unstructured text.
41 versions - Latest release: over 1 year ago - 8 dependent packages - 44 dependent repositories - 228 thousand downloads last month - 142 stars on GitHub - 1 maintainer
pypage 2.2.1
Light-weight Python Templating Engine
10 versions - Latest release: 11 months ago - 1 dependent package - 1 dependent repository - 34 downloads last month - 31 stars on GitHub - 1 maintainer
textformatter-plus 1.0.4
A powerful Python package for text formatting and validation. Includes utilities for text transfo...
4 versions - Latest release: 3 months ago - 36 downloads last month - 1 maintainer
nupunkt-rs 0.1.1
High-performance Rust implementation of nupunkt sentence/paragraph tokenization
2 versions - Latest release: 7 months ago - 1.94 thousand downloads last month - 0 stars on GitHub - 1 maintainer
zalgolib 0.2.2
A Python library for a _FULL_ Zalgo experience
4 versions - Latest release: over 2 years ago - 1 dependent package - 1 dependent repository - 19.5 thousand downloads last month - 6 stars on GitHub - 1 maintainer
aiwand 0.4.37
A simple AI toolkit for text processing using OpenAI and Gemini APIs
42 versions - Latest release: 5 months ago - 724 downloads last month - 0 stars on GitHub - 1 maintainer
biz-dfch-ste100parser 0.2.3
An ASD-STE100 (Simplified Technical English) parser.
8 versions - Latest release: 15 days ago - 241 downloads last month - 1 maintainer
intelli3text 0.2.7
Ingestion (web/PDF/DOCX/TXT), cleaning, paragraph-level LID (PT/EN/ES), and spaCy-based normaliza...
7 versions - Latest release: 5 months ago - 43 downloads last month - 1 maintainer
regex-buddy 1.0.0
Plain English to Regex - Your friendly regex helper
1 version - Latest release: about 2 months ago - 29 downloads last month - 1 maintainer
charboundary 0.5.0
Fast character-based boundary detection for sentences and paragraphs
13 versions - Latest release: 11 months ago - 263 downloads last month - 4 stars on GitHub - 1 maintainer
ingredient-slicer 1.2.21
Parses unstructured recipe ingredient text into standardized quantities, units, and foods
47 versions - Latest release: 11 months ago - 432 downloads last month - 1 maintainer
rdatools 0.1.7
tools for relational discourse analysis
2 versions - Latest release: over 8 years ago - 1 dependent repository - 7 downloads last month - 1 maintainer
chunklet-py 2.2.0
High-fidelity context-aware chunking and interactive visualization for RAG. Advanced segmentation...
8 versions - Latest release: 16 days ago - 494 downloads last month - 62 stars on GitHub - 1 maintainer
linesieve 1.0
An unholy blend of grep, sed, awk, and Python.
13 versions - Latest release: almost 3 years ago - 1 dependent repository - 78 downloads last month - 10 stars on GitHub - 1 maintainer
seanox-ai-nlp 1.3.0
Lightweight NLP components for semantic processing of domain-specific content.
5 versions - Latest release: 5 months ago - 38 downloads last month - 0 stars on GitHub - 1 maintainer
Top 3.0% on pypi.org
addheader 0.3.2
A command to manage a header section for a source code tree
9 versions - Latest release: about 3 years ago - 3 dependent packages - 19 dependent repositories - 18 thousand downloads last month - 1 star on GitHub - 1 maintainer
spacy-pythainlp 1.0
PyThaiNLP For spaCy
10 versions - Latest release: about 1 month ago - 1 dependent repository - 486 downloads last month - 13 stars on GitHub - 1 maintainer
wakong 1.1.1 💰
Wakong: An appropriate and robust masking algorithm for generating the training objective of text...
3 versions - Latest release: over 3 years ago - 1 dependent repository - 22 downloads last month - 3 stars on GitHub - 1 maintainer
wordtonumber 1.1.0
A Python library for converting words to numbers.
4 versions - Latest release: 11 months ago - 49 downloads last month - 1 star on GitHub - 1 maintainer
Top 9.7% on pypi.org
thai-nner 0.3
Thai Nested Named Entity Recognition
3 versions - Latest release: almost 4 years ago - 1 dependent package - 3 dependent repositories - 147 downloads last month - 47 stars on GitHub - 2 maintainers
processtext 0.1.7
An open-source Python package to process text data
10 versions - Latest release: about 2 years ago - 49 downloads last month - 5 stars on GitHub - 1 maintainer
flexi-nlp-tools 0.6.0
NLP toolkit based on the flexi-dict data structure, designed for efficient fuzzy search, with a f...
21 versions - Latest release: 7 months ago - 120 downloads last month - 1 maintainer
pythaitts 0.4.2
Open Source Thai Text-to-speech library in Python
8 versions - Latest release: about 1 month ago - 1 dependent repository - 686 downloads last month - 27 stars on GitHub - 1 maintainer
Top 9.4% on pypi.org
wordcloud-fa 0.1.10 💰
A wrapper for the wordcloud module for creating Persian (and other RTL language) word clouds.
10 versions - Latest release: over 3 years ago - 6 dependent repositories - 146 downloads last month - 145 stars on GitHub - 1 maintainer
corpuskit 0.1.1
Corpus analysis and processing toolkit
2 versions - Latest release: 10 months ago - 25 downloads last month - 1 maintainer
chunklet 1.4.0
A smart multilingual text chunker for LLMs, RAG, and beyond.
19 versions - Latest release: 7 months ago - 162 downloads last month - 23 stars on GitHub - 1 maintainer
quantulum 0.1.16
Extract quantities from unstructured text.
17 versions - Latest release: over 2 years ago - 1 dependent package - 4 dependent repositories - 54 downloads last month - 119 stars on GitHub - 1 maintainer
nlup 0.8
Core libraries for natural language processing
4 versions - Latest release: about 7 years ago - 11 dependent repositories - 6.1 thousand downloads last month - 10 stars on GitHub - 3 maintainers
vogo 0.1.0
Python library for accessible multimodal interfaces - voice, gesture, and command recognition
1 version - Latest release: 4 months ago - 15 downloads last month - 1 maintainer
lekcut 0.1
LEKCut (เล็ก คัด) is a Thai tokenization library that ports the deep learning model to the onnx m...
1 version - Latest release: over 3 years ago - 71 downloads last month - 7 stars on GitHub - 1 maintainer
lttl 2.1.0
LangTech Text Library (LTTL) for text processing and analysis
24 versions - Latest release: about 1 year ago - 1 dependent repository - 2.36 thousand downloads last month - 3 stars on GitHub - 1 maintainer
Top 6.6% on pypi.org
huspacy 0.12.1 💰
HuSpaCy: industrial strength Hungarian natural language processing
23 versions - Latest release: over 1 year ago - 1 dependent package - 6 dependent repositories - 3.82 thousand downloads last month - 142 stars on GitHub - 1 maintainer
delb 0.5.1
A library that provides an ergonomic model for XML encoded text documents (e.g. with TEI-XML).
34 versions - Latest release: about 1 year ago - 2 dependent packages - 4 dependent repositories - 643 downloads last month - 17 stars on GitHub - 1 maintainer
easynertag 0.2 💰
Easy tagging for annotating NER corpora
2 versions - Latest release: over 3 years ago - 24 downloads last month - 2 stars on GitHub - 1 maintainer
pyunormalize 17.0.0
A library for Unicode normalization (NFC, NFD, NFKC, NFKD) independent of Python's core Unicode d...
8 versions - Latest release: 5 months ago - 12 dependent packages - 8 dependent repositories - 4.85 million downloads last month - 9 stars on GitHub - 1 maintainer
multiel 0.5
Multilingual entity linking using the BELA model
5 versions - Latest release: over 2 years ago - 1 dependent package - 198 downloads last month - 12 stars on GitHub - 1 maintainer
fast-dedupe 0.1.1
Fast, Minimalist Text Deduplication Library for Python
2 versions - Latest release: 12 months ago - 24 downloads last month - 1 maintainer
nlp-wowool-sdk 3.6.0
Wowool SDK
1 version - Latest release: 19 days ago - 1 download last month - 1 maintainer
printb 1.0.2 💰
printb is a wrapper for print/input built-ins, that swaps string directions for BIDI languages.
1 version - Latest release: over 4 years ago - 1 dependent repository - 15 downloads last month - 0 stars on GitHub - 1 maintainer
gatenlp-ml-tner 0.1.0a1
Train and use transformer token classification models using tner
1 version - Latest release: over 3 years ago - 10 downloads last month - 0 stars on GitHub - 1 maintainer
hadal 0.0.3
Tool for mining and aligning parallel texts
3 versions - Latest release: over 2 years ago - 22 downloads last month - 6 stars on GitHub - 1 maintainer
Top 6.1% on pypi.org
wordaxe 1.0.1
Provide hyphenation for python programs and ReportLab paragraphs.
5 versions - Latest release: over 2 years ago - 1 dependent package - 4 dependent repositories - 1 maintainer
markdowncleaner 0.3.1
A tool for cleaning and formatting markdown documents
4 versions - Latest release: 2 months ago - 51.3 thousand downloads last month - 2 stars on GitHub - 1 maintainer
artless-template 0.6.3
The artless and minimalist templating for Python server-side rendering.
12 versions - Latest release: 7 months ago - 265 downloads last month - 6,288 stars on GitHub - 1 maintainer
data-filtering 0.1.21
A library to filter and deduplicate Q&A text datasets from CSV files.
4 versions - Latest release: 9 months ago - 40 downloads last month - 1 maintainer
tts-text-norm 1.0.0
Multilingual text normalization library supporting Chinese, Japanese, and English
1 version - Latest release: 4 months ago - 1 maintainer
anpe 1.1.3
Accurately extract complete noun phrases with customisation and structural output.
13 versions - Latest release: 10 months ago - 293 downloads last month - 0 stars on GitHub - 1 maintainer
ai-data-preprocessing-queue 1.7.0
Can be used to preprocess data before AI processing
6 versions - Latest release: 4 months ago - 385 downloads last month - 1 maintainer
arabicscript 0.1.4
Tools for Arabic script
4 versions - Latest release: over 9 years ago - 1 dependent repository - 519 downloads last month - 8 stars on GitHub - 1 maintainer
breame 0.1.2
Breame is a lightweight Python package with a number of tools to aid in the detection of words th...
3 versions - Latest release: over 4 years ago - 1 dependent repository - 8.46 thousand downloads last month - 16 stars on GitHub - 1 maintainer
jange 0.1.7
Easy NLP library for Python
8 versions - Latest release: over 4 years ago - 1 dependent repository - 30 downloads last month - 18 stars on GitHub - 1 maintainer
pug 0.1.22
Meta package to install the PDX Python User Group utilities.
11 versions - Latest release: almost 11 years ago - 8 dependent repositories - 90 downloads last month - 12 stars on GitHub - 1 maintainer
pyosis 0.2.1
Unofficial Python client for parsing OSIS (Open Scriptural Information Standard) files
4 versions - Latest release: 8 months ago - 149 downloads last month - 0 stars on GitHub - 1 maintainer
pewanalytics 1.1.1
Utilities for text processing and statistical analysis from Pew Research Center
5 versions - Latest release: about 4 years ago - 1 dependent repository - 31 downloads last month - 86 stars on GitHub - 1 maintainer
tex-untag 1.3.0
A script for removing all of a given markup tag from a set of TeX files.
6 versions - Latest release: about 4 years ago - 1 dependent repository - 39 downloads last month - 1 star on GitHub - 1 maintainer
nlp-helper 0.0.6
A small collection of NLP utility functions.
6 versions - Latest release: 9 months ago - 34 downloads last month - 1 maintainer
textdatasetcleaner 0.0.6
Pipeline for cleaning (preprocessing/normalizing) text datasets
4 versions - Latest release: about 5 years ago - 1 dependent repository - 49 downloads last month - 40 stars on GitHub - 1 maintainer
fastopic 1.0.1
FASTopic
8 versions - Latest release: 9 months ago - 1.21 thousand downloads last month - 115 stars on GitHub - 1 maintainer
word2num-converter 1.0.0
A Python library to convert number words (e.g., twenty-one) to numeric digits (e.g., 21) for Engl...
2 versions - Latest release: 10 months ago - 0 stars on GitHub - 1 maintainer
sculptor 0.2.0
Sculptor: Structuring unstructured data with LLMs
7 versions - Latest release: 11 months ago - 54 downloads last month - 34 stars on GitHub - 1 maintainer
pricetag 1.0.0
A pure-Python library for extracting price and currency information from unstructured text
1 version - Latest release: 7 months ago - 23 downloads last month - 0 stars on GitHub - 1 maintainer
thaibraille 0.1.dev2 💰
Thai Braille for Natural Language Processing.
3 versions - Latest release: almost 3 years ago - 27 downloads last month - 3 stars on GitHub - 1 maintainer
docdump 1.0.4
A package to extract text from common document types.
5 versions - Latest release: over 5 years ago - 1 dependent repository - 25 downloads last month - 0 stars on GitHub - 1 maintainer
tamilkavi 0.6.0
A command-line tool for exploring Tamil Kavithaigal.
5 versions - Latest release: about 2 months ago - 18 downloads last month - 0 stars on GitHub - 1 maintainer
intertext 0.0.1
tools for relational discourse analysis
1 version - Latest release: over 8 years ago - 1 dependent repository - 16 downloads last month - 1 maintainer
texterra 1.0.1
API for natural language processing.
2 versions - Latest release: over 8 years ago - 2 dependent repositories - 33 downloads last month - 1 maintainer
pylda2vec 1.0.0
Mixing Dirichlet Topic Models and Word Embeddings to Make lda2vec
2 versions - Latest release: about 7 years ago - 1 dependent repository - 24 downloads last month - 30 stars on GitHub - 1 maintainer
fstflow 0.1.0
A Python library to simplify the creation and manipulation of finite state transducers
1 version - Latest release: 4 months ago - 1 maintainer
bibleparser 0.0.2
Parse a mistranscribed dictated bible reference into a standard format
2 versions - Latest release: over 1 year ago - 50 downloads last month - 0 stars on GitHub - 1 maintainer
sculpt 0.1.35
Sculpt: Structuring unstructured data with LLMs
3 versions - Latest release: 11 months ago - 3.24 thousand downloads last month - 35 stars on GitHub - 2 maintainers
khamyo 0.3.0 💰
Thai abbreviation to full text library
4 versions - Latest release: over 1 year ago - 1 dependent package - 1 dependent repository - 192 downloads last month - 6 stars on GitHub - 1 maintainer
tainlp 0.0.1.dev0
Tai Natural Language Processing library
1 version - Latest release: almost 3 years ago - 9 downloads last month - 1 maintainer
nupunkt 0.6.0
Next-generation Punkt sentence and paragraph boundary detection with zero dependencies
6 versions - Latest release: 7 months ago - 4.04 thousand downloads last month - 19 stars on GitHub - 1 maintainer
sofairfilter 1.0.0
Tool for identifying candidate documents for software mention extraction.
1 version - Latest release: 8 months ago - 9 downloads last month - 1 maintainer
freq-frame 1.0.0
1 version - Latest release: almost 5 years ago - 1 dependent repository - 10 downloads last month - 1 maintainer
my-feature-package 0.1.0
Text formatting and validation utilities
1 version - Latest release: 3 months ago - 22 downloads last month - 1 maintainer
xylo.py 0.2.0
A powerful template engine with Python expression evaluation
10 versions - Latest release: 28 days ago - 398 downloads last month - 1 maintainer
gatenlp 1.0.8
GATE NLP implementation in Python.
29 versions - Latest release: over 3 years ago - 2 dependent repositories - 1.02 thousand downloads last month - 66 stars on GitHub - 3 maintainers
lingualab 3.5.10
A multilingual text and voice processing toolkit
10 versions - Latest release: 7 months ago - 37 downloads last month - 1 maintainer
dtbag 3.1.1
Data Tool Bag (dtbag) - A Python library for text processing, data cleaning, and similarity-based...
4 versions - Latest release: 3 months ago - 111 downloads last month - 1 maintainer
qante 0.0.5
qante - Query ANnotated TExt
5 versions - Latest release: over 2 years ago - 21 downloads last month - 5 stars on GitHub - 1 maintainer
Top 9.4% on pypi.org
laonlp 1.3.0 💰
Lao Natural Language Processing library
19 versions - Latest release: 2 months ago - 1 dependent package - 4 dependent repositories - 3.11 thousand downloads last month - 34 stars on GitHub - 1 maintainer
thaixtransformers 0.1.0
ThaiXtransformers: Use Pretraining RoBERTa based Thai language models from VISTEC-depa AI Researc...
1 version - Latest release: over 2 years ago - 40 downloads last month - 7 stars on GitHub - 1 maintainer
punjabi-stemmer 1.0.1
A Python library for stemming Punjabi language words, including preprocessing for noise removal.
2 versions - Latest release: almost 2 years ago - 29 downloads last month - 2 stars on GitHub - 1 maintainer
auto-mapper 0.1.2
An auto mapper that accepts a list of strings and a list of objects of the format {'code', 'name'}...
2 versions - Latest release: over 5 years ago - 1 dependent repository - 137 downloads last month - 1 maintainer
geomentions 0.0.1
A mini Python package for geotagging text and retrieving location info.
1 version - Latest release: about 1 year ago - 4 downloads last month - 1 star on GitHub - 1 maintainer
huspacy-nightly 0.11.0.dev261 💰
HuSpaCy: industrial strength Hungarian natural language processing
126 versions - Latest release: about 2 years ago - 1 dependent repository - 56 downloads last month - 155 stars on GitHub - 1 maintainer
durak-nlp 0.2.4
Durak: modular Turkish NLP preprocessing toolkit.
6 versions - Latest release: 4 months ago - 76 downloads last month - 0 stars on GitHub - 1 maintainer
num2geotext 0.0.1
A Python package for converting numbers and floats (up to 15 digits) into Georgian text,
1 version - Latest release: over 1 year ago - 10 downloads last month - 1 maintainer
Related Keywords
natural language processing (42), nlp (42), NLP (25), python (22), text analytics (20), localization (19), computational linguistics (17), Thai language (11), natural-language-processing (10), text (8), machine-learning (8), ThaiNLP (8), Thai NLP (7), regex (7), nlp-library (7), linguistics (7), text-processing (7), text mining (7), tokenization (6), ai (6), data science (6), parser (5), llm (5), text analysis (5), sentence boundary detection (5), information extraction (5), thai-nlp (5), thai-language (5), thai (5), text-mining (5), hacktoberfest (5), math (4), utilities (4), language (4), text normalization (4), spacy (4), information retrieval (4), data analysis (4), thai-nlp-library (4), parsing (4), python3 (4), lemmatization (4), data extraction (3), topic-modeling (3), named entity recognition (3), units (3), tagging (3), language processing (3), validation (3), machine learning (3), statistics (3), paragraph detection (3), information-extraction (3), Thai (3), normalization (3), structured data (3), regular expressions (3), rag (3), preprocessing (3), cli (3), xml (3), search (3), universal-dependencies (2), markup (2), text cleaning (2), topic-models (2), pythainlp (2), bible (2), word-embeddings (2), discourse analysis (2), network analysis (2), citation analysis (2), data sculpting (2), large language model (2), clustering (2), unstructured data (2), textmining (2), data transformation (2), pipeline (2), text to structured data (2), zotero (2), sbd (2), sentence splitting (2), pos tagging (2), science (2), i18n (2), PDF (2), ner (2), cleaning (2), artificial intelligence (2), speech recognition (2), word embeddings (2), word vectors (2), spacy model (2), dependency-parsing (2), hungarian (2), hunlp (2), morphological-analysis (2), named-entity-recognition (2), pos-tagger (2)