An open API service providing package, version, and dependency metadata for many open source software ecosystems and registries.

pypi.org "text processing" keyword

View the packages on the pypi.org package registry that are tagged with the "text processing" keyword.

Top 1.9% on pypi.org
pythainlp 5.1.2
Thai Natural Language Processing library
114 versions - Latest release: 3 months ago - 37 dependent packages - 183 dependent repositories - 658 thousand downloads last month - 1,053 stars on GitHub - 2 maintainers
lingualab 3.5.9
A multilingual text and voice processing toolkit
9 versions - Latest release: about 7 hours ago - 390 downloads last month - 1 maintainer
thai2transformers 0.1.2
Pretraining transformer based Thai language models
8 versions - Latest release: over 4 years ago - 1 dependent repository - 56 downloads last month - 114 stars on GitHub - 1 maintainer
arabicscript 0.1.4
Tools for Arabic script
4 versions - Latest release: almost 9 years ago - 1 dependent repository - 12 downloads last month - 8 stars on GitHub - 1 maintainer
Top 6.1% on pypi.org
wordaxe 1.0.1
Provide hyphenation for python programs and ReportLab paragraphs.
5 versions - Latest release: almost 2 years ago - 1 dependent package - 4 dependent repositories - 1 maintainer
auto-mapper 0.1.2
An auto mapper that accepts a list of strings and a list of objects of the format {'code', 'name'}...
2 versions - Latest release: about 5 years ago - 1 dependent repository - 84 downloads last month - 1 maintainer
Top 1.0% on pypi.org
textacy 0.13.0
NLP, before and after spaCy
32 versions - Latest release: over 2 years ago - 18 dependent packages - 436 dependent repositories - 30.5 thousand downloads last month - 2,214 stars on GitHub - 1 maintainer
tainlp 0.0.1.dev0
Tai Natural Language Processing library
1 version - Latest release: over 2 years ago - 9 downloads last month - 1 maintainer
freq-frame 1.0.0
1 version - Latest release: over 4 years ago - 1 dependent repository - 3 downloads last month - 1 maintainer
mathstring 0.1.0
English:
1 version - Latest release: about 2 months ago - 17 downloads last month - 1 maintainer
data-filtering 0.1.21
A library to filter and deduplicate Q&A text datasets from CSV files.
4 versions - Latest release: about 1 month ago - 31 downloads last month - 1 maintainer
Top 3.0% on pypi.org
quantulum3 0.9.2 💰
Extract quantities from unstructured text.
41 versions - Latest release: about 1 year ago - 8 dependent packages - 44 dependent repositories - 134 thousand downloads last month - 142 stars on GitHub - 1 maintainer
tex-untag 1.3.0
A script for removing all of a given markup tag from a set of TeX files.
6 versions - Latest release: over 3 years ago - 1 dependent repository - 29 downloads last month - 1 star on GitHub - 1 maintainer
quantulum 0.1.16
Extract quantities from unstructured text.
17 versions - Latest release: almost 2 years ago - 1 dependent package - 4 dependent repositories - 38 downloads last month - 119 stars on GitHub - 1 maintainer
aiwand 0.4.24
A simple AI toolkit for text processing using OpenAI and Gemini APIs
31 versions - Latest release: 3 days ago - 2.13 thousand downloads last month - 0 stars on GitHub - 1 maintainer
thaixtransformers 0.1.0
ThaiXtransformers: Use Pretraining RoBERTa based Thai language models from VISTEC-depa AI Researc...
1 version - Latest release: about 2 years ago - 70 downloads last month - 7 stars on GitHub - 1 maintainer
nlp-helper 0.0.6
A small collection of NLP utility functions.
6 versions - Latest release: about 1 month ago - 93 downloads last month - 1 maintainer
hanpud 0.1.dev0
Han Pud (ห่าน พูด): Thai super large generative model
1 version - Latest release: about 2 years ago - 10 downloads last month - 1 star on GitHub - 1 maintainer
ingredient-slicer 1.2.21
Parses unstructured recipe ingredient text into standardized quantities, units, and foods
47 versions - Latest release: 3 months ago - 335 downloads last month - 1 maintainer
sofairfilter 1.0.0
Tool for identifying candidate documents for software mention extraction.
1 version - Latest release: 3 days ago
minification-station 0.1.4
Designed to process and combine multiple files within a specified directory into a single output ...
5 versions - Latest release: 10 months ago - 11 downloads last month - 0 stars on GitHub - 1 maintainer
huspacy-nightly 0.11.0.dev261 💰
HuSpaCy: industrial strength Hungarian natural language processing
126 versions - Latest release: over 1 year ago - 1 dependent repository - 281 downloads last month - 155 stars on GitHub - 1 maintainer
texterra 1.0.1
API for natural language processing.
2 versions - Latest release: over 7 years ago - 2 dependent repositories - 20 downloads last month - 1 maintainer
processtext 0.1.7
An open-source python package to process text data
10 versions - Latest release: over 1 year ago - 13 downloads last month - 4 stars on GitHub - 1 maintainer
flexi-nlp-tools 0.5.5
NLP toolkit based on the flexi-dict data structure, designed for efficient fuzzy search, with a f...
20 versions - Latest release: 6 months ago - 75 downloads last month - 1 maintainer
ai-data-preprocessing-queue 1.6.0
Can be used to preprocess data before AI processing
5 versions - Latest release: 6 months ago - 338 downloads last month - 1 maintainer
anpe 1.1.3
Accurately extract complete noun phrases with customisation and structural output.
13 versions - Latest release: 2 months ago - 183 downloads last month - 0 stars on GitHub - 1 maintainer
lekcut 0.1
LEKCut (เล็ก คัด) is a Thai tokenization library that ports the deep learning model to the onnx m...
1 version - Latest release: over 2 years ago - 19 downloads last month - 7 stars on GitHub - 1 maintainer
sotastream 1.0.1
Sotastream is a command line tool that augments a batch of text and produces infinite stream of r...
2 versions - Latest release: almost 2 years ago - 18 downloads last month - 20 stars on GitHub - 3 maintainers
spacy-pythainlp 0.1
PyThaiNLP For spaCy
9 versions - Latest release: over 2 years ago - 1 dependent repository - 327 downloads last month - 13 stars on GitHub - 1 maintainer
Top 9.4% on pypi.org
laonlp 1.2.0 💰
Lao Natural Language Processing library
18 versions - Latest release: about 1 year ago - 1 dependent package - 4 dependent repositories - 1.32 thousand downloads last month - 31 stars on GitHub - 1 maintainer
Top 7.9% on pypi.org
nemo-text-processing 1.1.0
NeMo text processing for ASR and TTS
14 versions - Latest release: 11 months ago - 2 dependent packages - 1 dependent repository - 39.6 thousand downloads last month - 274 stars on GitHub - 1 maintainer
sculpt 0.1.35
Sculpt: Structuring unstructured data with LLMs
3 versions - Latest release: 3 months ago - 36 downloads last month - 33 stars on GitHub - 2 maintainers
pewanalytics 1.1.1
Utilities for text processing and statistical analysis from Pew Research Center
5 versions - Latest release: over 3 years ago - 1 dependent repository - 35 downloads last month - 84 stars on GitHub - 1 maintainer
regexa 0.1.1
A modern, full-featured regex library for Python
2 versions - Latest release: 8 months ago - 11 downloads last month - 0 stars on GitHub - 1 maintainer
artless-template 0.6.2
The artless and minimalist templating for Python server-side rendering.
11 versions - Latest release: 7 days ago - 173 downloads last month - 6,288 stars on GitHub - 1 maintainer
sculptor 0.2.0
Sculptor: Structuring unstructured data with LLMs
7 versions - Latest release: 3 months ago - 272 downloads last month - 33 stars on GitHub - 1 maintainer
word2num-converter 1.0.0
A Python library to convert number words (e.g., twenty-one) to numeric digits (e.g., 21) for Engl...
2 versions - Latest release: 2 months ago - 0 stars on GitHub - 1 maintainer
hassans-frame 0.0.0
1 version - Latest release: over 4 years ago - 1 dependent repository - 3 downloads last month - 1 maintainer
multiel 0.5
Multilingual Entity Linking model by BELA model
5 versions - Latest release: about 2 years ago - 1 dependent package - 135 downloads last month - 12 stars on GitHub - 1 maintainer
intertext 0.0.1
tools for relational discourse analysis
1 version - Latest release: over 7 years ago - 1 dependent repository - 9 downloads last month - 1 maintainer
pug 0.1.22
Meta package to install the PDX Python User Group utilities.
11 versions - Latest release: over 10 years ago - 8 dependent repositories - 46 downloads last month - 12 stars on GitHub - 1 maintainer
fixthaipdf 0.2.1 💰
Fix Thai PDF Text
3 versions - Latest release: over 1 year ago - 240 downloads last month - 33 stars on GitHub - 1 maintainer
num2geotext 0.0.1
A Python package for converting numbers and floats (up to 15 digits) into Georgian text,
1 version - Latest release: 10 months ago - 15 downloads last month - 1 maintainer
qante 0.0.5
qante - Query ANnotated TExt
5 versions - Latest release: almost 2 years ago - 10 downloads last month - 5 stars on GitHub - 1 maintainer
doti18n 0.3.0
Python library for loading YAML localizations with dot access and pluralization.
3 versions - Latest release: 9 days ago - 115 downloads last month - 0 stars on GitHub - 1 maintainer
tamilkavi 0.5.0
A command-line tool for exploring Tamil Kavithaigal.
4 versions - Latest release: 3 months ago - 31 downloads last month - 0 stars on GitHub - 1 maintainer
punjabi-stemmer 1.0.1
A Python library for stemming Punjabi language words, including preprocessing for noise removal.
2 versions - Latest release: over 1 year ago - 10 downloads last month - 2 stars on GitHub - 1 maintainer
rebnf 0.9
ReBNF: Regexes for Extended Backus-Naur Form (EBNF)
7 versions - Latest release: about 2 years ago - 14 downloads last month - 0 stars on gitlab.com - 1 maintainer
fast-dedupe 0.1.1
Fast, Minimalist Text Deduplication Library for Python
2 versions - Latest release: 5 months ago - 18 downloads last month - 1 maintainer
jange 0.1.7
Easy NLP library for Python
8 versions - Latest release: almost 4 years ago - 1 dependent repository - 30 downloads last month - 17 stars on GitHub - 1 maintainer
breame 0.1.2
Breame is a lightweight Python package with a number of tools to aid in the detection of words th...
3 versions - Latest release: almost 4 years ago - 1 dependent repository - 2.78 thousand downloads last month - 16 stars on GitHub - 1 maintainer
delb 0.5.1
A library that provides an ergonomic model for XML encoded text documents (e.g. with TEI-XML).
31 versions - Latest release: 7 months ago - 2 dependent packages - 4 dependent repositories - 522 downloads last month - 17 stars on GitHub - 1 maintainer
nupunkt 0.5.1
Next-generation Punkt sentence and paragraph boundary detection with zero dependencies
5 versions - Latest release: 4 months ago - 224 downloads last month - 17 stars on GitHub - 1 maintainer
linesieve 1.0
An unholy blend of grep, sed, awk, and Python.
13 versions - Latest release: over 2 years ago - 1 dependent repository - 108 downloads last month - 9 stars on GitHub - 1 maintainer
pylda2vec 1.0.0
Mixing Dirichlet Topic Models and Word Embeddings to Make lda2vec
2 versions - Latest release: over 6 years ago - 1 dependent repository - 22 downloads last month - 30 stars on GitHub - 1 maintainer
wakong 1.1.1 💰
Wakong: An appropriate and robust masking algorithm for generating the training objective of text...
3 versions - Latest release: almost 3 years ago - 1 dependent repository - 22 downloads last month - 3 stars on GitHub - 1 maintainer
pyosis 0.2.1
Unofficial Python client for parsing OSIS (Open Scriptural Information Standard) files
4 versions - Latest release: 15 days ago - 40 downloads last month - 0 stars on GitHub - 1 maintainer
docdump 1.0.4
A package to extract text from common document types.
5 versions - Latest release: over 4 years ago - 1 dependent repository - 11 downloads last month - 0 stars on GitHub - 1 maintainer
fastopic 1.0.1
FASTopic
8 versions - Latest release: about 2 months ago - 1.25 thousand downloads last month - 110 stars on GitHub - 1 maintainer
pypage 2.2.1
Light-weight Python Templating Engine
10 versions - Latest release: 3 months ago - 1 dependent package - 1 dependent repository - 421 downloads last month - 30 stars on GitHub - 1 maintainer
charboundary 0.5.0
Fast character-based boundary detection for sentences and paragraphs
13 versions - Latest release: 4 months ago - 90 downloads last month - 3 stars on GitHub - 1 maintainer
gatenlp 1.0.8
GATE NLP implementation in Python.
29 versions - Latest release: over 2 years ago - 2 dependent repositories - 3.43 thousand downloads last month - 66 stars on GitHub - 3 maintainers
wordtonumber 1.1.0
A Python library for converting words to numbers.
4 versions - Latest release: 4 months ago - 212 downloads last month - 1 star on GitHub - 1 maintainer
codextextpipe 0.1.3
All-in-one tool for text processing
2 versions - Latest release: 3 months ago - 21 downloads last month - 0 stars on GitHub - 1 maintainer
pythaisa 0.2.1
Python Thai Sentiment Analysis
2 versions - Latest release: almost 6 years ago - 1 dependent repository - 18 downloads last month - 14 stars on GitHub - 1 maintainer
hadal 0.0.3
Tool for mining and aligning parallel texts
3 versions - Latest release: over 1 year ago - 21 downloads last month - 6 stars on GitHub - 1 maintainer
thaitextaug 0.0.4 💰
Thai Text Augmentation
15 versions - Latest release: about 4 years ago - 1 dependent repository - 52 downloads last month - 5 stars on GitHub - 1 maintainer
khamyo 0.3.0 💰
Thai abbreviation to full text library
4 versions - Latest release: 11 months ago - 1 dependent package - 1 dependent repository - 352 downloads last month - 6 stars on GitHub - 1 maintainer
lttl 2.1.0
LangTech Text Library (LTTL) for text processing and analysis
24 versions - Latest release: 6 months ago - 1 dependent repository - 7.31 thousand downloads last month - 3 stars on GitHub - 1 maintainer
cleantextkit 0.1.1
A preprocessor which performs operations of lowering text, removing special characters and removi...
2 versions - Latest release: almost 2 years ago - 22 downloads last month - 1 maintainer
printb 1.0.2 💰
printb is a wrapper for print/input built-ins, that swaps string directions for BIDI languages.
1 version - Latest release: almost 4 years ago - 1 dependent repository - 11 downloads last month - 0 stars on GitHub - 1 maintainer
geomentions 0.0.1
A mini Python package for geotagging text and retrieving location info.
1 version - Latest release: 5 months ago - 10 downloads last month - 1 star on GitHub - 1 maintainer
ttg 0.1.dev3
Thai Text Generator library
3 versions - Latest release: about 5 years ago - 1 dependent repository - 39 downloads last month - 4 stars on GitHub - 1 maintainer
bibleparser 0.0.2
Parse a mistranscribed dictated bible reference into a standard format
2 versions - Latest release: 9 months ago - 28 downloads last month - 0 stars on GitHub - 1 maintainer
Top 9.4% on pypi.org
wordcloud-fa 0.1.10 💰
A wrapper for the wordcloud module for creating Persian (and other RTL-language) word clouds.
10 versions - Latest release: almost 3 years ago - 6 dependent repositories - 144 downloads last month - 145 stars on GitHub - 1 maintainer
zalgolib 0.2.2
A Python library for a _FULL_ Zalgo experience
4 versions - Latest release: over 1 year ago - 1 dependent package - 1 dependent repository - 15.1 thousand downloads last month - 5 stars on GitHub - 1 maintainer
gatenlp-ml-tner 0.1.0a1
Train and use transformer token classification models using tner
1 version - Latest release: about 3 years ago - 11 downloads last month - 0 stars on GitHub - 1 maintainer
markdowncleaner 0.2.0
A tool for cleaning and formatting markdown documents
2 versions - Latest release: 5 months ago - 34 downloads last month - 0 stars on GitHub - 1 maintainer
smartscalc 1.0.1
A library to calculate mathematical equations from text input. مكتبة تتيح لك حساب المعادلات الريا...
2 versions - Latest release: about 2 months ago - 228 downloads last month - 0 stars on GitHub - 1 maintainer
nlup 0.8
Core libraries for natural language processing
4 versions - Latest release: over 6 years ago - 11 dependent repositories - 6.03 thousand downloads last month - 10 stars on GitHub - 3 maintainers
Top 9.7% on pypi.org
thai-nner 0.3
Thai Nested Named Entity Recognition
3 versions - Latest release: about 3 years ago - 1 dependent package - 3 dependent repositories - 333 downloads last month - 46 stars on GitHub - 2 maintainers
rdatools 0.1.7
tools for relational discourse analysis
2 versions - Latest release: almost 8 years ago - 1 dependent repository - 14 downloads last month - 1 maintainer
textdatasetcleaner 0.0.6
Pipeline for cleaning (preprocessing/normalizing) text datasets
4 versions - Latest release: over 4 years ago - 1 dependent repository - 17 downloads last month - 40 stars on GitHub - 1 maintainer
easynertag 0.2 💰
Easy tagging for annotating NER corpora
2 versions - Latest release: almost 3 years ago - 24 downloads last month - 2 stars on GitHub - 1 maintainer
freqframe 1.0.0
1 version - Latest release: over 4 years ago - 1 dependent repository - 11 downloads last month - 1 maintainer
thaibraille 0.1.dev2 💰
Thai Braille for Natural Language Processing.
3 versions - Latest release: over 2 years ago - 22 downloads last month - 3 stars on GitHub - 1 maintainer
slaviclean 0.0.6
Text filter designed to cleanse text of profanity and offensive language, specifically tailored f...
6 versions - Latest release: 6 months ago - 19 downloads last month - 1 maintainer
corpuskit 0.1.1
Corpus analysis and processing toolkit
2 versions - Latest release: about 2 months ago - 66 downloads last month - 1 maintainer
Top 3.0% on pypi.org
addheader 0.3.2
A command to manage a header section for a source code tree
9 versions - Latest release: over 2 years ago - 3 dependent packages - 19 dependent repositories - 12.1 thousand downloads last month - 1 star on GitHub - 1 maintainer
token-distance 0.2.5
Python library designed to perform fuzzy token matching within text documents. Utilizing advanced...
6 versions - Latest release: 4 months ago - 37 downloads last month - 0 stars on gitlab.com - 1 maintainer
pythaitts 0.3.0
Open Source Thai Text-to-speech library in Python
5 versions - Latest release: over 1 year ago - 1 dependent repository - 218 downloads last month - 27 stars on GitHub - 1 maintainer
fkscore 2.0.1
Flesch Kincaid readability scoring algorithm
7 versions - Latest release: over 1 year ago - 1 dependent repository - 214 downloads last month - 2 stars on GitHub - 1 maintainer
Top 6.6% on pypi.org
huspacy 0.12.1 💰
HuSpaCy: industrial strength Hungarian natural language processing
23 versions - Latest release: 9 months ago - 1 dependent package - 6 dependent repositories - 1.35 thousand downloads last month - 142 stars on GitHub - 1 maintainer
pyrxg 1.0.0
Regular Expression Generator and Generaliser
1 version - Latest release: 10 months ago - 9 downloads last month - 0 stars on gitlab.com - 1 maintainer
ck-textprocessor 0.0.1 removed
A preprocessor which performs operations of lowering text, removing special characters and removi...
1 version - Latest release: almost 2 years ago - 1 maintainer
Related Keywords
natural language processing (37), nlp (33), NLP (21), text analytics (21), localization (19), python (19), computational linguistics (17), Thai language (11), natural-language-processing (9), ThaiNLP (8), machine-learning (7), nlp-library (7), text (7), Thai NLP (7), text-processing (6), text mining (6), text-mining (5), data science (5), linguistics (5), text analysis (5), thai-nlp (5), thai-language (5), thai (5), hacktoberfest (5), regex (5), thai-nlp-library (4), math (4), sentence boundary detection (4), information extraction (4), python3 (4), spacy (4), parsing (4), parser (4), data analysis (3), Thai (3), statistics (3), llm (3), named entity recognition (3), topic-modeling (3), lemmatization (3), tagging (3), information-extraction (3), language (3), search (3), tokenization (3), language processing (3), ai (3), pos-tagger (2), spacy-models (2), spacy-pipeline (2), text preprocessing (2), universal-dependencies (2), syntax (2), text cleaner (2), named-entity-recognition (2), morphological-analysis (2), hunlp (2), hungarian (2), dependency-parsing (2), spacy model (2), word vectors (2), word embeddings (2), ner (2), pos tagging (2), sentence splitting (2), sbd (2), Hungarian (2), huspacy (2), pythainlp (2), nlp tools (2), text to structured data (2), regular expressions (2), discourse analysis (2), network analysis (2), citation analysis (2), textmining (2), zotero (2), bibliometrics (2), scientometrics (2), neural net (2), science (2), bible (2), word-embeddings (2), deep-learning (2), paragraph detection (2), xml (2), clustering (2), text conversion (2), text normalization (2), linguistic tools (2), python-gatenlp (2), gatenlp (2), machine translation (2), machine learning (2), deep learning (2), artificial intelligence (2), tts (2), data sculpting (2), large language model (2), unstructured data (2)