An open API service providing package, version and dependency metadata of many open source software ecosystems and registries.

pypi.org "text-processing" keyword

View the packages on the pypi.org package registry that are tagged with the "text-processing" keyword.

Top 1.9% on pypi.org
pythainlp 5.1.1
Thai Natural Language Processing library
113 versions - Latest release: 19 days ago - 37 dependent packages - 183 dependent repositories - 357 thousand downloads last month - 1,026 stars on GitHub - 2 maintainers
wikiwho 1.0.3
An algorithm to identify authorship and editor interactions in Wiki revisioned content.
1 version - Latest release: about 6 years ago - 1 dependent repository - 69 downloads last month - 31 stars on GitHub - 1 maintainer
Top 1.4% on pypi.org
pyarabic 0.6.15 💰
Arabic text tools for Python
18 versions - Latest release: almost 3 years ago - 7 dependent packages - 34 dependent repositories - 481 thousand downloads last month - 439 stars on GitHub - 1 maintainer
Top 1.1% on pypi.org
pyparsing 3.2.3 💰
pyparsing module - Classes and methods to define and execute parsing grammars
81 versions - Latest release: 25 days ago - 1,663 dependent packages - 264,180 dependent repositories - 154 million downloads last month - 2,216 stars on GitHub - 1 maintainer
dygest 0.6.1
DYGEST: Document Insights Generator
7 versions - Latest release: 1 day ago - 264 downloads last month - 3 stars on GitHub - 1 maintainer
pyregexp 0.3.1
Simple regex library
13 versions - Latest release: almost 3 years ago - 1 dependent repository - 555 downloads last month - 12 stars on GitHub - 1 maintainer
kkltk 1.0
kkltk is a toolkit designed for Kinyarwanda and Kirundi languages processing
1 version - Latest release: over 4 years ago - 1 dependent repository - 29 downloads last month - 1 star on GitHub - 1 maintainer
Top 1.4% on pypi.org
pymupdf 1.25.5
A high performance Python library for data extraction, analysis, conversion & manipulation of PDF...
131 versions - Latest release: 18 days ago - 206 dependent packages - 1,798 dependent repositories - 8.36 million downloads last month - 6,889 stars on GitHub - 1 maintainer
pdfautonup 1.11.0
Convert PDF files to 'n-up' PDF files, guessing the output layout.
23 versions - Latest release: 4 months ago - 1 dependent package - 1 dependent repository - 946 downloads last month - 6,889 stars on GitHub - 1 maintainer
aqpymupdf 1.23.7
A high performance Python library for data extraction, analysis, conversion & manipulation of PDF...
1 version - Latest release: about 1 year ago - 67 downloads last month - 6,889 stars on GitHub - 1 maintainer
pdfmod 0.1.5
A tool for PDF file manipulation.
1 version - Latest release: 5 months ago - 62 downloads last month - 6,368 stars on GitHub - 1 maintainer
huggingface-text-data-analyzer 1.1.0
A comprehensive tool for analyzing text datasets from HuggingFace's datasets library
3 versions - Latest release: 4 months ago - 122 downloads last month - 6 stars on GitHub - 1 maintainer
Top 1.7% on pypi.org
konoha 5.5.6 💰
Add your description here
28 versions - Latest release: 11 months ago - 3 dependent packages - 134 dependent repositories - 92.6 thousand downloads last month - 241 stars on GitHub - 1 maintainer
oneai-stage 0.0.1
NLP as a Service
1 version - Latest release: almost 3 years ago - 1 dependent repository - 57 downloads last month - 38 stars on GitHub - 1 maintainer
textform 0.11.0
A text shaping package.
7 versions - Latest release: over 3 years ago - 1 dependent repository - 293 downloads last month - 7 stars on GitHub - 1 maintainer
turkish-twitter-preprocess 0.0.7
A lightweight Python package to pre-process Turkish Twitter statuses (tweets).
7 versions - Latest release: about 4 years ago - 1 dependent repository - 314 downloads last month - 7 stars on GitHub - 1 maintainer
cascajal 0.1
PDF generation for Hieroglyph presentations.
1 version - Latest release: about 11 years ago - 2 dependent repositories - 43 downloads last month - 1 star on GitHub - 1 maintainer
texturizer 0.1.9
Python command line application to add text features to a CSV or TSV dataset.
8 versions - Latest release: about 3 years ago - 1 dependent repository - 227 downloads last month - 4 stars on GitHub - 1 maintainer
neviseh 0.1.3
Simple text processing tools in Persian
3 versions - Latest release: about 7 years ago - 1 dependent repository - 67 downloads last month - 0 stars on GitHub - 1 maintainer
diff-match-patch-cython 20121119
The Diff Match and Patch libraries offer robust algorithms to perform the operations required for...
1 version - Latest release: over 9 years ago - 2 dependent repositories - 27 downloads last month - 6,733 stars on GitHub - 2 maintainers
odin-ai 1.2.5
Deep learning for research and production
6 versions - Latest release: almost 5 years ago - 1 dependent repository - 200 downloads last month - 22 stars on GitHub - 1 maintainer
cutters 0.1.4
A rule based sentence segmentation library.
5 versions - Latest release: almost 2 years ago - 1 dependent repository - 875 downloads last month - 13 stars on GitHub - 1 maintainer
embedisualization 0.4
Visualization of text embeddings/vectorization with clustering
4 versions - Latest release: almost 7 years ago - 1 dependent repository - 162 downloads last month - 2 stars on GitHub - 1 maintainer
blabla 0.2.2
Novoic linguistics feature extraction package.
4 versions - Latest release: over 4 years ago - 136 downloads last month - 32 stars on GitHub - 1 maintainer
Top 7.8% on pypi.org
humanreadable 0.4.0 💰
humanreadable is a Python library to convert human-readable values to other units.
13 versions - Latest release: over 1 year ago - 3 dependent packages - 34 dependent repositories - 58.7 thousand downloads last month - 15 stars on GitHub - 1 maintainer
nlp-pipeline 0.1.5
Pipelines and management structure for NLP analysis of a corpus of texts
43 versions - Latest release: 3 days ago - 765 downloads last month - 3 stars on GitHub - 1 maintainer
pytextrust 0.11.0
Library designed as a python wrapper to unleash Rust text processing power combined with Python
29 versions - Latest release: 7 months ago - 3.51 thousand downloads last month - 1 maintainer
sesdiff 0.3.2
Generates a shortest edit script (Myers' diff algorithm) to indicate how to get from the strings ...
2 versions - Latest release: 6 months ago - 1.46 thousand downloads last month - 7 stars on GitHub - 1 maintainer
epubparser 0.1.7
Parse ebooks, extracts chapters and contents.
8 versions - Latest release: about 1 month ago - 365 downloads last month - 1 star on GitHub - 1 maintainer
russian-names 0.1.2
Russian names generator
1 version - Latest release: almost 6 years ago - 2 dependent repositories - 1.63 thousand downloads last month - 24 stars on GitHub - 1 maintainer
paxter 0.6.11
Paxter is a document-first, text pre-processing mini-language toolchain, loosely inspired by @-ex...
20 versions - Latest release: over 4 years ago - 1 dependent repository - 719 downloads last month - 5 stars on GitHub - 1 maintainer
nlp-preprocessing 0.2.0
A Package for text preprocessing
14 versions - Latest release: over 4 years ago - 1 dependent repository - 471 downloads last month - 16 stars on GitHub - 1 maintainer
docdeid 1.0.0
Create your own document de-identifier using docdeid, a simple framework independent of language ...
25 versions - Latest release: over 1 year ago - 1 dependent package - 1 dependent repository - 2.02 thousand downloads last month - 3 stars on GitHub - 1 maintainer
dict-fr-au-dela 2021.9.9
EDITABLE French dictionaries from Laboratoire d'Automatique Documentaire et Linguistique (LADL)
2 versions - Latest release: over 3 years ago - 1 dependent repository - 264 downloads last month - 5 stars on GitHub - 1 maintainer
textnorm 1.2
A simple package for normalizing whitespace and Unicode composition forms in Python 3 strings
2 versions - Latest release: almost 5 years ago - 1 dependent package - 3 dependent repositories - 1.92 thousand downloads last month - 6 stars on GitHub - 1 maintainer
betterletter 1.2.1
Substitute alternative spellings of native characters (e.g. German umlauts [ae, oe, ue] etc. [ss]...
9 versions - Latest release: almost 2 years ago - 1 dependent repository - 363 downloads last month - 11 stars on GitHub - 1 maintainer
ssai 1.0.10
SuperSummarizeAI is a versatile Python tool designed to extract and summarize textual content. Wh...
9 versions - Latest release: over 1 year ago - 299 downloads last month - 12 stars on GitHub - 1 maintainer
Top 10.0% on pypi.org
oneai 0.9.89
NLP as a Service
119 versions - Latest release: over 1 year ago - 2 dependent repositories - 3.46 thousand downloads last month - 37 stars on GitHub - 1 maintainer
python-recode 0.2
A Python extension to convert files between character sets
2 versions - Latest release: over 1 year ago - 2 dependent repositories - 22 downloads last month - 0 stars on GitHub - 1 maintainer
codetypo 2.3.0
Fix common misspellings in text files
3 versions - Latest release: 7 months ago - 174 downloads last month - 0 stars on GitHub - 1 maintainer
cso-classifier 2.3.2
A light-weight Python app for classifying scientific documents with the topics from the Computer ...
7 versions - Latest release: over 5 years ago - 1 dependent repository - 326 downloads last month - 83 stars on GitHub - 1 maintainer
Top 9.1% on pypi.org
twitter-text-parser 3.0.0
A library to parse or validate Twitter texts properly
7 versions - Latest release: almost 2 years ago - 11 dependent repositories - 24.6 thousand downloads last month - 28 stars on GitHub - 1 maintainer
magicconvert 0.1.0
MagicConvert is a Python library that converts various document formats (PDF, DOCX, XLSX, PPTX, H...
2 versions - Latest release: 2 months ago - 106 downloads last month - 1 star on GitHub - 1 maintainer
kreuzberg 3.1.3
A text extraction library supporting PDFs, images, office documents and more
19 versions - Latest release: 9 days ago - 6.53 thousand downloads last month - 1,736 stars on GitHub - 1 maintainer
autocorpus 1.1.0
A tool to standardise text and table data extracted from full text publications.
1 version - Latest release: 3 months ago - 60 downloads last month - 21 stars on GitHub - 1 maintainer
chonkie 1.0.2
🦛 CHONK your texts with Chonkie ✨ - The no-nonsense chunking library
27 versions - Latest release: 11 days ago - 48.7 thousand downloads last month - 146 stars on GitHub - 2 maintainers
matcher-py 0.5.7
A high-performance matcher designed to solve LOGICAL and TEXT VARIATIONS problems in word matchin...
38 versions - Latest release: about 1 month ago - 9.89 thousand downloads last month - 15 stars on GitHub - 1 maintainer
contextgem 0.1.1
Easier and faster way to build LLM extraction workflows through powerful abstractions
3 versions - Latest release: 12 days ago - 338 downloads last month - 35 stars on GitHub - 1 maintainer
doc2term 0.1
A fast NLP tokenizer that detects tokens and removes duplicates and punctuation
1 version - Latest release: almost 4 years ago - 1 dependent repository - 33 downloads last month - 2 stars on GitHub - 1 maintainer
bloatectomy 0.0.12
Bloatectomy: a method for the identification and removal of duplicate text in the bloated notes o...
12 versions - Latest release: almost 5 years ago - 1 dependent repository - 523 downloads last month - 36 stars on GitHub - 2 maintainers
textvec 1.0.1
Supervised text features extraction
3 versions - Latest release: almost 7 years ago - 1 dependent repository - 120 downloads last month - 193 stars on GitHub - 1 maintainer
text-validator 0.3 💰
pluggable command-line tool for validating the formatting and orthography of text files
3 versions - Latest release: over 5 years ago - 5 dependent repositories - 130 downloads last month - 5 stars on GitHub - 1 maintainer
retexto 1.6.1
Fast text processing
30 versions - Latest release: over 4 years ago - 1 dependent repository - 513 downloads last month - 0 stars on GitHub - 1 maintainer
dhelp 0.0.5
DH Python tools for scraping web pages, pre-processing data, and performing nlp analysis quickly.
4 versions - Latest release: almost 7 years ago - 1 dependent repository - 135 downloads last month - 5 stars on GitHub - 1 maintainer
extract-drugs 1.3.0
A CLI for extracting drugs from text records
6 versions - Latest release: 12 months ago - 352 downloads last month - 3 stars on GitHub - 1 maintainer
Top 9.0% on pypi.org
fuzzychinese 0.1.5
A small package to fuzzy match Chinese words 中文模糊匹配
3 versions - Latest release: almost 6 years ago - 2 dependent repositories - 642 downloads last month - 85 stars on GitHub - 1 maintainer
akin 1.0.1
Akin is a Python library for detecting near-duplicate texts using min-hashing and locality sensit...
3 versions - Latest release: about 2 months ago - 1 dependent repository - 163 downloads last month - 8 stars on GitHub - 1 maintainer
mosheh 1.3.4
Mosheh, a tool for creating docs for projects, from Python to Python.
10 versions - Latest release: 3 months ago - 380 downloads last month - 7 stars on GitHub - 1 maintainer
python-ucto 0.6.9
This is a Python binding to the tokenizer Ucto. Tokenisation is one of the first steps in almost a...
24 versions - Latest release: 4 months ago - 1 dependent package - 4 dependent repositories - 3.59 thousand downloads last month - 29 stars on GitHub - 1 maintainer
rb-tocase 1.3.2 💰
RB toCase is a Case converter.
2 versions - Latest release: over 3 years ago - 1 dependent repository - 104 downloads last month - 3 stars on GitHub - 1 maintainer
greek-normalisation 0.5.1 💰
Python 3 utilities for validating and normalising Ancient Greek text
6 versions - Latest release: almost 5 years ago - 4 dependent repositories - 271 downloads last month - 22 stars on GitHub - 1 maintainer
fsub 1.0.4
CLI SubRip editor
18 versions - Latest release: over 3 years ago - 1 dependent repository - 807 downloads last month - 0 stars on GitHub - 1 maintainer
Top 8.0% on pypi.org
tiny-tokenizer 3.4.0 💰
🌿 An easy-to-use Japanese Text Processing tool, which makes it possible to switch tokenizers with...
19 versions - Latest release: over 4 years ago - 16 dependent repositories - 428 downloads last month - 214 stars on GitHub - 1 maintainer
disseminate 2.3.9
A document processor and generation engine
9 versions - Latest release: almost 4 years ago - 1 dependent repository - 349 downloads last month - 5 stars on GitHub - 2 maintainers
smoothtext 0.3.2
A Python library for text readability analysis, supporting multiple languages.
19 versions - Latest release: about 1 month ago - 540 downloads last month - 1 star on GitHub - 1 maintainer
stramp 0.3.2
Blockchain-backed timestamp proof for structured document sections
3 versions - Latest release: almost 5 years ago - 1 dependent repository - 104 downloads last month - 4 stars on GitHub - 1 maintainer
strkernel 0.2
Collection of string kernels
1 version - Latest release: over 6 years ago - 1 dependent repository - 44 downloads last month - 17 stars on GitHub - 1 maintainer
colibricore 2.5.9
Colibri Core is an NLP tool as well as a C++ and Python library (all included in this package) fo...
38 versions - Latest release: almost 2 years ago - 6 dependent repositories - 1.05 thousand downloads last month - 126 stars on GitHub - 1 maintainer
yurenizer 0.2.2
A library for standardizing terms with spelling variations using a synonym dictionary.
18 versions - Latest release: 4 months ago - 647 downloads last month - 1 star on GitHub - 1 maintainer
maleo 0.0.5
Wrapper library for text cleansing, preprocessing in NLP
7 versions - Latest release: over 4 years ago - 1 dependent repository - 125 downloads last month - 17 stars on GitHub - 1 maintainer
thainlp 0.4.2
Thai NLP library
3 versions - Latest release: almost 6 years ago - 1 dependent repository - 85 downloads last month - 1,026 stars on GitHub - 1 maintainer
Top 2.6% on pypi.org
jaconv 0.4.0 💰
Pure-Python Japanese character interconverter for Hiragana, Katakana, Hankaku, Zenkaku and more
12 versions - Latest release: 9 months ago - 25 dependent packages - 198 dependent repositories - 1.25 million downloads last month - 325 stars on GitHub - 1 maintainer
dom-tree-sitter-language-pack 0.4.0
Extensive Language Pack for Tree-Sitter
1 version - Latest release: 2 months ago - 91 downloads last month - 104 stars on GitHub - 1 maintainer
tree-sitter-language-pack 0.7.1
Extensive Language Pack for Tree-Sitter
12 versions - Latest release: 10 days ago - 277 thousand downloads last month - 100 stars on GitHub - 1 maintainer
finglish3 1.4.8
Finglish-to-Persian converter.
1 version - Latest release: almost 7 years ago - 1 dependent repository - 31 downloads last month - 83 stars on GitHub - 1 maintainer
finglish 1.5.1
Finglish-to-Persian converter.
22 versions - Latest release: almost 5 years ago - 1 dependent repository - 522 downloads last month - 83 stars on GitHub - 1 maintainer
wetextprocessing 1.0.3
WeTextProcessing, including TN & ITN
28 versions - Latest release: 10 months ago - 2 dependent packages - 335 thousand downloads last month - 562 stars on GitHub - 2 maintainers
textmining3 1.1.0
Text Mining Utilities for Python 3
2 versions - Latest release: over 6 years ago - 1 dependent repository - 223 downloads last month - 1 star on GitHub - 1 maintainer
de-workflow 0.2.2
A ToolBox for fuzzily extracting drugs mentions from text.
12 versions - Latest release: almost 3 years ago - 1 dependent repository - 63 downloads last month - 3 stars on GitHub - 1 maintainer
nostril-detector 1.2.2
Nonsense String Evaluator
4 versions - Latest release: about 1 year ago - 16.5 thousand downloads last month - 194 stars on GitHub - 1 maintainer
trunajod 0.1.1
A python lib for readability analyses.
3 versions - Latest release: about 4 years ago - 1 dependent repository - 247 downloads last month - 29 stars on GitHub - 1 maintainer
pyrefo 0.4
a fast regex for object
4 versions - Latest release: about 5 years ago - 1 dependent repository - 149 downloads last month - 0 stars on GitHub - 1 maintainer
nepalikit 1.0.2
A Nepali language processing library
3 versions - Latest release: 9 months ago - 181 downloads last month - 7 stars on GitHub - 1 maintainer
l3wtransformer 0.3.0
A word hashing method based on vectors of letter n-grams. Currently transforms text into sequence...
2 versions - Latest release: over 6 years ago - 1 dependent repository - 45 downloads last month - 10 stars on GitHub - 1 maintainer
pytokencounter 1.7.0
A Python library for tokenizing text and counting tokens using various encoding schemes.
16 versions - Latest release: about 1 month ago - 667 downloads last month - 2 stars on GitHub - 1 maintainer
tregex-tobiasli 1.0.3
Wrapper for more functionality out of regex parse results.
4 versions - Latest release: over 5 years ago - 2 dependent repositories - 116 downloads last month - 0 stars on GitHub - 1 maintainer
primetext 0.2.2
package for indexing text datasets using prime number factorisation for fast word frequency analysis
4 versions - Latest release: over 8 years ago - 1 dependent repository - 70 downloads last month - 4 stars on GitHub - 1 maintainer
rs-bpe 0.1.0
A ridiculously fast Python BPE (Byte Pair Encoder) implementation written in Rust
1 version - Latest release: about 1 month ago - 1.91 thousand downloads last month - 1 star on GitHub - 1 maintainer
Top 9.4% on pypi.org
puristaa 2022.7.24
Puristaa (Finnish for compress) - shared prefix compression of ordered string sequences.
3 versions - Latest release: over 2 years ago - 1 dependent package - 2 dependent repositories - 1.15 thousand downloads last month - 2 stars on GitHub - 1 maintainer
indoxminer 0.1.5
Indox Data Extraction
19 versions - Latest release: 2 months ago - 632 downloads last month - 18 stars on GitHub - 2 maintainers
html5lib-truncation 0.1.0
Truncating HTML with html5lib filter
1 version - Latest release: about 10 years ago - 5 dependent repositories - 428 downloads last month - 11 stars on GitHub - 1 maintainer
natsulang 1.0.0b11
A text-processing language based on Python 3.
10 versions - Latest release: over 4 years ago - 1 dependent repository - 637 downloads last month - 8 stars on GitHub - 1 maintainer
textweaver 0.3.115
A FastAPI-based web server for working with LLMs, embedding models, and Pinecone Vector DB.
58 versions - Latest release: over 1 year ago - 1.82 thousand downloads last month - 2 stars on GitHub - 1 maintainer
mrsnippets 2.0.1
A complete collection of commonly used code Snippets in Python
6 versions - Latest release: almost 4 years ago - 1 dependent repository - 209 downloads last month - 2 stars on GitHub - 1 maintainer
prenlp 0.0.13
Preprocessing Library for Natural Language Processing
12 versions - Latest release: over 4 years ago - 1 dependent repository - 315 downloads last month - 159 stars on GitHub - 1 maintainer
texy 0.1.0
Supercharge text processing
5 versions - Latest release: over 2 years ago - 403 downloads last month - 0 stars on GitHub - 1 maintainer
mime-py 0.3.0
A text processing framework, inspired by Emacs lisp and keyboard macros.
1 version - Latest release: about 2 years ago - 28 downloads last month - 7 stars on GitHub - 1 maintainer
Top 2.3% on pypi.org
hazm 0.10.0
Persian NLP Toolkit
15 versions - Latest release: over 1 year ago - 8 dependent packages - 126 dependent repositories - 15.5 thousand downloads last month - 1,121 stars on GitHub - 1 maintainer
cinje 1.1.2
A Pythonic and ultra fast template engine DSL.
4 versions - Latest release: about 6 years ago - 4 dependent repositories - 298 downloads last month - 31 stars on GitHub - 1 maintainer
shekar 0.1.10
Simplifying Persian NLP for Everyone
9 versions - Latest release: about 1 month ago - 382 downloads last month - 3 stars on GitHub - 1 maintainer