Ecosyste.ms: Packages

An open API service providing package, version, and dependency metadata for many open source software ecosystems and registries.

pypi.org "tokenization" keyword

Top 0.1% on pypi.org
spacy 3.7.4 💰
Industrial-strength Natural Language Processing (NLP) in Python
206 versions - Latest release: 4 months ago - 873 dependent packages - 15,793 dependent repositories - 13.5 million downloads last month - 28,659 stars on GitHub - 3 maintainers
ciseau 1.0.1
Word and sentence tokenization.
2 versions - Latest release: over 6 years ago - 8 dependent repositories - 124 downloads last month - 13 stars on GitHub - 1 maintainer
Top 6.0% on pypi.org
mosestokenizer 1.2.1
Wrappers for several pre-processing scripts from the Moses toolkit.
5 versions - Latest release: over 2 years ago - 9 dependent packages - 74 dependent repositories - 44.4 thousand downloads last month - 18 stars on GitHub - 1 maintainer
hangul-korean 1.0rc2
Word segmentation for the Korean Language
2 versions - Latest release: over 3 years ago - 27 downloads last month - 1 maintainer
ipa-core 0.1.3
NLP Preprocessing Pipeline Wrappers
4 versions - Latest release: about 1 year ago - 57 downloads last month - 13 stars on GitHub - 1 maintainer
nlpbrl 1.0.1
NLP algorithm integration package
5 versions - Latest release: over 1 year ago - 34 downloads last month - 0 stars on GitHub - 1 maintainer
bigrams 0.1.2
Simply create (N)grams
3 versions - Latest release: over 1 year ago - 46 downloads last month - 6 stars on GitHub - 1 maintainer
textmate-grammar-python 0.5.2
A lexer and tokenizer for grammar files as defined by TextMate and used in VSCode, implemented in...
9 versions - Latest release: 22 days ago - 1 dependent package - 235 downloads last month - 7 stars on GitHub - 1 maintainer
tivars 0.9.0
A library for interacting with TI-(e)z80 (82/83/84 series) calculator files
9 versions - Latest release: about 1 month ago - 102 downloads last month - 10 stars on GitHub - 1 maintainer
dango 0.0.1
An easy to use tokenizer for Japanese text, aimed at language learners and non-linguists
2 versions - Latest release: over 2 years ago - 3 dependent repositories - 302 downloads last month - 12 stars on GitHub - 1 maintainer
Top 5.5% on pypi.org
simplemma 0.9.1
A simple multilingual lemmatizer for Python.
15 versions - Latest release: over 1 year ago - 6 dependent packages - 25 dependent repositories - 10.9 thousand downloads last month - 129 stars on GitHub - 1 maintainer
rusyll 0.1.1
Splitting Russian words into phonetic syllables
1 version - Latest release: almost 4 years ago - 2 dependent repositories - 44 downloads last month - 6 stars on GitHub - 1 maintainer
reason 1.0.7
Natural language processing toolbox
19 versions - Latest release: 7 months ago - 9 dependent repositories - 199 downloads last month - 3 stars on GitHub - 1 maintainer
nlp-preprocessing-wrappers 0.1.3
NLP Preprocessing Pipeline Wrappers
2 versions - Latest release: about 2 years ago - 1 dependent repository - 14 downloads last month - 1 maintainer
Top 2.4% on pypi.org
zhon 2.0.2
Zhon provides constants used in Chinese text processing.
14 versions - Latest release: 11 months ago - 7 dependent packages - 159 dependent repositories - 60.4 thousand downloads last month - 349 stars on GitHub - 1 maintainer
spag 1.0.0a0
A module containing scanner (regular expression) and parser (BNF) compilers as well as a base gen...
1 version - Latest release: over 5 years ago - 1 dependent repository - 9 downloads last month - 8 stars on GitHub - 1 maintainer
Top 6.9% on pypi.org
text2text 1.4.4
Text2Text: Crosslingual NLP/G toolkit
142 versions - Latest release: 3 months ago - 4 dependent repositories - 1.08 thousand downloads last month - 274 stars on GitHub - 1 maintainer
sentence-tk-checker 0.0.2
Checks output of an English sentence tokenizer and modifies the output according to default or us...
2 versions - Latest release: almost 2 years ago - 17 downloads last month - 0 stars on GitHub - 1 maintainer
hanzinlp 0.1.0
An NLP package specifically for Chinese
1 version - Latest release: 8 months ago - 29 downloads last month - 15 stars on GitHub - 1 maintainer
Top 3.2% on pypi.org
razdel 0.5.0
Splits Russian text into tokens, sentences, and sections. Rule-based.
5 versions - Latest release: about 4 years ago - 7 dependent packages - 105 dependent repositories - 15.8 thousand downloads last month - 244 stars on GitHub - 1 maintainer
rftokenizer 2.2.0
A character-wise tokenizer for morphologically rich languages
5 versions - Latest release: 4 months ago - 1 dependent repository - 39 downloads last month - 26 stars on GitHub - 1 maintainer
taibun 1.1.2
Taiwanese Hokkien Transliterator and Tokeniser
12 versions - Latest release: 19 days ago - 498 downloads last month - 10 stars on GitHub - 1 maintainer
nlpashto 0.0.23
Pashto Natural Language Processing Toolkit
22 versions - Latest release: 8 months ago - 263 downloads last month - 13 stars on GitHub - 1 maintainer
miditok-for-musiclang 0.0.1
A convenient MIDI tokenizer for Deep Learning networks, with multiple encoding strategies
1 version - Latest release: 6 months ago - 1 dependent package - 11 downloads last month - 1 star on GitHub - 1 maintainer
mbsp-for-python 1.4
Memory-Based Shallow Parser for Python
2 versions - Latest release: 10 months ago - 1 dependent repository - 1 maintainer
sic 1.3.3
Utility for string normalization
15 versions - Latest release: over 2 years ago - 4 dependent repositories - 263 downloads last month - 2 stars on GitHub - 1 maintainer
epub-conversion 1.0.15
Python package for converting xml and epubs to text files
14 versions - Latest release: about 4 years ago - 5 dependent repositories - 171 downloads last month - 34 stars on GitHub - 1 maintainer
tokencost 0.1.7
To calculate token and translated USD cost of string and message calls to OpenAI, for example whe...
14 versions - Latest release: 2 months ago - 1 dependent package - 1.49 thousand downloads last month - 190 stars on GitHub - 3 maintainers
pymmseg 1.2.0
pyMMSeg-cpp, a high performance Chinese word segmentation utility.
1 version - Latest release: over 11 years ago - 4 dependent repositories - 18 downloads last month - 190 stars on GitHub - 1 maintainer
Top 2.9% on pypi.org
sudachipy 0.6.8 💰
Python version of Sudachi, the Japanese Morphological Analyzer
39 versions - Latest release: 6 months ago - 23 dependent packages - 74 dependent repositories - 2.09 million downloads last month - 270 stars on GitHub - 2 maintainers
b-labs-models 2017.8.22
Ready to use CRFSuite models for sentence segmentation, tokenization and so on
3 versions - Latest release: almost 7 years ago - 10 dependent repositories - 32 downloads last month - 15 stars on GitHub - 1 maintainer
plane 0.2.1 💰
A lib for text preprocessing
20 versions - Latest release: over 3 years ago - 3 dependent repositories - 236 downloads last month - 11 stars on GitHub - 1 maintainer
example990420 1.1.2
Taiwanese Hokkien Transliterator and Tokeniser
9 versions - Latest release: 19 days ago - 307 downloads last month - 10 stars on GitHub - 1 maintainer
Top 8.7% on pypi.org
match 0.3.2
Match tokenized words and phrases within the original, untokenized, often messy, text.
6 versions - Latest release: over 1 year ago - 10 dependent repositories - 3.29 thousand downloads last month - 20 stars on GitHub - 3 maintainers
butch 1.0.0
The free Batch interpreter
2 versions - Latest release: almost 3 years ago - 1 dependent repository - 27 downloads last month - 1 star on GitHub - 1 maintainer
subtokenizer 0.0.19
Subwords tokenizer for neural natural language processing
16 versions - Latest release: almost 5 years ago - 1 dependent repository - 395 downloads last month - 5 stars on GitHub - 1 maintainer
bpeasy 0.1.2
Fast bare-bones BPE for modern tokenizer training
3 versions - Latest release: 6 months ago - 397 downloads last month - 125 stars on GitHub - 1 maintainer
lughaatnlp 1.0.5
A Python package for natural language processing tasks for the Urdu language, including normaliza...
5 versions - Latest release: about 2 months ago - 175 downloads last month - 0 stars on GitHub - 1 maintainer
Top 7.0% on pypi.org
rosette-api 1.28.0
Rosette API Python client SDK
33 versions - Latest release: 5 months ago - 3 dependent repositories - 1.49 thousand downloads last month - 37 stars on GitHub - 2 maintainers
huspacy-nightly 0.11.0.dev261 💰
HuSpaCy: industrial strength Hungarian natural language processing
126 versions - Latest release: 5 months ago - 1 dependent repository - 303 downloads last month - 142 stars on GitHub - 1 maintainer
Top 6.6% on pypi.org
huspacy 0.11.0 💰
HuSpaCy: industrial strength Hungarian natural language processing
21 versions - Latest release: 7 months ago - 1 dependent package - 6 dependent repositories - 933 downloads last month - 142 stars on GitHub - 1 maintainer
Top 4.6% on pypi.org
trankit 1.1.1
Trankit: A Light-Weight Transformer-based Toolkit for Multilingual Natural Language Processing
11 versions - Latest release: about 2 years ago - 2 dependent packages - 6 dependent repositories - 2.41 thousand downloads last month - 705 stars on GitHub - 1 maintainer
Top 4.5% on pypi.org
ekphrasis 0.5.4
Text processing tool, geared towards text from social networks, such as Twitter or Facebook. Ekph...
54 versions - Latest release: about 2 years ago - 48 dependent repositories - 1.71 thousand downloads last month - 656 stars on GitHub - 1 maintainer
handict 0.2.0 💰
Yet another word segmentation tool.
3 versions - Latest release: about 4 years ago - 1 dependent repository - 39 downloads last month - 1 star on GitHub - 1 maintainer
aymara 0.4.1
Python bindings to the LIMA linguistic analyzer
22 versions - Latest release: almost 2 years ago - 1 dependent repository - 303 downloads last month - 102 stars on GitHub - 1 maintainer
quebra-frases 0.3.7
quebra_frases chunks strings into byte sized pieces
12 versions - Latest release: about 3 years ago - 4 dependent packages - 2 dependent repositories - 12.4 thousand downloads last month - 1 star on GitHub - 2 maintainers
Top 4.6% on pypi.org
spacy-streamlit 1.0.6
Visualize spaCy with streamlit
17 versions - Latest release: about 1 year ago - 68 dependent repositories - 7.11 thousand downloads last month - 766 stars on GitHub - 2 maintainers
Top 3.6% on pypi.org
pyonmttok 1.37.1
Fast and customizable text tokenization library with BPE and SentencePiece support
66 versions - Latest release: over 1 year ago - 3 dependent packages - 103 dependent repositories - 23.1 thousand downloads last month - 259 stars on GitHub - 4 maintainers
Top 2.2% on pypi.org
youtokentome 1.0.6
Unsupervised text tokenizer focused on computational efficiency
8 versions - Latest release: over 4 years ago - 8 dependent packages - 228 dependent repositories - 43.4 thousand downloads last month - 941 stars on GitHub - 3 maintainers
Top 9.8% on pypi.org
tokenmonster 1.1.12
Tokenize and decode text with TokenMonster vocabularies.
15 versions - Latest release: 9 months ago - 2 dependent packages - 1 dependent repository - 1.4 thousand downloads last month - 485 stars on GitHub - 1 maintainer
Top 3.7% on pypi.org
sentence-splitter 1.4
Text to sentence splitter using heuristic algorithm by Philipp Koehn and Josh Schroeder
4 versions - Latest release: over 5 years ago - 12 dependent packages - 42 dependent repositories - 190 thousand downloads last month - 216 stars on GitHub - 4 maintainers
python-ucto 0.6.7
This is a Python binding to the tokenizer Ucto. Tokenisation is one of the first steps in almost a...
22 versions - Latest release: 7 months ago - 1 dependent package - 4 dependent repositories - 856 downloads last month - 29 stars on GitHub - 1 maintainer
Top 6.0% on pypi.org
ff3 1.0.2
Format Preserving Encryption (FPE) with FF3
7 versions - Latest release: 3 months ago - 3 dependent packages - 3 dependent repositories - 105 thousand downloads last month - 79 stars on GitHub - 1 maintainer
Top 4.5% on pypi.org
nlpcloud 1.1.46
Python client for the NLP Cloud API
42 versions - Latest release: 4 months ago - 28 dependent packages - 574 dependent repositories - 19.9 thousand downloads last month - 66 stars on GitHub - 1 maintainer
Top 5.9% on pypi.org
miditok 3.0.3
MIDI / symbolic music tokenizers for Deep Learning models.
62 versions - Latest release: about 1 month ago - 2 dependent packages - 2 dependent repositories - 1.47 thousand downloads last month - 595 stars on GitHub - 1 maintainer
tokviz 0.1
Library for visualizing tokenization patterns across different language models
1 version - Latest release: 4 months ago - 36 downloads last month - 6 stars on GitHub - 1 maintainer
spacywb 0.1.1 💰
Industrial-strength Natural Language Processing (NLP) in Python
2 versions - Latest release: over 2 years ago - 1 dependent repository - 31 downloads last month - 28,659 stars on GitHub - 1 maintainer
semantic-split 0.1.0 💰
A better way to split (chunk/group) your text before inserting it into an LLM/Vector DB.
1 version - Latest release: 12 months ago - 246 downloads last month - 28,659 stars on GitHub - 1 maintainer
count-tokens 0.7.0
Count the number of tokens in a text file using the tiktoken tokenizer from OpenAI.
7 versions - Latest release: 8 months ago - 2.04 thousand downloads last month - 3 stars on GitHub - 1 maintainer
tokenization-scorer 1.0.1
Package for evaluating text tokenizations.
2 versions - Latest release: about 1 year ago - 174 downloads last month - 21 stars on GitHub - 1 maintainer
vaporetto 0.3.0
Python wrapper of Vaporetto tokenizer
5 versions - Latest release: about 1 year ago - 1 dependent repository - 3.66 thousand downloads last month - 19 stars on GitHub - 1 maintainer
witokit 1.1.0
A python module to generate a tokenized dump of Wikipedia for NLP
20 versions - Latest release: over 4 years ago - 1 dependent repository - 66 downloads last month - 9 stars on GitHub - 1 maintainer
wikipedia-ner 0.0.24
Python package for creating labeled examples from wiki dumps
22 versions - Latest release: about 9 years ago - 3 dependent repositories - 93 downloads last month - 68 stars on GitHub - 1 maintainer
vtext 0.2.0
Natural Language Processing in Rust with Python bindings
4 versions - Latest release: almost 4 years ago - 4 dependent repositories - 191 downloads last month - 147 stars on GitHub - 1 maintainer
ud-toolkit 0.0.2
NLP toolkit built around UDPipe.
2 versions - Latest release: over 5 years ago - 1 dependent repository - 10 downloads last month - 4 stars on GitHub - 1 maintainer
tokenization-layer 0.0.2
An NLP tokenization algorithm that is a trainable layer for neural networks.
2 versions - Latest release: almost 3 years ago - 1 dependent repository - 23 downloads last month - 2 stars on GitHub - 1 maintainer
tokenize-output 0.4.10 💰
Get identifiers, names, paths, URLs and words from the command output.
9 versions - Latest release: about 1 year ago - 1 dependent package - 3 dependent repositories - 166 downloads last month - 6 stars on GitHub - 1 maintainer
spacy-weibo 2.3.0 💰
Industrial-strength Natural Language Processing (NLP) in Python
1 version - Latest release: over 2 years ago - 1 dependent repository - 24 downloads last month - 28,659 stars on GitHub - 1 maintainer
Top 8.6% on pypi.org
ssg 0.0.8
Thai syllable segmentation using Conditional Random Fields
6 versions - Latest release: almost 3 years ago - 1 dependent package - 16 dependent repositories - 2.51 thousand downloads last month - 23 stars on GitHub - 1 maintainer
Top 4.5% on pypi.org
spacy-nightly 3.0.0rc5 💰
Industrial-strength Natural Language Processing (NLP) in Python
74 versions - Latest release: over 3 years ago - 2 dependent packages - 9 dependent repositories - 2.88 thousand downloads last month - 28,659 stars on GitHub - 2 maintainers
spacyface 0.3.0
Aligner for spacy and huggingface tokenization
10 versions - Latest release: about 3 years ago - 1 dependent repository - 109 downloads last month - 44 stars on GitHub - 1 maintainer
sept 0.4.2
The Simple Extensible Path Template (sept) is a simple to configure templating system designed at...
6 versions - Latest release: over 2 years ago - 1 dependent repository - 37 downloads last month - 8 stars on GitHub - 1 maintainer
pyfpe 0.10.3
Python FPE: format-preserving encryption of values
3 versions - Latest release: over 2 years ago - 1 dependent repository - 134 downloads last month - 2 stars on GitHub - 1 maintainer
ponrawee-ssg 0.0.8
Thai syllable segmentation using Conditional Random Fields
1 version - Latest release: almost 3 years ago - 1 dependent repository - 18 downloads last month - 23 stars on GitHub - 1 maintainer
pithy 0.0.13
Pithy is a collection of utility libraries for Python 3.
11 versions - Latest release: about 4 years ago - 8 dependent repositories - 42 downloads last month - 5 stars on GitHub - 1 maintainer
overtokenizer 0.2.0
Unicode-based language-agnostic (over-) tokenizer.
2 versions - Latest release: about 6 years ago - 1 dependent repository - 19 downloads last month - 1 maintainer
nlp-preprocessing 0.2.0
A Package for text preprocessing
14 versions - Latest release: almost 4 years ago - 1 dependent repository - 141 downloads last month - 16 stars on GitHub - 1 maintainer
naivenlp 0.0.9
NLP toolkit, including tokenization, sequence tagging, etc.
9 versions - Latest release: almost 4 years ago - 1 dependent repository - 54 downloads last month - 2 stars on GitHub - 1 maintainer
grigora 0.0.3
Optimised implementation of common deep learning preprocessing utilities.
3 versions - Latest release: almost 5 years ago - 1 dependent repository - 14 downloads last month - 2 stars on GitHub - 1 maintainer
datesearch 0.0.1
Search for date-related tokens using regular expressions
1 version - Latest release: over 3 years ago - 1 dependent repository - 16 downloads last month - 0 stars on GitHub - 1 maintainer
code-tokenize 0.2.0
Fast program tokenization and structural analysis in Python
3 versions - Latest release: almost 2 years ago - 2 dependent repositories - 498 downloads last month - 38 stars on GitHub - 1 maintainer
charformer-pytorch 0.0.4
Charformer - Pytorch
4 versions - Latest release: almost 3 years ago - 1 dependent repository - 110 downloads last month - 119 stars on GitHub - 1 maintainer
beanstream 1.0.1
Beanstream SDK library for processing credit card payments.
5 versions - Latest release: almost 8 years ago - 3 dependent repositories - 1.25 thousand downloads last month - 8 stars on GitHub - 1 maintainer
spacy-ci-improve 2.0.5 💰
Industrial-strength Natural Language Processing (NLP) with Python and Cython
1 version - Latest release: over 6 years ago - 1 dependent repository - 21 downloads last month - 28,659 stars on GitHub - 1 maintainer
nlpannotator 1.0.6
Annotator combining different NLP pipelines
7 versions - Latest release: 8 months ago - 46 downloads last month - 0 stars on GitHub - 1 maintainer
nlpcube 0.3.1.2
Natural Language Processing Toolkit with support for tokenization, sentence splitting, lemmatizat...
22 versions - Latest release: 12 months ago - 1 dependent repository - 223 downloads last month - 551 stars on GitHub - 4 maintainers
Top 5.9% on pypi.org
attacut 1.0.6
Fast and Reasonably Accurate Word Tokenizer for Thai
17 versions - Latest release: over 4 years ago - 1 dependent package - 11 dependent repositories - 2.46 thousand downloads last month - 71 stars on GitHub - 1 maintainer
xontrib-output-search 0.6.5 💰
Get identifiers, names, paths, URLs and words from the previous command output and use them for t...
13 versions - Latest release: 3 months ago - 1 dependent package - 5 dependent repositories - 180 downloads last month - 35 stars on GitHub - 1 maintainer
anyks-lm 3.5.0 💰
Smart language model
43 versions - Latest release: over 1 year ago - 1 dependent repository - 383 downloads last month - 47 stars on GitHub - 1 maintainer
tolkien 0.0.1
Token class for lexers and parsers.
1 version - Latest release: over 4 years ago - 1 dependent repository - 9 downloads last month - 5 stars on GitHub - 1 maintainer
hebpipe 3.0.0.6
A pipeline for Hebrew NLP
14 versions - Latest release: 10 months ago - 1 dependent repository - 176 downloads last month - 28 stars on GitHub - 1 maintainer
mrs-spellings 1.0.3
A micro utility for generating plausible misspellings
7 versions - Latest release: almost 4 years ago - 1 dependent repository - 50 downloads last month - 2 stars on GitHub - 1 maintainer
Top 3.8% on pypi.org
icetk 0.0.7
A unified tokenization tool for Images, Chinese and English.
7 versions - Latest release: about 1 year ago - 18 dependent packages - 552 dependent repositories - 80.7 thousand downloads last month - 145 stars on GitHub - 1 maintainer
Top 9.0% on pypi.org
spacy-wheel 3.5.0 removed
Reupload of SpaCy 3.4.4 with Global Wheel
1 version - Latest release: over 1 year ago - 25,021 stars on GitHub - 1 maintainer