An open API service providing package, version and dependency metadata of many open source software ecosystems and registries.

pypi.org "Tokenizer" keyword

View the packages on the pypi.org package registry that are tagged with the "Tokenizer" keyword.

pytokenizer 1.1.4
A streaming tokenizer.
6 versions - Latest release: about 5 years ago - 1 dependent repository - 32 downloads last month - 0 stars on GitHub - 1 maintainer
alta-tokenizer 1.2.3
ALTA tokenizer for encoding and decoding Kinyarwanda language text
7 versions - Latest release: 3 months ago - 49 downloads last month - 1 maintainer
sentencex 0.6.1
Sentence segmenter that supports ~300 languages
7 versions - Latest release: almost 2 years ago - 7.47 thousand downloads last month - 65 stars on GitHub - 1 maintainer
tinybpe 0.1.1
This is an ultra-fast, lightweight and clean code implementation of the Byte Pair Encoding (BPE) ...
2 versions - Latest release: 5 months ago - 231 downloads last month - 4 stars on GitHub - 1 maintainer
zicutter 0.0.10
ZiCutter: cut character smaller
11 versions - Latest release: over 2 years ago - 12 downloads last month - 1 star on GitHub - 1 maintainer
zh-sentence 0.0.5
Light-weight sentence tokenizer for Chinese languages.
5 versions - Latest release: almost 4 years ago - 1 dependent repository - 125 downloads last month - 2 stars on GitHub - 1 maintainer
ja-sentence 0.0.5
Light-weight sentence tokenizer for Japanese.
5 versions - Latest release: almost 4 years ago - 1 dependent repository - 108 downloads last month - 1 star on GitHub - 1 maintainer
zitokenizer 0.0.8
ZiTokenizer: tokenize world text as Zi
8 versions - Latest release: over 2 years ago - 16 downloads last month - 1 star on GitHub - 1 maintainer
bleuscore 0.1.4
A fast BLEU score calculator
5 versions - Latest release: 3 months ago - 674 downloads last month - 11 stars on GitHub - 1 maintainer
tokeniser-py 0.1.4
A custom tokeniser with a 131,072-token vocabulary derived from 0.5B (val) and 1B (val+test) toke...
5 versions - Latest release: 5 months ago - 26 downloads last month - 0 stars on GitHub - 1 maintainer
texo 0.0.4
Sentiment analysis for multiple languages and all products
4 versions - Latest release: about 2 years ago - 10 downloads last month - 1 star on GitHub - 1 maintainer
kotokenizer 0.1.1
Korean tokenizer, sentence classification, and spacing model.
2 versions - Latest release: over 1 year ago - 24 downloads last month - 1 maintainer
unicodetokenizer 0.2.2
UnicodeTokenizer: tokenize all Unicode text
25 versions - Latest release: almost 2 years ago - 53 downloads last month - 0 stars on GitHub - 1 maintainer
kin-tokenizer 3.3.2
Kinyarwanda tokenizer for encoding and decoding Kinyarwanda language text
7 versions - Latest release: about 1 year ago - 36 downloads last month - 1 maintainer
kr-sentence 0.0.3
Light-weight sentence tokenizer for Korean.
3 versions - Latest release: almost 4 years ago - 1 dependent repository - 78 downloads last month - 1 star on GitHub - 1 maintainer
atma 0.4.0
Commonly used and tested NLP tools, including BLEU, a tokenizer, and more
1 version - Latest release: over 8 years ago - 1 dependent repository - 7 downloads last month - 6 stars on GitHub - 1 maintainer
tokeniser-py-lite 0.1.1
A custom tokeniser with a 131,072-token vocabulary derived from 0.5B (val) and 1B (val+test) toke...
2 versions - Latest release: 5 months ago - 13 downloads last month - 0 stars on GitHub - 1 maintainer
jieba3 1.0.2
"Jieba 3" (结巴 3) Chinese word segmentation: aiming to be the best Chinese word segmentation component for modern Python 3
3 versions - Latest release: 11 months ago - 759 downloads last month - 12 stars on GitHub - 1 maintainer
sumire 1.0.2
Scikit-learn compatible Japanese text vectorizer for CPU-based Japanese natural language processing.
2 versions - Latest release: over 1 year ago - 11 downloads last month - 0 stars on GitHub - 1 maintainer
pykomoran 0.1.6
PyKomoran is a Python wrapper for KOMORAN, the KOrean MORphological ANalyzer.
7 versions - Latest release: over 4 years ago - 1 dependent repository - 541 downloads last month - 42 stars on GitHub - 1 maintainer