pypi.org "Tokenizer" keyword
View the packages on the pypi.org package registry that are tagged with the "Tokenizer" keyword.
pytokenizer 1.1.4
A streaming tokenizer.
6 versions - Latest release: about 5 years ago - 1 dependent repository - 32 downloads last month - 0 stars on GitHub - 1 maintainer
alta-tokenizer 1.2.3
ALTA tokenizer for encoding and decoding Kinyarwanda language text
7 versions - Latest release: 3 months ago - 49 downloads last month - 1 maintainer
sentencex 0.6.1
Sentence segmenter that supports ~300 languages
7 versions - Latest release: almost 2 years ago - 7.47 thousand downloads last month - 65 stars on GitHub - 1 maintainer
tinybpe 0.1.1
This is an ultra-fast, lightweight and clean code implementation of the Byte Pair Encoding (BPE) ...
2 versions - Latest release: 5 months ago - 231 downloads last month - 4 stars on GitHub - 1 maintainer
zicutter 0.0.10
ZiCutter: cut character smaller
11 versions - Latest release: over 2 years ago - 12 downloads last month - 1 star on GitHub - 1 maintainer
zh-sentence 0.0.5
Light-weight sentence tokenizer for Chinese languages.
5 versions - Latest release: almost 4 years ago - 1 dependent repository - 125 downloads last month - 2 stars on GitHub - 1 maintainer
ja-sentence 0.0.5
Light-weight sentence tokenizer for Japanese.
5 versions - Latest release: almost 4 years ago - 1 dependent repository - 108 downloads last month - 1 star on GitHub - 1 maintainer
zitokenizer 0.0.8
ZiTokenizer: tokenize world text as Zi
8 versions - Latest release: over 2 years ago - 16 downloads last month - 1 star on GitHub - 1 maintainer
bleuscore 0.1.4
A fast BLEU score calculator
5 versions - Latest release: 3 months ago - 674 downloads last month - 11 stars on GitHub - 1 maintainer
tokeniser-py 0.1.4
A custom tokeniser with a 131,072-token vocabulary derived from 0.5B (val) and 1B (val+test) toke...
5 versions - Latest release: 5 months ago - 26 downloads last month - 0 stars on GitHub - 1 maintainer
texo 0.0.4
Sentiment analysis for multiple languages and for all products
4 versions - Latest release: about 2 years ago - 10 downloads last month - 1 star on GitHub - 1 maintainer
kotokenizer 0.1.1
Korean tokenizer, sentence classification, and spacing model.
2 versions - Latest release: over 1 year ago - 24 downloads last month - 1 maintainer
unicodetokenizer 0.2.2
UnicodeTokenizer: tokenize all Unicode text
25 versions - Latest release: almost 2 years ago - 53 downloads last month - 0 stars on GitHub - 1 maintainer
kin-tokenizer 3.3.2
Kinyarwanda tokenizer for encoding and decoding Kinyarwanda language text
7 versions - Latest release: about 1 year ago - 36 downloads last month - 1 maintainer
kr-sentence 0.0.3
Light-weight sentence tokenizer for Korean.
3 versions - Latest release: almost 4 years ago - 1 dependent repository - 78 downloads last month - 1 star on GitHub - 1 maintainer
atma 0.4.0
Commonly used and tested NLP tools, including BLEU, a tokenizer, and so on
1 version - Latest release: over 8 years ago - 1 dependent repository - 7 downloads last month - 6 stars on GitHub - 1 maintainer
tokeniser-py-lite 0.1.1
A custom tokeniser with a 131,072-token vocabulary derived from 0.5B (val) and 1B (val+test) toke...
2 versions - Latest release: 5 months ago - 13 downloads last month - 0 stars on GitHub - 1 maintainer
jieba3 1.0.2
"Jieba 3" Chinese word segmentation: aiming to be the best Chinese word segmentation component for modern Python 3
3 versions - Latest release: 11 months ago - 759 downloads last month - 12 stars on GitHub - 1 maintainer
sumire 1.0.2
Scikit-learn compatible Japanese text vectorizer for CPU-based Japanese natural language processing.
2 versions - Latest release: over 1 year ago - 11 downloads last month - 0 stars on GitHub - 1 maintainer
pykomoran 0.1.6
PyKomoran is a Python wrapper for KOMORAN, KOrean MORphical ANalyzer.
7 versions - Latest release: over 4 years ago - 1 dependent repository - 541 downloads last month - 42 stars on GitHub - 1 maintainer
Related Keywords
NLP (8)
tokenizer (7)
LLM (4)
sentence (4)
Korean (3)
Sentence (3)
Large Language Model (3)
laohur (3)
Unicode (3)
Natural Language Processing (3)
ZiTokenizer (3)
nlp (3)
UnicodeTokenizer (3)
ZiCutter (3)
Large Language Models (2)
python (2)
Chinese (2)
Japanese (2)
bleu (2)
korean (2)
Kinyarwanda (2)
Language Models (2)
Language Model (2)
LM (2)
LMs (2)
LLMs (2)
Tokeniser (2)
Tokens (2)
Python (1)
Korean Tokenizer (1)
torchvision (1)
Natural Language Process (1)
unicode (1)
KinGPT (1)
PIL (1)
Python3 (1)
pypi-packages (1)
py4j (1)
morphological-analyser (1)
korean-tokenizer (1)
korean-text-processing (1)
korean-nlp (1)
korean-analysis (1)
komoran (1)
Linguistic (1)
PoS Tagger (1)
Text Analyzer (1)
MORphical Analyzer (1)
KOrean MORphical ANalyzer (1)
KOMORAN (1)
Scikit-learn (1)
Analysis (1)
Proxy (1)
Crawler (1)
Tool (1)
Token (1)
Streaming (1)
python3 (1)
streaming-parser (1)
streaming-tokenizer (1)
ALTA Model (1)
natural-language-processing (1)
sentence-segmentation (1)
BPE (1)
Byte Pair Encoding (1)
bpe (1)
bpe-tokenizer (1)
cpython-extensions (1)
llm (1)
chinese (1)
japanese (1)
BLEU (1)
DeepLearning (1)
bleu-score (1)
deep-learning (1)
maturin (1)
ngrams (1)
pyo3 (1)
rust (1)
Sentiment (1)
Text-Analyzer (1)
sockets (1)
torch (1)