Ecosyste.ms: Packages
An open API service providing package, version and dependency metadata of many open source software ecosystems and registries.
pypi.org "tokenizer" keyword
lexikanon 0.6.5
A Python Library for Tokenizers
26 versions - Latest release: about 2 months ago - 3 dependent packages - 122 downloads last month - 1 star on GitHub - 1 maintainer
kimchima 0.5.0
The collections of tools for ML model development.
13 versions - Latest release: about 1 month ago - 216 downloads last month - 0 stars on GitHub - 1 maintainer
crossandra 2.1.0
A fast and simple enum/regex-based tokenizer with decent configurability
10 versions - Latest release: 21 days ago - 1 dependent package - 1 dependent repository - 1.84 thousand downloads last month - 8 stars on GitHub - 1 maintainer
plane 0.2.1 💰
A lib for text preprocessing
20 versions - Latest release: over 3 years ago - 3 dependent repositories - 236 downloads last month - 11 stars on GitHub - 1 maintainer
example990420 1.1.2
Taiwanese Hokkien Transliterator and Tokeniser
9 versions - Latest release: 8 days ago - 307 downloads last month - 10 stars on GitHub - 1 maintainer
taibun 1.1.2
Taiwanese Hokkien Transliterator and Tokeniser
10 versions - Latest release: 8 days ago - 419 downloads last month - 10 stars on GitHub - 1 maintainer
bodotokenizer 0.1.1
Package for Bodo Tokenizer
2 versions - Latest release: about 2 years ago - 1 dependent repository - 30 downloads last month - 0 stars on GitHub - 1 maintainer
jk-php-tokenizer 0.2020.3.9
This python module is a tokenizer for configuration files written in PHP.
1 version - Latest release: about 4 years ago - 1 dependent repository - 15 downloads last month - 1 star on GitHub - 1 maintainer
simplemma 0.9.1 (Top 5.5% on pypi.org)
A simple multilingual lemmatizer for Python.
14 versions - Latest release: over 1 year ago - 6 dependent packages - 25 dependent repositories - 10.7 thousand downloads last month - 128 stars on GitHub - 1 maintainer
djurl 0.2.0
Simple yet helpful library for writing Django urls by an easy, short an intuitive way.
4 versions - Latest release: almost 7 years ago - 2 dependent repositories - 16 downloads last month - 80 stars on GitHub - 1 maintainer
space-wrap 0.0.3
Automated Spacy wrapper to turn plain text into Spacy doc objects
3 versions - Latest release: over 1 year ago - 29 downloads last month - 1 maintainer
pyregtokenizer 0.0.1
A BPE Tokenizer using regex
2 versions - Latest release: about 1 month ago - 240 downloads last month - 1 maintainer
mecab-text-cleaner 0.1.1 💰
Simple Python package for getting japanese reading (yomigana) using MeCab
2 versions - Latest release: 5 months ago - 18 downloads last month - 3 stars on GitHub - 1 maintainer
unico 0.0.0
Unico provides Unicode metadata parsed directly from the published standard data.
1 version - Latest release: 9 months ago - 1 dependent repository - 0 stars on GitHub - 1 maintainer
openai-function-tokens 0.1.2
A package to estimate token counts for messages AND functions in openai's chat completion API.
3 versions - Latest release: 8 months ago - 1.07 thousand downloads last month - 14 stars on GitHub - 1 maintainer
bpeasy 0.1.2
Fast bare-bones BPE for modern tokenizer training
3 versions - Latest release: 5 months ago - 397 downloads last month - 125 stars on GitHub - 1 maintainer
sengirifix 0.1.3
Yet another fork of sentence-level tokenizer for the Japanese text
1 version - Latest release: over 3 years ago - 1 dependent repository - 14 downloads last month - 0 stars on GitHub - 1 maintainer
hazm 0.10.0 (Top 2.3% on pypi.org)
Persian NLP Toolkit
15 versions - Latest release: 4 months ago - 8 dependent packages - 126 dependent repositories - 7.78 thousand downloads last month - 1,116 stars on GitHub - 1 maintainer
tokenizers-gt 0.15.2.post0
💥 Fast State-of-the-Art Tokenizers optimized for Research and Production
3 versions - Latest release: 3 months ago - 1.21 thousand downloads last month - 8,489 stars on GitHub - 1 maintainer
tokenizers 0.19.1 (Top 0.6% on pypi.org)
💥 Fast State-of-the-Art Tokenizers optimized for Research and Production
95 versions - Latest release: about 1 month ago - 380 dependent packages - 14,571 dependent repositories - 25.7 million downloads last month - 8,489 stars on GitHub - 4 maintainers
pyvgram 0.1.2
VGram tokenization
5 versions - Latest release: over 2 years ago - 1 dependent repository - 38 downloads last month - 0 stars on GitHub - 1 maintainer
ilmulti 0.0.1
Multilingual Text Tooling around Indian Languages
2 versions - Latest release: over 3 years ago - 1 dependent repository - 10 downloads last month - 21 stars on GitHub - 1 maintainer
tokengeex 1.0.1
TokenGeeX is an efficient tokenizer for code based on UnigramLM and TokenMonster.
9 versions - Latest release: 5 days ago - 6 thousand downloads last month - 3 stars on GitHub - 1 maintainer
tokenizer-adapter 0.1.2
A simple to adapt a pretrained language model to a new vocabulary
3 versions - Latest release: 4 months ago - 41 downloads last month - 1 star on GitHub - 1 maintainer
python-vncorenlp 0.1.8
python_vncorenlp
9 versions - Latest release: almost 4 years ago - 1 dependent repository - 67 downloads last month - 2 stars on GitHub - 1 maintainer
python-rdrsegmenter 0.1.1
python_rdrsegmenter
2 versions - Latest release: over 3 years ago - 1 dependent repository - 96 downloads last month - 1 star on GitHub - 1 maintainer
word-piece-tokenizer 1.0.1 💰
A Lightweight Word Piece Tokenizer
2 versions - Latest release: over 1 year ago - 289 downloads last month - 5 stars on GitHub - 1 maintainer
ekphrasis 0.5.4 (Top 4.5% on pypi.org)
Text processing tool, geared towards text from social networks, such as Twitter or Facebook. Ekph...
54 versions - Latest release: about 2 years ago - 48 dependent repositories - 1.71 thousand downloads last month - 656 stars on GitHub - 1 maintainer
handict 0.2.0 💰
Yet another word segmentation tool.
3 versions - Latest release: about 4 years ago - 1 dependent repository - 39 downloads last month - 1 star on GitHub - 1 maintainer
easy-tokenizer 0.0.10
tokenizer tool
8 versions - Latest release: about 4 years ago - 1 dependent repository - 103 downloads last month - 1 star on GitHub - 1 maintainer
syntok 1.4.4 (Top 1.9% on pypi.org)
Text tokenization and sentence segmentation (segtok v2).
16 versions - Latest release: about 2 years ago - 6 dependent packages - 59 dependent repositories - 68.4 thousand downloads last month - 193 stars on GitHub - 1 maintainer
hindikosh 0.0.1
Hindi corpus reader
1 version - Latest release: over 5 years ago - 1 dependent repository - 18 downloads last month - 1 star on GitHub - 1 maintainer
segments 2.2.1 (Top 4.4% on pypi.org)
Unicode Standard tokenization routines and orthography profile segmentation
17 versions - Latest release: almost 2 years ago - 5 dependent packages - 696 dependent repositories - 267 thousand downloads last month - 29 stars on GitHub - 4 maintainers
quebra-frases 0.3.7
quebra_frases chunks strings into byte sized pieces
12 versions - Latest release: almost 3 years ago - 4 dependent packages - 2 dependent repositories - 12.4 thousand downloads last month - 1 star on GitHub - 2 maintainers
sengiri 0.2.1 💰
Yet another sentence-level tokenizer for the Japanese text
3 versions - Latest release: over 4 years ago - 7 dependent repositories - 460 downloads last month - 21 stars on GitHub - 1 maintainer
bleuscore 0.1.2
A fast(not yet :) bleu score calculator
3 versions - Latest release: 23 days ago - 1.33 thousand downloads last month - 0 stars on GitHub - 1 maintainer
hangul-utils 0.4.5 (Top 5.3% on pypi.org)
An integrated library for Korean preprocessing.
9 versions - Latest release: almost 4 years ago - 1 dependent package - 10 dependent repositories - 1.26 thousand downloads last month - 196 stars on GitHub - 1 maintainer
vncorenlp 1.0.3 (Top 5.1% on pypi.org)
A Python wrapper for VnCoreNLP using a bidirectional communication channel.
2 versions - Latest release: almost 6 years ago - 4 dependent packages - 31 dependent repositories - 2.15 thousand downloads last month - 55 stars on GitHub - 1 maintainer
pyonmttok 1.37.1 (Top 3.6% on pypi.org)
Fast and customizable text tokenization library with BPE and SentencePiece support
66 versions - Latest release: about 1 year ago - 3 dependent packages - 103 dependent repositories - 23.1 thousand downloads last month - 259 stars on GitHub - 4 maintainers
sacremoses 0.1.1 (Top 1.6% on pypi.org)
SacreMoses
52 versions - Latest release: 7 months ago - 120 dependent packages - 5,564 dependent repositories - 2.07 million downloads last month - 479 stars on GitHub - 3 maintainers
unicodetokenizer 0.2.2
UnicodeTokenizer: tokenize all Unicode text
25 versions - Latest release: 6 months ago - 270 downloads last month - 0 stars on GitHub - 1 maintainer
tokenmonster 1.1.12 (Top 9.8% on pypi.org)
Tokenize and decode text with TokenMonster vocabularies.
15 versions - Latest release: 9 months ago - 2 dependent packages - 1 dependent repository - 1.4 thousand downloads last month - 485 stars on GitHub - 1 maintainer
tokenizer 3.4.3 (Top 6.2% on pypi.org)
A tokenizer for Icelandic text
53 versions - Latest release: 9 months ago - 6 dependent packages - 75 dependent repositories - 20.9 thousand downloads last month - 27 stars on GitHub - 3 maintainers
spacy-experimental 0.6.4 (Top 5.2% on pypi.org)
Cutting-edge experimental spaCy components and features
9 versions - Latest release: 7 months ago - 6 dependent packages - 14 dependent repositories - 3.96 thousand downloads last month - 94 stars on GitHub - 1 maintainer
somajo 2.4.2 (Top 6.5% on pypi.org)
A tokenizer and sentence splitter for German and English web and social media texts.
59 versions - Latest release: 3 months ago - 1 dependent package - 10 dependent repositories - 1.8 thousand downloads last month - 133 stars on GitHub - 1 maintainer
sentence-splitter 1.4 (Top 3.7% on pypi.org)
Text to sentence splitter using heuristic algorithm by Philipp Koehn and Josh Schroeder
4 versions - Latest release: over 5 years ago - 12 dependent packages - 42 dependent repositories - 190 thousand downloads last month - 216 stars on GitHub - 4 maintainers
semantic-text-splitter 0.13.1
Split text into semantic chunks, up to a desired chunk size. Supports calculating length by chara...
24 versions - Latest release: 14 days ago - 1 dependent package - 11.8 thousand downloads last month - 135 stars on GitHub - 1 maintainer
segtok 1.5.11 (Top 2.9% on pypi.org)
sentence segmentation and word tokenization tools
23 versions - Latest release: over 2 years ago - 8 dependent packages - 353 dependent repositories - 327 thousand downloads last month - 166 stars on GitHub - 1 maintainer
ai21-tokenizer 0.9.1
AI21's Jurassic models tokenizers
16 versions - Latest release: 7 days ago - 1 dependent package - 71.8 thousand downloads last month - 26 stars on GitHub - 1 maintainer
python-ucto 0.6.7
This is a Python binding to the tokenizer Ucto. Tokenisation is one of the first step in almost a...
22 versions - Latest release: 7 months ago - 1 dependent package - 4 dependent repositories - 856 downloads last month - 29 stars on GitHub - 1 maintainer
natasha 1.6.0 (Top 2.4% on pypi.org)
Named-entity recognition for russian language
13 versions - Latest release: 10 months ago - 6 dependent packages - 73 dependent repositories - 9.39 thousand downloads last month - 1,149 stars on GitHub - 2 maintainers
nagisa 0.2.11 (Top 3.3% on pypi.org)
A Japanese tokenizer based on recurrent neural networks
25 versions - Latest release: 4 months ago - 5 dependent packages - 28 dependent repositories - 346 thousand downloads last month - 367 stars on GitHub - 1 maintainer
fugashi 1.3.2 💰 (Top 2.3% on pypi.org)
A Cython MeCab wrapper for fast, pythonic Japanese tokenization.
71 versions - Latest release: about 1 month ago - 32 dependent packages - 243 dependent repositories - 217 thousand downloads last month - 366 stars on GitHub - 1 maintainer
ebnfparser 2.1.3
very powerful and optional parser framework for python
24 versions - Latest release: about 6 years ago - 1 dependent repository - 115 downloads last month - 64 stars on GitHub - 1 maintainer
tiniestsegmenter 0.1.0
Compact Japanese segmenter
1 version - Latest release: 11 days ago - 0 stars on GitHub - 1 maintainer
gpt3_tokenizer 0.1.5
Encoder/Decoder and tokens counter for GPT3
6 versions - Latest release: 26 days ago - 796 downloads last month - 7 stars on GitHub - 1 maintainer
alt-eval 1.1.0
Automatic lyrics transcription evaluation toolkit
3 versions - Latest release: 2 months ago - 35 downloads last month - 479 stars on GitHub - 1 maintainer
rs-bytepiece 0.2.2
bytepiece-rs Python binding
7 versions - Latest release: 6 months ago - 68 downloads last month - 14 stars on GitHub - 1 maintainer
semiformal 0.7.0
Tokenizer for semiformal unicode text using TR-29 segmentation
2 versions - Latest release: 9 months ago - 1 dependent repository - 187 downloads last month - 0 stars on GitHub - 1 maintainer
optimal-data-selector 1.2.1
A Package for to optimize models, use for nlp short word treatment, choosing optimal data for M...
21 versions - Latest release: 5 months ago - 30 downloads last month - 1 maintainer
count-tokens 0.7.0
Count number of tokens in the text file using toktoken tokenizer from OpenAI.
7 versions - Latest release: 8 months ago - 2.04 thousand downloads last month - 3 stars on GitHub - 1 maintainer
mwtokenizer 0.2.0
Wikipedia Tokenizer Utility
3 versions - Latest release: 5 months ago - 1 dependent repository - 17 downloads last month - 1 maintainer
nepalitokenizers 0.0.2
Pre-trained Tokenizers for the Nepali language with an interface to HuggingFace's tokenizers libr...
2 versions - Latest release: 11 months ago - 74 downloads last month - 2 stars on GitHub - 1 maintainer
wyzard 1.0
Run various transformers models from one packages.
3 versions - Latest release: about 1 year ago - 33 downloads last month - 0 stars on GitHub - 2 maintainers
zltk 0.0.1
A collection of commonly used functions.
2 versions - Latest release: 5 months ago - 24 downloads last month - 1 maintainer
basictokenizer 0.0.4 removed
A basic and useful tokenizer.
3 versions - Latest release: over 1 year ago - 49 downloads last month - 1 maintainer
extractionstring 0.8.2
Basic tools to tokenize (i.e. to construct atomic-entities/sub-strings of) a string, for Natural ...
1 version - Latest release: over 1 year ago - 6 downloads last month - 0 stars on framagit.org - 1 maintainer
vibrato 0.2.0
Viterbi-based accelerated tokenizer (Python wrapper)
3 versions - Latest release: about 1 year ago - 2.87 thousand downloads last month - 34 stars on GitHub - 1 maintainer
ipa-core 0.1.3
NLP Preprocessing Pipeline Wrappers
4 versions - Latest release: about 1 year ago - 27 downloads last month - 13 stars on GitHub - 1 maintainer
scanpars 0.0.0 removed
scanpars umbrella project
1 version - Latest release: over 1 year ago - 11 downloads last month - 0 stars on GitHub - 1 maintainer
jf-tokenize-package 1.0.3
A simple tokenizer function for NLP
4 versions - Latest release: almost 2 years ago - 21 downloads last month - 0 stars on GitHub - 1 maintainer
vaporetto 0.3.0
Python wrapper of Vaporetto tokenizer
5 versions - Latest release: about 1 year ago - 1 dependent repository - 3.66 thousand downloads last month - 19 stars on GitHub - 1 maintainer
zh-sentence 0.0.5
Light-weight sentence tokenizer for Chinese languages.
5 versions - Latest release: over 2 years ago - 1 dependent repository - 167 downloads last month - 1 star on GitHub - 1 maintainer
youcab 0.1.3
Converts MeCab parsing results to Python objects.
4 versions - Latest release: over 3 years ago - 1 dependent repository - 36 downloads last month - 0 stars on GitHub - 1 maintainer
xml-cleaner 2.0.4
Word and sentence tokenization.
27 versions - Latest release: over 7 years ago - 4 dependent repositories - 116 downloads last month - 13 stars on GitHub - 1 maintainer
whoosh-igo 0.7
tokenizers for Whoosh designed for Japanese language
6 versions - Latest release: almost 12 years ago - 2 dependent repositories - 21 downloads last month - 6 stars on GitHub - 1 maintainer
unitok 3.5.2
Unified Tokenizer
99 versions - Latest release: about 2 months ago - 1 dependent repository - 200 downloads last month - 4 stars on GitHub - 2 maintainers
twokenize 1.0.0
Word segmentation / tokenization focussed on Twitter
1 version - Latest release: almost 6 years ago - 6 dependent repositories - 90 downloads last month - 7 stars on GitHub - 1 maintainer
twkorean 0.1.5
Python interface to twitter-korean-text, a Korean morphological analyzer.
6 versions - Latest release: over 9 years ago - 4 dependent repositories - 43 downloads last month - 33 stars on GitHub - 1 maintainer
twitter-korean 0.1.0.dev522
Python port to the normalizer in https://github.com/twitter/twitter-korean-text
2 versions - Latest release: 9 months ago - 2 dependent repositories - 13 downloads last month - 1 maintainer
transformer-embedder 1.7.16
Word level transformer based embeddings
52 versions - Latest release: over 2 years ago - 2 dependent repositories - 93 downloads last month - 34 stars on GitHub - 1 maintainer
toktok 0.0.2
Toktok tokenizer
2 versions - Latest release: over 5 years ago - 1 dependent repository - 47 downloads last month - 1 star on GitHub - 1 maintainer
tokenregex 0.1.14
NLP at your fingertips
15 versions - Latest release: over 7 years ago - 1 dependent repository - 46 downloads last month - 28 stars on GitHub - 1 maintainer
tokenize-output 0.4.10 💰
Get identifiers, names, paths, URLs and words from the command output.
9 versions - Latest release: about 1 year ago - 1 dependent package - 3 dependent repositories - 166 downloads last month - 6 stars on GitHub - 1 maintainer
thai-tokenizer 0.2.5
Fast and accurate Thai tokenization library.
7 versions - Latest release: about 3 years ago - 1 dependent repository - 8.55 thousand downloads last month - 5 stars on GitHub - 1 maintainer
tglex 0.2.1
Lexical analysis base for telegram bots
4 versions - Latest release: about 4 years ago - 1 dependent repository - 44 downloads last month - 0 stars on GitHub - 1 maintainer
tftokenizers 0.1.8
Use Huggingface Transformer and Tokenizers as Tensorflow Reusable SavedModels.
9 versions - Latest release: about 2 years ago - 1 dependent repository - 122 downloads last month - 5 stars on GitHub - 1 maintainer
text2text 1.4.4 (Top 6.9% on pypi.org)
Text2Text: Crosslingual NLP/G toolkit
142 versions - Latest release: 3 months ago - 4 dependent repositories - 984 downloads last month - 272 stars on GitHub - 1 maintainer
testasasnkaonlytest 0.1.3
A very basic calculator
4 versions - Latest release: about 3 years ago - 1 dependent repository - 17 downloads last month - 46 stars on GitHub - 1 maintainer
tensorflow-onmttok-ops 0.4.0
OpenNMT Tokenizer as TensorFlow Operations
5 versions - Latest release: almost 4 years ago - 1 dependent repository - 66 downloads last month - 1 maintainer
spag 1.0.0a0
A module containing scanner (regular expression) and parser (BNF) compilers as well as a base gen...
1 version - Latest release: over 5 years ago - 1 dependent repository - 13 downloads last month - 8 stars on GitHub - 1 maintainer
soynlp 0.0.493 (Top 2.5% on pypi.org)
Unsupervised Korean Natural Language Processing Toolkits
31 versions - Latest release: over 4 years ago - 4 dependent packages - 48 dependent repositories - 4.2 thousand downloads last month - 900 stars on GitHub - 1 maintainer
sinling 0.3.6 (Top 9.7% on pypi.org)
A language processing tool for Sinhalese (සිංහල)
7 versions - Latest release: over 3 years ago - 2 dependent packages - 3 dependent repositories - 472 downloads last month - 46 stars on GitHub - 1 maintainer
sept 0.4.2
The Simple Extensible Path Template (sept) is a simple to configure templating system designed at...
6 versions - Latest release: over 2 years ago - 1 dependent repository - 37 downloads last month - 8 stars on GitHub - 1 maintainer
separatrice-temp 1.6.4
Separatrice is able to split a text into sentences and a sentence into clauses (russian). See doc...
3 versions - Latest release: about 3 years ago - 1 dependent repository - 16 downloads last month - 0 stars on GitHub - 1 maintainer
separatrice 1.6.2
Separatrice is able to split a text into sentences and a sentence into clauses (russian). See doc...
9 versions - Latest release: over 3 years ago - 1 dependent repository - 99 downloads last month - 0 stars on GitHub - 1 maintainer
sctokenizer 0.0.8
A Source Code Tokenizer
8 versions - Latest release: about 1 year ago - 4 dependent repositories - 1.49 thousand downloads last month - 12 stars on GitHub - 1 maintainer
rusyll 0.1.1
Splitting Russian words into phonetic syllables
1 version - Latest release: almost 4 years ago - 2 dependent repositories - 50 downloads last month - 6 stars on GitHub - 1 maintainer
re101 0.4.0
A back-pocket regex cookbook
10 versions - Latest release: over 5 years ago - 8 dependent repositories - 340 downloads last month - 5 stars on GitHub - 1 maintainer
pytokenizer 1.1.4
A streaming tokenizer.
6 versions - Latest release: over 3 years ago - 1 dependent repository - 72 downloads last month - 0 stars on GitHub - 1 maintainer
Related Keywords
nlp (61)
python (42)
natural-language-processing (27)
tokenization (26)
NLP (21)
token (13)
parser (12)
lexer (12)
japanese (9)
text-processing (9)
transformers (9)
python3 (9)
text (9)
language (8)
nlp-library (8)
regex (8)
bert (7)
transformer (7)
language-model (7)
word-segmentation (7)
sentence (7)
pytorch (7)
embeddings (7)
parsing (6)
Tokenizer (6)
analyzer (5)
korean (5)
morphology (5)
rust (5)
pos-tagging (5)
mecab (5)
natural language processing (5)
dependency-parser (5)
tokenize (5)
segmentation (5)
ai (5)
tokeniser (5)
natural (5)
ner (5)
tensorflow (5)
machine-translation (4)
named-entity-recognition (4)
lex (4)
bpe (4)
tokenisation (4)
unicode (4)
gpt (4)
scanner (4)
splitter (4)
lexing (4)
console (4)
processing (4)
russian (4)
huggingface (4)
spacy (4)
learning (4)
lemmatizer (4)
vietnamese-nlp (4)
python-library (3)
thai (3)
preprocess (3)
terminal (3)
morphological analyzer (3)
grammar (3)
information-retrieval (3)
openai (3)
lemmatization (3)
deep-learning (3)
deep (3)
tokens (3)
tokenizing (3)
machine-learning (3)
persian-nlp (3)
japanese-language (3)
persian (3)
word (3)
Sentence (3)
deep learning (3)
xml (3)
postagging (3)
part-of-speech (3)
hacktoberfest (3)
postagger (3)
cpp (3)
phonetics (3)
morphological-analysis (3)
XML (2)
nodejs (2)
thai-language (2)
corpus (2)
twitter-korean-text (2)
parse (2)
chatbot (2)
chinese-word-segmentation (2)
text-analysis (2)
Korean (2)
allennlp (2)
Japanese (2)
hidden-states (2)
parser-generator (2)