crates.io "tokenization" keyword
View the packages on the crates.io package registry that are tagged with the "tokenization" keyword.
colorblast-cli 0.0.1
Syntax highlighting CLI for various programming languages, markup languages and various other for...
1 version - Latest release: about 2 years ago - 1.3 thousand downloads total - 0 stars on GitHub - 1 maintainer
wordpieces 0.6.1
Split tokens into word pieces
10 versions - Latest release: almost 3 years ago - 3 dependent packages - 3 dependent repositories - 19.7 thousand downloads total - 5 stars on GitHub - 1 maintainer
classi-cine 0.4.2
A tool that builds smart video playlists by learning your preferences through Bayesian classifica...
10 versions - Latest release: 2 months ago - 9.93 thousand downloads total - 4 stars on GitHub - 1 maintainer
tokenizer-lib 1.6.0
Tokenization utilities for building parsers in Rust
15 versions - Latest release: about 1 year ago - 2 dependent packages - 1 dependent repository - 21.6 thousand downloads total - 2 stars on GitHub - 1 maintainer
build-trie 0.1.1
Procedural macro for generating match and state code representing a trie structure
2 versions - Latest release: over 4 years ago - 2.69 thousand downloads total - 3 stars on GitHub - 1 maintainer
libtqsm 0.6.1
Sentence segmenter that supports ~300 languages
1 version - Latest release: over 1 year ago - 1 dependent package - 1.75 thousand downloads total - 2 stars on GitHub - 1 maintainer
crossandra 0.0.2 💰
A straightforward tokenization library for seamless text processing.
2 versions - Latest release: 7 months ago - 1.54 thousand downloads total - 8 stars on GitHub - 1 maintainer
sentence 0.0.2
Sentence tokenizes English language sentences for use in TTS applications.
2 versions - Latest release: about 5 years ago - 2.66 thousand downloads total - 2 stars on GitHub - 1 maintainer
vaporetto_rules 0.6.5
Rule-base filters for Vaporetto
12 versions - Latest release: 4 months ago - 1 dependent package - 1 dependent repository - 55.8 thousand downloads total - 238 stars on GitHub - 1 maintainer
vaporetto 0.6.5
Vaporetto: a pointwise prediction based tokenizer
18 versions - Latest release: 4 months ago - 3 dependent packages - 1 dependent repository - 113 thousand downloads total - 238 stars on GitHub - 1 maintainer
vaporetto_tantivy 0.24.0
Vaporetto Tokenizer for Tantivy
15 versions - Latest release: about 1 month ago - 17.9 thousand downloads total - 238 stars on GitHub - 1 maintainer
vibrato 0.5.2
Vibrato: viterbi-based accelerated tokenizer
12 versions - Latest release: 4 months ago - 1 dependent package - 1 dependent repository - 37.2 thousand downloads total - 360 stars on GitHub - 2 maintainers
agrocrypto-core 0.1.0
The core engine of AgroCrypto: a blockchain-native asset tokenization and settlement layer.
1 version - Latest release: 4 months ago - 530 downloads total - 1 maintainer
bpetok 0.1.2
A simple CLI for tokenizing text input using Byte Pair Encoding (BPE).
3 versions - Latest release: 10 months ago - 2.5 thousand downloads total - 1 maintainer
strizer 0.1.0
minimal and fast library for text tokenization
1 version - Latest release: over 4 years ago - 1.69 thousand downloads total - 1 star on GitHub - 1 maintainer
rustrawi 0.1.2 💰
Rust port of the original PHP Sastrawi
3 versions - Latest release: over 2 years ago - 3.14 thousand downloads total - 0 stars on GitHub - 1 maintainer
fern-tokenization 0.0.0
Empty crate, used only to reserve the name.
1 version - Latest release: almost 3 years ago - 1.41 thousand downloads total - 14 stars on GitHub - 1 maintainer
tuck5 0.2.0
A pragmatic lexer/parser generator
4 versions - Latest release: over 1 year ago - 4.2 thousand downloads total - 0 stars on GitHub - 1 maintainer
pretok 0.1.0
A string pre-tokenizer for C-like syntaxes.
1 version - Latest release: almost 5 years ago - 1 dependent repository - 1.43 thousand downloads total - 0 stars on GitHub - 1 maintainer
blex 0.2.2
A lightweight lexing framework
4 versions - Latest release: over 2 years ago - 1 dependent package - 4.83 thousand downloads total - 0 stars on GitHub - 1 maintainer
derive-finite-automaton 0.3.0
Procedural macro for generating finite automaton
6 versions - Latest release: 25 days ago - 1 dependent package - 1 dependent repository - 10.7 thousand downloads total - 2 stars on GitHub - 1 maintainer
derive-finite-automaton-derive 0.3.0
Procedural macro for generating finite automaton
6 versions - Latest release: 25 days ago - 1 dependent package - 1 dependent repository - 11.1 thousand downloads total - 2 stars on GitHub - 1 maintainer
unscanny 0.1.0 💰
Painless string scanning.
1 version - Latest release: over 3 years ago - 8 dependent packages - 28 dependent repositories - 3.48 million downloads total - 55 stars on GitHub - 1 maintainer
any-lexer 0.0.3
Lexers for various programming languages and formats
3 versions - Latest release: about 2 years ago - 2 dependent packages - 5.12 thousand downloads total - 0 stars on GitHub - 1 maintainer
colorblast 0.0.3
Syntax highlighting library for various programming languages, markup languages and various other...
3 versions - Latest release: about 2 years ago - 1 dependent package - 3.68 thousand downloads total - 0 stars on GitHub - 1 maintainer
vtext 0.2.0
NLP with Rust
4 versions - Latest release: about 5 years ago - 3 dependent repositories - 12.1 thousand downloads total - 150 stars on GitHub - 1 maintainer
text-scanner 0.0.3
A UTF-8 char-oriented, zero-copy, text and code scanning library
3 versions - Latest release: about 2 years ago - 1 dependent package - 4.87 thousand downloads total - 0 stars on GitHub - 1 maintainer
Related Keywords
tokenizer (10)
rust (10)
parsing (9)
lexer (8)
nlp (7)
utils (4)
analyzer (4)
japanese (4)
text-scanner (4)
syntax-highlighting (4)
render-code (4)
lexers (4)
html-syntax-highlighter (4)
code2image (4)
morphological-analysis (4)
segmentation (4)
morphological (3)
text (3)
token (3)
lex (2)
text-processing (2)
streaming (2)
syntax (2)
highlighting (2)
highlighter (2)
bpe (1)
cli (1)
stem (1)
sastrawi (1)
stopword (1)
tf-idf (1)
information-retrieval (1)
indonesian (1)
stemmer (1)
stopwords (1)
database (1)
fern (1)
bag-of-words (1)
proxy (1)
tfidf (1)
privacy (1)
parser (1)
levenshtein (1)
parse (1)
tokenizing (1)
lexical-analysis (1)
scanning (1)
wordpiece (1)
piece (1)
word (1)
bayes (1)
classification (1)
playlist (1)
vlc (1)
bayesian-inference (1)
http (1)
naive-bayes-classifier (1)
reqwest (1)
serde-json (1)
rust-crate (1)
parser-combinators (1)
proc-macro (1)
ml (1)
lexing (1)
regex (1)
sentence (1)
english (1)
tts (1)
tantivy (1)
blockchain (1)
carbon-credits (1)
settlement (1)
web3 (1)