crates.io "tokenizer" keyword
View the packages on the crates.io package registry that are tagged with the "tokenizer" keyword.
fileql 0.10.0 💰
A tool to run SQL-like query on local files using GitQL SDK
10 versions - Latest release: 5 months ago - 9.49 thousand downloads total - 71 stars on GitHub - 1 maintainer
generic_tokenizer 0.1.0
A generic tokenizer that tracks line and column numbers as it goes.
1 version - Latest release: 9 months ago - 958 downloads total - 1 star on GitHub - 1 maintainer
rye-grain 0.0.1
A Python to Rust translator
1 version - Latest release: over 2 years ago - 1.38 thousand downloads total - 1 star on GitHub - 1 maintainer
scnr 0.8.0
Scanner/Lexer with regex patterns and multiple modes
13 versions - Latest release: 5 months ago - 15 thousand downloads total - 3 stars on GitHub - 1 maintainer
tokeneer 0.1.0
Another tokenizer crate
4 versions - Latest release: 5 months ago - 3.88 thousand downloads total - 1 star on GitHub - 1 maintainer
Top 6.2% on crates.io
logos-codegen 0.15.0 💰
Create ridiculously fast Lexers
7 versions - Latest release: 8 months ago - 2 dependent packages - 28 dependent repositories - 10.8 million downloads total - 2,771 stars on GitHub - 2 maintainers
jayce 12.1.0
jayce is a tokenizer 🌌
34 versions - Latest release: over 1 year ago - 36.8 thousand downloads total - 1 star on GitHub - 1 maintainer
rust_transformers 0.2.0
High performance tokenizers for Rust
2 versions - Latest release: over 5 years ago - 1 dependent package - 2.83 thousand downloads total - 313 stars on GitHub - 1 maintainer
tekken-rs 0.1.0
Rust implementation of Mistral Tekken tokenizer with audio support
1 version - Latest release: 2 days ago - 0 downloads total - 0 stars on GitHub - 1 maintainer
bpe 0.2.1
Fast byte-pair encoding implementation.
5 versions - Latest release: 3 months ago - 28.3 thousand downloads total - 71 stars on GitHub - 3 maintainers
bpe-openai 0.3.0
Prebuilt fast byte-pair encoders for OpenAI.
4 versions - Latest release: 3 months ago - 33.1 thousand downloads total - 71 stars on GitHub - 3 maintainers
nlpo3 1.4.0
Thai natural language processing library, with Python and Node bindings
8 versions - Latest release: 9 months ago - 1 dependent package - 1 dependent repository - 17.6 thousand downloads total - 35 stars on GitHub - 2 maintainers
logos-cli2 💰
Create ridiculously fast Lexers
6 versions - Latest release: 2 days ago - 6.54 thousand downloads total - 2,771 stars on GitHub - 1 maintainer
nlpo3-cli 0.2.0
Command line interface for nlpO3, a Thai natural language processing library
3 versions - Latest release: almost 4 years ago - 3.57 thousand downloads total - 35 stars on GitHub - 2 maintainers
text-splitter 0.27.0
Split text into semantic chunks, up to a desired chunk size. Supports calculating length by chara...
55 versions - Latest release: about 2 months ago - 5 dependent packages - 1 dependent repository - 480 thousand downloads total - 455 stars on GitHub - 1 maintainer
tokenizer-lib 1.6.0
Tokenization utilities for building parsers in Rust
15 versions - Latest release: about 1 year ago - 2 dependent packages - 1 dependent repository - 21.6 thousand downloads total - 2 stars on GitHub - 1 maintainer
punkt 1.0.5
An implementation of a Punkt sentence tokenizer
8 versions - Latest release: over 6 years ago - 3 dependent packages - 3 dependent repositories - 23.3 thousand downloads total - 37 stars on GitHub - 1 maintainer
bundle_repo 0.6.0 💰
Pack a local or remote Git Repository to XML for LLM Consumption.
6 versions - Latest release: 5 months ago - 4.49 thousand downloads total - 22 stars on GitHub - 1 maintainer
tocken 0.1.0 💰
Clustering algorithms.
1 version - Latest release: 7 months ago - 2.51 thousand downloads total - 0 stars on GitHub - 1 maintainer
logos-derive2 💰
Create ridiculously fast Lexers
6 versions - Latest release: 5 days ago - 1 dependent package - 6.64 thousand downloads total - 2,771 stars on GitHub - 1 maintainer
Top 3.5% on crates.io
logos-derive 0.15.0 💰
Create ridiculously fast Lexers
51 versions - Latest release: 8 months ago - 7 dependent packages - 539 dependent repositories - 17.6 million downloads total - 2,771 stars on GitHub - 2 maintainers
chat-splitter 0.1.1 💰
Never exceed OpenAI's chat models' maximum number of tokens when using the async_openai Rust crate
2 versions - Latest release: almost 2 years ago - 2.5 thousand downloads total - 3 stars on GitHub - 1 maintainer
logos2 💰
Create ridiculously fast Lexers
6 versions - Latest release: 6 days ago - 8.04 thousand downloads total - 2,771 stars on GitHub - 1 maintainer
kitoken 0.10.1 💰
Fast and versatile tokenizer for language models, supporting BPE, Unigram and WordPiece tokenization
2 versions - Latest release: 7 months ago - 6.03 thousand downloads total - 26 stars on GitHub - 1 maintainer
logos-cli 0.15.0 💰
Create ridiculously fast Lexers
7 versions - Latest release: 8 months ago - 6.1 thousand downloads total - 2,771 stars on GitHub - 2 maintainers
rustpostal 0.3.0
Rust bindings to libpostal
4 versions - Latest release: over 3 years ago - 6.03 thousand downloads total - 14 stars on GitHub - 1 maintainer
Top 5.4% on crates.io
lindera 0.44.1 💰
A morphological analysis library.
98 versions - Latest release: 27 days ago - 10 dependent packages - 124 dependent repositories - 506 thousand downloads total - 516 stars on GitHub - 4 maintainers
lindera-ipadic-neologd 0.44.1 💰
A Japanese morphological dictionary for IPADIC NEologd.
34 versions - Latest release: 27 days ago - 1 dependent package - 199 thousand downloads total - 516 stars on GitHub - 1 maintainer
Top 5.2% on crates.io
lindera-dictionary 0.44.1 💰
A morphological analysis library.
77 versions - Latest release: 27 days ago - 10 dependent packages - 238 dependent repositories - 719 thousand downloads total - 516 stars on GitHub - 4 maintainers
Top 7.2% on crates.io
lindera-ko-dic 0.44.1 💰
A Japanese morphological dictionary for ko-dic.
63 versions - Latest release: 27 days ago - 3 dependent packages - 24 dependent repositories - 475 thousand downloads total - 516 stars on GitHub - 1 maintainer
Top 10.0% on crates.io
lindera-cc-cedict 0.44.1 💰
A Japanese morphological dictionary for CC-CEDICT.
62 versions - Latest release: 27 days ago - 2 dependent packages - 2 dependent repositories - 314 thousand downloads total - 516 stars on GitHub - 1 maintainer
lindera-cli 0.44.1 💰
A morphological analysis command line interface.
97 versions - Latest release: 27 days ago - 91.5 thousand downloads total - 516 stars on GitHub - 4 maintainers
Top 6.1% on crates.io
lindera-ipadic 0.44.1 💰
A Japanese morphological dictionary for IPADIC.
85 versions - Latest release: 27 days ago - 4 dependent packages - 126 dependent repositories - 540 thousand downloads total - 516 stars on GitHub - 4 maintainers
Top 8.7% on crates.io
lindera-unidic 0.44.1 💰
A Japanese morphological dictionary for UniDic.
62 versions - Latest release: 27 days ago - 2 dependent packages - 3 dependent repositories - 318 thousand downloads total - 516 stars on GitHub - 1 maintainer
tiniestsegmenter 0.3.0
Compact Japanese segmenter
4 versions - Latest release: 10 months ago - 4.56 thousand downloads total - 3 stars on GitHub - 1 maintainer
unobtanium-segmenter 0.2.1
A text segmentation toolbox for search applications inspired by charabia and tantivy.
4 versions - Latest release: 7 days ago - 620 downloads total - 1 maintainer
bytepiece_rs 0.2.2
The Bytepiece Tokenizer Implemented in Rust
7 versions - Latest release: over 1 year ago - 1 dependent package - 10 thousand downloads total - 14 stars on GitHub - 1 maintainer
xxcalc 0.2.1
Embeddable or standalone robust floating-point polynomial calculator
4 versions - Latest release: over 8 years ago - 1 dependent repository - 11.9 thousand downloads total - 13 stars on GitHub - 1 maintainer
Top 6.5% on crates.io
svgtypes 0.15.3
SVG types parser.
25 versions - Latest release: 6 months ago - 26 dependent packages - 532 dependent repositories - 6.09 million downloads total - 74 stars on GitHub - 3 maintainers
tuker 0.1.0
A small tokenizer/parser library with an emphasis on usability
1 version - Latest release: 10 months ago - 814 downloads total - 1 maintainer
lang_pt 0.1.2
A parser tool to generate recursive descent top down parser.
3 versions - Latest release: over 2 years ago - 4.25 thousand downloads total - 6 stars on GitHub - 1 maintainer
erl_tokenize 0.8.1 💰
Erlang source code tokenizer
32 versions - Latest release: 5 months ago - 5 dependent packages - 3 dependent repositories - 114 thousand downloads total - 12 stars on GitHub - 1 maintainer
Top 2.8% on crates.io
logos 0.15.0 💰
Create ridiculously fast Lexers
55 versions - Latest release: 8 months ago - 235 dependent packages - 606 dependent repositories - 17.5 million downloads total - 2,771 stars on GitHub - 2 maintainers
logos-codegen2 💰
Create ridiculously fast Lexers
6 versions - Latest release: 8 days ago - 2 dependent packages - 6.67 thousand downloads total - 2,771 stars on GitHub - 1 maintainer
lexxor 0.9.1
A fast, extensible, greedy, single-pass text tokenizer for Rust
2 versions - Latest release: 2 months ago - 690 downloads total - 1 star on GitHub - 1 maintainer
byteforge 0.1.1
A next-generation byte-level transformer with multi-signal patching and SIMD optimization
2 versions - Latest release: 15 days ago - 370 downloads total - 1 star on GitHub - 1 maintainer
svgrtypes 0.44.2
SVG types parser.
27 versions - Latest release: about 1 month ago - 2 dependent packages - 25.6 thousand downloads total - 1 maintainer
token-counter 0.1.0
`wc` for tokens: count tokens in files with HF Tokenizers
1 version - Latest release: about 1 year ago - 1.12 thousand downloads total - 7 stars on GitHub - 1 maintainer
tantivy-tokenizer-tiny-segmenter 0.3.0
A Japanese tokenizer for Tantivy, based on TinySegmenter.
3 versions - Latest release: over 5 years ago - 1 dependent repository - 4.71 thousand downloads total - 14 stars on GitHub - 1 maintainer
bracoxide 0.1.6
A feature-rich library for brace pattern combination, permutation generation, and error handling.
7 versions - Latest release: 3 months ago - 2 dependent packages - 6 dependent repositories - 272 thousand downloads total - 2 stars on GitHub - 1 maintainer
tele_tokenizer 0.2.0
A CSS tokenizer
2 versions - Latest release: over 3 years ago - 3 dependent packages - 1 dependent repository - 3.71 thousand downloads total - 198 stars on GitHub - 1 maintainer
indent_tokenizer 0.4.0
Generate tokens based on indentation
4 versions - Latest release: over 7 years ago - 6.28 thousand downloads total - 1 star on GitHub - 1 maintainer
tinytoken 0.1.4
Library for tokenizing text into words, numbers, symbols, and more, with customizable parsing opt...
5 versions - Latest release: 9 months ago - 3.45 thousand downloads total - 0 stars on GitHub - 1 maintainer
regex-lexer 0.2.0
A regex-based lexer (tokenizer)
3 versions - Latest release: almost 3 years ago - 3 dependent packages - 4 dependent repositories - 14.9 thousand downloads total - 6 stars on GitHub - 1 maintainer
lox-scanner 0.1.0
lexical scanner for Lox
3 versions - Latest release: almost 4 years ago - 3.67 thousand downloads total - 0 stars on GitHub - 1 maintainer
aleph-alpha-tokenizer 0.3.1
A fast implementation of a wordpiece-inspired tokenizer
4 versions - Latest release: about 5 years ago - 5.7 thousand downloads total - 6 stars on GitHub - 1 maintainer
regex-lexer-lalrpop 0.3.0
A regex-based lexer (tokenizer)
4 versions - Latest release: over 3 years ago - 4.6 thousand downloads total - 0 stars on GitHub - 1 maintainer
tokenizers-enfer 0.21.1
Provides an implementation of today's most used tokenizers, with a focus on performances and vers...
3 versions - Latest release: 8 months ago - 3.18 thousand downloads total - 1 maintainer
langbox 0.6.0
A simple framework to build compilers and interpreters
9 versions - Latest release: about 1 year ago - 10.4 thousand downloads total - 0 stars on GitHub - 1 maintainer
mini-c-parser 0.12.2
minimal C language lexer & parser & virtual executer from scratch
11 versions - Latest release: about 1 year ago - 11.9 thousand downloads total - 12 stars on GitHub - 1 maintainer
c_lexer 0.1.1
C lexer
2 versions - Latest release: over 6 years ago - 1 dependent package - 1 dependent repository - 4.45 thousand downloads total - 7 stars on GitHub - 1 maintainer
pgn-lexer 0.1.1
A lexer for PGN files for chess. Provides an iterator over the tokens from a byte stream.
3 versions - Latest release: almost 8 years ago - 4.48 thousand downloads total - 1 star on GitHub - 1 maintainer
lexerus 0.1.7
Simple annotated lexer
8 versions - Latest release: 9 months ago - 6.73 thousand downloads total - 1 star on GitHub - 1 maintainer
crossandra 0.0.2 💰
A straightforward tokenization library for seamless text processing.
2 versions - Latest release: 7 months ago - 1.54 thousand downloads total - 8 stars on GitHub - 1 maintainer
scnr2_macro 0.2.0
Scanner/Lexer with regex patterns and multiple modes
2 versions - Latest release: 30 days ago - 423 downloads total - 2 stars on GitHub - 1 maintainer
scnr2 0.2.0
Scanner/Lexer with regex patterns and multiple modes
2 versions - Latest release: 30 days ago - 408 downloads total - 2 stars on GitHub - 1 maintainer
scnr2_generate 0.2.0
Scanner/Lexer with regex patterns and multiple modes
2 versions - Latest release: 30 days ago - 424 downloads total - 2 stars on GitHub - 1 maintainer
rust-forth-tokenizer 0.2.1
A Forth tokenizer written in Rust.
10 versions - Latest release: about 2 months ago - 1 dependent package - 12.8 thousand downloads total - 1 star on GitHub - 1 maintainer
simple-cursor 0.1.1
A super simple character cursor implementation geared towards lexers/tokenizers.
2 versions - Latest release: about 2 years ago - 2.39 thousand downloads total - 0 stars on GitHub - 1 maintainer
alpino-tokenizer-sys 0.2.1
Low-level wrapper around the Alpino tokenizer for Dutch
3 versions - Latest release: about 5 years ago - 1 dependent package - 1 dependent repository - 6.26 thousand downloads total - 3 stars on GitHub - 1 maintainer
regex-tokenizer 0.1.1
A regex tokenizer
2 versions - Latest release: over 2 years ago - 2.46 thousand downloads total - 1 star on GitHub - 2 maintainers
Top 7.0% on crates.io
charabia 0.9.6
A simple library to detect the language, tokenize the text and normalize the tokens
28 versions - Latest release: about 2 months ago - 3 dependent packages - 33 dependent repositories - 442 thousand downloads total - 299 stars on GitHub - 2 maintainers
vaporetto_rules 0.6.5
Rule-base filters for Vaporetto
12 versions - Latest release: 4 months ago - 1 dependent package - 1 dependent repository - 55.8 thousand downloads total - 238 stars on GitHub - 1 maintainer
vaporetto 0.6.5
Vaporetto: a pointwise prediction based tokenizer
18 versions - Latest release: 4 months ago - 3 dependent packages - 1 dependent repository - 113 thousand downloads total - 238 stars on GitHub - 1 maintainer
vaporetto_tantivy 0.24.0
Vaporetto Tokenizer for Tantivy
15 versions - Latest release: about 1 month ago - 17.9 thousand downloads total - 238 stars on GitHub - 1 maintainer
Top 6.2% on crates.io
lindera-ko-dic-builder 0.32.3 💰
A Korean morphological dictionary builder for ko-dic.
45 versions - Latest release: 4 months ago - 6 dependent packages - 40 dependent repositories - 555 thousand downloads total - 514 stars on GitHub - 4 maintainers
bareun_rs 0.1.0
Bareun is a Korean Morphological analyzer for Rust
1 version - Latest release: about 1 year ago - 1.15 thousand downloads total - 0 stars on GitHub - 1 maintainer
pkl-parser 0.8.1
A rust Pkl Parser!
2 versions - Latest release: 11 months ago - 1.87 thousand downloads total - 6 stars on GitHub - 1 maintainer
condex 1.0.0 💰
Extract tokens by simple condition expression.
1 version - Latest release: over 3 years ago - 1.39 thousand downloads total - 2 stars on GitHub - 1 maintainer
c-lexer-stable 0.1.4
C lexer
4 versions - Latest release: over 4 years ago - 41.8 thousand downloads total - 2 stars on GitHub - 1 maintainer
vibrato 0.5.2
Vibrato: viterbi-based accelerated tokenizer
12 versions - Latest release: 4 months ago - 1 dependent package - 1 dependent repository - 37.2 thousand downloads total - 360 stars on GitHub - 2 maintainers
bpetok 0.1.2
A simple CLI for tokenizing text input using Byte Pair Encoding (BPE).
3 versions - Latest release: 10 months ago - 2.5 thousand downloads total - 1 maintainer
alpino-tokenizer 0.4.0
Wrapper around the Alpino tokenizer for Dutch
4 versions - Latest release: over 1 year ago - 1 dependent package - 2 dependent repositories - 7.49 thousand downloads total - 3 stars on GitHub - 1 maintainer
rustfst 1.2.1
Library for constructing, combining, optimizing, and searching weighted finite-state transducers ...
52 versions - Latest release: 16 days ago - 3 dependent packages - 1 dependent repository - 601 thousand downloads total - 163 stars on GitHub - 1 maintainer
chunk_norris 0.2.1
A Rust library for splitting large text into smaller batches for LLM input.
3 versions - Latest release: 6 months ago - 1.76 thousand downloads total - 1 star on GitHub - 1 maintainer
sana_core 0.1.1
The core of Sana
2 versions - Latest release: almost 5 years ago - 2 dependent packages - 1 dependent repository - 4.09 thousand downloads total - 1 maintainer
Top 7.9% on crates.io
plex 0.3.1
A syntax extension for writing lexers and parsers.
11 versions - Latest release: about 1 year ago - 4 dependent packages - 20 dependent repositories - 56.4 thousand downloads total - 412 stars on GitHub - 1 maintainer
rustfst-ffi 1.1.2
Library for constructing, combining, optimizing, and searching weighted finite-state transducers ...
16 versions - Latest release: 11 months ago - 16.4 thousand downloads total - 163 stars on GitHub - 1 maintainer
luther-derive 0.1.0
The proc macro generator for the Luther lexer generator.
1 version - Latest release: about 7 years ago - 2.55 thousand downloads total - 5 stars on GitHub - 1 maintainer
smoltoken 0.2.0
A fast library for Byte Pair Encoding (BPE) tokenization.
5 versions - Latest release: 4 months ago - 2.69 thousand downloads total - 7 stars on GitHub - 1 maintainer
blingfire 1.0.0
Wrapper for the BlingFire tokenization library
5 versions - Latest release: about 5 years ago - 84.5 thousand downloads total - 15 stars on GitHub - 1 maintainer
svgparser 0.8.1
Featureful, pull-based, zero-allocation SVG parser.
21 versions - Latest release: over 7 years ago - 4 dependent packages - 98 dependent repositories - 194 thousand downloads total - 22 stars on GitHub - 1 maintainer
lexers 0.1.4
Tools for tokenizing and scanning
11 versions - Latest release: over 3 years ago - 8 dependent packages - 4 dependent repositories - 38.3 thousand downloads total - 66 stars on GitHub - 1 maintainer
parsit 0.2.0
very simple lib, the parsing combinators, recursive descendent that uses logos as lexer
17 versions - Latest release: about 2 years ago - 1 dependent package - 1 dependent repository - 19.4 thousand downloads total - 7 stars on GitHub - 1 maintainer
html5tokenizer 0.5.2
An HTML5 tokenizer with code span support.
7 versions - Latest release: almost 2 years ago - 1 dependent repository - 8.37 thousand downloads total - 1 maintainer
char-lex 1.0.5
Create easy enum based lexers
11 versions - Latest release: about 5 years ago - 13 thousand downloads total - 1 maintainer
Top 6.2% on crates.io
lindera-cc-cedict-builder 0.32.3 💰
A Chinese morphological dictionary builder for CC-CEDICT.
40 versions - Latest release: 4 months ago - 6 dependent packages - 40 dependent repositories - 548 thousand downloads total - 375 stars on GitHub - 1 maintainer
bytepiece 0.2.0
Rust version of bytepiece tokenizer
2 versions - Latest release: almost 2 years ago - 2.34 thousand downloads total - 12 stars on GitHub - 1 maintainer
fuzzy-pickles 0.1.1
A low-level parser of Rust source code with high-level visitor implementations
2 versions - Latest release: almost 5 years ago - 1 dependent repository - 3.5 thousand downloads total - 7 stars on GitHub - 1 maintainer
rustrawi 0.1.2 💰
Rust port of the original PHP Sastrawi
3 versions - Latest release: over 2 years ago - 3.14 thousand downloads total - 0 stars on GitHub - 1 maintainer
Related Keywords
lexer (54)
parser (48)
rust (43)
nlp (34)
analyzer (26)
morphological (24)
library (23)
multilingual (21)
parsing (18)
scanner (17)
japanese (14)
bpe (13)
dictionary (12)
lexical (11)
token (10)
tokenization (10)
text (10)
lexer-generator (9)
analysis (9)
ai (8)
no_std (8)
python (7)
machine-learning (6)
generator (6)
tantivy (6)
builder (6)
sql (6)
cli (6)
segmentation (5)
llm (5)
rust-lang (5)
openai (4)
wordpiece (4)
ipadic (4)
sentence (4)
language (4)
deep-learning (4)
split (4)
text-processing (4)
regex (4)
natural-language-processing (4)
morphological-analysis (4)
tokeniser (3)
lex (3)
alpino (3)
parser-generator (3)
stemmer (3)
dutch (3)
html (3)
korean (3)
compiler (3)
chinese (3)
unidic (3)
segmenter (3)
c (3)
svg (3)
gpt (3)
transformer (3)
nodejs (3)
rust-crate (3)
encoding (3)
thai (3)
word-segmentation (3)
splitter (2)
css (2)
acceptor (2)
fst (2)
graph (2)
thai-language (2)
shortest-path (2)
transducer (2)
asr (2)
automata (2)
javascript (2)
composition (2)
finite-state-acceptors (2)
tiktoken (2)
indentation (2)
processing (2)
crate (2)
huggingface (2)
apple (2)
hacktoberfest (2)
pkl (2)
virtual-machine (2)
algorithm (2)
lalr (2)
sql-parser (2)
lexing (2)
sqlite (2)
string (2)
dfa (2)
chatgpt (2)
sentencepiece (2)
word (2)
blingfire (2)
rust-wrapper (2)
html5 (2)
whatwg (2)
neologd (2)