crates.io "tokenizer" keyword
View the packages on the crates.io package registry that are tagged with the "tokenizer" keyword.
tokenise 0.1.0
A flexible tokeniser library for parsing text
1 version - Latest release: 6 months ago - 547 downloads total - 0 stars on GitHub - 1 maintainer
lindera-cli 1.1.2 💰
A morphological analysis command line interface.
102 versions - Latest release: 1 day ago - 99 thousand downloads total - 531 stars on GitHub - 4 maintainers
lindera-ko-dic 1.1.2 💰
A Korean morphological dictionary for Ko-Dic.
68 versions - Latest release: 1 day ago - 3 dependent packages - 24 dependent repositories - 547 thousand downloads total - 531 stars on GitHub - 1 maintainer - Top 7.2% on crates.io
lindera 1.1.2 💰
A morphological analysis library.
103 versions - Latest release: 1 day ago - 10 dependent packages - 124 dependent repositories - 582 thousand downloads total - 531 stars on GitHub - 4 maintainers - Top 5.4% on crates.io
lindera-unidic 1.1.2 💰
A Japanese morphological dictionary for UniDic.
67 versions - Latest release: 1 day ago - 2 dependent packages - 3 dependent repositories - 384 thousand downloads total - 531 stars on GitHub - 1 maintainer - Top 8.7% on crates.io
sqlite3-parser 0.15.0
SQL parser (as understood by SQLite)
14 versions - Latest release: 3 months ago - 3 dependent packages - 2 dependent repositories - 2.1 million downloads total - 54 stars on GitHub - 1 maintainer
turso_sqlite3_parser 0.1.4
SQL parser (as understood by SQLite)
7 versions - Latest release: 18 days ago - 3.87 thousand downloads total - 54 stars on GitHub - 1 maintainer
limbo_sqlite3_parser 0.0.22
SQL parser (as understood by SQLite)
6 versions - Latest release: 3 months ago - 3.92 thousand downloads total - 54 stars on GitHub - 1 maintainer
bundle_repo 0.6.0 💰
Pack a local or remote Git Repository to XML for LLM Consumption.
6 versions - Latest release: 6 months ago - 5.01 thousand downloads total - 22 stars on GitHub - 1 maintainer
tokengeex 1.1.0
TokenGeeX is an efficient tokenizer for code based on UnigramLM and TokenMonster.
11 versions - Latest release: over 1 year ago - 12.8 thousand downloads total - 4 stars on GitHub - 1 maintainer
erl_tokenize 0.8.3 💰
Erlang source code tokenizer
34 versions - Latest release: about 1 month ago - 5 dependent packages - 3 dependent repositories - 121 thousand downloads total - 12 stars on GitHub - 1 maintainer
ast-rs 0.0.1
AST Toolkit for Rust
1 version - Latest release: about 3 years ago - 1.6 thousand downloads total - 0 stars on GitHub - 1 maintainer
lindera-ipadic-neologd 1.1.2 💰
A Japanese morphological dictionary for IPADIC NEologd.
39 versions - Latest release: 1 day ago - 1 dependent package - 262 thousand downloads total - 530 stars on GitHub - 1 maintainer
lindera-cc-cedict 1.1.2 💰
A Chinese morphological dictionary for CC-CEDICT.
67 versions - Latest release: 1 day ago - 2 dependent packages - 2 dependent repositories - 384 thousand downloads total - 530 stars on GitHub - 1 maintainer - Top 10.0% on crates.io
lindera-ipadic 1.1.2 💰
A Japanese morphological dictionary for IPADIC.
90 versions - Latest release: 1 day ago - 4 dependent packages - 126 dependent repositories - 616 thousand downloads total - 530 stars on GitHub - 4 maintainers - Top 6.1% on crates.io
lindera-dictionary 1.1.2 💰
A morphological analysis library.
82 versions - Latest release: 1 day ago - 10 dependent packages - 238 dependent repositories - 798 thousand downloads total - 530 stars on GitHub - 4 maintainers - Top 5.2% on crates.io
sentience-tokenize 0.2.3 💰
Tiny Rust zero-dep tokenizer (ident, number, string, parens, operators, keywords).
8 versions - Latest release: 9 days ago - 1.53 thousand downloads total - 0 stars on GitHub - 1 maintainer
pkl_fast 0.1.1
A library aiming to easily and efficiently work with Apple's PKL format.
2 versions - Latest release: over 1 year ago - 2.62 thousand downloads total - 6 stars on GitHub - 1 maintainer
sentencepiece-model 0.1.4 💰
SentencePiece model parser generated from the SentencePiece protobuf definition
5 versions - Latest release: 11 months ago - 31.4 thousand downloads total - 0 stars on GitHub - 1 maintainer
better_peekable 0.2.4
Create a Peekable structure like Rust's Peekable except allowing for peeking n items ahead
7 versions - Latest release: over 3 years ago - 1 dependent repository - 9.03 thousand downloads total - 1 star on GitHub - 1 maintainer
mini-c-parser 0.12.2
minimal C language lexer & parser & virtual executer from scratch
11 versions - Latest release: about 1 year ago - 12.8 thousand downloads total - 12 stars on GitHub - 1 maintainer
tokenizers 0.21.2
Provides an implementation of today's most used tokenizers, with a focus on performances and vers...
35 versions - Latest release: 3 months ago - 60 dependent packages - 281 dependent repositories - 3.42 million downloads total - 10,054 stars on GitHub - 3 maintainers - Top 2.5% on crates.io
nlpo3 1.4.0
Thai natural language processing library, with Python and Node bindings
8 versions - Latest release: 10 months ago - 1 dependent package - 1 dependent repository - 20.1 thousand downloads total - 35 stars on GitHub - 2 maintainers
nlpo3-cli 0.2.0
Command line interface for nlpO3, a Thai natural language processing library
3 versions - Latest release: about 4 years ago - 3.79 thousand downloads total - 35 stars on GitHub - 2 maintainers
lindera-dictionary-builder 0.32.3 💰
Shared code for building Lindera dictionary files
4 versions - Latest release: 6 months ago - 108 thousand downloads total - 530 stars on GitHub - 1 maintainer
nipah_tokenizer 0.1.0
A powerful yet simple text tokenizer for your everyday needs!
1 version - Latest release: over 2 years ago - 1.47 thousand downloads total - 0 stars on GitHub - 1 maintainer
text-splitter 0.27.0
Split text into semantic chunks, up to a desired chunk size. Supports calculating length by chara...
55 versions - Latest release: 3 months ago - 5 dependent packages - 1 dependent repository - 559 thousand downloads total - 473 stars on GitHub - 1 maintainer
lexariel 0.1.0
Lexical analyzer for Asmodeus language
1 version - Latest release: 2 months ago - 431 downloads total - 1 star on GitHub - 1 maintainer
sql-script-parser 0.1.2 💰
sql-script-parser iterates over SQL statements in SQL script.
3 versions - Latest release: over 4 years ago - 4.38 thousand downloads total - 2 stars on GitHub - 1 maintainer
scnr 0.8.0
Scanner/Lexer with regex patterns and multiple modes
13 versions - Latest release: 7 months ago - 16.6 thousand downloads total - 3 stars on GitHub - 1 maintainer
sana 0.1.1
Create lexers easily
2 versions - Latest release: about 5 years ago - 1 dependent repository - 3.24 thousand downloads total - 1 maintainer
bpe 0.2.1
Fast byte-pair encoding implementation.
5 versions - Latest release: 4 months ago - 35.9 thousand downloads total - 79 stars on GitHub - 3 maintainers
bpe-openai 0.3.0
Prebuilt fast byte-pair encoders for OpenAI.
4 versions - Latest release: 4 months ago - 40.4 thousand downloads total - 79 stars on GitHub - 3 maintainers
rustpotion 0.3.0
Blazingly fast word embeddings with Tokenlearn
3 versions - Latest release: 9 months ago - 2.27 thousand downloads total - 4 stars on GitHub - 1 maintainer
lindera-assets 0.32.3 💰
A helper crate to fetch assets and build dictionary for lindera.
2 versions - Latest release: 6 months ago - 93.4 thousand downloads total - 530 stars on GitHub - 1 maintainer
bracoxide 0.1.6
A feature-rich library for brace pattern combination, permutation generation, and error handling.
7 versions - Latest release: 4 months ago - 2 dependent packages - 6 dependent repositories - 345 thousand downloads total - 2 stars on GitHub - 1 maintainer
segtok 0.1.5
Sentence segmentation and word tokenization tools
6 versions - Latest release: 7 months ago - 5.14 thousand downloads total - 2 stars on GitHub - 1 maintainer
noa-parser 0.7.4
Noa parser is an extensible general purpose framework parser allowing to parser any type of data ...
12 versions - Latest release: 3 months ago - 4.22 thousand downloads total - 3 stars on GitHub - 1 maintainer
elyze 1.5.5
Elyze is an extensible general purpose framework parser allowing to parser any type of data witho...
19 versions - Latest release: about 1 month ago - 7.02 thousand downloads total - 0 stars on GitHub - 1 maintainer
code-splitter 0.1.5
Split code into semantic chunks using tree-sitter
5 versions - Latest release: 12 months ago - 4.83 thousand downloads total - 3 stars on GitHub - 1 maintainer
cang-jie 0.18.0
A Chinese tokenizer for tantivy
20 versions - Latest release: almost 2 years ago - 6 dependent packages - 13 dependent repositories - 45.5 thousand downloads total - 80 stars on GitHub - 1 maintainer
tokenizers-enfer 0.21.1
Provides an implementation of today's most used tokenizers, with a focus on performances and vers...
3 versions - Latest release: 9 months ago - 3.41 thousand downloads total - 1 maintainer
xlex-lexer 0.0.1
Fast and composable lexer for Rust
1 version - Latest release: 4 months ago - 414 downloads total - 0 stars on GitHub - 1 maintainer
rust_transformers 0.2.0
High performance tokenizers for Rust
2 versions - Latest release: over 5 years ago - 1 dependent package - 2.94 thousand downloads total - 323 stars on GitHub - 1 maintainer
rust_tokenizers 8.1.1
High performance tokenizers for Rust
34 versions - Latest release: almost 2 years ago - 11 dependent packages - 225 dependent repositories - 304 thousand downloads total - 323 stars on GitHub - 1 maintainer - Top 6.3% on crates.io
tokeneer 0.1.0
Another tokenizer crate
4 versions - Latest release: 7 months ago - 4.14 thousand downloads total - 1 star on GitHub - 1 maintainer
another-tiktoken-rs 0.1.2
Library for encoding and decoding with the tiktoken library in Rust
3 versions - Latest release: about 2 years ago - 4.92 thousand downloads total - 325 stars on GitHub - 1 maintainer
tiktoken-rs 0.7.0
Library for encoding and decoding with the tiktoken library in Rust
30 versions - Latest release: 4 months ago - 39 dependent packages - 73 dependent repositories - 1.87 million downloads total - 325 stars on GitHub - 1 maintainer - Top 7.2% on crates.io
fileql 0.10.0 💰
A tool to run SQL-like query on local files using GitQL SDK
10 versions - Latest release: 7 months ago - 9.99 thousand downloads total - 71 stars on GitHub - 1 maintainer
tekken-rs 0.1.1
Rust implementation of Mistral Tekken tokenizer with audio support
2 versions - Latest release: about 1 month ago - 836 downloads total - 5 stars on GitHub - 1 maintainer
marqant 0.2.0
Quantum-compressed markdown format for AI consumption with 90% token reduction
4 versions - Latest release: 23 days ago - 840 downloads total - 0 stars on GitHub - 1 maintainer
lindera-decompress 0.32.3 💰
A morphological analysis library.
39 versions - Latest release: 6 months ago - 11 dependent packages - 41 dependent repositories - 567 thousand downloads total - 514 stars on GitHub - 1 maintainer - Top 5.7% on crates.io
lindera-tantivy 1.0.0 💰
Lindera Tokenizer for Tantivy.
52 versions - Latest release: 10 days ago - 5 dependent packages - 7 dependent repositories - 103 thousand downloads total - 60 stars on GitHub - 4 maintainers
indentation_flattener 0.1.0
From indented input, generate plain output with indentation PUSH and POP codes.
1 version - Latest release: over 8 years ago - 2.04 thousand downloads total - 0 stars on GitHub - 1 maintainer
vaporetto_rules 0.6.5
Rule-base filters for Vaporetto
12 versions - Latest release: 6 months ago - 1 dependent package - 1 dependent repository - 60 thousand downloads total - 243 stars on GitHub - 1 maintainer
vaporetto 0.6.5
Vaporetto: a pointwise prediction based tokenizer
18 versions - Latest release: 6 months ago - 3 dependent packages - 1 dependent repository - 124 thousand downloads total - 243 stars on GitHub - 1 maintainer
vaporetto_tantivy 0.24.0
Vaporetto Tokenizer for Tantivy
15 versions - Latest release: 3 months ago - 21.2 thousand downloads total - 243 stars on GitHub - 1 maintainer
charabia 0.9.7
A simple library to detect the language, tokenize the text and normalize the tokens
29 versions - Latest release: 17 days ago - 3 dependent packages - 33 dependent repositories - 475 thousand downloads total - 308 stars on GitHub - 2 maintainers - Top 7.0% on crates.io
lindera-ipadic-neologd-builder 0.32.3 💰
A Japanese morphological dictionary builder for IPADIC NEologd.
18 versions - Latest release: 6 months ago - 3 dependent packages - 3 dependent repositories - 373 thousand downloads total - 527 stars on GitHub - 4 maintainers - Top 8.0% on crates.io
punkt 1.0.5
An implementation of a Punkt sentence tokenizer
8 versions - Latest release: over 6 years ago - 3 dependent packages - 3 dependent repositories - 24 thousand downloads total - 38 stars on GitHub - 1 maintainer
tokenizer-lib 1.6.0
Tokenization utilities for building parsers in Rust
15 versions - Latest release: over 1 year ago - 2 dependent packages - 1 dependent repository - 22.6 thousand downloads total - 2 stars on GitHub - 1 maintainer
rye-grain 0.0.1
A Python to Rust translator
1 version - Latest release: almost 3 years ago - 1.44 thousand downloads total - 1 star on GitHub - 1 maintainer
jayce 12.1.0
jayce is a tokenizer 🌌
34 versions - Latest release: over 1 year ago - 38.5 thousand downloads total - 1 star on GitHub - 1 maintainer
octofhir-fhirpath-parser 0.4.18
Parser and tokenizer for FHIRPath expressions
17 versions - Latest release: 11 days ago - 2.89 thousand downloads total - 15 stars on GitHub - 1 maintainer
unobtanium-segmenter 0.2.1
A text segmentation toolbox for search applications inspired by charabia and tantivy.
4 versions - Latest release: about 2 months ago - 1.2 thousand downloads total - 1 maintainer
lang_pt 0.1.2
A parser tool to generate recursive descent top down parser.
3 versions - Latest release: over 2 years ago - 4.53 thousand downloads total - 6 stars on GitHub - 1 maintainer
bareun_rs 0.1.0
Bareun is a Korean Morphological analyzer for Rust
1 version - Latest release: over 1 year ago - 1.25 thousand downloads total - 0 stars on GitHub - 1 maintainer
kitoken 0.10.1 💰
Fast and versatile tokenizer for language models, supporting BPE, Unigram and WordPiece tokenization
2 versions - Latest release: 9 months ago - 7 thousand downloads total - 29 stars on GitHub - 1 maintainer
rustfst 1.2.6
Library for constructing, combining, optimizing, and searching weighted finite-state transducers ...
57 versions - Latest release: about 2 months ago - 3 dependent packages - 1 dependent repository - 619 thousand downloads total - 167 stars on GitHub - 1 maintainer
sqlite-simple-tokenizer 0.2.1
This is a run-time loadable extension of SQLite fts5, supports Chinese and pinyin word segmentatio...
3 versions - Latest release: 14 days ago - 259 downloads total - 0 stars on GitHub - 1 maintainer
bytepiece_rs 0.2.2
The Bytepiece Tokenizer Implemented in Rust
7 versions - Latest release: almost 2 years ago - 1 dependent package - 10.5 thousand downloads total - 14 stars on GitHub - 1 maintainer
tuker 0.1.0
A small tokenizer/parser library with an emphasis on usability
1 version - Latest release: 11 months ago - 897 downloads total - 1 maintainer
chat-splitter 0.1.1 💰
Never exceed OpenAI's chat models' maximum number of tokens when using the async_openai Rust crate
2 versions - Latest release: about 2 years ago - 2.6 thousand downloads total - 3 stars on GitHub - 1 maintainer
lindera-analyzer 0.32.3 💰
A morphological analysis library.
12 versions - Latest release: 6 months ago - 2 dependent packages - 1 dependent repository - 134 thousand downloads total - 527 stars on GitHub - 1 maintainer
tocken 0.1.0 💰
Clustering algorithms.
1 version - Latest release: 8 months ago - 2.6 thousand downloads total - 0 stars on GitHub - 1 maintainer
lexxor 0.9.1
A fast, extensible, greedy, single-pass text tokenizer for Rust
2 versions - Latest release: 4 months ago - 822 downloads total - 1 star on GitHub - 1 maintainer
tiktokenx 0.1.0
A high-performance Rust implementation of OpenAI's tiktoken library
1 version - Latest release: 16 days ago - 0 downloads total - 0 stars on GitHub - 1 maintainer
logos 0.15.1 💰
Create ridiculously fast Lexers
56 versions - Latest release: about 1 month ago - 235 dependent packages - 606 dependent repositories - 19 million downloads total - 2,771 stars on GitHub - 2 maintainers - Top 2.8% on crates.io
logos-derive 0.15.1 💰
Create ridiculously fast Lexers
52 versions - Latest release: about 1 month ago - 7 dependent packages - 539 dependent repositories - 19 million downloads total - 2,771 stars on GitHub - 2 maintainers - Top 3.5% on crates.io
logos2 💰
Create ridiculously fast Lexers
6 versions - Latest release: 16 days ago - 8.41 thousand downloads total - 2,771 stars on GitHub - 1 maintainer
logos-derive2 💰
Create ridiculously fast Lexers
6 versions - Latest release: 16 days ago - 1 dependent package - 6.9 thousand downloads total - 2,771 stars on GitHub - 1 maintainer
logos-cli 0.15.1 💰
Create ridiculously fast Lexers
8 versions - Latest release: about 1 month ago - 6.6 thousand downloads total - 2,771 stars on GitHub - 2 maintainers
logos-cli2 💰
Create ridiculously fast Lexers
6 versions - Latest release: 16 days ago - 6.74 thousand downloads total - 2,771 stars on GitHub - 1 maintainer
logos-codegen2 💰
Create ridiculously fast Lexers
6 versions - Latest release: 16 days ago - 2 dependent packages - 6.94 thousand downloads total - 2,771 stars on GitHub - 1 maintainer
logos-codegen 0.15.1 💰
Create ridiculously fast Lexers
8 versions - Latest release: about 1 month ago - 2 dependent packages - 28 dependent repositories - 11.7 million downloads total - 2,771 stars on GitHub - 2 maintainers - Top 6.2% on crates.io
bytepiece 0.2.0
Rust version of bytepiece tokenizer
2 versions - Latest release: almost 2 years ago - 2.52 thousand downloads total - 12 stars on GitHub - 1 maintainer
generic_tokenizer 0.1.0
A generic tokenizer that tracks line and column numbers as it goes.
1 version - Latest release: 11 months ago - 1.03 thousand downloads total - 0 stars on GitHub - 1 maintainer
indent_tokenizer 0.4.0
Generate tokens based on indentation
4 versions - Latest release: over 7 years ago - 6.48 thousand downloads total - 1 star on GitHub - 1 maintainer
rustpostal 0.3.0
Rust bindings to libpostal
4 versions - Latest release: over 3 years ago - 6.45 thousand downloads total - 14 stars on GitHub - 1 maintainer
xmlparser 0.13.6
Pull-based, zero-allocation XML parser.
24 versions - Latest release: almost 2 years ago - 35 dependent packages - 2,453 dependent repositories - 45.6 million downloads total - 135 stars on GitHub - 2 maintainers - Top 4.8% on crates.io
tiniestsegmenter 0.3.0
Compact Japanese segmenter
4 versions - Latest release: 12 months ago - 4.77 thousand downloads total - 3 stars on GitHub - 1 maintainer
xxcalc 0.2.1
Embeddable or standalone robust floating-point polynomial calculator
4 versions - Latest release: almost 9 years ago - 1 dependent repository - 12.2 thousand downloads total - 13 stars on GitHub - 1 maintainer
svgtypes 0.15.3
SVG types parser.
25 versions - Latest release: 8 months ago - 26 dependent packages - 532 dependent repositories - 6.56 million downloads total - 77 stars on GitHub - 3 maintainers - Top 6.5% on crates.io
luther-derive 0.1.0
The proc macro generator for the Luther lexer generator.
1 version - Latest release: over 7 years ago - 2.67 thousand downloads total - 5 stars on GitHub - 1 maintainer
sana_core 0.1.1
The core of Sana
2 versions - Latest release: about 5 years ago - 2 dependent packages - 1 dependent repository - 4.26 thousand downloads total - 1 maintainer
token-counter 0.1.0
`wc` for tokens: count tokens in files with HF Tokenizers
1 version - Latest release: about 1 year ago - 1.19 thousand downloads total - 7 stars on GitHub - 1 maintainer
aleph-alpha-tokenizer 0.3.1
A fast implementation of a wordpiece-inspired tokenizer
4 versions - Latest release: about 5 years ago - 5.95 thousand downloads total - 6 stars on GitHub - 1 maintainer
pkl-parser 0.8.1
A rust Pkl Parser!
2 versions - Latest release: about 1 year ago - 2 thousand downloads total - 6 stars on GitHub - 1 maintainer
rustrawi 0.1.2 💰
Rust port of the original PHP Sastrawi
3 versions - Latest release: over 2 years ago - 3.33 thousand downloads total - 0 stars on GitHub - 1 maintainer
tantivy-tokenizer-tiny-segmenter 0.3.0
A Japanese tokenizer for Tantivy, based on TinySegmenter.
3 versions - Latest release: almost 6 years ago - 1 dependent repository - 4.88 thousand downloads total - 14 stars on GitHub - 1 maintainer
Related Keywords
lexer (56)
parser (51)
rust (46)
nlp (35)
analyzer (26)
morphological (24)
library (23)
multilingual (21)
parsing (19)
scanner (17)
bpe (14)
japanese (14)
dictionary (12)
lexical (11)
token (11)
tokenization (11)
text (10)
ai (10)
analysis (9)
lexer-generator (9)
no_std (8)
python (7)
builder (6)
machine-learning (6)
sql (6)
cli (6)
tantivy (6)
generator (6)
openai (5)
segmentation (5)
rust-lang (5)
llm (5)
gpt (4)
natural-language-processing (4)
deep-learning (4)
text-processing (4)
split (4)
wordpiece (4)
language (4)
morphological-analysis (4)
sentence (4)
regex (4)
chinese (4)
ipadic (4)
parser-generator (4)
tokeniser (3)
dutch (3)
alpino (3)
thai (3)
word-segmentation (3)
nodejs (3)
html (3)
stemmer (3)
lex (3)
svg (3)
encoding (3)
segmenter (3)
sqlite (3)
rust-crate (3)
transformer (3)
korean (3)
unidic (3)
c (3)
lalr (3)
compiler (3)
sql-parser (3)
sentencepiece (2)
transducers (2)
blingfire (2)
rust-wrapper (2)
wfst (2)
word (2)
extension (2)
javascript (2)
fst (2)
transducer (2)
graph (2)
crate (2)
speech-recognition (2)
openfst (2)
kaldi-asr (2)
kaldi (2)
fsts (2)
lexing (2)
finite-state-transducers (2)
finite-state-acceptors (2)
composition (2)
automata (2)
asr (2)
acceptor (2)
shortest-path (2)
huggingface (2)
language-model (2)
virtual-machine (2)
dfa (2)
hacktoberfest (2)
xml (2)
css (2)
ko-dic (2)
thai-language (2)