An open API service providing package, version and dependency metadata of many open source software ecosystems and registries.

crates.io "tokenizer" keyword

View the packages on the crates.io package registry that are tagged with the "tokenizer" keyword.

tokenise 0.1.0
A flexible tokeniser library for parsing text
1 version - Latest release: 6 months ago - 547 downloads total - 0 stars on GitHub - 1 maintainer
lindera-cli 1.1.2 💰
A morphological analysis command line interface.
102 versions - Latest release: 1 day ago - 99 thousand downloads total - 531 stars on GitHub - 4 maintainers
Top 7.2% on crates.io
lindera-ko-dic 1.1.2 💰
A Korean morphological dictionary for Ko-Dic.
68 versions - Latest release: 1 day ago - 3 dependent packages - 24 dependent repositories - 547 thousand downloads total - 531 stars on GitHub - 1 maintainer
Top 5.4% on crates.io
lindera 1.1.2 💰
A morphological analysis library.
103 versions - Latest release: 1 day ago - 10 dependent packages - 124 dependent repositories - 582 thousand downloads total - 531 stars on GitHub - 4 maintainers
Top 8.7% on crates.io
lindera-unidic 1.1.2 💰
A Japanese morphological dictionary for UniDic.
67 versions - Latest release: 1 day ago - 2 dependent packages - 3 dependent repositories - 384 thousand downloads total - 531 stars on GitHub - 1 maintainer
sqlite3-parser 0.15.0
SQL parser (as understood by SQLite)
14 versions - Latest release: 3 months ago - 3 dependent packages - 2 dependent repositories - 2.1 million downloads total - 54 stars on GitHub - 1 maintainer
turso_sqlite3_parser 0.1.4
SQL parser (as understood by SQLite)
7 versions - Latest release: 18 days ago - 3.87 thousand downloads total - 54 stars on GitHub - 1 maintainer
limbo_sqlite3_parser 0.0.22
SQL parser (as understood by SQLite)
6 versions - Latest release: 3 months ago - 3.92 thousand downloads total - 54 stars on GitHub - 1 maintainer
bundle_repo 0.6.0 💰
Pack a local or remote Git Repository to XML for LLM Consumption.
6 versions - Latest release: 6 months ago - 5.01 thousand downloads total - 22 stars on GitHub - 1 maintainer
tokengeex 1.1.0
TokenGeeX is an efficient tokenizer for code based on UnigramLM and TokenMonster.
11 versions - Latest release: over 1 year ago - 12.8 thousand downloads total - 4 stars on GitHub - 1 maintainer
erl_tokenize 0.8.3 💰
Erlang source code tokenizer
34 versions - Latest release: about 1 month ago - 5 dependent packages - 3 dependent repositories - 121 thousand downloads total - 12 stars on GitHub - 1 maintainer
ast-rs 0.0.1
AST Toolkit for Rust
1 version - Latest release: about 3 years ago - 1.6 thousand downloads total - 0 stars on GitHub - 1 maintainer
lindera-ipadic-neologd 1.1.2 💰
A Japanese morphological dictionary for IPADIC NEologd.
39 versions - Latest release: 1 day ago - 1 dependent package - 262 thousand downloads total - 530 stars on GitHub - 1 maintainer
Top 10.0% on crates.io
lindera-cc-cedict 1.1.2 💰
A Chinese morphological dictionary for CC-CEDICT.
67 versions - Latest release: 1 day ago - 2 dependent packages - 2 dependent repositories - 384 thousand downloads total - 530 stars on GitHub - 1 maintainer
Top 6.1% on crates.io
lindera-ipadic 1.1.2 💰
A Japanese morphological dictionary for IPADIC.
90 versions - Latest release: 1 day ago - 4 dependent packages - 126 dependent repositories - 616 thousand downloads total - 530 stars on GitHub - 4 maintainers
Top 5.2% on crates.io
lindera-dictionary 1.1.2 💰
A morphological analysis library.
82 versions - Latest release: 1 day ago - 10 dependent packages - 238 dependent repositories - 798 thousand downloads total - 530 stars on GitHub - 4 maintainers
sentience-tokenize 0.2.3 💰
Tiny Rust zero-dep tokenizer (ident, number, string, parens, operators, keywords).
8 versions - Latest release: 9 days ago - 1.53 thousand downloads total - 0 stars on GitHub - 1 maintainer
pkl_fast 0.1.1
A library aiming to easily and efficiently work with Apple's PKL format.
2 versions - Latest release: over 1 year ago - 2.62 thousand downloads total - 6 stars on GitHub - 1 maintainer
sentencepiece-model 0.1.4 💰
SentencePiece model parser generated from the SentencePiece protobuf definition
5 versions - Latest release: 11 months ago - 31.4 thousand downloads total - 0 stars on GitHub - 1 maintainer
better_peekable 0.2.4
Create a Peekable structure like Rust's Peekable except allowing for peeking n items ahead
7 versions - Latest release: over 3 years ago - 1 dependent repository - 9.03 thousand downloads total - 1 star on GitHub - 1 maintainer
mini-c-parser 0.12.2
minimal C language lexer & parser & virtual executer from scratch
11 versions - Latest release: about 1 year ago - 12.8 thousand downloads total - 12 stars on GitHub - 1 maintainer
Top 2.5% on crates.io
tokenizers 0.21.2
Provides an implementation of today's most used tokenizers, with a focus on performances and vers...
35 versions - Latest release: 3 months ago - 60 dependent packages - 281 dependent repositories - 3.42 million downloads total - 10,054 stars on GitHub - 3 maintainers
nlpo3 1.4.0
Thai natural language processing library, with Python and Node bindings
8 versions - Latest release: 10 months ago - 1 dependent package - 1 dependent repository - 20.1 thousand downloads total - 35 stars on GitHub - 2 maintainers
nlpo3-cli 0.2.0
Command line interface for nlpO3, a Thai natural language processing library
3 versions - Latest release: about 4 years ago - 3.79 thousand downloads total - 35 stars on GitHub - 2 maintainers
lindera-dictionary-builder 0.32.3 💰
Shared code for building Lindera dictionary files
4 versions - Latest release: 6 months ago - 108 thousand downloads total - 530 stars on GitHub - 1 maintainer
nipah_tokenizer 0.1.0
A powerful yet simple text tokenizer for your everyday needs!
1 version - Latest release: over 2 years ago - 1.47 thousand downloads total - 0 stars on GitHub - 1 maintainer
text-splitter 0.27.0
Split text into semantic chunks, up to a desired chunk size. Supports calculating length by chara...
55 versions - Latest release: 3 months ago - 5 dependent packages - 1 dependent repository - 559 thousand downloads total - 473 stars on GitHub - 1 maintainer
lexariel 0.1.0
Lexical analyzer for Asmodeus language
1 version - Latest release: 2 months ago - 431 downloads total - 1 star on GitHub - 1 maintainer
sql-script-parser 0.1.2 💰
sql-script-parser iterates over SQL statements in SQL script.
3 versions - Latest release: over 4 years ago - 4.38 thousand downloads total - 2 stars on GitHub - 1 maintainer
scnr 0.8.0
Scanner/Lexer with regex patterns and multiple modes
13 versions - Latest release: 7 months ago - 16.6 thousand downloads total - 3 stars on GitHub - 1 maintainer
sana 0.1.1
Create lexers easily
2 versions - Latest release: about 5 years ago - 1 dependent repository - 3.24 thousand downloads total - 1 maintainer
bpe 0.2.1
Fast byte-pair encoding implementation.
5 versions - Latest release: 4 months ago - 35.9 thousand downloads total - 79 stars on GitHub - 3 maintainers
bpe-openai 0.3.0
Prebuilt fast byte-pair encoders for OpenAI.
4 versions - Latest release: 4 months ago - 40.4 thousand downloads total - 79 stars on GitHub - 3 maintainers
rustpotion 0.3.0
Blazingly fast word embeddings with Tokenlearn
3 versions - Latest release: 9 months ago - 2.27 thousand downloads total - 4 stars on GitHub - 1 maintainer
lindera-assets 0.32.3 💰
A helper crate to fetch assets and build dictionary for lindera.
2 versions - Latest release: 6 months ago - 93.4 thousand downloads total - 530 stars on GitHub - 1 maintainer
bracoxide 0.1.6
A feature-rich library for brace pattern combination, permutation generation, and error handling.
7 versions - Latest release: 4 months ago - 2 dependent packages - 6 dependent repositories - 345 thousand downloads total - 2 stars on GitHub - 1 maintainer
segtok 0.1.5
Sentence segmentation and word tokenization tools
6 versions - Latest release: 7 months ago - 5.14 thousand downloads total - 2 stars on GitHub - 1 maintainer
noa-parser 0.7.4
Noa parser is an extensible general purpose framework parser allowing to parser any type of data ...
12 versions - Latest release: 3 months ago - 4.22 thousand downloads total - 3 stars on GitHub - 1 maintainer
elyze 1.5.5
Elyze is an extensible general purpose framework parser allowing to parser any type of data witho...
19 versions - Latest release: about 1 month ago - 7.02 thousand downloads total - 0 stars on GitHub - 1 maintainer
code-splitter 0.1.5
Split code into semantic chunks using tree-sitter
5 versions - Latest release: 12 months ago - 4.83 thousand downloads total - 3 stars on GitHub - 1 maintainer
cang-jie 0.18.0
A Chinese tokenizer for tantivy
20 versions - Latest release: almost 2 years ago - 6 dependent packages - 13 dependent repositories - 45.5 thousand downloads total - 80 stars on GitHub - 1 maintainer
tokenizers-enfer 0.21.1
Provides an implementation of today's most used tokenizers, with a focus on performances and vers...
3 versions - Latest release: 9 months ago - 3.41 thousand downloads total - 1 maintainer
xlex-lexer 0.0.1
Fast and composable lexer for Rust
1 version - Latest release: 4 months ago - 414 downloads total - 0 stars on GitHub - 1 maintainer
rust_transformers 0.2.0
High performance tokenizers for Rust
2 versions - Latest release: over 5 years ago - 1 dependent package - 2.94 thousand downloads total - 323 stars on GitHub - 1 maintainer
Top 6.3% on crates.io
rust_tokenizers 8.1.1
High performance tokenizers for Rust
34 versions - Latest release: almost 2 years ago - 11 dependent packages - 225 dependent repositories - 304 thousand downloads total - 323 stars on GitHub - 1 maintainer
tokeneer 0.1.0
Another tokenizer crate
4 versions - Latest release: 7 months ago - 4.14 thousand downloads total - 1 star on GitHub - 1 maintainer
another-tiktoken-rs 0.1.2
Library for encoding and decoding with the tiktoken library in Rust
3 versions - Latest release: about 2 years ago - 4.92 thousand downloads total - 325 stars on GitHub - 1 maintainer
Top 7.2% on crates.io
tiktoken-rs 0.7.0
Library for encoding and decoding with the tiktoken library in Rust
30 versions - Latest release: 4 months ago - 39 dependent packages - 73 dependent repositories - 1.87 million downloads total - 325 stars on GitHub - 1 maintainer
fileql 0.10.0 💰
A tool to run SQL-like query on local files using GitQL SDK
10 versions - Latest release: 7 months ago - 9.99 thousand downloads total - 71 stars on GitHub - 1 maintainer
tekken-rs 0.1.1
Rust implementation of Mistral Tekken tokenizer with audio support
2 versions - Latest release: about 1 month ago - 836 downloads total - 5 stars on GitHub - 1 maintainer
marqant 0.2.0
Quantum-compressed markdown format for AI consumption with 90% token reduction
4 versions - Latest release: 23 days ago - 840 downloads total - 0 stars on GitHub - 1 maintainer
Top 5.7% on crates.io
lindera-decompress 0.32.3 💰
A morphological analysis library.
39 versions - Latest release: 6 months ago - 11 dependent packages - 41 dependent repositories - 567 thousand downloads total - 514 stars on GitHub - 1 maintainer
lindera-tantivy 1.0.0 💰
Lindera Tokenizer for Tantivy.
52 versions - Latest release: 10 days ago - 5 dependent packages - 7 dependent repositories - 103 thousand downloads total - 60 stars on GitHub - 4 maintainers
indentation_flattener 0.1.0
From indented input, generate plain output with indentation PUSH and POP codes.
1 version - Latest release: over 8 years ago - 2.04 thousand downloads total - 0 stars on GitHub - 1 maintainer
vaporetto_rules 0.6.5
Rule-base filters for Vaporetto
12 versions - Latest release: 6 months ago - 1 dependent package - 1 dependent repository - 60 thousand downloads total - 243 stars on GitHub - 1 maintainer
vaporetto 0.6.5
Vaporetto: a pointwise prediction based tokenizer
18 versions - Latest release: 6 months ago - 3 dependent packages - 1 dependent repository - 124 thousand downloads total - 243 stars on GitHub - 1 maintainer
vaporetto_tantivy 0.24.0
Vaporetto Tokenizer for Tantivy
15 versions - Latest release: 3 months ago - 21.2 thousand downloads total - 243 stars on GitHub - 1 maintainer
Top 7.0% on crates.io
charabia 0.9.7
A simple library to detect the language, tokenize the text and normalize the tokens
29 versions - Latest release: 17 days ago - 3 dependent packages - 33 dependent repositories - 475 thousand downloads total - 308 stars on GitHub - 2 maintainers
Top 8.0% on crates.io
lindera-ipadic-neologd-builder 0.32.3 💰
A Japanese morphological dictionary builder for IPADIC NEologd.
18 versions - Latest release: 6 months ago - 3 dependent packages - 3 dependent repositories - 373 thousand downloads total - 527 stars on GitHub - 4 maintainers
punkt 1.0.5
An implementation of a Punkt sentence tokenizer
8 versions - Latest release: over 6 years ago - 3 dependent packages - 3 dependent repositories - 24 thousand downloads total - 38 stars on GitHub - 1 maintainer
tokenizer-lib 1.6.0
Tokenization utilities for building parsers in Rust
15 versions - Latest release: over 1 year ago - 2 dependent packages - 1 dependent repository - 22.6 thousand downloads total - 2 stars on GitHub - 1 maintainer
rye-grain 0.0.1
A Python to Rust translator
1 version - Latest release: almost 3 years ago - 1.44 thousand downloads total - 1 star on GitHub - 1 maintainer
jayce 12.1.0
jayce is a tokenizer 🌌
34 versions - Latest release: over 1 year ago - 38.5 thousand downloads total - 1 star on GitHub - 1 maintainer
octofhir-fhirpath-parser 0.4.18
Parser and tokenizer for FHIRPath expressions
17 versions - Latest release: 11 days ago - 2.89 thousand downloads total - 15 stars on GitHub - 1 maintainer
unobtanium-segmenter 0.2.1
A text segmentation toolbox for search applications inspired by charabia and tantivy.
4 versions - Latest release: about 2 months ago - 1.2 thousand downloads total - 1 maintainer
lang_pt 0.1.2
A parser tool to generate recursive descent top down parser.
3 versions - Latest release: over 2 years ago - 4.53 thousand downloads total - 6 stars on GitHub - 1 maintainer
bareun_rs 0.1.0
Bareun is a Korean Morphological analyzer for Rust
1 version - Latest release: over 1 year ago - 1.25 thousand downloads total - 0 stars on GitHub - 1 maintainer
kitoken 0.10.1 💰
Fast and versatile tokenizer for language models, supporting BPE, Unigram and WordPiece tokenization
2 versions - Latest release: 9 months ago - 7 thousand downloads total - 29 stars on GitHub - 1 maintainer
rustfst 1.2.6
Library for constructing, combining, optimizing, and searching weighted finite-state transducers ...
57 versions - Latest release: about 2 months ago - 3 dependent packages - 1 dependent repository - 619 thousand downloads total - 167 stars on GitHub - 1 maintainer
sqlite-simple-tokenizer 0.2.1
This's a run-time loadable extension of SQLite fts5, supports Chinese and pinyin word segmentatio...
3 versions - Latest release: 14 days ago - 259 downloads total - 0 stars on GitHub - 1 maintainer
bytepiece_rs 0.2.2
The Bytepiece Tokenizer Implemented in Rust
7 versions - Latest release: almost 2 years ago - 1 dependent package - 10.5 thousand downloads total - 14 stars on GitHub - 1 maintainer
tuker 0.1.0
A small tokenizer/parser library with an emphasis on usability
1 version - Latest release: 11 months ago - 897 downloads total - 1 maintainer
chat-splitter 0.1.1 💰
Never exceed OpenAI's chat models' maximum number of tokens when using the async_openai Rust crate
2 versions - Latest release: about 2 years ago - 2.6 thousand downloads total - 3 stars on GitHub - 1 maintainer
lindera-analyzer 0.32.3 💰
A morphological analysis library.
12 versions - Latest release: 6 months ago - 2 dependent packages - 1 dependent repositories - 134 thousand downloads total - 527 stars on GitHub - 1 maintainer
tocken 0.1.0 💰
Clustering algorithms.
1 version - Latest release: 8 months ago - 2.6 thousand downloads total - 0 stars on GitHub - 1 maintainer
lexxor 0.9.1
A fast, extensible, greedy, single-pass text tokenizer for Rust
2 versions - Latest release: 4 months ago - 822 downloads total - 1 star on GitHub - 1 maintainer
tiktokenx 0.1.0
A high-performance Rust implementation of OpenAI's tiktoken library
1 version - Latest release: 16 days ago - 0 downloads total - 0 stars on GitHub - 1 maintainer
Top 2.8% on crates.io
logos 0.15.1 💰
Create ridiculously fast Lexers
56 versions - Latest release: about 1 month ago - 235 dependent packages - 606 dependent repositories - 19 million downloads total - 2,771 stars on GitHub - 2 maintainers
Top 3.5% on crates.io
logos-derive 0.15.1 💰
Create ridiculously fast Lexers
52 versions - Latest release: about 1 month ago - 7 dependent packages - 539 dependent repositories - 19 million downloads total - 2,771 stars on GitHub - 2 maintainers
logos2 💰
Create ridiculously fast Lexers
6 versions - Latest release: 16 days ago - 8.41 thousand downloads total - 2,771 stars on GitHub - 1 maintainer
logos-derive2 💰
Create ridiculously fast Lexers
6 versions - Latest release: 16 days ago - 1 dependent package - 6.9 thousand downloads total - 2,771 stars on GitHub - 1 maintainer
logos-cli 0.15.1 💰
Create ridiculously fast Lexers
8 versions - Latest release: about 1 month ago - 6.6 thousand downloads total - 2,771 stars on GitHub - 2 maintainers
logos-cli2 💰
Create ridiculously fast Lexers
6 versions - Latest release: 16 days ago - 6.74 thousand downloads total - 2,771 stars on GitHub - 1 maintainer
logos-codegen2 💰
Create ridiculously fast Lexers
6 versions - Latest release: 16 days ago - 2 dependent packages - 6.94 thousand downloads total - 2,771 stars on GitHub - 1 maintainer
Top 6.2% on crates.io
logos-codegen 0.15.1 💰
Create ridiculously fast Lexers
8 versions - Latest release: about 1 month ago - 2 dependent packages - 28 dependent repositories - 11.7 million downloads total - 2,771 stars on GitHub - 2 maintainers
bytepiece 0.2.0
Rust version of bytepiece tokenizer
2 versions - Latest release: almost 2 years ago - 2.52 thousand downloads total - 12 stars on GitHub - 1 maintainer
generic_tokenizer 0.1.0
A generic tokenizer that tracks line and column numbers as it goes.
1 version - Latest release: 11 months ago - 1.03 thousand downloads total - 0 stars on GitHub - 1 maintainer
indent_tokenizer 0.4.0
Generate tokens based on indentation
4 versions - Latest release: over 7 years ago - 6.48 thousand downloads total - 1 star on GitHub - 1 maintainer
rustpostal 0.3.0
Rust bindings to libpostal
4 versions - Latest release: over 3 years ago - 6.45 thousand downloads total - 14 stars on GitHub - 1 maintainer
Top 4.8% on crates.io
xmlparser 0.13.6
Pull-based, zero-allocation XML parser.
24 versions - Latest release: almost 2 years ago - 35 dependent packages - 2,453 dependent repositories - 45.6 million downloads total - 135 stars on GitHub - 2 maintainers
tiniestsegmenter 0.3.0
Compact Japanese segmenter
4 versions - Latest release: 12 months ago - 4.77 thousand downloads total - 3 stars on GitHub - 1 maintainer
xxcalc 0.2.1
Embeddable or standalone robust floating-point polynomial calculator
4 versions - Latest release: almost 9 years ago - 1 dependent repository - 12.2 thousand downloads total - 13 stars on GitHub - 1 maintainer
Top 6.5% on crates.io
svgtypes 0.15.3
SVG types parser.
25 versions - Latest release: 8 months ago - 26 dependent packages - 532 dependent repositories - 6.56 million downloads total - 77 stars on GitHub - 3 maintainers
luther-derive 0.1.0
The proc macro generator for the Luther lexer generator.
1 version - Latest release: over 7 years ago - 2.67 thousand downloads total - 5 stars on GitHub - 1 maintainer
sana_core 0.1.1
The core of Sana
2 versions - Latest release: about 5 years ago - 2 dependent packages - 1 dependent repository - 4.26 thousand downloads total - 1 maintainer
token-counter 0.1.0
`wc` for tokens: count tokens in files with HF Tokenizers
1 version - Latest release: about 1 year ago - 1.19 thousand downloads total - 7 stars on GitHub - 1 maintainer
aleph-alpha-tokenizer 0.3.1
A fast implementation of a wordpiece-inspired tokenizer
4 versions - Latest release: about 5 years ago - 5.95 thousand downloads total - 6 stars on GitHub - 1 maintainer
pkl-parser 0.8.1
A rust Pkl Parser!
2 versions - Latest release: about 1 year ago - 2 thousand downloads total - 6 stars on GitHub - 1 maintainer
rustrawi 0.1.2 💰
Rust port of the original PHP Sastrawi
3 versions - Latest release: over 2 years ago - 3.33 thousand downloads total - 0 stars on GitHub - 1 maintainer
tantivy-tokenizer-tiny-segmenter 0.3.0
A Japanese tokenizer for Tantivy, based on TinySegmenter.
3 versions - Latest release: almost 6 years ago - 1 dependent repository - 4.88 thousand downloads total - 14 stars on GitHub - 1 maintainer