An open API service providing package, version and dependency metadata of many open source software ecosystems and registries.

crates.io "tokenizer" keyword

View the packages on the crates.io package registry that are tagged with the "tokenizer" keyword.

fileql 0.10.0 💰
A tool to run SQL-like queries on local files using the GitQL SDK
10 versions - Latest release: 5 months ago - 9.49 thousand downloads total - 71 stars on GitHub - 1 maintainer
generic_tokenizer 0.1.0
A generic tokenizer that tracks line and column numbers as it goes.
1 version - Latest release: 9 months ago - 958 downloads total - 1 star on GitHub - 1 maintainer
rye-grain 0.0.1
A Python to Rust translator
1 version - Latest release: over 2 years ago - 1.38 thousand downloads total - 1 star on GitHub - 1 maintainer
scnr 0.8.0
Scanner/Lexer with regex patterns and multiple modes
13 versions - Latest release: 5 months ago - 15 thousand downloads total - 3 stars on GitHub - 1 maintainer
tokeneer 0.1.0
Another tokenizer crate
4 versions - Latest release: 5 months ago - 3.88 thousand downloads total - 1 star on GitHub - 1 maintainer
Top 6.2% on crates.io
logos-codegen 0.15.0 💰
Create ridiculously fast Lexers
7 versions - Latest release: 8 months ago - 2 dependent packages - 28 dependent repositories - 10.8 million downloads total - 2,771 stars on GitHub - 2 maintainers
jayce 12.1.0
jayce is a tokenizer 🌌
34 versions - Latest release: over 1 year ago - 36.8 thousand downloads total - 1 star on GitHub - 1 maintainer
rust_transformers 0.2.0
High performance tokenizers for Rust
2 versions - Latest release: over 5 years ago - 1 dependent package - 2.83 thousand downloads total - 313 stars on GitHub - 1 maintainer
tekken-rs 0.1.0
Rust implementation of Mistral Tekken tokenizer with audio support
1 version - Latest release: 2 days ago - 0 downloads total - 0 stars on GitHub - 1 maintainer
bpe 0.2.1
Fast byte-pair encoding implementation.
5 versions - Latest release: 3 months ago - 28.3 thousand downloads total - 71 stars on GitHub - 3 maintainers
bpe-openai 0.3.0
Prebuilt fast byte-pair encoders for OpenAI.
4 versions - Latest release: 3 months ago - 33.1 thousand downloads total - 71 stars on GitHub - 3 maintainers
nlpo3 1.4.0
Thai natural language processing library, with Python and Node bindings
8 versions - Latest release: 9 months ago - 1 dependent package - 1 dependent repository - 17.6 thousand downloads total - 35 stars on GitHub - 2 maintainers
logos-cli2 💰
Create ridiculously fast Lexers
6 versions - Latest release: 2 days ago - 6.54 thousand downloads total - 2,771 stars on GitHub - 1 maintainer
nlpo3-cli 0.2.0
Command line interface for nlpO3, a Thai natural language processing library
3 versions - Latest release: almost 4 years ago - 3.57 thousand downloads total - 35 stars on GitHub - 2 maintainers
text-splitter 0.27.0
Split text into semantic chunks, up to a desired chunk size. Supports calculating length by chara...
55 versions - Latest release: about 2 months ago - 5 dependent packages - 1 dependent repository - 480 thousand downloads total - 455 stars on GitHub - 1 maintainer
tokenizer-lib 1.6.0
Tokenization utilities for building parsers in Rust
15 versions - Latest release: about 1 year ago - 2 dependent packages - 1 dependent repository - 21.6 thousand downloads total - 2 stars on GitHub - 1 maintainer
punkt 1.0.5
An implementation of a Punkt sentence tokenizer
8 versions - Latest release: over 6 years ago - 3 dependent packages - 3 dependent repositories - 23.3 thousand downloads total - 37 stars on GitHub - 1 maintainer
bundle_repo 0.6.0 💰
Pack a local or remote Git Repository to XML for LLM Consumption.
6 versions - Latest release: 5 months ago - 4.49 thousand downloads total - 22 stars on GitHub - 1 maintainer
tocken 0.1.0 💰
Clustering algorithms.
1 version - Latest release: 7 months ago - 2.51 thousand downloads total - 0 stars on GitHub - 1 maintainer
logos-derive2 💰
Create ridiculously fast Lexers
6 versions - Latest release: 5 days ago - 1 dependent package - 6.64 thousand downloads total - 2,771 stars on GitHub - 1 maintainer
Top 3.5% on crates.io
logos-derive 0.15.0 💰
Create ridiculously fast Lexers
51 versions - Latest release: 8 months ago - 7 dependent packages - 539 dependent repositories - 17.6 million downloads total - 2,771 stars on GitHub - 2 maintainers
chat-splitter 0.1.1 💰
Never exceed OpenAI's chat models' maximum number of tokens when using the async_openai Rust crate
2 versions - Latest release: almost 2 years ago - 2.5 thousand downloads total - 3 stars on GitHub - 1 maintainer
logos2 💰
Create ridiculously fast Lexers
6 versions - Latest release: 6 days ago - 8.04 thousand downloads total - 2,771 stars on GitHub - 1 maintainer
kitoken 0.10.1 💰
Fast and versatile tokenizer for language models, supporting BPE, Unigram and WordPiece tokenization
2 versions - Latest release: 7 months ago - 6.03 thousand downloads total - 26 stars on GitHub - 1 maintainer
logos-cli 0.15.0 💰
Create ridiculously fast Lexers
7 versions - Latest release: 8 months ago - 6.1 thousand downloads total - 2,771 stars on GitHub - 2 maintainers
rustpostal 0.3.0
Rust bindings to libpostal
4 versions - Latest release: over 3 years ago - 6.03 thousand downloads total - 14 stars on GitHub - 1 maintainer
Top 5.4% on crates.io
lindera 0.44.1 💰
A morphological analysis library.
98 versions - Latest release: 27 days ago - 10 dependent packages - 124 dependent repositories - 506 thousand downloads total - 516 stars on GitHub - 4 maintainers
lindera-ipadic-neologd 0.44.1 💰
A Japanese morphological dictionary for IPADIC NEologd.
34 versions - Latest release: 27 days ago - 1 dependent package - 199 thousand downloads total - 516 stars on GitHub - 1 maintainer
Top 5.2% on crates.io
lindera-dictionary 0.44.1 💰
A morphological analysis library.
77 versions - Latest release: 27 days ago - 10 dependent packages - 238 dependent repositories - 719 thousand downloads total - 516 stars on GitHub - 4 maintainers
Top 7.2% on crates.io
lindera-ko-dic 0.44.1 💰
A Korean morphological dictionary for ko-dic.
63 versions - Latest release: 27 days ago - 3 dependent packages - 24 dependent repositories - 475 thousand downloads total - 516 stars on GitHub - 1 maintainer
Top 10.0% on crates.io
lindera-cc-cedict 0.44.1 💰
A Chinese morphological dictionary for CC-CEDICT.
62 versions - Latest release: 27 days ago - 2 dependent packages - 2 dependent repositories - 314 thousand downloads total - 516 stars on GitHub - 1 maintainer
lindera-cli 0.44.1 💰
A morphological analysis command line interface.
97 versions - Latest release: 27 days ago - 91.5 thousand downloads total - 516 stars on GitHub - 4 maintainers
Top 6.1% on crates.io
lindera-ipadic 0.44.1 💰
A Japanese morphological dictionary for IPADIC.
85 versions - Latest release: 27 days ago - 4 dependent packages - 126 dependent repositories - 540 thousand downloads total - 516 stars on GitHub - 4 maintainers
Top 8.7% on crates.io
lindera-unidic 0.44.1 💰
A Japanese morphological dictionary for UniDic.
62 versions - Latest release: 27 days ago - 2 dependent packages - 3 dependent repositories - 318 thousand downloads total - 516 stars on GitHub - 1 maintainer
tiniestsegmenter 0.3.0
Compact Japanese segmenter
4 versions - Latest release: 10 months ago - 4.56 thousand downloads total - 3 stars on GitHub - 1 maintainer
unobtanium-segmenter 0.2.1
A text segmentation toolbox for search applications inspired by charabia and tantivy.
4 versions - Latest release: 7 days ago - 620 downloads total - 1 maintainer
bytepiece_rs 0.2.2
The Bytepiece Tokenizer Implemented in Rust
7 versions - Latest release: over 1 year ago - 1 dependent package - 10 thousand downloads total - 14 stars on GitHub - 1 maintainer
xxcalc 0.2.1
Embeddable or standalone robust floating-point polynomial calculator
4 versions - Latest release: over 8 years ago - 1 dependent repository - 11.9 thousand downloads total - 13 stars on GitHub - 1 maintainer
Top 6.5% on crates.io
svgtypes 0.15.3
SVG types parser.
25 versions - Latest release: 6 months ago - 26 dependent packages - 532 dependent repositories - 6.09 million downloads total - 74 stars on GitHub - 3 maintainers
tuker 0.1.0
A small tokenizer/parser library with an emphasis on usability
1 version - Latest release: 10 months ago - 814 downloads total - 1 maintainer
lang_pt 0.1.2
A parser tool to generate recursive descent top down parser.
3 versions - Latest release: over 2 years ago - 4.25 thousand downloads total - 6 stars on GitHub - 1 maintainer
erl_tokenize 0.8.1 💰
Erlang source code tokenizer
32 versions - Latest release: 5 months ago - 5 dependent packages - 3 dependent repositories - 114 thousand downloads total - 12 stars on GitHub - 1 maintainer
Top 2.8% on crates.io
logos 0.15.0 💰
Create ridiculously fast Lexers
55 versions - Latest release: 8 months ago - 235 dependent packages - 606 dependent repositories - 17.5 million downloads total - 2,771 stars on GitHub - 2 maintainers
logos-codegen2 💰
Create ridiculously fast Lexers
6 versions - Latest release: 8 days ago - 2 dependent packages - 6.67 thousand downloads total - 2,771 stars on GitHub - 1 maintainer
lexxor 0.9.1
A fast, extensible, greedy, single-pass text tokenizer for Rust
2 versions - Latest release: 2 months ago - 690 downloads total - 1 star on GitHub - 1 maintainer
byteforge 0.1.1
A next-generation byte-level transformer with multi-signal patching and SIMD optimization
2 versions - Latest release: 15 days ago - 370 downloads total - 1 star on GitHub - 1 maintainer
svgrtypes 0.44.2
SVG types parser.
27 versions - Latest release: about 1 month ago - 2 dependent packages - 25.6 thousand downloads total - 1 maintainer
token-counter 0.1.0
`wc` for tokens: count tokens in files with HF Tokenizers
1 version - Latest release: about 1 year ago - 1.12 thousand downloads total - 7 stars on GitHub - 1 maintainer
tantivy-tokenizer-tiny-segmenter 0.3.0
A Japanese tokenizer for Tantivy, based on TinySegmenter.
3 versions - Latest release: over 5 years ago - 1 dependent repository - 4.71 thousand downloads total - 14 stars on GitHub - 1 maintainer
bracoxide 0.1.6
A feature-rich library for brace pattern combination, permutation generation, and error handling.
7 versions - Latest release: 3 months ago - 2 dependent packages - 6 dependent repositories - 272 thousand downloads total - 2 stars on GitHub - 1 maintainer
tele_tokenizer 0.2.0
A CSS tokenizer
2 versions - Latest release: over 3 years ago - 3 dependent packages - 1 dependent repository - 3.71 thousand downloads total - 198 stars on GitHub - 1 maintainer
indent_tokenizer 0.4.0
Generate tokens based on indentation
4 versions - Latest release: over 7 years ago - 6.28 thousand downloads total - 1 star on GitHub - 1 maintainer
tinytoken 0.1.4
Library for tokenizing text into words, numbers, symbols, and more, with customizable parsing opt...
5 versions - Latest release: 9 months ago - 3.45 thousand downloads total - 0 stars on GitHub - 1 maintainer
regex-lexer 0.2.0
A regex-based lexer (tokenizer)
3 versions - Latest release: almost 3 years ago - 3 dependent packages - 4 dependent repositories - 14.9 thousand downloads total - 6 stars on GitHub - 1 maintainer
lox-scanner 0.1.0
lexical scanner for Lox
3 versions - Latest release: almost 4 years ago - 3.67 thousand downloads total - 0 stars on GitHub - 1 maintainer
aleph-alpha-tokenizer 0.3.1
A fast implementation of a wordpiece-inspired tokenizer
4 versions - Latest release: about 5 years ago - 5.7 thousand downloads total - 6 stars on GitHub - 1 maintainer
regex-lexer-lalrpop 0.3.0
A regex-based lexer (tokenizer)
4 versions - Latest release: over 3 years ago - 4.6 thousand downloads total - 0 stars on GitHub - 1 maintainer
tokenizers-enfer 0.21.1
Provides an implementation of today's most used tokenizers, with a focus on performances and vers...
3 versions - Latest release: 8 months ago - 3.18 thousand downloads total - 1 maintainer
langbox 0.6.0
A simple framework to build compilers and interpreters
9 versions - Latest release: about 1 year ago - 10.4 thousand downloads total - 0 stars on GitHub - 1 maintainer
mini-c-parser 0.12.2
minimal C language lexer & parser & virtual executer from scratch
11 versions - Latest release: about 1 year ago - 11.9 thousand downloads total - 12 stars on GitHub - 1 maintainer
c_lexer 0.1.1
C lexer
2 versions - Latest release: over 6 years ago - 1 dependent package - 1 dependent repository - 4.45 thousand downloads total - 7 stars on GitHub - 1 maintainer
pgn-lexer 0.1.1
A lexer for PGN files for chess. Provides an iterator over the tokens from a byte stream.
3 versions - Latest release: almost 8 years ago - 4.48 thousand downloads total - 1 star on GitHub - 1 maintainer
lexerus 0.1.7
Simple annotated lexer
8 versions - Latest release: 9 months ago - 6.73 thousand downloads total - 1 star on GitHub - 1 maintainer
crossandra 0.0.2 💰
A straightforward tokenization library for seamless text processing.
2 versions - Latest release: 7 months ago - 1.54 thousand downloads total - 8 stars on GitHub - 1 maintainer
scnr2_macro 0.2.0
Scanner/Lexer with regex patterns and multiple modes
2 versions - Latest release: 30 days ago - 423 downloads total - 2 stars on GitHub - 1 maintainer
scnr2 0.2.0
Scanner/Lexer with regex patterns and multiple modes
2 versions - Latest release: 30 days ago - 408 downloads total - 2 stars on GitHub - 1 maintainer
scnr2_generate 0.2.0
Scanner/Lexer with regex patterns and multiple modes
2 versions - Latest release: 30 days ago - 424 downloads total - 2 stars on GitHub - 1 maintainer
rust-forth-tokenizer 0.2.1
A Forth tokenizer written in Rust.
10 versions - Latest release: about 2 months ago - 1 dependent package - 12.8 thousand downloads total - 1 star on GitHub - 1 maintainer
simple-cursor 0.1.1
A super simple character cursor implementation geared towards lexers/tokenizers.
2 versions - Latest release: about 2 years ago - 2.39 thousand downloads total - 0 stars on GitHub - 1 maintainer
alpino-tokenizer-sys 0.2.1
Low-level wrapper around the Alpino tokenizer for Dutch
3 versions - Latest release: about 5 years ago - 1 dependent package - 1 dependent repository - 6.26 thousand downloads total - 3 stars on GitHub - 1 maintainer
regex-tokenizer 0.1.1
A regex tokenizer
2 versions - Latest release: over 2 years ago - 2.46 thousand downloads total - 1 star on GitHub - 2 maintainers
Top 7.0% on crates.io
charabia 0.9.6
A simple library to detect the language, tokenize the text and normalize the tokens
28 versions - Latest release: about 2 months ago - 3 dependent packages - 33 dependent repositories - 442 thousand downloads total - 299 stars on GitHub - 2 maintainers
vaporetto_rules 0.6.5
Rule-based filters for Vaporetto
12 versions - Latest release: 4 months ago - 1 dependent package - 1 dependent repository - 55.8 thousand downloads total - 238 stars on GitHub - 1 maintainer
vaporetto 0.6.5
Vaporetto: a pointwise prediction based tokenizer
18 versions - Latest release: 4 months ago - 3 dependent packages - 1 dependent repository - 113 thousand downloads total - 238 stars on GitHub - 1 maintainer
vaporetto_tantivy 0.24.0
Vaporetto Tokenizer for Tantivy
15 versions - Latest release: about 1 month ago - 17.9 thousand downloads total - 238 stars on GitHub - 1 maintainer
Top 6.2% on crates.io
lindera-ko-dic-builder 0.32.3 💰
A Korean morphological dictionary builder for ko-dic.
45 versions - Latest release: 4 months ago - 6 dependent packages - 40 dependent repositories - 555 thousand downloads total - 514 stars on GitHub - 4 maintainers
bareun_rs 0.1.0
Bareun is a Korean Morphological analyzer for Rust
1 version - Latest release: about 1 year ago - 1.15 thousand downloads total - 0 stars on GitHub - 1 maintainer
pkl-parser 0.8.1
A rust Pkl Parser!
2 versions - Latest release: 11 months ago - 1.87 thousand downloads total - 6 stars on GitHub - 1 maintainer
condex 1.0.0 💰
Extract tokens by simple condition expression.
1 version - Latest release: over 3 years ago - 1.39 thousand downloads total - 2 stars on GitHub - 1 maintainer
c-lexer-stable 0.1.4
C lexer
4 versions - Latest release: over 4 years ago - 41.8 thousand downloads total - 2 stars on GitHub - 1 maintainer
vibrato 0.5.2
Vibrato: viterbi-based accelerated tokenizer
12 versions - Latest release: 4 months ago - 1 dependent package - 1 dependent repository - 37.2 thousand downloads total - 360 stars on GitHub - 2 maintainers
bpetok 0.1.2
A simple CLI for tokenizing text input using Byte Pair Encoding (BPE).
3 versions - Latest release: 10 months ago - 2.5 thousand downloads total - 1 maintainer
alpino-tokenizer 0.4.0
Wrapper around the Alpino tokenizer for Dutch
4 versions - Latest release: over 1 year ago - 1 dependent package - 2 dependent repositories - 7.49 thousand downloads total - 3 stars on GitHub - 1 maintainer
rustfst 1.2.1
Library for constructing, combining, optimizing, and searching weighted finite-state transducers ...
52 versions - Latest release: 16 days ago - 3 dependent packages - 1 dependent repository - 601 thousand downloads total - 163 stars on GitHub - 1 maintainer
chunk_norris 0.2.1
A Rust library for splitting large text into smaller batches for LLM input.
3 versions - Latest release: 6 months ago - 1.76 thousand downloads total - 1 star on GitHub - 1 maintainer
sana_core 0.1.1
The core of Sana
2 versions - Latest release: almost 5 years ago - 2 dependent packages - 1 dependent repository - 4.09 thousand downloads total - 1 maintainer
Top 7.9% on crates.io
plex 0.3.1
A syntax extension for writing lexers and parsers.
11 versions - Latest release: about 1 year ago - 4 dependent packages - 20 dependent repositories - 56.4 thousand downloads total - 412 stars on GitHub - 1 maintainer
rustfst-ffi 1.1.2
Library for constructing, combining, optimizing, and searching weighted finite-state transducers ...
16 versions - Latest release: 11 months ago - 16.4 thousand downloads total - 163 stars on GitHub - 1 maintainer
luther-derive 0.1.0
The proc macro generator for the Luther lexer generator.
1 version - Latest release: about 7 years ago - 2.55 thousand downloads total - 5 stars on GitHub - 1 maintainer
smoltoken 0.2.0
A fast library for Byte Pair Encoding (BPE) tokenization.
5 versions - Latest release: 4 months ago - 2.69 thousand downloads total - 7 stars on GitHub - 1 maintainer
blingfire 1.0.0
Wrapper for the BlingFire tokenization library
5 versions - Latest release: about 5 years ago - 84.5 thousand downloads total - 15 stars on GitHub - 1 maintainer
svgparser 0.8.1
Featureful, pull-based, zero-allocation SVG parser.
21 versions - Latest release: over 7 years ago - 4 dependent packages - 98 dependent repositories - 194 thousand downloads total - 22 stars on GitHub - 1 maintainer
lexers 0.1.4
Tools for tokenizing and scanning
11 versions - Latest release: over 3 years ago - 8 dependent packages - 4 dependent repositories - 38.3 thousand downloads total - 66 stars on GitHub - 1 maintainer
parsit 0.2.0
A very simple library of recursive-descent parsing combinators that uses logos as its lexer
17 versions - Latest release: about 2 years ago - 1 dependent package - 1 dependent repository - 19.4 thousand downloads total - 7 stars on GitHub - 1 maintainer
html5tokenizer 0.5.2
An HTML5 tokenizer with code span support.
7 versions - Latest release: almost 2 years ago - 1 dependent repository - 8.37 thousand downloads total - 1 maintainer
char-lex 1.0.5
Create easy enum based lexers
11 versions - Latest release: about 5 years ago - 13 thousand downloads total - 1 maintainer
Top 6.2% on crates.io
lindera-cc-cedict-builder 0.32.3 💰
A Chinese morphological dictionary builder for CC-CEDICT.
40 versions - Latest release: 4 months ago - 6 dependent packages - 40 dependent repositories - 548 thousand downloads total - 375 stars on GitHub - 1 maintainer
bytepiece 0.2.0
Rust version of bytepiece tokenizer
2 versions - Latest release: almost 2 years ago - 2.34 thousand downloads total - 12 stars on GitHub - 1 maintainer
fuzzy-pickles 0.1.1
A low-level parser of Rust source code with high-level visitor implementations
2 versions - Latest release: almost 5 years ago - 1 dependent repository - 3.5 thousand downloads total - 7 stars on GitHub - 1 maintainer
rustrawi 0.1.2 💰
Rust port of the original PHP Sastrawi
3 versions - Latest release: over 2 years ago - 3.14 thousand downloads total - 0 stars on GitHub - 1 maintainer