An open API service providing package, version and dependency metadata of many open source software ecosystems and registries.

crates.io "tokenization" keyword

brainwires-datasets 0.1.0
Training data pipelines for the Brainwires Agent Framework — JSONL I/O, tokenization, deduplicati...
1 version - Latest release: about 1 hour ago - 0 downloads total - 1 maintainer
fern-tokenization 0.0.0
Empty crate, used only to reserve the name.
1 version - Latest release: over 3 years ago - 1.66 thousand downloads total - 14 stars on GitHub - 1 maintainer
derive-finite-automaton-derive 0.3.0
Procedural macro for generating finite automaton
6 versions - Latest release: 8 months ago - 1 dependent package - 1 dependent repository - 14.3 thousand downloads total - 2 stars on GitHub - 1 maintainer
crossandra 1.0.0 💰
A fast and simple lexical tokenization library.
3 versions - Latest release: 8 days ago - 1.94 thousand downloads total - 8 stars on GitHub - 1 maintainer
build-trie 0.1.1
Procedural macro for generating match and state code representing a trie structure
2 versions - Latest release: almost 5 years ago - 3.06 thousand downloads total - 3 stars on GitHub - 1 maintainer
sentence 0.0.2
Sentence tokenizes English language sentences for use in TTS applications.
2 versions - Latest release: almost 6 years ago - 3.1 thousand downloads total - 2 stars on GitHub - 1 maintainer
chunk 0.9.2
The fastest semantic text chunking library — up to 1TB/s chunking throughput
7 versions - Latest release: about 2 months ago - 373 downloads total - 1 maintainer
toon_ql 0.0.2
A query language for Toon data
2 versions - Latest release: 4 months ago - 45 downloads total - 1 maintainer
tuck5 0.2.0
A pragmatic lexer/parser generator
4 versions - Latest release: over 2 years ago - 4.92 thousand downloads total - 0 stars on GitHub - 1 maintainer
go-brrr 0.1.0
Token-efficient code analysis for LLMs - Rust implementation
1 version - Latest release: about 2 months ago - 14 downloads total - 1 maintainer
niblits 0.3.6
Token-aware, multi-format text chunking library with language-aware semantic splitting
5 versions - Latest release: about 1 month ago - 97 downloads total - 1 maintainer
memchunk 0.4.0
The fastest semantic text chunking library — up to 1TB/s chunking throughput
11 versions - Latest release: 2 months ago - 221 downloads total - 2 stars on GitHub - 1 maintainer
marqant 1.0.0
Quantum-compressed markdown format for AI consumption with 90% token reduction
5 versions - Latest release: 4 months ago - 1.78 thousand downloads total - 0 stars on GitHub - 1 maintainer
rustrawi 0.1.2 💰
Rust port of the original PHP Sastrawi
3 versions - Latest release: about 3 years ago - 3.67 thousand downloads total - 0 stars on GitHub - 1 maintainer
any-lexer 0.0.3
Lexers for various programming languages and formats
3 versions - Latest release: over 2 years ago - 2 dependent packages - 6.42 thousand downloads total - 0 stars on GitHub - 1 maintainer
vaporetto 0.6.5
Vaporetto: a pointwise prediction based tokenizer
18 versions - Latest release: 12 months ago - 3 dependent packages - 1 dependent repository - 156 thousand downloads total - 245 stars on GitHub - 1 maintainer
wordpieces 0.6.1
Split tokens into word pieces
10 versions - Latest release: over 3 years ago - 3 dependent packages - 3 dependent repositories - 21.7 thousand downloads total - 5 stars on GitHub - 1 maintainer
tokenizer-lib 1.6.0
Tokenization utilities for building parsers in Rust
15 versions - Latest release: almost 2 years ago - 2 dependent packages - 1 dependent repository - 25.4 thousand downloads total - 2 stars on GitHub - 1 maintainer
vibrato 0.5.2
Vibrato: viterbi-based accelerated tokenizer
12 versions - Latest release: 12 months ago - 1 dependent package - 1 dependent repository - 50.1 thousand downloads total - 377 stars on GitHub - 2 maintainers
colorblast-cli 0.0.1
Syntax highlighting CLI for various programming languages, markup languages and various other for...
1 version - Latest release: over 2 years ago - 1.51 thousand downloads total - 0 stars on GitHub - 1 maintainer
blex 0.2.2
A lightweight lexing framework
4 versions - Latest release: almost 3 years ago - 1 dependent package - 5.74 thousand downloads total - 0 stars on GitHub - 1 maintainer
libtqsm 0.6.1
Sentence segmenter that supports ~300 languages
1 version - Latest release: almost 2 years ago - 1 dependent package - 3.34 thousand downloads total - 2 stars on GitHub - 1 maintainer
classi-cine 0.5.1
A tool that builds smart video playlists by learning your preferences through Bayesian classifica...
12 versions - Latest release: 7 months ago - 12.3 thousand downloads total - 6 stars on GitHub - 1 maintainer
vtext 0.2.0
NLP with Rust
4 versions - Latest release: over 5 years ago - 3 dependent repositories - 14.7 thousand downloads total - 153 stars on GitHub - 1 maintainer
strizer 0.1.0
Minimal and fast library for text tokenization
1 version - Latest release: almost 5 years ago - 1.94 thousand downloads total - 1 star on GitHub - 1 maintainer
esg-tokenization-protocol 0.1.2
Official Rust implementation of the ESG Tokenization Protocol (ERC-8040 / EIP-8040). MIT-grade co...
3 versions - Latest release: 4 months ago - 62 downloads total - 1 maintainer
kizzasi-tokenizer 0.1.0
Signal quantization and tokenization for Kizzasi AGSP - VQ-VAE, μ-law, continuous embeddings
1 version - Latest release: about 2 months ago - 26 downloads total - 1 maintainer
colorblast 0.0.3
Syntax highlighting library for various programming languages, markup languages and various other...
3 versions - Latest release: over 2 years ago - 1 dependent package - 4.49 thousand downloads total - 0 stars on GitHub - 1 maintainer
bytepunch-rs 0.1.0
Profile-aware semantic compression for structured documents (CML and beyond)
1 version - Latest release: 3 months ago - 0 downloads total - 1 maintainer
text-scanner 0.0.3
A UTF-8 char-oriented, zero-copy, text and code scanning library
3 versions - Latest release: over 2 years ago - 1 dependent package - 6.36 thousand downloads total - 0 stars on GitHub - 1 maintainer
bpetok 0.1.2
A simple CLI for tokenizing text input using Byte Pair Encoding (BPE).
3 versions - Latest release: over 1 year ago - 3.06 thousand downloads total - 1 maintainer
unscanny 0.1.0 💰
Painless string scanning.
1 version - Latest release: almost 4 years ago - 8 dependent packages - 28 dependent repositories - 11.5 million downloads total - 56 stars on GitHub - 1 maintainer
agrocrypto-core 0.1.0
The core engine of AgroCrypto: a blockchain-native asset tokenization and settlement layer.
1 version - Latest release: 11 months ago - 917 downloads total - 1 maintainer
pretok 0.1.0
A string pre-tokenizer for C-like syntaxes.
1 version - Latest release: over 5 years ago - 1 dependent repository - 1.69 thousand downloads total - 0 stars on GitHub - 1 maintainer
vaporetto_tantivy 0.24.0
Vaporetto Tokenizer for Tantivy
15 versions - Latest release: 9 months ago - 28.4 thousand downloads total - 242 stars on GitHub - 1 maintainer
bitcoin-get-json-token 0.1.1
A comprehensive Rust library for parsing and tokenizing JSON data, optimized for Bitcoin applicat...
2 versions - Latest release: 3 months ago - 4.38 thousand downloads total - 1 maintainer
bpe-match 0.1.1
A pattern matching library for BPE tokenization, intended to replace regex-based approaches.
2 versions - Latest release: 5 months ago - 374 downloads total - 1 maintainer
vaporetto_rules 0.6.5
Rule-based filters for Vaporetto
12 versions - Latest release: 12 months ago - 1 dependent package - 1 dependent repository - 71 thousand downloads total - 242 stars on GitHub - 1 maintainer
derive-finite-automaton 0.3.0
Procedural macro for generating finite automaton
6 versions - Latest release: 8 months ago - 1 dependent package - 1 dependent repository - 13.6 thousand downloads total - 2 stars on GitHub - 1 maintainer