Ecosyste.ms: Packages

An open API service providing package, version and dependency metadata of many open source software ecosystems and registries.

crates.io "tokenizer" keyword

rust_transformers 0.2.0
High performance tokenizers for Rust
2 versions - Latest release: over 4 years ago - 1 dependent package - 1.12 thousand downloads total - 273 stars on GitHub - 1 maintainer
Top 6.3% on crates.io
rust_tokenizers 8.1.1
High performance tokenizers for Rust
34 versions - Latest release: 8 months ago - 11 dependent packages - 225 dependent repositories - 171 thousand downloads total - 273 stars on GitHub - 1 maintainer
xxcalc 0.2.1
Embeddable or standalone robust floating-point polynomial calculator
4 versions - Latest release: over 7 years ago - 1 dependent repository - 8.37 thousand downloads total - 13 stars on GitHub - 1 maintainer
Top 7.0% on crates.io
charabia 0.8.10
A simple library to detect the language, tokenize the text and normalize the tokens
20 versions - Latest release: about 1 month ago - 3 dependent packages - 33 dependent repositories - 230 thousand downloads total - 222 stars on GitHub - 2 maintainers
html5tokenizer 0.5.2
An HTML5 tokenizer with code span support.
7 versions - Latest release: 8 months ago - 1 dependent repository - 2.53 thousand downloads total - 1 maintainer
fileql 0.3.0 💰
A tool to run SQL-like query on local files using GitQL SDK
3 versions - Latest release: about 1 month ago - 1.12 thousand downloads total - 56 stars on GitHub - 1 maintainer
tele_tokenizer 0.2.0
A CSS tokenizer
2 versions - Latest release: about 2 years ago - 3 dependent packages - 1 dependent repository - 1.83 thousand downloads total - 198 stars on GitHub - 1 maintainer
bytepiece_rs 0.2.2
The Bytepiece Tokenizer Implemented in Rust
7 versions - Latest release: 7 months ago - 1 dependent package - 2.36 thousand downloads total - 14 stars on GitHub - 1 maintainer
svgparser 0.8.1
Featureful, pull-based, zero-allocation SVG parser.
21 versions - Latest release: about 6 years ago - 4 dependent packages - 98 dependent repositories - 104 thousand downloads total - 22 stars on GitHub - 1 maintainer
Top 2.5% on crates.io
tokenizers 0.19.1
Provides an implementation of today's most used tokenizers, with a focus on performances and vers...
25 versions - Latest release: about 1 month ago - 60 dependent packages - 281 dependent repositories - 798 thousand downloads total - 8,543 stars on GitHub - 3 maintainers
lindera-analyzer 0.30.0 💰
A morphological analysis library.
10 versions - Latest release: about 2 months ago - 2 dependent packages - 1 dependent repository - 12.9 thousand downloads total - 349 stars on GitHub - 1 maintainer
Top 7.2% on crates.io
lindera-ko-dic 0.30.0 💰
A Korean morphological dictionary for ko-dic.
38 versions - Latest release: about 2 months ago - 3 dependent packages - 24 dependent repositories - 183 thousand downloads total - 349 stars on GitHub - 1 maintainer
Top 10.0% on crates.io
lindera-cc-cedict 0.30.0 💰
A Chinese morphological dictionary for CC-CEDICT.
37 versions - Latest release: about 2 months ago - 2 dependent packages - 2 dependent repositories - 40.7 thousand downloads total - 349 stars on GitHub - 1 maintainer
Top 5.2% on crates.io
lindera-dictionary 0.30.0 💰
A Japanese morphological dictionary.
53 versions - Latest release: about 2 months ago - 10 dependent packages - 238 dependent repositories - 341 thousand downloads total - 349 stars on GitHub - 4 maintainers
Top 5.4% on crates.io
lindera 0.30.0 💰
A morphological analysis library.
73 versions - Latest release: about 2 months ago - 10 dependent packages - 124 dependent repositories - 193 thousand downloads total - 349 stars on GitHub - 4 maintainers
Top 8.4% on crates.io
lindera-filter 0.30.0 💰
Character and token filters for Lindera.
14 versions - Latest release: about 2 months ago - 3 dependent packages - 16 dependent repositories - 29.2 thousand downloads total - 349 stars on GitHub - 1 maintainer
lindera-cli 0.30.0 💰
A morphological analysis command line interface.
72 versions - Latest release: about 2 months ago - 25.5 thousand downloads total - 349 stars on GitHub - 4 maintainers
Top 6.2% on crates.io
lindera-cc-cedict-builder 0.30.0 💰
A Chinese morphological dictionary builder for CC-CEDICT.
38 versions - Latest release: about 2 months ago - 6 dependent packages - 40 dependent repositories - 313 thousand downloads total - 349 stars on GitHub - 1 maintainer
Top 7.1% on crates.io
lindera-tokenizer 0.30.0 💰
A morphological analysis library.
10 versions - Latest release: about 2 months ago - 12 dependent packages - 3 dependent repositories - 117 thousand downloads total - 349 stars on GitHub - 1 maintainer
bleuscore 0.1.2
A fast bleu score calculator
4 versions - Latest release: about 1 month ago - 739 downloads total - 0 stars on GitHub - 1 maintainer
tusk_lexer 0.4.7
The lexical analysis component of Tusk.
21 versions - Latest release: almost 3 years ago - 1 dependent package - 7.51 thousand downloads total - 1 maintainer
lexical_scanner 0.1.18
A simple lexer which creates over 115+ various tokens based on the rust programming language. Thi...
19 versions - Latest release: about 2 years ago - 8.17 thousand downloads total - 2 stars on GitHub - 1 maintainer
Top 7.9% on crates.io
plex 0.3.0
A syntax extension for writing lexers and parsers.
10 versions - Latest release: 7 months ago - 4 dependent packages - 20 dependent repositories - 36.5 thousand downloads total - 399 stars on GitHub - 1 maintainer
rye-grain 0.0.1
A Python to Rust translator
1 version - Latest release: over 1 year ago - 455 downloads total - 1 star on GitHub - 1 maintainer
vibrato 0.5.1
Vibrato: viterbi-based accelerated tokenizer
11 versions - Latest release: about 1 year ago - 1 dependent package - 1 dependent repository - 12.7 thousand downloads total - 295 stars on GitHub - 2 maintainers
azul-simplecss 0.1.1
A very simple CSS 2.1 tokenizer.
2 versions - Latest release: almost 5 years ago - 1 dependent package - 4 dependent repositories - 17.7 thousand downloads total - 32 stars on GitHub - 1 maintainer
tokenizer-lib 1.5.1
Tokenization utilities for building parsers in Rust
15 versions - Latest release: 9 months ago - 2 dependent packages - 1 dependent repository - 7.39 thousand downloads total - 2 stars on GitHub - 1 maintainer
chat-splitter 0.1.1 💰
Never exceed OpenAI's chat models' maximum number of tokens when using the async_openai Rust crate
2 versions - Latest release: 9 months ago - 646 downloads total - 2 stars on GitHub - 1 maintainer
ast-rs 0.0.1
AST Toolkit for Rust
1 version - Latest release: almost 2 years ago - 406 downloads total - 0 stars on GitHub - 1 maintainer
vaporetto_tantivy 0.20.0
Vaporetto Tokenizer for Tantivy
10 versions - Latest release: 12 months ago - 3.27 thousand downloads total - 215 stars on GitHub - 1 maintainer
vaporetto_rules 0.6.3
Rule-based filters for Vaporetto
10 versions - Latest release: about 1 year ago - 1 dependent package - 1 dependent repository - 33.9 thousand downloads total - 215 stars on GitHub - 1 maintainer
vaporetto 0.6.3
Vaporetto: a pointwise prediction based tokenizer
16 versions - Latest release: about 1 year ago - 3 dependent packages - 1 dependent repository - 72.6 thousand downloads total - 215 stars on GitHub - 1 maintainer
rustrawi 0.1.2 💰
Rust port of the original PHP Sastrawi
3 versions - Latest release: over 1 year ago - 830 downloads total - 0 stars on GitHub - 1 maintainer
sqlite3-parser 0.12.0
SQL parser (as understood by SQLite)
11 versions - Latest release: 7 months ago - 3 dependent packages - 2 dependent repositories - 89.7 thousand downloads total - 42 stars on GitHub - 1 maintainer
sentencepiece-model 0.1.0 💰
Sentencepiece model parser
1 version - Latest release: 7 months ago - 360 downloads total - 0 stars on GitHub - 1 maintainer
sana 0.1.1
Create lexers easily
2 versions - Latest release: almost 4 years ago - 1 dependent repository - 1.08 thousand downloads total - 1 maintainer
libsimple 0.2.1
Rust bindings to simple, a SQLite3 fts5 tokenizer which supports Chinese and PinYin.
4 versions - Latest release: about 2 months ago - 639 downloads total - 1 maintainer
basic_lexer 0.2.1
Basic lexical analyzer for parsing and compiling
6 versions - Latest release: over 2 years ago - 2.02 thousand downloads total - 0 stars on GitHub - 1 maintainer
blex 0.2.2
A lightweight lexing framework
4 versions - Latest release: about 1 year ago - 1 dependent package - 1.34 thousand downloads total - 0 stars on GitHub - 1 maintainer
sana_derive 0.1.1
The derive macro for Sana
2 versions - Latest release: almost 4 years ago - 1 dependent package - 1 dependent repository - 1.56 thousand downloads total - 1 maintainer
pretok 0.1.0
A string pre-tokenizer for C-like syntaxes.
1 version - Latest release: over 3 years ago - 1 dependent repository - 506 downloads total - 0 stars on GitHub - 1 maintainer
blingfire 1.0.0
Wrapper for the BlingFire tokenization library
5 versions - Latest release: almost 4 years ago - 49.6 thousand downloads total - 15 stars on GitHub - 1 maintainer
blingfire-sys 1.0.1
Bindings to the BlingFire C++ library
5 versions - Latest release: almost 4 years ago - 1 dependent package - 50.3 thousand downloads total - 15 stars on GitHub - 1 maintainer
aleph-alpha-tokenizer 0.3.1
A fast implementation of a wordpiece-inspired tokenizer
4 versions - Latest release: almost 4 years ago - 2.05 thousand downloads total - 6 stars on GitHub - 1 maintainer
nnsplit 0.5.9
A tool to split text using a neural network. For sentence boundary detection, compound splitting ...
29 versions - Latest release: about 1 year ago - 1 dependent repository - 13.6 thousand downloads total - 489 stars on GitHub - 1 maintainer
regex-tokenizer 0.1.1
A regex tokenizer
2 versions - Latest release: about 1 year ago - 606 downloads total - 0 stars on GitHub - 2 maintainers
token 1.0.0-rc1
A simple string-tokenizer (and sentence splitter) Note: If you find that you would like to use t...
1 version - Latest release: over 9 years ago - 1 dependent repository - 2.01 thousand downloads total - 5 stars on GitHub - 1 maintainer
tokengeex 1.0.1
TokenGeeX is an efficient tokenizer for code based on UnigramLM and TokenMonster.
10 versions - Latest release: 15 days ago - 2.99 thousand downloads total - 3 stars on GitHub - 1 maintainer
javascript_lexer 0.1.8
Javascript lexer
9 versions - Latest release: about 4 years ago - 4.78 thousand downloads total - 8 stars on GitHub - 1 maintainer
sana_core 0.1.1
The core of Sana
2 versions - Latest release: almost 4 years ago - 2 dependent packages - 1 dependent repository - 1.91 thousand downloads total - 1 maintainer
jayce 12.1.0
jayce is a tokenizer 🌌
34 versions - Latest release: 3 months ago - 10.3 thousand downloads total - 1 star on GitHub - 1 maintainer
lang_pt 0.1.2
A parser tool to generate recursive descent top down parser.
3 versions - Latest release: over 1 year ago - 950 downloads total - 6 stars on GitHub - 1 maintainer
giron 0.1.2
ECMAScript parser which outputs ESTree JSON.
3 versions - Latest release: over 4 years ago - 1 dependent repository - 1.4 thousand downloads total - 20 stars on GitHub - 1 maintainer
tantivy-tokenizer-tiny-segmenter 0.3.0
A Japanese tokenizer for Tantivy, based on TinySegmenter.
3 versions - Latest release: over 4 years ago - 1 dependent repository - 2.23 thousand downloads total - 12 stars on GitHub - 1 maintainer
char-lex 1.0.5
Create easy enum based lexers
11 versions - Latest release: about 4 years ago - 4.24 thousand downloads total - 1 maintainer
another-tiktoken-rs 0.1.2
Library for encoding and decoding with the tiktoken library in Rust
3 versions - Latest release: 9 months ago - 849 downloads total - 200 stars on GitHub - 1 maintainer
Top 7.2% on crates.io
tiktoken-rs 0.5.9
Library for encoding and decoding with the tiktoken library in Rust
28 versions - Latest release: 17 days ago - 39 dependent packages - 73 dependent repositories - 319 thousand downloads total - 200 stars on GitHub - 1 maintainer
simple-cursor 0.1.1
A super simple character cursor implementation geared towards lexers/tokenizers.
2 versions - Latest release: 11 months ago - 629 downloads total - 0 stars on GitHub - 1 maintainer
gpt_tokenizer 0.1.0
Rust BPE Encoder Decoder (Tokenizer) for GPT-2 / GPT-3
1 version - Latest release: about 1 year ago - 1 dependent package - 478 downloads total - 12 stars on GitHub - 1 maintainer
lindera-ipadic-neologd 0.30.0 💰
A Japanese morphological dictionary for IPADIC NEologd.
8 versions - Latest release: about 2 months ago - 1 dependent package - 4.11 thousand downloads total - 349 stars on GitHub - 1 maintainer
punkt 1.0.5
An implementation of a Punkt sentence tokenizer
8 versions - Latest release: over 5 years ago - 3 dependent packages - 3 dependent repositories - 11.5 thousand downloads total - 34 stars on GitHub - 1 maintainer
Top 6.1% on crates.io
lindera-ipadic 0.30.0 💰
A Japanese morphological dictionary for IPADIC.
59 versions - Latest release: about 2 months ago - 4 dependent packages - 126 dependent repositories - 203 thousand downloads total - 349 stars on GitHub - 4 maintainers
Top 4.9% on crates.io
lindera-core 0.30.0 💰
A morphological analysis library.
54 versions - Latest release: about 2 months ago - 28 dependent packages - 239 dependent repositories - 338 thousand downloads total - 349 stars on GitHub - 4 maintainers
Top 8.7% on crates.io
lindera-unidic 0.30.0 💰
A Japanese morphological dictionary for UniDic.
36 versions - Latest release: about 2 months ago - 2 dependent packages - 3 dependent repositories - 87.3 thousand downloads total - 349 stars on GitHub - 1 maintainer
Top 6.2% on crates.io
lindera-unidic-builder 0.30.0 💰
A Japanese morphological dictionary builder for UniDic.
47 versions - Latest release: about 2 months ago - 6 dependent packages - 40 dependent repositories - 307 thousand downloads total - 349 stars on GitHub - 4 maintainers
Top 7.0% on crates.io
lindera-compress 0.30.0 💰
A morphological analysis library.
35 versions - Latest release: about 2 months ago - 5 dependent packages - 21 dependent repositories - 164 thousand downloads total - 349 stars on GitHub - 1 maintainer
Top 5.7% on crates.io
lindera-decompress 0.30.0 💰
A morphological analysis library.
35 versions - Latest release: about 2 months ago - 11 dependent packages - 41 dependent repositories - 307 thousand downloads total - 349 stars on GitHub - 1 maintainer
Top 5.6% on crates.io
lindera-ipadic-builder 0.30.0 💰
A Japanese morphological dictionary builder for IPADIC.
54 versions - Latest release: about 2 months ago - 7 dependent packages - 235 dependent repositories - 324 thousand downloads total - 349 stars on GitHub - 4 maintainers
Top 6.2% on crates.io
lindera-ko-dic-builder 0.30.0 💰
A Korean morphological dictionary builder for ko-dic.
42 versions - Latest release: about 2 months ago - 6 dependent packages - 40 dependent repositories - 305 thousand downloads total - 349 stars on GitHub - 4 maintainers
Top 8.0% on crates.io
lindera-ipadic-neologd-builder 0.30.0 💰
A Japanese morphological dictionary builder for IPADIC NEologd.
15 versions - Latest release: about 2 months ago - 3 dependent packages - 3 dependent repositories - 157 thousand downloads total - 349 stars on GitHub - 4 maintainers
Top 4.8% on crates.io
xmlparser 0.13.6
Pull-based, zero-allocation XML parser.
24 versions - Latest release: 8 months ago - 35 dependent packages - 2,453 dependent repositories - 16.8 million downloads total - 128 stars on GitHub - 2 maintainers
text-splitter 0.13.1
Split text into semantic chunks, up to a desired chunk size. Supports calculating length by chara...
30 versions - Latest release: 25 days ago - 5 dependent packages - 1 dependent repository - 37.3 thousand downloads total - 135 stars on GitHub - 1 maintainer
Top 6.5% on crates.io
svgtypes 0.15.1
SVG types parser.
23 versions - Latest release: 25 days ago - 26 dependent packages - 532 dependent repositories - 1.99 million downloads total - 66 stars on GitHub - 1 maintainer
logos-derive2 💰
Create ridiculously fast Lexers
6 versions - Latest release: 29 days ago - 1 dependent package - 1.54 thousand downloads total - 2,632 stars on GitHub - 1 maintainer
logos-codegen2 💰
Create ridiculously fast Lexers
6 versions - Latest release: 29 days ago - 2 dependent packages - 1.55 thousand downloads total - 2,632 stars on GitHub - 1 maintainer
Top 2.8% on crates.io
logos 0.14.0 💰
Create ridiculously fast Lexers
50 versions - Latest release: 4 months ago - 235 dependent packages - 606 dependent repositories - 5.75 million downloads total - 2,632 stars on GitHub - 2 maintainers
libsql-sqlite3-parser 0.11.1
SQL parser (as understood by SQLite) (libsql fork)
2 versions - Latest release: 3 months ago - 1 dependent package - 15.5 thousand downloads total - 1 maintainer
lexers 0.1.4
Tools for tokenizing and scanning
11 versions - Latest release: about 2 years ago - 8 dependent packages - 4 dependent repositories - 14.4 thousand downloads total - 64 stars on GitHub - 1 maintainer
htmlparser 0.1.1
Pull-based, zero-allocation HTML parser.
2 versions - Latest release: 11 months ago - 2 dependent packages - 2.62 thousand downloads total - 0 stars on GitHub - 1 maintainer
bracoxide 0.1.3
A feature-rich library for brace pattern combination, permutation generation, and error handling.
4 versions - Latest release: 9 months ago - 2 dependent packages - 6 dependent repositories - 89.4 thousand downloads total - 1 star on GitHub - 1 maintainer
svgrtypes 0.42.2
SVG types parser.
16 versions - Latest release: 19 days ago - 2 dependent packages - 5.34 thousand downloads total - 1 maintainer
rustfst-ffi 1.0.1
Library for constructing, combining, optimizing, and searching weighted finite-state transducers ...
13 versions - Latest release: about 1 month ago - 3.61 thousand downloads total - 138 stars on GitHub - 1 maintainer
rustfst 1.0.1
Library for constructing, combining, optimizing, and searching weighted finite-state transducers ...
47 versions - Latest release: about 1 month ago - 3 dependent packages - 1 dependent repository - 286 thousand downloads total - 138 stars on GitHub - 1 maintainer
langbox 0.5.0
A simple framework to build compilers and interpreters
8 versions - Latest release: 8 months ago - 2.61 thousand downloads total - 0 stars on GitHub - 1 maintainer
uscan 0.1.3
A universal source code scanner
4 versions - Latest release: over 1 year ago - 1.24 thousand downloads total - 0 stars on GitHub - 1 maintainer
tiniestsegmenter 0.1.1
Compact Japanese segmenter
2 versions - Latest release: 21 days ago - 226 downloads total - 0 stars on GitHub - 1 maintainer
Top 3.5% on crates.io
logos-derive 0.14.0 💰
Create ridiculously fast Lexers
46 versions - Latest release: 4 months ago - 7 dependent packages - 539 dependent repositories - 5.75 million downloads total - 2,632 stars on GitHub - 2 maintainers
Top 8.3% on crates.io
html5gum 0.5.7
A WHATWG-compliant HTML5 tokenizer and tag soup parser.
14 versions - Latest release: 10 months ago - 2 dependent packages - 220 dependent repositories - 639 thousand downloads total - 146 stars on GitHub - 1 maintainer
Top 6.2% on crates.io
logos-codegen 0.14.0 💰
Create ridiculously fast Lexers
2 versions - Latest release: 4 months ago - 2 dependent packages - 28 dependent repositories - 1.42 million downloads total - 2,632 stars on GitHub - 2 maintainers
nipah_tokenizer 0.1.0
A powerful yet simple text tokenizer for your everyday needs!
1 version - Latest release: over 1 year ago - 409 downloads total - 0 stars on GitHub - 1 maintainer
sqlite3_tokenizer 0.1.0
Tokenizes SQL strings as SQLite would
1 version - Latest release: almost 9 years ago - 1.84 thousand downloads total - 0 stars on GitHub - 1 maintainer
regex-lexer 0.2.0
A regex-based lexer (tokenizer)
3 versions - Latest release: almost 2 years ago - 3 dependent packages - 4 dependent repositories - 9.53 thousand downloads total - 6 stars on GitHub - 1 maintainer
rust-forth-tokenizer 0.2.0
A Forth tokenizer written in Rust.
9 versions - Latest release: over 4 years ago - 1 dependent package - 5.13 thousand downloads total - 1 star on GitHub - 1 maintainer
sql-script-parser 0.1.2 💰
sql-script-parser iterates over SQL statements in SQL script.
3 versions - Latest release: about 3 years ago - 1.42 thousand downloads total - 2 stars on GitHub - 1 maintainer
tokenizer 0.1.2
Thai text tokenizer
2 versions - Latest release: about 4 years ago - 1.18 thousand downloads total - 3 stars on GitHub - 1 maintainer
regex-lexer-lalrpop 0.3.0
A regex-based lexer (tokenizer)
4 versions - Latest release: over 2 years ago - 1.35 thousand downloads total - 0 stars on GitHub - 1 maintainer
nlpo3 1.3.2
Thai natural language processing library, with Python and Node bindings
7 versions - Latest release: over 2 years ago - 1 dependent package - 1 dependent repository - 4.6 thousand downloads total - 30 stars on GitHub - 2 maintainers
nlpo3-cli 0.2.0
Command line interface for nlpO3, a Thai natural language processing library
3 versions - Latest release: almost 3 years ago - 1.12 thousand downloads total - 30 stars on GitHub - 2 maintainers
fuzzy-pickles 0.1.1
A low-level parser of Rust source code with high-level visitor implementations
2 versions - Latest release: almost 4 years ago - 1 dependent repository - 1.59 thousand downloads total - 7 stars on GitHub - 1 maintainer
erl_tokenize 0.6.1 💰
Erlang source code tokenizer
28 versions - Latest release: 3 months ago - 5 dependent packages - 3 dependent repositories - 22.3 thousand downloads total - 8 stars on GitHub - 1 maintainer