Ecosyste.ms: Packages

An open API service providing package, version and dependency metadata of many open source software ecosystems and registries.

crates.io "tokenizer" keyword

nlpo3 1.3.2
Thai natural language processing library, with Python and Node bindings
7 versions - Latest release: over 2 years ago - 1 dependent package - 1 dependent repository - 4.6 thousand downloads total - 30 stars on GitHub - 2 maintainers
nlpo3-cli 0.2.0
Command line interface for nlpO3, a Thai natural language processing library
3 versions - Latest release: almost 3 years ago - 1.12 thousand downloads total - 30 stars on GitHub - 2 maintainers
sqlite3-parser 0.12.0
SQL parser (as understood by SQLite)
11 versions - Latest release: 6 months ago - 3 dependent packages - 2 dependent repositories - 59.3 thousand downloads total - 41 stars on GitHub - 1 maintainer
fuzzy-pickles 0.1.1
A low-level parser of Rust source code with high-level visitor implementations
2 versions - Latest release: over 3 years ago - 1 dependent repository - 1.59 thousand downloads total - 7 stars on GitHub - 1 maintainer
erl_tokenize 0.6.1 💰
Erlang source code tokenizer
28 versions - Latest release: 3 months ago - 5 dependent packages - 3 dependent repositories - 22.3 thousand downloads total - 8 stars on GitHub - 1 maintainer
pgn-lexer 0.1.1
A lexer for PGN files for chess. Provides an iterator over the tokens from a byte stream.
3 versions - Latest release: over 6 years ago - 1.92 thousand downloads total - 1 star on GitHub - 1 maintainer
Top 8.3% on crates.io
html5gum 0.5.7
A WHATWG-compliant HTML5 tokenizer and tag soup parser.
14 versions - Latest release: 10 months ago - 2 dependent packages - 220 dependent repositories - 639 thousand downloads total - 146 stars on GitHub - 1 maintainer
svgrtypes 0.42.0
SVG types parser.
15 versions - Latest release: 1 day ago - 2 dependent packages - 5.06 thousand downloads total - 1 maintainer
c-lexer-stable 0.1.4
C lexer
4 versions - Latest release: about 3 years ago - 17 thousand downloads total - 2 stars on GitHub - 1 maintainer
Top 2.5% on crates.io
tokenizers 0.19.1
Provides an implementation of today's most used tokenizers, with a focus on performances and vers...
25 versions - Latest release: 23 days ago - 20 dependent packages - 281 dependent repositories - 741 thousand downloads total - 8,474 stars on GitHub - 3 maintainers
indentation_flattener 0.1.0
From indented input, generate plain output with indentation PUSH and POP codes.
1 version - Latest release: about 7 years ago - 935 downloads total - 0 stars on GitHub - 1 maintainer
json-parser 1.0.2
JSON parser
3 versions - Latest release: almost 5 years ago - 2.92 thousand downloads total - 4 stars on GitHub - 1 maintainer
indent_tokenizer 0.4.0
Generate tokens based on indentation
4 versions - Latest release: over 6 years ago - 2.9 thousand downloads total - 1 star on GitHub - 1 maintainer
rustpostal 0.3.0
Rust bindings to libpostal
4 versions - Latest release: about 2 years ago - 1.68 thousand downloads total - 14 stars on GitHub - 1 maintainer
lox-scanner 0.1.0
lexical scanner for Lox
3 versions - Latest release: over 2 years ago - 1.07 thousand downloads total - 0 stars on GitHub - 1 maintainer
condex 1.0.0 💰
Extract tokens by simple condition expression.
1 version - Latest release: over 2 years ago - 431 downloads total - 2 stars on GitHub - 1 maintainer
Top 6.5% on crates.io
svgtypes 0.15.1
SVG types parser.
23 versions - Latest release: 3 days ago - 24 dependent packages - 532 dependent repositories - 1.99 million downloads total - 66 stars on GitHub - 1 maintainer
lexers 0.1.4
Tools for tokenizing and scanning
11 versions - Latest release: about 2 years ago - 7 dependent packages - 4 dependent repositories - 14.4 thousand downloads total - 64 stars on GitHub - 1 maintainer
absolution 0.1.1
'Freedom from `syn`'. A lightweight Rust lexer designed for use in bang-style proc macros.
3 versions - Latest release: about 4 years ago - 1 dependent package - 2.14 thousand downloads total - 107 stars on GitHub - 1 maintainer
text-splitter 0.13.1
Split text into semantic chunks, up to a desired chunk size. Supports calculating length by chara...
30 versions - Latest release: 3 days ago - 1 dependent repository - 37.3 thousand downloads total - 135 stars on GitHub - 1 maintainer
Top 4.8% on crates.io
xmlparser 0.13.6
Pull-based, zero-allocation XML parser.
24 versions - Latest release: 7 months ago - 31 dependent packages - 2,453 dependent repositories - 16.8 million downloads total - 128 stars on GitHub - 2 maintainers
alpino-tokenize 0.4.0
Wrapper around the Alpino tokenizer for Dutch
5 versions - Latest release: 6 months ago - 2.19 thousand downloads total - 3 stars on GitHub - 1 maintainer
alpino-tokenizer-sys 0.2.1
Low-level wrapper around the Alpino tokenizer for Dutch
3 versions - Latest release: almost 4 years ago - 1 dependent package - 1 dependent repository - 2.38 thousand downloads total - 3 stars on GitHub - 1 maintainer
alpino-tokenizer 0.4.0
Wrapper around the Alpino tokenizer for Dutch
4 versions - Latest release: 6 months ago - 1 dependent package - 2 dependent repositories - 3.34 thousand downloads total - 3 stars on GitHub - 1 maintainer
vaporetto 0.6.3
Vaporetto: a pointwise prediction based tokenizer
16 versions - Latest release: about 1 year ago - 3 dependent packages - 1 dependent repository - 70.8 thousand downloads total - 213 stars on GitHub - 1 maintainer
vaporetto_tantivy 0.20.0
Vaporetto Tokenizer for Tantivy
10 versions - Latest release: 11 months ago - 3.14 thousand downloads total - 213 stars on GitHub - 1 maintainer
vaporetto_rules 0.6.3
Rule-based filters for Vaporetto
10 versions - Latest release: about 1 year ago - 1 dependent package - 1 dependent repository - 33.2 thousand downloads total - 213 stars on GitHub - 1 maintainer
parsit 0.2.0
A very simple library of parsing combinators for recursive descent parsing that uses logos as the lexer
17 versions - Latest release: 10 months ago - 1 dependent package - 1 dependent repository - 5.28 thousand downloads total - 7 stars on GitHub - 1 maintainer
bracoxide 0.1.3
A feature-rich library for brace pattern combination, permutation generation, and error handling.
4 versions - Latest release: 8 months ago - 1 dependent package - 6 dependent repositories - 89.4 thousand downloads total - 1 star on GitHub - 1 maintainer
libsql-sqlite3-parser 0.11.1
SQL parser (as understood by SQLite) (libsql fork)
2 versions - Latest release: 2 months ago - 15.5 thousand downloads total - 1 maintainer
luther-derive 0.1.0
The proc macro generator for the Luther lexer generator.
1 version - Latest release: almost 6 years ago - 1.3 thousand downloads total - 5 stars on GitHub - 1 maintainer
cang-jie 0.18.0
A Chinese tokenizer for tantivy
20 versions - Latest release: 6 months ago - 6 dependent packages - 13 dependent repositories - 25 thousand downloads total - 68 stars on GitHub - 1 maintainer
bleuscore 0.1.2
A fast (not yet :) BLEU score calculator
3 versions - Latest release: 11 days ago - 581 downloads total - 0 stars on GitHub - 1 maintainer
c_lexer 0.1.1
C lexer
2 versions - Latest release: about 5 years ago - 1 dependent package - 1 dependent repository - 2.22 thousand downloads total - 6 stars on GitHub - 1 maintainer
bytepiece 0.2.0
Rust version of bytepiece tokenizer
2 versions - Latest release: 8 months ago - 527 downloads total - 9 stars on GitHub - 1 maintainer
tokengeex 1.0.0
TokenGeeX is an efficient tokenizer for code based on UnigramLM and TokenMonster.
9 versions - Latest release: 13 days ago - 2.84 thousand downloads total - 1 maintainer
lindera-tantivy 0.27.1 💰
Lindera Tokenizer for Tantivy.
40 versions - Latest release: 5 months ago - 5 dependent packages - 7 dependent repositories - 21.7 thousand downloads total - 46 stars on GitHub - 4 maintainers
tantivy-stemmers 0.2.0
A collection of Tantivy stemmer tokenizers
2 versions - Latest release: 7 days ago - 185 downloads total - 0 stars on GitHub - 2 maintainers
Top 6.2% on crates.io
logos-codegen 0.14.0 💰
Create ridiculously fast Lexers
2 versions - Latest release: 3 months ago - 2 dependent packages - 28 dependent repositories - 1.42 million downloads total - 2,632 stars on GitHub - 2 maintainers
logos2 💰
Create ridiculously fast Lexers
6 versions - Latest release: 7 days ago - 1.51 thousand downloads total - 2,632 stars on GitHub - 1 maintainer
logos-derive2 💰
Create ridiculously fast Lexers
6 versions - Latest release: 7 days ago - 1.54 thousand downloads total - 2,632 stars on GitHub - 1 maintainer
Top 2.8% on crates.io
logos 0.14.0 💰
Create ridiculously fast Lexers
50 versions - Latest release: 3 months ago - 199 dependent packages - 606 dependent repositories - 5.75 million downloads total - 2,632 stars on GitHub - 2 maintainers
logos-codegen2 💰
Create ridiculously fast Lexers
6 versions - Latest release: 7 days ago - 1.55 thousand downloads total - 2,632 stars on GitHub - 1 maintainer
Top 3.5% on crates.io
logos-derive 0.14.0 💰
Create ridiculously fast Lexers
46 versions - Latest release: 3 months ago - 7 dependent packages - 539 dependent repositories - 5.75 million downloads total - 2,632 stars on GitHub - 2 maintainers
logos-cli2 💰
Create ridiculously fast Lexers
6 versions - Latest release: 7 days ago - 1.5 thousand downloads total - 2,632 stars on GitHub - 1 maintainer
logos-cli 0.14.0 💰
Create ridiculously fast Lexers
2 versions - Latest release: 3 months ago - 608 downloads total - 2,632 stars on GitHub - 2 maintainers
tantivy-czech-stemmer 0.2.1
Czech stemmer as Tantivy tokenizer
2 versions - Latest release: 9 days ago - 281 downloads total - 0 stars on GitHub - 2 maintainers
pkl_fast 0.1.1
A library aiming to easily and efficiently work with Apple's PKL format.
2 versions - Latest release: 3 months ago - 620 downloads total - 3 stars on GitHub - 1 maintainer
luther 0.1.0
The runtime components of the Luther lexer generator.
1 version - Latest release: almost 6 years ago - 1 dependent package - 1 dependent repository - 1.84 thousand downloads total - 5 stars on GitHub - 1 maintainer
Top 7.0% on crates.io
charabia 0.8.10
A simple library to detect the language, tokenize the text and normalize the tokens
19 versions - Latest release: 10 days ago - 3 dependent packages - 33 dependent repositories - 215 thousand downloads total - 211 stars on GitHub - 2 maintainers
rustfst-ffi 1.0.1
Library for constructing, combining, optimizing, and searching weighted finite-state transducers ...
13 versions - Latest release: 8 days ago - 3.3 thousand downloads total - 138 stars on GitHub - 1 maintainer
rustfst 1.0.1
Library for constructing, combining, optimizing, and searching weighted finite-state transducers ...
47 versions - Latest release: 8 days ago - 3 dependent packages - 1 dependent repository - 282 thousand downloads total - 138 stars on GitHub - 1 maintainer
rust_transformers 0.2.0
High performance tokenizers for Rust
2 versions - Latest release: about 4 years ago - 1 dependent package - 1.01 thousand downloads total - 270 stars on GitHub - 1 maintainer
Top 6.3% on crates.io
rust_tokenizers 8.1.1
High performance tokenizers for Rust
34 versions - Latest release: 7 months ago - 7 dependent packages - 225 dependent repositories - 164 thousand downloads total - 270 stars on GitHub - 1 maintainer
fileql 0.3.0 💰
A tool to run SQL-like query on local files using GitQL SDK
3 versions - Latest release: 11 days ago - 919 downloads total - 55 stars on GitHub - 1 maintainer
xxcalc 0.2.1
Embeddable or standalone robust floating-point polynomial calculator
4 versions - Latest release: over 7 years ago - 1 dependent repository - 8.17 thousand downloads total - 13 stars on GitHub - 1 maintainer
html5tokenizer 0.5.2
An HTML5 tokenizer with code span support.
7 versions - Latest release: 8 months ago - 1 dependent repository - 2.2 thousand downloads total - 1 maintainer
tele_tokenizer 0.2.0
A CSS tokenizer
2 versions - Latest release: about 2 years ago - 3 dependent packages - 1 dependent repository - 1.69 thousand downloads total - 199 stars on GitHub - 1 maintainer
Top 7.2% on crates.io
tiktoken-rs 0.5.8
Library for encoding and decoding with the tiktoken library in Rust
27 versions - Latest release: 5 months ago - 24 dependent packages - 73 dependent repositories - 290 thousand downloads total - 198 stars on GitHub - 1 maintainer
another-tiktoken-rs 0.1.2
Library for encoding and decoding with the tiktoken library in Rust
3 versions - Latest release: 8 months ago - 768 downloads total - 198 stars on GitHub - 1 maintainer
bytepiece_rs 0.2.2
The Bytepiece Tokenizer Implemented in Rust
7 versions - Latest release: 6 months ago - 1 dependent package - 2.07 thousand downloads total - 14 stars on GitHub - 1 maintainer
svgparser 0.8.1
Featureful, pull-based, zero-allocation SVG parser.
21 versions - Latest release: about 6 years ago - 4 dependent packages - 98 dependent repositories - 101 thousand downloads total - 22 stars on GitHub - 1 maintainer
Top 5.7% on crates.io
lindera-decompress 0.30.0 💰
A morphological analysis library.
35 versions - Latest release: 27 days ago - 11 dependent packages - 41 dependent repositories - 283 thousand downloads total - 349 stars on GitHub - 1 maintainer
Top 6.2% on crates.io
lindera-cc-cedict-builder 0.30.0 💰
A Chinese morphological dictionary builder for CC-CEDICT.
37 versions - Latest release: 27 days ago - 5 dependent packages - 40 dependent repositories - 279 thousand downloads total - 349 stars on GitHub - 1 maintainer
Top 6.2% on crates.io
lindera-ko-dic-builder 0.30.0 💰
A Korean morphological dictionary builder for ko-dic.
42 versions - Latest release: 27 days ago - 5 dependent packages - 40 dependent repositories - 281 thousand downloads total - 349 stars on GitHub - 4 maintainers
Top 6.2% on crates.io
lindera-unidic-builder 0.30.0 💰
A Japanese morphological dictionary builder for UniDic.
47 versions - Latest release: 27 days ago - 5 dependent packages - 40 dependent repositories - 282 thousand downloads total - 349 stars on GitHub - 4 maintainers
tusk_lexer 0.4.7
The lexical analysis component of Tusk.
21 versions - Latest release: almost 3 years ago - 1 dependent package - 6.9 thousand downloads total - 1 maintainer
lexical_scanner 0.1.18
A simple lexer which creates over 115+ various tokens based on the rust programming language. Thi...
19 versions - Latest release: about 2 years ago - 7.25 thousand downloads total - 2 stars on GitHub - 1 maintainer
Top 5.4% on crates.io
lindera 0.30.0 💰
A morphological analysis library.
72 versions - Latest release: 27 days ago - 10 dependent packages - 124 dependent repositories - 182 thousand downloads total - 349 stars on GitHub - 4 maintainers
Top 5.2% on crates.io
lindera-dictionary 0.30.0 💰
A Japanese morphological dictionary.
52 versions - Latest release: 27 days ago - 10 dependent packages - 238 dependent repositories - 306 thousand downloads total - 349 stars on GitHub - 4 maintainers
Top 5.6% on crates.io
lindera-ipadic-builder 0.30.0 💰
A Japanese morphological dictionary builder for IPADIC.
54 versions - Latest release: 27 days ago - 6 dependent packages - 235 dependent repositories - 299 thousand downloads total - 349 stars on GitHub - 4 maintainers
Top 4.9% on crates.io
lindera-core 0.30.0 💰
A morphological analysis library.
54 versions - Latest release: 27 days ago - 28 dependent packages - 239 dependent repositories - 313 thousand downloads total - 349 stars on GitHub - 4 maintainers
Top 6.1% on crates.io
lindera-ipadic 0.30.0 💰
A Japanese morphological dictionary for IPADIC.
59 versions - Latest release: 27 days ago - 4 dependent packages - 126 dependent repositories - 193 thousand downloads total - 349 stars on GitHub - 4 maintainers
Top 7.9% on crates.io
plex 0.3.0
A syntax extension for writing lexers and parsers.
10 versions - Latest release: 7 months ago - 4 dependent packages - 20 dependent repositories - 35 thousand downloads total - 396 stars on GitHub - 1 maintainer
rye-grain 0.0.1
A Python to Rust translator
1 version - Latest release: over 1 year ago - 401 downloads total - 1 star on GitHub - 1 maintainer
vibrato 0.5.1
Vibrato: viterbi-based accelerated tokenizer
11 versions - Latest release: 12 months ago - 1 dependent package - 1 dependent repository - 11.6 thousand downloads total - 292 stars on GitHub - 2 maintainers
azul-simplecss 0.1.1
A very simple CSS 2.1 tokenizer.
2 versions - Latest release: almost 5 years ago - 1 dependent package - 4 dependent repositories - 17.3 thousand downloads total - 29 stars on GitHub - 1 maintainer
chat-splitter 0.1.1 💰
Never exceed OpenAI's chat models' maximum number of tokens when using the async_openai Rust crate
2 versions - Latest release: 8 months ago - 550 downloads total - 2 stars on GitHub - 1 maintainer
ast-rs 0.0.1
AST Toolkit for Rust
1 version - Latest release: almost 2 years ago - 345 downloads total - 0 stars on GitHub - 1 maintainer
rustrawi 0.1.2 💰
Rust port of the original PHP Sastrawi
3 versions - Latest release: about 1 year ago - 713 downloads total - 0 stars on GitHub - 1 maintainer
sana 0.1.1
Create lexers easily
2 versions - Latest release: over 3 years ago - 1 dependent repository - 992 downloads total - 1 maintainer
sentencepiece-model 0.1.0 💰
Sentencepiece model parser
1 version - Latest release: 6 months ago - 311 downloads total - 0 stars on GitHub - 1 maintainer
tokenizer-lib 1.5.1
Tokenization utilities for building parsers in Rust
14 versions - Latest release: 8 months ago - 2 dependent packages - 1 dependent repository - 6.55 thousand downloads total - 2 stars on GitHub - 1 maintainer
blex 0.2.2
A lightweight lexing framework
4 versions - Latest release: about 1 year ago - 1 dependent package - 1.17 thousand downloads total - 0 stars on GitHub - 1 maintainer
basic_lexer 0.2.1
Basic lexical analyzer for parsing and compiling
6 versions - Latest release: over 2 years ago - 1.79 thousand downloads total - 0 stars on GitHub - 1 maintainer
pretok 0.1.0
A string pre-tokenizer for C-like syntaxes.
1 version - Latest release: over 3 years ago - 1 dependent repository - 457 downloads total - 0 stars on GitHub - 1 maintainer
sana_derive 0.1.1
The derive macro for Sana
2 versions - Latest release: over 3 years ago - 1 dependent package - 1 dependent repository - 1.45 thousand downloads total - 1 maintainer
blingfire 1.0.0
Wrapper for the BlingFire tokenization library
5 versions - Latest release: almost 4 years ago - 47.8 thousand downloads total - 16 stars on GitHub - 1 maintainer
blingfire-sys 1.0.1
Bindings to the BlingFire C++ library
5 versions - Latest release: almost 4 years ago - 1 dependent package - 48.5 thousand downloads total - 16 stars on GitHub - 1 maintainer
aleph-alpha-tokenizer 0.3.1
A fast implementation of a wordpiece-inspired tokenizer
4 versions - Latest release: almost 4 years ago - 1.86 thousand downloads total - 6 stars on GitHub - 1 maintainer
nnsplit 0.5.9
A tool to split text using a neural network. For sentence boundary detection, compound splitting ...
29 versions - Latest release: about 1 year ago - 1 dependent repository - 12.4 thousand downloads total - 476 stars on GitHub - 1 maintainer
token 1.0.0-rc1
A simple string-tokenizer (and sentence splitter) Note: If you find that you would like to use t...
1 version - Latest release: about 9 years ago - 1 dependent repository - 1.95 thousand downloads total - 5 stars on GitHub - 1 maintainer
regex-tokenizer 0.1.1
A regex tokenizer
2 versions - Latest release: about 1 year ago - 529 downloads total - 0 stars on GitHub - 2 maintainers
sana_core 0.1.1
The core of Sana
2 versions - Latest release: over 3 years ago - 2 dependent packages - 1 dependent repository - 1.77 thousand downloads total - 1 maintainer
jayce 12.1.0
jayce is a tokenizer 🌌
34 versions - Latest release: about 2 months ago - 8.91 thousand downloads total - 1 star on GitHub - 1 maintainer
javascript_lexer 0.1.8
JavaScript lexer
9 versions - Latest release: almost 4 years ago - 4.43 thousand downloads total - 8 stars on GitHub - 1 maintainer
lang_pt 0.1.2
A parser tool to generate recursive descent top down parser.
3 versions - Latest release: over 1 year ago - 829 downloads total - 6 stars on GitHub - 1 maintainer
libsimple 0.1.0
Rust bindings to simple, a SQLite3 fts5 tokenizer which supports Chinese and PinYin.
2 versions - Latest release: 26 days ago - 136 downloads total - 1 maintainer
tantivy-tokenizer-tiny-segmenter 0.3.0
A Japanese tokenizer for Tantivy, based on TinySegmenter.
3 versions - Latest release: over 4 years ago - 1 dependent repository - 2.08 thousand downloads total - 12 stars on GitHub - 1 maintainer
char-lex 1.0.5
Create easy enum based lexers
11 versions - Latest release: about 4 years ago - 3.81 thousand downloads total - 1 maintainer