pypi.org "bpe" keyword
View the packages on the pypi.org package registry that are tagged with the "bpe" keyword.
nfelib 2.2.0
nfelib: electronic invoicing library for Brazil
Top 4.7% on pypi.org - 18 versions - Latest release: about 1 month ago - 6 dependent packages - 11 dependent repositories - 9.74 thousand downloads last month - 165 stars on GitHub - 2 maintainers
blt-tokenizer 0.2.2
High-performance byte-level tokenizer with BPE support
3 versions - Latest release: 2 months ago - 54 downloads last month - 0 stars on GitHub - 1 maintainer
subword-nmt 0.3.8
Unsupervised Word Segmentation for Neural Machine Translation and Text Generation
Top 1.6% on pypi.org - 8 versions - Latest release: over 3 years ago - 10 dependent packages - 213 dependent repositories - 18.6 thousand downloads last month - 2,146 stars on GitHub - 1 maintainer
bpe-summarizer 0.2.1
This summarizer attempts to leverage Byte Pair Encoding (BPE) tokenization and the Bart vocabular...
8 versions - Latest release: about 5 years ago - 1 dependent repository - 44 downloads last month - 3 stars on GitHub - 1 maintainer
bpeasy 0.1.6
Fast bare-bones BPE for modern tokenizer training
7 versions - Latest release: 3 months ago - 2.11 thousand downloads last month - 164 stars on GitHub - 1 maintainer
tinybpe 0.1.1
This is an ultra-fast, lightweight and clean code implementation of the Byte Pair Encoding (BPE) ...
2 versions - Latest release: 5 months ago - 231 downloads last month - 4 stars on GitHub - 1 maintainer
nlcodec 0.4.0
nlcodec is a collection of encoding schemes for natural language sequences. nlcodec.db is a effi...
10 versions - Latest release: about 4 years ago - 1 dependent package - 2 dependent repositories - 68 downloads last month - 5 stars on GitHub - 1 maintainer
kitoken 0.10.1 💰
Fast and versatile tokenizer for language models, supporting BPE, Unigram and WordPiece tokenization
2 versions - Latest release: 9 months ago - 95 downloads last month - 29 stars on GitHub - 1 maintainer
pyregtokenizer 0.0.2
A BPE Tokenizer using regex
2 versions - Latest release: over 1 year ago - 5 downloads last month - 1 maintainer
tokendagger 0.1.1
High-Performance Implementation of OpenAI's TikToken - 2x Throughput, 4x Faster Code Tokenization
2 versions - Latest release: 2 months ago - 163 downloads last month - 432 stars on GitHub - 1 maintainer
pyonmttok 1.37.1
Fast and customizable text tokenization library with BPE and SentencePiece support
Top 3.6% on pypi.org - 66 versions - Latest release: over 2 years ago - 3 dependent packages - 103 dependent repositories - 22.8 thousand downloads last month - 310 stars on GitHub - 4 maintainers
ultra-tokenizer 0.1.0
Advanced tokenizer with support for BPE, WordPiece, and Unigram algorithms
1 version - Latest release: 22 days ago - 1 star on GitHub - 1 maintainer
bpe 0.2.1
Byte pair encoding for graceful handling of rare words in NLP
Top 8.2% on pypi.org - 2 versions - Latest release: over 6 years ago - 11 dependent repositories - 742 downloads last month - 229 stars on GitHub - 1 maintainer
rs-bpe 0.1.0
A ridiculously fast Python BPE (Byte Pair Encoder) implementation written in Rust
1 version - Latest release: 6 months ago - 475 downloads last month - 6 stars on GitHub - 1 maintainer
youtokentome 1.0.6
Unsupervised text tokenizer focused on computational efficiency
Top 2.2% on pypi.org - 8 versions - Latest release: over 5 years ago - 8 dependent packages - 228 dependent repositories - 93.6 thousand downloads last month - 968 stars on GitHub - 3 maintainers
smoltoken 0.1.4
A light-weight & fast library for Byte Pair Encoding (BPE) tokenization.
5 versions - Latest release: 5 months ago - 170 downloads last month - 7 stars on GitHub - 1 maintainer
code-tokenizers 0.0.5
Aligning BPE and AST
5 versions - Latest release: over 2 years ago - 1 dependent package - 21 downloads last month - 8 stars on GitHub - 1 maintainer
nfelib-xsdata 0.9.2
nfelib: electronic invoicing library for Brazil
3 versions - Latest release: about 3 years ago - 8 downloads last month - 164 stars on GitHub - 1 maintainer
chatdbt 0.0.5
chatdbt is an openai-based dbt documentation robot. You can use natural language to describe your...
5 versions - Latest release: over 2 years ago - 20 downloads last month - 4 stars on GitHub - 1 maintainer
pgn-tokenizer 0.1.5 💰
A byte pair encoding tokenizer for chess portable game notation (PGN)
4 versions - Latest release: 7 months ago - 21 downloads last month - 0 stars on GitHub - 1 maintainer
tokenization-scorer 1.1.8
Package for evaluating text tokenizations.
12 versions - Latest release: 8 months ago - 82 downloads last month - 44 stars on GitHub - 1 maintainer
Related Keywords
tokenizer (11)
python (9)
nlp (8)
tokenization (6)
natural-language-processing (3)
openai (3)
byte-pair-encoding (3)
tiktoken (3)
sentencepiece (3)
wordpiece (3)
unigram (3)
llm (3)
segmentation (2)
subword-units (2)
huggingface (2)
tokenizers (2)
bpe-tokenizer (2)
rust (2)
word-segmentation (2)
text-processing (2)
subword (2)
ai (2)
Odoo (2)
NFe (2)
ERP (2)
brasil (2)
cte (2)
mdfe (2)
nfe (2)
nfse (2)
nota-fiscal-eletronica (2)
machine-translation (2)
performance (2)
sped (2)
e-invoicing (2)
rapid (1)
scientists (1)
accelerated (1)
developers (1)
tokenizers-library (1)
huggingface-tokenizers (1)
tiktoken-compatible (1)
tiktoken-alternative (1)
machine-learning (1)
linguistics (1)
large-language-models (1)
deep-learning (1)
research (1)
data-science (1)
artificial-intelligence (1)
embeddings (1)
transformers (1)
text-generation (1)
generative-ai (1)
notebook (1)
ast (1)
jupyter (1)
nbdev (1)
byte pair encoding (1)
pypi-package (1)
byte-pair-tokenizer (1)
backtracking (1)
bpe-dropout (1)
tool (1)
package (1)
library (1)
stable (1)
production-ready (1)
chess (1)
nlp-engineers (1)
pgn (1)
machine-learning-engineers (1)
evaluation (1)
natural language processing (1)
data-scientists (1)
researchers (1)
regex (1)
web (1)
nodejs (1)
text (1)
preprocessing (1)
cpython-extensions (1)
Tokenizer (1)
LLM (1)
Byte Pair Encoding (1)
BPE (1)
summarization (1)
nlu (1)
gpt2tokenizer (1)
bart (1)
nmt (1)
neural-machine-translation (1)
byte-level (1)
NFSe (1)
BPe (1)
MDFe (1)
CTe (1)
blazing-fast (1)
speed (1)
optimized (1)