An open API service providing package, version and dependency metadata of many open source software ecosystems and registries.

pypi.org "bpe" keyword

View the packages on the pypi.org package registry that are tagged with the "bpe" keyword.

Top 4.7% on pypi.org
nfelib 2.2.0
nfelib: electronic invoicing library for Brazil
18 versions - Latest release: about 1 month ago - 6 dependent packages - 11 dependent repositories - 9.74 thousand downloads last month - 165 stars on GitHub - 2 maintainers
blt-tokenizer 0.2.2
High-performance byte-level tokenizer with BPE support
3 versions - Latest release: 2 months ago - 54 downloads last month - 0 stars on GitHub - 1 maintainer
Top 1.6% on pypi.org
subword-nmt 0.3.8
Unsupervised Word Segmentation for Neural Machine Translation and Text Generation
8 versions - Latest release: over 3 years ago - 10 dependent packages - 213 dependent repositories - 18.6 thousand downloads last month - 2,146 stars on GitHub - 1 maintainer
bpe-summarizer 0.2.1
This summarizer attempts to leverage Byte Pair Encoding (BPE) tokenization and the Bart vocabular...
8 versions - Latest release: about 5 years ago - 1 dependent repository - 44 downloads last month - 3 stars on GitHub - 1 maintainer
bpeasy 0.1.6
Fast bare-bones BPE for modern tokenizer training
7 versions - Latest release: 3 months ago - 2.11 thousand downloads last month - 164 stars on GitHub - 1 maintainer
tinybpe 0.1.1
This is an ultra-fast, lightweight and clean code implementation of the Byte Pair Encoding (BPE) ...
2 versions - Latest release: 5 months ago - 231 downloads last month - 4 stars on GitHub - 1 maintainer
nlcodec 0.4.0
nlcodec is a collection of encoding schemes for natural language sequences. nlcodec.db is an effi...
10 versions - Latest release: about 4 years ago - 1 dependent package - 2 dependent repositories - 68 downloads last month - 5 stars on GitHub - 1 maintainer
kitoken 0.10.1 💰
Fast and versatile tokenizer for language models, supporting BPE, Unigram and WordPiece tokenization
2 versions - Latest release: 9 months ago - 95 downloads last month - 29 stars on GitHub - 1 maintainer
pyregtokenizer 0.0.2
A BPE Tokenizer using regex
2 versions - Latest release: over 1 year ago - 5 downloads last month - 1 maintainer
tokendagger 0.1.1
High-Performance Implementation of OpenAI's TikToken - 2x Throughput, 4x Faster Code Tokenization
2 versions - Latest release: 2 months ago - 163 downloads last month - 432 stars on GitHub - 1 maintainer
Top 3.6% on pypi.org
pyonmttok 1.37.1
Fast and customizable text tokenization library with BPE and SentencePiece support
66 versions - Latest release: over 2 years ago - 3 dependent packages - 103 dependent repositories - 22.8 thousand downloads last month - 310 stars on GitHub - 4 maintainers
ultra-tokenizer 0.1.0
Advanced tokenizer with support for BPE, WordPiece, and Unigram algorithms
1 version - Latest release: 22 days ago - 1 star on GitHub - 1 maintainer
Top 8.2% on pypi.org
bpe 0.2.1
Byte pair encoding for graceful handling of rare words in NLP
2 versions - Latest release: over 6 years ago - 11 dependent repositories - 742 downloads last month - 229 stars on GitHub - 1 maintainer
rs-bpe 0.1.0
A ridiculously fast Python BPE (Byte Pair Encoder) implementation written in Rust
1 version - Latest release: 6 months ago - 475 downloads last month - 6 stars on GitHub - 1 maintainer
Top 2.2% on pypi.org
youtokentome 1.0.6
Unsupervised text tokenizer focused on computational efficiency
8 versions - Latest release: over 5 years ago - 8 dependent packages - 228 dependent repositories - 93.6 thousand downloads last month - 968 stars on GitHub - 3 maintainers
smoltoken 0.1.4
A light-weight & fast library for Byte Pair Encoding (BPE) tokenization.
5 versions - Latest release: 5 months ago - 170 downloads last month - 7 stars on GitHub - 1 maintainer
code-tokenizers 0.0.5
Aligning BPE and AST
5 versions - Latest release: over 2 years ago - 1 dependent package - 21 downloads last month - 8 stars on GitHub - 1 maintainer
nfelib-xsdata 0.9.2
nfelib: electronic invoicing library for Brazil
3 versions - Latest release: about 3 years ago - 8 downloads last month - 164 stars on GitHub - 1 maintainer
chatdbt 0.0.5
chatdbt is an openai-based dbt documentation robot. You can use natural language to describe your...
5 versions - Latest release: over 2 years ago - 20 downloads last month - 4 stars on GitHub - 1 maintainer
pgn-tokenizer 0.1.5 💰
A byte pair encoding tokenizer for chess portable game notation (PGN)
4 versions - Latest release: 7 months ago - 21 downloads last month - 0 stars on GitHub - 1 maintainer
tokenization-scorer 1.1.8
Package for evaluating text tokenizations.
12 versions - Latest release: 8 months ago - 82 downloads last month - 44 stars on GitHub - 1 maintainer