pypi.org : tokeniser-py-lite
A custom tokeniser with a 131,072-token vocabulary derived from 0.5B (val) and 1B (val+test) tokens in SlimPajama. Lite version of the tokeniser-py library. Uses a novel token generation algorithm and a dynamic programming-based segmentation method for fast, interpretable tokenisation; the segmentation method can also be applied to custom token maps.
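The description does not spell out how the segmentation works, so the following is only a generic sketch of dynamic-programming segmentation over a custom token map, not the package's actual algorithm or API. The function name `segment`, the `max_token_len` parameter, the "fewest tokens" objective, and the single-character fallback are all assumptions made for illustration.

```python
def segment(text, vocab, max_token_len=None):
    """Split `text` into the fewest tokens drawn from `vocab`.

    Hypothetical illustration: any single character is accepted as a
    fallback token so that every input can be segmented.
    """
    if max_token_len is None:
        # Longest vocabulary entry bounds how far back we need to look.
        max_token_len = max(1, max((len(t) for t in vocab), default=1))

    n = len(text)
    inf = float("inf")
    best = [inf] * (n + 1)  # best[i] = minimum token count covering text[:i]
    back = [0] * (n + 1)    # back[i] = start index of the last token in that split
    best[0] = 0

    for i in range(1, n + 1):
        for j in range(max(0, i - max_token_len), i):
            piece = text[j:i]
            allowed = piece in vocab or i - j == 1
            if allowed and best[j] + 1 < best[i]:
                best[i] = best[j] + 1
                back[i] = j

    # Walk the back-pointers from the end to recover the tokens.
    tokens = []
    i = n
    while i > 0:
        j = back[i]
        tokens.append(text[j:i])
        i = j
    return tokens[::-1]


if __name__ == "__main__":
    toy_vocab = {"token", "iser", "tok", "en"}  # toy token map, not the real vocabulary
    print(segment("tokeniser", toy_vocab))
```

With the toy token map above, the call returns `['token', 'iser']`, because that two-token split beats `tok` + `en` + `iser`. A real tokeniser would more likely score candidates by token frequency or another learned cost rather than by raw count.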
Registry - Source - Documentation - JSON
purl: pkg:pypi/tokeniser-py-lite
Keywords: Tokens, Tokeniser, Tokenizer, LLMs, LMs, LLM, LM, Language Model, Language Models, Large Language Models, Large Language Model
License: MIT
Latest release: 17 days ago
First release: 17 days ago
Last synced: 17 days ago