An open API service providing package, version and dependency metadata of many open source software ecosystems and registries.

pypi.org : tokeniser-py-lite

A custom tokeniser with a 131,072-token vocabulary derived from 0.5B (val) and 1B (val+test) tokens of SlimPajama. Lite version of the tokeniser-py library. It uses a novel token generation algorithm and a dynamic programming-based segmentation method for fast, interpretable tokenisation, and the same segmentation method can also be used for tokenisation with custom token maps.
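
The description mentions dynamic programming-based segmentation without giving details. The sketch below shows one common way such a segmenter can work, not the package's actual algorithm or API: the function name `segment`, the `vocab` set, the `max_token_len` cap, the "fewest tokens" objective, and the single-character fallback are all assumptions made for illustration.

```python
from typing import List, Set


def segment(text: str, vocab: Set[str], max_token_len: int = 16) -> List[str]:
    """Split text into the fewest pieces, each a vocab entry or a single character.

    Hypothetical illustration only; not the tokeniser-py-lite implementation.
    """
    n = len(text)
    inf = float("inf")
    # best[i] = minimum number of tokens needed to cover text[:i]
    best = [0] + [inf] * n
    # back[i] = start index of the last token in the best split of text[:i]
    back = [0] * (n + 1)

    for end in range(1, n + 1):
        for start in range(max(0, end - max_token_len), end):
            piece = text[start:end]
            # Single characters are always allowed, so every input can be segmented.
            allowed = piece in vocab or end - start == 1
            if allowed and best[start] + 1 < best[end]:
                best[end] = best[start] + 1
                back[end] = start

    # Walk the back-pointers from the end of the text to recover the tokens.
    tokens: List[str] = []
    i = n
    while i > 0:
        tokens.append(text[back[i]:i])
        i = back[i]
    return tokens[::-1]


# A custom token map can be passed in as a plain set of strings.
custom_vocab = {"token", "iser", "is", "er"}
print(segment("tokeniser", custom_vocab))
```

Because the vocabulary is just an argument, the same routine works with any custom token map. The cap on token length keeps the inner loop bounded, so the run time grows linearly with the input length for a fixed cap.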

purl: pkg:pypi/tokeniser-py-lite
Keywords: Tokens, Tokeniser, Tokenizer, LLMs, LMs, LLM, LM, Language Model, Language Models, Large Language Models, Large Language Model
License: MIT
Latest release: 19 days ago
First release: 19 days ago
Last synced: 19 days ago

Tasmay_P_Tibrewal
Owner
2 packages
334 downloads