Ecosyste.ms: Packages

An open API service providing package, version, and dependency metadata for many open source software ecosystems and registries.

Top 5.2% on conda-forge.org
Top 4.5% dependent packages on conda-forge.org
Top 7.2% dependent repos on conda-forge.org
Top 5.1% forks on conda-forge.org

conda-forge.org : sentencepiece

SentencePiece is an unsupervised text tokenizer and detokenizer, mainly for neural network-based text generation systems where the vocabulary size is fixed before the neural model is trained. SentencePiece implements subword units (e.g., byte-pair encoding (BPE) [[Sennrich et al.](http://www.aclweb.org/anthology/P16-1162)] and the unigram language model [[Kudo](https://arxiv.org/abs/1804.10959)]) with the extension of direct training from raw sentences. This makes it possible to build a purely end-to-end system that does not depend on language-specific pre- or postprocessing.
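
To make the idea of subword units learned from raw sentences concrete, here is a minimal pure-Python sketch of byte-pair-encoding merges. It is a toy illustration, not SentencePiece's implementation or API: the function names (`merge_pair`, `learn_bpe_merges`, `encode`) are hypothetical, the input is pre-split on whitespace for simplicity, and `num_merges` stands in for the predetermined vocabulary size.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return out


def learn_bpe_merges(corpus, num_merges):
    """Learn up to `num_merges` merge rules from a list of raw sentences."""
    # Assumption for this sketch: split on whitespace and start from characters.
    words = Counter()
    for sentence in corpus:
        for word in sentence.split():
            words[tuple(word)] += 1

    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in words.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break  # nothing left to merge
        # Most frequent pair wins; ties are broken by the pair itself for determinism.
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        # Rewrite every word with the new merged symbol.
        new_words = Counter()
        for symbols, freq in words.items():
            new_words[tuple(merge_pair(list(symbols), best))] += freq
        words = new_words
    return merges


def encode(word, merges):
    """Segment a single word by applying the learned merges in order."""
    symbols = list(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return symbols
```

Calling `learn_bpe_merges(sentences, num_merges)` on a list of raw sentences returns an ordered list of merge rules, and `encode(word, merges)` applies them to segment a word into subword units. Frequent character sequences end up as single symbols, while rare words fall back to smaller pieces.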

purl: pkg:conda/sentencepiece
Keywords: natural-language-processing, neural-machine-translation, word-segmentation
License: Apache-2.0
Latest release: about 2 years ago
First release: almost 4 years ago
Dependent packages: 14
Dependent repositories: 25
Stars: 6,853 on GitHub
Forks: 902 on GitHub
See more repository details: repos.ecosyste.ms
Last synced: 17 days ago
