Ecosyste.ms: Packages

An open API service providing package, version and dependency metadata of many open source software ecosystems and registries.

pypi.org : utoken

utoken is a universal tokenizer (multilingual word segmenter) that divides text into words, punctuation and special tokens such as numbers, URLs, XML tags, email addresses and hashtags. It comes with a companion detokenizer.
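To make that description concrete, here is a minimal sketch of rule-based tokenization in Python. It is a hypothetical illustration, not utoken's own algorithm or API. The `tokenize` function and the token type names (`url`, `email`, `xml`, `hashtag`, `number`, `word`, `punct`) are assumptions made for this example. Special-token patterns are tried before ordinary words and punctuation, so a URL or email address stays one token instead of being split at its dots and slashes.

```python
import re

# Hypothetical illustration only: this is NOT utoken's algorithm or API.
# Alternatives are tried left to right at each position, so special tokens
# (URLs, emails, XML tags, hashtags, numbers) win over plain words/punctuation.
TOKEN_PATTERN = re.compile(
    r"""
    (?P<url>https?://[^\s<>"]+)
  | (?P<email>[\w.+-]+@[\w-]+(?:\.[\w-]+)+)
  | (?P<xml><[^<>\s][^<>]*>)
  | (?P<hashtag>\#\w+)
  | (?P<number>\d+(?:[.,]\d+)*)
  | (?P<word>\w+(?:['-]\w+)*)
  | (?P<punct>[^\w\s])
    """,
    re.VERBOSE,
)


def tokenize(text: str) -> list[tuple[str, str]]:
    """Split text into (token, token_type) pairs; whitespace is discarded."""
    # m.lastgroup is the name of the single named group that matched.
    return [(m.group(), m.lastgroup) for m in TOKEN_PATTERN.finditer(text)]


if __name__ == "__main__":
    for token, kind in tokenize("Mail info@example.org or see https://example.com/a?b=1 #NLP!"):
        print(f"{kind:8} {token}")
```

Because the pattern order encodes priority, adding a new special-token type means placing it before `word` and `punct`. A production multilingual tokenizer must handle many more cases, such as abbreviations and punctuation directly after a URL, which this sketch ignores.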

purl: pkg:pypi/utoken
Keywords: machine translation, datasets, NLP, natural language processing, computational linguistics
License: Apache-2.0
Latest release: over 2 years ago
First release: over 2 years ago
Dependent repositories: 1
Downloads: 78 last month
Stars: 12 on GitHub
Forks: 1 on GitHub
See more repository details: repos.ecosyste.ms
Last synced: 19 days ago
