An open API service providing package, version and dependency metadata of many open source software ecosystems and registries.

tokenizer

A simple multilingual tokenizer for NLP tasks. This tool provides a CLI and a library for linguistic tokenization, an unavoidable preprocessing step for many HLT (human language technology) tasks that precedes syntactic, semantic, and other higher-level processing. Use it to tokenize German, English, and French texts.
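
To make the idea of linguistic tokenization concrete, here is a minimal sketch of splitting text into word and punctuation tokens. It is written in Python and is not the gem's implementation or API: the `simple_tokenize` function and the `TOKEN_PATTERN` regular expression are hypothetical names chosen for this example, and the real package may handle abbreviations, clitics, and language-specific rules differently.

```python
import re
from typing import List

# Hypothetical pattern for illustration only; not taken from the tokenizer gem.
# A token is either a word (optionally joined by internal hyphens or
# apostrophes) or a single non-space, non-word character such as punctuation.
TOKEN_PATTERN = re.compile(r"\w+(?:[-']\w+)*|[^\w\s]")


def simple_tokenize(text: str) -> List[str]:
    """Return the word and punctuation tokens found in `text`, in order."""
    return TOKEN_PATTERN.findall(text)


if __name__ == "__main__":
    for sentence in (
        "Ich gehe in die Schule!",
        "Don't stop.",
    ):
        print(simple_tokenize(sentence))
```

The sketch treats punctuation as separate tokens and keeps word-internal apostrophes and hyphens attached, which is one common design choice among several; a production tokenizer would typically need per-language exceptions on top of this.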

Ecosystem gem.coop
Latest Release 0.3.0 (over 10 years ago)
Versions 6
Downloads 283,049 total
Links
Registry gem.coop
Source Repository
Docs Documentation
JSON API View JSON
CodeMeta codemeta.json
Package Details
PURL pkg:gem/tokenizer?repository_url=https://gem.coop
License MIT
First Release almost 15 years ago
Last Synced about 10 hours ago
Repository
Stars 46 on GitHub
Forks 11 on GitHub
Rankings on gem.coop
Overall Top 1.9%
Downloads Top 5.6%