Ecosyste.ms: Packages
An open API service providing package, version, and dependency metadata for many open source software ecosystems and registries.
nuget.org: stanford.nlp.segmenter
Tokenization of raw text is a standard pre-processing step for many NLP tasks. For English, tokenization usually involves punctuation splitting and separation of some affixes like possessives. Other languages require more extensive token pre-processing, which is usually called segmentation.
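To make the distinction concrete, here is a minimal sketch of the kind of English tokenization described above (punctuation splitting plus separation of the possessive affix). This is an illustrative toy using regular expressions, not the Stanford segmenter's actual algorithm or API:

```python
import re

def tokenize_english(text):
    """Naive English tokenizer: split off punctuation and possessive 's."""
    # Punctuation splitting: surround common punctuation with spaces.
    text = re.sub(r"([.,!?;:()\"])", r" \1 ", text)
    # Affix separation: detach the possessive 's as its own token.
    text = re.sub(r"'s\b", r" 's", text)
    return text.split()

print(tokenize_english("Stanford's segmenter handles raw text, doesn't it?"))
# → ['Stanford', "'s", 'segmenter', 'handles', 'raw', 'text', ',', "doesn't", 'it', '?']
```

For languages written without spaces (e.g. Chinese), no such rule-based splitting suffices, which is why full segmentation models like this package's are needed.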
purl: pkg:nuget/stanford.nlp.segmenter
Keywords: nlp, stanford, segmenter, tokenization, splitting, IKVM
License:
Latest release: over 3 years ago
First release: over 10 years ago
Downloads: 22,841 total
Last synced: 14 days ago