nuget.org : stanford.nlp.segmenter
Tokenization of raw text is a standard pre-processing step for many NLP tasks. For English, tokenization usually involves punctuation splitting and separation of some affixes like possessives. Other languages require more extensive token pre-processing, which is usually called segmentation.
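For .NET users, this package exposes the Stanford Word Segmenter through IKVM-recompiled assemblies. The following is a minimal, hypothetical C# sketch, assuming the recompiled assemblies surface the original Java API (`java.util.Properties`, `edu.stanford.nlp.ie.crf.CRFClassifier`) and that the Chinese Treebank model files (e.g. `ctb.gz`, `dict-chris6.ser.gz`) have been downloaded separately from the Stanford NLP site into a local `data` directory; the file paths below are placeholders.

```csharp
// Sketch only: segmenting Chinese text with the IKVM-recompiled Stanford Segmenter.
// Assumes the Java namespaces are exposed as-is and the CTB model files exist locally.
using java.util;
using edu.stanford.nlp.ie.crf;

class SegmenterDemo
{
    static void Main()
    {
        var props = new Properties();
        props.setProperty("sighanCorporaDict", @"data");                     // assumed model directory
        props.setProperty("serDictionary", @"data\dict-chris6.ser.gz");      // assumed dictionary file
        props.setProperty("inputEncoding", "UTF-8");
        props.setProperty("sighanPostProcessing", "true");

        // CRFClassifier is the CRF-based segmenter; load the Chinese Treebank model.
        var segmenter = new CRFClassifier(props);
        segmenter.loadClassifierNoExceptions(@"data\ctb.gz", props);

        // segmentString returns a java.util.List of segmented tokens.
        var tokens = segmenter.segmentString("我住在美国。");
        for (int i = 0; i < tokens.size(); i++)
            System.Console.WriteLine(tokens.get(i));
    }
}
```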
purl: pkg:nuget/stanford.nlp.segmenter
Keywords: nlp, stanford, segmenter, tokenization, splitting, IKVM, dotnet, fsharp, recompiled-packages, stanford-nlp
License: MIT
Latest release: over 4 years ago
First release: almost 12 years ago
Downloads: 26,575 total
Stars: 607 on GitHub
Forks: 121 on GitHub
Total Commits: 208
Committers: 11
Average commits per author: 18.909
Development Distribution Score (DDS): 0.404
More commit stats: commits.ecosyste.ms
See more repository details: repos.ecosyste.ms
Funding links: https://www.buymeacoffee.com/sergeytihon
Last synced: 5 days ago