nuget.org : stanford.nlp.segmenter
Tokenization of raw text is a standard pre-processing step for many NLP tasks. For English, tokenization usually involves punctuation splitting and separation of some affixes like possessives. Other languages require more extensive token pre-processing, which is usually called segmentation.
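For .NET users, this package exposes the Stanford Word Segmenter through IKVM-recompiled assemblies. The following is a minimal, hypothetical C# sketch, assuming the recompiled assemblies surface the original Java API (`java.util.Properties`, `edu.stanford.nlp.ie.crf.CRFClassifier`) and that the Chinese Treebank model files (e.g. `ctb.gz`, `dict-chris6.ser.gz`) have been downloaded separately from the Stanford NLP site into a local `data` directory; the file paths below are placeholders.

```csharp
// Sketch only: segmenting Chinese text with the IKVM-recompiled Stanford Segmenter.
// Assumes the Java namespaces are exposed as-is and the CTB model files exist locally.
using java.util;
using edu.stanford.nlp.ie.crf;

class SegmenterDemo
{
    static void Main()
    {
        var props = new Properties();
        props.setProperty("sighanCorporaDict", @"data");                     // assumed model directory
        props.setProperty("serDictionary", @"data\dict-chris6.ser.gz");      // assumed dictionary file
        props.setProperty("inputEncoding", "UTF-8");
        props.setProperty("sighanPostProcessing", "true");

        // CRFClassifier is the CRF-based segmenter; load the Chinese Treebank model.
        var segmenter = new CRFClassifier(props);
        segmenter.loadClassifierNoExceptions(@"data\ctb.gz", props);

        // segmentString returns a java.util.List of segmented tokens.
        var tokens = segmenter.segmentString("我住在美国。");
        for (int i = 0; i < tokens.size(); i++)
            System.Console.WriteLine(tokens.get(i));
    }
}
```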
purl: pkg:nuget/stanford.nlp.segmenter
Keywords: nlp, stanford, segmenter, tokenization, splitting, IKVM, dotnet, fsharp, recompiled-packages, stanford-nlp
License: MIT
Latest release: over 4 years ago
First release: almost 12 years ago
Downloads: 26,575 total
Stars: 607 on GitHub
Forks: 121 on GitHub
Total Commits: 208
Committers: 11
Average commits per author: 18.909
Development Distribution Score (DDS): 0.404
More commit stats: commits.ecosyste.ms
See more repository details: repos.ecosyste.ms
Funding links: https://www.buymeacoffee.com/sergeytihon
Last synced: 5 days ago