Ecosyste.ms: Packages

An open API service providing package, version and dependency metadata of many open source software ecosystems and registries.

conda-forge.org : syntok

Syntok is the successor of an earlier, very similar tool, segtok, but has evolved significantly, offering better segmentation and tokenization quality and higher throughput: syntok can segment documents at a rate of about 100k tokens per second. For example, if a sentence terminal marker is not followed by a spacing character, segtok cannot detect it as a terminal marker, whereas syntok segments that case correctly because it tokenizes first and segments afterwards.
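To illustrate why tokenizing before segmenting helps, here is a minimal toy sketch. It is not syntok's implementation or API. The names `tokenize`, `segment`, and `split_requiring_space` are hypothetical, and the rules are deliberately simplified: a sentence ends at `.`, `!` or `?` when the next token starts with an uppercase letter. Abbreviations such as "e.g." are not handled. `split_requiring_space` is a crude stand-in for a splitter that needs whitespace after the terminal marker. It is not segtok's actual code.

```python
import re
from typing import List

# Hypothetical toy tokenizer: a token is a run of word characters or a single
# punctuation mark, so no whitespace is needed between tokens.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")
TERMINALS = {".", "!", "?"}


def tokenize(text: str) -> List[str]:
    """Split text into word and punctuation tokens."""
    return TOKEN_RE.findall(text)


def segment(tokens: List[str]) -> List[List[str]]:
    """Group tokens into sentences (simplified rule, assumption).

    A sentence ends after a terminal token when the following token starts
    with an uppercase letter, or when the terminal is the last token.
    """
    sentences: List[List[str]] = []
    current: List[str] = []
    for i, tok in enumerate(tokens):
        current.append(tok)
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        if tok in TERMINALS and (nxt is None or nxt[0].isupper()):
            sentences.append(current)
            current = []
    if current:  # trailing text without a terminal marker
        sentences.append(current)
    return sentences


def split_requiring_space(text: str) -> List[str]:
    """Crude whitespace-dependent splitter, for contrast only: it splits
    only where a terminal marker is followed by whitespace."""
    return re.split(r"(?<=[.!?])\s+", text)


text = "This is one sentence.This is another!And a third?"
for sentence in segment(tokenize(text)):
    print(" ".join(sentence))
print(split_requiring_space(text))
```

The token-first version finds three sentences. The whitespace-dependent splitter returns the whole string as a single chunk because no terminal marker is followed by a space.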

purl: pkg:conda/syntok
Keywords: nlp, segmentation, sentence-boundary-detection, tokenizer
License: MIT
Latest release: over 2 years ago
First release: about 3 years ago
Stars: 168 on GitHub
Forks: 31 on GitHub
See more repository details: repos.ecosyste.ms
Last synced: 21 days ago
