An open API service providing package, version and dependency metadata of many open source software ecosystems and registries.

pypi.org : nemo-curator

Scalable Data Preprocessing Tool for Training Large Language Models

purl: pkg:pypi/nemo-curator
Keywords: data, data-curation, data-prep, data-preparation, data-processing, data-processing-pipelines, data-quality, datacuration, datarecipes, deduplication, fast-data-processing, fine-tuning, large-language-models, large-scale-data-processing, llm, llm-data-quality, llmapps, python, semantic-deduplication
License: Apache-2.0
Latest release: 22 days ago
First release: 11 months ago
Downloads: 1,501 last month
Stars: 879 on GitHub
Forks: 124 on GitHub
See more repository details: repos.ecosyste.ms
Last synced: about 22 hours ago
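
The purl listed above (pkg:pypi/nemo-curator) packs the ecosystem type and the package name into one string. As a minimal sketch of how such an identifier can be split apart, the hypothetical helper `parse_purl` below handles only the simple `pkg:type/name[@version]` shape. It is an illustration, not a full purl parser: qualifiers and subpaths are discarded, and any namespace segment would be left inside the name.

```python
def parse_purl(purl: str) -> dict:
    """Split a simple purl of the form pkg:type/name[@version].

    Hypothetical helper for illustration; not a complete purl parser.
    """
    scheme, _, rest = purl.partition(":")
    if scheme != "pkg" or not rest:
        raise ValueError(f"not a purl: {purl!r}")

    # Drop an optional subpath (#...) and qualifiers (?...); ignored in this sketch.
    rest = rest.split("#", 1)[0].split("?", 1)[0]

    # An optional version follows '@'.
    path, _, version = rest.partition("@")

    # The first path segment is the package type; the remainder is the name.
    pkg_type, _, name = path.partition("/")
    if not pkg_type or not name:
        raise ValueError(f"purl is missing a type or name: {purl!r}")

    return {"type": pkg_type.lower(), "name": name, "version": version or None}


print(parse_purl("pkg:pypi/nemo-curator"))
# {'type': 'pypi', 'name': 'nemo-curator', 'version': None}
```

For real use, a maintained purl library is likely a better choice than this sketch, since the full format also covers namespaces, qualifiers, and percent-encoding.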
