pypi.org : nemo-curator
Scalable Data Preprocessing Tool for Training Large Language Models
purl: pkg:pypi/nemo-curator
Keywords: data, data-curation, data-prep, data-preparation, data-processing, data-processing-pipelines, data-quality, datacuration, datarecipes, deduplication, fast-data-processing, fine-tuning, large-language-models, large-scale-data-processing, llm, llm-data-quality, llmapps, python, semantic-deduplication
License: Apache-2.0
Latest release: 22 days ago
First release: 11 months ago
Downloads: 1,501 last month
Stars: 879 on GitHub
Forks: 124 on GitHub
See more repository details: repos.ecosyste.ms
Last synced: about 22 hours ago