proxy.golang.org : github.com/NVIDIA/NeMo-Curator
Scalable data pre-processing and curation toolkit for LLMs
purl: pkg:golang/github.com/%21n%21v%21i%21d%21i%21a/%21ne%21mo-%21curator
Keywords: data, data-curation, data-prep, data-preparation, data-processing, data-processing-pipelines, data-quality, datacuration, datarecipes, deduplication, fast-data-processing, fine-tuning, large-language-models, large-scale-data-processing, llm, llm-data-quality, llmapps, python, semantic-deduplication
License: Apache-2.0
Latest release: 3 months ago
First release: over 1 year ago
Stars: 918 on GitHub
Forks: 130 on GitHub
See more repository details: repos.ecosyste.ms
Last synced: 28 days ago