proxy.golang.org : github.com/alibaba/data-juicer
A one-stop data processing system to make data higher-quality, juicier, and more digestible for LLMs! 🍎 🍋 🌽 ➡️ ➡️ 🍸 🍹 🍷
Registry - Source - Documentation - JSON - codemeta.json
purl: pkg:golang/github.com/alibaba/data-juicer
Keywords: chinese, data-analysis, data-science, data-visualization, dataset, gpt, gpt-4, instruction-tuning, large-language-models, llama, llava, llm, llms, multi-modal, nlp, opendata, pre-training, pytorch, sora, streamlit
License: Apache-2.0
Latest release: 3 months ago
First release: over 2 years ago
Stars: 1,321 on GitHub
Forks: 78 on GitHub
Total Commits: 63
Committers: 13
Average commits per author: 4.846
Development Distribution Score (DDS): 0.635
More commit stats: commits.ecosyste.ms
See more repository details: repos.ecosyste.ms
Last synced: 19 days ago
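
The commit statistics above can be reproduced with simple arithmetic. The sketch below is illustrative only: it assumes the Development Distribution Score is defined in the common way, as 1 minus the top committer's share of all commits, which this listing does not itself state. The function names and the toy per-author counts are hypothetical and are not taken from this repository.

```python
def average_commits_per_author(total_commits: int, committers: int) -> float:
    """Mean number of commits per committer."""
    return total_commits / committers


def development_distribution_score(commits_per_author: list[int]) -> float:
    """Assumed definition: 1 - (top committer's commits / total commits).

    A score near 0 means one author dominates; a score near 1 means
    commits are spread across many authors.
    """
    total = sum(commits_per_author)
    if total == 0:
        return 0.0
    return 1 - max(commits_per_author) / total


# Figures from the listing: 63 total commits, 13 committers.
print(round(average_commits_per_author(63, 13), 3))

# Hypothetical per-author counts, used only to show the DDS calculation.
print(development_distribution_score([5, 3, 2]))
```

The listing does not publish per-author commit counts, so the DDS of 0.635 cannot be recomputed from this page alone; only the average can be checked directly.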