An open API service providing package, version, and dependency metadata for many open source software ecosystems and registries.

pypi.org: py-data-juicer

A One-Stop Data Processing System for Large Language Models.

purl: pkg:pypi/py-data-juicer
Keywords: chinese, data-analysis, data-science, data-visualization, dataset, gpt, gpt-4, instruction-tuning, large-language-models, llama, llava, llm, llms, multi-modal, nlp, opendata, pre-training, pytorch, sora, streamlit
License: Apache-2.0
Latest release: 11 days ago
First release: over 1 year ago
Downloads: 903 last month
Stars: 1,321 on GitHub
Forks: 78 on GitHub
Total Commits: 63
Committers: 13
Average commits per author: 4.846
Development Distribution Score (DDS): 0.635
More commit stats: commits.ecosyste.ms
See more repository details: repos.ecosyste.ms
Last synced: 11 days ago
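
The two derived commit statistics above can be reproduced from the raw counts. The sketch below is illustrative only: it assumes DDS is defined as 1 minus the top committer's share of all commits, which is the commonly cited definition but is not stated on this page. The function names and the per-author commit list are hypothetical; the list was chosen only so that it has 13 authors and 63 commits in total, and it is not the repository's real distribution.

```python
from typing import Sequence


def average_commits_per_author(total_commits: int, committers: int) -> float:
    """Mean number of commits per committer."""
    return total_commits / committers


def development_distribution_score(commits_by_author: Sequence[int]) -> float:
    """Assumed definition: 1 - (commits by the top committer / total commits)."""
    total = sum(commits_by_author)
    return 1 - max(commits_by_author) / total


# Figures taken from the listing: 63 commits, 13 committers.
average = average_commits_per_author(63, 13)

# Hypothetical per-author split (13 authors, 63 commits), for illustration only.
hypothetical_commits = [23, 10, 8, 6, 4, 3, 2, 2, 1, 1, 1, 1, 1]
dds = development_distribution_score(hypothetical_commits)

print(round(average, 3))
print(round(dds, 3))
```

Under that assumed definition, a DDS near 0 means one author wrote almost every commit, while a value near 1 means the work is spread across many authors.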

data-juicer: 3 packages, 2,926 downloads