Ecosyste.ms: Packages
An open API service providing package, version and dependency metadata of many open source software ecosystems and registries.
pypi.org "data-cleaning" keyword
ipydataclean 0.2.2
Interactive cleaning for pandas DataFrames
1 version - Latest release: over 4 years ago - 1 dependent repository - 19 downloads last month - 15 stars on GitHub - 8 maintainers
selfclean 0.0.22
A holistic self-supervised data cleaning strategy to detect irrelevant samples, near duplicates a...
21 versions - Latest release: 1 day ago - 618 downloads last month - 9 stars on GitHub - 2 maintainers
tocase 1.0.0
tocase provides an API to recase string into any case
5 versions - Latest release: over 2 years ago - 1 dependent repository - 41 downloads last month - 4 stars on GitHub - 2 maintainers
pyoptimus 0.1.0
Optimus is the missing framework for cleaning and pre-processing data in a distributed fashion.
32 versions - Latest release: over 1 year ago - 1 dependent repository - 339 downloads last month - 1,441 stars on GitHub - 2 maintainers - Top 9.4% on pypi.org
learn2clean 0.2.1
Python Library for Data Preprocessing with Reinforcement Learning.
1 version - Latest release: about 5 years ago - 1 dependent repository - 21 downloads last month - 43 stars on GitHub - 1 maintainer
cleanlab-studio 2.0.3
Client interface for all things Cleanlab Studio
78 versions - Latest release: about 23 hours ago - 1 dependent repository - 3.19 thousand downloads last month - 20 stars on GitHub - 5 maintainers
sparx 0.0.2
Sparx is a simplified data munging, wrangling and preparation library
2 versions - Latest release: over 5 years ago - 1 dependent repository - 13 downloads last month - 0 stars on GitLab.com - 6 maintainers
string-treatment 1.0.1
String treatment package
12 versions - Latest release: 8 months ago - 64 downloads last month - 0 stars on GitHub - 1 maintainer
example-package-elisno 2.6.24
The standard package for data-centric AI, machine learning with label errors, and automatically f...
7 versions - Latest release: about 2 months ago - 53 downloads last month - 8,667 stars on GitHub - 1 maintainer
cleanlab 2.6.3
The standard package for data-centric AI, machine learning with label errors, and automatically f...
28 versions - Latest release: about 1 month ago - 8 dependent packages - 19 dependent repositories - 17.3 thousand downloads last month - 8,667 stars on GitHub - 5 maintainers - Top 2.1% on pypi.org
redditcleaner 1.1.2
Clean Reddit Text Data
4 versions - Latest release: about 4 years ago - 1 dependent package - 2 dependent repositories - 77 downloads last month - 75 stars on GitHub - 2 maintainers
zvdata 1.2.3 💰
an extendable library for recording and analyzing data
40 versions - Latest release: about 4 years ago - 1 dependent repository - 141 downloads last month - 5 stars on GitHub - 2 maintainers
bunkatopics 0.43.1
Bunkatopics is a Topic Modeling package and Exploration Module
39 versions - Latest release: 4 months ago - 332 downloads last month - 77 stars on GitHub - 1 maintainer
dpyp 1.0.0
A pandas convenience wrapper for small-scale data pipelines
1 version - Latest release: 10 days ago - 190 downloads last month - 2 stars on GitHub - 2 maintainers
skrub 0.5.0
Prepping tables for machine learning
4 versions - Latest release: 5 months ago - 441 downloads last month - 1,011 stars on GitHub - 8 maintainers
imputerapi 0.0.3
Data Imputer API
3 versions - Latest release: almost 3 years ago - 1 dependent repository - 32 downloads last month - 0 stars on GitHub - 2 maintainers
klib 1.1.2 💰
Customized data preprocessing functions for frequent tasks.
61 versions - Latest release: 9 months ago - 1 dependent package - 12 dependent repositories - 23.1 thousand downloads last month - 477 stars on GitHub - 1 maintainer - Top 3.5% on pypi.org
text2term 4.1.3
A tool for mapping free-text descriptions of (biomedical) entities to controlled terms in ontologies
23 versions - Latest release: about 1 month ago - 1 dependent repository - 147 downloads last month - 11 stars on GitHub - 3 maintainers
tidytcells 2.1.1
Standardise TR/MH data
33 versions - Latest release: 2 months ago - 1 dependent package - 1 dependent repository - 292 downloads last month - 4 stars on GitHub - 1 maintainer
boltzmannclean 0.1.2
Fill missing values in DataFrames with Restricted Boltzmann Machines
1 version - Latest release: over 4 years ago - 1 dependent repository - 28 downloads last month - 23 stars on GitHub - 4 maintainers
pippi-lang 0.0.2
A simple package to create elegant nlp pipelines using sklearn.
2 versions - Latest release: about 1 year ago - 23 downloads last month - 0 stars on GitHub - 1 maintainer
dframe-utils 0.0.2rc2
simple utility tools for dataframes in Python
2 versions - Latest release: over 6 years ago - 1 dependent repository - 21 downloads last month - 4 stars on GitHub - 1 maintainer
objectiv-modelhub 0.0.28
The open model hub is a growing collection of data models that you can take, combine and run for ...
33 versions - Latest release: over 1 year ago - 1 dependent repository - 236 downloads last month - 468 stars on GitHub - 1 maintainer
objectiv-bach 0.0.28
Objectiv Bach provides Pandas-like DataFrames backed by SQL
33 versions - Latest release: over 1 year ago - 1 dependent package - 1 dependent repository - 248 downloads last month - 468 stars on GitHub - 1 maintainer
ini2csv 1.0.0
A simple utility that converts and combines a folder of .ini files with identical keys into one c...
1 version - Latest release: 11 months ago - 12 downloads last month - 1 star on GitHub - 4 maintainers
pandera 0.18.3 💰
A light-weight and flexible data validation and testing tool for statistical data objects.
86 versions - Latest release: about 2 months ago - 60 dependent packages - 229 dependent repositories - 2.76 million downloads last month - 2,979 stars on GitHub - 3 maintainers - Top 1.4% on pypi.org
nonechucks 0.4.2
nonechucks is a library that provides wrappers for PyTorch's datasets, samplers, and transforms t...
18 versions - Latest release: almost 3 years ago - 26 dependent repositories - 1.14 thousand downloads last month - 372 stars on GitHub - 2 maintainers - Top 5.5% on pypi.org
fiftyone-db 1.1.2
FiftyOne DB
21 versions - Latest release: about 2 months ago - 1 dependent package - 36 dependent repositories - 62.8 thousand downloads last month - 6,627 stars on GitHub - 4 maintainers - Top 2.2% on pypi.org
datasetops 0.0.6
Fluent dataset operations, compatible with your favorite libraries
4 versions - Latest release: about 4 years ago - 4 dependent repositories - 57 downloads last month - 10 stars on GitHub - 2 maintainers
cleanlab-cli 0.1.14
Command line interface for all things Cleanlab Studio
16 versions - Latest release: over 1 year ago - 129 downloads last month - 20 stars on GitHub - 6 maintainers
cleanflow 1.3.3a1
A framework for cleaning, pre-processing and exploring data in a scalable and distributed manner.
11 versions - Latest release: almost 6 years ago - 31 downloads last month - 1 star on GitHub - 2 maintainers
pydvl 0.9.1
The Python Data Valuation Library
13 versions - Latest release: 10 days ago - 288 downloads last month - 65 stars on GitHub - 3 maintainers
uniflow 0.1.0
Unified interface for pre-training data augmentation and post-training evaluation of Large Langua...
33 versions - Latest release: 9 months ago - 1 dependent repository - 667 downloads last month - 106 stars on GitHub - 2 maintainers
banglanum2words 0.0.3
Converts a Bangla numeric string to literal words.
3 versions - Latest release: over 2 years ago - 1 dependent repository - 65 downloads last month - 3 stars on GitHub - 2 maintainers
plane 0.2.1 💰
A lib for text preprocessing
20 versions - Latest release: over 3 years ago - 3 dependent repositories - 635 downloads last month - 11 stars on GitHub - 2 maintainers
scribe-data 3.2.2
Wikidata and Wikipedia language data extraction
11 versions - Latest release: 2 months ago - 1 dependent repository - 46 downloads last month - 14 stars on GitHub - 1 maintainer
sofine 0.2.4
Lightweight framework for creating data-collection plugins and chaining together calls to them, f...
10 versions - Latest release: over 9 years ago - 2 dependent repositories - 23 downloads last month - 7 stars on GitHub - 2 maintainers
opuscleaner 0.4.1
OpusCleaner is a web interface that helps you select, clean and schedule your data for training m...
7 versions - Latest release: 2 months ago - 1 dependent repository - 707 downloads last month - 35 stars on GitHub - 2 maintainers
encord-active 0.1.83
Enable users to improve machine learning models in an active learning fashion via data, label, an...
76 versions - Latest release: 3 months ago - 1 dependent repository - 512 downloads last month - 420 stars on GitHub - 1 maintainer - Top 10.0% on pypi.org
desbordante 2.0.0
Science-intensive high-performance data profiler
3 versions - Latest release: 15 days ago - 138 downloads last month - 61 stars on GitHub - 1 maintainer
scikit-clean 0.1.2
A collection of algorithms for detecting and handling label noise
6 versions - Latest release: over 3 years ago - 1 dependent repository - 18 downloads last month - 13 stars on GitHub - 2 maintainers
fiftyone-desktop 0.33.7
FiftyOne Desktop
60 versions - Latest release: 17 days ago - 1 dependent package - 1 dependent repository - 1.33 thousand downloads last month - 6,627 stars on GitHub - 6 maintainers - Top 7.2% on pypi.org
fiftyone-db-ubuntu2204 0.4.0
FiftyOne DB
1 version - Latest release: 12 months ago - 6.7 thousand downloads last month - 6,627 stars on GitHub - 2 maintainers
fiftyone-db-ubuntu2004 0.4.0
FiftyOne DB
1 version - Latest release: over 1 year ago - 1 dependent repository - 15 downloads last month - 6,627 stars on GitHub - 1 maintainer
fiftyone-db-debian9 0.4.0
FiftyOne DB
6 versions - Latest release: over 1 year ago - 1 dependent repository - 10 downloads last month - 6,627 stars on GitHub - 2 maintainers
fiftyone-db-ubuntu1604 0.3.0
FiftyOne DB
5 versions - Latest release: over 3 years ago - 1 dependent repository - 15 downloads last month - 6,619 stars on GitHub - 2 maintainers
fiftyone-eval-only 0.14.3
FiftyOne, for evaluation only.
1 version - Latest release: over 2 years ago - 1 dependent repository - 6 downloads last month - 6,625 stars on GitHub - 2 maintainers
fiftyone-db-rhel7 0.4.0
FiftyOne DB
3 versions - Latest release: over 1 year ago - 1 dependent repository - 17 downloads last month - 6,625 stars on GitHub - 1 maintainer
mercury-dataschema 0.0.1
Mercury's DataSchema package allows the automatic recognition and validation of feature types.
1 version - Latest release: about 1 year ago - 2 dependent packages - 92 downloads last month - 11 stars on GitHub - 2 maintainers
pgdedupe 0.2.1
A simple interface to datamade/dedupe to make probabilistic record linkage easy.
2 versions - Latest release: almost 7 years ago - 1 dependent repository - 15 downloads last month - 42 stars on GitHub - 2 maintainers
pytrack-lib 2.0.8
a Map-Matching-based Python Toolbox for Vehicle Trajectory Reconstruction
13 versions - Latest release: 7 months ago - 1 dependent repository - 241 downloads last month - 56 stars on GitHub - 1 maintainer
superdeduper 0.1.7
A simple interface to datamade/dedupe to make probabilistic record linkage easy.
7 versions - Latest release: about 7 years ago - 1 dependent repository - 21 downloads last month - 42 stars on GitHub - 2 maintainers
data-purifier 0.3.6
A Python library for Automated Exploratory Data Analysis, Automated Data Cleaning and Automated D...
35 versions - Latest release: 8 months ago - 1 dependent repository - 145 downloads last month - 39 stars on GitHub - 2 maintainers
mprows 0.1.5
multiprocessing on row data using user defined functions
7 versions - Latest release: over 5 years ago - 1 dependent repository - 9 downloads last month - 2 stars on GitHub - 2 maintainers
optimuspyspark 2.2.32
Optimus is the missing framework for cleaning and pre-processing data in a distributed fashion wi...
83 versions - Latest release: almost 4 years ago - 8 dependent repositories - 8.75 thousand downloads last month - 1,439 stars on GitHub - 4 maintainers - Top 4.8% on pypi.org
sliceguard 0.0.35
A library for detecting critical data slices in structured and unstructured data based on feature...
33 versions - Latest release: 5 months ago - 1 dependent package - 1 dependent repository - 210 downloads last month - 51 stars on GitHub - 2 maintainers
texy 0.1.0
Supercharge text processing
5 versions - Latest release: over 1 year ago - 26 downloads last month - 0 stars on GitHub - 1 maintainer
purifier 0.2.16
A simple scraping library.
19 versions - Latest release: over 1 year ago - 5 downloads last month - 1 star on GitHub - 1 maintainer
opendataval 1.3.0
Transparent Data Valuation
5 versions - Latest release: 6 months ago - 109 downloads last month - 64 stars on GitHub - 2 maintainers
quantclean 0.0.2
Quantclean is a program that reformats every financial dataset to US Equity TradeBar
2 versions - Latest release: almost 3 years ago - 1 dependent repository - 15 downloads last month - 16 stars on GitHub - 1 maintainer
vaquero 0.0.5
A library for iterative and interactive data wrangling
5 versions - Latest release: about 6 years ago - 2 dependent repositories - 7 downloads last month - 0 stars on GitHub - 2 maintainers
vulcanai 1.0.8
A high-level framework built on top of Pytorch using added functionality from Scikit-learn to pro...
10 versions - Latest release: over 4 years ago - 1 dependent repository - 18 downloads last month - 16 stars on GitHub - 4 maintainers
completely 0.1.0
A simple tool to measure data completeness
1 version - Latest release: over 3 years ago - 3 dependent repositories - 547 downloads last month - 0 stars on GitHub - 2 maintainers
pythresh 0.3.6
A Python Toolbox for Outlier Detection Thresholding
27 versions - Latest release: 3 months ago - 3 dependent repositories - 2.11 thousand downloads last month - 113 stars on GitHub - 1 maintainer - Top 9.2% on pypi.org
marshmallow-pyspark 0.2.4
PySpark data serializer
6 versions - Latest release: 4 months ago - 1 dependent repository - 2.09 thousand downloads last month - 12 stars on GitHub - 2 maintainers
data-cleaning 1.0.1
A utility to clean the data and return you the cleaned data
2 versions - Latest release: about 3 years ago - 1 dependent repository - 123 downloads last month - 5 stars on GitHub - 4 maintainers
pypandas 0.2.5
A data cleaning framework for Spark
7 versions - Latest release: almost 6 years ago - 3 dependent repositories - 237 downloads last month - 7 stars on GitHub - 2 maintainers
countrywrangler 0.2.7 removed
A library that simplifies the handling of country-related data. Easily standardize your data acco...
26 versions - Latest release: about 1 year ago - 1 dependent repository - 1.06 thousand downloads last month - 3 stars on GitHub - 2 maintainers
contacts-harmony 0.0.4 removed
A Python library to normalize and validate email addresses and phone numbers entered into web forms.
4 versions - Latest release: about 1 year ago - 129 downloads last month - 2 stars on GitHub - 1 maintainer
Related Keywords
data-science (33)
python (26)
machine-learning (22)
data-quality (16)
data-centric-ai (15)
data-curation (15)
data-analysis (13)
deep-learning (13)
computer-vision (12)
active-learning (11)
image-classification (10)
data-preprocessing (10)
data (10)
visualization (10)
data-validation (9)
object-detection (9)
artificial-intelligence (8)
developer-tools (8)
unstructured-data (8)
vector-search (8)
pandas (7)
data-processing (7)
data-wrangling (7)
data-profiling (7)
python3 (6)
noisy-labels (6)
outlier-detection (6)
nlp (5)
preprocessing (5)
data-labeling (4)
postgresql (4)
data-cleansing (4)
spark (4)
pyspark (4)
pandas-dataframe (4)
data-exploration (4)
data-visualization (4)
data-pipeline (4)
dataframe (3)
llms (3)
weak-supervision (3)
natural-language-processing (3)
pytorch (3)
llm (3)
data-engineering (3)
learning (3)
annotations (3)
feature-engineering (3)
exploratory-data-analysis (3)
database (3)
data-preparation (3)
pipeline (3)
record-linkage (2)
data-analysis-python (2)
bigquery (2)
analytics-tracking (2)
analytics-sdk (2)
analytics-platform (2)
game-theory (2)
robust-machine-learning (2)
csv (2)
machine (2)
scikit-learn (2)
feature-selection (2)
dirty-data (2)
etl (2)
scraper (2)
deduplication (2)
eda (2)
dedupe (2)
data-mining (2)
schema (2)
sql-queries (2)
snowplow (2)
retention-analysis (2)
product-analytics (2)
pandas-python (2)
pandas-library (2)
instrumentation-testing (2)
instrumentation-libraries (2)
instrumentation (2)
datascience (2)
data-modeling (2)
data-valuation (2)
big-data-cleaning (2)
datacleaner (2)
annotation (2)
dask-cudf (2)
dataops (2)
dataquality (2)
datasets (2)
labeling (2)
out-of-distribution-detection (2)
regex (2)
data-cleaner (2)
structured-data (2)
data-extraction (2)
data-transformation (2)
model-deployment (2)
automl (2)