Ecosyste.ms: Packages

An open API service providing package, version, and dependency metadata for many open source software ecosystems and registries.

pypi.org "data-cleaning" keyword

Top 2.1% on pypi.org
cleanlab 2.6.4
The standard package for data-centric AI, machine learning with label errors, and automatically f...
29 versions - Latest release: 10 days ago - 11 dependent packages - 19 dependent repositories - 26.2 thousand downloads last month - 8,808 stars on GitHub - 4 maintainers
example-package-elisno 2.6.24
The standard package for data-centric AI, machine learning with label errors, and automatically f...
7 versions - Latest release: 2 months ago - 65 downloads last month - 8,808 stars on GitHub - 1 maintainer
fiftyone-db-debian9 0.4.0
FiftyOne DB
6 versions - Latest release: over 1 year ago - 1 dependent repository - 43 downloads last month - 6,627 stars on GitHub - 2 maintainers
fiftyone-eval-only 0.14.3
FiftyOne, for evaluation only.
1 version - Latest release: over 2 years ago - 1 dependent repository - 16 downloads last month - 6,627 stars on GitHub - 1 maintainer
fiftyone-db-ubuntu2204 0.4.0
FiftyOne DB
1 version - Latest release: about 1 year ago - 7.1 thousand downloads last month - 6,627 stars on GitHub - 1 maintainer
fiftyone-db-ubuntu1604 0.3.0
FiftyOne DB
5 versions - Latest release: over 3 years ago - 1 dependent repository - 35 downloads last month - 6,627 stars on GitHub - 2 maintainers
fiftyone-db-ubuntu2004 0.4.0
FiftyOne DB
1 version - Latest release: over 1 year ago - 1 dependent repository - 27 downloads last month - 6,627 stars on GitHub - 1 maintainer
fiftyone-db-rhel7 0.4.0
FiftyOne DB
3 versions - Latest release: over 1 year ago - 1 dependent repository - 37 downloads last month - 6,627 stars on GitHub - 1 maintainer
Top 2.2% on pypi.org
fiftyone-db 1.1.2
FiftyOne DB
21 versions - Latest release: 2 months ago - 1 dependent package - 36 dependent repositories - 54.2 thousand downloads last month - 6,627 stars on GitHub - 3 maintainers
Top 7.2% on pypi.org
fiftyone-desktop 0.33.7
FiftyOne Desktop
60 versions - Latest release: about 1 month ago - 1 dependent package - 1 dependent repository - 1.9 thousand downloads last month - 6,627 stars on GitHub - 4 maintainers
Top 1.4% on pypi.org
pandera 0.19.3 💰
A light-weight and flexible data validation and testing tool for statistical data objects.
91 versions - Latest release: 3 days ago - 97 dependent packages - 229 dependent repositories - 2.53 million downloads last month - 3,009 stars on GitHub - 3 maintainers
Top 4.8% on pypi.org
optimuspyspark 2.2.32
Optimus is the missing framework for cleaning and pre-processing data in a distributed fashion wi...
83 versions - Latest release: almost 4 years ago - 8 dependent repositories - 10.7 thousand downloads last month - 1,441 stars on GitHub - 2 maintainers
Top 9.4% on pypi.org
pyoptimus 0.1.0
Optimus is the missing framework for cleaning and pre-processing data in a distributed fashion.
32 versions - Latest release: over 1 year ago - 1 dependent repository - 371 downloads last month - 1,441 stars on GitHub - 2 maintainers
skrub 0.5.0
Prepping tables for machine learning
4 versions - Latest release: 5 months ago - 545 downloads last month - 1,018 stars on GitHub - 4 maintainers
Top 3.5% on pypi.org
klib 1.1.2 💰
Customized data preprocessing functions for frequent tasks.
61 versions - Latest release: 10 months ago - 1 dependent package - 12 dependent repositories - 23.9 thousand downloads last month - 478 stars on GitHub - 1 maintainer
objectiv-bach 0.0.28
Objectiv Bach provides Pandas-like DataFrames backed by SQL
33 versions - Latest release: over 1 year ago - 1 dependent package - 1 dependent repository - 248 downloads last month - 468 stars on GitHub - 1 maintainer
objectiv-modelhub 0.0.28
The open model hub is a growing collection of data models that you can take, combine and run for ...
33 versions - Latest release: over 1 year ago - 1 dependent repository - 236 downloads last month - 468 stars on GitHub - 1 maintainer
Top 10.0% on pypi.org
encord-active 0.1.83
Enable users to improve machine learning models in an active learning fashion via data, label, an...
76 versions - Latest release: 4 months ago - 1 dependent repository - 512 downloads last month - 420 stars on GitHub - 1 maintainer
Top 5.5% on pypi.org
nonechucks 0.4.2
nonechucks is a library that provides wrappers for PyTorch's datasets, samplers, and transforms t...
18 versions - Latest release: almost 3 years ago - 26 dependent repositories - 1.14 thousand downloads last month - 372 stars on GitHub - 1 maintainer
uniflow 0.1.0
Unified interface for pre-training data augmentation and post-training evaluation of Large Langua...
33 versions - Latest release: 10 months ago - 1 dependent repository - 499 downloads last month - 116 stars on GitHub - 1 maintainer
Top 9.2% on pypi.org
pythresh 0.3.6
A Python Toolbox for Outlier Detection Thresholding
27 versions - Latest release: 3 months ago - 3 dependent repositories - 2.05 thousand downloads last month - 113 stars on GitHub - 1 maintainer
bunkatopics 0.46.1
Bunkatopics is a Topic Modeling package and Exploration Module
40 versions - Latest release: 3 days ago - 435 downloads last month - 97 stars on GitHub - 1 maintainer
redditcleaner 1.1.2
Clean Reddit Text Data
4 versions - Latest release: about 4 years ago - 1 dependent package - 2 dependent repositories - 77 downloads last month - 75 stars on GitHub - 1 maintainer
opendataval 1.3.0
Transparent Data Valuation
5 versions - Latest release: 6 months ago - 167 downloads last month - 68 stars on GitHub - 1 maintainer
pydvl 0.9.2
The Python Data Valuation Library
14 versions - Latest release: 10 days ago - 441 downloads last month - 65 stars on GitHub - 2 maintainers
desbordante 2.0.0
Science-intensive high-performance data profiler
3 versions - Latest release: 30 days ago - 1 dependent package - 138 downloads last month - 61 stars on GitHub - 1 maintainer
pytrack-lib 2.0.8
a Map-Matching-based Python Toolbox for Vehicle Trajectory Reconstruction
13 versions - Latest release: 8 months ago - 1 dependent repository - 242 downloads last month - 56 stars on GitHub - 1 maintainer
sliceguard 0.0.35
A library for detecting critical data slices in structured and unstructured data based on feature...
33 versions - Latest release: 5 months ago - 1 dependent package - 1 dependent repository - 210 downloads last month - 51 stars on GitHub - 1 maintainer
learn2clean 0.2.1
Python Library for Data Preprocessing with Reinforcement Learning.
1 version - Latest release: about 5 years ago - 1 dependent repository - 21 downloads last month - 43 stars on GitHub - 1 maintainer
superdeduper 0.1.7
A simple interface to datamade/dedupe to make probabilistic record linkage easy.
7 versions - Latest release: about 7 years ago - 1 dependent repository - 73 downloads last month - 42 stars on GitHub - 1 maintainer
pgdedupe 0.2.1
A simple interface to datamade/dedupe to make probabilistic record linkage easy.
2 versions - Latest release: about 7 years ago - 1 dependent repository - 26 downloads last month - 42 stars on GitHub - 1 maintainer
data-purifier 0.3.6
A Python library for Automated Exploratory Data Analysis, Automated Data Cleaning and Automated D...
35 versions - Latest release: 8 months ago - 1 dependent repository - 285 downloads last month - 41 stars on GitHub - 1 maintainer
opuscleaner 0.4.1
OpusCleaner is a web interface that helps you select, clean and schedule your data for training m...
7 versions - Latest release: 3 months ago - 1 dependent repository - 707 downloads last month - 35 stars on GitHub - 1 maintainer
boltzmannclean 0.1.2
Fill missing values in DataFrames with Restricted Boltzmann Machines
1 version - Latest release: over 4 years ago - 1 dependent repository - 28 downloads last month - 23 stars on GitHub - 4 maintainers
cleanlab-studio 2.0.4
Client interface for all things Cleanlab Studio
79 versions - Latest release: 11 days ago - 1 dependent repository - 3.21 thousand downloads last month - 21 stars on GitHub - 4 maintainers
cleanlab-cli 0.1.14
Command line interface for all things Cleanlab Studio
16 versions - Latest release: over 1 year ago - 129 downloads last month - 20 stars on GitHub - 3 maintainers
vulcanai 1.0.8
A high-level framework built on top of Pytorch using added functionality from Scikit-learn to pro...
10 versions - Latest release: over 4 years ago - 1 dependent repository - 47 downloads last month - 16 stars on GitHub - 2 maintainers
quantclean 0.0.2
Quantclean is a program that reformats every financial dataset to US Equity TradeBar
2 versions - Latest release: about 3 years ago - 1 dependent repository - 26 downloads last month - 16 stars on GitHub - 1 maintainer
scribe-data 3.2.2
Wikidata and Wikipedia language data extraction
11 versions - Latest release: 3 months ago - 1 dependent repository - 52 downloads last month - 16 stars on GitHub - 1 maintainer
ipydataclean 0.2.2
Interactive cleaning for pandas DataFrames
1 version - Latest release: over 4 years ago - 1 dependent repository - 19 downloads last month - 15 stars on GitHub - 4 maintainers
scikit-clean 0.1.2
A collection of algorithms for detecting and handling label noise
6 versions - Latest release: almost 4 years ago - 1 dependent repository - 100 downloads last month - 13 stars on GitHub - 1 maintainer
marshmallow-pyspark 0.2.4
PySpark data serializer
6 versions - Latest release: 5 months ago - 1 dependent repository - 2.14 thousand downloads last month - 12 stars on GitHub - 1 maintainer
mercury-dataschema 0.0.1
Mercury's DataSchema package allows the automatic recognition and validation of feature types.
1 version - Latest release: about 1 year ago - 2 dependent packages - 199 downloads last month - 11 stars on GitHub - 1 maintainer
text2term 4.1.3
A tool for mapping free-text descriptions of (biomedical) entities to controlled terms in ontologies
23 versions - Latest release: about 2 months ago - 1 dependent repository - 210 downloads last month - 11 stars on GitHub - 2 maintainers
plane 0.2.1 💰
A lib for text preprocessing
20 versions - Latest release: over 3 years ago - 3 dependent repositories - 635 downloads last month - 11 stars on GitHub - 1 maintainer
datasetops 0.0.6
Fluent dataset operations, compatible with your favorite libraries
4 versions - Latest release: about 4 years ago - 4 dependent repositories - 57 downloads last month - 10 stars on GitHub - 1 maintainer
selfclean 0.0.22
A holistic self-supervised data cleaning strategy to detect irrelevant samples, near duplicates a...
21 versions - Latest release: 16 days ago - 618 downloads last month - 9 stars on GitHub - 1 maintainer
pypandas 0.2.5
A data cleaning framework for Spark
7 versions - Latest release: about 6 years ago - 3 dependent repositories - 251 downloads last month - 7 stars on GitHub - 1 maintainer
sofine 0.2.4
Lightweight framework for creating data-collection plugins and chaining together calls to them, f...
10 versions - Latest release: over 9 years ago - 2 dependent repositories - 23 downloads last month - 7 stars on GitHub - 1 maintainer
data-cleaning 1.0.1
A utility that cleans data and returns the cleaned result
2 versions - Latest release: about 3 years ago - 1 dependent repository - 123 downloads last month - 5 stars on GitHub - 2 maintainers
zvdata 1.2.3 💰
an extendable library for recording and analyzing data
40 versions - Latest release: about 4 years ago - 1 dependent repository - 141 downloads last month - 5 stars on GitHub - 1 maintainer
dframe-utils 0.0.2rc2
simple utility tools for dataframes in Python
2 versions - Latest release: over 6 years ago - 1 dependent repository - 21 downloads last month - 4 stars on GitHub - 1 maintainer
tidytcells 2.1.1
Standardise TR/MH data
33 versions - Latest release: 3 months ago - 2 dependent packages - 1 dependent repository - 292 downloads last month - 4 stars on GitHub - 1 maintainer
tocase 1.0.0
tocase provides an API to recase string into any case
5 versions - Latest release: over 2 years ago - 1 dependent repository - 41 downloads last month - 4 stars on GitHub - 1 maintainer
banglanum2words 0.0.3
Converts a Bangla numeric string to literal words.
3 versions - Latest release: over 2 years ago - 1 dependent repository - 65 downloads last month - 3 stars on GitHub - 1 maintainer
countrywrangler 0.2.7 removed
A library that simplifies the handling of country-related data. Easily standardize your data acco...
26 versions - Latest release: about 1 year ago - 1 dependent repository - 1.06 thousand downloads last month - 3 stars on GitHub - 1 maintainer
contacts-harmony 0.0.4 removed
A Python library to normalize and validate email addresses and phone numbers entered into web forms.
4 versions - Latest release: over 1 year ago - 129 downloads last month - 2 stars on GitHub - 1 maintainer
mprows 0.1.5
multiprocessing on row data using user defined functions
7 versions - Latest release: over 5 years ago - 1 dependent repository - 29 downloads last month - 2 stars on GitHub - 1 maintainer
dpyp 1.0.0
A pandas convenience wrapper for small-scale data pipelines
1 version - Latest release: 25 days ago - 202 downloads last month - 2 stars on GitHub - 1 maintainer
purifier 0.2.16
A simple scraping library.
19 versions - Latest release: almost 2 years ago - 99 downloads last month - 1 star on GitHub - 1 maintainer
cleanflow 1.3.3a1
A framework for cleaning, pre-processing and exploring data in a scalable and distributed manner.
11 versions - Latest release: about 6 years ago - 31 downloads last month - 1 star on GitHub - 1 maintainer
ini2csv 1.0.0
A simple utility that converts and combines a folder of .ini files with identical keys into one c...
1 version - Latest release: 11 months ago - 12 downloads last month - 1 star on GitHub - 2 maintainers
imputerapi 0.0.3
Data Imputer API
3 versions - Latest release: almost 3 years ago - 1 dependent repository - 32 downloads last month - 0 stars on GitHub - 1 maintainer
sparx 0.0.2
Sparx is a simplified data munging, wrangling and preparation library
2 versions - Latest release: over 5 years ago - 1 dependent repository - 13 downloads last month - 0 stars on GitLab.com - 3 maintainers
vaquero 0.0.5
A library for iterative and interactive data wrangling
5 versions - Latest release: over 6 years ago - 2 dependent repositories - 7 downloads last month - 0 stars on GitHub - 1 maintainer
texy 0.1.0
Supercharge text processing
5 versions - Latest release: over 1 year ago - 80 downloads last month - 0 stars on GitHub - 1 maintainer
string-treatment 1.0.1
String treatment package
12 versions - Latest release: 9 months ago - 64 downloads last month - 0 stars on GitHub - 1 maintainer
completely 0.1.0
A simple tool to measure data completeness
1 version - Latest release: over 3 years ago - 3 dependent repositories - 160 downloads last month - 0 stars on GitHub - 1 maintainer
pippi-lang 0.0.2
A simple package to create elegant nlp pipelines using sklearn.
2 versions - Latest release: over 1 year ago - 23 downloads last month - 0 stars on GitHub - 1 maintainer
Related Keywords
data-science (33), python (26), machine-learning (22), data-quality (16), data-curation (15), data-centric-ai (15), deep-learning (13), data-analysis (13), computer-vision (12), active-learning (11), image-classification (10), visualization (10), data (10), data-preprocessing (10), object-detection (9), data-validation (9), developer-tools (8), artificial-intelligence (8), unstructured-data (8), vector-search (8), data-profiling (7), data-wrangling (7), pandas (7), data-processing (7), noisy-labels (6), outlier-detection (6), python3 (6), nlp (5), preprocessing (5), pandas-dataframe (4), data-pipeline (4), spark (4), pyspark (4), data-labeling (4), data-cleansing (4), data-exploration (4), postgresql (4), data-visualization (4), pytorch (3), annotations (3), learning (3), data-preparation (3), llm (3), dataframe (3), database (3), weak-supervision (3), natural-language-processing (3), llms (3), exploratory-data-analysis (3), pipeline (3), data-engineering (3), feature-engineering (3), sql-queries (2), snowplow (2), csv (2), retention-analysis (2), robust-machine-learning (2), product-analytics (2), data-mining (2), pandas-python (2), pandas-library (2), instrumentation-testing (2), instrumentation-libraries (2), instrumentation (2), datascience (2), data-modeling (2), record-linkage (2), deduplication (2), dedupe (2), regex (2), cleanlab (2), automl (2), data-cleaning-pipeline (2), automated (2), model-deployment (2), structured-data (2), text-classification (2), eda (2), scraper (2), data-valuation (2), scikit-learn (2), game-theory (2), machine (2), big-data-cleaning (2), datacleaner (2), schema (2), etl (2), out-of-distribution-detection (2), labeling (2), datasets (2), dataquality (2), dataops (2), annotation (2), datacentric (2), datacentric_ai (2), unsupervised_learning (2), learning_with_noisy_labels (2), weak_supervision (2), classification (2), confident_learning (2)