Ecosyste.ms: Packages

An open API service providing package, version and dependency metadata of many open source software ecosystems and registries.
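
As a rough sketch of how the metadata listed on this page might be requested programmatically, the Python below builds a per-package URL and fetches it. The base URL and the `/registries/{registry}/packages/{name}` route are assumptions, not taken from this page, and the helper names `package_url` and `fetch_package` are hypothetical; consult the service's own API documentation before relying on any of them.

```python
import json
import urllib.parse
import urllib.request

# Assumed base URL and route layout for the Ecosyste.ms Packages API;
# verify both against the service's API documentation.
API_BASE = "https://packages.ecosyste.ms/api/v1"


def package_url(registry: str, name: str) -> str:
    """Build the (assumed) metadata URL for one package in one registry."""
    return (
        f"{API_BASE}/registries/{urllib.parse.quote(registry, safe='')}"
        f"/packages/{urllib.parse.quote(name, safe='')}"
    )


def fetch_package(registry: str, name: str, timeout: float = 10.0) -> dict:
    """Fetch and decode the JSON document for a package (needs network access)."""
    request = urllib.request.Request(
        package_url(registry, name), headers={"Accept": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Only prints the URL; call fetch_package(...) to perform the request.
    print(package_url("pypi.org", "cleanlab"))
```

The URL builder is kept separate from the network call so the route can be checked or adjusted without making a request.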

pypi.org "data-cleaning" keyword

ipydataclean 0.2.2
Interactive cleaning for pandas DataFrames
1 version - Latest release: over 4 years ago - 1 dependent repository - 19 downloads last month - 15 stars on GitHub - 8 maintainers
selfclean 0.0.22
A holistic self-supervised data cleaning strategy to detect irrelevant samples, near duplicates a...
21 versions - Latest release: 1 day ago - 618 downloads last month - 9 stars on GitHub - 2 maintainers
tocase 1.0.0
tocase provides an API to recase a string into any case
5 versions - Latest release: over 2 years ago - 1 dependent repository - 41 downloads last month - 4 stars on GitHub - 2 maintainers
Top 9.4% on pypi.org
pyoptimus 0.1.0
Optimus is the missing framework for cleaning and pre-processing data in a distributed fashion.
32 versions - Latest release: over 1 year ago - 1 dependent repository - 339 downloads last month - 1,441 stars on GitHub - 2 maintainers
learn2clean 0.2.1
Python Library for Data Preprocessing with Reinforcement Learning.
1 version - Latest release: about 5 years ago - 1 dependent repository - 21 downloads last month - 43 stars on GitHub - 1 maintainer
cleanlab-studio 2.0.3
Client interface for all things Cleanlab Studio
78 versions - Latest release: about 23 hours ago - 1 dependent repository - 3.19 thousand downloads last month - 20 stars on GitHub - 5 maintainers
sparx 0.0.2
Sparx is a simplified data munging, wrangling and preparation library
2 versions - Latest release: over 5 years ago - 1 dependent repository - 13 downloads last month - 0 stars on GitLab.com - 6 maintainers
string-treatment 1.0.1
String treatment package
12 versions - Latest release: 8 months ago - 64 downloads last month - 0 stars on GitHub - 1 maintainer
example-package-elisno 2.6.24
The standard package for data-centric AI, machine learning with label errors, and automatically f...
7 versions - Latest release: about 2 months ago - 53 downloads last month - 8,667 stars on GitHub - 1 maintainer
Top 2.1% on pypi.org
cleanlab 2.6.3
The standard package for data-centric AI, machine learning with label errors, and automatically f...
28 versions - Latest release: about 1 month ago - 8 dependent packages - 19 dependent repositories - 17.3 thousand downloads last month - 8,667 stars on GitHub - 5 maintainers
redditcleaner 1.1.2
Clean Reddit Text Data
4 versions - Latest release: about 4 years ago - 1 dependent package - 2 dependent repositories - 77 downloads last month - 75 stars on GitHub - 2 maintainers
zvdata 1.2.3 💰
an extendable library for recording and analyzing data
40 versions - Latest release: about 4 years ago - 1 dependent repository - 141 downloads last month - 5 stars on GitHub - 2 maintainers
bunkatopics 0.43.1
Bunkatopics is a Topic Modeling package and Exploration Module
39 versions - Latest release: 4 months ago - 332 downloads last month - 77 stars on GitHub - 1 maintainer
dpyp 1.0.0
A pandas convenience wrapper for small-scale data pipelines
1 version - Latest release: 10 days ago - 190 downloads last month - 2 stars on GitHub - 2 maintainers
skrub 0.5.0
Prepping tables for machine learning
4 versions - Latest release: 5 months ago - 441 downloads last month - 1,011 stars on GitHub - 8 maintainers
imputerapi 0.0.3
Data Imputer API
3 versions - Latest release: almost 3 years ago - 1 dependent repository - 32 downloads last month - 0 stars on GitHub - 2 maintainers
Top 3.5% on pypi.org
klib 1.1.2 💰
Customized data preprocessing functions for frequent tasks.
61 versions - Latest release: 9 months ago - 1 dependent package - 12 dependent repositories - 23.1 thousand downloads last month - 477 stars on GitHub - 1 maintainer
text2term 4.1.3
A tool for mapping free-text descriptions of (biomedical) entities to controlled terms in ontologies
23 versions - Latest release: about 1 month ago - 1 dependent repository - 147 downloads last month - 11 stars on GitHub - 3 maintainers
tidytcells 2.1.1
Standardise TR/MH data
33 versions - Latest release: 2 months ago - 1 dependent package - 1 dependent repository - 292 downloads last month - 4 stars on GitHub - 1 maintainer
boltzmannclean 0.1.2
Fill missing values in DataFrames with Restricted Boltzmann Machines
1 version - Latest release: over 4 years ago - 1 dependent repository - 28 downloads last month - 23 stars on GitHub - 4 maintainers
pippi-lang 0.0.2
A simple package to create elegant nlp pipelines using sklearn.
2 versions - Latest release: about 1 year ago - 23 downloads last month - 0 stars on GitHub - 1 maintainer
dframe-utils 0.0.2rc2
simple utility tools for dataframes in Python
2 versions - Latest release: over 6 years ago - 1 dependent repository - 21 downloads last month - 4 stars on GitHub - 1 maintainer
objectiv-modelhub 0.0.28
The open model hub is a growing collection of data models that you can take, combine and run for ...
33 versions - Latest release: over 1 year ago - 1 dependent repository - 236 downloads last month - 468 stars on GitHub - 1 maintainer
objectiv-bach 0.0.28
Objectiv Bach provides Pandas-like DataFrames backed by SQL
33 versions - Latest release: over 1 year ago - 1 dependent package - 1 dependent repository - 248 downloads last month - 468 stars on GitHub - 1 maintainer
ini2csv 1.0.0
A simple utility that converts and combines a folder of .ini files with identical keys into one c...
1 version - Latest release: 11 months ago - 12 downloads last month - 1 star on GitHub - 4 maintainers
Top 1.4% on pypi.org
pandera 0.18.3 💰
A light-weight and flexible data validation and testing tool for statistical data objects.
86 versions - Latest release: about 2 months ago - 60 dependent packages - 229 dependent repositories - 2.76 million downloads last month - 2,979 stars on GitHub - 3 maintainers
Top 5.5% on pypi.org
nonechucks 0.4.2
nonechucks is a library that provides wrappers for PyTorch's datasets, samplers, and transforms t...
18 versions - Latest release: almost 3 years ago - 26 dependent repositories - 1.14 thousand downloads last month - 372 stars on GitHub - 2 maintainers
Top 2.2% on pypi.org
fiftyone-db 1.1.2
FiftyOne DB
21 versions - Latest release: about 2 months ago - 1 dependent package - 36 dependent repositories - 62.8 thousand downloads last month - 6,627 stars on GitHub - 4 maintainers
datasetops 0.0.6
Fluent dataset operations, compatible with your favorite libraries
4 versions - Latest release: about 4 years ago - 4 dependent repositories - 57 downloads last month - 10 stars on GitHub - 2 maintainers
cleanlab-cli 0.1.14
Command line interface for all things Cleanlab Studio
16 versions - Latest release: over 1 year ago - 129 downloads last month - 20 stars on GitHub - 6 maintainers
cleanflow 1.3.3a1
A framework for cleaning, pre-processing and exploring data in a scalable and distributed manner.
11 versions - Latest release: almost 6 years ago - 31 downloads last month - 1 star on GitHub - 2 maintainers
pydvl 0.9.1
The Python Data Valuation Library
13 versions - Latest release: 10 days ago - 288 downloads last month - 65 stars on GitHub - 3 maintainers
uniflow 0.1.0
Unified interface for pre-training data augmentation and post-training evaluation of Large Langua...
33 versions - Latest release: 9 months ago - 1 dependent repository - 667 downloads last month - 106 stars on GitHub - 2 maintainers
banglanum2words 0.0.3
Converts a Bangla numeric string to literal words.
3 versions - Latest release: over 2 years ago - 1 dependent repository - 65 downloads last month - 3 stars on GitHub - 2 maintainers
plane 0.2.1 💰
A lib for text preprocessing
20 versions - Latest release: over 3 years ago - 3 dependent repositories - 635 downloads last month - 11 stars on GitHub - 2 maintainers
scribe-data 3.2.2
Wikidata and Wikipedia language data extraction
11 versions - Latest release: 2 months ago - 1 dependent repository - 46 downloads last month - 14 stars on GitHub - 1 maintainer
sofine 0.2.4
Lightweight framework for creating data-collection plugins and chaining together calls to them, f...
10 versions - Latest release: over 9 years ago - 2 dependent repositories - 23 downloads last month - 7 stars on GitHub - 2 maintainers
opuscleaner 0.4.1
OpusCleaner is a web interface that helps you select, clean and schedule your data for training m...
7 versions - Latest release: 2 months ago - 1 dependent repository - 707 downloads last month - 35 stars on GitHub - 2 maintainers
Top 10.0% on pypi.org
encord-active 0.1.83
Enable users to improve machine learning models in an active learning fashion via data, label, an...
76 versions - Latest release: 3 months ago - 1 dependent repository - 512 downloads last month - 420 stars on GitHub - 1 maintainer
desbordante 2.0.0
Science-intensive high-performance data profiler
3 versions - Latest release: 15 days ago - 138 downloads last month - 61 stars on GitHub - 1 maintainer
scikit-clean 0.1.2
A collection of algorithms for detecting and handling label noise
6 versions - Latest release: over 3 years ago - 1 dependent repository - 18 downloads last month - 13 stars on GitHub - 2 maintainers
Top 7.2% on pypi.org
fiftyone-desktop 0.33.7
FiftyOne Desktop
60 versions - Latest release: 17 days ago - 1 dependent package - 1 dependent repository - 1.33 thousand downloads last month - 6,627 stars on GitHub - 6 maintainers
fiftyone-db-ubuntu2204 0.4.0
FiftyOne DB
1 version - Latest release: 12 months ago - 6.7 thousand downloads last month - 6,627 stars on GitHub - 2 maintainers
fiftyone-db-ubuntu2004 0.4.0
FiftyOne DB
1 version - Latest release: over 1 year ago - 1 dependent repository - 15 downloads last month - 6,627 stars on GitHub - 1 maintainer
fiftyone-db-debian9 0.4.0
FiftyOne DB
6 versions - Latest release: over 1 year ago - 1 dependent repository - 10 downloads last month - 6,627 stars on GitHub - 2 maintainers
fiftyone-db-ubuntu1604 0.3.0
FiftyOne DB
5 versions - Latest release: over 3 years ago - 1 dependent repository - 15 downloads last month - 6,619 stars on GitHub - 2 maintainers
fiftyone-eval-only 0.14.3
FiftyOne, for evaluation only.
1 version - Latest release: over 2 years ago - 1 dependent repository - 6 downloads last month - 6,625 stars on GitHub - 2 maintainers
fiftyone-db-rhel7 0.4.0
FiftyOne DB
3 versions - Latest release: over 1 year ago - 1 dependent repository - 17 downloads last month - 6,625 stars on GitHub - 1 maintainer
mercury-dataschema 0.0.1
Mercury's DataSchema package allows the automatic recognition and validation of feature types.
1 version - Latest release: about 1 year ago - 2 dependent packages - 92 downloads last month - 11 stars on GitHub - 2 maintainers
pgdedupe 0.2.1
A simple interface to datamade/dedupe to make probabilistic record linkage easy.
2 versions - Latest release: almost 7 years ago - 1 dependent repository - 15 downloads last month - 42 stars on GitHub - 2 maintainers
pytrack-lib 2.0.8
a Map-Matching-based Python Toolbox for Vehicle Trajectory Reconstruction
13 versions - Latest release: 7 months ago - 1 dependent repository - 241 downloads last month - 56 stars on GitHub - 1 maintainer
superdeduper 0.1.7
A simple interface to datamade/dedupe to make probabilistic record linkage easy.
7 versions - Latest release: about 7 years ago - 1 dependent repository - 21 downloads last month - 42 stars on GitHub - 2 maintainers
data-purifier 0.3.6
A Python library for Automated Exploratory Data Analysis, Automated Data Cleaning and Automated D...
35 versions - Latest release: 8 months ago - 1 dependent repository - 145 downloads last month - 39 stars on GitHub - 2 maintainers
mprows 0.1.5
multiprocessing on row data using user defined functions
7 versions - Latest release: over 5 years ago - 1 dependent repository - 9 downloads last month - 2 stars on GitHub - 2 maintainers
Top 4.8% on pypi.org
optimuspyspark 2.2.32
Optimus is the missing framework for cleaning and pre-processing data in a distributed fashion wi...
83 versions - Latest release: almost 4 years ago - 8 dependent repositories - 8.75 thousand downloads last month - 1,439 stars on GitHub - 4 maintainers
sliceguard 0.0.35
A library for detecting critical data slices in structured and unstructured data based on feature...
33 versions - Latest release: 5 months ago - 1 dependent package - 1 dependent repository - 210 downloads last month - 51 stars on GitHub - 2 maintainers
texy 0.1.0
Supercharge text processing
5 versions - Latest release: over 1 year ago - 26 downloads last month - 0 stars on GitHub - 1 maintainer
purifier 0.2.16
A simple scraping library.
19 versions - Latest release: over 1 year ago - 5 downloads last month - 1 star on GitHub - 1 maintainer
opendataval 1.3.0
Transparent Data Valuation
5 versions - Latest release: 6 months ago - 109 downloads last month - 64 stars on GitHub - 2 maintainers
quantclean 0.0.2
Quantclean is a program that reformats every financial dataset to US Equity TradeBar
2 versions - Latest release: almost 3 years ago - 1 dependent repository - 15 downloads last month - 16 stars on GitHub - 1 maintainer
vaquero 0.0.5
A library for iterative and interactive data wrangling
5 versions - Latest release: about 6 years ago - 2 dependent repositories - 7 downloads last month - 0 stars on GitHub - 2 maintainers
vulcanai 1.0.8
A high-level framework built on top of Pytorch using added functionality from Scikit-learn to pro...
10 versions - Latest release: over 4 years ago - 1 dependent repository - 18 downloads last month - 16 stars on GitHub - 4 maintainers
completely 0.1.0
A simple tool to measure data completeness
1 version - Latest release: over 3 years ago - 3 dependent repositories - 547 downloads last month - 0 stars on GitHub - 2 maintainers
Top 9.2% on pypi.org
pythresh 0.3.6
A Python Toolbox for Outlier Detection Thresholding
27 versions - Latest release: 3 months ago - 3 dependent repositories - 2.11 thousand downloads last month - 113 stars on GitHub - 1 maintainer
marshmallow-pyspark 0.2.4
PySpark data serializer
6 versions - Latest release: 4 months ago - 1 dependent repository - 2.09 thousand downloads last month - 12 stars on GitHub - 2 maintainers
data-cleaning 1.0.1
A utility to clean the data and return you the cleaned data
2 versions - Latest release: about 3 years ago - 1 dependent repository - 123 downloads last month - 5 stars on GitHub - 4 maintainers
pypandas 0.2.5
A data cleaning framework for Spark
7 versions - Latest release: almost 6 years ago - 3 dependent repositories - 237 downloads last month - 7 stars on GitHub - 2 maintainers
countrywrangler 0.2.7 removed
A library that simplifies the handling of country-related data. Easily standardize your data acco...
26 versions - Latest release: about 1 year ago - 1 dependent repository - 1.06 thousand downloads last month - 3 stars on GitHub - 2 maintainers
contacts-harmony 0.0.4 removed
A Python library to normalize and validate email addresses and phone numbers entered into web forms.
4 versions - Latest release: about 1 year ago - 129 downloads last month - 2 stars on GitHub - 1 maintainer
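
To compare the packages above numerically, the summary lines can be turned into numbers. The Python below is a minimal sketch that assumes every summary line follows the " - "-separated layout seen in this listing, with counts written either as plain integers ("8,667") or with a magnitude word ("17.3 thousand", "2.76 million"). The function name `parse_stats` is hypothetical, and labels are kept verbatim, so singular and plural forms ("1 version" and "2 versions") yield different keys.

```python
import re

# Multipliers for the magnitude words used in the listing.
_SCALE = {"thousand": 1_000, "million": 1_000_000}

# One field looks like "<number> [thousand|million] <label>".
_FIELD = re.compile(r"^([\d.,]+)\s+(?:(thousand|million)\s+)?(.+)$")


def parse_stats(line: str) -> dict:
    """Turn one summary line into {label: int}; non-numeric fields stay strings."""
    stats = {}
    for field in line.split(" - "):
        match = _FIELD.match(field.strip())
        if match is None:
            # For example "Latest release: 1 day ago" has no leading number.
            key, _, value = field.partition(":")
            stats[key.strip()] = value.strip()
            continue
        number, scale, label = match.groups()
        value = float(number.replace(",", "")) * _SCALE.get(scale, 1)
        stats[label] = round(value)
    return stats


if __name__ == "__main__":
    line = (
        "28 versions - Latest release: about 1 month ago - 8 dependent packages"
        " - 19 dependent repositories - 17.3 thousand downloads last month"
        " - 8,667 stars on GitHub - 5 maintainers"
    )
    print(parse_stats(line))
```

Counts given with a magnitude word are rounded on the page, so the parsed values are approximations of the underlying figures.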
Related Keywords
data-science (33), python (26), machine-learning (22), data-quality (16), data-centric-ai (15), data-curation (15), data-analysis (13), deep-learning (13), computer-vision (12), active-learning (11), image-classification (10), data-preprocessing (10), data (10), visualization (10), data-validation (9), object-detection (9), artificial-intelligence (8), developer-tools (8), unstructured-data (8), vector-search (8), pandas (7), data-processing (7), data-wrangling (7), data-profiling (7), python3 (6), noisy-labels (6), outlier-detection (6), nlp (5), preprocessing (5), data-labeling (4), postgresql (4), data-cleansing (4), spark (4), pyspark (4), pandas-dataframe (4), data-exploration (4), data-visualization (4), data-pipeline (4), dataframe (3), llms (3), weak-supervision (3), natural-language-processing (3), pytorch (3), llm (3), data-engineering (3), learning (3), annotations (3), feature-engineering (3), exploratory-data-analysis (3), database (3), data-preparation (3), pipeline (3), record-linkage (2), data-analysis-python (2), bigquery (2), analytics-tracking (2), analytics-sdk (2), analytics-platform (2), game-theory (2), robust-machine-learning (2), csv (2), machine (2), scikit-learn (2), feature-selection (2), dirty-data (2), etl (2), scraper (2), deduplication (2), eda (2), dedupe (2), data-mining (2), schema (2), sql-queries (2), snowplow (2), retention-analysis (2), product-analytics (2), pandas-python (2), pandas-library (2), instrumentation-testing (2), instrumentation-libraries (2), instrumentation (2), datascience (2), data-modeling (2), data-valuation (2), big-data-cleaning (2), datacleaner (2), annotation (2), dask-cudf (2), dataops (2), dataquality (2), datasets (2), labeling (2), out-of-distribution-detection (2), regex (2), data-cleaner (2), structured-data (2), data-extraction (2), data-transformation (2), model-deployment (2), automl (2)