Ecosyste.ms: Packages
An open API service providing package, version and dependency metadata of many open source software ecosystems and registries.
pypi.org "data-cleaning" keyword
sofine 0.2.4
Lightweight framework for creating data-collection plugins and chaining together calls to them, f...
10 versions - Latest release: over 9 years ago - 2 dependent repositories - 23 downloads last month - 7 stars on GitHub - 1 maintainer
superdeduper 0.1.7
A simple interface to datamade/dedupe to make probabilistic record linkage easy.
7 versions - Latest release: about 7 years ago - 1 dependent repository - 73 downloads last month - 42 stars on GitHub - 1 maintainer
pgdedupe 0.2.1
A simple interface to datamade/dedupe to make probabilistic record linkage easy.
2 versions - Latest release: about 7 years ago - 1 dependent repository - 26 downloads last month - 42 stars on GitHub - 1 maintainer
dframe-utils 0.0.2rc2
simple utility tools for dataframes in Python
2 versions - Latest release: over 6 years ago - 1 dependent repository - 21 downloads last month - 4 stars on GitHub - 1 maintainer
vaquero 0.0.5
A library for iterative and interactive data wrangling
5 versions - Latest release: over 6 years ago - 2 dependent repositories - 7 downloads last month - 0 stars on GitHub - 1 maintainer
cleanflow 1.3.3a1
A framework for cleaning, pre-processing and exploring data in a scalable and distributed manner.
11 versions - Latest release: about 6 years ago - 31 downloads last month - 1 star on GitHub - 1 maintainer
pypandas 0.2.5
A data cleaning framework for Spark
7 versions - Latest release: about 6 years ago - 3 dependent repositories - 251 downloads last month - 7 stars on GitHub - 1 maintainer
mprows 0.1.5
multiprocessing on row data using user defined functions
7 versions - Latest release: over 5 years ago - 1 dependent repository - 29 downloads last month - 2 stars on GitHub - 1 maintainer
sparx 0.0.2
Sparx is a simplified data munging, wrangling and preparation library
2 versions - Latest release: over 5 years ago - 1 dependent repository - 13 downloads last month - 0 stars on GitLab.com - 3 maintainers
learn2clean 0.2.1
Python Library for Data Preprocessing with Reinforcement Learning.
1 version - Latest release: about 5 years ago - 1 dependent repository - 21 downloads last month - 43 stars on GitHub - 1 maintainer
vulcanai 1.0.8
A high-level framework built on top of Pytorch using added functionality from Scikit-learn to pro...
10 versions - Latest release: over 4 years ago - 1 dependent repository - 47 downloads last month - 16 stars on GitHub - 2 maintainers
boltzmannclean 0.1.2
Fill missing values in DataFrames with Restricted Boltzmann Machines
1 version - Latest release: over 4 years ago - 1 dependent repository - 28 downloads last month - 23 stars on GitHub - 4 maintainers
ipydataclean 0.2.2
Interactive cleaning for pandas DataFrames
1 version - Latest release: over 4 years ago - 1 dependent repository - 19 downloads last month - 15 stars on GitHub - 4 maintainers
zvdata 1.2.3 💰
an extendable library for recording and analyzing data
40 versions - Latest release: about 4 years ago - 1 dependent repository - 141 downloads last month - 5 stars on GitHub - 1 maintainer
datasetops 0.0.6
Fluent dataset operations, compatible with your favorite libraries
4 versions - Latest release: about 4 years ago - 4 dependent repositories - 57 downloads last month - 10 stars on GitHub - 1 maintainer
redditcleaner 1.1.2
Clean Reddit Text Data
4 versions - Latest release: about 4 years ago - 1 dependent package - 2 dependent repositories - 77 downloads last month - 75 stars on GitHub - 1 maintainer
optimuspyspark 2.2.32
Top 4.8% on pypi.org
Optimus is the missing framework for cleaning and pre-processing data in a distributed fashion wi...
83 versions - Latest release: almost 4 years ago - 8 dependent repositories - 10.7 thousand downloads last month - 1,441 stars on GitHub - 2 maintainers
scikit-clean 0.1.2
A collection of algorithms for detecting and handling label noise
6 versions - Latest release: almost 4 years ago - 1 dependent repository - 100 downloads last month - 13 stars on GitHub - 1 maintainer
completely 0.1.0
A simple tool to measure data completeness
1 version - Latest release: over 3 years ago - 3 dependent repositories - 160 downloads last month - 0 stars on GitHub - 1 maintainer
fiftyone-db-ubuntu1604 0.3.0
FiftyOne DB
5 versions - Latest release: over 3 years ago - 1 dependent repository - 35 downloads last month - 6,627 stars on GitHub - 2 maintainers
plane 0.2.1 💰
A lib for text preprocessing
20 versions - Latest release: over 3 years ago - 3 dependent repositories - 635 downloads last month - 11 stars on GitHub - 1 maintainer
data-cleaning 1.0.1
A utility to clean the data and return the cleaned data
2 versions - Latest release: about 3 years ago - 1 dependent repository - 123 downloads last month - 5 stars on GitHub - 2 maintainers
quantclean 0.0.2
Quantclean is a program that reformats every financial dataset to US Equity TradeBar
2 versions - Latest release: about 3 years ago - 1 dependent repository - 26 downloads last month - 16 stars on GitHub - 1 maintainer
nonechucks 0.4.2
Top 5.5% on pypi.org
nonechucks is a library that provides wrappers for PyTorch's datasets, samplers, and transforms t...
18 versions - Latest release: almost 3 years ago - 26 dependent repositories - 898 downloads last month - 373 stars on GitHub - 1 maintainer
imputerapi 0.0.3
Data Imputer API
3 versions - Latest release: almost 3 years ago - 1 dependent repository - 32 downloads last month - 0 stars on GitHub - 1 maintainer
tocase 1.0.0
tocase provides an API to recase a string into any case
5 versions - Latest release: over 2 years ago - 1 dependent repository - 41 downloads last month - 4 stars on GitHub - 1 maintainer
fiftyone-eval-only 0.14.3
FiftyOne, for evaluation only.
1 version - Latest release: over 2 years ago - 1 dependent repository - 16 downloads last month - 6,627 stars on GitHub - 1 maintainer
banglanum2words 0.0.3
Converts a Bangla numeric string to literal words.
3 versions - Latest release: over 2 years ago - 1 dependent repository - 65 downloads last month - 3 stars on GitHub - 1 maintainer
purifier 0.2.16
A simple scraping library.
19 versions - Latest release: almost 2 years ago - 99 downloads last month - 1 star on GitHub - 1 maintainer
cleanlab-cli 0.1.14
Command line interface for all things Cleanlab Studio
16 versions - Latest release: over 1 year ago - 129 downloads last month - 20 stars on GitHub - 3 maintainers
pyoptimus 0.1.0
Top 9.4% on pypi.org
Optimus is the missing framework for cleaning and pre-processing data in a distributed fashion.
32 versions - Latest release: over 1 year ago - 1 dependent repository - 371 downloads last month - 1,441 stars on GitHub - 2 maintainers
fiftyone-db-debian9 0.4.0
FiftyOne DB
6 versions - Latest release: over 1 year ago - 1 dependent repository - 43 downloads last month - 6,627 stars on GitHub - 2 maintainers
fiftyone-db-rhel7 0.4.0
FiftyOne DB
3 versions - Latest release: over 1 year ago - 1 dependent repository - 37 downloads last month - 6,627 stars on GitHub - 1 maintainer
fiftyone-db-ubuntu2004 0.4.0
FiftyOne DB
1 version - Latest release: over 1 year ago - 1 dependent repository - 27 downloads last month - 6,627 stars on GitHub - 1 maintainer
texy 0.1.0
Supercharge text processing
5 versions - Latest release: over 1 year ago - 80 downloads last month - 0 stars on GitHub - 1 maintainer
objectiv-bach 0.0.28
Objectiv Bach provides Pandas-like DataFrames backed by SQL
33 versions - Latest release: over 1 year ago - 1 dependent package - 1 dependent repository - 248 downloads last month - 468 stars on GitHub - 1 maintainer
objectiv-modelhub 0.0.28
The open model hub is a growing collection of data models that you can take, combine and run for ...
33 versions - Latest release: over 1 year ago - 1 dependent repository - 236 downloads last month - 468 stars on GitHub - 1 maintainer
pippi-lang 0.0.2
A simple package to create elegant nlp pipelines using sklearn.
2 versions - Latest release: over 1 year ago - 23 downloads last month - 0 stars on GitHub - 1 maintainer
contacts-harmony 0.0.4 removed
A Python library to normalize and validate email addresses and phone numbers entered into web forms.
4 versions - Latest release: over 1 year ago - 129 downloads last month - 2 stars on GitHub - 1 maintainer
mercury-dataschema 0.0.1
Mercury's DataSchema package allows the automatic recognition and validation of feature types.
1 version - Latest release: about 1 year ago - 2 dependent packages - 199 downloads last month - 11 stars on GitHub - 1 maintainer
countrywrangler 0.2.7 removed
A library that simplifies the handling of country-related data. Easily standardize your data acco...
26 versions - Latest release: about 1 year ago - 1 dependent repository - 1.06 thousand downloads last month - 3 stars on GitHub - 1 maintainer
fiftyone-db-ubuntu2204 0.4.0
FiftyOne DB
1 version - Latest release: about 1 year ago - 7.1 thousand downloads last month - 6,627 stars on GitHub - 1 maintainer
ini2csv 1.0.0
A simple utility that converts and combines a folder of .ini files with identical keys into one c...
1 version - Latest release: 11 months ago - 12 downloads last month - 1 star on GitHub - 2 maintainers
uniflow 0.1.0
Unified interface for pre-training data augmentation and post-training evaluation of Large Langua...
33 versions - Latest release: 10 months ago - 1 dependent repository - 499 downloads last month - 116 stars on GitHub - 1 maintainer
klib 1.1.2 💰
Top 3.5% on pypi.org
Customized data preprocessing functions for frequent tasks.
61 versions - Latest release: 10 months ago - 1 dependent package - 12 dependent repositories - 23.9 thousand downloads last month - 478 stars on GitHub - 1 maintainer
string-treatment 1.0.1
String treatment package
12 versions - Latest release: 9 months ago - 64 downloads last month - 0 stars on GitHub - 1 maintainer
data-purifier 0.3.6
A Python library for Automated Exploratory Data Analysis, Automated Data Cleaning and Automated D...
35 versions - Latest release: 8 months ago - 1 dependent repository - 285 downloads last month - 41 stars on GitHub - 1 maintainer
pytrack-lib 2.0.8
a Map-Matching-based Python Toolbox for Vehicle Trajectory Reconstruction
13 versions - Latest release: 8 months ago - 1 dependent repository - 242 downloads last month - 56 stars on GitHub - 1 maintainer
opendataval 1.3.0
Transparent Data Valuation
5 versions - Latest release: 6 months ago - 167 downloads last month - 68 stars on GitHub - 1 maintainer
sliceguard 0.0.35
A library for detecting critical data slices in structured and unstructured data based on feature...
33 versions - Latest release: 6 months ago - 1 dependent package - 1 dependent repository - 210 downloads last month - 51 stars on GitHub - 1 maintainer
skrub 0.5.0
Prepping tables for machine learning
4 versions - Latest release: 5 months ago - 545 downloads last month - 1,018 stars on GitHub - 4 maintainers
marshmallow-pyspark 0.2.4
PySpark data serializer
6 versions - Latest release: 5 months ago - 1 dependent repository - 2.14 thousand downloads last month - 12 stars on GitHub - 1 maintainer
encord-active 0.1.83
Top 10.0% on pypi.org
Enable users to improve machine learning models in an active learning fashion via data, label, an...
76 versions - Latest release: 4 months ago - 1 dependent repository - 512 downloads last month - 420 stars on GitHub - 1 maintainer
pythresh 0.3.6
Top 9.2% on pypi.org
A Python Toolbox for Outlier Detection Thresholding
27 versions - Latest release: 4 months ago - 3 dependent repositories - 2.05 thousand downloads last month - 113 stars on GitHub - 1 maintainer
opuscleaner 0.4.1
OpusCleaner is a web interface that helps you select, clean and schedule your data for training m...
7 versions - Latest release: 3 months ago - 1 dependent repository - 707 downloads last month - 35 stars on GitHub - 1 maintainer
scribe-data 3.2.2
Wikidata and Wikipedia language data extraction
11 versions - Latest release: 3 months ago - 1 dependent repository - 52 downloads last month - 16 stars on GitHub - 1 maintainer
tidytcells 2.1.1
Standardise TR/MH data
33 versions - Latest release: 3 months ago - 2 dependent packages - 1 dependent repository - 292 downloads last month - 4 stars on GitHub - 1 maintainer
example-package-elisno 2.6.24
The standard package for data-centric AI, machine learning with label errors, and automatically f...
7 versions - Latest release: 3 months ago - 65 downloads last month - 8,808 stars on GitHub - 1 maintainer
fiftyone-db 1.1.2
Top 2.2% on pypi.org
FiftyOne DB
21 versions - Latest release: 2 months ago - 1 dependent package - 36 dependent repositories - 54.2 thousand downloads last month - 6,627 stars on GitHub - 3 maintainers
text2term 4.1.3
A tool for mapping free-text descriptions of (biomedical) entities to controlled terms in ontologies
23 versions - Latest release: about 2 months ago - 1 dependent repository - 210 downloads last month - 11 stars on GitHub - 2 maintainers
fiftyone-desktop 0.33.7
Top 7.2% on pypi.org
FiftyOne Desktop
60 versions - Latest release: about 1 month ago - 1 dependent package - 1 dependent repository - 1.9 thousand downloads last month - 6,627 stars on GitHub - 4 maintainers
desbordante 2.0.0
Science-intensive high-performance data profiler
3 versions - Latest release: about 1 month ago - 1 dependent package - 427 downloads last month - 62 stars on GitHub - 1 maintainer
dpyp 1.0.0
A pandas convenience wrapper for small-scale data pipelines
1 version - Latest release: 27 days ago - 202 downloads last month - 2 stars on GitHub - 1 maintainer
selfclean 0.0.22
A holistic self-supervised data cleaning strategy to detect irrelevant samples, near duplicates a...
21 versions - Latest release: 18 days ago - 618 downloads last month - 9 stars on GitHub - 1 maintainer
cleanlab-studio 2.0.4
Client interface for all things Cleanlab Studio
79 versions - Latest release: 13 days ago - 1 dependent repository - 2.95 thousand downloads last month - 21 stars on GitHub - 4 maintainers
pydvl 0.9.2
The Python Data Valuation Library
14 versions - Latest release: 12 days ago - 441 downloads last month - 65 stars on GitHub - 2 maintainers
cleanlab 2.6.4
Top 2.1% on pypi.org
The standard package for data-centric AI, machine learning with label errors, and automatically f...
29 versions - Latest release: 12 days ago - 11 dependent packages - 19 dependent repositories - 26.2 thousand downloads last month - 8,808 stars on GitHub - 4 maintainers
pandera 0.19.3 💰
Top 1.4% on pypi.org
A light-weight and flexible data validation and testing tool for statistical data objects.
91 versions - Latest release: 5 days ago - 97 dependent packages - 229 dependent repositories - 2.53 million downloads last month - 3,009 stars on GitHub - 3 maintainers
bunkatopics 0.46.1
Bunkatopics is a Topic Modeling package and Exploration Module
40 versions - Latest release: 5 days ago - 435 downloads last month - 97 stars on GitHub - 1 maintainer
urlgenie 1.0.0
Tool to make URL extraction, generalization, validation, and filtration easy.
1 version - Latest release: about 23 hours ago
Related Keywords
data-science (33)
python (26)
machine-learning (22)
data-quality (16)
data-curation (16)
data-centric-ai (15)
deep-learning (13)
data-analysis (13)
computer-vision (12)
active-learning (11)
data (10)
visualization (10)
data-preprocessing (10)
image-classification (10)
data-validation (9)
object-detection (9)
developer-tools (8)
artificial-intelligence (8)
unstructured-data (8)
data-processing (8)
vector-search (8)
pandas (7)
data-wrangling (7)
data-profiling (7)
outlier-detection (6)
python3 (6)
noisy-labels (6)
nlp (5)
preprocessing (5)
data-cleansing (5)
pyspark (4)
spark (4)
data-exploration (4)
data-visualization (4)
data-pipeline (4)
postgresql (4)
pandas-dataframe (4)
data-labeling (4)
database (3)
feature-engineering (3)
learning (3)
pytorch (3)
pipeline (3)
annotations (3)
dataframe (3)
data-preparation (3)
exploratory-data-analysis (3)
data-engineering (3)
llm (3)
weak-supervision (3)
natural-language-processing (3)
llms (3)
feature-selection (2)
annotation (2)
sql-queries (2)
out-of-distribution-detection (2)
retention-analysis (2)
datacleaner (2)
snowplow (2)
instrumentation-testing (2)
pandas-library (2)
pandas-python (2)
product-analytics (2)
big-data-cleaning (2)
bigdata (2)
cudf (2)
dask (2)
dask-cudf (2)
data-cleaner (2)
data-extraction (2)
data-transformation (2)
labeling (2)
robust-machine-learning (2)
datasets (2)
dataquality (2)
dataops (2)
regex (2)
analytics-sdk (2)
analytics-platform (2)
eda (2)
text-classification (2)
structured-data (2)
etl (2)
dirty-data (2)
data-mining (2)
model-deployment (2)
schema (2)
csv (2)
automl (2)
record-linkage (2)
deduplication (2)
dedupe (2)
cleanlab (2)
game-theory (2)
data-valuation (2)
scraper (2)
datacentric (2)
datacentric_ai (2)
unsupervised_learning (2)
learning_with_noisy_labels (2)