Ecosyste.ms: Packages

An open API service providing package, version and dependency metadata for many open source software ecosystems and registries.

pypi.org "data cleaning" keyword

punctuationstripper 0.0.1
A package to strip punctuation from text via a file or a basic string.
1 version - Latest release: about 1 month ago - 16 downloads last month - 0 stars on GitHub - 1 maintainer
dataanalysistoolkit 1.2.2
The DataAnalysisToolkit project is a Python-based data analysis tool designed to streamline vario...
7 versions - Latest release: 9 days ago - 323 downloads last month - 2 stars on GitHub - 1 maintainer
buildml 1.0.9 💰
Let's make building machine learning models the complex way, easy.
10 versions - Latest release: 4 months ago - 93 downloads last month - 3 stars on GitHub - 1 maintainer
adcl 0.1.7
Data preprocessing and cleaning tools for data science projects
8 versions - Latest release: 16 days ago - 139 downloads last month - 0 stars on GitHub - 1 maintainer
text-prettifier 1.1.2
A Python library for cleaning and preprocessing text data by removing emojis, internet words, spe...
2 versions - Latest release: 15 days ago - 1 maintainer
missing-value 0.2.5
Python class for imputing missing values in data columns using various imputation strategies base...
1 version - Latest release: 2 months ago - 38 downloads last month - 1 maintainer
feature-engineering 2.1.4
Unleash the Power of Your Data with Feature Engineering: The Ultimate Python Library for Machine ...
3 versions - Latest release: about 1 month ago - 118 downloads last month - 0 stars on GitHub - 1 maintainer
wisconsinsc-cleaner 0.0.2
A tool to clean and standardize annotation files from the Wisconsin Sleep Cohort (WSC), distributed by NSRR
2 versions - Latest release: 3 months ago - 16 downloads last month - 0 stars on GitHub - 1 maintainer
adaptivebridge 1.1.0
Revolutionizing ML adaptive modelling for handling missing features and data. The model can predi...
5 versions - Latest release: 4 months ago - 48 downloads last month - 1 star on GitHub - 1 maintainer
winner 0.0.1
A Python package for data-centric MLOps for data cleaning and feature engineering
2 versions - Latest release: 8 months ago - 1 dependent repository - 17 downloads last month - 1 maintainer
featurebridge 0.9.5 removed
FeatureBridge: Revolutionizing ML adaptive modelling for handling missing features and data. The ...
3 versions - Latest release: 8 months ago - 191 downloads last month - 0 stars on GitHub - 1 maintainer
limpo 0.0.3
Cleaning Pandas dataframes
3 versions - Latest release: 12 months ago - 30 downloads last month - 1 maintainer
cleanmydata 1.0.6
Library for data cleaning operations
6 versions - Latest release: over 1 year ago - 36 downloads last month - 0 stars on GitHub - 1 maintainer
toolstack 0.1.5
A collection of useful tools to speed up data processing, cleaning and pipelining.
3 versions - Latest release: over 3 years ago - 1 dependent repository - 34 downloads last month - 0 stars on GitHub - 1 maintainer
outlier-adityavashista-101703039 0.1
It is an outlier detection and removal cmd program that detects outliers in a given file and stor...
1 version - Latest release: almost 4 years ago - 1 dependent repository - 4 downloads last month - 0 stars on GitHub - 1 maintainer
openclean-notebook 0.1.7
openclean Notebook UI Package
8 versions - Latest release: almost 3 years ago - 1 dependent repository - 46 downloads last month - 1 maintainer
openclean 0.2.1
Library for data cleaning and data profiling
3 versions - Latest release: almost 3 years ago - 1 dependent repository - 507 downloads last month - 61 stars on GitHub - 1 maintainer
neatmartinet 0.2.6
Ad-hoc pandas dataframe and string cleaning functions implemented in unreadable Python.
3 versions - Latest release: over 6 years ago - 1 dependent repository - 14 downloads last month - 0 stars on GitHub - 1 maintainer
neatdata 0.18
Cleaning code
17 versions - Latest release: over 6 years ago - 1 dependent repository - 50 downloads last month - 0 stars on GitHub - 1 maintainer
mprows 0.1.5
Multiprocessing on row data using user-defined functions
7 versions - Latest release: over 5 years ago - 1 dependent repository - 29 downloads last month - 2 stars on GitHub - 1 maintainer
maddress 1.0.0a0
Testing installation of Package
1 version - Latest release: almost 3 years ago - 1 dependent repository - 9 downloads last month - 1 star on GitHub - 1 maintainer
Top 6.7% on pypi.org
handyspark 0.2.2a1
HandySpark - bringing pandas-like capabilities to Spark dataframes
7 versions - Latest release: almost 5 years ago - 3 dependent repositories - 156 thousand downloads last month - 183 stars on GitHub - 1 maintainer
duplicatesuricate 0.4.2
Entity resolution algorithm implemented with scikit-learn
8 versions - Latest release: over 6 years ago - 1 dependent repository - 11 downloads last month - 1 maintainer
datawiz 0.90
DataWiz helps new data learners, hobbyists and industry practitioners write Machine Learning code ...
11 versions - Latest release: almost 5 years ago - 1 dependent repository - 33 downloads last month - 1 maintainer
btext 1.0
Bear Au Jus Text (btext) is a tool used for processing a text/string, optimized for data science ...
1 version - Latest release: over 3 years ago - 19 downloads last month - 2 stars on GitHub - 1 maintainer
badfish 0.1.2
Badfish - A missing data analysis and wrangling library in Python
4 versions - Latest release: over 7 years ago - 1 dependent repository - 53 downloads last month - 19 stars on GitHub - 1 maintainer
dhcdatacleaner 0.1.0
DHC Python tool that automatically cleans data sets and readies them for analysis.
1 version - Latest release: about 6 years ago - 1 dependent repository - 8 downloads last month - 1 maintainer
pythonml 1.02
This hands-on Library is practical for AI and ML which facilitates viewing Dataset Integrity Repo...
3 versions - Latest release: over 3 years ago - 1 dependent repository - 28 downloads last month - 1 star on GitHub - 1 maintainer
refineryframe 0.2.2
Cleans data, best to be used as a part of initial preprocessor
41 versions - Latest release: 9 months ago - 1 dependent repository - 222 downloads last month - 0 stars on GitHub - 1 maintainer
mlwizard 1.0.1 removed
Let's make building machine learning models the complex way, easy.
1 version - Latest release: 4 months ago
hammeroflight 2.1.2 removed
This hands-on Library is practical for AI and ML which facilitates viewing Dataset Integrity Repo...
42 versions - Latest release: almost 4 years ago - 1 dependent repository - 18 downloads last month - 1 star on GitHub - 1 maintainer
parallelfileconcatenator 0.1 removed
ParallelFileConcatenator is a robust tool designed to efficiently combine data files of various f...
1 version - Latest release: 9 months ago
Related Keywords
machine learning (12)
data science (11)
python (9)
data analysis (9)
data preprocessing (7)
pandas (6)
classification (6)
regression (5)
data exploration (5)
feature engineering (4)
imputation (4)
statistics (4)
data wrangling (4)
predictive modeling (3)
missing data (3)
data (3)
scikit-learn (3)
data visualization (3)
text cleaning (3)
data-science (3)
missing data analysis (2)
data quality (2)
data imputation (2)
AI (2)
artificial intelligence (2)
accuracy scores (2)
goodness of fit (2)
dataset quality report (2)
analytics (2)
data manipulation (2)
data processing (2)
python library (2)
sklearn (2)
ML algo comparison (2)
data handling (2)
data integrity (2)
data cleansing (2)
data validation (2)
data completeness (2)
impute missing values (2)
data missingness (2)
missing data detection (2)
data quality assessment (2)
data pre-processing tool (2)
data engineering (2)
data preparation (2)
cleaning (2)
big data (2)
data profiling (2)
ML framework (2)
text normalization (2)
supervised learning (2)
machine learning toolkit (2)
machine-learning (2)
NLP (2)
string manipulation (2)
entity matching (1)
entity resolution (1)
record linkage (1)
Data Engineering (1)
python-script (1)
btext (1)
bearaujus (1)
bearaujus btext (1)
basic string (1)
text procesing (1)
file processing (1)
pyspark (1)
outlier-detection (1)
exploratory-data-analysis (1)
exploratory data analysis (1)
visualization (1)
python3 (1)
spark (1)
geolocation (1)
address (1)
phone number (1)
email (1)
maddress (1)
pypi (1)
numpy-arrays (1)
file management (1)
data compression (1)
data deduplication (1)
data merging (1)
data aggregation (1)
parallel file concatenation (1)
safeguards (1)
punctuation (1)
CSV (1)
matplotlib (1)
csv (1)
missing (1)
data science in pandas (1)
data cleaning in pandas (1)
data analysis in pandas (1)
string lemming in pandas (1)
string normalization in pandas (1)
string cleaning in pandas (1)
string procesing in pandas (1)