An open API service providing package, version and dependency metadata of many open source software ecosystems and registries.

pypi.org "data-quality" keyword

View the packages on the pypi.org package registry that are tagged with the "data-quality" keyword.

truthound-orchestration 1.0.2
Official orchestration integrations for Truthound data quality framework
3 versions - Latest release: about 2 hours ago - 196 downloads last month - 1 maintainer
Top 1.0% on pypi.org
ydata-profiling 4.18.0
Generate profile report for pandas DataFrame
33 versions - Latest release: about 2 months ago - 43 dependent packages - 79 dependent repositories - 1.21 million downloads last month - 13,317 stars on GitHub - 1 maintainer
Top 6.0% on pypi.org
cleanvision 0.3.7
Find issues in image datasets
13 versions - Latest release: 1 day ago - 3 dependent packages - 2 dependent repositories - 4.85 thousand downloads last month - 1,129 stars on GitHub - 6 maintainers
validibot-cli 0.1.5
Command-line interface for Validibot - your automated data validation assistant.
5 versions - Latest release: about 8 hours ago - 391 downloads last month - 1 maintainer
Top 7.5% on pypi.org
feathr 1.0.0
An Enterprise-Grade, High Performance Feature Store
22 versions - Latest release: almost 3 years ago - 1 dependent repository - 493 downloads last month - 1,908 stars on GitHub - 1 maintainer
Top 1.8% on pypi.org
fiftyone 1.11.0
FiftyOne: the open-source tool for building high-quality datasets and computer vision models
169 versions - Latest release: about 1 month ago - 12 dependent packages - 64 dependent repositories - 91.9 thousand downloads last month - 10,155 stars on GitHub - 4 maintainers
data-degradation-detector 1.0.6
A part of my TFM/Research project handles data drift
7 versions - Latest release: 6 months ago - 30 downloads last month - 0 stars on GitHub - 1 maintainer
Top 0.5% on pypi.org
pandas-profiling 3.6.6
Deprecated 'pandas-profiling' package, use 'ydata-profiling' instead
40 versions - Latest release: almost 3 years ago - 46 dependent packages - 1,970 dependent repositories - 221 thousand downloads last month - 12,108 stars on GitHub - 4 maintainers
Top 1.4% on pypi.org
evidently 0.7.19
Open-source tools to analyze, monitor, and debug machine learning models in production.
150 versions - Latest release: 1 day ago - 8 dependent packages - 340 dependent repositories - 1.85 million downloads last month - 6,922 stars on GitHub - 2 maintainers
odgs 1.2.1
The Open Data Governance Schema (ODGS) - A vendor-neutral standard for business definitions.
3 versions - Latest release: 21 days ago - 206 downloads last month - 1 maintainer
Top 1.0% on pypi.org
feast 0.58.0
Python SDK for Feast
143 versions - Latest release: 22 days ago - 13 dependent packages - 140 dependent repositories - 547 thousand downloads last month - 5,313 stars on GitHub - 5 maintainers
Top 0.7% on pypi.org
great-expectations 1.10.0
Always know what to expect from your data.
343 versions - Latest release: 19 days ago - 58 dependent packages - 284 dependent repositories - 22.4 million downloads last month - 9,420 stars on GitHub - 8 maintainers
armor-cli 0.1.0
AnomalyArmor SDK and CLI for data observability
1 version - Latest release: about 1 month ago - 31 downloads last month - 1 maintainer
sql-guard 0.0.3
A small package for data quality rules using SQL
3 versions - Latest release: 7 months ago - 21 downloads last month - 5 stars on GitHub - 1 maintainer
truthound 1.0.7
Zero-Configuration Data Quality Framework Powered by Polars
8 versions - Latest release: 1 day ago - 389 downloads last month - 0 stars on GitHub - 1 maintainer
truthound-dashboard 1.0.0
Open-source data quality dashboard - GX Cloud alternative
1 version - Latest release: 1 day ago - 1 maintainer
feathub 0.1.0
A stream-batch unified feature store for real-time machine learning
2 versions - Latest release: over 2 years ago - 7 downloads last month - 338 stars on GitHub - 1 maintainer
airflow-provider-great-expectations-cta 0.2.4
An Apache Airflow provider for Great Expectations
4 versions - Latest release: about 3 years ago - 40 downloads last month - 169 stars on GitHub - 1 maintainer
datacheck-cli 0.1.2
Lightweight data quality validation CLI tool
2 versions - Latest release: 2 days ago - 117 downloads last month - 1 maintainer
frameon 0.1.2
Frameon extends pandas DataFrame with analysis methods while keeping all original functionality i...
4 versions - Latest release: 5 months ago - 42 downloads last month - 2 stars on GitHub - 1 maintainer
mlready 0.1.0
ML readiness auditor for tabular data with safe normalization and reproducible cleaning recipes
1 version - Latest release: 2 days ago - 1 maintainer
sql-testing-library 0.22.0
SQL Testing Framework for Python: Unit test SQL queries with mock data injection for BigQuery, Sn...
21 versions - Latest release: 2 days ago - 12.6 thousand downloads last month - 39 stars on GitHub - 1 maintainer
rdatacompy 0.1.10
Lightning-fast dataframe comparison library built in Rust with Python bindings
5 versions - Latest release: 2 months ago - 254 downloads last month - 0 stars on GitHub - 1 maintainer
piperider-cli 0.1.3.12
PiperRider CLI
9 versions - Latest release: over 3 years ago - 1 dependent repository - 26 downloads last month - 492 stars on GitHub - 1 maintainer
pydqkit 0.0.1
A developer-first Python toolkit for data quality profiling, validation, and interactive HTML rep...
1 version - Latest release: 3 days ago - 1 maintainer
daffy 2.4.0
Function decorators for DataFrame validation - columns, data types, and row-level validation with...
48 versions - Latest release: 4 days ago - 1 dependent repository - 2.19 thousand downloads last month - 13 stars on GitHub - 1 maintainer
diqu 0.2.0
Data Quality CLI for the Auto-Alerts
12 versions - Latest release: over 1 year ago - 1 dependent package - 65 downloads last month - 18 stars on GitHub - 1 maintainer
diqu-email 1.0.0
Data Quality CLI for the Auto-Alerts - Emails
3 versions - Latest release: over 1 year ago - 20 downloads last month - 1 maintainer
Top 7.0% on pypi.org
tangled-up-in-unicode 0.2.0
Access to the Unicode Character Database (UCD)
9 versions - Latest release: over 4 years ago - 12 dependent packages - 722 dependent repositories - 677 thousand downloads last month - 3 stars on GitHub - 1 maintainer
hooqu 0.1.0
Data unit testing for your Python DataFrames
1 version - Latest release: over 5 years ago - 1 dependent repository - 245 downloads last month - 29 stars on GitHub - 1 maintainer
edexplore 1.0.1
A simple widget for interactive EDA / QA for those who use Pandas in Jupyter Notebook.
1 version - Latest release: over 1 year ago - 10 downloads last month - 0 stars on GitHub - 1 maintainer
autocsv-profiler 2.0.0
Automated CSV data analysis with statistical profiling and visualization
2 versions - Latest release: 3 months ago - 22 downloads last month - 0 stars on GitHub - 1 maintainer
Top 9.9% on pypi.org
soda-spark 0.3.3
Soda SQL API for PySpark data frame
11 versions - Latest release: over 3 years ago - 1 dependent package - 1 dependent repository - 6.22 thousand downloads last month - 64 stars on GitHub - 1 maintainer
weiser-ai 0.2.2
Enterprise-grade data quality framework with YAML configuration, LLM-friendly design, and advance...
19 versions - Latest release: 5 months ago - 118 downloads last month - 0 stars on GitHub - 1 maintainer
data-watchtower 0.0.5
Data quality inspection tool. Identify issues before your CTO detects them!
5 versions - Latest release: over 1 year ago - 39 downloads last month - 0 stars on GitHub - 1 maintainer
piperider-nightly 0.42.0.20250102
PiperRider CLI
706 versions - Latest release: about 1 year ago - 2.85 thousand downloads last month - 479 stars on GitHub - 1 maintainer
training-data-debugger 0.1.0
Find and fix issues in your ML training data - duplicates, label errors, outliers, and more
1 version - Latest release: 6 days ago - 1 maintainer
contessa 0.2.12
Data-quality framework
14 versions - Latest release: over 4 years ago - 1 dependent repository - 53 downloads last month - 18 stars on GitHub - 2 maintainers
cuallee 0.15.4
Python library for data validation on DataFrame APIs including Snowflake/Snowpark, Apache/PySpark...
96 versions - Latest release: 3 months ago - 1 dependent package - 1 dependent repository - 76.8 thousand downloads last month - 227 stars on GitHub - 2 maintainers
yololint 1.1.2
YOLO Dataset Debugger (yololint) is a tool for automatic validation and diagnostics of YOLO-forma...
11 versions - Latest release: 8 months ago - 17 downloads last month - 1 star on GitHub - 1 maintainer
pycaroline 0.1.0
Data validation library for comparing tables across cloud data warehouses
1 version - Latest release: 6 days ago - 1 maintainer
panda-patrol 0.0.102
🐼 Patrol your data tests
40 versions - Latest release: about 2 years ago - 1 dependent repository - 165 downloads last month - 22 stars on GitHub - 1 maintainer
dql-core 0.5.2
Framework-agnostic validation engine for Data Quality Language (DQL)
2 versions - Latest release: 3 months ago - 11 downloads last month - 1 maintainer
urarovite 1.3.4
A Google Sheets validation library
28 versions - Latest release: 5 months ago - 87 downloads last month - 1 maintainer
Top 2.2% on pypi.org
fiftyone-db 1.4.1
FiftyOne DB
32 versions - Latest release: 8 days ago - 1 dependent package - 36 dependent repositories - 98.8 thousand downloads last month - 9,898 stars on GitHub - 4 maintainers
sparktestify 0.1.0
PySpark Data Pipeline Testing Framework
1 version - Latest release: 9 months ago - 9 downloads last month - 0 stars on GitHub - 1 maintainer
hdf-dq-framework 1.0.2
HDF Data Quality Framework for PySpark DataFrames using Great Expectations
8 versions - Latest release: 16 days ago - 494 downloads last month - 1 maintainer
iflow-mcp_csv-editor 1.0.1
MCP server for comprehensive CSV file operations with pandas-based tools
1 version - Latest release: about 1 month ago - 18 downloads last month - 1 maintainer
csv-mcp-server 1.0.0
MCP server for comprehensive CSV file operations with pandas-based tools
1 version - Latest release: 5 months ago - 23 downloads last month - 6 stars on GitHub - 1 maintainer
metronome-pulse-core 0.1.0
Core interfaces and abstractions for DataPulse connectors - a high-performance, async-first data ...
1 version - Latest release: 4 months ago - 16 downloads last month - 1 maintainer
metronome-pulse-postgres 0.1.0
High-performance PostgreSQL connector for DataPulse - async-first, connection pooling, and enterp...
1 version - Latest release: 4 months ago - 5 downloads last month - 1 maintainer
cleanengine 0.1.2
The Ultimate Data Cleaning & Analysis Toolkit
2 versions - Latest release: 5 months ago - 12 downloads last month - 1 maintainer
qprofiler 0.3.0
profile tabular datasets, manage automatic validation for new datasets, automatic handling for qu...
6 versions - Latest release: over 2 years ago - 21 downloads last month - 0 stars on GitHub - 1 maintainer
engineer-your-data 0.1.3
MCP server for data engineering and business intelligence operations
4 versions - Latest release: 3 months ago - 40 downloads last month - 0 stars on GitHub - 1 maintainer
leila 0.2
Library for measuring data quality in structured datasets
2 versions - Latest release: about 4 years ago - 2 dependent repositories - 63 downloads last month - 61 stars on GitHub - 1 maintainer
raptor-rnaseq 2.1.2
RNA-seq Analysis Pipeline Testing and Optimization Resource with ML-powered recommendations and a...
3 versions - Latest release: 8 days ago - 199 downloads last month - 1 maintainer
datasure 0.6.0
IPA Data Management System Dashboard
43 versions - Latest release: 2 months ago - 218 downloads last month - 1 maintainer
pydhis2 0.2.1
Reproducible DHIS2 Python SDK for LMIC scenarios
6 versions - Latest release: 3 months ago - 15 downloads last month - 12 stars on GitHub - 1 maintainer
unidq 0.2.0
Unified transformer for multi-task tabular data quality
7 versions - Latest release: 8 days ago - 448 downloads last month - 1 maintainer
Top 2.5% on pypi.org
pandas-summary 0.2.0
An extension to pandas describe function.
7 versions - Latest release: about 4 years ago - 4 dependent packages - 137 dependent repositories - 110 thousand downloads last month - 520 stars on GitHub - 2 maintainers
lavendertown 0.7.1
A Streamlit-first Python package for detecting and visualizing data quality issues
8 versions - Latest release: 8 days ago - 1 maintainer
dingo-client 1.3.1
A Comprehensive Data Quality Evaluation Tool for Large Models
1 version - Latest release: 11 months ago - 6 downloads last month - 353 stars on GitHub - 1 maintainer
haiqv-profiling 0.0.1
Generate profile report for pandas DataFrame
1 version - Latest release: about 5 years ago - 1 dependent repository - 1.21 thousand downloads last month - 12,108 stars on GitHub - 1 maintainer
pyetlx 0.1.1
Python bindings for the etlx Go library
3 versions - Latest release: 12 months ago - 27 downloads last month - 20 stars on GitHub - 1 maintainer
langquality 1.0.1
Language Quality Toolkit for Low-Resource Languages
2 versions - Latest release: about 1 month ago - 22 downloads last month - 1 star on GitHub
feast-doris 0.1.2
Python SDK for Feast
3 versions - Latest release: over 1 year ago - 12 downloads last month - 6,493 stars on GitHub - 1 maintainer
streamdaq 0.5.0
Plug-and-play real-time quality monitoring for data streams!
16 versions - Latest release: about 1 month ago - 85 downloads last month - 15 stars on GitHub - 1 maintainer
marshmallow-pyspark 0.2.4
PySpark data serializer
6 versions - Latest release: about 2 years ago - 1 dependent repository - 1.78 thousand downloads last month - 12 stars on GitHub - 1 maintainer
Top 3.3% on pypi.org
traceml 1.14.2
Engine for ML/Data tracking, visualization, dashboards, and model UI for Polyaxon.
96 versions - Latest release: almost 4 years ago - 3 dependent packages - 8 dependent repositories - 117 thousand downloads last month - 519 stars on GitHub - 2 maintainers
baselinr 0.9.0
Modern data profiling and drift detection framework
11 versions - Latest release: 10 days ago - 560 downloads last month - 1 maintainer
invisible-unicorn 0.4.0
Scalable Data Preprocessing Tool for Training Large Language Models
1 version - Latest release: over 1 year ago - 8 downloads last month - 1,111 stars on GitHub - 1 maintainer
validatelite 0.5.0
A flexible, extensible command-line tool for automated data quality validation
5 versions - Latest release: 4 months ago - 30 downloads last month - 3 stars on GitHub - 1 maintainer
cleanlab-cli 0.1.14
Command line interface for all things Cleanlab Studio
16 versions - Latest release: about 3 years ago - 49 downloads last month - 21 stars on GitHub - 3 maintainers
pydvl 0.10.0
The Python Data Valuation Library
15 versions - Latest release: 9 months ago - 282 downloads last month - 137 stars on GitHub - 2 maintainers
compars 0.0.0
DataFrame comparison done right (AKA the Bear-agnostic DataFrame comparison library)
1 version - Latest release: over 1 year ago - 10 downloads last month - 0 stars on GitHub - 1 maintainer
feathub-nightly 0.2.dev20231231
A stream-batch unified feature store for real-time machine learning
388 versions - Latest release: about 2 years ago - 2.4 thousand downloads last month - 340 stars on GitHub - 1 maintainer
dataexpectations 0.0.6
Is your data meeting all your expectations
1 version - Latest release: over 4 years ago - 1 dependent repository - 21 downloads last month - 1 star on GitHub - 1 maintainer
Top 8.8% on pypi.org
openmetadata-airflow-managed-apis 0.10.1
Airflow REST APIs to create and manage DAGS
31 versions - Latest release: over 3 years ago - 1 dependent repository - 330 downloads last month - 3,365 stars on GitHub - 1 maintainer
koality 0.4.1
Library for data checks and data quality monitoring based on duckdb.
5 versions - Latest release: 14 days ago - 489 downloads last month - 1 maintainer
dingo-python 2.0.0
A Comprehensive AI Data Quality Evaluation Tool for Large Models
24 versions - Latest release: 13 days ago - 990 downloads last month - 597 stars on GitHub - 2 maintainers
thetis 0.2.4
Solution for AI system analysis regarding performance, uncertainty consistency (calibration), fai...
9 versions - Latest release: 4 months ago - 35 downloads last month - 5 stars on GitHub - 1 maintainer
thetiscore 0.2.4
Service to examine data processing pipelines (e.g., machine learning or deep learning pipelines) ...
10 versions - Latest release: 4 months ago - 1 dependent package - 110 downloads last month - 5 stars on GitHub - 1 maintainer
fiftyone-db-ubuntu2204 0.4.0
FiftyOne DB
1 version - Latest release: over 2 years ago - 2.94 thousand downloads last month - 10,038 stars on GitHub - 1 maintainer
sliq 0.1.1
Sliq - Data cleaning, made effortless. An AI-powered data cleaning library to clean your datasets...
2 versions - Latest release: 13 days ago - 180 downloads last month - 1 maintainer
Top 3.2% on pypi.org
lakefs-client 1.44.0
[legacy] lakeFS API
178 versions - Latest release: about 1 year ago - 4 dependent packages - 5 dependent repositories - 1.28 million downloads last month - 5,050 stars on GitHub - 1 maintainer
lakefs_sdk_async 1.53.0
lakeFS API
1 version - Latest release: 5 months ago - 139 downloads last month - 5,050 stars on GitHub - 1 maintainer
lakefs 0.14.1
lakeFS Python SDK Wrapper
30 versions - Latest release: about 1 month ago - 1 dependent package - 4.28 million downloads last month - 4,999 stars on GitHub - 1 maintainer
Top 6.3% on pypi.org
lakefs-sdk 1.74.4
lakeFS API
102 versions - Latest release: 14 days ago - 3 dependent packages - 1 dependent repository - 5.58 million downloads last month - 4,999 stars on GitHub - 1 maintainer
data-automation-kit 2.0.7
A comprehensive Python package for automated data loading, cleaning, visualization, and quality c...
8 versions - Latest release: about 2 months ago - 191 downloads last month - 1 maintainer
Top 9.4% on pypi.org
openmetadata-managed-apis 1.11.4.0
Airflow REST APIs to create and manage DAGS
344 versions - Latest release: 14 days ago - 18 thousand downloads last month - 5,446 stars on GitHub - 1 maintainer
matilda-cli 0.1.0
MATILDA - TGD Rule Discovery Algorithm CLI
1 version - Latest release: about 2 months ago - 19 downloads last month - 0 stars on GitHub - 1 maintainer
yirifi-dq 1.0.1
Terminal-based CLI/TUI for managing MongoDB data quality operations
2 versions - Latest release: about 2 months ago - 16 downloads last month - 1 maintainer
Top 7.2% on pypi.org
fiftyone-desktop 0.34.1
FiftyOne Desktop
62 versions - Latest release: over 1 year ago - 1 dependent package - 1 dependent repository - 526 downloads last month - 10,155 stars on GitHub - 4 maintainers
locaria-integrated-testing 1.2.2
A lightweight, automated testing system for data pipelines and tools
33 versions - Latest release: 15 days ago - 2.68 thousand downloads last month - 1 maintainer
fiftyone-db-rhel7 0.4.4
FiftyOne DB
4 versions - Latest release: 6 months ago - 1 dependent repository - 72 downloads last month - 10,038 stars on GitHub - 1 maintainer
great-expectations-cta 0.15.43
Always know what to expect from your data.
2 versions - Latest release: about 3 years ago - 1 dependent package - 34 downloads last month - 9,420 stars on GitHub - 1 maintainer
fiftyone-db-ubuntu1604 0.3.0
FiftyOne DB
5 versions - Latest release: almost 5 years ago - 1 dependent repository - 36 downloads last month - 9,898 stars on GitHub - 2 maintainers
adri 5.1.0
The missing data layer for AI agents - Auto-validates data quality with one decorator. Works with...
15 versions - Latest release: 3 months ago - 109 downloads last month - 1 star on GitHub - 1 maintainer
openmetadata-sqlalchemy-bigquery 1.2.0
SQLAlchemy dialect for BigQuery by OpenMetadata
4 versions - Latest release: about 4 years ago - 1 dependent package - 1 dependent repository - 14 downloads last month - 4,168 stars on GitHub - 1 maintainer
ml-drift-detection 0.1.1
Streamlit dashboard for monitoring data drift and model metrics.
2 versions - Latest release: 6 months ago - 11 downloads last month - 0 stars on GitHub - 1 maintainer
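
The entries above follow a regular layout: a name-and-version line, a one-line description, and a statistics line whose fields are separated by " - ". For readers who want to work with the listing programmatically, here is a minimal sketch, assuming that layout, of turning one statistics line into a dictionary. The function name `parse_stats` and the dictionary keys are hypothetical choices for illustration; they are not part of any API of the service or of PyPI.

```python
import re

# Multipliers for the spelled-out magnitudes used in the listing.
_SCALE = {"thousand": 1_000, "million": 1_000_000}

# Hypothetical output keys mapped to the label patterns seen in the listing.
_LABELS = [
    ("versions", r"versions?"),
    ("dependent_packages", r"dependent packages?"),
    ("dependent_repositories", r"dependent repositor(?:y|ies)"),
    ("downloads_last_month", r"downloads? last month"),
    ("github_stars", r"stars? on GitHub"),
    ("maintainers", r"maintainers?"),
]

# A field looks like "<number> [thousand|million] <label>".
_FIELD = re.compile(r"([\d.,]+)(?: (thousand|million))? (.+)")


def parse_stats(line):
    """Parse one statistics line of the listing into a dict (illustrative)."""
    stats = {}
    for field in line.split(" - "):
        field = field.strip()
        if field.startswith("Latest release:"):
            # Relative dates are kept as text; they cannot be made absolute
            # without knowing when the page was captured.
            stats["latest_release"] = field.split(":", 1)[1].strip()
            continue
        match = _FIELD.fullmatch(field)
        if not match:
            continue  # Unknown field shape: skip rather than guess.
        number, scale, label = match.groups()
        value = round(float(number.replace(",", "")) * _SCALE.get(scale, 1))
        for key, pattern in _LABELS:
            if re.fullmatch(pattern, label):
                stats[key] = value
                break
    return stats


if __name__ == "__main__":
    example = (
        "33 versions - Latest release: about 2 months ago - "
        "43 dependent packages - 79 dependent repositories - "
        "1.21 million downloads last month - 13,317 stars on GitHub - "
        "1 maintainer"
    )
    print(parse_stats(example))
```

Counts given as "thousand" or "million" are rounded on the page, so the parsed values are approximations, not exact download figures.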
Related Keywords
data-science (71), python (52), machine-learning (50), data-engineering (47), data-profiling (46), data-validation (36), validation (32), data (29), data-cleaning (28), pandas (26), dataquality (26), data-quality-checks (26), mlops (22), snowflake (22), deep-learning (21), data-observability (20), etl (19), data-analysis (19), dbt (18), eda (16), exploratory-data-analysis (16), data-centric-ai (16), spark (16), data-curation (16), visualization (15), data-pipeline (15), computer-vision (14), sql (13), ai (13), active-learning (13), image-classification (12), artificial-intelligence (12), data-governance (12), data-testing (12), database (12), data-exploration (12), developer-tools (11), object-detection (11), llm (11), pyspark (11), testing (11), data-quality-monitoring (11), hacktoberfest (11), analytics (10), statistics (10), quality (10), data-unit-tests (10), data-visualization (10), data-lineage (10), dataengineering (10), outlier-detection (10), data-reliability (10), cli (10), vector-search (9), unstructured-data (9), postgresql (9), bigquery (9), data-version-control (8), redshift (8), feature-store (8), postgres (8), duckdb (8), metadata (8), dataops (8), polars (7), pipeline (7), data-diffing (7), ml (7), data-catalog (7), data-profilers (7), automation (7), dataunittest (7), monitoring (7), data-discovery (7), apache-spark (7), datavalidation (6), object-storage (6), dataframe (6), data-quality-assessment (6), mcp (6), mysql (6), data-monitoring (6), big-data (6), dbt-metrics (6), metadata-management (6), feature-engineering (6), datadiscovery (6), data-contracts (6), data-drift (6), csv (6), anomaly-detection (6), dataframes (5), data-collaboration (5), data quality (5), trino (5), pipeline-tests (5), data-versioning (5), exploratory-analysis (5), datacleaning (5), datacleaner (5)