Ecosyste.ms: Packages
An open API service providing package, version and dependency metadata of many open source software ecosystems and registries.
pypi.org "data-engineering" keyword
postql 1.0.3
Python wrapper for Postgres
3 versions - Latest release: 3 months ago - 52 downloads last month - 0 stars on GitHub - 1 maintainer
pnu-adsv 1.0.1
Analyze delimiter-separated values files
2 versions - Latest release: over 1 year ago - 1 dependent package - 14 downloads last month - 0 stars on GitHub - 1 maintainer
coralinede 1.0.1
python library for data engineering
2 versions - Latest release: over 3 years ago - 1 dependent repository - 28 downloads last month - 0 stars on GitHub - 1 maintainer
grizzlys 0.0.1
Python DataFrames powered by Julia
2 versions - Latest release: about 1 month ago - 93 downloads last month - 0 stars on GitHub - 1 maintainer
mario-python 1.7.0
A configurable data pipeline library.
9 versions - Latest release: almost 4 years ago - 1 dependent repository - 61 downloads last month - 0 stars on GitHub - 1 maintainer
compars 0.0.0
DataFrame comparison done right (AKA the Bear-agnostic DataFrame comparison library)
1 version - Latest release: 29 days ago - 156 downloads last month - 0 stars on GitHub - 1 maintainer
parallel-simulations 0.0.1
Helper class to orchestrate in parallel Monte Carlo simulations for an arbitrary number of models...
1 version - Latest release: 2 months ago - 10 downloads last month - 0 stars on GitHub - 1 maintainer
resilient-exporters 0.1.6
A package to export data to databases resiliently.
7 versions - Latest release: almost 3 years ago - 1 dependent repository - 5 downloads last month - 0 stars on GitHub - 1 maintainer
methodflow 0.0.1a1
A lightweight library for building pipelines from methods
1 version - Latest release: about 1 year ago - 10 downloads last month - 0 stars on GitHub - 1 maintainer
siphon-data 0.6.1
A data engineering utility library for siphoning data around
3 versions - Latest release: about 2 years ago - 1 dependent repository - 23 downloads last month - 0 stars on GitHub - 1 maintainer
twinnterface 0.0.1
Machine learning model contracts with machine learning infrastructure
1 version - Latest release: over 1 year ago - 13 downloads last month - 0 stars on GitHub - 1 maintainer
pandas-liteql 0.5.1
A simple pandas extension that enables users to execute SQL statements against DataFrames using
2 versions - Latest release: 7 months ago - 23 downloads last month - 0 stars on GitHub - 1 maintainer
sdrdm_database 0.4.1
An sdRDM-based framework to utilize databases
5 versions - Latest release: 7 months ago - 165 downloads last month - 0 stars on GitHub - 1 maintainer
entropic 0.3.0
Entropic, the simple data pipeline framework for scientists.
5 versions - Latest release: 5 months ago - 5 downloads last month - 0 stars on GitHub - 1 maintainer
pyduct 0.0.1
A framework for building and running simple data engineering pipelines in Python.
1 version - Latest release: about 2 years ago - 1 dependent repository - 12 downloads last month - 0 stars on GitHub - 1 maintainer
aws-json-dataset 0.1.0
Send JSON datasets to various AWS services.
1 version - Latest release: 4 months ago - 17 downloads last month - 0 stars on GitHub - 1 maintainer
lognub 0.2.1
Dumb Log Utility for personal use
6 versions - Latest release: 9 months ago - 1 dependent package - 1 dependent repository - 65 downloads last month - 0 stars on GitHub - 1 maintainer
datasurface 0.0.16
Automate the governance, management and movement of data within your enterprise
11 versions - Latest release: 10 days ago - 155 downloads last month - 0 stars on GitHub - 1 maintainer
db-analytics-tools 0.1.4
Databases Tools for Data Analytics
19 versions - Latest release: 9 months ago - 220 downloads last month - 0 stars on GitHub - 1 maintainer
metastore 1.0.0.dev21
Metastore Python SDK. Feature store and data catalog for machine learning.
21 versions - Latest release: over 2 years ago - 1 dependent repository - 119 downloads last month - 0 stars on GitHub - 1 maintainer
makeflatt 1.0.4
Simple library to make your dictionary flatten
5 versions - Latest release: over 1 year ago - 50 downloads last month - 0 stars on GitHub - 1 maintainer
resp 0.1.2
Make the Redis Mass Insertion by using the REdis Serialization Protocol (RESP) simple.
1 version - Latest release: over 6 years ago - 4 dependent repositories - 220 downloads last month - 0 stars on GitHub - 1 maintainer
ddeutil 0.3.4
Data Developer & Engineer Core Utility Objects
11 versions - Latest release: 7 days ago - 2 dependent packages - 1 dependent repository - 574 downloads last month - 0 stars on GitHub - 1 maintainer
samplr 1.0.0
A simple decorator for returning a small sample of items from a list.
1 version - Latest release: about 1 year ago - 13 downloads last month - 0 stars on GitHub - 1 maintainer
totype 0.1.0
Data converter
1 version - Latest release: over 2 years ago - 1 dependent repository - 15 downloads last month - 0 stars on GitHub - 1 maintainer
ddeutil-io 0.1.4
Data Developer & Engineer IO Utility Objects
8 versions - Latest release: 4 days ago - 346 downloads last month - 0 stars on GitHub - 1 maintainer
dataframemodel 0.1.0 removed
A wrapper of DataFrames to encapsulate and get explicity for your dataframes
1 version - Latest release: over 1 year ago - 0 stars on GitHub
files2db 0.1.1
Migration from local files to database made simple!
2 versions - Latest release: 3 months ago - 16 downloads last month - 0 stars on GitHub - 1 maintainer
aws-parameters 0.1.8
Streamlined, efficient access to configuration values in AWS SSM Parameter Store and SecretsManager.
2 versions - Latest release: 10 months ago - 30 downloads last month - 0 stars on GitHub - 1 maintainer
cacheml 1.0.4
Cache ML -- layer on top of joblib to cache parsed datasets, dramatically reducing load time of l...
1 version - Latest release: over 2 years ago - 8 downloads last month - 1 star on GitHub - 1 maintainer
nifty-nesting 0.2.3
Python utilities for arbitrarily nested data structures.
6 versions - Latest release: over 5 years ago - 1 dependent repository - 72 downloads last month - 1 star on GitHub - 1 maintainer
data-expectations 1.7.0
Are your data meeting all your expectations
10 versions - Latest release: 8 months ago - 1 dependent package - 1 dependent repository - 13.5 thousand downloads last month - 1 star on GitHub - 1 maintainer
glue-utils 0.4.0
Reusable utilities for working with Glue PySpark jobs
20 versions - Latest release: 4 days ago - 2.57 thousand downloads last month - 1 star on GitHub - 1 maintainer
data-hopper 0.1.0
Package for data wrangling in python.
1 version - Latest release: about 2 years ago - 1 dependent repository - 8 downloads last month - 1 star on GitHub - 1 maintainer
dataexpectations 0.0.6
Is your data meeting all your expectations
1 version - Latest release: almost 3 years ago - 1 dependent repository - 10 downloads last month - 1 star on GitHub - 1 maintainer
py-dagger-contrib 0.4.0
Extensions for the Dagger library (py-dagger in PyPI).
4 versions - Latest release: over 2 years ago - 1 dependent repository - 50 downloads last month - 1 star on GitHub - 1 maintainer
cargo-crates 0.0.1
An easy way to build data extractors in Docker.
1 version - Latest release: over 2 years ago - 1 dependent repository - 29 downloads last month - 1 star on GitHub - 1 maintainer
warpflow 0.1.0
Ready for production feature store framework dedicated to open source.
1 version - Latest release: 7 months ago - 12 downloads last month - 1 star on GitHub - 1 maintainer
diver 0.2.3
diver is a series of tools to speed up common feature-set investigation, conditioning and encodin...
20 versions - Latest release: about 4 years ago - 1 dependent repository - 136 downloads last month - 1 star on GitHub - 1 maintainer
data-science-kit 0.0.1
Data Science Basic Functions
1 version - Latest release: almost 3 years ago - 1 dependent repository - 16 downloads last month - 1 star on GitHub - 1 maintainer
alphalib 0.0.3
A library for your daily data engineering and data science routines.
3 versions - Latest release: over 3 years ago - 1 dependent repository - 20 downloads last month - 1 star on GitHub - 1 maintainer
dup-fmt 0.1.3 removed
The utility formatter objects for the data engine package
12 versions - Latest release: 9 months ago - 537 downloads last month - 1 star on GitHub - 1 maintainer
journalpdfscraper 0.2.1
A project to check if articles are free or paid
3 versions - Latest release: about 3 years ago - 1 dependent repository - 35 downloads last month - 1 star on GitHub - 1 maintainer
zapr-athena-client 0.1
It is a python library to run the presto query on the AWS Athena.
1 version - Latest release: about 3 years ago - 1 dependent repository - 5 downloads last month - 1 star on GitHub - 1 maintainer
sbss 0.0.1
Similarity-Based Stratified Splitting Algorithm
1 version - Latest release: 5 months ago - 21 downloads last month - 1 star on GitHub - 1 maintainer
scistag 0.9.0
A stack of helpful libraries & applications for the rapid development of data driven solutions.
8 versions - Latest release: 4 months ago - 1 dependent package - 2 dependent repositories - 75 downloads last month - 2 stars on GitHub - 1 maintainer
convector 0.1.1
A tool for transforming conversational data to a unified format
5 versions - Latest release: 6 months ago - 47 downloads last month - 2 stars on GitHub - 1 maintainer
pyfivetran 0.1.3
Pythonic interface to the fivetran API
7 versions - Latest release: 3 months ago - 836 downloads last month - 2 stars on GitHub - 1 maintainer
spinecore 0.0.20
The core lib of spine library
10 versions - Latest release: 5 months ago - 2 dependent packages - 100 downloads last month - 2 stars on GitHub - 1 maintainer
pyxmlparser 0.1.2
CLI interface to convert XML into various formats
4 versions - Latest release: about 5 years ago - 1 dependent repository - 48 downloads last month - 2 stars on GitHub - 1 maintainer
opendatablend 1.4.2
The fastest way to get data from the Open Data Blend Dataset API
18 versions - Latest release: 5 months ago - 1 dependent repository - 138 downloads last month - 2 stars on GitHub - 1 maintainer
pyspine 0.0.14
Spine: The backbone of your project
3 versions - Latest release: about 1 year ago - 49 downloads last month - 2 stars on GitHub - 1 maintainer
spinelibs 0.0.17
Libs for spine project
7 versions - Latest release: 5 months ago - 1 dependent package - 86 downloads last month - 2 stars on GitHub - 1 maintainer
dpyp 1.0.0
A pandas convenience wrapper for small-scale data pipelines
1 version - Latest release: 27 days ago - 202 downloads last month - 2 stars on GitHub - 1 maintainer
plai 0.0.0
Programming language to create data manipulation pipelines.
2 versions - Latest release: over 4 years ago - 1 dependent repository - 32 downloads last month - 2 stars on GitHub - 1 maintainer
progress-updater 0.1.9
Progress Updater
8 versions - Latest release: over 1 year ago - 68 downloads last month - 2 stars on GitHub - 1 maintainer
aws-parquet 0.5.0
An object-oriented interface for defining parquet datasets for AWS built on top of awswrangler an...
5 versions - Latest release: 11 months ago - 28 downloads last month - 3 stars on GitHub - 1 maintainer
datazimmer 0.5.3
sscu-budapest utilities for scientific data engineering
38 versions - Latest release: 11 months ago - 4 dependent repositories - 1.92 thousand downloads last month - 3 stars on GitHub - 1 maintainer
kioss 0.9.1 removed
Keep I/O Simple and Stupid: Ease the development of ETL/EL/ReverseETL scripts.
61 versions - Latest release: 5 months ago - 2.13 thousand downloads last month - 3 stars on GitHub - 1 maintainer
alyeska 0.3.0a1
Alyeska /al-ee-EHS-kah/ n. A Data Pipeline Toolkit
3 versions - Latest release: over 4 years ago - 1 dependent repository - 46 downloads last month - 3 stars on GitHub - 1 maintainer
tiny-blocks 0.1.15
Tiny Block Operations for Data Pipelines
14 versions - Latest release: over 1 year ago - 113 downloads last month - 3 stars on GitHub - 1 maintainer
livyc 0.0.14 💰
Apache Livy Client
11 versions - Latest release: almost 2 years ago - 120 downloads last month - 3 stars on GitHub - 1 maintainer
tuberia 0.0.1
Tuberia... when data engineering meets software engineering
2 versions - Latest release: over 1 year ago - 1 dependent repository - 8 downloads last month - 3 stars on GitHub - 1 maintainer
pandasecharts 0.4.2
Visualize your pandas data with one-line code
9 versions - Latest release: over 2 years ago - 1 dependent repository - 93 downloads last month - 4 stars on GitHub - 1 maintainer
unblind 0.0.6
Unblind is a Python package to create data visualizations from data of the Plataforma Digital Nac...
6 versions - Latest release: over 1 year ago - 59 downloads last month - 4 stars on GitHub - 2 maintainers
dtflw 0.6.7
dtflw is a Python framework for building modular data pipelines based on Databricks dbutils.noteb...
7 versions - Latest release: 7 months ago - 1.97 thousand downloads last month - 4 stars on GitHub - 2 maintainers
streamsql 2.0.1
Python SDK for the StreamSQL feature store
14 versions - Latest release: almost 4 years ago - 1 dependent repository - 48 downloads last month - 4 stars on GitHub - 1 maintainer
pandas-ext 0.5.1
Python Pandas extensions for pandas dataframes
22 versions - Latest release: about 5 years ago - 1 dependent repository - 57 downloads last month - 4 stars on GitHub - 2 maintainers
iam-builder 4.3.0
A lil python package to generate iam policies
18 versions - Latest release: about 1 month ago - 1 dependent repository - 6.8 thousand downloads last month - 4 stars on GitHub - 6 maintainers
csv-shuffler 0.0.4 💰
A tool to automatically Shuffle lines in a csv file
4 versions - Latest release: almost 2 years ago - 56 downloads last month - 4 stars on GitHub - 1 maintainer
data_check 0.19.0
simple data validation
22 versions - Latest release: 2 months ago - 195 downloads last month - 4 stars on GitHub - 1 maintainer
llmt 0.0.5
LLMT aims to make it easy to programmatically connect OpenAI and HuggingFace models to your data p...
5 versions - Latest release: 29 days ago - 413 downloads last month - 4 stars on GitHub - 1 maintainer
xml2db 0.9.4
Import complex XML files to a relational database
3 versions - Latest release: 20 days ago - 187 downloads last month - 4 stars on GitHub - 1 maintainer
snowflake-dbml-generator 0.1.2
Automatically generate DBML files from Snowflake databases.
3 versions - Latest release: 14 days ago - 362 downloads last month - 5 stars on GitHub - 1 maintainer
route1io-connectors 0.16.0
Connectors for interacting with popular APIs used in marketing analytics using clean and concise...
30 versions - Latest release: 8 months ago - 1 dependent repository - 101 downloads last month - 5 stars on GitHub - 1 maintainer
adcpipeline 0.2.1
A pipeline for a structured way of working
5 versions - Latest release: about 1 year ago - 1 dependent repository - 320 downloads last month - 5 stars on GitHub - 2 maintainers
streamable 0.9.0
fluent iteration
23 versions - Latest release: 2 months ago - 229 downloads last month - 5 stars on GitHub - 1 maintainer
aiscalator 0.1.18
AIscalate your Jupyter Notebook Prototypes into Airflow Data Products
22 versions - Latest release: almost 4 years ago - 277 downloads last month - 5 stars on GitHub - 1 maintainer
mojap-metadata 1.14.2
A python package to manage metadata
28 versions - Latest release: 6 months ago - 3 dependent packages - 4 dependent repositories - 4.45 thousand downloads last month - 6 stars on GitHub - 2 maintainers - Top 5.5% on pypi.org
pydbtools 5.5.18
A python package to query data via amazon athena and bring it into a pandas df using aws-wrangler.
39 versions - Latest release: 6 days ago - 2 dependent packages - 4 dependent repositories - 6.9 thousand downloads last month - 6 stars on GitHub - 6 maintainers - Top 5.0% on pypi.org
codepack 0.8.0
CodePack is the package to easily make, run, and manage workflows
19 versions - Latest release: almost 2 years ago - 1 dependent repository - 220 downloads last month - 6 stars on GitHub - 1 maintainer
dataride 0.2.3
Lightning-fast data platform setup for small/medium projects & PoCs
4 versions - Latest release: over 1 year ago - 38 downloads last month - 6 stars on GitHub - 1 maintainer
data-linter 6.2.5
data linter
30 versions - Latest release: about 2 months ago - 1 dependent repository - 355 downloads last month - 6 stars on GitHub - 5 maintainers
lakehouse-engine 1.19.0
A Spark framework serving as the engine for several lakehouse algorithms and data flows.
8 versions - Latest release: 2 months ago - 289 downloads last month - 6 stars on GitHub - 1 maintainer
arrow_pd_parser 2.0.0
MoJ arrow-pd-parser
24 versions - Latest release: 6 months ago - 1 dependent repository - 4.37 thousand downloads last month - 7 stars on GitHub - 2 maintainers
spooq 3.4.0
Spooq is a PySpark based helper library for ETL data ingestion pipeline in Data Lakes.
11 versions - Latest release: 2 months ago - 1 dependent repository - 19.3 thousand downloads last month - 8 stars on GitHub - 1 maintainer
flowrunner 0.2.3
Flowrunner is a lightweight package to organize and represent Data Engineering/Science workflows
5 versions - Latest release: 12 months ago - 45 downloads last month - 8 stars on GitHub - 1 maintainer
prefecto 1.0.0
Tools for supporting Prefect development.
7 versions - Latest release: about 1 month ago - 1 dependent repository - 1.09 thousand downloads last month - 8 stars on GitHub - 1 maintainer
eos-etl 1.0.0
Tools for exporting EOS blockchain data to JSON
1 version - Latest release: almost 5 years ago - 1 dependent repository - 7 downloads last month - 9 stars on GitHub - 1 maintainer
prefect-planetary-computer 0.1.1
Prefect integrations with Microsoft Planetary Computer
2 versions - Latest release: 7 months ago - 24 downloads last month - 10 stars on GitHub - 1 maintainer
facebook-page-info-scraper 1.1.2
A Python package capable of crawling Facebook page information.
9 versions - Latest release: 4 months ago - 98 downloads last month - 10 stars on GitHub - 1 maintainer
gcp-airflow-foundations 0.3.7
Opinionated framework based on Airflow 2.0 for building pipelines to ingest data into a BigQuery ...
19 versions - Latest release: over 1 year ago - 1 dependent repository - 41 downloads last month - 11 stars on GitHub - 1 maintainer
risk-command-center 1.0.37
Risk Command Center, manage your risk easily.
2 versions - Latest release: about 2 years ago - 1 dependent repository - 10 downloads last month - 11 stars on GitHub - 1 maintainer
gcp-airflow-foundations-dev 0.9.7
Opinionated framework based on Airflow 2.0 for building pipelines to ingest data into a BigQuery ...
148 versions - Latest release: almost 2 years ago - 1 dependent repository - 452 downloads last month - 11 stars on GitHub - 1 maintainer
dpq 0.1.5
dpq is an open-source python library that makes prompt-based data processing and feature engineer...
6 versions - Latest release: about 1 month ago - 102 downloads last month - 11 stars on GitHub - 1 maintainer
analytics-command-center 3.0.14
Command Center for Data Ingestion, Advanced Analytics and Artificial Intelligence process
1 version - Latest release: over 2 years ago - 26 downloads last month - 11 stars on GitHub - 1 maintainer
gcp-airflow-foundations-dev-jiny 0.2.9
Opinionated framework based on Airflow 2.0 for building pipelines to ingest data into a BigQuery ...
1 version - Latest release: over 2 years ago - 1 dependent repository - 10 downloads last month - 11 stars on GitHub - 1 maintainer
prefect-alert 0.1.3 💰
Decorator to send alert when a Prefect task or flow fails
2 versions - Latest release: over 1 year ago - 1 dependent repository - 1.57 thousand downloads last month - 12 stars on GitHub - 1 maintainer
marshmallow-pyspark 0.2.4
PySpark data serializer
6 versions - Latest release: 5 months ago - 1 dependent repository - 2.14 thousand downloads last month - 12 stars on GitHub - 1 maintainer
dask-saturn 0.4.3
Dask Cluster objects in Saturn Cloud
19 versions - Latest release: over 1 year ago - 1 dependent repository - 2.25 thousand downloads last month - 12 stars on GitHub - 2 maintainers
Related Keywords
python (641)
etl (530)
data-integration (443)
elt (425)
data (416)
pipeline (375)
snowflake (341)
data-analysis (338)
data-science (336)
bigquery (295)
data-pipeline (289)
mysql (289)
redshift (288)
postgresql (287)
s3 (284)
data-collection (284)
change-data-capture (283)
java (283)
mssql (281)
self-hosted (278)
data-pipelines (208)
workflow (189)
mlops (184)
orchestration (181)
scheduler (151)
data-orchestrator (150)
analytics (150)
machine-learning (130)
workflow-engine (114)
apache (114)
automation (110)
airflow (102)
dag (96)
apache-airflow (90)
workflow-orchestration (85)
sql (71)
integration (68)
metadata (67)
workflow-automation (65)
database (65)
airflow-provider (64)
dagster (64)
data-warehouse (55)
trino (55)
dataops (53)
data-engineering-pipeline (51)
data-structures (50)
warehouse (49)
data-lineage (49)
data-engineer (49)
data-analytics (40)
data-visualization (35)
pipelines (29)
data-quality (29)
data-viz (29)
flask (29)
business-intelligence (29)
react (29)
apache-superset (29)
sql-editor (29)
asf (29)
bi (29)
business-analytics (29)
superset (29)
data-ops (25)
pandas (25)
prefect (25)
observability (24)
infrastructure (23)
ml-ops (20)
spark (19)
dataframe (17)
dbt (15)
etl-pipeline (14)
aws (14)
hacktoberfest (14)
postgres (13)
dataquality (13)
big-data (13)
feature-engineering (13)
framework (12)
kubernetes (12)
pyspark (11)
python3 (10)
airbyte (10)
feature-store (10)
data-unit-tests (9)
data-lake (9)
etl-framework (9)
data engineering (8)
data-profiling (8)
cdk (8)
connector-development-kit (8)
llmops (8)
batch-processing (8)
data-versioning (8)
ml (7)
quality (7)
cli (7)
parquet (7)