Ecosyste.ms: Packages

An open API service providing package, version and dependency metadata for many open source software ecosystems and registries.

pypi.org "data-engineering" keyword

postql 1.0.3
Python wrapper for Postgres
3 versions - Latest release: 3 months ago - 52 downloads last month - 0 stars on GitHub - 1 maintainer
pnu-adsv 1.0.1
Analyze delimiter-separated values files
2 versions - Latest release: over 1 year ago - 1 dependent package - 14 downloads last month - 0 stars on GitHub - 1 maintainer
coralinede 1.0.1
python library for data engineering
2 versions - Latest release: over 3 years ago - 1 dependent repository - 28 downloads last month - 0 stars on GitHub - 1 maintainer
grizzlys 0.0.1
Python DataFrames powered by Julia
2 versions - Latest release: about 1 month ago - 93 downloads last month - 0 stars on GitHub - 1 maintainer
mario-python 1.7.0
A configurable data pipeline library.
9 versions - Latest release: almost 4 years ago - 1 dependent repository - 61 downloads last month - 0 stars on GitHub - 1 maintainer
compars 0.0.0
DataFrame comparison done right (AKA the Bear-agnostic DataFrame comparison library)
1 version - Latest release: 29 days ago - 156 downloads last month - 0 stars on GitHub - 1 maintainer
parallel-simulations 0.0.1
Helper class to orchestrate in parallel Monte Carlo simulations for an arbitrary number of models...
1 version - Latest release: 2 months ago - 10 downloads last month - 0 stars on GitHub - 1 maintainer
resilient-exporters 0.1.6
A package to export data to databases resiliently.
7 versions - Latest release: almost 3 years ago - 1 dependent repository - 5 downloads last month - 0 stars on GitHub - 1 maintainer
methodflow 0.0.1a1
A lightweight library for building pipelines from methods
1 version - Latest release: about 1 year ago - 10 downloads last month - 0 stars on GitHub - 1 maintainer
siphon-data 0.6.1
A data engineering utility library for siphoning data around
3 versions - Latest release: about 2 years ago - 1 dependent repository - 23 downloads last month - 0 stars on GitHub - 1 maintainer
twinnterface 0.0.1
Machine learning model contracts with machine learning infrastructure
1 version - Latest release: over 1 year ago - 13 downloads last month - 0 stars on GitHub - 1 maintainer
pandas-liteql 0.5.1
A simple pandas extension that enables users to execute SQL statements against DataFrames using
2 versions - Latest release: 7 months ago - 23 downloads last month - 0 stars on GitHub - 1 maintainer
sdrdm_database 0.4.1
An sdRDM-based framework to utilize databases
5 versions - Latest release: 7 months ago - 165 downloads last month - 0 stars on GitHub - 1 maintainer
entropic 0.3.0
Entropic, the simple data pipeline framework for scientists.
5 versions - Latest release: 5 months ago - 5 downloads last month - 0 stars on GitHub - 1 maintainer
pyduct 0.0.1
A framework for building and running simple data engineering pipelines in Python.
1 version - Latest release: about 2 years ago - 1 dependent repository - 12 downloads last month - 0 stars on GitHub - 1 maintainer
aws-json-dataset 0.1.0
Send JSON datasets to various AWS services.
1 version - Latest release: 4 months ago - 17 downloads last month - 0 stars on GitHub - 1 maintainer
lognub 0.2.1
Dumb Log Utility for personal use
6 versions - Latest release: 9 months ago - 1 dependent package - 1 dependent repository - 65 downloads last month - 0 stars on GitHub - 1 maintainer
datasurface 0.0.16
Automate the governance, management and movement of data within your enterprise
11 versions - Latest release: 10 days ago - 155 downloads last month - 0 stars on GitHub - 1 maintainer
db-analytics-tools 0.1.4
Databases Tools for Data Analytics
19 versions - Latest release: 9 months ago - 220 downloads last month - 0 stars on GitHub - 1 maintainer
metastore 1.0.0.dev21
Metastore Python SDK. Feature store and data catalog for machine learning.
21 versions - Latest release: over 2 years ago - 1 dependent repository - 119 downloads last month - 0 stars on GitHub - 1 maintainer
makeflatt 1.0.4
Simple library to flatten your dictionary
5 versions - Latest release: over 1 year ago - 50 downloads last month - 0 stars on GitHub - 1 maintainer
resp 0.1.2
Make the Redis Mass Insertion by using the REdis Serialization Protocol (RESP) simple.
1 version - Latest release: over 6 years ago - 4 dependent repositories - 220 downloads last month - 0 stars on GitHub - 1 maintainer
ddeutil 0.3.4
Data Developer & Engineer Core Utility Objects
11 versions - Latest release: 7 days ago - 2 dependent packages - 1 dependent repository - 574 downloads last month - 0 stars on GitHub - 1 maintainer
samplr 1.0.0
A simple decorator for returning a small sample of items from a list.
1 version - Latest release: about 1 year ago - 13 downloads last month - 0 stars on GitHub - 1 maintainer
totype 0.1.0
Data converter
1 version - Latest release: over 2 years ago - 1 dependent repository - 15 downloads last month - 0 stars on GitHub - 1 maintainer
ddeutil-io 0.1.4
Data Developer & Engineer IO Utility Objects
8 versions - Latest release: 4 days ago - 346 downloads last month - 0 stars on GitHub - 1 maintainer
dataframemodel 0.1.0 removed
A DataFrame wrapper that adds encapsulation and explicitness to your DataFrames
1 version - Latest release: over 1 year ago - 0 stars on GitHub
files2db 0.1.1
Migration from local files to database made simple!
2 versions - Latest release: 3 months ago - 16 downloads last month - 0 stars on GitHub - 1 maintainer
aws-parameters 0.1.8
Streamlined, efficient access to configuration values in AWS SSM Parameter Store and SecretsManager.
2 versions - Latest release: 10 months ago - 30 downloads last month - 0 stars on GitHub - 1 maintainer
cacheml 1.0.4
Cache ML -- layer on top of joblib to cache parsed datasets, dramatically reducing load time of l...
1 version - Latest release: over 2 years ago - 8 downloads last month - 1 star on GitHub - 1 maintainer
nifty-nesting 0.2.3
Python utilities for arbitrarily nested data structures.
6 versions - Latest release: over 5 years ago - 1 dependent repository - 72 downloads last month - 1 star on GitHub - 1 maintainer
data-expectations 1.7.0
Are your data meeting all your expectations
10 versions - Latest release: 8 months ago - 1 dependent package - 1 dependent repository - 13.5 thousand downloads last month - 1 star on GitHub - 1 maintainer
glue-utils 0.4.0
Reusable utilities for working with Glue PySpark jobs
20 versions - Latest release: 4 days ago - 2.57 thousand downloads last month - 1 star on GitHub - 1 maintainer
data-hopper 0.1.0
Package for data wrangling in python.
1 version - Latest release: about 2 years ago - 1 dependent repository - 8 downloads last month - 1 star on GitHub - 1 maintainer
dataexpectations 0.0.6
Is your data meeting all your expectations
1 version - Latest release: almost 3 years ago - 1 dependent repository - 10 downloads last month - 1 star on GitHub - 1 maintainer
py-dagger-contrib 0.4.0
Extensions for the Dagger library (py-dagger in PyPI).
4 versions - Latest release: over 2 years ago - 1 dependent repository - 50 downloads last month - 1 star on GitHub - 1 maintainer
cargo-crates 0.0.1
An easy way to build data extractors in Docker.
1 version - Latest release: over 2 years ago - 1 dependent repository - 29 downloads last month - 1 star on GitHub - 1 maintainer
warpflow 0.1.0
Ready for production feature store framework dedicated to open source.
1 version - Latest release: 7 months ago - 12 downloads last month - 1 star on GitHub - 1 maintainer
diver 0.2.3
diver is a series of tools to speed up common feature-set investigation, conditioning and encodin...
20 versions - Latest release: about 4 years ago - 1 dependent repository - 136 downloads last month - 1 star on GitHub - 1 maintainer
data-science-kit 0.0.1
Data Science Basic Functions
1 version - Latest release: almost 3 years ago - 1 dependent repository - 16 downloads last month - 1 star on GitHub - 1 maintainer
alphalib 0.0.3
A library for your daily data engineering and data science routines.
3 versions - Latest release: over 3 years ago - 1 dependent repository - 20 downloads last month - 1 star on GitHub - 1 maintainer
dup-fmt 0.1.3 removed
The utility formatter objects for the data engine package
12 versions - Latest release: 9 months ago - 537 downloads last month - 1 star on GitHub - 1 maintainer
journalpdfscraper 0.2.1
A project to check if articles are free or paid
3 versions - Latest release: about 3 years ago - 1 dependent repository - 35 downloads last month - 1 star on GitHub - 1 maintainer
zapr-athena-client 0.1
A Python library for running Presto queries on AWS Athena.
1 version - Latest release: about 3 years ago - 1 dependent repository - 5 downloads last month - 1 star on GitHub - 1 maintainer
sbss 0.0.1
Similarity-Based Stratified Splitting Algorithm
1 version - Latest release: 5 months ago - 21 downloads last month - 1 star on GitHub - 1 maintainer
scistag 0.9.0
A stack of helpful libraries & applications for the rapid development of data driven solutions.
8 versions - Latest release: 4 months ago - 1 dependent package - 2 dependent repositories - 75 downloads last month - 2 stars on GitHub - 1 maintainer
convector 0.1.1
A tool for transforming conversational data to a unified format
5 versions - Latest release: 6 months ago - 47 downloads last month - 2 stars on GitHub - 1 maintainer
pyfivetran 0.1.3
Pythonic interface to the fivetran API
7 versions - Latest release: 3 months ago - 836 downloads last month - 2 stars on GitHub - 1 maintainer
spinecore 0.0.20
The core lib of spine library
10 versions - Latest release: 5 months ago - 2 dependent packages - 100 downloads last month - 2 stars on GitHub - 1 maintainer
pyxmlparser 0.1.2
CLI interface to convert XML into various formats
4 versions - Latest release: about 5 years ago - 1 dependent repository - 48 downloads last month - 2 stars on GitHub - 1 maintainer
opendatablend 1.4.2
The fastest way to get data from the Open Data Blend Dataset API
18 versions - Latest release: 5 months ago - 1 dependent repository - 138 downloads last month - 2 stars on GitHub - 1 maintainer
pyspine 0.0.14
Spine: The backbone of your project
3 versions - Latest release: about 1 year ago - 49 downloads last month - 2 stars on GitHub - 1 maintainer
spinelibs 0.0.17
Libs for spine project
7 versions - Latest release: 5 months ago - 1 dependent package - 86 downloads last month - 2 stars on GitHub - 1 maintainer
dpyp 1.0.0
A pandas convenience wrapper for small-scale data pipelines
1 version - Latest release: 27 days ago - 202 downloads last month - 2 stars on GitHub - 1 maintainer
plai 0.0.0
Programming language to create data manipulation pipelines.
2 versions - Latest release: over 4 years ago - 1 dependent repository - 32 downloads last month - 2 stars on GitHub - 1 maintainer
progress-updater 0.1.9
Progress Updater
8 versions - Latest release: over 1 year ago - 68 downloads last month - 2 stars on GitHub - 1 maintainer
aws-parquet 0.5.0
An object-oriented interface for defining parquet datasets for AWS built on top of awswrangler an...
5 versions - Latest release: 11 months ago - 28 downloads last month - 3 stars on GitHub - 1 maintainer
datazimmer 0.5.3
sscu-budapest utilities for scientific data engineering
38 versions - Latest release: 11 months ago - 4 dependent repositories - 1.92 thousand downloads last month - 3 stars on GitHub - 1 maintainer
kioss 0.9.1 removed
Keep I/O Simple and Stupid: Ease the development of ETL/EL/ReverseETL scripts.
61 versions - Latest release: 5 months ago - 2.13 thousand downloads last month - 3 stars on GitHub - 1 maintainer
alyeska 0.3.0a1
Alyeska /al-ee-EHS-kah/ n. A Data Pipeline Toolkit
3 versions - Latest release: over 4 years ago - 1 dependent repository - 46 downloads last month - 3 stars on GitHub - 1 maintainer
tiny-blocks 0.1.15
Tiny Block Operations for Data Pipelines
14 versions - Latest release: over 1 year ago - 113 downloads last month - 3 stars on GitHub - 1 maintainer
livyc 0.0.14 💰
Apache Livy Client
11 versions - Latest release: almost 2 years ago - 120 downloads last month - 3 stars on GitHub - 1 maintainer
tuberia 0.0.1
Tuberia... when data engineering meets software engineering
2 versions - Latest release: over 1 year ago - 1 dependent repository - 8 downloads last month - 3 stars on GitHub - 1 maintainer
pandasecharts 0.4.2
Visualize your pandas data with one-line code
9 versions - Latest release: over 2 years ago - 1 dependent repository - 93 downloads last month - 4 stars on GitHub - 1 maintainer
unblind 0.0.6
Unblind is a Python package to create data visualizations from data of the Plataforma Digital Nac...
6 versions - Latest release: over 1 year ago - 59 downloads last month - 4 stars on GitHub - 2 maintainers
dtflw 0.6.7
dtflw is a Python framework for building modular data pipelines based on Databricks dbutils.noteb...
7 versions - Latest release: 7 months ago - 1.97 thousand downloads last month - 4 stars on GitHub - 2 maintainers
streamsql 2.0.1
Python SDK for the StreamSQL feature store
14 versions - Latest release: almost 4 years ago - 1 dependent repository - 48 downloads last month - 4 stars on GitHub - 1 maintainer
pandas-ext 0.5.1
Python Pandas extensions for pandas dataframes
22 versions - Latest release: about 5 years ago - 1 dependent repository - 57 downloads last month - 4 stars on GitHub - 2 maintainers
iam-builder 4.3.0
A lil python package to generate iam policies
18 versions - Latest release: about 1 month ago - 1 dependent repository - 6.8 thousand downloads last month - 4 stars on GitHub - 6 maintainers
csv-shuffler 0.0.4 💰
A tool to automatically Shuffle lines in a csv file
4 versions - Latest release: almost 2 years ago - 56 downloads last month - 4 stars on GitHub - 1 maintainer
data_check 0.19.0
simple data validation
22 versions - Latest release: 2 months ago - 195 downloads last month - 4 stars on GitHub - 1 maintainer
llmt 0.0.5
LLMT aims to make it easy to programmatically connect OpenAI and HuggingFace models to your data p...
5 versions - Latest release: 29 days ago - 413 downloads last month - 4 stars on GitHub - 1 maintainer
xml2db 0.9.4
Import complex XML files to a relational database
3 versions - Latest release: 20 days ago - 187 downloads last month - 4 stars on GitHub - 1 maintainer
snowflake-dbml-generator 0.1.2
Automatically generate DBML files from Snowflake databases.
3 versions - Latest release: 14 days ago - 362 downloads last month - 5 stars on GitHub - 1 maintainer
route1io-connectors 0.16.0
Connectors for interacting with popular APIs used in marketing analytics using clean and concise...
30 versions - Latest release: 8 months ago - 1 dependent repository - 101 downloads last month - 5 stars on GitHub - 1 maintainer
adcpipeline 0.2.1
A pipeline for a structured way of working
5 versions - Latest release: about 1 year ago - 1 dependent repository - 320 downloads last month - 5 stars on GitHub - 2 maintainers
streamable 0.9.0
fluent iteration
23 versions - Latest release: 2 months ago - 229 downloads last month - 5 stars on GitHub - 1 maintainer
aiscalator 0.1.18
AIscalate your Jupyter Notebook Prototypes into Airflow Data Products
22 versions - Latest release: almost 4 years ago - 277 downloads last month - 5 stars on GitHub - 1 maintainer
Top 5.5% on pypi.org
mojap-metadata 1.14.2
A python package to manage metadata
28 versions - Latest release: 6 months ago - 3 dependent packages - 4 dependent repositories - 4.45 thousand downloads last month - 6 stars on GitHub - 2 maintainers
Top 5.0% on pypi.org
pydbtools 5.5.18
A python package to query data via amazon athena and bring it into a pandas df using aws-wrangler.
39 versions - Latest release: 6 days ago - 2 dependent packages - 4 dependent repositories - 6.9 thousand downloads last month - 6 stars on GitHub - 6 maintainers
codepack 0.8.0
CodePack is the package to easily make, run, and manage workflows
19 versions - Latest release: almost 2 years ago - 1 dependent repository - 220 downloads last month - 6 stars on GitHub - 1 maintainer
dataride 0.2.3
Lightning-fast data platform setup for small/medium projects & PoCs
4 versions - Latest release: over 1 year ago - 38 downloads last month - 6 stars on GitHub - 1 maintainer
data-linter 6.2.5
data linter
30 versions - Latest release: about 2 months ago - 1 dependent repository - 355 downloads last month - 6 stars on GitHub - 5 maintainers
lakehouse-engine 1.19.0
A Spark framework serving as the engine for several lakehouse algorithms and data flows.
8 versions - Latest release: 2 months ago - 289 downloads last month - 6 stars on GitHub - 1 maintainer
arrow_pd_parser 2.0.0
MoJ arrow-pd-parser
24 versions - Latest release: 6 months ago - 1 dependent repository - 4.37 thousand downloads last month - 7 stars on GitHub - 2 maintainers
spooq 3.4.0
Spooq is a PySpark-based helper library for ETL data ingestion pipelines in data lakes.
11 versions - Latest release: 2 months ago - 1 dependent repository - 19.3 thousand downloads last month - 8 stars on GitHub - 1 maintainer
flowrunner 0.2.3
Flowrunner is a lightweight package to organize and represent Data Engineering/Science workflows
5 versions - Latest release: 12 months ago - 45 downloads last month - 8 stars on GitHub - 1 maintainer
prefecto 1.0.0
Tools for supporting Prefect development.
7 versions - Latest release: about 1 month ago - 1 dependent repository - 1.09 thousand downloads last month - 8 stars on GitHub - 1 maintainer
eos-etl 1.0.0
Tools for exporting EOS blockchain data to JSON
1 version - Latest release: almost 5 years ago - 1 dependent repository - 7 downloads last month - 9 stars on GitHub - 1 maintainer
prefect-planetary-computer 0.1.1
Prefect integrations with Microsoft Planetary Computer
2 versions - Latest release: 7 months ago - 24 downloads last month - 10 stars on GitHub - 1 maintainer
facebook-page-info-scraper 1.1.2
A Python package capable of crawling Facebook page information.
9 versions - Latest release: 4 months ago - 98 downloads last month - 10 stars on GitHub - 1 maintainer
gcp-airflow-foundations 0.3.7
Opinionated framework based on Airflow 2.0 for building pipelines to ingest data into a BigQuery ...
19 versions - Latest release: over 1 year ago - 1 dependent repository - 41 downloads last month - 11 stars on GitHub - 1 maintainer
risk-command-center 1.0.37
Risk Command Center, manage your risk easily.
2 versions - Latest release: about 2 years ago - 1 dependent repository - 10 downloads last month - 11 stars on GitHub - 1 maintainer
gcp-airflow-foundations-dev 0.9.7
Opinionated framework based on Airflow 2.0 for building pipelines to ingest data into a BigQuery ...
148 versions - Latest release: almost 2 years ago - 1 dependent repository - 452 downloads last month - 11 stars on GitHub - 1 maintainer
dpq 0.1.5
dpq is an open-source python library that makes prompt-based data processing and feature engineer...
6 versions - Latest release: about 1 month ago - 102 downloads last month - 11 stars on GitHub - 1 maintainer
analytics-command-center 3.0.14
Command Center for Data Ingestion, Advanced Analytics and Artificial Intelligence process
1 version - Latest release: over 2 years ago - 26 downloads last month - 11 stars on GitHub - 1 maintainer
gcp-airflow-foundations-dev-jiny 0.2.9
Opinionated framework based on Airflow 2.0 for building pipelines to ingest data into a BigQuery ...
1 version - Latest release: over 2 years ago - 1 dependent repository - 10 downloads last month - 11 stars on GitHub - 1 maintainer
prefect-alert 0.1.3 💰
Decorator to send alert when a Prefect task or flow fails
2 versions - Latest release: over 1 year ago - 1 dependent repository - 1.57 thousand downloads last month - 12 stars on GitHub - 1 maintainer
marshmallow-pyspark 0.2.4
PySpark data serializer
6 versions - Latest release: 5 months ago - 1 dependent repository - 2.14 thousand downloads last month - 12 stars on GitHub - 1 maintainer
dask-saturn 0.4.3
Dask Cluster objects in Saturn Cloud
19 versions - Latest release: over 1 year ago - 1 dependent repository - 2.25 thousand downloads last month - 12 stars on GitHub - 1 maintainer