Ecosyste.ms: Packages
An open API service providing package, version, and dependency metadata for many open source software ecosystems and registries.
conda-forge.org "spark" keyword
visions 0.7.5
Type System for Data Analysis in Python
12 versions - Latest release: over 2 years ago - 2 dependent packages - 13 dependent repositories - 174 stars on GitHub
fugue 0.7.3
Fugue is a unified interface for distributed computing that lets users execute Python, pandas, a...
8 versions - Latest release: over 1 year ago - 4 dependent repositories - 1,271 stars on GitHub
glow 1.0.1
Glow is an open-source toolkit for working with genomic data at biobank-scale and beyond. The too...
2 versions - Latest release: about 3 years ago - 3 dependent repositories - 225 stars on GitHub
sparkmagic 0.20.0
Jupyter magics and kernels for working with remote Spark clusters
17 versions - Latest release: about 2 years ago - 2 dependent repositories - 1,207 stars on GitHub
datacompy 0.8.3
Pandas and Spark DataFrame comparison for humans
9 versions - Latest release: over 1 year ago - 1 dependent repository - 269 stars on GitHub
zappy 0.2.0
Distributed processing with NumPy and Zarr
2 versions - Latest release: over 5 years ago - 1 dependent repository - 8 stars on GitHub
pixiedust 1.1.19
Python Helper library for Jupyter Notebooks
3 versions - Latest release: over 3 years ago - 1 dependent repository - 1,029 stars on GitHub
hdijupyterutils 0.20.0
Jupyter magics and kernels for working with remote Spark clusters
10 versions - Latest release: almost 2 years ago - 4 dependent packages - 1 dependent repository - 1,207 stars on GitHub
autovizwidget 0.20.0
Jupyter magics and kernels for working with remote Spark clusters
12 versions - Latest release: almost 2 years ago - 3 dependent packages - 1 dependent repository - 1,207 stars on GitHub
r-sparkr 3.3.1
Apache Spark - A unified analytics engine for large-scale data processing
9 versions - Latest release: over 1 year ago - 37,996 stars on GitHub
tensorflowonspark 2.2.5
TensorFlowOnSpark brings TensorFlow programs to Apache Spark clusters.
10 versions - Latest release: over 1 year ago - 3,851 stars on GitHub
jupyter_enterprise_gateway 3.0.0
Jupyter Enterprise Gateway A lightweight, multi-tenant, scalable and secure gateway that enables ...
23 versions - Latest release: over 1 year ago - 550 stars on GitHub
elephas 2.1.0 💰
Distributed Deep learning with Keras & Spark
4 versions - Latest release: almost 3 years ago - 1,562 stars on GitHub
splink 1.0.6
Fast, accurate and scalable probabilistic data linkage using your choice of SQL backend
16 versions - Latest release: almost 3 years ago - 571 stars on GitHub
h2o-py 3.38.0.2
H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gra...
31 versions - Latest release: over 1 year ago - 6,189 stars on GitHub
mage-ai 0.7.5
🧙 The modern replacement for Airflow. Build, run, and manage data pipelines for integrating and t...
22 versions - Latest release: over 1 year ago - 3,631 stars on GitHub
sparkhpc 0.3.post4
launching and controlling spark on hpc clusters
1 version - Latest release: about 5 years ago - 19 stars on GitHub
delta-sharing-python 0.5.2
An open protocol for secure data sharing
6 versions - Latest release: over 1 year ago - 539 stars on GitHub
mleap 0.21.0
MLeap: Deploy ML Pipelines to Production
9 versions - Latest release: over 1 year ago - 1 dependent package - 1,443 stars on GitHub
r-h2o 3.38.0.1
H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gra...
11 versions - Latest release: over 1 year ago - 6,189 stars on GitHub
koalas 1.8.2
Koalas: pandas API on Apache Spark
42 versions - Latest release: over 2 years ago - 1 dependent package - 3,256 stars on GitHub
sagemaker_pyspark 1.4.2
A Spark library for Amazon SageMaker.
12 versions - Latest release: about 3 years ago - 274 stars on GitHub
flytekitplugins-awsbatch 1.2.4
AWS Batch plugin for Flytekit: `flytekitplugins-awsbatch` PyPI: [https://pypi.org/project/flytek...
9 versions - Latest release: over 1 year ago - 123 stars on GitHub
flytekitplugins-athena 1.2.4
Athena plugin for Flytekit: `flytekitplugins-athena` PyPI: [https://pypi.org/project/flytekitplu...
8 versions - Latest release: over 1 year ago - 123 stars on GitHub
flytekitplugins-modin 1.2.4
Modin plugin for Flytekit: `flytekitplugins-modin` PyPI: [https://pypi.org/project/flytekitplugi...
9 versions - Latest release: over 1 year ago - 123 stars on GitHub
flytekitplugins-sqlalchemy 1.2.4
SQLAlchemy plugin for Flytekit: `flytekitplugins-sqlalchemy` PyPI: [https://pypi.org/project/fly...
8 versions - Latest release: over 1 year ago - 123 stars on GitHub
flytekit 1.1.0
Flytekit Python is the Python Library for easily authoring, testing, deploying, and interacting w...
4 versions - Latest release: almost 2 years ago - 8 dependent packages - 123 stars on GitHub
flytekitplugins-pandera 1.2.4
Pandera plugin for Flytekit: `flytekitplugins-pandera` PyPI: [https://pypi.org/project/flytekitp...
7 versions - Latest release: over 1 year ago - 123 stars on GitHub
traceml 1.0.0
Engine for ML/Data tracking, visualization, explainability, drift detection, and dashboards for P...
1 version - Latest release: almost 2 years ago - 463 stars on GitHub
sqlglot 10.0.6
Python SQL Parser and Transpiler
131 versions - Latest release: over 1 year ago - 2 dependent packages - 2,926 stars on GitHub
flytekitplugins-spark 1.0.5
Spark 3 plugin for Flytekit: `flytekitplugins-spark` PyPI: [https://pypi.org/project/flytekitplu...
1 version - Latest release: almost 2 years ago - 123 stars on GitHub
flytekitplugins-data-fsspec 1.2.4
`fsspec` powered data-plugins for Flytekit: `flytekitplugins-data-fsspec` PyPI: [https://pypi.or...
8 versions - Latest release: over 1 year ago - 123 stars on GitHub
pyspark-test 0.2.0
Testing library for pyspark, inspired from pandas testing module but for pyspark, to help users w...
2 versions - Latest release: over 2 years ago - 14 stars on GitHub
Related Keywords
python: 21
data-science: 17
hacktoberfest: 11
data: 10
mlops: 9
automation: 8
extensible: 8
flyte: 8
flyte-tasks: 8
pypi: 8
workflows: 8
sdk: 8
machine-learning: 7
pandas: 6
jupyter-notebook: 5
big-data: 5
pyspark: 4
kernel: 4
jupyter: 4
cluster: 4
scala: 4
sql: 4
java: 3
r: 3
tensorflow: 3
sql-query: 3
deep-learning: 3
pandas-dataframe: 3
notebook: 3
magic: 3
livy: 3
kerberos: 3
dask: 3
distributed: 3
distributed-computing: 3
numpy: 2
duckdb: 2
automl: 2
ensemble-learning: 2
gbm: 2
gpu: 2
h2o: 2
h2o-automl: 2
dataframe: 2
hadoop: 2
data-pipelines: 2
naive-bayes: 2
opensource: 2
pca: 2
random-forest: 2
data-engineering: 2
slurm: 1
data-sharing: 1
sagemaker: 1
delta-lake: 1
scikit-learn: 1
aws: 1
amazon-sagemaker: 1
pydata: 1
transformers: 1
mlflow: 1
unittesting: 1
tsql: 1
trino: 1
transpiler: 1
translation: 1
sqlparser: 1
sqlite: 1
snowflake: 1
redshift: 1
presto: 1
postgres: 1
parser: 1
optimizer: 1
mysql: 1
hive: 1
clickhouse: 1
bigquery: 1
tracking: 1
statistics: 1
pytorch: 1
plotly: 1
pandas-summary: 1
matplotlib: 1
explainable-ai: 1
dataops: 1
dataframes: 1
data-visualization: 1
data-quality-checks: 1
data-quality: 1
data-profiling: 1
data-exploration: 1
jupyter-kernels: 1
jupyter-enterprise-gateway: 1
gateway: 1
enterprise: 1
yahoo: 1
featured: 1
jdbc: 1
visualization: 1