Ecosyste.ms: Packages
An open API service providing package, version and dependency metadata for many open source software ecosystems and registries.
conda-forge.org "spark" keyword
elephas 2.1.0 💰
Distributed Deep learning with Keras & Spark
4 versions - Latest release: almost 3 years ago - 1,562 stars on GitHub
delta-sharing-python 0.5.2
An open protocol for secure data sharing
6 versions - Latest release: over 1 year ago - 539 stars on GitHub
glow 1.0.1
Glow is an open-source toolkit for working with genomic data at biobank-scale and beyond. The too...
2 versions - Latest release: about 3 years ago - 3 dependent repositories - 225 stars on GitHub
datacompy 0.8.3
Pandas and Spark DataFrame comparison for humans
9 versions - Latest release: over 1 year ago - 1 dependent repository - 269 stars on GitHub
sparkmagic 0.20.0
Jupyter magics and kernels for working with remote Spark clusters
17 versions - Latest release: almost 2 years ago - 2 dependent repositories - 1,207 stars on GitHub
hdijupyterutils 0.20.0
Jupyter magics and kernels for working with remote Spark clusters
10 versions - Latest release: almost 2 years ago - 4 dependent packages - 1 dependent repository - 1,207 stars on GitHub
fugue 0.7.3
Fugue is a unified interface for distributed computing that lets users execute Python, pandas, a...
8 versions - Latest release: over 1 year ago - 4 dependent repositories - 1,271 stars on GitHub
splink 1.0.6
Fast, accurate and scalable probabilistic data linkage using your choice of SQL backend
16 versions - Latest release: almost 3 years ago - 571 stars on GitHub
autovizwidget 0.20.0
Jupyter magics and kernels for working with remote Spark clusters
12 versions - Latest release: almost 2 years ago - 3 dependent packages - 1 dependent repository - 1,207 stars on GitHub
pyspark-test 0.2.0
Testing library for pyspark, inspired from pandas testing module but for pyspark, to help users w...
2 versions - Latest release: over 2 years ago - 14 stars on GitHub
r-sparkr 3.3.1
Apache Spark - A unified analytics engine for large-scale data processing
9 versions - Latest release: over 1 year ago - 37,996 stars on GitHub
mage-ai 0.7.5
🧙 The modern replacement for Airflow. Build, run, and manage data pipelines for integrating and t...
22 versions - Latest release: over 1 year ago - 3,631 stars on GitHub
sqlglot 10.0.6
Python SQL Parser and Transpiler
131 versions - Latest release: over 1 year ago - 2 dependent packages - 2,926 stars on GitHub
traceml 1.0.0
Engine for ML/Data tracking, visualization, explainability, drift detection, and dashboards for P...
1 version - Latest release: almost 2 years ago - 463 stars on GitHub
flytekit 1.1.0
Flytekit Python is the Python Library for easily authoring, testing, deploying, and interacting w...
4 versions - Latest release: almost 2 years ago - 8 dependent packages - 123 stars on GitHub
flytekitplugins-sqlalchemy 1.2.4
SQLAlchemy plugin for Flytekit: `flytekitplugins-sqlalchemy` PyPI: [https://pypi.org/project/fly...
8 versions - Latest release: over 1 year ago - 123 stars on GitHub
flytekitplugins-modin 1.2.4
Modin plugin for Flytekit: `flytekitplugins-modin` PyPI: [https://pypi.org/project/flytekitplugi...
9 versions - Latest release: over 1 year ago - 123 stars on GitHub
flytekitplugins-athena 1.2.4
Athena plugin for Flytekit: `flytekitplugins-athena` PyPI: [https://pypi.org/project/flytekitplu...
8 versions - Latest release: over 1 year ago - 123 stars on GitHub
flytekitplugins-awsbatch 1.2.4
AWS Batch plugin for Flytekit: `flytekitplugins-awsbatch` PyPI: [https://pypi.org/project/flytek...
9 versions - Latest release: over 1 year ago - 123 stars on GitHub
flytekitplugins-data-fsspec 1.2.4
`fsspec` powered data-plugins for Flytekit: `flytekitplugins-data-fsspec` PyPI: [https://pypi.or...
8 versions - Latest release: over 1 year ago - 123 stars on GitHub
flytekitplugins-spark 1.0.5
Spark 3 plugin for Flytekit: `flytekitplugins-spark` PyPI: [https://pypi.org/project/flytekitplu...
1 version - Latest release: over 1 year ago - 123 stars on GitHub
mleap 0.21.0
MLeap: Deploy ML Pipelines to Production
9 versions - Latest release: over 1 year ago - 1 dependent package - 1,443 stars on GitHub
pixiedust 1.1.19
Python Helper library for Jupyter Notebooks
3 versions - Latest release: about 3 years ago - 1 dependent repository - 1,029 stars on GitHub
tensorflowonspark 2.2.5
TensorFlowOnSpark brings TensorFlow programs to Apache Spark clusters.
10 versions - Latest release: over 1 year ago - 3,851 stars on GitHub
sparkhpc 0.3.post4
launching and controlling spark on hpc clusters
1 version - Latest release: about 5 years ago - 19 stars on GitHub
jupyter_enterprise_gateway 3.0.0
Jupyter Enterprise Gateway A lightweight, multi-tenant, scalable and secure gateway that enables ...
23 versions - Latest release: over 1 year ago - 550 stars on GitHub
zappy 0.2.0
Distributed processing with NumPy and Zarr
2 versions - Latest release: about 5 years ago - 1 dependent repository - 8 stars on GitHub
koalas 1.8.2
Koalas: pandas API on Apache Spark
42 versions - Latest release: over 2 years ago - 1 dependent package - 3,256 stars on GitHub
sagemaker_pyspark 1.4.2
A Spark library for Amazon SageMaker.
12 versions - Latest release: about 3 years ago - 274 stars on GitHub
flytekitplugins-pandera 1.2.4
Pandera plugin for Flytekit: `flytekitplugins-pandera` PyPI: [https://pypi.org/project/flytekitp...
7 versions - Latest release: over 1 year ago - 123 stars on GitHub
r-h2o 3.38.0.1
H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gra...
11 versions - Latest release: over 1 year ago - 6,189 stars on GitHub
h2o-py 3.38.0.2
H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gra...
31 versions - Latest release: over 1 year ago - 6,189 stars on GitHub
visions 0.7.5
Type System for Data Analysis in Python
12 versions - Latest release: over 2 years ago - 2 dependent packages - 13 dependent repositories - 174 stars on GitHub
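
The statistics attached to each entry above (version count, release age, dependents, stars) follow a regular pattern, so they can be pulled into structured fields. The following is a minimal sketch in Python; the pattern is inferred from the entries on this page rather than from any documented Ecosyste.ms format, and the names `STATS_PATTERN` and `parse_stats` are hypothetical.

```python
import re
from typing import Optional

# Pattern inferred from the entries in this listing. It is an assumption,
# not a format documented by Ecosyste.ms.
STATS_PATTERN = re.compile(
    r"(?P<versions>\d+) versions? - Latest release: (?P<latest>[^-]+?)"
    r"(?: - (?P<dep_packages>[\d,]+) dependent packages?)?"
    r"(?: - (?P<dep_repos>[\d,]+) dependent repositor(?:y|ies))?"
    r" - (?P<stars>[\d,]+) stars? on GitHub"
)


def _to_int(text: Optional[str]) -> int:
    """Convert '1,562' to 1562; a missing field counts as 0."""
    return int(text.replace(",", "")) if text else 0


def parse_stats(line: str) -> Optional[dict]:
    """Extract the statistics from one entry line, or None if absent."""
    match = STATS_PATTERN.search(line)
    if match is None:
        return None
    return {
        "versions": int(match["versions"]),
        "latest_release": match["latest"].strip(),
        "dependent_packages": _to_int(match["dep_packages"]),
        "dependent_repositories": _to_int(match["dep_repos"]),
        "stars": _to_int(match["stars"]),
    }


if __name__ == "__main__":
    sample = "4 versions - Latest release: almost 3 years ago - 1,562 stars on GitHub"
    print(parse_stats(sample))
```

Because the pattern is applied with `search`, it also tolerates a description preceding the statistics on the same line. The release age is kept as the relative string shown on the page, since the listing gives no absolute dates.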
Related Keywords
python (21)
data-science (17)
hacktoberfest (11)
data (10)
mlops (9)
workflows (8)
sdk (8)
pypi (8)
flyte-tasks (8)
flyte (8)
extensible (8)
automation (8)
machine-learning (7)
pandas (6)
big-data (5)
jupyter-notebook (5)
jupyter (4)
sql (4)
kernel (4)
pyspark (4)
cluster (4)
scala (4)
r (3)
distributed (3)
dask (3)
sql-query (3)
java (3)
pandas-dataframe (3)
notebook (3)
magic (3)
livy (3)
kerberos (3)
tensorflow (3)
distributed-computing (3)
deep-learning (3)
data-pipelines (2)
numpy (2)
dataframe (2)
data-engineering (2)
hadoop (2)
naive-bayes (2)
opensource (2)
pca (2)
random-forest (2)
h2o-automl (2)
h2o (2)
gpu (2)
gbm (2)
ensemble-learning (2)
automl (2)
duckdb (2)
scikit-learn (1)
transformers (1)
pixiedust (1)
python-notebook (1)
scala-notebooks (1)
visualization (1)
featured (1)
data-analysis (1)
tracking (1)
type-inference (1)
statistics (1)
pytorch (1)
plotly (1)
type-system (1)
pipelines (1)
aws (1)
amazon-sagemaker (1)
pydata (1)
sagemaker (1)
mlflow (1)
zarr (1)
pywren (1)
human-cell-atlas (1)
beam (1)
yarn (1)
spark-on-kubernetes (1)
remote-kernels (1)
kubernetes (1)
jupyter-kernels (1)
jupyter-enterprise-gateway (1)
gateway (1)
enterprise (1)
slurm (1)
lsf (1)
hpc-cluster (1)
yahoo (1)
pipeline (1)
orchestration (1)
etl (1)
elt (1)
dbt (1)
data-integration (1)
artificial-intelligence (1)
jdbc (1)
unittesting (1)
record-linkage (1)
fuzzy-matching (1)
entity-resolution (1)
em-algorithm (1)