An open API service providing package, version and dependency metadata of many open source software ecosystems and registries.

pypi.org "apache-spark" keyword

View the packages on the pypi.org package registry that are tagged with the "apache-spark" keyword.

Top 3.3% on pypi.org
quinn 0.10.3
Pyspark helper methods to maximize developer efficiency
16 versions - Latest release: over 1 year ago - 3 dependent packages - 11 dependent repositories - 603 thousand downloads last month - 610 stars on GitHub - 1 maintainer
google-dataproc-templates 0.1.0
Google Dataproc templates written in Python
13 versions - Latest release: over 2 years ago - 165 downloads last month - 131 stars on GitHub - 3 maintainers
Top 0.2% on pypi.org
mlflow 3.1.1
MLflow is an open source platform for the complete machine learning lifecycle
143 versions - Latest release: 13 days ago - 360 dependent packages - 5,089 dependent repositories - 20.4 million downloads last month - 17,453 stars on GitHub - 13 maintainers
Top 0.6% on pypi.org
mlflow-skinny 3.1.1
MLflow is an open source platform for the complete machine learning lifecycle
104 versions - Latest release: 13 days ago - 49 dependent packages - 70 dependent repositories - 18.5 million downloads last month - 17,453 stars on GitHub - 8 maintainers
livyc 0.0.14 💰
Apache Livy Client
11 versions - Latest release: about 3 years ago - 31 downloads last month - 3 stars on GitHub - 1 maintainer
Top 6.9% on pypi.org
sparkmeasure 0.25.0
Python API for sparkMeasure, a tool for performance troubleshooting of Apache Spark workloads.
13 versions - Latest release: 2 months ago - 1 dependent repository - 501 thousand downloads last month - 758 stars on GitHub - 1 maintainer
sparkmanager 0.7.3
A pyspark management framework
21 versions - Latest release: over 5 years ago - 1 dependent repository - 72 downloads last month - 0 stars on GitHub - 1 maintainer
Top 3.7% on pypi.org
pyspark-stubs 3.0.0 💰
A collection of the Apache Spark stub files
38 versions - Latest release: almost 5 years ago - 2 dependent packages - 146 dependent repositories - 91.9 thousand downloads last month - 117 stars on GitHub - 1 maintainer
fasttrackml 0.6.0
An experiment tracking server focused on speed and scalability
25 versions - Latest release: about 1 year ago - 370 downloads last month - 105 stars on GitHub - 1 maintainer
pysparql 0.0.6
Query a SPARQL endpoint and manage the result with Spark
2 versions - Latest release: over 4 years ago - 1 dependent repository - 28 downloads last month - 0 stars on GitHub - 1 maintainer
Top 3.3% on pypi.org
synapseml 1.0.11
Synapse Machine Learning
24 versions - Latest release: 3 months ago - 2 dependent packages - 3 dependent repositories - 879 thousand downloads last month - 5,144 stars on GitHub - 1 maintainer
miss-lightgbm-mmlspark 1.0.0
Microsoft ML for Spark
1 version - Latest release: about 1 year ago - 22 downloads last month - 5,144 stars on GitHub - 1 maintainer
dcborow-mmlspark 0.14.dev1
Microsoft ML for Spark
1 version - Latest release: over 5 years ago - 1 dependent repository - 82 downloads last month - 4,986 stars on GitHub - 1 maintainer
nozberkman-mmlspark 1.0.0
Microsoft ML for Spark
1 version - Latest release: almost 4 years ago - 1 dependent repository - 34 downloads last month - 4,472 stars on GitHub - 1 maintainer
Top 6.3% on pypi.org
lakefs-sdk 1.61.0
lakeFS API
81 versions - Latest release: 13 days ago - 3 dependent packages - 1 dependent repository - 202 thousand downloads last month - 4,745 stars on GitHub - 1 maintainer
Top 3.2% on pypi.org
lakefs-client 1.44.0
[legacy] lakeFS API
178 versions - Latest release: 7 months ago - 4 dependent packages - 5 dependent repositories - 154 thousand downloads last month - 4,745 stars on GitHub - 1 maintainer
lakefs 0.11.1
lakeFS Python SDK Wrapper
26 versions - Latest release: about 1 month ago - 1 dependent package - 49.5 thousand downloads last month - 4,745 stars on GitHub - 1 maintainer
pysparkgateway 0.0.22
Connect Pyspark to remote clusters
18 versions - Latest release: over 4 years ago - 1 dependent repository - 6.69 thousand downloads last month - 3 stars on GitHub - 1 maintainer
sparkdq 0.10.0
A declarative PySpark framework for row- and aggregate-level data quality validation.
17 versions - Latest release: about 1 month ago - 211 downloads last month - 46 stars on GitHub - 1 maintainer
flintrock 2.1.0
A command-line tool for launching Apache Spark clusters.
14 versions - Latest release: over 1 year ago - 1 dependent repository - 415 downloads last month - 642 stars on GitHub - 1 maintainer
mlflow-tmp 2.2.26
MLflow: A Platform for ML Development and Productionization
25 versions - Latest release: about 2 years ago - 140 downloads last month - 21,092 stars on GitHub - 1 maintainer
mlflow-devlibx 1.22.8
MLflow: A Platform for ML Development and Productionization
9 versions - Latest release: over 3 years ago - 1 dependent repository - 63 downloads last month - 21,092 stars on GitHub - 1 maintainer
mlflow-by-johnsnowlabs-v2 2.44.0
MLflow: A Platform for ML Development and Productionization
9 versions - Latest release: almost 2 years ago - 78 downloads last month - 21,092 stars on GitHub - 1 maintainer
qubole-ml 1.9.1
MLflow: An ML Workflow Tool
2 versions - Latest release: almost 5 years ago - 1 dependent repository - 16 downloads last month - 21,092 stars on GitHub - 1 maintainer
mlflow-stonewise 1.30.1
MLflow: A Platform for ML Development and Productionization
1 version - Latest release: over 2 years ago - 25 downloads last month - 21,092 stars on GitHub - 1 maintainer
lmcmlflow 1.17.1
MLflow: A Platform for ML Development and Productionization
3 versions - Latest release: about 4 years ago - 1 dependent repository - 13 downloads last month - 21,092 stars on GitHub - 1 maintainer
mlflow-no-ssl 2.16.3
MLflow is an open source platform for the complete machine learning lifecycle
1 version - Latest release: 10 months ago - 36 downloads last month - 21,092 stars on GitHub - 1 maintainer
mlflow-saagie 2.9.2
MLflow: A Platform for ML Development and Productionization - forked for Saagie
8 versions - Latest release: over 1 year ago - 1 dependent repository - 46 downloads last month - 21,092 stars on GitHub - 1 maintainer
mlflow-tracing 3.1.0
MLflow Tracing SDK is an open-source, lightweight Python package that only includes the minimum s...
4 versions - Latest release: 25 days ago - 781 downloads last month - 21,092 stars on GitHub - 5 maintainers
mlflow-by-johnsnowlabs 2.40.0
MLflow: A Platform for ML Development and Productionization
35 versions - Latest release: over 1 year ago - 215 downloads last month - 16,113 stars on GitHub - 1 maintainer
mlflow-by-ckl 2.81.0
MLflow: A Platform for ML Development and Productionization
44 versions - Latest release: about 1 year ago - 287 downloads last month - 16,113 stars on GitHub - 1 maintainer
mlflow-ste 1.10.1.dev0
MLflow: An ML Workflow Tool
1 version - Latest release: almost 5 years ago - 1 dependent repository - 19 downloads last month - 16,107 stars on GitHub - 1 maintainer
Top 7.5% on pypi.org
feathr 1.0.0
An Enterprise-Grade, High Performance Feature Store
22 versions - Latest release: over 2 years ago - 1 dependent repository - 913 downloads last month - 1,899 stars on GitHub - 1 maintainer
sparkora 0.0.1
Exploratory data analysis toolkit for Pyspark
1 version - Latest release: over 3 years ago - 1 dependent repository - 20 downloads last month - 53 stars on GitHub - 1 maintainer
Top 3.1% on pypi.org
spark-sklearn 0.3.0
Integration tools for running scikit-learn on Spark
8 versions - Latest release: over 6 years ago - 14 dependent repositories - 336 thousand downloads last month - 1,078 stars on GitHub - 5 maintainers
dead-salmon-brain 0.0.7
Dead Salmon Brain is a cluster computing system for analysing A/B experiments
7 versions - Latest release: about 3 years ago - 1 dependent repository - 215 downloads last month - 15 stars on GitHub - 1 maintainer
openaivec 0.8.10
Generative mutation for tabular calculation
59 versions - Latest release: 19 days ago - 664 downloads last month - 13 stars on GitHub - 1 maintainer
Top 6.6% on pypi.org
spylon 0.3.0
Utilities to work with Scala/Java code with py4j
19 versions - Latest release: about 8 years ago - 16 dependent repositories - 2.06 thousand downloads last month - 40 stars on GitHub - 3 maintainers
graphframes-latest 0.8.3
GraphFrames: DataFrame-based Graphs
1 version - Latest release: almost 2 years ago - 32 thousand downloads last month - 1,061 stars on GitHub - 1 maintainer
sparkql 0.10.0
sparkql: Apache Spark SQL DataFrame schema management for sensible humans
20 versions - Latest release: almost 2 years ago - 1 dependent repository - 6.92 thousand downloads last month - 13 stars on GitHub - 1 maintainer
laktory 0.8.0
An ETL and DataOps framework for building a lakehouse
91 versions - Latest release: 9 days ago - 9.12 thousand downloads last month - 13 stars on GitHub - 1 maintainer
pypair 3.0.9 💰
Pairwise association measures of statistical variable types
11 versions - Latest release: almost 4 years ago - 1 dependent repository - 195 downloads last month - 22 stars on GitHub - 1 maintainer
aim-mlflow 0.2.1
Aim-MLflow integration
4 versions - Latest release: about 2 years ago - 1 dependent repository - 967 downloads last month - 20,464 stars on GitHub - 1 maintainer
spark-connect-proxy 0.0.11
A reverse proxy server which allows secure connectivity to a Spark Connect server
8 versions - Latest release: 6 months ago - 53 downloads last month - 9 stars on GitHub - 1 maintainer
dbnet 0.2.5
DbNet.
24 versions - Latest release: over 5 years ago - 1 dependent repository - 60 downloads last month - 7 stars on GitHub - 1 maintainer
Top 7.6% on pypi.org
sparktorch 0.2.0
Distributed training of PyTorch networks on Apache Spark with ML Pipeline support
11 versions - Latest release: about 2 years ago - 2 dependent repositories - 413 downloads last month - 339 stars on GitHub - 1 maintainer
sparkflow 0.7.0
Deep learning on Spark with Tensorflow
13 versions - Latest release: about 6 years ago - 1 dependent repository - 36 downloads last month - 296 stars on GitHub - 1 maintainer
Top 5.0% on pypi.org
sparkit-learn 0.2.6
Scikit-learn on PySpark
5 versions - Latest release: about 10 years ago - 5 dependent repositories - 1.08 thousand downloads last month - 1,154 stars on GitHub - 3 maintainers
spark-privacy-preserver 0.3.1
Anonymizing Library for Apache Spark
4 versions - Latest release: almost 5 years ago - 1 dependent repository - 24 downloads last month - 30 stars on GitHub - 3 maintainers
Top 3.8% on pypi.org
pysparkling 0.6.2
Pure Python implementation of the Spark RDD interface.
69 versions - Latest release: over 2 years ago - 1 dependent package - 34 dependent repositories - 45.8 thousand downloads last month - 269 stars on GitHub - 2 maintainers
mlflow-databricks-artifacts 2.0.1
Plugin to create and access MLflow-managed artifacts on Databricks
3 versions - Latest release: over 2 years ago - 1 dependent repository - 3.16 thousand downloads last month - 20,464 stars on GitHub - 2 maintainers
pyspark-analyzer 4.4.0
A comprehensive PySpark DataFrame profiler for generating detailed statistics and data quality re...
16 versions - Latest release: 16 days ago - 1.27 thousand downloads last month - 0 stars on GitHub - 1 maintainer
mlflowcollab 0.0.4
Use MLflow in a central location
1 version - Latest release: about 3 years ago - 1 dependent repository - 8 downloads last month - 20,464 stars on GitHub - 1 maintainer
exelog 0.0.1
Enabling meticulous logging for Spark Applications
1 version - Latest release: over 3 years ago - 1 dependent repository - 8 downloads last month - 5 stars on GitHub - 1 maintainer
blind 0.0.1
Blind Client: The easiest ML tracking library
1 version - Latest release: over 3 years ago - 2 dependent repositories - 27 downloads last month - 20,464 stars on GitHub - 1 maintainer
Top 9.0% on pypi.org
jupyterlab-sparkmonitor 4.1.0
Spark Monitor Extension for Jupyter Lab
14 versions - Latest release: almost 4 years ago - 1 dependent repository - 142 downloads last month - 92 stars on GitHub - 1 maintainer
pyjaws 0.1.7
PyJaws: A Pythonic Way to Define Databricks Jobs and Workflows
10 versions - Latest release: almost 2 years ago - 1 dependent repository - 28 downloads last month - 43 stars on GitHub - 1 maintainer
Top 7.8% on pypi.org
dist-keras 0.2.1
Distributed Deep learning with Apache Spark with Keras.
3 versions - Latest release: over 7 years ago - 1 dependent repository - 293 downloads last month - 623 stars on GitHub - 1 maintainer
k8s-spark-helper-liangdao-data 0.0.3
Run spark task in k8s
3 versions - Latest release: over 2 years ago - 13 downloads last month - 2,914 stars on GitHub - 1 maintainer
mmtfpyspark 0.3.6
Methods for parallel and distributed analysis and mining of the Protein Data Bank using MMTF and ...
10 versions - Latest release: over 6 years ago - 1 dependent repository - 48 downloads last month - 67 stars on GitHub - 2 maintainers
pyspark-connectors 0.3.0
The easy and quick way to connect and integrate the Spark project with many other data sources.
9 versions - Latest release: about 1 year ago - 46 downloads last month - 6 stars on GitHub - 1 maintainer
pyspark-asyncactions 0.0.4 💰
A proof of concept of asynchronous actions for PySpark using concurrent.futures
4 versions - Latest release: almost 5 years ago - 1 dependent repository - 5.04 thousand downloads last month - 47 stars on GitHub - 1 maintainer
pyspark3d 0.3.1
Spark extension for processing large-scale 3D data sets
4 versions - Latest release: over 6 years ago - 1 dependent repository - 28 downloads last month - 30 stars on GitHub - 1 maintainer
pyspark-regression 4.1.0
A tool for regression testing Spark Dataframes in Python
6 versions - Latest release: about 2 months ago - 18.1 thousand downloads last month - 1 star on GitHub - 1 maintainer
chopin2 1.0.9
Supervised Classification with Hyperdimensional Computing
13 versions - Latest release: about 1 year ago - 1 dependent repository - 26 downloads last month - 12 stars on GitHub - 1 maintainer
cleanflow 1.3.3a1
A framework for cleaning, pre-processing and exploring data in a scalable and distributed manner.
11 versions - Latest release: about 7 years ago - 38 downloads last month - 1 star on GitHub - 1 maintainer
sparkhistogram 0.4
Sparkhistogram contains helper functions for generating data histograms with the Spark DataFrame ...
4 versions - Latest release: about 1 year ago - 1 dependent repository - 155 downloads last month - 449 stars on GitHub - 1 maintainer
tpcds-pyspark 1.0.6
TPCDS_PySpark is a TPC-DS workload generator implemented in Python designed to run at scale using...
6 versions - Latest release: about 1 month ago - 96 downloads last month - 449 stars on GitHub - 1 maintainer
test-cpu-parallel 1.0.5
test-CPU-parallel is a basic CPU workload generator.
2 versions - Latest release: almost 2 years ago - 16 downloads last month - 449 stars on GitHub - 1 maintainer
patek 0.5.2
A collection of utilities and tools for accelerating pyspark development and productivity.
7 versions - Latest release: over 2 years ago - 124 downloads last month - 0 stars on GitHub - 1 maintainer
spark-pipeline 0.0.4
Data Science oriented tools, mostly for Apache Spark
2 versions - Latest release: about 5 years ago - 1 dependent repository - 7 downloads last month - 9 stars on GitHub - 1 maintainer
gdmix-workflow 0.3.0
Kubernetes operator for managing the lifecycle of Apache Spark applications on Kubernetes.
5 versions - Latest release: over 4 years ago - 1 dependent repository - 12 downloads last month - 2,912 stars on GitHub - 1 maintainer
parallel-simulations 0.0.1
Helper class to orchestrate in parallel Monte Carlo simulations for an arbitrary number of models...
1 version - Latest release: over 1 year ago - 6 downloads last month - 0 stars on GitHub - 1 maintainer
pybda 0.1.0
Analysis of big biological data sets for distributed HPC clusters.
6 versions - Latest release: almost 6 years ago - 1 dependent repository - 26 downloads last month - 9 stars on GitHub - 1 maintainer
svm 0.1.0
Version manager for Apache Spark
2 versions - Latest release: over 7 years ago - 14 dependent repositories - 988 downloads last month - 1 star on GitHub - 1 maintainer
eleflow-spark-integrations 0.0.1a2
The easy and quick way to connect and integrate the Spark project with many other data sources.
2 versions - Latest release: about 3 years ago - 6 stars on GitHub
Top 6.8% on pypi.org
dataproc-templates 0.0.2
Dataproc templates written in Python
2 versions - Latest release: over 2 years ago - 40 stars on GitHub
jennytest 0.2.9 removed
Data quality and profiling tool powered by Apache Spark.
5 versions - Latest release: almost 5 years ago - 39 downloads last month