pypi.org "pyspark" keyword
View the packages on the pypi.org package registry that are tagged with the "pyspark" keyword.
pysail 0.2.4
Sail Python library
15 versions - Latest release: 8 days ago - 1.7 thousand downloads last month - 12 stars on GitHub - 1 maintainer
koheesio 0.10.2
The steps-based Koheesio framework
18 versions - Latest release: 17 days ago - 316 thousand downloads last month - 634 stars on GitHub - 1 maintainer
autovizwidget 0.22.0
AutoVizWidget: An Auto-Visualization library for pandas dataframes
55 versions - Latest release: 5 months ago - 4 dependent packages - 93 dependent repositories - 335 thousand downloads last month - 1,314 stars on GitHub - 4 maintainers - Top 1.9% on pypi.org
spark-nlp 5.5.3
John Snow Labs Spark NLP is a natural language processing library built on top of Apache Spark ML...
151 versions - Latest release: 3 months ago - 35 dependent packages - 35 dependent repositories - 4.22 million downloads last month - 3,717 stars on GitHub - 3 maintainers - Top 1.4% on pypi.org
metaspore 1.1.0
Metaspore: A Unified End-to-end Machine Intelligence Platform
3 versions - Latest release: over 2 years ago - 160 downloads last month - 533 stars on GitHub - 2 maintainers
h3spark 0.1.6
Lightweight pyspark wrapper for h3-py
12 versions - Latest release: 2 months ago - 399 downloads last month - 1 maintainer
datacompy 0.16.5
Dataframe comparison in Python
42 versions - Latest release: 4 days ago - 7 dependent packages - 16 dependent repositories - 1.13 million downloads last month - 399 stars on GitHub - 4 maintainers - Top 2.7% on pypi.org
pysparkproxy 0.0.17
Seamlessly execute pyspark code on remote clusters
9 versions - Latest release: over 6 years ago - 1 dependent repository - 253 downloads last month - 5 stars on GitHub - 1 maintainer
flake8-pyspark-with-column 0.0.5
A Flake8 plugin to check for PySpark withColumn usage in loops
4 versions - Latest release: 3 months ago - 1.69 thousand downloads last month - 27 stars on GitHub - 1 maintainer
graphlet 0.1.1
Graphlet AI Knowledge Graph Factory
1 version - Latest release: over 2 years ago - 59 downloads last month - 28 stars on GitHub - 1 maintainer
chispa 0.11.1
Pyspark test helper library
23 versions - Latest release: 7 days ago - 7 dependent packages - 54 dependent repositories - 2.16 million downloads last month - 606 stars on GitHub - 1 maintainer - Top 1.7% on pypi.org
hdijupyterutils 0.22.0
HdiJupyterUtils: Utils for Jupyter projects from HDInsight team
54 versions - Latest release: 5 months ago - 3 dependent packages - 92 dependent repositories - 328 thousand downloads last month - 1,326 stars on GitHub - 4 maintainers - Top 1.7% on pypi.org
sparkmagic 0.22.0
SparkMagic: Spark execution via Livy
57 versions - Latest release: 5 months ago - 4 dependent packages - 86 dependent repositories - 32.2 thousand downloads last month - 1,326 stars on GitHub - 5 maintainers - Top 1.7% on pypi.org
pyoptimus 0.1.0
Optimus is the missing framework for cleaning and pre-processing data in a distributed fashion.
32 versions - Latest release: over 2 years ago - 1 dependent repository - 743 downloads last month - 1,447 stars on GitHub - 2 maintainers - Top 9.4% on pypi.org
dapter 0.1.1
Tool to adapt multiple dataframes to one unique format
1 version - Latest release: 8 months ago - 46 downloads last month - 916 stars on GitHub - 1 maintainer
turntable-spoonbill 10.0.5
Productivity-centric Python Big Data Framework
9 versions - Latest release: 4 months ago - 348 downloads last month - 5,698 stars on GitHub - 1 maintainer
ibis-framework 10.4.0
The portable Python dataframe library
139 versions - Latest release: 23 days ago - 25 dependent packages - 130 dependent repositories - 461 thousand downloads last month - 3,333 stars on GitHub - 6 maintainers - Top 1.4% on pypi.org
autofeatures 1.0.2
PySpark Auto Feature Selector
3 versions - Latest release: almost 5 years ago - 179 downloads last month - 9 stars on GitHub - 1 maintainer
glue-utils 0.9.2
Reusable utilities for working with Glue PySpark jobs
29 versions - Latest release: 1 day ago - 10.6 thousand downloads last month - 6 stars on GitHub - 1 maintainer
exelog 0.0.1
Enabling meticulous logging for Spark Applications
1 version - Latest release: over 3 years ago - 1 dependent repository - 60 downloads last month - 5 stars on GitHub - 1 maintainer
optimuspyspark 2.2.32
Optimus is the missing framework for cleaning and pre-processing data in a distributed fashion wi...
83 versions - Latest release: almost 5 years ago - 8 dependent repositories - 5.85 thousand downloads last month - 1,501 stars on GitHub - 2 maintainers - Top 4.8% on pypi.org
codeme 0.1.9
CodeMe - Automatic Python Coder
20 versions - Latest release: over 2 years ago - 450 downloads last month - 1 star on GitHub - 1 maintainer
dataflat 2.0.0
A library to flatten nested data.
8 versions - Latest release: 7 months ago - 653 downloads last month - 11 stars on GitHub - 1 maintainer
bigdatasml 0.1.3
This package calculates average student performances
3 versions - Latest release: over 3 years ago - 138 downloads last month - 1 maintainer
yummy 0.0.11
14 versions - Latest release: about 1 year ago - 1 dependent repository - 312 downloads last month - 34 stars on GitHub - 1 maintainer
hermione-databricks 1.0.7
Tool to create ML project structure inside the databricks framework
18 versions - Latest release: over 4 years ago - 1 dependent repository - 511 downloads last month - 4 stars on GitHub - 1 maintainer
pyspark-eda 1.6.0
A Python package for univariate, bivariate and multivariate data analysis using PySpark
23 versions - Latest release: 10 months ago - 820 downloads last month - 1 maintainer
synapseml 1.0.11
Synapse Machine Learning
23 versions - Latest release: 2 days ago - 2 dependent packages - 3 dependent repositories - 613 thousand downloads last month - 5,119 stars on GitHub - 1 maintainer - Top 3.3% on pypi.org
pyspark-spy 1.0.2
Collect and aggregate on spark events for profitz. In 🐍 way!
3 versions - Latest release: about 4 years ago - 1 dependent repository - 333 downloads last month - 10 stars on GitHub - 1 maintainer
spark-lean 0.3.3
An interactive PySpark-based Data Cleaning Library
4 versions - Latest release: almost 7 years ago - 1 dependent repository - 79 downloads last month - 7 stars on GitHub - 2 maintainers
scimple 1.11.5
Scimplify Plotting, Graph manipulation, Spark Streaming & Kafka and other tools
20 versions - Latest release: almost 5 years ago - 1 dependent repository - 384 downloads last month - 1 star on GitHub - 1 maintainer
mmtfpyspark 0.3.6
Methods for parallel and distributed analysis and mining of the Protein Data Bank using MMTF and ...
10 versions - Latest release: about 6 years ago - 1 dependent repository - 317 downloads last month - 68 stars on GitHub - 2 maintainers
openaivec 0.6.0
Generative mutation for tabular calculation
30 versions - Latest release: 3 days ago - 1.9 thousand downloads last month - 9 stars on GitHub - 1 maintainer
jsonspark 0.0.2
This is a wrapper package for pyspark to process json files. It pythonifies the json pyspark object.
2 versions - Latest release: almost 4 years ago - 1 dependent repository - 77 downloads last month - 4 stars on GitHub - 1 maintainer
sparkql 0.10.0
sparkql: Apache Spark SQL DataFrame schema management for sensible humans
20 versions - Latest release: over 1 year ago - 1 dependent repository - 7.03 thousand downloads last month - 14 stars on GitHub - 1 maintainer
pyspark-typedschema 0.0.6
Define (typed) schemas for pyspark dataframes
6 versions - Latest release: 3 months ago - 229 downloads last month - 1 star on GitHub - 1 maintainer
pyspark-val 0.1.4
PySpark validation & testing tooling
5 versions - Latest release: about 1 year ago - 219 downloads last month - 0 stars on GitHub - 1 maintainer
compars 0.0.0
DataFrame comparison done right (AKA the Bear-agnostic DataFrame comparison library)
1 version - Latest release: 12 months ago - 58 downloads last month - 0 stars on GitHub - 1 maintainer
sysxtract 1.0.0
Extract logs based off events from sysmon. Comes as a package, cli and ui.
1 version - Latest release: almost 5 years ago - 1 dependent repository - 48 downloads last month - 3 stars on GitHub - 1 maintainer
hlink 4.1.0
Fast supervised pyspark record linkage software
24 versions - Latest release: 4 days ago - 637 downloads last month - 12 stars on GitHub - 4 maintainers
pyspark-connectors 0.3.0
The easy and quick way to connect and integrate the Spark project with many other data sources.
9 versions - Latest release: 10 months ago - 281 downloads last month - 6 stars on GitHub - 1 maintainer
pbspark 0.9.0
Convert between protobuf messages and pyspark dataframes
11 versions - Latest release: almost 2 years ago - 1 dependent repository - 222 thousand downloads last month - 23 stars on GitHub - 1 maintainer
spark-map 0.2.78
Pyspark implementation of `map()` function for spark DataFrames
3 versions - Latest release: almost 2 years ago - 318 downloads last month - 1 star on GitHub - 1 maintainer
pyspark_dfreport 0.1
Simple Python Package to save (small) PySpark DataFrames to one Excel File.
1 version - Latest release: over 8 years ago - 24 downloads last month - 0 stars on GitHub - 1 maintainer
pyspark-hnsw 1.1.0
Java library for approximate nearest neighbors search using Hierarchical Navigable Small World gr...
30 versions - Latest release: over 2 years ago - 1 dependent repository - 103 thousand downloads last month - 266 stars on GitHub - 1 maintainer - Top 8.9% on pypi.org
hnswlib-spark 0.0.0
Pyspark module for hnswlib
10 versions - Latest release: 2 months ago - 422 downloads last month - 266 stars on GitHub - 1 maintainer
scopt 0.0.5
Calculate optimized properties of Spark configuration
5 versions - Latest release: about 3 years ago - 1 dependent repository - 571 downloads last month - 5 stars on GitHub - 1 maintainer
dbloy 0.3.0
Continuous Delivery tool for PySpark Notebooks based jobs on Databricks.
1 version - Latest release: over 5 years ago - 1 dependent repository - 37 downloads last month - 1 star on GitHub - 1 maintainer
pyspark-easy 1.5
Makes pyspark dataframe exploration easy
10 versions - Latest release: about 4 years ago - 1 dependent repository - 216 downloads last month - 1 star on GitHub - 1 maintainer
sparkhpc 0.1
spark deployment on hpc resources made easy
11 versions - Latest release: over 1 year ago - 1 dependent repository - 202 downloads last month - 1 maintainer
cuallee 0.15.2
Python library for data validation on DataFrame APIs including Snowflake/Snowpark, Apache/PySpark...
94 versions - Latest release: 4 months ago - 1 dependent package - 1 dependent repository - 47.1 thousand downloads last month - 185 stars on GitHub - 2 maintainers
pyspark-asyncactions 0.0.4 💰
A proof of concept asynchronous actions for PySpark using concurrent.futures
4 versions - Latest release: over 4 years ago - 1 dependent repository - 5.57 thousand downloads last month - 47 stars on GitHub - 1 maintainer
spark-scaffolder-transforms-tools 0.0.1
spark_scaffolder_transforms_tools
1 version - Latest release: about 1 year ago - 46 downloads last month - 1 maintainer
pagaya-mapinpandas 0.5
Easy python wrapper for Spark mapInPandas, applyInPandas
1 version - Latest release: almost 3 years ago - 30 downloads last month - 1 maintainer
replay-rec 0.18.1
RecSys Library
25 versions - Latest release: about 1 month ago - 1 dependent package - 1 dependent repository - 2.71 thousand downloads last month - 137 stars on GitHub - 1 maintainer
pyspark-model-plus 1.0.2
Enhancements to commonly used pyspark functions for building models
2 versions - Latest release: about 4 years ago - 1 dependent repository - 108 downloads last month - 0 stars on GitHub - 1 maintainer
tinsel 0.3.0
PySpark schema generator
3 versions - Latest release: over 6 years ago - 1 dependent repository - 237 thousand downloads last month - 1 maintainer
jupyterlab-sparkmonitor 4.1.0
Spark Monitor Extension for Jupyter Lab
14 versions - Latest release: over 3 years ago - 1 dependent repository - 411 downloads last month - 92 stars on GitHub - 1 maintainer - Top 9.0% on pypi.org
pyspark-sugar 0.4.1
SparkUI enhancements with pyspark
4 versions - Latest release: about 6 years ago - 1 dependent repository - 22.3 thousand downloads last month - 5 stars on GitHub - 1 maintainer
pyspark3d 0.3.1
Spark extension for processing large-scale 3D data sets
4 versions - Latest release: over 6 years ago - 1 dependent repository - 154 downloads last month - 30 stars on GitHub - 1 maintainer
pyspark_db_utils 0.0.7
Useful functions for working with Database in PySpark (PostgreSQL, ClickHouse)
6 versions - Latest release: almost 7 years ago - 274 downloads last month - 8 stars on GitHub - 1 maintainer
ydot 0.0.6 💰
R-like formulas for Spark Dataframes
6 versions - Latest release: over 4 years ago - 1 dependent repository - 134 downloads last month - 10 stars on GitHub - 1 maintainer
fink-science 7.4.0
User-defined science module for the Fink broker.
64 versions - Latest release: 28 days ago - 1 dependent package - 5 dependent repositories - 1.78 thousand downloads last month - 11 stars on GitHub - 1 maintainer
getallcolumnname 0.2
getallcolumns
2 versions - Latest release: 10 months ago - 100 downloads last month - 1 maintainer
spark-connect-proxy 0.0.11
A reverse proxy server which allows secure connectivity to a Spark Connect server
8 versions - Latest release: 3 months ago - 272 downloads last month - 9 stars on GitHub - 1 maintainer
google-dataproc-templates 0.1.0
Google Dataproc templates written in Python
12 versions - Latest release: about 2 years ago - 276 downloads last month - 126 stars on GitHub - 3 maintainers
ibmaemagic 0.0.4
Make accessing IBM Analytic Engine easier.
4 versions - Latest release: over 4 years ago - 1 dependent repository - 165 downloads last month - 4 maintainers
pramen-py 1.11.2
Pramen transformations written in python
51 versions - Latest release: about 2 months ago - 1.51 thousand downloads last month - 24 stars on GitHub - 3 maintainers
spark-xarray 0.1.dev0
This is an experimental project that seeks to integrate PySpark and xarray for Climate Data Analy...
1 version - Latest release: over 1 year ago - 354 downloads last month - 22 stars on GitHub - 1 maintainer
databricks-bridge 0.0.4
Databricks read and write with sql connection
4 versions - Latest release: about 1 year ago - 161 downloads last month - 1 maintainer
sparkpolars 0.1.0
Conversion between PySpark and Polars DataFrames
11 versions - Latest release: 2 months ago - 614 downloads last month - 2 stars on GitHub - 1 maintainer
pyspark-supp 0.1.0
Data Engineer Support PySpark Library
2 versions - Latest release: almost 2 years ago - 75 downloads last month - 1 star on GitHub - 1 maintainer
data-manipulation 0.48
Powerful data manipulation
43 versions - Latest release: 5 months ago - 1 dependent repository - 1.03 thousand downloads last month - 2 stars on GitHub - 1 maintainer
petastorm 0.12.1
Petastorm is a library enabling the use of Parquet storage from Tensorflow, Pytorch, and other Py...
86 versions - Latest release: over 2 years ago - 4 dependent packages - 26 dependent repositories - 179 thousand downloads last month - 1,828 stars on GitHub - 2 maintainers - Top 2.3% on pypi.org
firespark 0.0.32
FireSpark data processing utility library
16 versions - Latest release: almost 5 years ago - 1 dependent repository - 562 downloads last month - 1,828 stars on GitHub - 1 maintainer
hops-petastorm 0.9.4
Petastorm is a library enabling the use of Parquet storage from Tensorflow, Pytorch, and other Py...
6 versions - Latest release: over 4 years ago - 1 dependent repository - 156 downloads last month - 1,828 stars on GitHub - 3 maintainers
sparkorm 1.2.29
SparkORM: Python Spark SQL & DataFrame schema management and basic Object Relational Mapping.
33 versions - Latest release: 10 months ago - 367 thousand downloads last month - 14 stars on GitHub - 1 maintainer
sparkpickle 1.0.1
Provides functions for reading SequenceFile-s with Python pickles.
2 versions - Latest release: over 8 years ago - 1 dependent repository - 4.95 thousand downloads last month - 24 stars on GitHub - 1 maintainer
typed-pyspark 0.0.5
Contains a set of abstractions to type annotate and validate dataframes in pyspark
5 versions - Latest release: about 3 years ago - 1 dependent package - 21.3 thousand downloads last month - 14 stars on GitHub - 3 maintainers
typedspark 1.5.3
Column-wise type annotations for pyspark DataFrames
44 versions - Latest release: 10 days ago - 1 dependent package - 59.5 thousand downloads last month - 74 stars on GitHub - 1 maintainer
fink-filters 0.2.18
User-defined filters for the Fink broker.
88 versions - Latest release: over 3 years ago - 3 dependent repositories - 2.44 thousand downloads last month - 1 star on GitHub - 1 maintainer
sparglim 0.2.1 💰
sparglim
16 versions - Latest release: about 1 year ago - 1 dependent package - 1 dependent repository - 365 downloads last month - 37 stars on GitHub - 1 maintainer
pysparkdt 1.0.1
An open-source Python library for simplifying local testing of Databricks workflows that use PySp...
2 versions - Latest release: 4 months ago - 90 downloads last month - 27 stars on GitHub - 1 maintainer
butterfree 1.7.2
A tool for building feature stores - Transform your raw data into beautiful features.
54 versions - Latest release: 10 days ago - 1 dependent repository - 85 thousand downloads last month - 268 stars on GitHub - 1 maintainer - Top 8.6% on pypi.org
pyspark-event-correlation 1.0.2
Event Correlation and Changing Detection Algorithm
3 versions - Latest release: about 3 years ago - 1 dependent repository - 135 downloads last month - 1 star on GitHub - 1 maintainer
h2o-pysparkling-scoring-2.3 3.46.0.6.post1
Sparkling Water integrates H2O's Fast Scalable Machine Learning with Spark
43 versions - Latest release: 5 months ago - 464 downloads last month - 968 stars on GitHub - 1 maintainer
h2o-pysparkling-3.5 0.0.0
Sparkling Water integrates H2O's Fast Scalable Machine Learning with Spark
10 versions - Latest release: over 1 year ago - 15.7 thousand downloads last month - 968 stars on GitHub - 1 maintainer
h2o-pysparkling-2.1 3.26.11
Sparkling Water integrates H2O's Fast Scalable Machine Learning with Spark
88 versions - Latest release: over 5 years ago - 913 downloads last month - 968 stars on GitHub - 1 maintainer
h2o-pysparkling-scoring-3.2 3.46.0.6.post1
Sparkling Water integrates H2O's Fast Scalable Machine Learning with Spark
28 versions - Latest release: 5 months ago - 322 downloads last month - 968 stars on GitHub - 1 maintainer
h2o-pysparkling-scoring-2.2 3.36.1.5.post1
Sparkling Water integrates H2O's Fast Scalable Machine Learning with Spark
22 versions - Latest release: over 2 years ago - 237 downloads last month - 968 stars on GitHub - 1 maintainer
h2o-pysparkling-scoring-3.0 3.46.0.6.post1
Sparkling Water integrates H2O's Fast Scalable Machine Learning with Spark
43 versions - Latest release: 5 months ago - 433 downloads last month - 968 stars on GitHub - 1 maintainer
h2o-pysparkling-2.2 3.26.11
Sparkling Water integrates H2O's Fast Scalable Machine Learning with Spark
97 versions - Latest release: over 5 years ago - 996 downloads last month - 968 stars on GitHub - 1 maintainer
h2o-pysparkling-3.0 3.46.0.6.post1
Sparkling Water integrates H2O's Fast Scalable Machine Learning with Spark
54 versions - Latest release: 5 months ago - 2 dependent repositories - 4.58 thousand downloads last month - 968 stars on GitHub - 1 maintainer - Top 5.2% on pypi.org
h2o-pysparkling-scoring-3.1 3.46.0.6.post1
Sparkling Water integrates H2O's Fast Scalable Machine Learning with Spark
42 versions - Latest release: 5 months ago - 588 downloads last month - 968 stars on GitHub - 1 maintainer - Top 8.8% on pypi.org
h2o-pysparkling-2.4 3.26.11
Sparkling Water integrates H2O's Fast Scalable Machine Learning with Spark
86 versions - Latest release: over 5 years ago - 12 dependent repositories - 36.3 thousand downloads last month - 968 stars on GitHub - 1 maintainer - Top 3.7% on pypi.org
h2o-pysparkling-scoring-2.4 3.46.0.6.post1
Sparkling Water integrates H2O's Fast Scalable Machine Learning with Spark
43 versions - Latest release: 5 months ago - 442 downloads last month - 968 stars on GitHub - 1 maintainer
h2o-pysparkling-3.4 0.0.2
Sparkling Water integrates H2O's Fast Scalable Machine Learning with Spark
15 versions - Latest release: almost 2 years ago - 3.35 thousand downloads last month - 968 stars on GitHub - 1 maintainer
h2o-pysparkling-3.3 3.46.0.6.post1
Sparkling Water integrates H2O's Fast Scalable Machine Learning with Spark
25 versions - Latest release: 5 months ago - 27 thousand downloads last month - 968 stars on GitHub - 1 maintainer - Top 9.0% on pypi.org
h2o-pysparkling-3.2 3.46.0.6.post1
Sparkling Water integrates H2O's Fast Scalable Machine Learning with Spark
29 versions - Latest release: 5 months ago - 11.3 thousand downloads last month - 968 stars on GitHub - 1 maintainer - Top 8.7% on pypi.org
h2o-pysparkling-2.3 3.26.11
Sparkling Water integrates H2O's Fast Scalable Machine Learning with Spark
107 versions - Latest release: over 5 years ago - 1.05 thousand downloads last month - 968 stars on GitHub - 1 maintainer - Top 9.2% on pypi.org
Related Keywords
spark: 128
python: 83
machine-learning: 39
big-data: 34
scala: 33
apache-spark: 26
pandas: 25
databricks: 24
distributed: 23
machine learning: 23
big data: 22
modeling: 21
data mining: 20
statistical analysis: 20
parallel: 20
h2o: 20
integration: 20
pysparkling: 20
rsparkling: 20
dataframe: 19
data-science: 17
etl: 17
data: 15
jupyter: 13
data-engineering: 13
polars: 13
bigdata: 11
lakehouse: 10
python3: 10
jupyter-notebook: 10
scoring: 10
delta-spark: 9
deep-learning: 9
aws: 9
sql: 9
dask: 8
pandas-dataframe: 8
notebook: 8
magic: 8
cluster: 8
data-analysis: 7
schema: 7
pytorch: 7
pyarrow: 7
kerberos: 7
kernel: 7
livy: 7
sql-query: 7
tensorflow: 6
azure: 6
ai: 6
mysql: 6
test: 6
testing: 6
ipython: 5
bigquery: 5
pipelines: 5
parquet: 5
data-quality: 5
analysis: 5
emr: 4
opencv: 4
data-cleaning: 4
pytest: 4
onnx: 4
duckdb: 4
model-deployment: 4
apachespark: 4
clickhouse: 4
impala: 4
graphs: 4
ml: 4
microsoft: 4
lightgbm: 4
spark-sql: 4
mssql: 4
postgresql: 4
http: 4
snowflake: 4
cognitive-services: 4
distributed-computing: 4
glue: 4
workflow: 4
framework: 4
postgres: 4
synapse: 4
dataframes: 4
datafusion: 4
utils: 4
learning: 4
Spark: 4
pipeline: 4
gcp: 3
transformers: 3
machinelearning: 3
apache: 3
spine: 3
library: 3
dag: 3
core: 3