Ecosyste.ms: Packages
An open API service providing package, version and dependency metadata of many open source software ecosystems and registries.
pypi.org "pyspark" keyword
irmagic 0.1.0 💰
Intelligent Reliability Platform
2 versions - Latest release: over 6 years ago - 1 dependent repository - 24 downloads last month - 5,003 stars on GitHub - 1 maintainer
miss-lightgbm-mmlspark
Microsoft ML for Spark
1 version - 4,986 stars on GitHub
synapseml 1.0.4
Synapse Machine Learning
Top 3.3% on pypi.org
16 versions - Latest release: about 2 months ago - 2 dependent packages - 3 dependent repositories - 230 thousand downloads last month - 4,985 stars on GitHub - 1 maintainer
dcborow-mmlspark 0.14.dev1
Microsoft ML for Spark
1 version - Latest release: over 4 years ago - 1 dependent repository - 58 downloads last month - 4,972 stars on GitHub - 1 maintainer
nozberkman-mmlspark 1.0.0
Microsoft ML for Spark
1 version - Latest release: over 2 years ago - 1 dependent repository - 18 downloads last month - 4,472 stars on GitHub - 1 maintainer
turntable-spoonbill 10.0.0
Productivity-centric Python Big Data Framework
5 versions - Latest release: 19 days ago - 239 downloads last month - 4,327 stars on GitHub - 1 maintainer
spark-nlp 5.3.3
John Snow Labs Spark NLP is a natural language processing library built on top of Apache Spark ML...
Top 1.4% on pypi.org
141 versions - Latest release: about 2 months ago - 35 dependent packages - 35 dependent repositories - 4.15 million downloads last month - 3,717 stars on GitHub - 3 maintainers
ibis-framework 9.0.0
The portable Python dataframe library
Top 1.4% on pypi.org
94 versions - Latest release: about 1 month ago - 25 dependent packages - 130 dependent repositories - 186 thousand downloads last month - 3,333 stars on GitHub - 6 maintainers
ckls-test-lib 4.2.7 (removed)
John Snow Labs Spark NLP is a natural language processing library built on top of Apache Spark ML...
Top 10.0% on pypi.org
1 version - Latest release: over 1 year ago - 85 downloads last month - 3,066 stars on GitHub - 1 maintainer
firespark 0.0.32
FireSpark data processing utility library
16 versions - Latest release: about 4 years ago - 1 dependent repository - 140 downloads last month - 1,752 stars on GitHub - 1 maintainer
hops-petastorm 0.9.4
Petastorm is a library enabling the use of Parquet storage from Tensorflow, Pytorch, and other Py...
6 versions - Latest release: over 3 years ago - 1 dependent repository - 93 downloads last month - 1,752 stars on GitHub - 3 maintainers
petastorm 0.12.1
Petastorm is a library enabling the use of Parquet storage from Tensorflow, Pytorch, and other Py...
Top 2.3% on pypi.org
86 versions - Latest release: over 1 year ago - 4 dependent packages - 26 dependent repositories - 36.9 thousand downloads last month - 1,752 stars on GitHub - 2 maintainers
pyoptimus 0.1.0
Optimus is the missing framework for cleaning and pre-processing data in a distributed fashion.
Top 9.4% on pypi.org
32 versions - Latest release: over 1 year ago - 1 dependent repository - 240 downloads last month - 1,447 stars on GitHub - 2 maintainers
optimuspyspark 2.2.32
Optimus is the missing framework for cleaning and pre-processing data in a distributed fashion wi...
Top 4.8% on pypi.org
83 versions - Latest release: almost 4 years ago - 8 dependent repositories - 10.7 thousand downloads last month - 1,441 stars on GitHub - 2 maintainers
autovizwidget 0.21.0
AutoVizWidget: An Auto-Visualization library for pandas dataframes
Top 1.9% on pypi.org
54 versions - Latest release: 9 months ago - 4 dependent packages - 93 dependent repositories - 86 thousand downloads last month - 1,289 stars on GitHub - 4 maintainers
emr-serverless-customauth
EMR Serverless Custom Authenticator for spark magic kernel.
1 version - 122 downloads last month - 1,289 stars on GitHub - 1 maintainer
hdijupyterutils 0.21.0
HdiJupyterUtils: Utils for Jupyter projects from HDInsight team
Top 1.7% on pypi.org
53 versions - Latest release: 9 months ago - 3 dependent packages - 92 dependent repositories - 86.4 thousand downloads last month - 1,289 stars on GitHub - 4 maintainers
sparkmagic 0.21.0
SparkMagic: Spark execution via Livy
Top 1.7% on pypi.org
56 versions - Latest release: 9 months ago - 4 dependent packages - 86 dependent repositories - 46.3 thousand downloads last month - 1,273 stars on GitHub - 5 maintainers
h2o-pysparkling-scoring-3.1 3.46.0.1.post1
Sparkling Water integrates H2O's Fast Scalable Machine Learning with Spark
Top 8.8% on pypi.org
37 versions - Latest release: about 2 months ago - 1.49 thousand downloads last month - 952 stars on GitHub - 1 maintainer
h2o-pysparkling-3.0 3.46.0.1.post1
Sparkling Water integrates H2O's Fast Scalable Machine Learning with Spark
Top 5.2% on pypi.org
49 versions - Latest release: about 2 months ago - 2 dependent repositories - 8.52 thousand downloads last month - 952 stars on GitHub - 1 maintainer
h2o-pysparkling-scoring-2.1 3.32.1.7-1
Sparkling Water integrates H2O's Fast Scalable Machine Learning with Spark
7 versions - Latest release: over 2 years ago - 17 downloads last month - 952 stars on GitHub - 1 maintainer
h2o-pysparkling-2.2 3.26.11
Sparkling Water integrates H2O's Fast Scalable Machine Learning with Spark
97 versions - Latest release: over 4 years ago - 351 downloads last month - 952 stars on GitHub - 1 maintainer
h2o-pysparkling-3.4 0.0.2
Sparkling Water integrates H2O's Fast Scalable Machine Learning with Spark
10 versions - Latest release: about 1 year ago - 6.18 thousand downloads last month - 952 stars on GitHub - 1 maintainer
h2o-pysparkling-2.3 3.26.11
Sparkling Water integrates H2O's Fast Scalable Machine Learning with Spark
Top 9.2% on pypi.org
102 versions - Latest release: over 4 years ago - 524 downloads last month - 952 stars on GitHub - 1 maintainer
h2o-pysparkling-2.1 3.26.11
Sparkling Water integrates H2O's Fast Scalable Machine Learning with Spark
88 versions - Latest release: over 4 years ago - 394 downloads last month - 952 stars on GitHub - 1 maintainer
h2o-pysparkling-scoring-2.4 3.46.0.1.post1
Sparkling Water integrates H2O's Fast Scalable Machine Learning with Spark
38 versions - Latest release: about 2 months ago - 91 downloads last month - 952 stars on GitHub - 1 maintainer
h2o-pysparkling-3.2 3.46.0.1.post1
Sparkling Water integrates H2O's Fast Scalable Machine Learning with Spark
Top 8.7% on pypi.org
24 versions - Latest release: about 2 months ago - 40.4 thousand downloads last month - 952 stars on GitHub - 1 maintainer
h2o-pysparkling-scoring-3.2 3.46.0.1.post1
Sparkling Water integrates H2O's Fast Scalable Machine Learning with Spark
23 versions - Latest release: about 2 months ago - 52 downloads last month - 952 stars on GitHub - 1 maintainer
h2o-pysparkling-scoring-3.4 3.46.0.1.post1
Sparkling Water integrates H2O's Fast Scalable Machine Learning with Spark
8 versions - Latest release: about 2 months ago - 185 downloads last month - 952 stars on GitHub - 1 maintainer
h2o-pysparkling-scoring-3.3 3.46.0.1.post1
Sparkling Water integrates H2O's Fast Scalable Machine Learning with Spark
19 versions - Latest release: about 2 months ago - 97 downloads last month - 952 stars on GitHub - 1 maintainer
h2o-pysparkling-3.1 3.46.0.1.post1
Sparkling Water integrates H2O's Fast Scalable Machine Learning with Spark
Top 5.0% on pypi.org
37 versions - Latest release: about 2 months ago - 2 dependent repositories - 75.6 thousand downloads last month - 952 stars on GitHub - 1 maintainer
h2o-pysparkling-scoring-3.5 3.46.0.1.post1
Sparkling Water integrates H2O's Fast Scalable Machine Learning with Spark
4 versions - Latest release: about 2 months ago - 155 downloads last month - 952 stars on GitHub - 1 maintainer
h2o-pysparkling-3.5 0.0.0
Sparkling Water integrates H2O's Fast Scalable Machine Learning with Spark
5 versions - Latest release: 8 months ago - 1.27 thousand downloads last month - 952 stars on GitHub - 1 maintainer
h2o-pysparkling-3.3 3.46.0.1.post1
Sparkling Water integrates H2O's Fast Scalable Machine Learning with Spark
Top 9.0% on pypi.org
20 versions - Latest release: about 2 months ago - 32 thousand downloads last month - 952 stars on GitHub - 1 maintainer
h2o-pysparkling-scoring-2.2 3.36.1.5.post1
Sparkling Water integrates H2O's Fast Scalable Machine Learning with Spark
22 versions - Latest release: over 1 year ago - 71 downloads last month - 952 stars on GitHub - 1 maintainer
h2o-pysparkling-scoring-3.0 3.46.0.1.post1
Sparkling Water integrates H2O's Fast Scalable Machine Learning with Spark
38 versions - Latest release: about 2 months ago - 91 downloads last month - 952 stars on GitHub - 1 maintainer
h2o-pysparkling-scoring-2.3 3.46.0.1.post1
Sparkling Water integrates H2O's Fast Scalable Machine Learning with Spark
38 versions - Latest release: about 2 months ago - 93 downloads last month - 952 stars on GitHub - 1 maintainer
h2o-pysparkling-2.4 3.26.11
Sparkling Water integrates H2O's Fast Scalable Machine Learning with Spark
Top 3.7% on pypi.org
81 versions - Latest release: over 4 years ago - 12 dependent repositories - 11.2 thousand downloads last month - 952 stars on GitHub - 1 maintainer
metaspore 1.1.0
Metaspore: A Unified End-to-end Machine Intelligence Platform
3 versions - Latest release: over 1 year ago - 10 downloads last month - 626 stars on GitHub - 2 maintainers
quinn 0.10.3
Pyspark helper methods to maximize developer efficiency
Top 3.3% on pypi.org
16 versions - Latest release: 4 months ago - 3 dependent packages - 11 dependent repositories - 781 thousand downloads last month - 580 stars on GitHub - 1 maintainer
datacompy 0.12.0
Dataframe comparison in Python
Top 2.7% on pypi.org
25 versions - Latest release: about 1 month ago - 7 dependent packages - 16 dependent repositories - 683 thousand downloads last month - 399 stars on GitHub - 4 maintainers
tdigest 0.4.0 💰
T-Digest data structure
Top 2.9% on pypi.org
14 versions - Latest release: almost 9 years ago - 13 dependent packages - 41 dependent repositories - 259 thousand downloads last month - 376 stars on GitHub - 1 maintainer
mack 0.5.0
Delta Lake helper methods in PySpark
5 versions - Latest release: 4 months ago - 12.3 thousand downloads last month - 271 stars on GitHub - 1 maintainer
sparkdantic 0.20.5
A pydantic -> spark schema library
34 versions - Latest release: 3 months ago - 33.7 thousand downloads last month - 270 stars on GitHub - 1 maintainer
dbldatagen 0.3.6
Databricks Labs - PySpark Synthetic Data Generator
Top 5.7% on pypi.org
13 versions - Latest release: 3 months ago - 1 dependent package - 2 dependent repositories - 171 thousand downloads last month - 270 stars on GitHub - 1 maintainer
butterfree 1.2.4
A tool for building feature stores - Transform your raw data into beautiful features.
Top 8.6% on pypi.org
36 versions - Latest release: about 1 month ago - 1 dependent repository - 78.5 thousand downloads last month - 268 stars on GitHub - 1 maintainer
pyspark-hnsw 1.1.0
Java library for approximate nearest neighbors search using Hierarchical Navigable Small World gr...
Top 8.9% on pypi.org
30 versions - Latest release: over 1 year ago - 1 dependent repository - 31.4 thousand downloads last month - 242 stars on GitHub - 1 maintainer
spark-df-profiling 1.1.13
Create HTML profiling reports from Apache Spark DataFrames
Top 6.6% on pypi.org
13 versions - Latest release: over 7 years ago - 2 dependent repositories - 53.9 thousand downloads last month - 194 stars on GitHub - 1 maintainer
spark-df-profiling-new 1.1.14
Create HTML profiling reports from Apache Spark DataFrames
Top 8.6% on pypi.org
1 version - Latest release: about 3 years ago - 1 dependent repository - 38.6 thousand downloads last month - 194 stars on GitHub - 1 maintainer
handyspark 0.2.2a1
HandySpark - bringing pandas-like capabilities to Spark dataframes
Top 6.7% on pypi.org
7 versions - Latest release: about 5 years ago - 3 dependent repositories - 156 thousand downloads last month - 183 stars on GitHub - 1 maintainer
pyspark-extension 2.12.0.3.5
A library that provides useful extensions to Apache Spark.
45 versions - Latest release: about 1 month ago - 38.6 thousand downloads last month - 174 stars on GitHub - 1 maintainer
replay-rec 0.16.0
RecSys Library
17 versions - Latest release: 3 months ago - 1 dependent package - 1 dependent repository - 6.17 thousand downloads last month - 127 stars on GitHub - 1 maintainer
pyspark-stubs 3.0.0 💰
A collection of the Apache Spark stub files
Top 3.7% on pypi.org
38 versions - Latest release: almost 4 years ago - 2 dependent packages - 146 dependent repositories - 187 thousand downloads last month - 114 stars on GitHub - 1 maintainer
google-dataproc-templates 0.1.0
Google Dataproc templates written in Python
8 versions - Latest release: about 1 year ago - 29.6 thousand downloads last month - 111 stars on GitHub - 3 maintainers
cuallee 0.10.2
Python library for data validation on DataFrame APIs including Snowflake/Snowpark, Apache/PySpark...
76 versions - Latest release: 22 days ago - 1 dependent package - 1 dependent repository - 11.6 thousand downloads last month - 111 stars on GitHub - 2 maintainers
jupyterlab-sparkmonitor 4.1.0
Spark Monitor Extension for Jupyter Lab
Top 9.0% on pypi.org
14 versions - Latest release: almost 3 years ago - 1 dependent repository - 10.3 thousand downloads last month - 91 stars on GitHub - 1 maintainer
pytest-spark 0.6.0
pytest plugin to run the tests with support of pyspark.
Top 3.9% on pypi.org
15 versions - Latest release: over 4 years ago - 7 dependent packages - 71 dependent repositories - 290 thousand downloads last month - 82 stars on GitHub - 1 maintainer
anovos 1.1.0
An Open Source tool for Feature Engineering in Machine Learning
Top 9.3% on pypi.org
8 versions - Latest release: over 1 year ago - 2 dependent repositories - 2.01 thousand downloads last month - 77 stars on GitHub - 1 maintainer
sourced-spark-api 0.0.12
API to use Spark on top of source code repositories.
10 versions - Latest release: over 6 years ago - 1 dependent repository - 35 downloads last month - 71 stars on GitHub - 1 maintainer
sourced-jgit-spark-connector 2.0.1
Engine to use Spark on top of source code repositories.
2 versions - Latest release: over 5 years ago - 1 dependent repository - 26 downloads last month - 71 stars on GitHub - 1 maintainer
mmtfpyspark 0.3.6
Methods for parallel and distributed analysis and mining of the Protein Data Bank using MMTF and ...
10 versions - Latest release: over 5 years ago - 1 dependent repository - 41 downloads last month - 67 stars on GitHub - 2 maintainers
sparksteps 3.0.1
Workflow tool to launch Spark jobs on AWS EMR
20 versions - Latest release: over 3 years ago - 1 dependent repository - 266 downloads last month - 67 stars on GitHub - 2 maintainers
sparkly 2.8.2
Helpers & syntax sugar for PySpark.
21 versions - Latest release: almost 4 years ago - 1 dependent repository - 14.1 thousand downloads last month - 60 stars on GitHub - 3 maintainers
soda-spark 0.3.3
Soda SQL API for PySpark data frame
Top 9.9% on pypi.org
11 versions - Latest release: about 2 years ago - 1 dependent package - 1 dependent repository - 46.3 thousand downloads last month - 60 stars on GitHub - 1 maintainer
sparkora 0.0.1
Exploratory data analysis toolkit for Pyspark
1 version - Latest release: over 2 years ago - 1 dependent repository - 16 downloads last month - 53 stars on GitHub - 1 maintainer
typedspark 1.4.2
Column-wise type annotations for pyspark DataFrames
38 versions - Latest release: about 1 month ago - 1 dependent package - 11.9 thousand downloads last month - 52 stars on GitHub - 1 maintainer
pyspark-asyncactions 0.0.4 💰
A proof of concept of asynchronous actions for PySpark using concurrent.futures
4 versions - Latest release: over 3 years ago - 1 dependent repository - 5.19 thousand downloads last month - 44 stars on GitHub - 1 maintainer
cluster-pack 0.3.7
A library on top of either pex or conda-pack to make your Python code easily available on a cluster
Top 7.7% on pypi.org
44 versions - Latest release: about 1 month ago - 2 dependent packages - 5 dependent repositories - 1.27 thousand downloads last month - 43 stars on GitHub - 7 maintainers
dataproc-templates 0.0.2 (removed)
Dataproc templates written in Python
Top 6.8% on pypi.org
2 versions - Latest release: over 1 year ago - 40 stars on GitHub
pyjaws 0.1.7
PyJaws: A Pythonic Way to Define Databricks Jobs and Workflows
10 versions - Latest release: 9 months ago - 1 dependent repository - 46 downloads last month - 38 stars on GitHub - 1 maintainer
sparkdataset 1.0.0
Provides instant access to many popular datasets right from Pyspark (in dataframe structure).
1 version - Latest release: over 2 years ago - 1 dependent repository - 9 downloads last month - 34 stars on GitHub - 1 maintainer
checkengine 0.2.0 (removed)
Data-quality checks for PySpark
1 version - Latest release: almost 3 years ago - 16 downloads last month - 30 stars on GitHub
yummy 0.0.11
14 versions - Latest release: 4 months ago - 1 dependent repository - 64 downloads last month - 30 stars on GitHub - 1 maintainer
pyspark3d 0.3.1
Spark extension for processing large-scale 3D data sets
4 versions - Latest release: over 5 years ago - 1 dependent repository - 62 downloads last month - 29 stars on GitHub - 1 maintainer
sparglim 0.2.1 💰
sparglim
16 versions - Latest release: 4 months ago - 1 dependent package - 1 dependent repository - 75 downloads last month - 28 stars on GitHub - 1 maintainer
graphlet 0.1.1
Graphlet AI Knowledge Graph Factory
1 version - Latest release: almost 2 years ago - 12 downloads last month - 27 stars on GitHub - 1 maintainer
dummyrdd 0.1.2
A pure python mocked version of pyspark's rdd class
11 versions - Latest release: about 7 years ago - 1 dependent repository - 51 downloads last month - 27 stars on GitHub - 1 maintainer
dummy_spark 0.0.1
A pure python mocked version of pyspark's rdd class
1 version - Latest release: almost 8 years ago - 32 downloads last month - 27 stars on GitHub - 1 maintainer
sparkpickle 1.0.1
Provides functions for reading SequenceFile-s with Python pickles.
2 versions - Latest release: over 7 years ago - 1 dependent repository - 5.3 thousand downloads last month - 24 stars on GitHub - 1 maintainer
spark-xarray 0.1.dev0
This is an experimental project that seeks to integrate PySpark and xarray for Climate Data Analy...
1 version - Latest release: 6 months ago - 41 downloads last month - 22 stars on GitHub - 1 maintainer
pramen-py 1.8.8
Pramen transformations written in python
30 versions - Latest release: 17 days ago - 401 downloads last month - 22 stars on GitHub - 3 maintainers
pbspark 0.9.0
Convert between protobuf messages and pyspark dataframes
11 versions - Latest release: 12 months ago - 1 dependent repository - 276 thousand downloads last month - 21 stars on GitHub - 1 maintainer
pypair 3.0.9 💰
Pairwise association measures of statistical variable types
11 versions - Latest release: over 2 years ago - 1 dependent repository - 954 downloads last month - 21 stars on GitHub - 1 maintainer
pyspark-test 0.2.0
Check that left and right spark DataFrames are equal.
Top 9.6% on pypi.org
2 versions - Latest release: over 2 years ago - 5 dependent repositories - 169 thousand downloads last month - 21 stars on GitHub - 1 maintainer
spetlr 5.1.6
A python ETL libRary (SPETLR) for Databricks powered by Apache SPark.
53 versions - Latest release: 18 days ago - 1 dependent package - 22.4 thousand downloads last month - 18 stars on GitHub - 1 maintainer
mse 0.1.4
Make Structs Easy (MSE)
3 versions - Latest release: about 4 years ago - 1 dependent repository - 2.72 thousand downloads last month - 18 stars on GitHub - 1 maintainer
copybook 1.0.16
python copybook parser
12 versions - Latest release: over 1 year ago - 1 dependent repository - 29.9 thousand downloads last month - 17 stars on GitHub - 1 maintainer
tmlt-analytics 0.10.0
Tumult's differential privacy analytics API
27 versions - Latest release: 16 days ago - 1.03 thousand downloads last month - 17 stars on GitLab.com - 1 maintainer
imnet 0.2.1
imNet: a Sequence Network Construction Toolkit
5 versions - Latest release: almost 4 years ago - 2 dependent repositories - 30 downloads last month - 16 stars on GitHub - 1 maintainer
sparklanes 0.2.4
A lightweight framework to build and execute data processing pipelines in pyspark (Apache Spark's...
5 versions - Latest release: over 5 years ago - 1 dependent repository - 27 downloads last month - 16 stars on GitHub - 1 maintainer
tidypyspark 0.0.1
dplyr for pyspark
1 version - Latest release: about 1 year ago - 17 downloads last month - 14 stars on GitHub - 2 maintainers
typed-pyspark 0.0.5
Contains a set of abstractions to type annotate and validate dataframes in pyspark
5 versions - Latest release: about 2 years ago - 1 dependent package - 17.2 thousand downloads last month - 14 stars on GitHub - 3 maintainers
dot-connect 0.3.32
Improve your workflow efficiency by connecting to databases and cloud systems effortlessly.
7 versions - Latest release: 9 months ago - 47 downloads last month - 13 stars on GitHub - 1 maintainer
sparkql 0.10.0
sparkql: Apache Spark SQL DataFrame schema management for sensible humans
20 versions - Latest release: 9 months ago - 1 dependent repository - 28.9 thousand downloads last month - 13 stars on GitHub - 1 maintainer
aws-insurancelake-etl 3.3.1
A CDK Python app for deploying ETL jobs that operate data pipelines for InsuranceLake in AWS
8 versions - Latest release: 2 months ago - 32 downloads last month - 12 stars on GitHub - 1 maintainer
marshmallow-pyspark 0.2.4
PySpark data serializer
6 versions - Latest release: 5 months ago - 1 dependent repository - 2.14 thousand downloads last month - 12 stars on GitHub - 1 maintainer
sparksnake 0.2.2
Improving the development of Spark applications deployed as jobs on AWS services like Glue and EMR
24 versions - Latest release: 11 months ago - 23.3 thousand downloads last month - 12 stars on GitHub - 1 maintainer
sparkorm 1.2.17
SparkORM: Python Spark SQL & DataFrame schema management and basic Object Relational Mapping.
23 versions - Latest release: 18 days ago - 213 thousand downloads last month - 11 stars on GitHub - 1 maintainer
ydot 0.0.6 💰
R-like formulas for Spark Dataframes
6 versions - Latest release: over 3 years ago - 1 dependent repository - 37 downloads last month - 10 stars on GitHub - 1 maintainer
pyspark-spy 1.0.2
Collect and aggregate on spark events for profitz. In 🐍 way!
3 versions - Latest release: about 3 years ago - 1 dependent repository - 218 downloads last month - 10 stars on GitHub - 1 maintainer
Related Keywords
spark (115)
python (73)
machine-learning (37)
big-data (33)
scala (32)
apache-spark (24)
distributed (23)
machine learning (23)
pandas (22)
big data (21)
modeling (21)
databricks (20)
statistical analysis (20)
parallel (20)
h2o (20)
integration (20)
pysparkling (20)
rsparkling (20)
data mining (20)
dataframe (18)
data-science (16)
data (11)
bigdata (11)
data-engineering (11)
python3 (10)
polars (10)
scoring (10)
jupyter (10)
deep-learning (9)
aws (8)
dask (8)
etl (7)
jupyter-notebook (7)
data-analysis (7)
pytorch (6)
tensorflow (6)
sql (6)
azure (6)
ai (6)
cluster (5)
magic (5)
notebook (5)
pandas-dataframe (5)
data-quality (5)
pyarrow (5)
mysql (5)
ipython (5)
test (5)
testing (4)
graphs (4)
parquet (4)
Spark (4)
data-cleaning (4)
distributed-computing (4)
workflow (4)
kerberos (4)
kernel (4)
livy (4)
sql-query (4)
synapse (4)
bigquery (4)
pipeline (4)
glue (4)
pipelines (4)
opencv (4)
analysis (4)
onnx (4)
learning (4)
schema (4)
postgres (4)
cognitive-services (4)
http (4)
lightgbm (4)
microsoft (4)
ml (4)
model-deployment (4)
mssql (3)
impala (3)
snowflake (3)
postgresql (3)
clickhouse (3)
sqlalchemy (3)
s3 (3)
core (3)
compare (3)
data science (3)
gcp (3)
report (3)
emr (3)
preprocessing (3)
apache (3)
deltalake (3)
streaming (3)
spinelibs (3)
typing (3)
spinecore (3)
spine (3)
library (3)
framework (3)
hadoop (3)