An open API service providing package, version and dependency metadata of many open source software ecosystems and registries.

pypi.org "pyspark" keyword

View the packages on the pypi.org package registry that are tagged with the "pyspark" keyword.

pysail 0.2.4
Sail Python library
15 versions - Latest release: 8 days ago - 1.7 thousand downloads last month - 12 stars on GitHub - 1 maintainer
koheesio 0.10.2
The steps-based Koheesio framework
18 versions - Latest release: 17 days ago - 316 thousand downloads last month - 634 stars on GitHub - 1 maintainer
Top 1.9% on pypi.org
autovizwidget 0.22.0
AutoVizWidget: An Auto-Visualization library for pandas dataframes
55 versions - Latest release: 5 months ago - 4 dependent packages - 93 dependent repositories - 335 thousand downloads last month - 1,314 stars on GitHub - 4 maintainers
Top 1.4% on pypi.org
spark-nlp 5.5.3
John Snow Labs Spark NLP is a natural language processing library built on top of Apache Spark ML...
151 versions - Latest release: 3 months ago - 35 dependent packages - 35 dependent repositories - 4.22 million downloads last month - 3,717 stars on GitHub - 3 maintainers
metaspore 1.1.0
Metaspore: A Unified End-to-end Machine Intelligence Platform
3 versions - Latest release: over 2 years ago - 160 downloads last month - 533 stars on GitHub - 2 maintainers
h3spark 0.1.6
Lightweight pyspark wrapper for h3-py
12 versions - Latest release: 2 months ago - 399 downloads last month - 1 maintainer
Top 2.7% on pypi.org
datacompy 0.16.5
Dataframe comparison in Python
42 versions - Latest release: 4 days ago - 7 dependent packages - 16 dependent repositories - 1.13 million downloads last month - 399 stars on GitHub - 4 maintainers
pysparkproxy 0.0.17
Seamlessly execute pyspark code on remote clusters
9 versions - Latest release: over 6 years ago - 1 dependent repository - 253 downloads last month - 5 stars on GitHub - 1 maintainer
flake8-pyspark-with-column 0.0.5
A Flake8 plugin to check for PySpark withColumn usage in loops
4 versions - Latest release: 3 months ago - 1.69 thousand downloads last month - 27 stars on GitHub - 1 maintainer
graphlet 0.1.1
Graphlet AI Knowledge Graph Factory
1 version - Latest release: over 2 years ago - 59 downloads last month - 28 stars on GitHub - 1 maintainer
Top 1.7% on pypi.org
chispa 0.11.1
Pyspark test helper library
23 versions - Latest release: 7 days ago - 7 dependent packages - 54 dependent repositories - 2.16 million downloads last month - 606 stars on GitHub - 1 maintainer
Top 1.7% on pypi.org
hdijupyterutils 0.22.0
HdiJupyterUtils: Utils for Jupyter projects from HDInsight team
54 versions - Latest release: 5 months ago - 3 dependent packages - 92 dependent repositories - 328 thousand downloads last month - 1,326 stars on GitHub - 4 maintainers
Top 1.7% on pypi.org
sparkmagic 0.22.0
SparkMagic: Spark execution via Livy
57 versions - Latest release: 5 months ago - 4 dependent packages - 86 dependent repositories - 32.2 thousand downloads last month - 1,326 stars on GitHub - 5 maintainers
Top 9.4% on pypi.org
pyoptimus 0.1.0
Optimus is the missing framework for cleaning and pre-processing data in a distributed fashion.
32 versions - Latest release: over 2 years ago - 1 dependent repository - 743 downloads last month - 1,447 stars on GitHub - 2 maintainers
dapter 0.1.1
Tool to adapt multiple dataframes to one unique format
1 version - Latest release: 8 months ago - 46 downloads last month - 916 stars on GitHub - 1 maintainer
turntable-spoonbill 10.0.5
Productivity-centric Python Big Data Framework
9 versions - Latest release: 4 months ago - 348 downloads last month - 5,698 stars on GitHub - 1 maintainer
Top 1.4% on pypi.org
ibis-framework 10.4.0
The portable Python dataframe library
139 versions - Latest release: 23 days ago - 25 dependent packages - 130 dependent repositories - 461 thousand downloads last month - 3,333 stars on GitHub - 6 maintainers
autofeatures 1.0.2
PySpark Auto Feature Selector
3 versions - Latest release: almost 5 years ago - 179 downloads last month - 9 stars on GitHub - 1 maintainer
glue-utils 0.9.2
Reusable utilities for working with Glue PySpark jobs
29 versions - Latest release: 1 day ago - 10.6 thousand downloads last month - 6 stars on GitHub - 1 maintainer
exelog 0.0.1
Enabling meticulous logging for Spark Applications
1 version - Latest release: over 3 years ago - 1 dependent repository - 60 downloads last month - 5 stars on GitHub - 1 maintainer
Top 4.8% on pypi.org
optimuspyspark 2.2.32
Optimus is the missing framework for cleaning and pre-processing data in a distributed fashion wi...
83 versions - Latest release: almost 5 years ago - 8 dependent repositories - 5.85 thousand downloads last month - 1,501 stars on GitHub - 2 maintainers
codeme 0.1.9
CodeMe - Automatic Python Coder
20 versions - Latest release: over 2 years ago - 450 downloads last month - 1 star on GitHub - 1 maintainer
dataflat 2.0.0
A library to flatten nested data.
8 versions - Latest release: 7 months ago - 653 downloads last month - 11 stars on GitHub - 1 maintainer
bigdatasml 0.1.3
This package calculates average student performances
3 versions - Latest release: over 3 years ago - 138 downloads last month - 1 maintainer
yummy 0.0.11
14 versions - Latest release: about 1 year ago - 1 dependent repository - 312 downloads last month - 34 stars on GitHub - 1 maintainer
hermione-databricks 1.0.7
Tool to create ML project structure inside the databricks framework
18 versions - Latest release: over 4 years ago - 1 dependent repository - 511 downloads last month - 4 stars on GitHub - 1 maintainer
pyspark-eda 1.6.0
A Python package for univariate, bivariate and multivariate data analysis using PySpark
23 versions - Latest release: 10 months ago - 820 downloads last month - 1 maintainer
Top 3.3% on pypi.org
synapseml 1.0.11
Synapse Machine Learning
23 versions - Latest release: 2 days ago - 2 dependent packages - 3 dependent repositories - 613 thousand downloads last month - 5,119 stars on GitHub - 1 maintainer
pyspark-spy 1.0.2
Collect and aggregate on spark events for profitz. In 🐍 way!
3 versions - Latest release: about 4 years ago - 1 dependent repository - 333 downloads last month - 10 stars on GitHub - 1 maintainer
spark-lean 0.3.3
An interactive PySpark-based Data Cleaning Library
4 versions - Latest release: almost 7 years ago - 1 dependent repository - 79 downloads last month - 7 stars on GitHub - 2 maintainers
scimple 1.11.5
Scimplify Plotting, Graph manipulation, Spark Streaming & Kafka and other tools
20 versions - Latest release: almost 5 years ago - 1 dependent repository - 384 downloads last month - 1 star on GitHub - 1 maintainer
mmtfpyspark 0.3.6
Methods for parallel and distributed analysis and mining of the Protein Data Bank using MMTF and ...
10 versions - Latest release: about 6 years ago - 1 dependent repository - 317 downloads last month - 68 stars on GitHub - 2 maintainers
openaivec 0.6.0
Generative mutation for tabular calculation
30 versions - Latest release: 3 days ago - 1.9 thousand downloads last month - 9 stars on GitHub - 1 maintainer
jsonspark 0.0.2
This is a wrapper package for pyspark to process json files. It pythonifies the json pyspark object.
2 versions - Latest release: almost 4 years ago - 1 dependent repository - 77 downloads last month - 4 stars on GitHub - 1 maintainer
sparkql 0.10.0
sparkql: Apache Spark SQL DataFrame schema management for sensible humans
20 versions - Latest release: over 1 year ago - 1 dependent repository - 7.03 thousand downloads last month - 14 stars on GitHub - 1 maintainer
pyspark-typedschema 0.0.6
Define (typed) schemas for pyspark dataframes
6 versions - Latest release: 3 months ago - 229 downloads last month - 1 star on GitHub - 1 maintainer
pyspark-val 0.1.4
PySpark validation & testing tooling
5 versions - Latest release: about 1 year ago - 219 downloads last month - 0 stars on GitHub - 1 maintainer
compars 0.0.0
DataFrame comparison done right (AKA the Bear-agnostic DataFrame comparison library)
1 version - Latest release: 12 months ago - 58 downloads last month - 0 stars on GitHub - 1 maintainer
sysxtract 1.0.0
Extract logs based off events from sysmon. Comes as a package, cli and ui.
1 version - Latest release: almost 5 years ago - 1 dependent repository - 48 downloads last month - 3 stars on GitHub - 1 maintainer
hlink 4.1.0
Fast supervised pyspark record linkage software
24 versions - Latest release: 4 days ago - 637 downloads last month - 12 stars on GitHub - 4 maintainers
pyspark-connectors 0.3.0
The easy and quick way to connect and integrate the Spark project with many other data sources.
9 versions - Latest release: 10 months ago - 281 downloads last month - 6 stars on GitHub - 1 maintainer
pbspark 0.9.0
Convert between protobuf messages and pyspark dataframes
11 versions - Latest release: almost 2 years ago - 1 dependent repository - 222 thousand downloads last month - 23 stars on GitHub - 1 maintainer
spark-map 0.2.78
Pyspark implementation of `map()` function for spark DataFrames
3 versions - Latest release: almost 2 years ago - 318 downloads last month - 1 star on GitHub - 1 maintainer
pyspark_dfreport 0.1
Simple Python Package to save (small) PySpark DataFrames to one Excel File.
1 version - Latest release: over 8 years ago - 24 downloads last month - 0 stars on GitHub - 1 maintainer
Top 8.9% on pypi.org
pyspark-hnsw 1.1.0
Java library for approximate nearest neighbors search using Hierarchical Navigable Small World gr...
30 versions - Latest release: over 2 years ago - 1 dependent repository - 103 thousand downloads last month - 266 stars on GitHub - 1 maintainer
hnswlib-spark 0.0.0
Pyspark module for hnswlib
10 versions - Latest release: 2 months ago - 422 downloads last month - 266 stars on GitHub - 1 maintainer
scopt 0.0.5
Calculate optimized properties of Spark configuration
5 versions - Latest release: about 3 years ago - 1 dependent repository - 571 downloads last month - 5 stars on GitHub - 1 maintainer
dbloy 0.3.0
Continuous Delivery tool for PySpark Notebooks based jobs on Databricks.
1 version - Latest release: over 5 years ago - 1 dependent repository - 37 downloads last month - 1 star on GitHub - 1 maintainer
pyspark-easy 1.5
Makes pyspark dataframe exploration easy
10 versions - Latest release: about 4 years ago - 1 dependent repository - 216 downloads last month - 1 star on GitHub - 1 maintainer
sparkhpc 0.1
spark deployment on hpc resources made easy
11 versions - Latest release: over 1 year ago - 1 dependent repository - 202 downloads last month - 1 maintainer
cuallee 0.15.2
Python library for data validation on DataFrame APIs including Snowflake/Snowpark, Apache/PySpark...
94 versions - Latest release: 4 months ago - 1 dependent package - 1 dependent repository - 47.1 thousand downloads last month - 185 stars on GitHub - 2 maintainers
pyspark-asyncactions 0.0.4 💰
A proof of concept asynchronous actions for PySpark using concurrent.futures
4 versions - Latest release: over 4 years ago - 1 dependent repository - 5.57 thousand downloads last month - 47 stars on GitHub - 1 maintainer
spark-scaffolder-transforms-tools 0.0.1
spark_scaffolder_transforms_tools
1 version - Latest release: about 1 year ago - 46 downloads last month - 1 maintainer
pagaya-mapinpandas 0.5
Easy python wrapper for Spark mapInPandas, applyInPandas
1 version - Latest release: almost 3 years ago - 30 downloads last month - 1 maintainer
replay-rec 0.18.1
RecSys Library
25 versions - Latest release: about 1 month ago - 1 dependent package - 1 dependent repository - 2.71 thousand downloads last month - 137 stars on GitHub - 1 maintainer
pyspark-model-plus 1.0.2
Enhancements to commonly used pyspark functions for building models
2 versions - Latest release: about 4 years ago - 1 dependent repository - 108 downloads last month - 0 stars on GitHub - 1 maintainer
tinsel 0.3.0
PySpark schema generator
3 versions - Latest release: over 6 years ago - 1 dependent repository - 237 thousand downloads last month - 1 maintainer
Top 9.0% on pypi.org
jupyterlab-sparkmonitor 4.1.0
Spark Monitor Extension for Jupyter Lab
14 versions - Latest release: over 3 years ago - 1 dependent repository - 411 downloads last month - 92 stars on GitHub - 1 maintainer
pyspark-sugar 0.4.1
SparkUI enhancements with pyspark
4 versions - Latest release: about 6 years ago - 1 dependent repository - 22.3 thousand downloads last month - 5 stars on GitHub - 1 maintainer
pyspark3d 0.3.1
Spark extension for processing large-scale 3D data sets
4 versions - Latest release: over 6 years ago - 1 dependent repository - 154 downloads last month - 30 stars on GitHub - 1 maintainer
pyspark_db_utils 0.0.7
Useful functions for working with databases in PySpark (PostgreSQL, ClickHouse)
6 versions - Latest release: almost 7 years ago - 274 downloads last month - 8 stars on GitHub - 1 maintainer
ydot 0.0.6 💰
R-like formulas for Spark Dataframes
6 versions - Latest release: over 4 years ago - 1 dependent repository - 134 downloads last month - 10 stars on GitHub - 1 maintainer
fink-science 7.4.0
User-defined science module for the Fink broker.
64 versions - Latest release: 28 days ago - 1 dependent package - 5 dependent repositories - 1.78 thousand downloads last month - 11 stars on GitHub - 1 maintainer
getallcolumnname 0.2
getallcolumns
2 versions - Latest release: 10 months ago - 100 downloads last month - 1 maintainer
spark-connect-proxy 0.0.11
A reverse proxy server which allows secure connectivity to a Spark Connect server
8 versions - Latest release: 3 months ago - 272 downloads last month - 9 stars on GitHub - 1 maintainer
google-dataproc-templates 0.1.0
Google Dataproc templates written in Python
12 versions - Latest release: about 2 years ago - 276 downloads last month - 126 stars on GitHub - 3 maintainers
ibmaemagic 0.0.4
Make accessing IBM Analytic Engine easier.
4 versions - Latest release: over 4 years ago - 1 dependent repository - 165 downloads last month - 4 maintainers
pramen-py 1.11.2
Pramen transformations written in python
51 versions - Latest release: about 2 months ago - 1.51 thousand downloads last month - 24 stars on GitHub - 3 maintainers
spark-xarray 0.1.dev0
This is an experimental project that seeks to integrate PySpark and xarray for Climate Data Analy...
1 version - Latest release: over 1 year ago - 354 downloads last month - 22 stars on GitHub - 1 maintainer
databricks-bridge 0.0.4
Databricks read and write with sql connection
4 versions - Latest release: about 1 year ago - 161 downloads last month - 1 maintainer
sparkpolars 0.1.0
Conversion between PySpark and Polars DataFrames
11 versions - Latest release: 2 months ago - 614 downloads last month - 2 stars on GitHub - 1 maintainer
pyspark-supp 0.1.0
Data Engineer Support PySpark Library
2 versions - Latest release: almost 2 years ago - 75 downloads last month - 1 star on GitHub - 1 maintainer
data-manipulation 0.48
Powerful data manipulation
43 versions - Latest release: 5 months ago - 1 dependent repository - 1.03 thousand downloads last month - 2 stars on GitHub - 1 maintainer
Top 2.3% on pypi.org
petastorm 0.12.1
Petastorm is a library enabling the use of Parquet storage from Tensorflow, Pytorch, and other Py...
86 versions - Latest release: over 2 years ago - 4 dependent packages - 26 dependent repositories - 179 thousand downloads last month - 1,828 stars on GitHub - 2 maintainers
firespark 0.0.32
FireSpark data processing utility library
16 versions - Latest release: almost 5 years ago - 1 dependent repository - 562 downloads last month - 1,828 stars on GitHub - 1 maintainer
hops-petastorm 0.9.4
Petastorm is a library enabling the use of Parquet storage from Tensorflow, Pytorch, and other Py...
6 versions - Latest release: over 4 years ago - 1 dependent repository - 156 downloads last month - 1,828 stars on GitHub - 3 maintainers
sparkorm 1.2.29
SparkORM: Python Spark SQL & DataFrame schema management and basic Object Relational Mapping.
33 versions - Latest release: 10 months ago - 367 thousand downloads last month - 14 stars on GitHub - 1 maintainer
sparkpickle 1.0.1
Provides functions for reading SequenceFile-s with Python pickles.
2 versions - Latest release: over 8 years ago - 1 dependent repository - 4.95 thousand downloads last month - 24 stars on GitHub - 1 maintainer
typed-pyspark 0.0.5
Contains a set of abstractions to type annotate and validate dataframes in pyspark
5 versions - Latest release: about 3 years ago - 1 dependent package - 21.3 thousand downloads last month - 14 stars on GitHub - 3 maintainers
typedspark 1.5.3
Column-wise type annotations for pyspark DataFrames
44 versions - Latest release: 10 days ago - 1 dependent package - 59.5 thousand downloads last month - 74 stars on GitHub - 1 maintainer
fink-filters 0.2.18
User-defined filters for the Fink broker.
88 versions - Latest release: over 3 years ago - 3 dependent repositories - 2.44 thousand downloads last month - 1 star on GitHub - 1 maintainer
sparglim 0.2.1 💰
sparglim
16 versions - Latest release: about 1 year ago - 1 dependent package - 1 dependent repository - 365 downloads last month - 37 stars on GitHub - 1 maintainer
pysparkdt 1.0.1
An open-source Python library for simplifying local testing of Databricks workflows that use PySp...
2 versions - Latest release: 4 months ago - 90 downloads last month - 27 stars on GitHub - 1 maintainer
Top 8.6% on pypi.org
butterfree 1.7.2
A tool for building feature stores - Transform your raw data into beautiful features.
54 versions - Latest release: 10 days ago - 1 dependent repository - 85 thousand downloads last month - 268 stars on GitHub - 1 maintainer
pyspark-event-correlation 1.0.2
Event Correlation and Changing Detection Algorithm
3 versions - Latest release: about 3 years ago - 1 dependent repository - 135 downloads last month - 1 star on GitHub - 1 maintainer
h2o-pysparkling-scoring-2.3 3.46.0.6.post1
Sparkling Water integrates H2O's Fast Scalable Machine Learning with Spark
43 versions - Latest release: 5 months ago - 464 downloads last month - 968 stars on GitHub - 1 maintainer
h2o-pysparkling-3.5 0.0.0
Sparkling Water integrates H2O's Fast Scalable Machine Learning with Spark
10 versions - Latest release: over 1 year ago - 15.7 thousand downloads last month - 968 stars on GitHub - 1 maintainer
h2o-pysparkling-2.1 3.26.11
Sparkling Water integrates H2O's Fast Scalable Machine Learning with Spark
88 versions - Latest release: over 5 years ago - 913 downloads last month - 968 stars on GitHub - 1 maintainer
h2o-pysparkling-scoring-3.2 3.46.0.6.post1
Sparkling Water integrates H2O's Fast Scalable Machine Learning with Spark
28 versions - Latest release: 5 months ago - 322 downloads last month - 968 stars on GitHub - 1 maintainer
h2o-pysparkling-scoring-2.2 3.36.1.5.post1
Sparkling Water integrates H2O's Fast Scalable Machine Learning with Spark
22 versions - Latest release: over 2 years ago - 237 downloads last month - 968 stars on GitHub - 1 maintainer
h2o-pysparkling-scoring-3.0 3.46.0.6.post1
Sparkling Water integrates H2O's Fast Scalable Machine Learning with Spark
43 versions - Latest release: 5 months ago - 433 downloads last month - 968 stars on GitHub - 1 maintainer
h2o-pysparkling-2.2 3.26.11
Sparkling Water integrates H2O's Fast Scalable Machine Learning with Spark
97 versions - Latest release: over 5 years ago - 996 downloads last month - 968 stars on GitHub - 1 maintainer
Top 5.2% on pypi.org
h2o-pysparkling-3.0 3.46.0.6.post1
Sparkling Water integrates H2O's Fast Scalable Machine Learning with Spark
54 versions - Latest release: 5 months ago - 2 dependent repositories - 4.58 thousand downloads last month - 968 stars on GitHub - 1 maintainer
Top 8.8% on pypi.org
h2o-pysparkling-scoring-3.1 3.46.0.6.post1
Sparkling Water integrates H2O's Fast Scalable Machine Learning with Spark
42 versions - Latest release: 5 months ago - 588 downloads last month - 968 stars on GitHub - 1 maintainer
Top 3.7% on pypi.org
h2o-pysparkling-2.4 3.26.11
Sparkling Water integrates H2O's Fast Scalable Machine Learning with Spark
86 versions - Latest release: over 5 years ago - 12 dependent repositories - 36.3 thousand downloads last month - 968 stars on GitHub - 1 maintainer
h2o-pysparkling-scoring-2.4 3.46.0.6.post1
Sparkling Water integrates H2O's Fast Scalable Machine Learning with Spark
43 versions - Latest release: 5 months ago - 442 downloads last month - 968 stars on GitHub - 1 maintainer
h2o-pysparkling-3.4 0.0.2
Sparkling Water integrates H2O's Fast Scalable Machine Learning with Spark
15 versions - Latest release: almost 2 years ago - 3.35 thousand downloads last month - 968 stars on GitHub - 1 maintainer
Top 9.0% on pypi.org
h2o-pysparkling-3.3 3.46.0.6.post1
Sparkling Water integrates H2O's Fast Scalable Machine Learning with Spark
25 versions - Latest release: 5 months ago - 27 thousand downloads last month - 968 stars on GitHub - 1 maintainer
Top 8.7% on pypi.org
h2o-pysparkling-3.2 3.46.0.6.post1
Sparkling Water integrates H2O's Fast Scalable Machine Learning with Spark
29 versions - Latest release: 5 months ago - 11.3 thousand downloads last month - 968 stars on GitHub - 1 maintainer
Top 9.2% on pypi.org
h2o-pysparkling-2.3 3.26.11
Sparkling Water integrates H2O's Fast Scalable Machine Learning with Spark
107 versions - Latest release: over 5 years ago - 1.05 thousand downloads last month - 968 stars on GitHub - 1 maintainer