Ecosyste.ms: Packages

An open API service providing package, version and dependency metadata of many open source software ecosystems and registries.

pypi.org "pyspark" keyword

Top 5.7% on pypi.org
dbldatagen 0.3.6
Databricks Labs - PySpark Synthetic Data Generator
13 versions - Latest release: 3 months ago - 1 dependent package - 2 dependent repositories - 156 thousand downloads last month - 267 stars on GitHub - 1 maintainer
sparkdantic 0.20.5
A pydantic -> spark schema library
30 versions - Latest release: about 2 months ago - 32.6 thousand downloads last month - 267 stars on GitHub - 1 maintainer
pyspark-asyncactions 0.0.4 💰
A proof of concept of asynchronous actions for PySpark using concurrent.futures
4 versions - Latest release: over 3 years ago - 1 dependent repository - 5.19 thousand downloads last month - 44 stars on GitHub - 2 maintainers
sparkora 0.0.1
Exploratory data analysis toolkit for Pyspark
1 version - Latest release: over 2 years ago - 1 dependent repository - 16 downloads last month - 53 stars on GitHub - 2 maintainers
fiware-pyspark-connector 0.0.10
Connects FIWARE Context Brokers with fiware_pyspark_connector
5 versions - Latest release: 10 months ago - 33 downloads last month - 2 stars on GitHub - 1 maintainer
Top 3.3% on pypi.org
synapseml 1.0.4
Synapse Machine Learning
16 versions - Latest release: 29 days ago - 2 dependent packages - 3 dependent repositories - 235 thousand downloads last month - 4,975 stars on GitHub - 2 maintainers
Top 2.7% on pypi.org
datacompy 0.12.0
Dataframe comparison in Python
25 versions - Latest release: 8 days ago - 7 dependent packages - 16 dependent repositories - 686 thousand downloads last month - 389 stars on GitHub - 8 maintainers
databricks-utils 0.0.7
Ease-of-use utility tools for databricks notebooks.
6 versions - Latest release: almost 6 years ago - 1 dependent repository - 296 thousand downloads last month - 1 star on GitHub - 2 maintainers
graphlet 0.1.1
Graphlet AI Knowledge Graph Factory
1 version - Latest release: almost 2 years ago - 12 downloads last month - 27 stars on GitHub - 1 maintainer
spark-emr 0.1.2
Run python packages on AWS EMR
2 versions - Latest release: about 5 years ago - 1 dependent repository - 20 downloads last month - 0 stars on GitHub - 2 maintainers
Top 2.3% on pypi.org
petastorm 0.12.1
Petastorm is a library enabling the use of Parquet storage from Tensorflow, Pytorch, and other Py...
86 versions - Latest release: over 1 year ago - 4 dependent packages - 26 dependent repositories - 36.9 thousand downloads last month - 1,752 stars on GitHub - 2 maintainers
firespark 0.0.32
FireSpark data processing utility library
16 versions - Latest release: almost 4 years ago - 1 dependent repository - 140 downloads last month - 1,752 stars on GitHub - 2 maintainers
hops-petastorm 0.9.4
Petastorm is a library enabling the use of Parquet storage from Tensorflow, Pytorch, and other Py...
6 versions - Latest release: over 3 years ago - 1 dependent repository - 93 downloads last month - 1,752 stars on GitHub - 3 maintainers
Top 6.7% on pypi.org
handyspark 0.2.2a1
HandySpark - bringing pandas-like capabilities to Spark dataframes
7 versions - Latest release: almost 5 years ago - 3 dependent repositories - 156 thousand downloads last month - 183 stars on GitHub - 2 maintainers
fifeforspark 0.0.2
Finite-Interval Forecasting Engine for Spark: Machine learning models for discrete-time survival ...
2 versions - Latest release: about 2 years ago - 1 dependent repository - 55 downloads last month - 3 stars on GitHub - 6 maintainers
google-dataproc-templates 0.1.0
Google Dataproc templates written in Python
8 versions - Latest release: about 1 year ago - 29.6 thousand downloads last month - 111 stars on GitHub - 3 maintainers
turntable-spoonbill 9.0.5
Productivity-centric Python Big Data Framework
4 versions - Latest release: about 1 month ago - 106 downloads last month - 4,254 stars on GitHub - 2 maintainers
Top 6.8% on pypi.org
dataproc-templates 0.0.2 removed
Dataproc templates written in Python
2 versions - Latest release: over 1 year ago - 40 stars on GitHub
Top 1.4% on pypi.org
ibis-framework 9.0.0
The portable Python dataframe library
90 versions - Latest release: 9 days ago - 13 dependent packages - 130 dependent repositories - 191 thousand downloads last month - 3,333 stars on GitHub - 6 maintainers
Top 9.4% on pypi.org
pyoptimus 0.1.0
Optimus is the missing framework for cleaning and pre-processing data in a distributed fashion.
32 versions - Latest release: over 1 year ago - 1 dependent repository - 346 downloads last month - 1,441 stars on GitHub - 2 maintainers
repartipy 0.1.8
Helper for handling PySpark DataFrame partition size 📑🎛️
6 versions - Latest release: 2 months ago - 505 downloads last month - 2 stars on GitHub - 2 maintainers
jsonspark 0.0.2
This is a wrapper package for pyspark to process json files. It pythonifies the json pyspark object.
2 versions - Latest release: almost 3 years ago - 1 dependent repository - 26 downloads last month - 3 stars on GitHub - 2 maintainers
cuallee 0.10.1
Python library for data validation on DataFrame APIs including Snowflake/Snowpark, Apache/PySpark...
74 versions - Latest release: 9 days ago - 1 dependent package - 1 dependent repository - 11.2 thousand downloads last month - 109 stars on GitHub - 2 maintainers
imnet 0.2.1
imNet: a Sequence Network Construction Toolkit
5 versions - Latest release: almost 4 years ago - 2 dependent repositories - 30 downloads last month - 16 stars on GitHub - 2 maintainers
Top 3.3% on pypi.org
quinn 0.10.3
Pyspark helper methods to maximize developer efficiency
16 versions - Latest release: 3 months ago - 3 dependent packages - 11 dependent repositories - 781 thousand downloads last month - 580 stars on GitHub - 2 maintainers
scopt 0.0.5
Calculate optimized properties of Spark configuration
5 versions - Latest release: about 2 years ago - 1 dependent repository - 213 downloads last month - 5 stars on GitHub - 1 maintainer
pbspark 0.9.0
Convert between protobuf messages and pyspark dataframes
11 versions - Latest release: 11 months ago - 1 dependent repository - 276 thousand downloads last month - 21 stars on GitHub - 1 maintainer
pramen-py 1.8.6
Pramen transformations written in python
28 versions - Latest release: 3 days ago - 295 downloads last month - 22 stars on GitHub - 5 maintainers
Top 1.4% on pypi.org
spark-nlp 5.3.3
John Snow Labs Spark NLP is a natural language processing library built on top of Apache Spark ML...
141 versions - Latest release: about 1 month ago - 31 dependent packages - 35 dependent repositories - 4.06 million downloads last month - 3,700 stars on GitHub - 3 maintainers
Top 10.0% on pypi.org
ckls-test-lib 4.2.7 removed
John Snow Labs Spark NLP is a natural language processing library built on top of Apache Spark ML...
1 version - Latest release: over 1 year ago - 85 downloads last month - 3,066 stars on GitHub - 1 maintainer
bpd 2.0.2
bpd
8 versions - Latest release: over 1 year ago - 140 downloads last month - 4 stars on GitHub - 2 maintainers
pysparkifier 0.9.1
Streamlined pyspark usage
3 versions - Latest release: over 1 year ago - 1 dependent repository - 16 downloads last month - 1 star on GitHub - 1 maintainer
fink-science 3.13.3
User-defined science module for the Fink broker.
62 versions - Latest release: over 1 year ago - 5 dependent repositories - 577 downloads last month - 10 stars on GitHub - 1 maintainer
pysparkgateway 0.0.22
Connect Pyspark to remote clusters
18 versions - Latest release: about 3 years ago - 1 dependent repository - 18.1 thousand downloads last month - 3 stars on GitHub - 2 maintainers
compars 0.0.0
DataFrame comparison done right (AKA the Bear-agnostic DataFrame comparison library)
1 version - Latest release: 19 days ago - 156 downloads last month - 0 stars on GitHub - 2 maintainers
sparkorm 1.2.16
SparkORM: Python Spark SQL & DataFrame schema management and basic Object Relational Mapping.
20 versions - Latest release: about 1 month ago - 386 thousand downloads last month - 9 stars on GitHub - 2 maintainers
spark-pager 1.1.2
A Python library for sending notifications on Spark Job Status.
3 versions - Latest release: almost 3 years ago - 1 dependent repository - 187 downloads last month - 0 stars on GitHub - 2 maintainers
dcborow-mmlspark 0.14.dev1
Microsoft ML for Spark
1 version - Latest release: about 4 years ago - 1 dependent repository - 58 downloads last month - 4,972 stars on GitHub - 2 maintainers
finkfilters 0.1.4
User-defined filters for the Fink broker.
2 versions - Latest release: over 4 years ago - 1 dependent repository - 7 downloads last month - 1 star on GitHub - 2 maintainers
spark-map 0.2.78
Pyspark implementation of `map()` function for spark DataFrames
3 versions - Latest release: 12 months ago - 419 downloads last month - 1 star on GitHub - 1 maintainer
tgedr-pycode 0.0.24
python handy code
23 versions - Latest release: 4 months ago - 130 downloads last month - 2 maintainers
artan 0.5.1
Online latent state estimation with Apache Spark.
9 versions - Latest release: over 3 years ago - 22 downloads last month - 5 stars on GitHub - 1 maintainer
pyspark-data-mocker 3.0.0
Mock a datalake easily to be able to test your pyspark data application
10 versions - Latest release: about 1 month ago - 72 downloads last month - 0 stars on GitHub - 2 maintainers
autofeatures 1.0.2
PySpark Auto Feature Selector
3 versions - Latest release: almost 4 years ago - 116 downloads last month - 8 stars on GitHub - 1 maintainer
Top 8.6% on pypi.org
spark-df-profiling-new 1.1.14
Create HTML profiling reports from Apache Spark DataFrames
1 version - Latest release: almost 3 years ago - 1 dependent repository - 38.6 thousand downloads last month - 194 stars on GitHub - 2 maintainers
smartdriveml 0.0.12
SmartDriveML: Driving Industrial Projects with Affordable ML Solutions, Frameworks and Cloud Choices
4 versions - Latest release: 10 months ago - 15 downloads last month - 2 maintainers
pyspark-connectors 0.2.0
The easy and quick way to connect and integrate the Spark project with many other data sources.
8 versions - Latest release: almost 2 years ago - 113 downloads last month - 5 stars on GitHub - 2 maintainers
codeme 0.1.9
CodeMe - Automatic Python Coder
20 versions - Latest release: over 1 year ago - 247 downloads last month - 1 star on GitHub - 2 maintainers
spalah 1.0.6
Spalah is a set of PySpark dataframe helpers
12 versions - Latest release: 4 months ago - 19 downloads last month - 7 stars on GitHub - 2 maintainers
marshmallow-pyspark 0.2.4
PySpark data serializer
6 versions - Latest release: 4 months ago - 1 dependent repository - 2.14 thousand downloads last month - 12 stars on GitHub - 2 maintainers
glue-utils 0.2.1
Reusable utilities for working with Glue PySpark jobs
10 versions - Latest release: 7 days ago - 473 downloads last month - 1 star on GitHub - 2 maintainers
markdown-frames 1.0.6
Markdown tables parsing to pyspark / pandas DataFrames
7 versions - Latest release: over 1 year ago - 1 dependent repository - 24.2 thousand downloads last month - 3 stars on GitHub - 6 maintainers
spark-scaffolder-transforms-tools 0.0.1
spark_scaffolder_transforms_tools
1 version - Latest release: about 1 month ago - 173 downloads last month - 2 maintainers
waterfall-logging 0.1.0
Waterfall statistic logging for data quality or filtering steps.
1 version - Latest release: about 1 year ago - 15 downloads last month - 2 stars on GitHub - 2 maintainers
tmlt-core 0.13.0
Tumult's differential privacy primitives
40 versions - Latest release: about 1 month ago - 5.72 thousand downloads last month - 10 stars on GitLab.com - 1 maintainer
sparkpickle 1.0.1
Provides functions for reading SequenceFile-s with Python pickles.
2 versions - Latest release: over 7 years ago - 1 dependent repository - 5.3 thousand downloads last month - 24 stars on GitHub - 1 maintainer
pyspark-bucketmap 0.0.5
Easily group pyspark data into buckets and map them to different values.
4 versions - Latest release: over 1 year ago - 51 downloads last month - 1 star on GitHub - 2 maintainers
pysparkplus 0.0.3
Pyspark extra functions!
3 versions - Latest release: about 1 year ago - 9 downloads last month - 2 stars on GitHub - 2 maintainers
pypair 3.0.9 💰
Pairwise association measures of statistical variable types
11 versions - Latest release: over 2 years ago - 1 dependent repository - 954 downloads last month - 21 stars on GitHub - 2 maintainers
pyspark-testing 0.0.5
Testing Framework for PySpark
2 versions - Latest release: about 4 years ago - 1 dependent repository - 179 downloads last month - 3 stars on GitHub - 2 maintainers
pyspark-sugar 0.4.1
SparkUI enhancements with pyspark
4 versions - Latest release: about 5 years ago - 1 dependent repository - 27.1 thousand downloads last month - 5 stars on GitHub - 2 maintainers
Top 6.6% on pypi.org
spark-df-profiling 1.1.13
Create HTML profiling reports from Apache Spark DataFrames
13 versions - Latest release: over 7 years ago - 2 dependent repositories - 53.9 thousand downloads last month - 194 stars on GitHub - 2 maintainers
metaspore 1.1.0
Metaspore: A Unified End-to-end Machine Intelligence Platform
3 versions - Latest release: over 1 year ago - 10 downloads last month - 626 stars on GitHub - 4 maintainers
tidypyspark 0.0.1
dplyr for pyspark
1 version - Latest release: about 1 year ago - 17 downloads last month - 14 stars on GitHub - 4 maintainers
yummy 0.0.11
14 versions - Latest release: 3 months ago - 1 dependent repository - 64 downloads last month - 30 stars on GitHub - 1 maintainer
ydot 0.0.6 💰
R-like formulas for Spark Dataframes
6 versions - Latest release: over 3 years ago - 1 dependent repository - 37 downloads last month - 10 stars on GitHub - 2 maintainers
spark-lean 0.3.3
An interactive PySpark-based Data Cleaning Library
4 versions - Latest release: about 6 years ago - 1 dependent repository - 20 downloads last month - 7 stars on GitHub - 4 maintainers
Top 9.9% on pypi.org
soda-spark 0.3.3
Soda SQL API for PySpark data frame
11 versions - Latest release: almost 2 years ago - 1 dependent package - 1 dependent repository - 46.3 thousand downloads last month - 60 stars on GitHub - 2 maintainers
sourced-jgit-spark-connector 2.0.1
Engine to use Spark on top of source code repositories.
2 versions - Latest release: over 5 years ago - 1 dependent repository - 26 downloads last month - 71 stars on GitHub - 2 maintainers
fink-filters 0.2.18
User-defined filters for the Fink broker.
65 versions - Latest release: over 2 years ago - 3 dependent repositories - 740 downloads last month - 1 star on GitHub - 1 maintainer
Top 1.9% on pypi.org
autovizwidget 0.21.0
AutoVizWidget: An Auto-Visualization library for pandas dataframes
54 versions - Latest release: 8 months ago - 2 dependent packages - 93 dependent repositories - 99 thousand downloads last month - 1,286 stars on GitHub - 4 maintainers
Top 1.7% on pypi.org
hdijupyterutils 0.21.0
HdiJupyterUtils: Utils for Jupyter projects from HDInsight team
53 versions - Latest release: 8 months ago - 2 dependent packages - 92 dependent repositories - 99.8 thousand downloads last month - 1,286 stars on GitHub - 4 maintainers
Top 1.7% on pypi.org
sparkmagic 0.21.0
SparkMagic: Spark execution via Livy
56 versions - Latest release: 8 months ago - 4 dependent packages - 86 dependent repositories - 50.6 thousand downloads last month - 1,273 stars on GitHub - 5 maintainers
pyspark-supp 0.1.0
Data Engineer Support PySpark Library
2 versions - Latest release: 12 months ago - 266 downloads last month - 1 star on GitHub - 2 maintainers
Top 2.9% on pypi.org
tdigest 0.4.0 💰
T-Digest data structure
14 versions - Latest release: almost 9 years ago - 13 dependent packages - 41 dependent repositories - 299 thousand downloads last month - 376 stars on GitHub - 1 maintainer
data-manipulation 0.37
Powerful data manipulation
34 versions - Latest release: 4 months ago - 1 dependent repository - 279 downloads last month - 2 stars on GitHub - 2 maintainers
pagaya-mapinpandas 0.5
Easy python wrapper for Spark mapInPandas, applyInPandas
1 version - Latest release: almost 2 years ago - 9 downloads last month - 1 maintainer
pyspark-extension 2.12.0.3.5
A library that provides useful extensions to Apache Spark.
45 versions - Latest release: 13 days ago - 50.6 thousand downloads last month - 171 stars on GitHub - 2 maintainers
spinelibs 0.0.17
Libs for spine project
7 versions - Latest release: 4 months ago - 1 dependent package - 86 downloads last month - 2 stars on GitHub - 2 maintainers
ibmaemagic 0.0.4
Make accessing IBM Analytic Engine easier.
4 versions - Latest release: over 3 years ago - 1 dependent repository - 30 downloads last month - 6 maintainers
pysparkutils 0.2.5
A collection of utilities for handling pySpark's SparkContext
14 versions - Latest release: over 6 years ago - 1 dependent repository - 42 downloads last month - 2 stars on GitHub - 2 maintainers
exelog 0.0.1
Enabling meticulous logging for Spark Applications
1 version - Latest release: over 2 years ago - 1 dependent repository - 18 downloads last month - 5 stars on GitHub - 2 maintainers
meteo-spark 0.1.0 removed 💰
A python package to process climate scientific files using pyspark.
3 versions - Latest release: over 2 years ago - 1 dependent repository - 12 downloads last month - 0 stars on GitHub - 1 maintainer
scimple 1.11.5
Scimplify Plotting, Graph manipulation, Spark Streaming & Kafka and other tools
20 versions - Latest release: almost 4 years ago - 1 dependent repository - 33 downloads last month - 1 star on GitHub - 2 maintainers
spinecore 0.0.20
The core lib of spine library
10 versions - Latest release: 4 months ago - 2 dependent packages - 100 downloads last month - 2 stars on GitHub - 2 maintainers
pyspark-spy 1.0.2
Collect and aggregate on spark events for profitz. In 🐍 way!
3 versions - Latest release: about 3 years ago - 1 dependent repository - 218 downloads last month - 10 stars on GitHub - 2 maintainers
sourced-spark-api 0.0.12
API to use Spark on top of source code repositories.
10 versions - Latest release: over 6 years ago - 1 dependent repository - 35 downloads last month - 71 stars on GitHub - 2 maintainers
sparksnake 0.2.2
Improving the development of Spark applications deployed as jobs on AWS services like Glue and EMR
24 versions - Latest release: 10 months ago - 23.3 thousand downloads last month - 12 stars on GitHub - 2 maintainers
watson-transformer 0.0.17
wrap Watson API into pyspark transformers
15 versions - Latest release: over 2 years ago - 1 dependent repository - 143 downloads last month - 2 maintainers
pyspine 0.0.14
Spine: The backbone of your project
3 versions - Latest release: about 1 year ago - 49 downloads last month - 2 stars on GitHub - 2 maintainers
Top 3.7% on pypi.org
pyspark-stubs 3.0.0 💰
A collection of the Apache Spark stub files
38 versions - Latest release: almost 4 years ago - 2 dependent packages - 146 dependent repositories - 211 thousand downloads last month - 114 stars on GitHub - 2 maintainers
microdrill 0.0.3
Simple Apache Drill alternative using PySpark
3 versions - Latest release: about 8 years ago - 2 dependent repositories - 13 downloads last month - 7 stars on GitHub - 3 maintainers
Top 7.7% on pypi.org
cluster-pack 0.3.7
A library on top of either pex or conda-pack to make your Python code easily available on a cluster
44 versions - Latest release: 10 days ago - 2 dependent packages - 5 dependent repositories - 683 downloads last month - 43 stars on GitHub - 14 maintainers
typedspark 1.4.2
Column-wise type annotations for pyspark DataFrames
38 versions - Latest release: 10 days ago - 10.6 thousand downloads last month - 46 stars on GitHub - 2 maintainers
bigdatasml 0.1.3
This package calculates average student performances
3 versions - Latest release: over 2 years ago - 40 downloads last month - 2 maintainers
spark-xarray 0.1.dev0
This is an experimental project that seeks to integrate PySpark and xarray for Climate Data Analy...
1 version - Latest release: 5 months ago - 20 downloads last month - 22 stars on GitHub - 2 maintainers
namedframes 0.1.4
Named Data Frames
3 versions - Latest release: over 3 years ago - 1 dependent package - 1 dependent repository - 36 downloads last month - 0 stars on GitHub - 2 maintainers
spark-eda 0.0.2
Exploratory data analysis for pyspark
1 version - Latest release: over 2 years ago - 1 dependent repository - 14 downloads last month - 0 stars on GitHub - 1 maintainer
sparglim 0.2.1 💰
sparglim
16 versions - Latest release: 3 months ago - 1 dependent package - 1 dependent repository - 95 downloads last month - 28 stars on GitHub - 2 maintainers
mack 0.5.0
Delta Lake helper methods in PySpark
5 versions - Latest release: 3 months ago - 9.8 thousand downloads last month - 265 stars on GitHub - 2 maintainers