Ecosyste.ms: Packages

An open API service providing package, version and dependency metadata for many open source software ecosystems and registries.

pypi.org "pyspark" keyword

irmagic 0.1.0 💰
Intelligent Reliability Platform
2 versions - Latest release: over 6 years ago - 1 dependent repository - 24 downloads last month - 5,003 stars on GitHub - 1 maintainer
miss-lightgbm-mmlspark
Microsoft ML for Spark
1 version - 4,986 stars on GitHub
Top 3.3% on pypi.org
synapseml 1.0.4
Synapse Machine Learning
16 versions - Latest release: about 2 months ago - 2 dependent packages - 3 dependent repositories - 230 thousand downloads last month - 4,985 stars on GitHub - 1 maintainer
dcborow-mmlspark 0.14.dev1
Microsoft ML for Spark
1 version - Latest release: over 4 years ago - 1 dependent repository - 58 downloads last month - 4,972 stars on GitHub - 1 maintainer
nozberkman-mmlspark 1.0.0
Microsoft ML for Spark
1 version - Latest release: over 2 years ago - 1 dependent repository - 18 downloads last month - 4,472 stars on GitHub - 1 maintainer
turntable-spoonbill 10.0.0
Productivity-centric Python Big Data Framework
5 versions - Latest release: 19 days ago - 239 downloads last month - 4,327 stars on GitHub - 1 maintainer
Top 1.4% on pypi.org
spark-nlp 5.3.3
John Snow Labs Spark NLP is a natural language processing library built on top of Apache Spark ML...
141 versions - Latest release: about 2 months ago - 35 dependent packages - 35 dependent repositories - 4.15 million downloads last month - 3,717 stars on GitHub - 3 maintainers
Top 1.4% on pypi.org
ibis-framework 9.0.0
The portable Python dataframe library
94 versions - Latest release: about 1 month ago - 25 dependent packages - 130 dependent repositories - 186 thousand downloads last month - 3,333 stars on GitHub - 6 maintainers
Top 10.0% on pypi.org
ckls-test-lib 4.2.7 removed
John Snow Labs Spark NLP is a natural language processing library built on top of Apache Spark ML...
1 version - Latest release: over 1 year ago - 85 downloads last month - 3,066 stars on GitHub - 1 maintainer
firespark 0.0.32
FireSpark data processing utility library
16 versions - Latest release: about 4 years ago - 1 dependent repository - 140 downloads last month - 1,752 stars on GitHub - 1 maintainer
hops-petastorm 0.9.4
Petastorm is a library enabling the use of Parquet storage from Tensorflow, Pytorch, and other Py...
6 versions - Latest release: over 3 years ago - 1 dependent repository - 93 downloads last month - 1,752 stars on GitHub - 3 maintainers
Top 2.3% on pypi.org
petastorm 0.12.1
Petastorm is a library enabling the use of Parquet storage from Tensorflow, Pytorch, and other Py...
86 versions - Latest release: over 1 year ago - 4 dependent packages - 26 dependent repositories - 36.9 thousand downloads last month - 1,752 stars on GitHub - 2 maintainers
Top 9.4% on pypi.org
pyoptimus 0.1.0
Optimus is the missing framework for cleaning and pre-processing data in a distributed fashion.
32 versions - Latest release: over 1 year ago - 1 dependent repository - 240 downloads last month - 1,447 stars on GitHub - 2 maintainers
Top 4.8% on pypi.org
optimuspyspark 2.2.32
Optimus is the missing framework for cleaning and pre-processing data in a distributed fashion wi...
83 versions - Latest release: almost 4 years ago - 8 dependent repositories - 10.7 thousand downloads last month - 1,441 stars on GitHub - 2 maintainers
Top 1.9% on pypi.org
autovizwidget 0.21.0
AutoVizWidget: An Auto-Visualization library for pandas dataframes
54 versions - Latest release: 9 months ago - 4 dependent packages - 93 dependent repositories - 86 thousand downloads last month - 1,289 stars on GitHub - 4 maintainers
emr-serverless-customauth
EMR Serverless Custom Authenticator for spark magic kernel.
1 version - 122 downloads last month - 1,289 stars on GitHub - 1 maintainer
Top 1.7% on pypi.org
hdijupyterutils 0.21.0
HdiJupyterUtils: Utils for Jupyter projects from HDInsight team
53 versions - Latest release: 9 months ago - 3 dependent packages - 92 dependent repositories - 86.4 thousand downloads last month - 1,289 stars on GitHub - 4 maintainers
Top 1.7% on pypi.org
sparkmagic 0.21.0
SparkMagic: Spark execution via Livy
56 versions - Latest release: 9 months ago - 4 dependent packages - 86 dependent repositories - 46.3 thousand downloads last month - 1,273 stars on GitHub - 5 maintainers
Top 8.8% on pypi.org
h2o-pysparkling-scoring-3.1 3.46.0.1.post1
Sparkling Water integrates H2O's Fast Scalable Machine Learning with Spark
37 versions - Latest release: about 2 months ago - 1.49 thousand downloads last month - 952 stars on GitHub - 1 maintainer
Top 5.2% on pypi.org
h2o-pysparkling-3.0 3.46.0.1.post1
Sparkling Water integrates H2O's Fast Scalable Machine Learning with Spark
49 versions - Latest release: about 2 months ago - 2 dependent repositories - 8.52 thousand downloads last month - 952 stars on GitHub - 1 maintainer
h2o-pysparkling-scoring-2.1 3.32.1.7-1
Sparkling Water integrates H2O's Fast Scalable Machine Learning with Spark
7 versions - Latest release: over 2 years ago - 17 downloads last month - 952 stars on GitHub - 1 maintainer
h2o-pysparkling-2.2 3.26.11
Sparkling Water integrates H2O's Fast Scalable Machine Learning with Spark
97 versions - Latest release: over 4 years ago - 351 downloads last month - 952 stars on GitHub - 1 maintainer
h2o-pysparkling-3.4 0.0.2
Sparkling Water integrates H2O's Fast Scalable Machine Learning with Spark
10 versions - Latest release: about 1 year ago - 6.18 thousand downloads last month - 952 stars on GitHub - 1 maintainer
Top 9.2% on pypi.org
h2o-pysparkling-2.3 3.26.11
Sparkling Water integrates H2O's Fast Scalable Machine Learning with Spark
102 versions - Latest release: over 4 years ago - 524 downloads last month - 952 stars on GitHub - 1 maintainer
h2o-pysparkling-2.1 3.26.11
Sparkling Water integrates H2O's Fast Scalable Machine Learning with Spark
88 versions - Latest release: over 4 years ago - 394 downloads last month - 952 stars on GitHub - 1 maintainer
h2o-pysparkling-scoring-2.4 3.46.0.1.post1
Sparkling Water integrates H2O's Fast Scalable Machine Learning with Spark
38 versions - Latest release: about 2 months ago - 91 downloads last month - 952 stars on GitHub - 1 maintainer
Top 8.7% on pypi.org
h2o-pysparkling-3.2 3.46.0.1.post1
Sparkling Water integrates H2O's Fast Scalable Machine Learning with Spark
24 versions - Latest release: about 2 months ago - 40.4 thousand downloads last month - 952 stars on GitHub - 1 maintainer
h2o-pysparkling-scoring-3.2 3.46.0.1.post1
Sparkling Water integrates H2O's Fast Scalable Machine Learning with Spark
23 versions - Latest release: about 2 months ago - 52 downloads last month - 952 stars on GitHub - 1 maintainer
h2o-pysparkling-scoring-3.4 3.46.0.1.post1
Sparkling Water integrates H2O's Fast Scalable Machine Learning with Spark
8 versions - Latest release: about 2 months ago - 185 downloads last month - 952 stars on GitHub - 1 maintainer
h2o-pysparkling-scoring-3.3 3.46.0.1.post1
Sparkling Water integrates H2O's Fast Scalable Machine Learning with Spark
19 versions - Latest release: about 2 months ago - 97 downloads last month - 952 stars on GitHub - 1 maintainer
Top 5.0% on pypi.org
h2o-pysparkling-3.1 3.46.0.1.post1
Sparkling Water integrates H2O's Fast Scalable Machine Learning with Spark
37 versions - Latest release: about 2 months ago - 2 dependent repositories - 75.6 thousand downloads last month - 952 stars on GitHub - 1 maintainer
h2o-pysparkling-scoring-3.5 3.46.0.1.post1
Sparkling Water integrates H2O's Fast Scalable Machine Learning with Spark
4 versions - Latest release: about 2 months ago - 155 downloads last month - 952 stars on GitHub - 1 maintainer
h2o-pysparkling-3.5 0.0.0
Sparkling Water integrates H2O's Fast Scalable Machine Learning with Spark
5 versions - Latest release: 8 months ago - 1.27 thousand downloads last month - 952 stars on GitHub - 1 maintainer
Top 9.0% on pypi.org
h2o-pysparkling-3.3 3.46.0.1.post1
Sparkling Water integrates H2O's Fast Scalable Machine Learning with Spark
20 versions - Latest release: about 2 months ago - 32 thousand downloads last month - 952 stars on GitHub - 1 maintainer
h2o-pysparkling-scoring-2.2 3.36.1.5.post1
Sparkling Water integrates H2O's Fast Scalable Machine Learning with Spark
22 versions - Latest release: over 1 year ago - 71 downloads last month - 952 stars on GitHub - 1 maintainer
h2o-pysparkling-scoring-3.0 3.46.0.1.post1
Sparkling Water integrates H2O's Fast Scalable Machine Learning with Spark
38 versions - Latest release: about 2 months ago - 91 downloads last month - 952 stars on GitHub - 1 maintainer
h2o-pysparkling-scoring-2.3 3.46.0.1.post1
Sparkling Water integrates H2O's Fast Scalable Machine Learning with Spark
38 versions - Latest release: about 2 months ago - 93 downloads last month - 952 stars on GitHub - 1 maintainer
Top 3.7% on pypi.org
h2o-pysparkling-2.4 3.26.11
Sparkling Water integrates H2O's Fast Scalable Machine Learning with Spark
81 versions - Latest release: over 4 years ago - 12 dependent repositories - 11.2 thousand downloads last month - 952 stars on GitHub - 1 maintainer
metaspore 1.1.0
Metaspore: A Unified End-to-end Machine Intelligence Platform
3 versions - Latest release: over 1 year ago - 10 downloads last month - 626 stars on GitHub - 2 maintainers
Top 3.3% on pypi.org
quinn 0.10.3
Pyspark helper methods to maximize developer efficiency
16 versions - Latest release: 4 months ago - 3 dependent packages - 11 dependent repositories - 781 thousand downloads last month - 580 stars on GitHub - 1 maintainer
Top 2.7% on pypi.org
datacompy 0.12.0
Dataframe comparison in Python
25 versions - Latest release: about 1 month ago - 7 dependent packages - 16 dependent repositories - 683 thousand downloads last month - 399 stars on GitHub - 4 maintainers
Top 2.9% on pypi.org
tdigest 0.4.0 💰
T-Digest data structure
14 versions - Latest release: almost 9 years ago - 13 dependent packages - 41 dependent repositories - 259 thousand downloads last month - 376 stars on GitHub - 1 maintainer
mack 0.5.0
Delta Lake helper methods in PySpark
5 versions - Latest release: 4 months ago - 12.3 thousand downloads last month - 271 stars on GitHub - 1 maintainer
sparkdantic 0.20.5
A pydantic -> spark schema library
34 versions - Latest release: 3 months ago - 33.7 thousand downloads last month - 270 stars on GitHub - 1 maintainer
Top 5.7% on pypi.org
dbldatagen 0.3.6
Databricks Labs - PySpark Synthetic Data Generator
13 versions - Latest release: 3 months ago - 1 dependent package - 2 dependent repositories - 171 thousand downloads last month - 270 stars on GitHub - 1 maintainer
Top 8.6% on pypi.org
butterfree 1.2.4
A tool for building feature stores - Transform your raw data into beautiful features.
36 versions - Latest release: about 1 month ago - 1 dependent repository - 78.5 thousand downloads last month - 268 stars on GitHub - 1 maintainer
Top 8.9% on pypi.org
pyspark-hnsw 1.1.0
Java library for approximate nearest neighbors search using Hierarchical Navigable Small World gr...
30 versions - Latest release: over 1 year ago - 1 dependent repository - 31.4 thousand downloads last month - 242 stars on GitHub - 1 maintainer
Top 6.6% on pypi.org
spark-df-profiling 1.1.13
Create HTML profiling reports from Apache Spark DataFrames
13 versions - Latest release: over 7 years ago - 2 dependent repositories - 53.9 thousand downloads last month - 194 stars on GitHub - 1 maintainer
Top 8.6% on pypi.org
spark-df-profiling-new 1.1.14
Create HTML profiling reports from Apache Spark DataFrames
1 version - Latest release: about 3 years ago - 1 dependent repository - 38.6 thousand downloads last month - 194 stars on GitHub - 1 maintainer
Top 6.7% on pypi.org
handyspark 0.2.2a1
HandySpark - bringing pandas-like capabilities to Spark dataframes
7 versions - Latest release: about 5 years ago - 3 dependent repositories - 156 thousand downloads last month - 183 stars on GitHub - 1 maintainer
pyspark-extension 2.12.0.3.5
A library that provides useful extensions to Apache Spark.
45 versions - Latest release: about 1 month ago - 38.6 thousand downloads last month - 174 stars on GitHub - 1 maintainer
replay-rec 0.16.0
RecSys Library
17 versions - Latest release: 3 months ago - 1 dependent package - 1 dependent repository - 6.17 thousand downloads last month - 127 stars on GitHub - 1 maintainer
Top 3.7% on pypi.org
pyspark-stubs 3.0.0 💰
A collection of the Apache Spark stub files
38 versions - Latest release: almost 4 years ago - 2 dependent packages - 146 dependent repositories - 187 thousand downloads last month - 114 stars on GitHub - 1 maintainer
google-dataproc-templates 0.1.0
Google Dataproc templates written in Python
8 versions - Latest release: about 1 year ago - 29.6 thousand downloads last month - 111 stars on GitHub - 3 maintainers
cuallee 0.10.2
Python library for data validation on DataFrame APIs including Snowflake/Snowpark, Apache/PySpark...
76 versions - Latest release: 22 days ago - 1 dependent package - 1 dependent repository - 11.6 thousand downloads last month - 111 stars on GitHub - 2 maintainers
Top 9.0% on pypi.org
jupyterlab-sparkmonitor 4.1.0
Spark Monitor Extension for Jupyter Lab
14 versions - Latest release: almost 3 years ago - 1 dependent repository - 10.3 thousand downloads last month - 91 stars on GitHub - 1 maintainer
Top 3.9% on pypi.org
pytest-spark 0.6.0
pytest plugin to run the tests with support of pyspark.
15 versions - Latest release: over 4 years ago - 7 dependent packages - 71 dependent repositories - 290 thousand downloads last month - 82 stars on GitHub - 1 maintainer
Top 9.3% on pypi.org
anovos 1.1.0
An Open Source tool for Feature Engineering in Machine Learning
8 versions - Latest release: over 1 year ago - 2 dependent repositories - 2.01 thousand downloads last month - 77 stars on GitHub - 1 maintainer
sourced-spark-api 0.0.12
API to use Spark on top of source code repositories.
10 versions - Latest release: over 6 years ago - 1 dependent repository - 35 downloads last month - 71 stars on GitHub - 1 maintainer
sourced-jgit-spark-connector 2.0.1
Engine to use Spark on top of source code repositories.
2 versions - Latest release: over 5 years ago - 1 dependent repository - 26 downloads last month - 71 stars on GitHub - 1 maintainer
mmtfpyspark 0.3.6
Methods for parallel and distributed analysis and mining of the Protein Data Bank using MMTF and ...
10 versions - Latest release: over 5 years ago - 1 dependent repository - 41 downloads last month - 67 stars on GitHub - 2 maintainers
sparksteps 3.0.1
Workflow tool to launch Spark jobs on AWS EMR
20 versions - Latest release: over 3 years ago - 1 dependent repository - 266 downloads last month - 67 stars on GitHub - 2 maintainers
sparkly 2.8.2
Helpers & syntax sugar for PySpark.
21 versions - Latest release: almost 4 years ago - 1 dependent repository - 14.1 thousand downloads last month - 60 stars on GitHub - 3 maintainers
Top 9.9% on pypi.org
soda-spark 0.3.3
Soda SQL API for PySpark data frame
11 versions - Latest release: about 2 years ago - 1 dependent package - 1 dependent repository - 46.3 thousand downloads last month - 60 stars on GitHub - 1 maintainer
sparkora 0.0.1
Exploratory data analysis toolkit for Pyspark
1 version - Latest release: over 2 years ago - 1 dependent repository - 16 downloads last month - 53 stars on GitHub - 1 maintainer
typedspark 1.4.2
Column-wise type annotations for pyspark DataFrames
38 versions - Latest release: about 1 month ago - 1 dependent package - 11.9 thousand downloads last month - 52 stars on GitHub - 1 maintainer
pyspark-asyncactions 0.0.4 💰
A proof of concept asynchronous actions for PySpark using concurrent.futures
4 versions - Latest release: over 3 years ago - 1 dependent repository - 5.19 thousand downloads last month - 44 stars on GitHub - 1 maintainer
Top 7.7% on pypi.org
cluster-pack 0.3.7
A library on top of either pex or conda-pack to make your Python code easily available on a cluster
44 versions - Latest release: about 1 month ago - 2 dependent packages - 5 dependent repositories - 1.27 thousand downloads last month - 43 stars on GitHub - 7 maintainers
Top 6.8% on pypi.org
dataproc-templates 0.0.2 removed
Dataproc templates written in Python
2 versions - Latest release: over 1 year ago - 40 stars on GitHub
pyjaws 0.1.7
PyJaws: A Pythonic Way to Define Databricks Jobs and Workflows
10 versions - Latest release: 9 months ago - 1 dependent repository - 46 downloads last month - 38 stars on GitHub - 1 maintainer
sparkdataset 1.0.0
Provides instant access to many popular datasets right from Pyspark (in dataframe structure).
1 version - Latest release: over 2 years ago - 1 dependent repository - 9 downloads last month - 34 stars on GitHub - 1 maintainer
checkengine 0.2.0 removed
Data-quality checks for PySpark
1 version - Latest release: almost 3 years ago - 16 downloads last month - 30 stars on GitHub
yummy 0.0.11
14 versions - Latest release: 4 months ago - 1 dependent repository - 64 downloads last month - 30 stars on GitHub - 1 maintainer
pyspark3d 0.3.1
Spark extension for processing large-scale 3D data sets
4 versions - Latest release: over 5 years ago - 1 dependent repository - 62 downloads last month - 29 stars on GitHub - 1 maintainer
sparglim 0.2.1 💰
sparglim
16 versions - Latest release: 4 months ago - 1 dependent package - 1 dependent repository - 75 downloads last month - 28 stars on GitHub - 1 maintainer
graphlet 0.1.1
Graphlet AI Knowledge Graph Factory
1 version - Latest release: almost 2 years ago - 12 downloads last month - 27 stars on GitHub - 1 maintainer
dummyrdd 0.1.2
A pure python mocked version of pyspark's rdd class
11 versions - Latest release: about 7 years ago - 1 dependent repository - 51 downloads last month - 27 stars on GitHub - 1 maintainer
dummy_spark 0.0.1
A pure python mocked version of pyspark's rdd class
1 version - Latest release: almost 8 years ago - 32 downloads last month - 27 stars on GitHub - 1 maintainer
sparkpickle 1.0.1
Provides functions for reading SequenceFile-s with Python pickles.
2 versions - Latest release: over 7 years ago - 1 dependent repository - 5.3 thousand downloads last month - 24 stars on GitHub - 1 maintainer
spark-xarray 0.1.dev0
This is an experimental project that seeks to integrate PySpark and xarray for Climate Data Analy...
1 version - Latest release: 6 months ago - 41 downloads last month - 22 stars on GitHub - 1 maintainer
pramen-py 1.8.8
Pramen transformations written in python
30 versions - Latest release: 17 days ago - 401 downloads last month - 22 stars on GitHub - 3 maintainers
pbspark 0.9.0
Convert between protobuf messages and pyspark dataframes
11 versions - Latest release: 12 months ago - 1 dependent repository - 276 thousand downloads last month - 21 stars on GitHub - 1 maintainer
pypair 3.0.9 💰
Pairwise association measures of statistical variable types
11 versions - Latest release: over 2 years ago - 1 dependent repository - 954 downloads last month - 21 stars on GitHub - 1 maintainer
Top 9.6% on pypi.org
pyspark-test 0.2.0
Check that left and right spark DataFrame are equal.
2 versions - Latest release: over 2 years ago - 5 dependent repositories - 169 thousand downloads last month - 21 stars on GitHub - 1 maintainer
spetlr 5.1.6
A python ETL libRary (SPETLR) for Databricks powered by Apache SPark.
53 versions - Latest release: 18 days ago - 1 dependent package - 22.4 thousand downloads last month - 18 stars on GitHub - 1 maintainer
mse 0.1.4
Make Structs Easy (MSE)
3 versions - Latest release: about 4 years ago - 1 dependent repository - 2.72 thousand downloads last month - 18 stars on GitHub - 1 maintainer
copybook 1.0.16
python copybook parser
12 versions - Latest release: over 1 year ago - 1 dependent repository - 29.9 thousand downloads last month - 17 stars on GitHub - 1 maintainer
tmlt-analytics 0.10.0
Tumult's differential privacy analytics API
27 versions - Latest release: 16 days ago - 1.03 thousand downloads last month - 17 stars on GitLab.com - 1 maintainer
imnet 0.2.1
imNet: a Sequence Network Construction Toolkit
5 versions - Latest release: almost 4 years ago - 2 dependent repositories - 30 downloads last month - 16 stars on GitHub - 1 maintainer
sparklanes 0.2.4
A lightweight framework to build and execute data processing pipelines in pyspark (Apache Spark's...
5 versions - Latest release: over 5 years ago - 1 dependent repository - 27 downloads last month - 16 stars on GitHub - 1 maintainer
tidypyspark 0.0.1
dplyr for pyspark
1 version - Latest release: about 1 year ago - 17 downloads last month - 14 stars on GitHub - 2 maintainers
typed-pyspark 0.0.5
Contains a set of abstractions to type annotate and validate dataframes in pyspark
5 versions - Latest release: about 2 years ago - 1 dependent package - 17.2 thousand downloads last month - 14 stars on GitHub - 3 maintainers
dot-connect 0.3.32
Improve your workflow efficiency by connecting to databases and cloud systems effortlessly.
7 versions - Latest release: 9 months ago - 47 downloads last month - 13 stars on GitHub - 1 maintainer
sparkql 0.10.0
sparkql: Apache Spark SQL DataFrame schema management for sensible humans
20 versions - Latest release: 9 months ago - 1 dependent repository - 28.9 thousand downloads last month - 13 stars on GitHub - 1 maintainer
aws-insurancelake-etl 3.3.1
A CDK Python app for deploying ETL jobs that operate data pipelines for InsuranceLake in AWS
8 versions - Latest release: 2 months ago - 32 downloads last month - 12 stars on GitHub - 1 maintainer
marshmallow-pyspark 0.2.4
PySpark data serializer
6 versions - Latest release: 5 months ago - 1 dependent repository - 2.14 thousand downloads last month - 12 stars on GitHub - 1 maintainer
sparksnake 0.2.2
Improving the development of Spark applications deployed as jobs on AWS services like Glue and EMR
24 versions - Latest release: 11 months ago - 23.3 thousand downloads last month - 12 stars on GitHub - 1 maintainer
sparkorm 1.2.17
SparkORM: Python Spark SQL & DataFrame schema management and basic Object Relational Mapping.
23 versions - Latest release: 18 days ago - 213 thousand downloads last month - 11 stars on GitHub - 1 maintainer
ydot 0.0.6 💰
R-like formulas for Spark Dataframes
6 versions - Latest release: over 3 years ago - 1 dependent repository - 37 downloads last month - 10 stars on GitHub - 1 maintainer
pyspark-spy 1.0.2
Collect and aggregate on spark events for profitz. In 🐍 way!
3 versions - Latest release: about 3 years ago - 1 dependent repository - 218 downloads last month - 10 stars on GitHub - 1 maintainer