Ecosyste.ms: Packages

An open API service providing package, version, and dependency metadata for many open source software ecosystems and registries.

pypi.org "pyspark" keyword

exelog 0.0.1
Enabling meticulous logging for Spark Applications
1 version - Latest release: over 2 years ago - 1 dependent repository - 14 downloads last month - 5 stars on GitHub - 1 maintainer
scimple 1.11.5
Scimplify Plotting, Graph manipulation, Spark Streaming & Kafka and other tools
20 versions - Latest release: about 4 years ago - 1 dependent repository - 39 downloads last month - 1 star on GitHub - 1 maintainer
typedspark 1.4.2
Column-wise type annotations for pyspark DataFrames
39 versions - Latest release: about 1 month ago - 1 dependent package - 18 thousand downloads last month - 52 stars on GitHub - 1 maintainer
pyspark-spy 1.0.2
Collect and aggregate on spark events for profitz. In 🐍 way!
3 versions - Latest release: about 3 years ago - 1 dependent repository - 207 downloads last month - 10 stars on GitHub - 1 maintainer
spinelibs 0.0.17
Libs for spine project
7 versions - Latest release: 5 months ago - 1 dependent package - 58 downloads last month - 2 stars on GitHub - 1 maintainer
ibmaemagic 0.0.4
Make accessing IBM Analytic Engine easier.
4 versions - Latest release: over 3 years ago - 1 dependent repository - 19 downloads last month - 3 maintainers
data-manipulation 0.37
Powerful data manipulation
37 versions - Latest release: 5 months ago - 1 dependent repository - 403 downloads last month - 2 stars on GitHub - 1 maintainer
sourced-spark-api 0.0.12
API to use Spark on top of source code repositories.
10 versions - Latest release: over 6 years ago - 1 dependent repository - 29 downloads last month - 71 stars on GitHub - 1 maintainer
Top 1.4% on pypi.org
ibis-framework 9.0.0
The portable Python dataframe library
94 versions - Latest release: about 1 month ago - 25 dependent packages - 130 dependent repositories - 186 thousand downloads last month - 3,333 stars on GitHub - 6 maintainers
Top 3.7% on pypi.org
pyspark-stubs 3.0.0 💰
A collection of the Apache Spark stub files
38 versions - Latest release: almost 4 years ago - 2 dependent packages - 146 dependent repositories - 187 thousand downloads last month - 114 stars on GitHub - 1 maintainer
fink-filters 0.2.18
User-defined filters for the Fink broker.
68 versions - Latest release: almost 3 years ago - 3 dependent repositories - 984 downloads last month - 1 star on GitHub - 1 maintainer
watson-transformer 0.0.17
wrap Watson API into pyspark transformers
15 versions - Latest release: almost 3 years ago - 1 dependent repository - 143 downloads last month - 1 maintainer
pyspine 0.0.14
Spine: The backbone of your project
3 versions - Latest release: about 1 year ago - 41 downloads last month - 2 stars on GitHub - 1 maintainer
Top 2.9% on pypi.org
tdigest 0.4.0 💰
T-Digest data structure
14 versions - Latest release: almost 9 years ago - 13 dependent packages - 41 dependent repositories - 259 thousand downloads last month - 376 stars on GitHub - 1 maintainer
Top 7.7% on pypi.org
cluster-pack 0.3.7
A library on top of either pex or conda-pack to make your Python code easily available on a cluster
44 versions - Latest release: about 1 month ago - 2 dependent packages - 5 dependent repositories - 1.27 thousand downloads last month - 43 stars on GitHub - 7 maintainers
microdrill 0.0.3
Simple Apache Drill alternative using PySpark
3 versions - Latest release: over 8 years ago - 2 dependent repositories - 10 downloads last month - 7 stars on GitHub - 3 maintainers
pyspark-extension 2.12.0.3.5
A library that provides useful extensions to Apache Spark.
45 versions - Latest release: about 2 months ago - 38.6 thousand downloads last month - 174 stars on GitHub - 1 maintainer
sparglim 0.2.1 💰
sparglim
16 versions - Latest release: 5 months ago - 1 dependent package - 1 dependent repository - 75 downloads last month - 28 stars on GitHub - 1 maintainer
sparkdataset 1.0.0
Provides instant access to many popular datasets right from Pyspark (in dataframe structure).
1 version - Latest release: over 2 years ago - 1 dependent repository - 9 downloads last month - 34 stars on GitHub - 1 maintainer
lexy 0.1.0
Lexy enables you to easily build and share data dictionaries to explain and document your data te...
1 version - Latest release: over 2 years ago - 1 dependent repository - 10 downloads last month - 0 stars on GitHub - 1 maintainer
tdml 0.1.1
Transform Dataframe for Machine Learning
1 version - Latest release: almost 4 years ago - 1 dependent repository - 26 downloads last month - 1 star on GitHub - 1 maintainer
Top 8.6% on pypi.org
butterfree 1.2.4
A tool for building feature stores - Transform your raw data into beautiful features.
36 versions - Latest release: about 1 month ago - 1 dependent repository - 78.5 thousand downloads last month - 268 stars on GitHub - 1 maintainer
bigdatasml 0.1.3
This package calculates average student performances
3 versions - Latest release: over 2 years ago - 22 downloads last month - 1 maintainer
spark-xarray 0.1.dev0
This is an experimental project that seeks to integrate PySpark and xarray for Climate Data Analy...
1 version - Latest release: 6 months ago - 41 downloads last month - 22 stars on GitHub - 1 maintainer
namedframes 0.1.4
Named Data Frames
3 versions - Latest release: over 3 years ago - 1 dependent package - 1 dependent repository - 51 downloads last month - 0 stars on GitHub - 1 maintainer
mack 0.5.0
Delta Lake helper methods in PySpark
5 versions - Latest release: 4 months ago - 12.3 thousand downloads last month - 271 stars on GitHub - 1 maintainer
atc-dataplatform-tools 0.1.26
A common set of python libraries for DataBricks, supplement to atc-dataplatform
26 versions - Latest release: about 1 year ago - 1 dependent repository - 88 downloads last month - 1 star on GitHub - 1 maintainer
nose2-spark 0.3
nose2 plugin to run the tests with support of pyspark.
3 versions - Latest release: over 7 years ago - 1 dependent repository - 36 downloads last month - 0 stars on GitHub - 1 maintainer
spark-eda 0.0.2
Exploratory data analysis for pyspark
1 version - Latest release: almost 3 years ago - 1 dependent repository - 8 downloads last month - 0 stars on GitHub - 1 maintainer
patek 0.5.2
A collection of utilities and tools for accelerating pyspark development and productivity.
7 versions - Latest release: over 1 year ago - 225 downloads last month - 0 stars on GitHub - 1 maintainer
pyspark3d 0.3.1
Spark extension for processing large-scale 3D data sets
4 versions - Latest release: over 5 years ago - 1 dependent repository - 62 downloads last month - 29 stars on GitHub - 1 maintainer
aws-insurancelake-etl 3.3.1
A CDK Python app for deploying ETL jobs that operate data pipelines for InsuranceLake in AWS
8 versions - Latest release: 3 months ago - 32 downloads last month - 12 stars on GitHub - 1 maintainer
miss-lightgbm-mmlspark
Microsoft ML for Spark
1 version - 4,986 stars on GitHub
scikit-spark 0.4.0
Spark acceleration for Scikit-Learn cross validation techniques
6 versions - Latest release: over 4 years ago - 1 dependent repository - 8.71 thousand downloads last month - 8 stars on GitHub - 1 maintainer
spark-df-profiling-optimus 0.1.1
Create HTML profiling reports from Apache Spark DataFrames
6 versions - Latest release: almost 7 years ago - 3 dependent repositories - 347 downloads last month - 2 stars on GitHub - 1 maintainer
mmtfpyspark 0.3.6
Methods for parallel and distributed analysis and mining of the Protein Data Bank using MMTF and ...
10 versions - Latest release: over 5 years ago - 1 dependent repository - 41 downloads last month - 67 stars on GitHub - 2 maintainers
sysxtract 1.0.0
Extract logs based off events from sysmon. Comes as a package, cli and ui.
1 version - Latest release: about 4 years ago - 1 dependent repository - 14 downloads last month - 3 stars on GitHub - 1 maintainer
sparkql 0.10.0
sparkql: Apache Spark SQL DataFrame schema management for sensible humans
20 versions - Latest release: 9 months ago - 1 dependent repository - 28.9 thousand downloads last month - 13 stars on GitHub - 1 maintainer
pyspark-val 0.1.4
PySpark validation & testing tooling
5 versions - Latest release: 5 months ago - 27 downloads last month - 0 stars on GitHub - 1 maintainer
databricks-bridge 0.0.4
Databricks read and write with sql connection
4 versions - Latest release: 3 months ago - 25 downloads last month - 1 maintainer
cleanflow 1.3.3a1
A framework for cleaning, pre-processing and exploring data in a scalable and distributed manner.
11 versions - Latest release: about 6 years ago - 26 downloads last month - 1 star on GitHub - 1 maintainer
pybda 0.1.0
Analysis of big biological data sets for distributed HPC clusters.
6 versions - Latest release: almost 5 years ago - 1 dependent repository - 33 downloads last month - 9 stars on GitHub - 1 maintainer
sparklanes 0.2.4
A lightweight framework to build and execute data processing pipelines in pyspark (Apache Spark's...
5 versions - Latest release: over 5 years ago - 1 dependent repository - 27 downloads last month - 16 stars on GitHub - 1 maintainer
flicker 1.0.1
Provides FlickerDataFrame, a wrapper over Pyspark DataFrame to provide a pandas-like API
17 versions - Latest release: 8 months ago - 2 dependent repositories - 105 downloads last month - 3 stars on GitHub - 1 maintainer
pysparkproxy 0.0.17
Seamlessly execute pyspark code on remote clusters
9 versions - Latest release: over 5 years ago - 1 dependent repository - 28 downloads last month - 4 stars on GitHub - 1 maintainer
pyspark_dfreport 0.1
Simple Python Package to save (small) PySpark DataFrames to one Excel File.
1 version - Latest release: over 7 years ago - 12 downloads last month - 0 stars on GitHub - 1 maintainer
Top 10.0% on pypi.org
ckls-test-lib 4.2.7 removed
John Snow Labs Spark NLP is a natural language processing library built on top of Apache Spark ML...
1 version - Latest release: over 1 year ago - 85 downloads last month - 3,066 stars on GitHub - 1 maintainer
pyspark-easy 1.5
Makes pyspark dataframe exploration easy
10 versions - Latest release: about 3 years ago - 1 dependent repository - 28 downloads last month - 0 stars on GitHub - 1 maintainer
atc-dataplatform 1.1.69
A common set of python libraries for DataBricks
90 versions - Latest release: about 1 year ago - 2 dependent packages - 1 dependent repository - 19.9 thousand downloads last month - 8 stars on GitHub - 1 maintainer
pysparrow 1.0.4
An arrow interface for PySpark RDDs
1 version - Latest release: over 2 years ago - 1 dependent repository - 165 downloads last month - 0 stars on GitHub - 1 maintainer
pyspark-event-correlation 1.0.2
Event Correlation and Changing Detection Algorithm
3 versions - Latest release: over 2 years ago - 1 dependent repository - 21 downloads last month - 1 star on GitHub - 1 maintainer
Top 5.7% on pypi.org
dbldatagen 0.3.6
Databricks Labs - PySpark Synthetic Data Generator
13 versions - Latest release: 4 months ago - 1 dependent package - 2 dependent repositories - 171 thousand downloads last month - 270 stars on GitHub - 1 maintainer
Top 2.7% on pypi.org
datacompy 0.12.0
Dataframe comparison in Python
25 versions - Latest release: about 1 month ago - 7 dependent packages - 16 dependent repositories - 683 thousand downloads last month - 399 stars on GitHub - 4 maintainers
protodf 0.1
A package which lets you run PySpark SQL on your Protobuf data
1 version - Latest release: almost 5 years ago - 1 dependent repository - 1.11 thousand downloads last month - 8 stars on GitHub - 1 maintainer
turntable-spoonbill 10.0.0
Productivity-centric Python Big Data Framework
5 versions - Latest release: 29 days ago - 239 downloads last month - 4,327 stars on GitHub - 1 maintainer
pyspark-model-plus 1.0.2
Enhancements to commonly used pyspark functions for building models
2 versions - Latest release: over 3 years ago - 1 dependent repository - 25 downloads last month - 0 stars on GitHub - 1 maintainer
pyjaws 0.1.7
PyJaws: A Pythonic Way to Define Databricks Jobs and Workflows
10 versions - Latest release: 9 months ago - 1 dependent repository - 46 downloads last month - 38 stars on GitHub - 1 maintainer
nozberkman-mmlspark 1.0.0
Microsoft ML for Spark
1 version - Latest release: over 2 years ago - 1 dependent repository - 18 downloads last month - 4,472 stars on GitHub - 1 maintainer
sparkdh 0.0.1
1 version - Latest release: over 2 years ago - 1 dependent repository - 14 downloads last month - 0 stars on GitHub - 1 maintainer
isparkcache 0.1.12
Cache Spark Dataframes for Jupyter
5 versions - Latest release: almost 7 years ago - 1 dependent repository - 19 downloads last month - 2 stars on GitHub - 1 maintainer
dfanalyzer 0.0.4
Pyspark Dataframe Analyzer - Smartest DataFrame Analysis
4 versions - Latest release: over 1 year ago - 34 downloads last month - 1 maintainer
sparkhpc 0.1
spark deployment on hpc resources made easy
11 versions - Latest release: 10 months ago - 1 dependent repository - 34 downloads last month - 1 maintainer
mse 0.1.4
Make Structs Easy (MSE)
3 versions - Latest release: about 4 years ago - 1 dependent repository - 2.72 thousand downloads last month - 18 stars on GitHub - 1 maintainer
pyspark-pandas 0.0.7
Tools and algorithms for pandas Dataframes distributed on pyspark. Please consider the SparklingP...
5 versions - Latest release: over 9 years ago - 2 dependent repositories - 236 thousand downloads last month - 6 stars on GitHub - 1 maintainer
e2fyi-pyspark 0.1.0a1
Productivity functions for common but painful pyspark tasks.
1 version - Latest release: over 4 years ago - 1 dependent repository - 8 downloads last month - 0 stars on GitHub - 1 maintainer
Top 3.3% on pypi.org
synapseml 1.0.4
Synapse Machine Learning
16 versions - Latest release: 2 months ago - 2 dependent packages - 3 dependent repositories - 230 thousand downloads last month - 4,985 stars on GitHub - 1 maintainer
pyspark_db_utils 0.0.7
Useful functions for working with databases in PySpark (PostgreSQL, ClickHouse)
6 versions - Latest release: about 6 years ago - 217 downloads last month - 8 stars on GitHub - 1 maintainer
Top 9.3% on pypi.org
anovos 1.1.0
An Open Source tool for Feature Engineering in Machine Learning
8 versions - Latest release: over 1 year ago - 2 dependent repositories - 2.01 thousand downloads last month - 77 stars on GitHub - 1 maintainer
tuberia 0.0.1
Tuberia... when data engineering meets software engineering
2 versions - Latest release: over 1 year ago - 1 dependent repository - 23 downloads last month - 3 stars on GitHub - 1 maintainer
sparkypandy 0.1.4 💰
It's not spark, it's not pandas, it's just awkward...
5 versions - Latest release: almost 3 years ago - 1 dependent repository - 13 downloads last month - 2 stars on GitHub - 1 maintainer
typed-pyspark 0.0.5
Contains a set of abstractions to type annotate and validate dataframes in pyspark
5 versions - Latest release: about 2 years ago - 1 dependent package - 17.2 thousand downloads last month - 14 stars on GitHub - 3 maintainers
sparksteps 3.0.1
Workflow tool to launch Spark jobs on AWS EMR
20 versions - Latest release: over 3 years ago - 1 dependent repository - 266 downloads last month - 67 stars on GitHub - 2 maintainers
tinsel 0.3.0
PySpark schema generator
3 versions - Latest release: almost 6 years ago - 1 dependent repository - 214 thousand downloads last month - 1 maintainer
cuallee 0.10.2
Python library for data validation on DataFrame APIs including Snowflake/Snowpark, Apache/PySpark...
76 versions - Latest release: about 1 month ago - 1 dependent package - 1 dependent repository - 11.6 thousand downloads last month - 111 stars on GitHub - 2 maintainers
Top 8.9% on pypi.org
pyspark-hnsw 1.1.0
Java library for approximate nearest neighbors search using Hierarchical Navigable Small World gr...
30 versions - Latest release: over 1 year ago - 1 dependent repository - 31.4 thousand downloads last month - 242 stars on GitHub - 1 maintainer
dummy_spark 0.0.1
A pure python mocked version of pyspark's rdd class
1 version - Latest release: almost 8 years ago - 32 downloads last month - 27 stars on GitHub - 1 maintainer
pysparkaudittest 1.0.0
PySpark Data Audit library
1 version - Latest release: about 4 years ago - 1 dependent repository - 22 downloads last month - 9 stars on GitHub - 1 maintainer
dummyrdd 0.1.2
A pure python mocked version of pyspark's rdd class
11 versions - Latest release: about 7 years ago - 1 dependent repository - 51 downloads last month - 27 stars on GitHub - 1 maintainer
pysparkaudit 1.0.0
PySpark Data Audit library
1 version - Latest release: almost 5 years ago - 1 dependent repository - 42 downloads last month - 9 stars on GitHub - 1 maintainer
hermione-databricks 1.0.7
Tool to create ML project structure inside the databricks framework
18 versions - Latest release: over 3 years ago - 1 dependent repository - 167 downloads last month - 4 stars on GitHub - 1 maintainer
Top 9.6% on pypi.org
pyspark-test 0.2.0
Check that left and right Spark DataFrames are equal.
2 versions - Latest release: over 2 years ago - 5 dependent repositories - 169 thousand downloads last month - 21 stars on GitHub - 1 maintainer
Top 3.9% on pypi.org
pytest-spark 0.6.0
pytest plugin to run the tests with support of pyspark.
15 versions - Latest release: over 4 years ago - 7 dependent packages - 71 dependent repositories - 290 thousand downloads last month - 82 stars on GitHub - 1 maintainer
spark-config-builder 0.2
Build an Apache Spark configuration easily using a config file.
1 version - Latest release: 8 months ago - 16 downloads last month - 1 maintainer
sparkminiohandle 0.0.7
Spark MinIO Handler Package
7 versions - Latest release: 11 months ago - 55 downloads last month - 1 maintainer
gluesnake 0.1.1 removed
Spark features created to make it easier to build Glue jobs on AWS
6 versions - Latest release: over 1 year ago - 557 downloads last month - 0 stars on GitHub - 1 maintainer
chisquaretestforstring 0.0.1
Chi-Square Test for string columns
1 version - Latest release: about 2 years ago - 24 downloads last month - 1 maintainer
sparkly 2.8.2
Helpers & syntax sugar for PySpark.
21 versions - Latest release: almost 4 years ago - 1 dependent repository - 14.1 thousand downloads last month - 60 stars on GitHub - 3 maintainers
Top 4.8% on pypi.org
optimuspyspark 2.2.32
Optimus is the missing framework for cleaning and pre-processing data in a distributed fashion wi...
83 versions - Latest release: almost 4 years ago - 8 dependent repositories - 10.7 thousand downloads last month - 1,441 stars on GitHub - 2 maintainers
oarphpy 0.1.1
A collection of Python utils with an emphasis on Data Science
6 versions - Latest release: over 1 year ago - 1 dependent repository - 180 downloads last month - 1 star on GitHub - 1 maintainer
Top 9.0% on pypi.org
jupyterlab-sparkmonitor 4.1.0
Spark Monitor Extension for Jupyter Lab
14 versions - Latest release: almost 3 years ago - 1 dependent repository - 10.3 thousand downloads last month - 91 stars on GitHub - 1 maintainer
irmagic 0.1.0 💰
Intelligent Reliability Platform
2 versions - Latest release: over 6 years ago - 1 dependent repository - 24 downloads last month - 5,003 stars on GitHub - 1 maintainer
pyspark-iomete 0.0.3
IOMETE's PySpark library that contains useful utilities for working with PySpark
3 versions - Latest release: 11 months ago - 1 dependent repository - 37 downloads last month - 0 stars on GitHub - 2 maintainers
dbloy 0.3.0
Continuous Delivery tool for PySpark Notebooks based jobs on Databricks.
1 version - Latest release: almost 5 years ago - 1 dependent repository - 13 downloads last month - 1 star on GitHub - 1 maintainer
meteo-spark 0.1.0 removed 💰
A python package to process climate scientific files using pyspark.
3 versions - Latest release: over 2 years ago - 1 dependent repository - 12 downloads last month - 0 stars on GitHub - 1 maintainer
datacorecommon 0.5.0
Wrapper functions for PySpark
27 versions - Latest release: 4 months ago - 1 dependent repository - 85.7 thousand downloads last month - 3 maintainers
feast-yummy 0.0.6 removed
3 versions - Latest release: about 2 years ago
jennytest 0.2.9 removed
Data quality and profiling tool powered by Apache Spark.
5 versions - Latest release: almost 4 years ago - 39 downloads last month
checkengine 0.2.0 removed
Data-quality checks for PySpark
1 version - Latest release: almost 3 years ago - 16 downloads last month - 30 stars on GitHub
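
The summary line under each package in this listing follows a regular "field - field" pattern, so it can be turned into structured data. The following is a minimal sketch that works only from the text shown on this page, not from the Ecosyste.ms API. The helper name `parse_summary`, the output keys, and the "million" multiplier are assumptions made for illustration; only "thousand" appears in the listing.

```python
import re
from typing import Optional

# Multipliers for abbreviated counts such as "18 thousand".
# "million" does not appear in the listing and is an assumption.
_SCALE = {"thousand": 1_000, "million": 1_000_000}

# Label prefix -> output key (hypothetical key names). Prefixes cover both
# singular and plural forms, e.g. "dependent repository" / "repositories".
_FIELDS = [
    ("version", "versions"),
    ("dependent package", "dependent_packages"),
    ("dependent repositor", "dependent_repositories"),
    ("download", "downloads_last_month"),
    ("star", "github_stars"),
    ("maintainer", "maintainers"),
]

# A count, an optional scale word, then the label text.
_COUNT = re.compile(r"([\d.,]+)(?: (thousand|million))? (.+)")


def _to_int(number: str, scale: Optional[str]) -> int:
    """Convert '3,333' or ('1.27', 'thousand') to an integer."""
    value = float(number.replace(",", ""))
    return round(value * _SCALE.get(scale or "", 1))


def parse_summary(line: str) -> dict:
    """Parse one summary line of the listing into a dict."""
    result = {}
    for part in (p.strip() for p in line.split(" - ")):
        if part.startswith("Latest release:"):
            # Kept as text: the page only gives a relative age.
            result["latest_release"] = part.split(":", 1)[1].strip()
            continue
        match = _COUNT.fullmatch(part)
        if not match:
            continue  # Skip fragments that do not look like "<count> <label>".
        number, scale, label = match.groups()
        for prefix, key in _FIELDS:
            if label.startswith(prefix):
                result[key] = _to_int(number, scale)
                break
    return result


if __name__ == "__main__":
    print(parse_summary(
        "39 versions - Latest release: about 1 month ago - 1 dependent package"
        " - 18 thousand downloads last month - 52 stars on GitHub - 1 maintainer"
    ))
```

Abbreviated counts such as "18 thousand" are rounded on the page, so the parsed download figures are approximate.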