Ecosyste.ms: Packages

An open API service providing package, version and dependency metadata of many open source software ecosystems and registries.

pypi.org "pyspark" keyword

typed-pyspark 0.0.5
Contains a set of abstractions to type annotate and validate dataframes in pyspark
5 versions - Latest release: about 2 years ago - 1 dependent package - 17.2 thousand downloads last month - 14 stars on GitHub - 3 maintainers
sparksteps 3.0.1
Workflow tool to launch Spark jobs on AWS EMR
20 versions - Latest release: over 3 years ago - 1 dependent repository - 266 downloads last month - 67 stars on GitHub - 2 maintainers
Top 1.4% on pypi.org
spark-nlp 5.3.3
John Snow Labs Spark NLP is a natural language processing library built on top of Apache Spark ML...
141 versions - Latest release: about 1 month ago - 35 dependent packages - 35 dependent repositories - 4.12 million downloads last month - 3,716 stars on GitHub - 3 maintainers
Top 10.0% on pypi.org
ckls-test-lib 4.2.7 removed
John Snow Labs Spark NLP is a natural language processing library built on top of Apache Spark ML...
1 version - Latest release: over 1 year ago - 85 downloads last month - 3,066 stars on GitHub - 1 maintainer
Top 1.4% on pypi.org
ibis-framework 9.0.0
The portable Python dataframe library
92 versions - Latest release: 19 days ago - 25 dependent packages - 130 dependent repositories - 187 thousand downloads last month - 3,333 stars on GitHub - 6 maintainers
turntable-spoonbill 10.0.0
Productivity-centric Python Big Data Framework
5 versions - Latest release: 6 days ago - 254 downloads last month - 4,307 stars on GitHub - 1 maintainer
tinsel 0.3.0
PySpark schema generator
3 versions - Latest release: over 5 years ago - 1 dependent repository - 214 thousand downloads last month - 1 maintainer
cuallee 0.10.2
Python library for data validation on DataFrame APIs including Snowflake/Snowpark, Apache/PySpark...
76 versions - Latest release: 8 days ago - 1 dependent package - 1 dependent repository - 11.6 thousand downloads last month - 111 stars on GitHub - 2 maintainers
Top 8.9% on pypi.org
pyspark-hnsw 1.1.0
Java library for approximate nearest neighbors search using Hierarchical Navigable Small World gr...
30 versions - Latest release: over 1 year ago - 1 dependent repository - 31.4 thousand downloads last month - 242 stars on GitHub - 1 maintainer
Top 5.7% on pypi.org
dbldatagen 0.3.6
Databricks Labs - PySpark Synthetic Data Generator
13 versions - Latest release: 3 months ago - 1 dependent package - 2 dependent repositories - 173 thousand downloads last month - 268 stars on GitHub - 1 maintainer
sparkdantic 0.20.5
A pydantic -> spark schema library
30 versions - Latest release: 2 months ago - 32.9 thousand downloads last month - 268 stars on GitHub - 1 maintainer
tmlt-analytics 0.10.0
Tumult's differential privacy analytics API
25 versions - Latest release: 2 days ago - 501 downloads last month - 17 stars on GitLab.com - 1 maintainer
dummy_spark 0.0.1
A pure python mocked version of pyspark's rdd class
1 version - Latest release: almost 8 years ago - 32 downloads last month - 27 stars on GitHub - 1 maintainer
Top 9.4% on pypi.org
pyoptimus 0.1.0
Optimus is the missing framework for cleaning and pre-processing data in a distributed fashion.
32 versions - Latest release: over 1 year ago - 1 dependent repository - 371 downloads last month - 1,441 stars on GitHub - 2 maintainers
pysparkaudittest 1.0.0
PySpark Data Audit library
1 version - Latest release: almost 4 years ago - 1 dependent repository - 22 downloads last month - 9 stars on GitHub - 1 maintainer
dummyrdd 0.1.2
A pure python mocked version of pyspark's rdd class
11 versions - Latest release: almost 7 years ago - 1 dependent repository - 51 downloads last month - 27 stars on GitHub - 1 maintainer
tmlt-core 0.14.0
Tumult's differential privacy primitives
41 versions - Latest release: 3 days ago - 5.22 thousand downloads last month - 10 stars on GitLab.com - 1 maintainer
pysparkaudit 1.0.0
PySpark Data Audit library
1 version - Latest release: almost 5 years ago - 1 dependent repository - 42 downloads last month - 9 stars on GitHub - 1 maintainer
hermione-databricks 1.0.7
Tool to create ML project structure inside the databricks framework
18 versions - Latest release: over 3 years ago - 1 dependent repository - 167 downloads last month - 4 stars on GitHub - 1 maintainer
pramen-py 1.8.8
Pramen transformations written in python
30 versions - Latest release: 4 days ago - 451 downloads last month - 22 stars on GitHub - 3 maintainers
glue-utils 0.4.0
Reusable utilities for working with Glue PySpark jobs
19 versions - Latest release: 4 days ago - 2.57 thousand downloads last month - 1 star on GitHub - 1 maintainer
Top 3.3% on pypi.org
synapseml 1.0.4
Synapse Machine Learning
16 versions - Latest release: about 1 month ago - 2 dependent packages - 3 dependent repositories - 233 thousand downloads last month - 4,981 stars on GitHub - 1 maintainer
Top 9.6% on pypi.org
pyspark-test 0.2.0
Check that left and right spark DataFrame are equal.
2 versions - Latest release: over 2 years ago - 5 dependent repositories - 169 thousand downloads last month - 21 stars on GitHub - 1 maintainer
sparkorm 1.2.17
SparkORM: Python Spark SQL & DataFrame schema management and basic Object Relational Mapping.
21 versions - Latest release: 4 days ago - 386 thousand downloads last month - 9 stars on GitHub - 1 maintainer
spetlr 5.1.6
A python ETL libRary (SPETLR) for Databricks powered by Apache SPark.
52 versions - Latest release: 4 days ago - 1 dependent package - 25.1 thousand downloads last month - 18 stars on GitHub - 1 maintainer
replay-rec 0.16.0
RecSys Library
17 versions - Latest release: 2 months ago - 1 dependent package - 1 dependent repository - 4.05 thousand downloads last month - 125 stars on GitHub - 1 maintainer
typedspark 1.4.2
Column-wise type annotations for pyspark DataFrames
38 versions - Latest release: 19 days ago - 1 dependent package - 11.9 thousand downloads last month - 52 stars on GitHub - 1 maintainer
Top 3.9% on pypi.org
pytest-spark 0.6.0
pytest plugin to run the tests with support of pyspark.
15 versions - Latest release: about 4 years ago - 7 dependent packages - 71 dependent repositories - 290 thousand downloads last month - 82 stars on GitHub - 1 maintainer
Top 1.7% on pypi.org
hdijupyterutils 0.21.0
HdiJupyterUtils: Utils for Jupyter projects from HDInsight team
53 versions - Latest release: 8 months ago - 3 dependent packages - 92 dependent repositories - 93.5 thousand downloads last month - 1,286 stars on GitHub - 4 maintainers
fink-science 3.13.3
User-defined science module for the Fink broker.
62 versions - Latest release: over 1 year ago - 1 dependent package - 5 dependent repositories - 695 downloads last month - 10 stars on GitHub - 1 maintainer
Top 1.9% on pypi.org
autovizwidget 0.21.0
AutoVizWidget: An Auto-Visualization library for pandas dataframes
54 versions - Latest release: 8 months ago - 4 dependent packages - 93 dependent repositories - 92.7 thousand downloads last month - 1,286 stars on GitHub - 4 maintainers
compars 0.0.0
DataFrame comparison done right (AKA the Bear-agnostic DataFrame comparison library)
1 version - Latest release: 29 days ago - 156 downloads last month - 0 stars on GitHub - 1 maintainer
spark-scaffolder-transforms-tools 0.0.1
spark_scaffolder_transforms_tools
1 version - Latest release: about 2 months ago - 173 downloads last month - 1 maintainer
repartipy 0.1.8
Helper for handling PySpark DataFrame partition size 📑🎛️
6 versions - Latest release: 2 months ago - 505 downloads last month - 2 stars on GitHub - 1 maintainer
spark-xarray 0.1.dev0
This is an experimental project that seeks to integrate PySpark and xarray for Climate Data Analy...
1 version - Latest release: 6 months ago - 20 downloads last month - 22 stars on GitHub - 1 maintainer
databricks-bridge 0.0.4
Databricks read and write with sql connection
4 versions - Latest release: about 2 months ago - 57 downloads last month - 1 maintainer
spark-config-builder 0.2
Build an Apache Spark configuration easily using a config file.
1 version - Latest release: 7 months ago - 16 downloads last month - 1 maintainer
aws-insurancelake-etl 3.3.1
A CDK Python app for deploying ETL jobs that operate data pipelines for InsuranceLake in AWS
8 versions - Latest release: about 2 months ago - 82 downloads last month - 11 stars on GitHub - 1 maintainer
pyspark-data-mocker 3.0.0
Mock a datalake easily to be able to test your pyspark data application
10 versions - Latest release: about 2 months ago - 72 downloads last month - 0 stars on GitHub - 1 maintainer
dot-connect 0.3.32
Improve your workflow efficiency by connecting to databases and cloud systems effortlessly.
7 versions - Latest release: 9 months ago - 53 downloads last month - 12 stars on GitHub - 1 maintainer
sparglim 0.2.1 💰
sparglim
16 versions - Latest release: 4 months ago - 1 dependent package - 1 dependent repository - 95 downloads last month - 28 stars on GitHub - 1 maintainer
smartdriveml 0.0.12
SmartDriveML: Driving Industrial Projects with Affordable ML Solutions, Frameworks and Cloud Choices
4 versions - Latest release: 10 months ago - 15 downloads last month - 1 maintainer
sparkminiohandle 0.0.7
Spark MinIO Handler Package
7 versions - Latest release: 10 months ago - 55 downloads last month - 1 maintainer
pyjaws 0.1.7
PyJaws: A Pythonic Way to Define Databricks Jobs and Workflows
10 versions - Latest release: 8 months ago - 1 dependent repository - 60 downloads last month - 37 stars on GitHub - 1 maintainer
pyspark-supp 0.1.0
Data Engineer Support PySpark Library
2 versions - Latest release: about 1 year ago - 266 downloads last month - 1 star on GitHub - 1 maintainer
pysparkplus 0.0.3
Pyspark extra functions!
3 versions - Latest release: about 1 year ago - 9 downloads last month - 2 stars on GitHub - 1 maintainer
pyspark-extension 2.12.0.3.5
A library that provides useful extensions to Apache Spark.
45 versions - Latest release: 23 days ago - 50.6 thousand downloads last month - 171 stars on GitHub - 1 maintainer
spetlr-tools 0.1.65
Supplements to the python SPark ETL libRary (SPETLR) for Databricks.
31 versions - Latest release: 3 months ago - 1 dependent repository - 8.07 thousand downloads last month - 0 stars on GitHub - 1 maintainer
pyspine 0.0.14
Spine: The backbone of your project
3 versions - Latest release: about 1 year ago - 49 downloads last month - 2 stars on GitHub - 1 maintainer
spinecore 0.0.20
The core lib of spine library
10 versions - Latest release: 5 months ago - 2 dependent packages - 100 downloads last month - 2 stars on GitHub - 1 maintainer
spinelibs 0.0.17
Libs for spine project
7 versions - Latest release: 5 months ago - 1 dependent package - 86 downloads last month - 2 stars on GitHub - 1 maintainer
tidypyspark 0.0.1
dplyr for pyspark
1 version - Latest release: about 1 year ago - 17 downloads last month - 14 stars on GitHub - 2 maintainers
sparksnake 0.2.2
Improving the development of Spark applications deployed as jobs on AWS services like Glue and EMR
24 versions - Latest release: 10 months ago - 23.3 thousand downloads last month - 12 stars on GitHub - 1 maintainer
gluesnake 0.1.1 removed
Spark features created to make it easier to build Glue jobs on AWS
6 versions - Latest release: about 1 year ago - 557 downloads last month - 0 stars on GitHub - 1 maintainer
waterfall-logging 0.1.0
Waterfall statistic logging for data quality or filtering steps.
1 version - Latest release: about 1 year ago - 15 downloads last month - 2 stars on GitHub - 1 maintainer
codeme 0.1.9
CodeMe - Automatic Python Coder
20 versions - Latest release: over 1 year ago - 247 downloads last month - 1 star on GitHub - 1 maintainer
dfanalyzer 0.0.4
Pyspark Dataframe Analyzer - Smartest DataFrame Analysis
4 versions - Latest release: over 1 year ago - 29 downloads last month - 1 maintainer
mack 0.5.0
Delta Lake helper methods in PySpark
5 versions - Latest release: 3 months ago - 9.8 thousand downloads last month - 265 stars on GitHub - 1 maintainer
pyspark-bucketmap 0.0.5
Easily group pyspark data into buckets and map them to different values.
4 versions - Latest release: over 1 year ago - 51 downloads last month - 1 star on GitHub - 1 maintainer
hlink 3.5.4
Fast supervised pyspark record linkage software
15 versions - Latest release: 3 months ago - 29 downloads last month - 10 stars on GitHub - 4 maintainers
spalah 1.0.6
Spalah is a set of PySpark dataframe helpers
12 versions - Latest release: 5 months ago - 19 downloads last month - 7 stars on GitHub - 1 maintainer
pyspark-connectors 0.2.0
The quick and easy way to connect and integrate the Spark project with many other data sources.
8 versions - Latest release: almost 2 years ago - 113 downloads last month - 5 stars on GitHub - 1 maintainer
metaspore 1.1.0
Metaspore: A Unified End-to-end Machine Intelligence Platform
3 versions - Latest release: over 1 year ago - 10 downloads last month - 626 stars on GitHub - 2 maintainers
chisquaretestforstring 0.0.1
Chi-Square Test for string columns
1 version - Latest release: about 2 years ago - 24 downloads last month - 1 maintainer
ydot 0.0.6 💰
R-like formulas for Spark Dataframes
6 versions - Latest release: over 3 years ago - 1 dependent repository - 37 downloads last month - 10 stars on GitHub - 1 maintainer
watson-transformer 0.0.17
wrap Watson API into pyspark transformers
15 versions - Latest release: over 2 years ago - 1 dependent repository - 143 downloads last month - 1 maintainer
tdml 0.1.1
Transform Dataframe for Machine Learning
1 version - Latest release: almost 4 years ago - 1 dependent repository - 11 downloads last month - 1 star on GitHub - 1 maintainer
sysxtract 1.0.0
Extract logs based off events from sysmon. Comes as a package, cli and ui.
1 version - Latest release: almost 4 years ago - 1 dependent repository - 9 downloads last month - 3 stars on GitHub - 1 maintainer
sparkypandy 0.1.4 💰
It's not spark, it's not pandas, it's just awkward...
5 versions - Latest release: almost 3 years ago - 1 dependent repository - 13 downloads last month - 2 stars on GitHub - 1 maintainer
sparkql 0.10.0
sparkql: Apache Spark SQL DataFrame schema management for sensible humans
20 versions - Latest release: 8 months ago - 1 dependent repository - 27.9 thousand downloads last month - 13 stars on GitHub - 1 maintainer
spark-pager 1.1.2
A Python library for sending notifications on Spark Job Status.
3 versions - Latest release: almost 3 years ago - 1 dependent repository - 187 downloads last month - 0 stars on GitHub - 1 maintainer
sparkora 0.0.1
Exploratory data analysis toolkit for Pyspark
1 version - Latest release: over 2 years ago - 1 dependent repository - 16 downloads last month - 53 stars on GitHub - 1 maintainer
sparkly 2.8.2
Helpers & syntax sugar for PySpark.
21 versions - Latest release: almost 4 years ago - 1 dependent repository - 14.1 thousand downloads last month - 60 stars on GitHub - 3 maintainers
spark-lean 0.3.3
An interactive PySpark-based Data Cleaning Library
4 versions - Latest release: about 6 years ago - 1 dependent repository - 20 downloads last month - 7 stars on GitHub - 2 maintainers
spark-emr 0.1.2
Run python packages on AWS EMR
2 versions - Latest release: about 5 years ago - 1 dependent repository - 20 downloads last month - 0 stars on GitHub - 1 maintainer
sparkdh 0.0.1
1 version - Latest release: over 2 years ago - 1 dependent repository - 6 downloads last month - 0 stars on GitHub - 1 maintainer
spark-df-profiling-optimus 0.1.1
Create HTML profiling reports from Apache Spark DataFrames
6 versions - Latest release: over 6 years ago - 3 dependent repositories - 598 downloads last month - 2 stars on GitHub - 1 maintainer
Top 8.6% on pypi.org
spark-df-profiling-new 1.1.14
Create HTML profiling reports from Apache Spark DataFrames
1 version - Latest release: almost 3 years ago - 1 dependent repository - 38.6 thousand downloads last month - 194 stars on GitHub - 1 maintainer
Top 6.6% on pypi.org
spark-df-profiling 1.1.13
Create HTML profiling reports from Apache Spark DataFrames
13 versions - Latest release: over 7 years ago - 2 dependent repositories - 53.9 thousand downloads last month - 194 stars on GitHub - 1 maintainer
sparkdataset 1.0.0
Provides instant access to many popular datasets right from Pyspark (in dataframe structure).
1 version - Latest release: over 2 years ago - 1 dependent repository - 9 downloads last month - 34 stars on GitHub - 1 maintainer
sourced-spark-api 0.0.12
API to use Spark on top of source code repositories.
10 versions - Latest release: over 6 years ago - 1 dependent repository - 35 downloads last month - 71 stars on GitHub - 1 maintainer
sourced-jgit-spark-connector 2.0.1
Engine to use Spark on top of source code repositories.
2 versions - Latest release: over 5 years ago - 1 dependent repository - 26 downloads last month - 71 stars on GitHub - 1 maintainer
Top 9.9% on pypi.org
soda-spark 0.3.3
Soda SQL API for PySpark data frame
11 versions - Latest release: about 2 years ago - 1 dependent package - 1 dependent repository - 46.3 thousand downloads last month - 60 stars on GitHub - 1 maintainer
scimple 1.11.5
Scimplify Plotting, Graph manipulation, Spark Streaming & Kafka and other tools
20 versions - Latest release: almost 4 years ago - 1 dependent repository - 33 downloads last month - 1 star on GitHub - 1 maintainer
scikit-spark 0.4.0
Spark acceleration for Scikit-Learn cross validation techniques
6 versions - Latest release: about 4 years ago - 1 dependent repository - 8.45 thousand downloads last month - 8 stars on GitHub - 1 maintainer
Top 3.3% on pypi.org
quinn 0.10.3
Pyspark helper methods to maximize developer efficiency
16 versions - Latest release: 3 months ago - 3 dependent packages - 11 dependent repositories - 781 thousand downloads last month - 580 stars on GitHub - 1 maintainer
pysparrow 1.0.4
An arrow interface for PySpark RDDs
1 version - Latest release: about 2 years ago - 1 dependent repository - 117 downloads last month - 0 stars on GitHub - 1 maintainer
pysparkutils 0.2.5
A collection of utilities for handling pySpark's SparkContext
14 versions - Latest release: over 6 years ago - 1 dependent repository - 42 downloads last month - 2 stars on GitHub - 1 maintainer
pyspark-sugar 0.4.1
SparkUI enhancements with pyspark
4 versions - Latest release: about 5 years ago - 1 dependent repository - 27.1 thousand downloads last month - 5 stars on GitHub - 1 maintainer
pyspark-testing 0.0.5
Testing Framework for PySpark
2 versions - Latest release: about 4 years ago - 1 dependent repository - 179 downloads last month - 3 stars on GitHub - 1 maintainer
Top 3.7% on pypi.org
pyspark-stubs 3.0.0 💰
A collection of the Apache Spark stub files
38 versions - Latest release: almost 4 years ago - 2 dependent packages - 146 dependent repositories - 211 thousand downloads last month - 114 stars on GitHub - 1 maintainer
pyspark-spy 1.0.2
Collect and aggregate on spark events for profitz. In 🐍 way!
3 versions - Latest release: about 3 years ago - 1 dependent repository - 218 downloads last month - 10 stars on GitHub - 1 maintainer
pysparkproxy 0.0.17
Seamlessly execute pyspark code on remote clusters
9 versions - Latest release: over 5 years ago - 1 dependent repository - 43 downloads last month - 4 stars on GitHub - 1 maintainer
pyspark-pandas 0.0.7
Tools and algorithms for pandas Dataframes distributed on pyspark. Please consider the SparklingP...
5 versions - Latest release: over 9 years ago - 2 dependent repositories - 395 thousand downloads last month - 6 stars on GitHub - 1 maintainer
pyspark-model-plus 1.0.2
Enhancements to commonly used pyspark functions for building models
2 versions - Latest release: over 3 years ago - 1 dependent repository - 26 downloads last month - 0 stars on GitHub - 1 maintainer
pysparkgateway 0.0.22
Connect Pyspark to remote clusters
18 versions - Latest release: about 3 years ago - 1 dependent repository - 18.1 thousand downloads last month - 3 stars on GitHub - 1 maintainer
pyspark-event-correlation 1.0.2
Event Correlation and Changing Detection Algorithm
3 versions - Latest release: over 2 years ago - 1 dependent repository - 15 downloads last month - 1 star on GitHub - 1 maintainer
pyspark_dfreport 0.1
Simple Python Package to save (small) PySpark DataFrames to one Excel File.
1 version - Latest release: over 7 years ago - 8 downloads last month - 0 stars on GitHub - 1 maintainer
pyspark-easy 1.5
Makes pyspark dataframe exploration easy
10 versions - Latest release: about 3 years ago - 1 dependent repository - 38 downloads last month - 0 stars on GitHub - 1 maintainer
pyspark_db_utils 0.0.7
Useful functions for working with databases in PySpark (PostgreSQL, ClickHouse)
6 versions - Latest release: almost 6 years ago - 87 downloads last month - 8 stars on GitHub - 1 maintainer