Ecosyste.ms: Packages
An open API service providing package, version and dependency metadata of many open source software ecosystems and registries.
pypi.org "pyspark" keyword
dbldatagen 0.3.6 (Top 5.7% on pypi.org)
Databricks Labs - PySpark Synthetic Data Generator
13 versions - Latest release: 3 months ago - 1 dependent package - 2 dependent repositories - 156 thousand downloads last month - 267 stars on GitHub - 1 maintainer
sparkdantic 0.20.5
A pydantic -> spark schema library
30 versions - Latest release: about 2 months ago - 32.6 thousand downloads last month - 267 stars on GitHub - 1 maintainer
pyspark-asyncactions 0.0.4 💰
A proof of concept of asynchronous actions for PySpark using concurrent.futures
4 versions - Latest release: over 3 years ago - 1 dependent repository - 5.19 thousand downloads last month - 44 stars on GitHub - 2 maintainers
sparkora 0.0.1
Exploratory data analysis toolkit for Pyspark
1 version - Latest release: over 2 years ago - 1 dependent repository - 16 downloads last month - 53 stars on GitHub - 2 maintainers
fiware-pyspark-connector 0.0.10
Connects FIWARE Context Brokers with fiware_pyspark_connector
5 versions - Latest release: 10 months ago - 33 downloads last month - 2 stars on GitHub - 1 maintainer
synapseml 1.0.4 (Top 3.3% on pypi.org)
Synapse Machine Learning
16 versions - Latest release: 29 days ago - 2 dependent packages - 3 dependent repositories - 235 thousand downloads last month - 4,975 stars on GitHub - 2 maintainers
datacompy 0.12.0 (Top 2.7% on pypi.org)
Dataframe comparison in Python
25 versions - Latest release: 8 days ago - 7 dependent packages - 16 dependent repositories - 686 thousand downloads last month - 389 stars on GitHub - 8 maintainers
databricks-utils 0.0.7
Ease-of-use utility tools for databricks notebooks.
6 versions - Latest release: almost 6 years ago - 1 dependent repository - 296 thousand downloads last month - 1 star on GitHub - 2 maintainers
graphlet 0.1.1
Graphlet AI Knowledge Graph Factory
1 version - Latest release: almost 2 years ago - 12 downloads last month - 27 stars on GitHub - 1 maintainer
spark-emr 0.1.2
Run python packages on AWS EMR
2 versions - Latest release: about 5 years ago - 1 dependent repository - 20 downloads last month - 0 stars on GitHub - 2 maintainers
petastorm 0.12.1 (Top 2.3% on pypi.org)
Petastorm is a library enabling the use of Parquet storage from Tensorflow, Pytorch, and other Py...
86 versions - Latest release: over 1 year ago - 4 dependent packages - 26 dependent repositories - 36.9 thousand downloads last month - 1,752 stars on GitHub - 2 maintainers
firespark 0.0.32
FireSpark data processing utility library
16 versions - Latest release: almost 4 years ago - 1 dependent repository - 140 downloads last month - 1,752 stars on GitHub - 2 maintainers
hops-petastorm 0.9.4
Petastorm is a library enabling the use of Parquet storage from Tensorflow, Pytorch, and other Py...
6 versions - Latest release: over 3 years ago - 1 dependent repository - 93 downloads last month - 1,752 stars on GitHub - 3 maintainers
handyspark 0.2.2a1 (Top 6.7% on pypi.org)
HandySpark - bringing pandas-like capabilities to Spark dataframes
7 versions - Latest release: almost 5 years ago - 3 dependent repositories - 156 thousand downloads last month - 183 stars on GitHub - 2 maintainers
fifeforspark 0.0.2
Finite-Interval Forecasting Engine for Spark: Machine learning models for discrete-time survival ...
2 versions - Latest release: about 2 years ago - 1 dependent repository - 55 downloads last month - 3 stars on GitHub - 6 maintainers
google-dataproc-templates 0.1.0
Google Dataproc templates written in Python
8 versions - Latest release: about 1 year ago - 29.6 thousand downloads last month - 111 stars on GitHub - 3 maintainers
turntable-spoonbill 9.0.5
Productivity-centric Python Big Data Framework
4 versions - Latest release: about 1 month ago - 106 downloads last month - 4,254 stars on GitHub - 2 maintainers
dataproc-templates 0.0.2 removed (Top 6.8% on pypi.org)
Dataproc templates written in Python
2 versions - Latest release: over 1 year ago - 40 stars on GitHub
ibis-framework 9.0.0 (Top 1.4% on pypi.org)
The portable Python dataframe library
90 versions - Latest release: 9 days ago - 13 dependent packages - 130 dependent repositories - 191 thousand downloads last month - 3,333 stars on GitHub - 6 maintainers
pyoptimus 0.1.0 (Top 9.4% on pypi.org)
Optimus is the missing framework for cleaning and pre-processing data in a distributed fashion.
32 versions - Latest release: over 1 year ago - 1 dependent repository - 346 downloads last month - 1,441 stars on GitHub - 2 maintainers
repartipy 0.1.8
Helper for handling PySpark DataFrame partition size 📑🎛️
6 versions - Latest release: 2 months ago - 505 downloads last month - 2 stars on GitHub - 2 maintainers
jsonspark 0.0.2
This is a wrapper package for pyspark to process json files. It pythonifies the json pyspark object.
2 versions - Latest release: almost 3 years ago - 1 dependent repository - 26 downloads last month - 3 stars on GitHub - 2 maintainers
cuallee 0.10.1
Python library for data validation on DataFrame APIs including Snowflake/Snowpark, Apache/PySpark...
74 versions - Latest release: 9 days ago - 1 dependent package - 1 dependent repository - 11.2 thousand downloads last month - 109 stars on GitHub - 2 maintainers
imnet 0.2.1
imNet: a Sequence Network Construction Toolkit
5 versions - Latest release: almost 4 years ago - 2 dependent repositories - 30 downloads last month - 16 stars on GitHub - 2 maintainers
quinn 0.10.3 (Top 3.3% on pypi.org)
Pyspark helper methods to maximize developer efficiency
16 versions - Latest release: 3 months ago - 3 dependent packages - 11 dependent repositories - 781 thousand downloads last month - 580 stars on GitHub - 2 maintainers
scopt 0.0.5
Calculate optimized properties of Spark configuration
5 versions - Latest release: about 2 years ago - 1 dependent repository - 213 downloads last month - 5 stars on GitHub - 1 maintainer
pbspark 0.9.0
Convert between protobuf messages and pyspark dataframes
11 versions - Latest release: 11 months ago - 1 dependent repository - 276 thousand downloads last month - 21 stars on GitHub - 1 maintainer
pramen-py 1.8.6
Pramen transformations written in python
28 versions - Latest release: 3 days ago - 295 downloads last month - 22 stars on GitHub - 5 maintainers
spark-nlp 5.3.3 (Top 1.4% on pypi.org)
John Snow Labs Spark NLP is a natural language processing library built on top of Apache Spark ML...
141 versions - Latest release: about 1 month ago - 31 dependent packages - 35 dependent repositories - 4.06 million downloads last month - 3,700 stars on GitHub - 3 maintainers
ckls-test-lib 4.2.7 removed (Top 10.0% on pypi.org)
John Snow Labs Spark NLP is a natural language processing library built on top of Apache Spark ML...
1 version - Latest release: over 1 year ago - 85 downloads last month - 3,066 stars on GitHub - 1 maintainer
bpd 2.0.2
bpd
8 versions - Latest release: over 1 year ago - 140 downloads last month - 4 stars on GitHub - 2 maintainers
pysparkifier 0.9.1
Streamlined pyspark usage
3 versions - Latest release: over 1 year ago - 1 dependent repository - 16 downloads last month - 1 star on GitHub - 1 maintainer
fink-science 3.13.3
User-defined science module for the Fink broker.
62 versions - Latest release: over 1 year ago - 5 dependent repositories - 577 downloads last month - 10 stars on GitHub - 1 maintainer
pysparkgateway 0.0.22
Connect Pyspark to remote clusters
18 versions - Latest release: about 3 years ago - 1 dependent repository - 18.1 thousand downloads last month - 3 stars on GitHub - 2 maintainers
compars 0.0.0
DataFrame comparison done right (AKA the Bear-agnostic DataFrame comparison library)
1 version - Latest release: 19 days ago - 156 downloads last month - 0 stars on GitHub - 2 maintainers
sparkorm 1.2.16
SparkORM: Python Spark SQL & DataFrame schema management and basic Object Relational Mapping.
20 versions - Latest release: about 1 month ago - 386 thousand downloads last month - 9 stars on GitHub - 2 maintainers
spark-pager 1.1.2
A Python library for sending notifications on Spark Job Status.
3 versions - Latest release: almost 3 years ago - 1 dependent repository - 187 downloads last month - 0 stars on GitHub - 2 maintainers
dcborow-mmlspark 0.14.dev1
Microsoft ML for Spark
1 version - Latest release: about 4 years ago - 1 dependent repository - 58 downloads last month - 4,972 stars on GitHub - 2 maintainers
finkfilters 0.1.4
User-defined filters for the Fink broker.
2 versions - Latest release: over 4 years ago - 1 dependent repository - 7 downloads last month - 1 star on GitHub - 2 maintainers
spark-map 0.2.78
Pyspark implementation of `map()` function for spark DataFrames
3 versions - Latest release: 12 months ago - 419 downloads last month - 1 star on GitHub - 1 maintainer
tgedr-pycode 0.0.24
python handy code
23 versions - Latest release: 4 months ago - 130 downloads last month - 2 maintainers
artan 0.5.1
Online latent state estimation with Apache Spark.
9 versions - Latest release: over 3 years ago - 22 downloads last month - 5 stars on GitHub - 1 maintainer
pyspark-data-mocker 3.0.0
Mock a datalake easily to be able to test your pyspark data application
10 versions - Latest release: about 1 month ago - 72 downloads last month - 0 stars on GitHub - 2 maintainers
autofeatures 1.0.2
PySpark Auto Feature Selector
3 versions - Latest release: almost 4 years ago - 116 downloads last month - 8 stars on GitHub - 1 maintainer
spark-df-profiling-new 1.1.14 (Top 8.6% on pypi.org)
Create HTML profiling reports from Apache Spark DataFrames
1 version - Latest release: almost 3 years ago - 1 dependent repository - 38.6 thousand downloads last month - 194 stars on GitHub - 2 maintainers
smartdriveml 0.0.12
SmartDriveML: Driving Industrial Projects with Affordable ML Solutions, Frameworks and Cloud Choices
4 versions - Latest release: 10 months ago - 15 downloads last month - 2 maintainers
pyspark-connectors 0.2.0
The easy and quick way to connect and integrate the Spark project with many other data sources.
8 versions - Latest release: almost 2 years ago - 113 downloads last month - 5 stars on GitHub - 2 maintainers
codeme 0.1.9
CodeMe - Automatic Python Coder
20 versions - Latest release: over 1 year ago - 247 downloads last month - 1 star on GitHub - 2 maintainers
spalah 1.0.6
Spalah is a set of PySpark dataframe helpers
12 versions - Latest release: 4 months ago - 19 downloads last month - 7 stars on GitHub - 2 maintainers
marshmallow-pyspark 0.2.4
PySpark data serializer
6 versions - Latest release: 4 months ago - 1 dependent repository - 2.14 thousand downloads last month - 12 stars on GitHub - 2 maintainers
glue-utils 0.2.1
Reusable utilities for working with Glue PySpark jobs
10 versions - Latest release: 7 days ago - 473 downloads last month - 1 star on GitHub - 2 maintainers
markdown-frames 1.0.6
Markdown tables parsing to pyspark / pandas DataFrames
7 versions - Latest release: over 1 year ago - 1 dependent repository - 24.2 thousand downloads last month - 3 stars on GitHub - 6 maintainers
spark-scaffolder-transforms-tools 0.0.1
spark_scaffolder_transforms_tools
1 version - Latest release: about 1 month ago - 173 downloads last month - 2 maintainers
waterfall-logging 0.1.0
Waterfall statistic logging for data quality or filtering steps.
1 version - Latest release: about 1 year ago - 15 downloads last month - 2 stars on GitHub - 2 maintainers
tmlt-core 0.13.0
Tumult's differential privacy primitives
40 versions - Latest release: about 1 month ago - 5.72 thousand downloads last month - 10 stars on GitLab.com - 1 maintainer
sparkpickle 1.0.1
Provides functions for reading SequenceFile-s with Python pickles.
2 versions - Latest release: over 7 years ago - 1 dependent repository - 5.3 thousand downloads last month - 24 stars on GitHub - 1 maintainer
pyspark-bucketmap 0.0.5
Easily group pyspark data into buckets and map them to different values.
4 versions - Latest release: over 1 year ago - 51 downloads last month - 1 star on GitHub - 2 maintainers
pysparkplus 0.0.3
Pyspark extra functions!
3 versions - Latest release: about 1 year ago - 9 downloads last month - 2 stars on GitHub - 2 maintainers
pypair 3.0.9 💰
Pairwise association measures of statistical variable types
11 versions - Latest release: over 2 years ago - 1 dependent repository - 954 downloads last month - 21 stars on GitHub - 2 maintainers
pyspark-testing 0.0.5
Testing Framework for PySpark
2 versions - Latest release: about 4 years ago - 1 dependent repository - 179 downloads last month - 3 stars on GitHub - 2 maintainers
pyspark-sugar 0.4.1
SparkUI enhancements with pyspark
4 versions - Latest release: about 5 years ago - 1 dependent repository - 27.1 thousand downloads last month - 5 stars on GitHub - 2 maintainers
spark-df-profiling 1.1.13 (Top 6.6% on pypi.org)
Create HTML profiling reports from Apache Spark DataFrames
13 versions - Latest release: over 7 years ago - 2 dependent repositories - 53.9 thousand downloads last month - 194 stars on GitHub - 2 maintainers
metaspore 1.1.0
Metaspore: A Unified End-to-end Machine Intelligence Platform
3 versions - Latest release: over 1 year ago - 10 downloads last month - 626 stars on GitHub - 4 maintainers
tidypyspark 0.0.1
dplyr for pyspark
1 version - Latest release: about 1 year ago - 17 downloads last month - 14 stars on GitHub - 4 maintainers
yummy 0.0.11
14 versions - Latest release: 3 months ago - 1 dependent repository - 64 downloads last month - 30 stars on GitHub - 1 maintainer
ydot 0.0.6 💰
R-like formulas for Spark Dataframes
6 versions - Latest release: over 3 years ago - 1 dependent repository - 37 downloads last month - 10 stars on GitHub - 2 maintainers
spark-lean 0.3.3
An interactive PySpark-based Data Cleaning Library
4 versions - Latest release: about 6 years ago - 1 dependent repository - 20 downloads last month - 7 stars on GitHub - 4 maintainers
soda-spark 0.3.3 (Top 9.9% on pypi.org)
Soda SQL API for PySpark data frame
11 versions - Latest release: almost 2 years ago - 1 dependent package - 1 dependent repository - 46.3 thousand downloads last month - 60 stars on GitHub - 2 maintainers
sourced-jgit-spark-connector 2.0.1
Engine to use Spark on top of source code repositories.
2 versions - Latest release: over 5 years ago - 1 dependent repository - 26 downloads last month - 71 stars on GitHub - 2 maintainers
fink-filters 0.2.18
User-defined filters for the Fink broker.
65 versions - Latest release: over 2 years ago - 3 dependent repositories - 740 downloads last month - 1 star on GitHub - 1 maintainer
autovizwidget 0.21.0 (Top 1.9% on pypi.org)
AutoVizWidget: An Auto-Visualization library for pandas dataframes
54 versions - Latest release: 8 months ago - 2 dependent packages - 93 dependent repositories - 99 thousand downloads last month - 1,286 stars on GitHub - 4 maintainers
hdijupyterutils 0.21.0 (Top 1.7% on pypi.org)
HdiJupyterUtils: Utils for Jupyter projects from HDInsight team
53 versions - Latest release: 8 months ago - 2 dependent packages - 92 dependent repositories - 99.8 thousand downloads last month - 1,286 stars on GitHub - 4 maintainers
sparkmagic 0.21.0 (Top 1.7% on pypi.org)
SparkMagic: Spark execution via Livy
56 versions - Latest release: 8 months ago - 4 dependent packages - 86 dependent repositories - 50.6 thousand downloads last month - 1,273 stars on GitHub - 5 maintainers
pyspark-supp 0.1.0
Data Engineer Support PySpark Library
2 versions - Latest release: 12 months ago - 266 downloads last month - 1 star on GitHub - 2 maintainers
tdigest 0.4.0 💰 (Top 2.9% on pypi.org)
T-Digest data structure
14 versions - Latest release: almost 9 years ago - 13 dependent packages - 41 dependent repositories - 299 thousand downloads last month - 376 stars on GitHub - 1 maintainer
data-manipulation 0.37
Powerful data manipulation
34 versions - Latest release: 4 months ago - 1 dependent repository - 279 downloads last month - 2 stars on GitHub - 2 maintainers
pagaya-mapinpandas 0.5
Easy python wrapper for Spark mapInPandas, applyInPandas
1 version - Latest release: almost 2 years ago - 9 downloads last month - 1 maintainer
pyspark-extension 2.12.0.3.5
A library that provides useful extensions to Apache Spark.
45 versions - Latest release: 13 days ago - 50.6 thousand downloads last month - 171 stars on GitHub - 2 maintainers
spinelibs 0.0.17
Libs for spine project
7 versions - Latest release: 4 months ago - 1 dependent package - 86 downloads last month - 2 stars on GitHub - 2 maintainers
ibmaemagic 0.0.4
Make accessing IBM Analytic Engine easier.
4 versions - Latest release: over 3 years ago - 1 dependent repository - 30 downloads last month - 6 maintainers
pysparkutils 0.2.5
A collection of utilities for handling pySpark's SparkContext
14 versions - Latest release: over 6 years ago - 1 dependent repository - 42 downloads last month - 2 stars on GitHub - 2 maintainers
exelog 0.0.1
Enabling meticulous logging for Spark Applications
1 version - Latest release: over 2 years ago - 1 dependent repository - 18 downloads last month - 5 stars on GitHub - 2 maintainers
meteo-spark 0.1.0 removed 💰
A python package to process climate scientific files using pyspark.
3 versions - Latest release: over 2 years ago - 1 dependent repository - 12 downloads last month - 0 stars on GitHub - 1 maintainer
scimple 1.11.5
Scimplify Plotting, Graph manipulation, Spark Streaming & Kafka and other tools
20 versions - Latest release: almost 4 years ago - 1 dependent repository - 33 downloads last month - 1 star on GitHub - 2 maintainers
spinecore 0.0.20
The core lib of spine library
10 versions - Latest release: 4 months ago - 2 dependent packages - 100 downloads last month - 2 stars on GitHub - 2 maintainers
pyspark-spy 1.0.2
Collect and aggregate on spark events for profitz. In 🐍 way!
3 versions - Latest release: about 3 years ago - 1 dependent repository - 218 downloads last month - 10 stars on GitHub - 2 maintainers
sourced-spark-api 0.0.12
API to use Spark on top of source code repositories.
10 versions - Latest release: over 6 years ago - 1 dependent repository - 35 downloads last month - 71 stars on GitHub - 2 maintainers
sparksnake 0.2.2
Improving the development of Spark applications deployed as jobs on AWS services like Glue and EMR
24 versions - Latest release: 10 months ago - 23.3 thousand downloads last month - 12 stars on GitHub - 2 maintainers
watson-transformer 0.0.17
wrap Watson API into pyspark transformers
15 versions - Latest release: over 2 years ago - 1 dependent repository - 143 downloads last month - 2 maintainers
pyspine 0.0.14
Spine: The backbone of your project
3 versions - Latest release: about 1 year ago - 49 downloads last month - 2 stars on GitHub - 2 maintainers
pyspark-stubs 3.0.0 💰 (Top 3.7% on pypi.org)
A collection of the Apache Spark stub files
38 versions - Latest release: almost 4 years ago - 2 dependent packages - 146 dependent repositories - 211 thousand downloads last month - 114 stars on GitHub - 2 maintainers
microdrill 0.0.3
Simple Apache Drill alternative using PySpark
3 versions - Latest release: about 8 years ago - 2 dependent repositories - 13 downloads last month - 7 stars on GitHub - 3 maintainers
cluster-pack 0.3.7 (Top 7.7% on pypi.org)
A library on top of either pex or conda-pack to make your Python code easily available on a cluster
44 versions - Latest release: 10 days ago - 2 dependent packages - 5 dependent repositories - 683 downloads last month - 43 stars on GitHub - 14 maintainers
typedspark 1.4.2
Column-wise type annotations for pyspark DataFrames
38 versions - Latest release: 10 days ago - 10.6 thousand downloads last month - 46 stars on GitHub - 2 maintainers
bigdatasml 0.1.3
This package calculates average student performances
3 versions - Latest release: over 2 years ago - 40 downloads last month - 2 maintainers
spark-xarray 0.1.dev0
This is an experimental project that seeks to integrate PySpark and xarray for Climate Data Analy...
1 version - Latest release: 5 months ago - 20 downloads last month - 22 stars on GitHub - 2 maintainers
namedframes 0.1.4
Named Data Frames
3 versions - Latest release: over 3 years ago - 1 dependent package - 1 dependent repository - 36 downloads last month - 0 stars on GitHub - 2 maintainers
spark-eda 0.0.2
Exploratory data analysis for pyspark
1 version - Latest release: over 2 years ago - 1 dependent repository - 14 downloads last month - 0 stars on GitHub - 1 maintainer
sparglim 0.2.1 💰
sparglim
16 versions - Latest release: 3 months ago - 1 dependent package - 1 dependent repository - 95 downloads last month - 28 stars on GitHub - 2 maintainers
mack 0.5.0
Delta Lake helper methods in PySpark
5 versions - Latest release: 3 months ago - 9.8 thousand downloads last month - 265 stars on GitHub - 2 maintainers
Related Keywords
spark: 113
python: 73
machine-learning: 36
big-data: 32
scala: 31
distributed: 23
machine learning: 23
apache-spark: 23
pandas: 22
modeling: 21
big data: 21
statistical analysis: 20
parallel: 20
data mining: 20
h2o: 20
integration: 20
pysparkling: 20
rsparkling: 20
databricks: 19
dataframe: 18
data-science: 15
data: 11
data-engineering: 11
bigdata: 11
polars: 10
scoring: 10
python3: 10
jupyter: 9
aws: 8
deep-learning: 8
dask: 8
etl: 7
data-analysis: 7
pytorch: 6
jupyter-notebook: 6
tensorflow: 6
sql: 6
pyarrow: 5
mysql: 5
test: 5
data-quality: 5
ipython: 5
ai: 5
azure: 5
graphs: 4
parquet: 4
pipeline: 4
distributed-computing: 4
glue: 4
Spark: 4
testing: 4
bigquery: 4
notebook: 4
workflow: 4
learning: 4
cluster: 4
pipelines: 4
data-cleaning: 4
postgres: 4
pandas-dataframe: 4
schema: 4
analysis: 4
magic: 4
snowflake: 3
sqlalchemy: 3
postgresql: 3
parquet-files: 3
astronomy: 3
data-profiling: 3
streaming: 3
sysml: 3
hacktoberfest: 3
hadoop: 3
spark-sql: 3
mssql: 3
impala: 3
gcp: 3
report: 3
clickhouse: 3
apachespark: 3
deltalake: 3
preprocessing: 3
s3: 3
kernel: 3
kerberos: 3
dataframes: 3
compare: 3
synapse: 3
opencv: 3
onnx: 3
model-deployment: 3
ml: 3
microsoft: 3
lightgbm: 3
http: 3
cognitive-services: 3
processing: 3
dag: 3
AWS: 3
framework: 3