Ecosyste.ms: Packages

An open API service providing package, version, and dependency metadata for many open source software ecosystems and registries.

pypi.org "spark" keyword

Top 1.4% on pypi.org
impyla 0.19.0
Python client for the Impala distributed query engine
52 versions - Latest release: 6 months ago - 29 dependent packages - 251 dependent repositories - 689 thousand downloads last month - 723 stars on GitHub - 13 maintainers
recordflux 0.21.0
A toolset for the formal specification and generation of verifiable binary parsers, message gener...
23 versions - Latest release: 25 days ago - 1 dependent package - 1 dependent repository - 304 downloads last month - 101 stars on GitHub - 3 maintainers
Top 5.5% on pypi.org
sk-dist 0.1.9
Distributed scikit-learn meta-estimators with PySpark
10 versions - Latest release: about 4 years ago - 4 dependent repositories - 439 thousand downloads last month - 286 stars on GitHub - 1 maintainer
spark-yarn-submit 1.0.0
Library to handle Spark job submission on a YARN cluster in different environments
1 version - Latest release: over 7 years ago - 1 dependent repository - 12 downloads last month - 3 stars on GitHub - 1 maintainer
tinsel 0.3.0
PySpark schema generator
3 versions - Latest release: over 5 years ago - 1 dependent repository - 214 thousand downloads last month - 1 maintainer
Top 8.7% on pypi.org
jumpy 0.2.4
Numpy and nd4j interop
6 versions - Latest release: over 5 years ago - 4 dependent repositories - 90 downloads last month - 13,453 stars on GitHub - 3 maintainers
pydatavec 0.1.2
Python interface for DataVec
2 versions - Latest release: over 4 years ago - 1 dependent package - 1 dependent repository - 37 downloads last month - 13,290 stars on GitHub - 1 maintainer
lython 1.0
Lisp dialect compiler to Python byte-code
1 version - Latest release: 9 months ago - 1 dependent repository - 1 maintainer
Top 1.2% on pypi.org
koalas 1.8.2
Koalas: pandas API on Apache Spark
47 versions - Latest release: over 2 years ago - 11 dependent packages - 444 dependent repositories - 2.24 million downloads last month - 3,308 stars on GitHub - 7 maintainers
bigdl-llm 2.4.0
Large Language Model Develop Toolkit
332 versions - Latest release: 6 months ago - 24.6 thousand downloads last month - 4,693 stars on GitHub - 1 maintainer
aj-zsl-nlu 4.2.0
John Snow Labs NLU provides state of the art algorithms for NLP&NLU with 10000+ of pretrained mod...
1 version - Latest release: 10 months ago - 30 downloads last month - 821 stars on GitHub - 1 maintainer
nlu-by-samed 5.1.4
John Snow Labs NLU provides state of the art algorithms for NLP&NLU with 20000+ of pretrained mod...
2 versions - Latest release: 3 months ago - 19 downloads last month - 821 stars on GitHub - 1 maintainer
shailesh-text-gen 4.2.1
John Snow Labs NLU provides state of the art algorithms for NLP&NLU with 10000+ of pretrained mod...
1 version - Latest release: 10 months ago - 19 downloads last month - 821 stars on GitHub - 1 maintainer
nlu-ocr-shailesh 5.0.0
John Snow Labs NLU provides state of the art algorithms for NLP&NLU with 20000+ of pretrained mod...
1 version - Latest release: 7 months ago - 20 downloads last month - 821 stars on GitHub - 1 maintainer
shailesh-bart 4.2.1
John Snow Labs NLU provides state of the art algorithms for NLP&NLU with 10000+ of pretrained mod...
1 version - Latest release: 10 months ago - 22 downloads last month - 821 stars on GitHub - 1 maintainer
nlu-by-ckl 5.0.2rc1
John Snow Labs NLU provides state of the art algorithms for NLP&NLU with 20000+ of pretrained mod...
15 versions - Latest release: 8 months ago - 2 dependent packages - 154 downloads last month - 821 stars on GitHub - 1 maintainer
table-extractor-new 5.1.0
John Snow Labs NLU provides state of the art algorithms for NLP&NLU with 20000+ of pretrained mod...
1 version - Latest release: 5 months ago - 28 downloads last month - 821 stars on GitHub - 1 maintainer
nlu-spark23 1.1.1rc2
John Snow Labs NLU provides state of the art algorithms for NLP&NLU with hundreds of pretrained m...
1 version - Latest release: over 3 years ago - 1 dependent repository - 20 downloads last month - 821 stars on GitHub - 2 maintainers
Top 2.9% on pypi.org
nlu 5.3.1
John Snow Labs NLU provides state of the art algorithms for NLP&NLU with 20000+ of pretrained mod...
127 versions - Latest release: 18 days ago - 13 dependent packages - 8 dependent repositories - 18.8 thousand downloads last month - 821 stars on GitHub - 2 maintainers
shailesh 4.2.1
John Snow Labs NLU provides state of the art algorithms for NLP&NLU with 10000+ of pretrained mod...
1 version - Latest release: 10 months ago - 18 downloads last month - 814 stars on GitHub - 1 maintainer
Top 1.7% on pypi.org
databricks-connect 14.3.2
Databricks Connect Client
211 versions - Latest release: 11 days ago - 15 dependent packages - 64 dependent repositories - 1.02 million downloads last month - 37,738 stars on GitHub - 19 maintainers
freeza-offset 1.0.10
Spark stream consumption commit in kafka consumer group
10 versions - Latest release: almost 4 years ago - 1 dependent repository - 964 downloads last month - 14 stars on GitHub - 1 maintainer
Top 8.9% on pypi.org
pyspark-hnsw 1.1.0
Java library for approximate nearest neighbors search using Hierarchical Navigable Small World gr...
30 versions - Latest release: over 1 year ago - 1 dependent repository - 31.4 thousand downloads last month - 242 stars on GitHub - 1 maintainer
pyspark-hyperloglog 2.1.1
PySpark UDFs for HyperLogLog
1 version - Latest release: almost 6 years ago - 3 dependent repositories - 64 downloads last month - 6 stars on GitHub - 1 maintainer
Top 5.7% on pypi.org
dbldatagen 0.3.6
Databricks Labs - PySpark Synthetic Data Generator
13 versions - Latest release: 3 months ago - 1 dependent package - 2 dependent repositories - 173 thousand downloads last month - 268 stars on GitHub - 1 maintainer
sparkdantic 0.20.5
A pydantic -> spark schema library
30 versions - Latest release: 2 months ago - 32.9 thousand downloads last month - 268 stars on GitHub - 1 maintainer
Top 0.9% on pypi.org
delta-spark 3.2.0
Python APIs for using Delta Lake with Apache Spark
19 versions - Latest release: 10 days ago - 38 dependent packages - 90 dependent repositories - 11.2 million downloads last month - 6,958 stars on GitHub - 6 maintainers
jupyterlab-spark-ui-tab 0.0.14
Spark UI extension for jupyterlab
9 versions - Latest release: about 5 years ago - 1 dependent repository - 91 downloads last month - 8 stars on GitHub - 1 maintainer
sparkmonitor-s 0.0.22
Spark Monitor Extension for Jupyter Notebook
48 versions - Latest release: almost 3 years ago - 1 dependent repository - 250 downloads last month - 172 stars on GitHub - 2 maintainers
Top 3.9% on pypi.org
splink 3.9.14
Fast probabilistic data linkage at scale
129 versions - Latest release: about 2 months ago - 3 dependent packages - 4 dependent repositories - 108 thousand downloads last month - 1,072 stars on GitHub - 4 maintainers
ppextensions 0.0.6
PPExtensions - Set of IPython and Jupyter extensions
5 versions - Latest release: almost 5 years ago - 1 dependent repository - 62 downloads last month - 50 stars on GitHub - 7 maintainers
spark-ml-utils 0.0.3
Some spark ml utilities, for easy checking/modifying spark pipeline, extracting feature importanc...
3 versions - Latest release: over 2 years ago - 1 dependent repository - 135 downloads last month - 1 star on GitHub - 1 maintainer
dummy_spark 0.0.1
A pure python mocked version of pyspark's rdd class
1 version - Latest release: almost 8 years ago - 32 downloads last month - 27 stars on GitHub - 1 maintainer
sqlglot-doris 1.1.9
An easily customizable SQL parser and transpiler
42 versions - Latest release: 9 days ago - 513 downloads last month - 4,253 stars on GitHub - 1 maintainer
pathling 6.4.2
Python API for Pathling
44 versions - Latest release: 5 months ago - 1 dependent repository - 781 downloads last month - 78 stars on GitHub - 1 maintainer
nessiedemo 0.0.19
Project Nessie Demos Helper
4 versions - Latest release: almost 3 years ago - 1 dependent repository - 41 downloads last month - 824 stars on GitHub - 5 maintainers
Top 1.4% on pypi.org
spark-nlp 5.3.3
John Snow Labs Spark NLP is a natural language processing library built on top of Apache Spark ML...
141 versions - Latest release: about 1 month ago - 35 dependent packages - 35 dependent repositories - 4.1 million downloads last month - 3,715 stars on GitHub - 3 maintainers
Top 10.0% on pypi.org
ckls-test-lib 4.2.7 (removed)
John Snow Labs Spark NLP is a natural language processing library built on top of Apache Spark ML...
1 version - Latest release: over 1 year ago - 85 downloads last month - 3,066 stars on GitHub - 1 maintainer
Top 9.4% on pypi.org
pyoptimus 0.1.0
Optimus is the missing framework for cleaning and pre-processing data in a distributed fashion.
32 versions - Latest release: over 1 year ago - 1 dependent repository - 371 downloads last month - 1,441 stars on GitHub - 2 maintainers
Top 5.2% on pypi.org
flytekitplugins-deck-standard 1.12.0
This Plugin provides more renderers to improve task visibility
141 versions - Latest release: 13 days ago - 3 dependent repositories - 22 thousand downloads last month - 200 stars on GitHub - 1 maintainer
Top 9.5% on pypi.org
flytekitplugins-whylogs 1.12.0
Enables whylogs profiles to be used in Flyte tasks to get aggregate statistics about data.
104 versions - Latest release: 13 days ago - 1 dependent repository - 5.37 thousand downloads last month - 200 stars on GitHub - 1 maintainer
Top 8.4% on pypi.org
flytekitplugins-data-fsspec 1.12.0
This is a deprecated plugin as of flytekit 1.5
201 versions - Latest release: 13 days ago - 1 dependent repository - 2.42 thousand downloads last month - 200 stars on GitHub - 1 maintainer
Top 9.6% on pypi.org
flytekitplugins-huggingface 1.12.0
Hugging Face plugin for flytekit
96 versions - Latest release: 13 days ago - 1 dependent repository - 8.43 thousand downloads last month - 200 stars on GitHub - 1 maintainer
Top 2.4% on pypi.org
flytekit 1.12.0
Flyte SDK for Python
376 versions - Latest release: 13 days ago - 53 dependent packages - 41 dependent repositories - 290 thousand downloads last month - 200 stars on GitHub - 7 maintainers
Top 8.9% on pypi.org
flytekitplugins-dbt 1.12.0
DBT Plugin for Flytekit
96 versions - Latest release: 13 days ago - 1 dependent repository - 2.77 thousand downloads last month - 200 stars on GitHub - 1 maintainer
flytekitplugins-identity-aware-proxy 1.12.0
External command plugin to generate ID tokens for GCP Identity Aware Proxy
31 versions - Latest release: 13 days ago - 3.27 thousand downloads last month - 200 stars on GitHub - 1 maintainer
Top 8.5% on pypi.org
flytekitplugins-dask 1.12.0
Dask plugin for flytekit
69 versions - Latest release: 13 days ago - 1 dependent repository - 2.89 thousand downloads last month - 200 stars on GitHub - 1 maintainer
dummyrdd 0.1.2
A pure python mocked version of pyspark's rdd class
11 versions - Latest release: almost 7 years ago - 1 dependent repository - 51 downloads last month - 27 stars on GitHub - 1 maintainer
flytekitplugins-pydantic 1.12.0
Plugin adding type support for Pydantic models
32 versions - Latest release: 13 days ago - 3.71 thousand downloads last month - 200 stars on GitHub - 1 maintainer
Top 8.3% on pypi.org
pyrasterframes 0.11.1
Access and process geospatial raster data in PySpark DataFrames
15 versions - Latest release: about 1 year ago - 1 dependent repository - 4.58 thousand downloads last month - 205 stars on GitHub - 2 maintainers
gor-pyspark 3.22.6
Python helper function for gor-spark
13 versions - Latest release: almost 2 years ago - 1 dependent repository - 129 downloads last month - 0 stars on GitHub - 1 maintainer
Top 2.8% on pypi.org
visions 0.7.6
Visions
28 versions - Latest release: 3 months ago - 6 dependent packages - 722 dependent repositories - 1.75 million downloads last month - 198 stars on GitHub - 2 maintainers
seipy 1.3.2
Helper functions for data science
4 versions - Latest release: almost 6 years ago - 1 dependent repository - 101 downloads last month - 4 stars on GitHub - 1 maintainer
Top 5.7% on pypi.org
pynessie 0.67.0
Project Nessie: Transactional Catalog for Data Lakes with Git-like semantics
79 versions - Latest release: 3 months ago - 8 dependent repositories - 2.16 thousand downloads last month - 824 stars on GitHub - 6 maintainers
Top 5.4% on pypi.org
hsfs 3.7.6
HSFS: An environment independent client to interact with the Hopsworks Featurestore
163 versions - Latest release: 16 days ago - 1 dependent package - 15 dependent repositories - 10.8 thousand downloads last month - 51 stars on GitHub - 1 maintainer
pramen-py 1.8.8
Pramen transformations written in python
30 versions - Latest release: 3 days ago - 451 downloads last month - 22 stars on GitHub - 3 maintainers
glue-utils 0.4.0
Reusable utilities for working with Glue PySpark jobs
19 versions - Latest release: 3 days ago - 2.57 thousand downloads last month - 1 star on GitHub - 1 maintainer
Top 3.3% on pypi.org
synapseml 1.0.4
Synapse Machine Learning
16 versions - Latest release: about 1 month ago - 2 dependent packages - 3 dependent repositories - 233 thousand downloads last month - 4,981 stars on GitHub - 1 maintainer
Top 1.2% on pypi.org
sqlglot 23.13.1
An easily customizable SQL parser and transpiler
521 versions - Latest release: 15 days ago - 104 dependent packages - 272 dependent repositories - 2.86 million downloads last month - 5,389 stars on GitHub - 1 maintainer
Top 1.9% on pypi.org
fugue 0.9.0
An abstraction layer for distributed computation
114 versions - Latest release: 20 days ago - 19 dependent packages - 97 dependent repositories - 687 thousand downloads last month - 1,866 stars on GitHub - 2 maintainers
Top 9.6% on pypi.org
pyspark-test 0.2.0
Check that the left and right Spark DataFrames are equal.
2 versions - Latest release: over 2 years ago - 5 dependent repositories - 169 thousand downloads last month - 21 stars on GitHub - 1 maintainer
webexteamsarchiver 0.11.3
Room archiver utility for Webex Teams
10 versions - Latest release: about 2 years ago - 1 dependent repository - 93 downloads last month - 22 stars on GitHub - 1 maintainer
Top 8.8% on pypi.org
zingg 0.4.0
Zingg Entity Resolution, Data Mastering and Deduplication
3 versions - Latest release: 5 months ago - 1 dependent repository - 1.15 thousand downloads last month - 890 stars on GitHub - 1 maintainer
pymrgeo 1.0.2
MrGeo (pronounced "Mister Geo") is an open source geospatial toolkit designed to provide raster-b...
3 versions - Latest release: almost 7 years ago - 1 dependent repository - 23 downloads last month - 203 stars on GitHub - 1 maintainer
brewblox-devcon-spark 0.5.2
Communication with Spark controllers
285 versions - Latest release: about 5 years ago - 1 dependent repository - 1.55 thousand downloads last month - 3 stars on GitHub - 1 maintainer
Top 7.3% on pypi.org
seldon 2.2.5
Seldon Python Utilities
44 versions - Latest release: almost 7 years ago - 2 dependent repositories - 576 downloads last month - 1,476 stars on GitHub - 1 maintainer
Top 5.1% on pypi.org
tensorframes 0.2.9
Integration tools for running deep learning on Spark
2 versions - Latest release: about 6 years ago - 4 dependent repositories - 11.1 thousand downloads last month - 751 stars on GitHub - 1 maintainer
sparkorm 1.2.17
SparkORM: Python Spark SQL & DataFrame schema management and basic Object Relational Mapping.
21 versions - Latest release: 4 days ago - 386 thousand downloads last month - 9 stars on GitHub - 1 maintainer
sparkflow 0.7.0
Deep learning on Spark with Tensorflow
13 versions - Latest release: about 5 years ago - 1 dependent repository - 362 downloads last month - 300 stars on GitHub - 1 maintainer
intel-optimization-for-horovod 0.5.0
Intel® Optimization for Horovod* is the distributed training framework for TensorFlow* and PyTorch*.
8 versions - Latest release: about 1 year ago - 290 downloads last month - 4 stars on GitHub - 4 maintainers
Top 8.7% on pypi.org
fugue-sql-antlr-cpp 0.2.0
Fugue SQL Antlr C++ Parser
15 versions - Latest release: 6 months ago - 1 dependent repository - 1.31 thousand downloads last month - 1,866 stars on GitHub - 2 maintainers
aztk 0.10.3
On-demand, Dockerized, Spark Jobs on Azure (powered by Azure Batch)
20 versions - Latest release: over 4 years ago - 1 dependent repository - 276 downloads last month - 151 stars on GitHub - 1 maintainer
Top 1.9% on pypi.org
graphframes 0.6
GraphFrames: DataFrame-based Graphs
1 version - Latest release: over 5 years ago - 3 dependent packages - 36 dependent repositories - 2.09 million downloads last month - 971 stars on GitHub - 1 maintainer
Top 2.3% on pypi.org
fugue-sql-antlr 0.2.0
Fugue SQL Antlr Parser
16 versions - Latest release: 6 months ago - 2 dependent packages - 110 dependent repositories - 433 thousand downloads last month - 1,866 stars on GitHub - 2 maintainers
spark-submit 1.4.0
Python manager for spark-submit jobs
7 versions - Latest release: about 1 year ago - 1 dependent repository - 1.29 thousand downloads last month - 10 stars on GitHub - 1 maintainer
cspark-python 0.0.13
Python library for Cisco Spark
4 versions - Latest release: about 7 years ago - 1 dependent repository - 22 downloads last month - 0 stars on GitHub - 1 maintainer
webexsdk 2.0.5
Community-developed Python SDK for the Webex Teams APIs
6 versions - Latest release: 6 months ago - 64 downloads last month - 1 maintainer
lofn 0.4.0
Lightweight Orchestration For Now: Wrapper for serial tools using Spark and Docker to parallelize
1 version - Latest release: over 6 years ago - 1 dependent repository - 21 downloads last month - 2 stars on GitHub - 1 maintainer
Top 1.8% on pypi.org
webexteamssdk 1.6.1
Community-developed Python SDK for the Webex Teams APIs
12 versions - Latest release: almost 2 years ago - 11 dependent packages - 1,467 dependent repositories - 265 thousand downloads last month - 229 stars on GitHub - 2 maintainers
typedspark 1.4.2
Column-wise type annotations for pyspark DataFrames
38 versions - Latest release: 19 days ago - 1 dependent package - 11.9 thousand downloads last month - 52 stars on GitHub - 1 maintainer
Top 1.7% on pypi.org
mleap 0.23.1
MLeap Python API
15 versions - Latest release: 6 months ago - 3 dependent packages - 61 dependent repositories - 158 thousand downloads last month - 1,494 stars on GitHub - 2 maintainers
pyspark-data-sources 0.1.2
Custom Spark data sources for reading and writing data in Apache Spark, using the Python Data Sou...
3 versions - Latest release: 3 months ago - 47 downloads last month - 38,255 stars on GitHub - 1 maintainer
starlake-orchestration 0.1.2
Starlake Python Distribution For orchestration
6 versions - Latest release: 3 months ago - 2 dependent packages - 58 downloads last month - 31 stars on GitHub - 1 maintainer
starlake-dagster 0.1.2
Starlake Python Distribution For Dagster
5 versions - Latest release: 3 months ago - 1 dependent package - 45 downloads last month - 31 stars on GitHub - 1 maintainer
starlake-airflow 0.1.2
Starlake Python Distribution For Airflow
23 versions - Latest release: 3 months ago - 1 dependent package - 163 downloads last month - 31 stars on GitHub - 1 maintainer
spark-dataframe-tools 0.6.7
spark_dataframe_tools
17 versions - Latest release: about 1 month ago - 12 dependent packages - 351 downloads last month - 1 maintainer
h2o-mlflow-flavor 0.1.0
An MLflow flavor for working with H2O-3 MOJO and POJO models
1 version - Latest release: 6 months ago - 39 downloads last month - 6,710 stars on GitHub - 1 maintainer
Top 3.9% on pypi.org
pytest-spark 0.6.0
pytest plugin to run the tests with support of pyspark.
15 versions - Latest release: about 4 years ago - 7 dependent packages - 71 dependent repositories - 290 thousand downloads last month - 82 stars on GitHub - 1 maintainer
Top 0.1% on pypi.org
pyspark 3.5.1
Apache Spark Python API
44 versions - Latest release: 3 months ago - 588 dependent packages - 6,227 dependent repositories - 29 million downloads last month - 38,255 stars on GitHub - 1 maintainer
pydoris 1.0.5
Python interface to Doris
8 versions - Latest release: 3 months ago - 2 dependent packages - 24.2 thousand downloads last month - 10,473 stars on GitHub - 1 maintainer
pydantic-spark 1.0.1
Converting pydantic classes to spark schemas
6 versions - Latest release: 6 months ago - 2 dependent packages - 1 dependent repository - 59.3 thousand downloads last month - 22 stars on GitHub - 1 maintainer
dbl-waterbear 0.1.1
Automated provisioning of an industry Lakehouse with enterprise data model
2 versions - Latest release: about 2 years ago - 1 dependent repository - 12 downloads last month - 9 stars on GitHub - 1 maintainer
lftakakura-mage-ai 0.9.37a1
Mage is a tool for building and deploying data pipelines.
1 version - Latest release: 7 months ago - 15 downloads last month - 6,086 stars on GitHub - 1 maintainer
cz-sqlglot 0.0.1
An easily customizable SQL parser and transpiler
1 version - Latest release: 8 months ago - 26 downloads last month - 4,253 stars on GitHub - 1 maintainer
pyspark-connectby 1.1.3
connectby hierarchy query in spark
6 versions - Latest release: 2 months ago - 656 downloads last month - 38,255 stars on GitHub - 1 maintainer
Top 7.5% on pypi.org
jupyter-enterprise-gateway 3.2.3
A web server for spawning and communicating with remote Jupyter kernels
41 versions - Latest release: 3 months ago - 1 dependent package - 1 dependent repository - 2.82 thousand downloads last month - 607 stars on GitHub - 4 maintainers
Top 0.7% on pypi.org
horovod 0.28.1
Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.
73 versions - Latest release: 11 months ago - 13 dependent packages - 327 dependent repositories - 66.2 thousand downloads last month - 13,954 stars on GitHub - 2 maintainers
Top 1.7% on pypi.org
hdijupyterutils 0.21.0
HdiJupyterUtils: Utils for Jupyter projects from HDInsight team
53 versions - Latest release: 8 months ago - 3 dependent packages - 92 dependent repositories - 93.5 thousand downloads last month - 1,286 stars on GitHub - 4 maintainers
Top 0.7% on pypi.org
h2o 3.46.0.2
H2O, Fast Scalable Machine Learning, for python
113 versions - Latest release: 6 days ago - 14 dependent packages - 393 dependent repositories - 323 thousand downloads last month - 6,710 stars on GitHub - 2 maintainers
onetl 0.10.2
One ETL tool to rule them all
14 versions - Latest release: about 2 months ago - 797 downloads last month - 58 stars on GitHub - 2 maintainers