An open API service providing package, version and dependency metadata of many open source software ecosystems and registries.

pypi.org "hadoop" keyword

View the packages on the pypi.org package registry that are tagged with the "hadoop" keyword.

Top 7.7% on pypi.org
cluster-pack 0.3.14
A library on top of pex to make your Python code easily available on a cluster
51 versions - Latest release: about 1 month ago - 2 dependent packages - 5 dependent repositories - 1.17 thousand downloads last month - 43 stars on GitHub - 7 maintainers
kiji-bento-cluster 2.0.10
CDH and Datastax Enterprise docker single-node development cluster.
6 versions - Latest release: almost 11 years ago - 2 dependent repositories - 15 downloads last month - 1 maintainer
Top 0.7% on pypi.org
h2o 3.46.0.8
H2O, Fast Scalable Machine Learning, for python
119 versions - Latest release: about 1 month ago - 14 dependent packages - 393 dependent repositories - 194 thousand downloads last month - 6,710 stars on GitHub - 2 maintainers
Top 4.8% on pypi.org
dask-gateway-server 2025.4.0
A multi-tenant server for securely deploying and managing multiple Dask clusters.
23 versions - Latest release: 7 months ago - 1 dependent package - 10 dependent repositories - 3.34 thousand downloads last month - 142 stars on GitHub - 4 maintainers
dvc-hdfs 3.0.0
hdfs plugin for dvc
3 versions - Latest release: almost 2 years ago - 5 dependent packages - 3 dependent repositories - 67.2 thousand downloads last month - 2 stars on GitHub - 3 maintainers
Top 0.5% on pypi.org
luigi 3.6.0
Workflow management + task scheduling + dependency resolution.
84 versions - Latest release: 11 months ago - 36 dependent packages - 586 dependent repositories - 2.52 million downloads last month - 17,072 stars on GitHub - 14 maintainers
h2o-mlflow-flavor 0.1.0
An MLflow flavor for working with H2O-3 MOJO and POJO models
1 version - Latest release: about 2 years ago - 656 downloads last month - 7,363 stars on GitHub - 1 maintainer
Top 8.6% on pypi.org
h2o-client 3.46.0.8
H2O, Fast Scalable Machine Learning, for python
19 versions - Latest release: about 1 month ago - 3 dependent repositories - 3.13 thousand downloads last month - 7,363 stars on GitHub - 1 maintainer
Top 7.1% on pypi.org
starbase 0.3.3
Python client for HBase Stargate REST server
13 versions - Latest release: about 11 years ago - 14 dependent repositories - 93 downloads last month - 53 stars on GitHub - 1 maintainer
Top 1.4% on pypi.org
impyla 0.22.0
Python client for the Impala distributed query engine
61 versions - Latest release: 4 months ago - 29 dependent packages - 251 dependent repositories - 11.2 million downloads last month - 740 stars on GitHub - 13 maintainers
splitlog 4.1.1
Utility to split aggregated logs from Apache Hadoop Yarn applications into a folder hierarchy
16 versions - Latest release: 4 days ago - 32 downloads last month - 0 stars on GitHub - 1 maintainer
hadoop-protoseq 0.0.1
Python library for Hadoop Streaming with support of protobuf sequences
1 version - Latest release: over 4 years ago - 1 dependent repository - 11 downloads last month - 1 star on GitHub - 1 maintainer
cobra-policytool 1.1.6
Tool for managing Hadoop access using Apache Atlas and Ranger.
9 versions - Latest release: over 6 years ago - 1 dependent repository - 73 downloads last month - 16 stars on GitHub - 1 maintainer
sparkdh 0.0.1
1 version - Latest release: almost 4 years ago - 1 dependent repository - 7 downloads last month - 0 stars on GitHub - 1 maintainer
pymongo_hadoop 1.1.0
UNKNOWN
2 versions - Latest release: over 12 years ago - 3 dependent repositories - 20 downloads last month - 1,519 stars on GitHub - 1 maintainer
sqoopy 0.0.75
UNKNOWN
21 versions - Latest release: almost 10 years ago - 2 dependent repositories - 78 downloads last month - 1 maintainer
python_hiveish 1.1.0
A Hive-like interface wrapper around Hadoopy that allows SQL-like queries on top of MapReduce dire...
2 versions - Latest release: about 10 years ago - 2 dependent repositories - 17 downloads last month - 1 star on GitHub - 1 maintainer
sdctool 0.11.0
Streamsets DataCollector API utility
3 versions - Latest release: over 7 years ago - 1 dependent repository - 42 downloads last month - 13 stars on GitHub - 1 maintainer
Top 4.6% on pypi.org
snakebite-py3 3.0.6
Pure Python HDFS client
6 versions - Latest release: 9 months ago - 4 dependent packages - 18 dependent repositories - 187 thousand downloads last month - 23 stars on GitHub - 3 maintainers
snakeriver 0.1.3
Another way to think about Hadoop Streaming in Python
4 versions - Latest release: over 12 years ago - 2 dependent repositories - 8 downloads last month - 1 maintainer
luigi-k8s-jobs-runner 2.8.10
Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles depende...
11 versions - Latest release: over 5 years ago - 1 dependent repository - 39 downloads last month - 18,267 stars on GitHub - 1 maintainer
Top 4.6% on pypi.org
hive-thrift-py 0.0.1
Hive Python Thrift Libs
1 version - Latest release: about 13 years ago - 32 dependent repositories - 944 downloads last month - 1 maintainer
Top 9.2% on pypi.org
madoop 1.3.2
A lightweight MapReduce framework for education.
14 versions - Latest release: 10 days ago - 2 dependent repositories - 424 downloads last month - 9 stars on GitHub - 1 maintainer
hadopy 0.1.8
Easy parallel map-reduce command line tool
8 versions - Latest release: over 4 years ago - 1 dependent repository - 45 downloads last month - 7 stars on GitHub - 1 maintainer
dbsync 0.1.1
Sync database to hadoop
1 version - Latest release: over 10 years ago - 1 dependent repository - 20 downloads last month - 0 stars on GitHub - 1 maintainer
nutch 1.10.3
Apache Nutch Python library
4 versions - Latest release: about 10 years ago - 2 dependent repositories - 34 downloads last month - 39 stars on GitHub - 1 maintainer
clusterdock 2.3.0
clusterdock is a framework for creating Docker-based container clusters
24 versions - Latest release: over 5 years ago - 1 dependent repository - 68 downloads last month - 29 stars on GitHub - 3 maintainers
trustedanalytics 0.7.3.post20161020785
Trusted Analytics Toolkit
161 versions - Latest release: about 9 years ago - 2 dependent repositories - 125 downloads last month - 43 stars on GitHub - 1 maintainer
Top 7.8% on pypi.org
hbase-python 0.5
User friendly HBase client for Python 3. (Pure python implementation)
5 versions - Latest release: over 7 years ago - 6 dependent repositories - 5.46 thousand downloads last month - 41 stars on GitHub - 1 maintainer
cyhdfs 0.1.3
Cython wrapper around libhdfs
2 versions - Latest release: about 13 years ago - 2 dependent repositories - 23 downloads last month - 1 maintainer
analytics-command-center 3.0.14
Command Center for Data Ingestion, Advanced Analytics and Artificial Intelligence process
1 version - Latest release: almost 4 years ago - 28 downloads last month - 11 stars on GitHub - 1 maintainer
hadoop-mapreduce 0.5
Implementation of Hadoop Mapreduce on text files
4 versions - Latest release: about 3 years ago - 27 downloads last month - 1 star on GitHub - 1 maintainer
hivejdbc 0.2.3
Hive database driver via jdbc
5 versions - Latest release: almost 5 years ago - 1 dependent repository - 45.2 thousand downloads last month - 5 stars on GitHub - 1 maintainer
tf-yarn 0.7.0
Distributed TensorFlow or PyTorch on a YARN cluster
19 versions - Latest release: about 2 years ago - 2 dependent repositories - 170 downloads last month - 90 stars on GitHub - 7 maintainers
Top 4.4% on pypi.org
nflx-genie-client 3.6.19
Genie Python Client.
104 versions - Latest release: 8 months ago - 2 dependent repositories - 69.8 thousand downloads last month - 1,756 stars on GitHub - 3 maintainers
thumbor_hbase 0.11
HBase image storage for Thumbor
11 versions - Latest release: over 12 years ago - 2 dependent repositories - 61 downloads last month - 9 stars on GitHub - 1 maintainer
yarnlog 0.2.1
Download Apache Hadoop YARN log to your local machine.
3 versions - Latest release: almost 5 years ago - 1 dependent repository - 19 downloads last month - 0 stars on GitHub - 1 maintainer
pydoris 1.1.0
Python interface to Doris
10 versions - Latest release: 2 months ago - 2 dependent packages - 92.6 thousand downloads last month - 10,473 stars on GitHub - 2 maintainers
bigdata 0.0.3
IPython magic for running Apache tools for Big Data
4 versions - Latest release: over 6 years ago - 363 downloads last month - 1 maintainer
pydistcp 1.0.7
pydistcp: python WebHDFS inter/intra-cluster data copy tool.
6 versions - Latest release: about 5 years ago - 1 dependent repository - 31 downloads last month - 9 stars on GitHub - 1 maintainer
cornet 0.1.3
Easily generate Apache Sqoop commands based on YAML config file
5 versions - Latest release: over 10 years ago - 2 dependent repositories - 109 downloads last month - 10 stars on GitHub - 1 maintainer
Top 7.8% on pypi.org
dist-keras 0.2.1
Distributed Deep learning with Apache Spark with Keras.
3 versions - Latest release: about 8 years ago - 1 dependent repository - 546 downloads last month - 623 stars on GitHub - 1 maintainer
Top 9.8% on pypi.org
pinball 0.2.12
Workflow manager and scheduler
13 versions - Latest release: over 9 years ago - 4 dependent repositories - 82 downloads last month - 1,044 stars on GitHub - 1 maintainer
Top 4.8% on pypi.org
dispy 4.15.2
Distributed and Parallel Computing with/for Python.
80 versions - Latest release: about 3 years ago - 2 dependent packages - 21 dependent repositories - 2.9 thousand downloads last month - 267 stars on GitHub - 2 maintainers
Top 6.1% on pypi.org
pyhdfs 0.3.1
Pure Python HDFS client
7 versions - Latest release: almost 6 years ago - 2 dependent packages - 11 dependent repositories - 18 thousand downloads last month - 94 stars on GitHub - 1 maintainer
cassandralauncher 1.20
Command line utilities for launching Cassandra clusters in EC2
42 versions - Latest release: over 11 years ago - 512 downloads last month - 46 stars on GitHub - 1 maintainer
impyla-jz 0.16.3
Python client for the Impala distributed query engine
1 version - Latest release: over 4 years ago - 1 dependent repository - 13 downloads last month - 0 stars on GitHub - 1 maintainer
spark-yarn-submit 1.0.0
Library to handle Spark job submission on a YARN cluster in different environments
1 version - Latest release: almost 9 years ago - 1 dependent repository - 6 downloads last month - 3 stars on GitHub - 1 maintainer
dfspy 0.1.0
Distributed File System written in Python
1 version - Latest release: over 3 years ago - 19 downloads last month - 14 stars on GitHub - 1 maintainer
hadoop-fs-wrapper 0.7.1
Python Wrapper for Hadoop Java API
10 versions - Latest release: about 1 year ago - 1 dependent package - 1 dependent repository - 850 downloads last month - 4 stars on GitHub - 1 maintainer
pydatavec 0.1.2
Python interface for DataVec
2 versions - Latest release: almost 6 years ago - 1 dependent package - 1 dependent repository - 21 downloads last month - 13,595 stars on GitHub - 1 maintainer
Top 8.7% on pypi.org
jumpy 0.2.4
Numpy and nd4j interop
6 versions - Latest release: about 7 years ago - 4 dependent repositories - 207 downloads last month - 13,595 stars on GitHub - 3 maintainers
tinyhdfs 1.1.4
Tiny client for HDFS, based on WebHDFS
1 version - Latest release: about 9 years ago - 1 dependent repository - 5 downloads last month - 2 stars on GitHub - 1 maintainer
hueclientrest 0.2.0
A Python REST client for interacting with Hadoop Hue's REST API
1 version - Latest release: 5 months ago - 18 downloads last month - 0 stars on GitHub - 1 maintainer
py4hdfs 0.0.1
Fast Queries to HDFS
1 version - Latest release: over 9 years ago - 2 dependent repositories - 12 downloads last month - 1 star on GitHub - 1 maintainer
tf-yarn-gpu 0.6.3
Distributed TensorFlow on a YARN cluster with GPU support
1 version - Latest release: almost 4 years ago - 1 dependent repository - 5 downloads last month - 89 stars on GitHub - 1 maintainer
pycarbon-sdk 0.1.0
Pycarbon is a library that optimizes data access for AI based on CarbonData files, and it is bas...
1 version - Latest release: over 5 years ago - 1 dependent repository - 14 downloads last month - 1,438 stars on GitHub - 1 maintainer
hivehoney 1.0.4
Client-less data retrieval from Hive.
5 versions - Latest release: almost 7 years ago - 1 dependent repository - 12 downloads last month - 3 stars on GitHub - 1 maintainer
aiowebhdfs 0.0.2
A modern and asynchronous web client for WebHDFS
2 versions - Latest release: over 5 years ago - 49 downloads last month - 6 stars on GitHub - 1 maintainer
odata2avro 1.0.0
Convert OData datasets to Avro
3 versions - Latest release: over 10 years ago - 2 dependent repositories - 12 downloads last month - 2 stars on GitHub - 1 maintainer
hadeploy 0.6.1
A Hadoop application deployment tool
12 versions - Latest release: almost 7 years ago - 1 dependent repository - 26 downloads last month - 10 stars on GitHub - 1 maintainer
streamsx.hdfs 1.5.9
HDFS integration for IBM Streams
18 versions - Latest release: almost 5 years ago - 62 downloads last month - 9 stars on GitHub - 4 maintainers
sparkpickle 1.0.1
Provides functions for reading SequenceFile-s with Python pickles.
2 versions - Latest release: about 9 years ago - 1 dependent repository - 2.14 thousand downloads last month - 24 stars on GitHub - 1 maintainer
Top 4.4% on pypi.org
skein 0.8.2
A simple tool and library for deploying applications on Apache YARN
23 versions - Latest release: over 3 years ago - 2 dependent packages - 12 dependent repositories - 28.3 thousand downloads last month - 144 stars on GitHub - 1 maintainer
spooq 3.4.2
Spooq is a PySpark based helper library for ETL data ingestion pipeline in Data Lakes.
13 versions - Latest release: over 1 year ago - 1 dependent repository - 6.98 thousand downloads last month - 9 stars on GitHub - 1 maintainer
pyhadoop 0.1
Python based hadoop command-line interface
1 version - Latest release: over 11 years ago - 2 dependent repositories - 26 downloads last month - 3 stars on GitHub - 1 maintainer
Top 5.0% on pypi.org
pydoop 2.0.0
Pydoop: a Python MapReduce and HDFS API for Hadoop
17 versions - Latest release: over 6 years ago - 8 dependent repositories - 232 thousand downloads last month - 239 stars on GitHub - 4 maintainers
yarn-dev-tools 2.0.13
Various scripts to automate and ease Apache Hadoop YARN development.
30 versions - Latest release: about 1 year ago - 107 downloads last month - 2 stars on GitHub - 1 maintainer
webhdfspy 0.3.5
A wrapper library to access Hadoop HTTP REST API
8 versions - Latest release: about 9 years ago - 2 dependent repositories - 226 downloads last month - 8 stars on GitHub - 1 maintainer
cirruscluster 0.0.1-17
A batteries-included MapReduce cluster-in-a-can for scientists, researchers, and engineers.
5 versions - Latest release: over 12 years ago - 3 dependent repositories - 20 downloads last month - 2 stars on GitHub - 1 maintainer
risk-command-center 1.0.37
Risk Command Center, manage your risk easily.
2 versions - Latest release: over 3 years ago - 1 dependent repository - 7 downloads last month - 11 stars on GitHub - 1 maintainer
yarntf 0.0.3.dev3
Easy distributed TensorFlow on Hops Hadoop
3 versions - Latest release: over 8 years ago - 1 dependent repository - 6 downloads last month - 31 stars on GitHub - 2 maintainers
ym-impyla 0.14.0
Python client for the Impala distributed query engine
1 version - Latest release: almost 9 years ago - 1 dependent repository - 7 downloads last month - 1 star on GitHub - 1 maintainer
ambari-ldap-manager 0.7
A tool to manage Ambari users and groups when authentication uses LDAP.
5 versions - Latest release: over 8 years ago - 1 dependent repository - 81 downloads last month - 0 stars on GitHub - 1 maintainer
jupyterhub-yarnspawner 0.4.0
JupyterHub Spawner for Apache Hadoop/YARN Clusters
4 versions - Latest release: over 6 years ago - 1 dependent repository - 86 downloads last month - 2 stars on GitHub - 1 maintainer
datagen 1.0.1
Generate delimited sample data with a simple schema.
1 version - Latest release: about 2 years ago - 2 dependent repositories - 8 stars on GitHub - 1 maintainer
knit 0.2.4
Python wrapper for YARN Applications
6 versions - Latest release: almost 8 years ago - 4 dependent repositories - 99 downloads last month - 52 stars on GitHub - 3 maintainers
Top 2.3% on pypi.org
snakebite 2.11.0
Pure Python HDFS client
63 versions - Latest release: over 9 years ago - 1 dependent package - 88 dependent repositories - 7.54 thousand downloads last month - 858 stars on GitHub - 1 maintainer
pymrgeo 1.0.2
MrGeo (pronounced "Mister Geo") is an open source geospatial toolkit designed to provide raster-b...
3 versions - Latest release: over 8 years ago - 1 dependent repository - 15 downloads last month - 209 stars on GitHub - 1 maintainer
pomsets-core 1.0.9
workflow management for the cloud
10 versions - Latest release: about 15 years ago - 2 dependent repositories - 64 downloads last month - 1 maintainer
pomsets-gui 1.0.10
GUI for workflow management for the cloud
11 versions - Latest release: about 15 years ago - 2 dependent repositories - 62 downloads last month - 1 maintainer
python3-lzo-indexer 0.3.0
Library for indexing LZO compressed files
4 versions - Latest release: about 7 years ago - 1 dependent repository - 28 downloads last month - 2 stars on GitHub - 1 maintainer
hadoop-yarn-rest-api 1.1.0
Python wrapper for Hadoop YARN REST API
5 versions - Latest release: over 6 years ago - 1 dependent repository - 1.9 thousand downloads last month - 0 stars on GitHub - 1 maintainer
Top 3.3% on pypi.org
dask-gateway 2025.4.0
A client library for interacting with a dask-gateway server
23 versions - Latest release: 7 months ago - 13 dependent packages - 25 dependent repositories - 54.5 thousand downloads last month - 142 stars on GitHub - 4 maintainers
Top 3.4% on pypi.org
yarn-api-client 1.0.3
Python client for Hadoop® YARN API
21 versions - Latest release: almost 4 years ago - 5 dependent packages - 19 dependent repositories - 557 thousand downloads last month - 109 stars on GitHub - 4 maintainers
sqoopit 0.0.12
A simple package to let you Sqoop into HDFS/Hive/HBase with python
1 version - Latest release: over 5 years ago - 1 dependent repository - 5 downloads last month - 0 stars on GitHub - 1 maintainer
ls-thrift-py-hadoop 1-cdh4.3.0
Hadoop and Hive Python Thrift Libs
3 versions - Latest release: over 12 years ago - 2 dependent repositories - 6 downloads last month - 10 stars on GitHub - 1 maintainer
hcompressor 1.0.0 removed
Hcompressor is a tool to compress files in HDFS
1 version - Latest release: about 3 years ago - 15 downloads last month - 1 star on GitHub - 1 maintainer