pypi.org "hadoop" keyword
View the packages on the pypi.org package registry that are tagged with the "hadoop" keyword.
cluster-pack 0.3.14
A library on top of pex to make your Python code easily available on a cluster
Top 7.7% on pypi.org - 51 versions - Latest release: about 1 month ago - 2 dependent packages - 5 dependent repositories - 1.17 thousand downloads last month - 43 stars on GitHub - 7 maintainers
kiji-bento-cluster 2.0.10
CDH and Datastax Enterprise docker single-node development cluster.
6 versions - Latest release: almost 11 years ago - 2 dependent repositories - 15 downloads last month - 1 maintainer
h2o 3.46.0.8
H2O, Fast Scalable Machine Learning, for python
Top 0.7% on pypi.org - 119 versions - Latest release: about 1 month ago - 14 dependent packages - 393 dependent repositories - 194 thousand downloads last month - 6,710 stars on GitHub - 2 maintainers
dask-gateway-server 2025.4.0 💰
A multi-tenant server for securely deploying and managing multiple Dask clusters.
Top 4.8% on pypi.org - 23 versions - Latest release: 7 months ago - 1 dependent package - 10 dependent repositories - 3.34 thousand downloads last month - 142 stars on GitHub - 4 maintainers
dvc-hdfs 3.0.0
hdfs plugin for dvc
3 versions - Latest release: almost 2 years ago - 5 dependent packages - 3 dependent repositories - 67.2 thousand downloads last month - 2 stars on GitHub - 3 maintainers
luigi 3.6.0
Workflow mgmt + task scheduling + dependency resolution.
Top 0.5% on pypi.org - 84 versions - Latest release: 11 months ago - 36 dependent packages - 586 dependent repositories - 2.52 million downloads last month - 17,072 stars on GitHub - 14 maintainers
h2o-mlflow-flavor 0.1.0
A mlflow flavor for working with H2O-3 MOJO and POJO models
1 version - Latest release: about 2 years ago - 656 downloads last month - 7,363 stars on GitHub - 1 maintainer
h2o-client 3.46.0.8
H2O, Fast Scalable Machine Learning, for python
Top 8.6% on pypi.org - 19 versions - Latest release: about 1 month ago - 3 dependent repositories - 3.13 thousand downloads last month - 7,363 stars on GitHub - 1 maintainer
starbase 0.3.3
Python client for HBase Stargate REST server
Top 7.1% on pypi.org - 13 versions - Latest release: about 11 years ago - 14 dependent repositories - 93 downloads last month - 53 stars on GitHub - 1 maintainer
impyla 0.22.0
Python client for the Impala distributed query engine
Top 1.4% on pypi.org - 61 versions - Latest release: 4 months ago - 29 dependent packages - 251 dependent repositories - 11.2 million downloads last month - 740 stars on GitHub - 13 maintainers
splitlog 4.1.1
Utility to split aggregated logs from Apache Hadoop Yarn applications into a folder hierarchy
16 versions - Latest release: 4 days ago - 32 downloads last month - 0 stars on GitHub - 1 maintainer
hadoop-protoseq 0.0.1
Python library for Hadoop Streaming with support of protobuf sequences
1 version - Latest release: over 4 years ago - 1 dependent repository - 11 downloads last month - 1 star on GitHub - 1 maintainer
cobra-policytool 1.1.6
Tool for managing Hadoop access using Apache Atlas and Ranger.
9 versions - Latest release: over 6 years ago - 1 dependent repository - 73 downloads last month - 16 stars on GitHub - 1 maintainer
sparkdh 0.0.1
1 version - Latest release: almost 4 years ago - 1 dependent repository - 7 downloads last month - 0 stars on GitHub - 1 maintainer
pymongo_hadoop 1.1.0
UNKNOWN
2 versions - Latest release: over 12 years ago - 3 dependent repositories - 20 downloads last month - 1,519 stars on GitHub - 1 maintainer
sqoopy 0.0.75
UNKNOWN
21 versions - Latest release: almost 10 years ago - 2 dependent repositories - 78 downloads last month - 1 maintainer
python_hiveish 1.1.0
A hive-like interface wrapper around Hadoopy that allows SQL-like queries on top of MapReduce dire...
2 versions - Latest release: about 10 years ago - 2 dependent repositories - 17 downloads last month - 1 star on GitHub - 1 maintainer
sdctool 0.11.0
Streamsets DataCollector API utility
3 versions - Latest release: over 7 years ago - 1 dependent repository - 42 downloads last month - 13 stars on GitHub - 1 maintainer
snakebite-py3 3.0.6
Pure Python HDFS client
Top 4.6% on pypi.org - 6 versions - Latest release: 9 months ago - 4 dependent packages - 18 dependent repositories - 187 thousand downloads last month - 23 stars on GitHub - 3 maintainers
snakeriver 0.1.3
Another way to think about Hadoop Streaming in Python
4 versions - Latest release: over 12 years ago - 2 dependent repositories - 8 downloads last month - 1 maintainer
luigi-k8s-jobs-runner 2.8.10
Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles depende...
11 versions - Latest release: over 5 years ago - 1 dependent repository - 39 downloads last month - 18,267 stars on GitHub - 1 maintainer
hive-thrift-py 0.0.1
Hive Python Thrift Libs
Top 4.6% on pypi.org - 1 version - Latest release: about 13 years ago - 32 dependent repositories - 944 downloads last month - 1 maintainer
madoop 1.3.2
A lightweight MapReduce framework for education.
Top 9.2% on pypi.org - 14 versions - Latest release: 10 days ago - 2 dependent repositories - 424 downloads last month - 9 stars on GitHub - 1 maintainer
hadopy 0.1.8
Easy parallel map-reduce command line tool
8 versions - Latest release: over 4 years ago - 1 dependent repository - 45 downloads last month - 7 stars on GitHub - 1 maintainer
dbsync 0.1.1
Sync database to hadoop
1 version - Latest release: over 10 years ago - 1 dependent repository - 20 downloads last month - 0 stars on GitHub - 1 maintainer
nutch 1.10.3 💰
Apache Nutch Python library
4 versions - Latest release: about 10 years ago - 2 dependent repositories - 34 downloads last month - 39 stars on GitHub - 1 maintainer
clusterdock 2.3.0
clusterdock is a framework for creating Docker-based container clusters
24 versions - Latest release: over 5 years ago - 1 dependent repository - 68 downloads last month - 29 stars on GitHub - 3 maintainers
trustedanalytics 0.7.3.post20161020785
Trusted Analytics Toolkit
161 versions - Latest release: about 9 years ago - 2 dependent repositories - 125 downloads last month - 43 stars on GitHub - 1 maintainer
hbase-python 0.5
User friendly HBase client for Python 3. (Pure python implementation)
Top 7.8% on pypi.org - 5 versions - Latest release: over 7 years ago - 6 dependent repositories - 5.46 thousand downloads last month - 41 stars on GitHub - 1 maintainer
cyhdfs 0.1.3
Cython wrapper around libhdfs
2 versions - Latest release: about 13 years ago - 2 dependent repositories - 23 downloads last month - 1 maintainer
analytics-command-center 3.0.14
Command Center for Data Ingestion, Advanced Analytics and Artificial Intelligence process
1 version - Latest release: almost 4 years ago - 28 downloads last month - 11 stars on GitHub - 1 maintainer
hadoop-mapreduce 0.5
Implementation of Hadoop Mapreduce on text files
4 versions - Latest release: about 3 years ago - 27 downloads last month - 1 star on GitHub - 1 maintainer
hivejdbc 0.2.3
Hive database driver via jdbc
5 versions - Latest release: almost 5 years ago - 1 dependent repository - 45.2 thousand downloads last month - 5 stars on GitHub - 1 maintainer
tf-yarn 0.7.0
Distributed TensorFlow or PyTorch on a YARN cluster
19 versions - Latest release: about 2 years ago - 2 dependent repositories - 170 downloads last month - 90 stars on GitHub - 7 maintainers
nflx-genie-client 3.6.19
Genie Python Client.
Top 4.4% on pypi.org - 104 versions - Latest release: 8 months ago - 2 dependent repositories - 69.8 thousand downloads last month - 1,756 stars on GitHub - 3 maintainers
thumbor_hbase 0.11
HBase image storage for Thumbor
11 versions - Latest release: over 12 years ago - 2 dependent repositories - 61 downloads last month - 9 stars on GitHub - 1 maintainer
yarnlog 0.2.1
Download Apache Hadoop YARN log to your local machine.
3 versions - Latest release: almost 5 years ago - 1 dependent repository - 19 downloads last month - 0 stars on GitHub - 1 maintainer
pydoris 1.1.0
Python interface to Doris
10 versions - Latest release: 2 months ago - 2 dependent packages - 92.6 thousand downloads last month - 10,473 stars on GitHub - 2 maintainers
bigdata 0.0.3
IPython magic for running Apache tools for Big Data
4 versions - Latest release: over 6 years ago - 363 downloads last month - 1 maintainer
pydistcp 1.0.7
pydistcp: python WebHDFS inter/intra-cluster data copy tool.
6 versions - Latest release: about 5 years ago - 1 dependent repository - 31 downloads last month - 9 stars on GitHub - 1 maintainer
cornet 0.1.3
Easily generate Apache Sqoop commands based on YAML config file
5 versions - Latest release: over 10 years ago - 2 dependent repositories - 109 downloads last month - 10 stars on GitHub - 1 maintainer
dist-keras 0.2.1
Distributed Deep learning with Apache Spark with Keras.
Top 7.8% on pypi.org - 3 versions - Latest release: about 8 years ago - 1 dependent repository - 546 downloads last month - 623 stars on GitHub - 1 maintainer
pinball 0.2.12
Workflow manager and scheduler
Top 9.8% on pypi.org - 13 versions - Latest release: over 9 years ago - 4 dependent repositories - 82 downloads last month - 1,044 stars on GitHub - 1 maintainer
dispy 4.15.2
Distributed and Parallel Computing with/for Python.
Top 4.8% on pypi.org - 80 versions - Latest release: about 3 years ago - 2 dependent packages - 21 dependent repositories - 2.9 thousand downloads last month - 267 stars on GitHub - 2 maintainers
pyhdfs 0.3.1
Pure Python HDFS client
Top 6.1% on pypi.org - 7 versions - Latest release: almost 6 years ago - 2 dependent packages - 11 dependent repositories - 18 thousand downloads last month - 94 stars on GitHub - 1 maintainer
cassandralauncher 1.20
Command line utilities for launching Cassandra clusters in EC2
42 versions - Latest release: over 11 years ago - 512 downloads last month - 46 stars on GitHub - 1 maintainer
impyla-jz 0.16.3
Python client for the Impala distributed query engine
1 version - Latest release: over 4 years ago - 1 dependent repository - 13 downloads last month - 0 stars on GitHub - 1 maintainer
spark-yarn-submit 1.0.0
Library to handle Spark job submission to a YARN cluster in different environments
1 version - Latest release: almost 9 years ago - 1 dependent repository - 6 downloads last month - 3 stars on GitHub - 1 maintainer
dfspy 0.1.0
Distributed File System written in Python
1 version - Latest release: over 3 years ago - 19 downloads last month - 14 stars on GitHub - 1 maintainer
hadoop-fs-wrapper 0.7.1
Python Wrapper for Hadoop Java API
10 versions - Latest release: about 1 year ago - 1 dependent package - 1 dependent repository - 850 downloads last month - 4 stars on GitHub - 1 maintainer
pydatavec 0.1.2
Python interface for DataVec
2 versions - Latest release: almost 6 years ago - 1 dependent package - 1 dependent repository - 21 downloads last month - 13,595 stars on GitHub - 1 maintainer
jumpy 0.2.4
Numpy and nd4j interop
Top 8.7% on pypi.org - 6 versions - Latest release: about 7 years ago - 4 dependent repositories - 207 downloads last month - 13,595 stars on GitHub - 3 maintainers
tinyhdfs 1.1.4
Tiny client for HDFS, based on WebHDFS
1 version - Latest release: about 9 years ago - 1 dependent repository - 5 downloads last month - 2 stars on GitHub - 1 maintainer
hueclientrest 0.2.0
A Python REST client for interacting with Hadoop Hue's REST API
1 version - Latest release: 5 months ago - 18 downloads last month - 0 stars on GitHub - 1 maintainer
py4hdfs 0.0.1
Fast Queries to HDFS
1 version - Latest release: over 9 years ago - 2 dependent repositories - 12 downloads last month - 1 star on GitHub - 1 maintainer
tf-yarn-gpu 0.6.3
Distributed TensorFlow on a YARN cluster with GPU support
1 version - Latest release: almost 4 years ago - 1 dependent repository - 5 downloads last month - 89 stars on GitHub - 1 maintainer
pycarbon-sdk 0.1.0
Pycarbon is a library that optimizes data access for AI based on CarbonData files, and it is bas...
1 version - Latest release: over 5 years ago - 1 dependent repository - 14 downloads last month - 1,438 stars on GitHub - 1 maintainer
hivehoney 1.0.4
Client-less data retrieval from Hive.
5 versions - Latest release: almost 7 years ago - 1 dependent repository - 12 downloads last month - 3 stars on GitHub - 1 maintainer
aiowebhdfs 0.0.2
A modern and asynchronous web client for WebHDFS
2 versions - Latest release: over 5 years ago - 49 downloads last month - 6 stars on GitHub - 1 maintainer
odata2avro 1.0.0
Convert OData datasets to Avro
3 versions - Latest release: over 10 years ago - 2 dependent repositories - 12 downloads last month - 2 stars on GitHub - 1 maintainer
hadeploy 0.6.1
A Hadoop Application deployment tool
12 versions - Latest release: almost 7 years ago - 1 dependent repository - 26 downloads last month - 10 stars on GitHub - 1 maintainer
streamsx.hdfs 1.5.9
HDFS integration for IBM Streams
18 versions - Latest release: almost 5 years ago - 62 downloads last month - 9 stars on GitHub - 4 maintainers
sparkpickle 1.0.1
Provides functions for reading SequenceFile-s with Python pickles.
2 versions - Latest release: about 9 years ago - 1 dependent repository - 2.14 thousand downloads last month - 24 stars on GitHub - 1 maintainer
skein 0.8.2
A simple tool and library for deploying applications on Apache YARN
Top 4.4% on pypi.org - 23 versions - Latest release: over 3 years ago - 2 dependent packages - 12 dependent repositories - 28.3 thousand downloads last month - 144 stars on GitHub - 1 maintainer
spooq 3.4.2
Spooq is a PySpark based helper library for ETL data ingestion pipeline in Data Lakes.
13 versions - Latest release: over 1 year ago - 1 dependent repository - 6.98 thousand downloads last month - 9 stars on GitHub - 1 maintainer
pyhadoop 0.1
Python based hadoop command-line interface
1 version - Latest release: over 11 years ago - 2 dependent repositories - 26 downloads last month - 3 stars on GitHub - 1 maintainer
pydoop 2.0.0
Pydoop: a Python MapReduce and HDFS API for Hadoop
Top 5.0% on pypi.org - 17 versions - Latest release: over 6 years ago - 8 dependent repositories - 232 thousand downloads last month - 239 stars on GitHub - 4 maintainers
yarn-dev-tools 2.0.13
Various scripts to automate and ease Apache Hadoop YARN development.
30 versions - Latest release: about 1 year ago - 107 downloads last month - 2 stars on GitHub - 1 maintainer
webhdfspy 0.3.5
A wrapper library to access Hadoop HTTP REST API
8 versions - Latest release: about 9 years ago - 2 dependent repositories - 226 downloads last month - 8 stars on GitHub - 1 maintainer
cirruscluster 0.0.1-17
A batteries-included MapReduce cluster-in-a-can for scientists, researchers, and engineers.
5 versions - Latest release: over 12 years ago - 3 dependent repositories - 20 downloads last month - 2 stars on GitHub - 1 maintainer
risk-command-center 1.0.37
Risk Command Center, manage your risk easily.
2 versions - Latest release: over 3 years ago - 1 dependent repository - 7 downloads last month - 11 stars on GitHub - 1 maintainer
yarntf 0.0.3.dev3
Easy distributed TensorFlow on Hops Hadoop
3 versions - Latest release: over 8 years ago - 1 dependent repository - 6 downloads last month - 31 stars on GitHub - 2 maintainers
ym-impyla 0.14.0
Python client for the Impala distributed query engine
1 version - Latest release: almost 9 years ago - 1 dependent repository - 7 downloads last month - 1 star on GitHub - 1 maintainer
ambari-ldap-manager 0.7
A tool to manage Ambari users and groups when authentication uses LDAP.
5 versions - Latest release: over 8 years ago - 1 dependent repository - 81 downloads last month - 0 stars on GitHub - 1 maintainer
jupyterhub-yarnspawner 0.4.0
JupyterHub Spawner for Apache Hadoop/YARN Clusters
4 versions - Latest release: over 6 years ago - 1 dependent repository - 86 downloads last month - 2 stars on GitHub - 1 maintainer
datagen 1.0.1
Generate delimited sample data with a simple schema.
1 version - Latest release: about 2 years ago - 2 dependent repositories - 8 stars on GitHub - 1 maintainer
knit 0.2.4 💰
Python wrapper for YARN Applications
6 versions - Latest release: almost 8 years ago - 4 dependent repositories - 99 downloads last month - 52 stars on GitHub - 3 maintainers
snakebite 2.11.0
Pure Python HDFS client
Top 2.3% on pypi.org - 63 versions - Latest release: over 9 years ago - 1 dependent package - 88 dependent repositories - 7.54 thousand downloads last month - 858 stars on GitHub - 1 maintainer
pymrgeo 1.0.2
MrGeo (pronounced "Mister Geo") is an open source geospatial toolkit designed to provide raster-b...
3 versions - Latest release: over 8 years ago - 1 dependent repository - 15 downloads last month - 209 stars on GitHub - 1 maintainer
pomsets-core 1.0.9
workflow management for the cloud
10 versions - Latest release: about 15 years ago - 2 dependent repositories - 64 downloads last month - 1 maintainer
pomsets-gui 1.0.10
GUI for workflow management for the cloud
11 versions - Latest release: about 15 years ago - 2 dependent repositories - 62 downloads last month - 1 maintainer
python3-lzo-indexer 0.3.0
Library for indexing LZO compressed files
4 versions - Latest release: about 7 years ago - 1 dependent repository - 28 downloads last month - 2 stars on GitHub - 1 maintainer
hadoop-yarn-rest-api 1.1.0
Python wrapper for Hadoop YARN REST API
5 versions - Latest release: over 6 years ago - 1 dependent repository - 1.9 thousand downloads last month - 0 stars on GitHub - 1 maintainer
dask-gateway 2025.4.0 💰
A client library for interacting with a dask-gateway server
Top 3.3% on pypi.org - 23 versions - Latest release: 7 months ago - 13 dependent packages - 25 dependent repositories - 54.5 thousand downloads last month - 142 stars on GitHub - 4 maintainers
yarn-api-client 1.0.3
Python client for Hadoop® YARN API
Top 3.4% on pypi.org - 21 versions - Latest release: almost 4 years ago - 5 dependent packages - 19 dependent repositories - 557 thousand downloads last month - 109 stars on GitHub - 4 maintainers
sqoopit 0.0.12
A simple package to let you Sqoop into HDFS/Hive/HBase with python
1 version - Latest release: over 5 years ago - 1 dependent repository - 5 downloads last month - 0 stars on GitHub - 1 maintainer
ls-thrift-py-hadoop 1-cdh4.3.0
Hadoop and Hive Python Thrift Libs
3 versions - Latest release: over 12 years ago - 2 dependent repositories - 6 downloads last month - 10 stars on GitHub - 1 maintainer
hcompressor 1.0.0 removed
Hcompressor is a tool to compress files in HDFS
1 version - Latest release: about 3 years ago - 15 downloads last month - 1 star on GitHub - 1 maintainer
Related Keywords
python (26)
hdfs (24)
hive (19)
spark (19)
distributed (10)
big-data (9)
yarn (9)
mapreduce (8)
java (8)
cloudera (7)
cluster (6)
webhdfs (6)
gpu (6)
hbase (6)
bigdata (6)
data-science (5)
sql (5)
cloud (5)
api (5)
database (4)
apache (4)
impala (4)
machine-learning (4)
deep-learning (4)
streaming (4)
tensorflow (4)
automl (3)
249 (3)
pep (3)
scala (3)
data-engineering (3)
pyspark (3)
db (3)
pandas (3)
etl (3)
pydata (3)
data (3)
protobuf (3)
sqoop (3)
YARN (3)
mysql (3)
mpp (3)
ensemble-learning (3)
gbm (3)
parallel (3)
h2o (3)
h2o-automl (3)
big data (3)
naive-bayes (3)
opensource (3)
pca (3)
hs2 (3)
r (3)
random-forest (3)
hiveserver2 (3)
HDFS (3)
dask (3)
deeplearning (2)
kafka (2)
thrift (2)
search (2)
map (2)
neural-nets (2)
matrix-library (2)
dl4j (2)
intellij (2)
linear-algebra (2)
pig (2)
rest (2)
distributed-systems (2)
deployment (2)
toolkit (2)
trabaja-sobre-spark (2)
spark-sql (2)
parquet (2)
huemul-bigdatagovernance (2)
huemul (2)
hortonworks (2)
gdpr (2)
dataquality (2)
datamart (2)
data-warehouse (2)
data-governance (2)
data-engineer (2)
chile (2)
analytics (2)
distributed-computing (2)
hadoop-filesystem (2)
batch (2)
cdh (2)
cassandra (2)
machine learning (2)
data mining (2)
statistical analysis (2)
modeling (2)
HPC (2)
kubernetes (2)
python3 (2)
luigi (2)
orchestration-framework (2)