Ecosyste.ms: Packages

An open API service providing package, version and dependency metadata of many open source software ecosystems and registries.

pypi.org "hadoop" keyword

Top 0.7% on pypi.org
h2o 3.46.0.1
H2O, Fast Scalable Machine Learning, for python
112 versions - Latest release: about 2 months ago - 10 dependent packages - 393 dependent repositories - 323 thousand downloads last month - 6,710 stars on GitHub - 2 maintainers
Top 0.5% on pypi.org
luigi 3.5.0
Workflow mgmt + task scheduling + dependency resolution.
81 versions - Latest release: 4 months ago - 33 dependent packages - 586 dependent repositories - 349 thousand downloads last month - 17,072 stars on GitHub - 17 maintainers
dvc-hdfs 3.0.0
hdfs plugin for dvc
3 versions - Latest release: 5 months ago - 4 dependent packages - 3 dependent repositories - 10.9 thousand downloads last month - 0 stars on GitHub - 4 maintainers
hadoop-mapreduce 0.5
Implementation of Hadoop Mapreduce on text files
4 versions - Latest release: over 1 year ago - 23 downloads last month - 0 stars on GitHub - 2 maintainers
jupyterhub-yarnspawner 0.4.0
JupyterHub Spawner for Apache Hadoop/YARN Clusters
4 versions - Latest release: almost 5 years ago - 1 dependent repository - 69 downloads last month - 2 stars on GitHub - 1 maintainer
Top 8.7% on pypi.org
jumpy 0.2.4
Numpy and nd4j interop
6 versions - Latest release: over 5 years ago - 4 dependent repositories - 106 downloads last month - 13,445 stars on GitHub - 3 maintainers
pydatavec 0.1.2
Python interface for DataVec
2 versions - Latest release: over 4 years ago - 1 dependent package - 1 dependent repository - 33 downloads last month - 13,290 stars on GitHub - 1 maintainer
pycarbon-sdk 0.1.0
Pycarbon is a library that optimizes data access for AI based on CarbonData files, and it is bas...
1 version - Latest release: about 4 years ago - 1 dependent repository - 26 downloads last month - 1,418 stars on GitHub - 2 maintainers
Top 1.4% on pypi.org
impyla 0.19.0
Python client for the Impala distributed query engine
52 versions - Latest release: 6 months ago - 22 dependent packages - 251 dependent repositories - 612 thousand downloads last month - 723 stars on GitHub - 13 maintainers
pydistcp 1.0.7
pydistcp: python WebHDFS inter/intra-cluster data copy tool.
6 versions - Latest release: over 3 years ago - 1 dependent repository - 21 downloads last month - 10 stars on GitHub - 2 maintainers
cyhdfs 0.1.3
Cython wrapper around libhdfs
2 versions - Latest release: over 11 years ago - 2 dependent repositories - 13 downloads last month - 2 maintainers
luigi-k8s-jobs-runner 2.8.10
Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles depende...
11 versions - Latest release: about 4 years ago - 1 dependent repository - 27 downloads last month - 17,351 stars on GitHub - 2 maintainers
sqoopit 0.0.12
A simple package to let you Sqoop into HDFS/Hive/HBase with python
1 version - Latest release: about 4 years ago - 1 dependent repository - 16 downloads last month - 0 stars on GitHub - 2 maintainers
Top 4.8% on pypi.org
dask-gateway-server 2024.1.0 💰
A multi-tenant server for securely deploying and managing multiple Dask clusters.
22 versions - Latest release: 4 months ago - 1 dependent package - 10 dependent repositories - 6.32 thousand downloads last month - 128 stars on GitHub - 4 maintainers
Top 3.3% on pypi.org
dask-gateway 2024.1.0 💰
A client library for interacting with a dask-gateway server
22 versions - Latest release: 4 months ago - 11 dependent packages - 25 dependent repositories - 36.5 thousand downloads last month - 128 stars on GitHub - 4 maintainers
hadoop-yarn-rest-api 1.1.0
Python wrapper for Hadoop YARN REST API
5 versions - Latest release: almost 5 years ago - 1 dependent repository - 2.21 thousand downloads last month - 0 stars on GitHub - 2 maintainers
pymongo_hadoop 1.1.0
UNKNOWN
2 versions - Latest release: about 11 years ago - 3 dependent repositories - 25 downloads last month - 1,522 stars on GitHub - 2 maintainers
snakeriver 0.1.3
Another way to think about Hadoop Streaming in Python
4 versions - Latest release: over 10 years ago - 2 dependent repositories - 15 downloads last month - 2 maintainers
odata2avro 1.0.0
Convert OData datasets to Avro
3 versions - Latest release: about 9 years ago - 2 dependent repositories - 15 downloads last month - 2 stars on GitHub - 2 maintainers
Top 2.3% on pypi.org
snakebite 2.11.0
Pure Python HDFS client
63 versions - Latest release: almost 8 years ago - 1 dependent package - 88 dependent repositories - 47.5 thousand downloads last month - 858 stars on GitHub - 1 maintainer
py4hdfs 0.0.1
Fast Queries to HDFS
1 version - Latest release: about 8 years ago - 2 dependent repositories - 9 downloads last month - 1 star on GitHub - 2 maintainers
yarn-dev-tools 2.0.2
Various scripts to automate and ease Apache Hadoop YARN development.
19 versions - Latest release: about 1 month ago - 162 downloads last month - 2 stars on GitHub - 2 maintainers
aiowebhdfs 0.0.2
A modern and asynchronous web client for WebHDFS
2 versions - Latest release: about 4 years ago - 39 downloads last month - 7 stars on GitHub - 2 maintainers
dfspy 0.1.0
Distributed File System written in Python
1 version - Latest release: almost 2 years ago - 19 downloads last month - 14 stars on GitHub - 2 maintainers
hdfs-native 0.9.1
Python bindings for hdfs-native Rust library
10 versions - Latest release: about 1 month ago - 313 downloads last month - 18 stars on GitHub - 2 maintainers
analytics-command-center 3.0.14
Command Center for Data Ingestion, Advanced Analytics and Artificial Intelligence process
1 version - Latest release: over 2 years ago - 26 downloads last month - 11 stars on GitHub - 1 maintainer
yarnlog 0.2.1
Download Apache Hadoop YARN log to your local machine.
3 versions - Latest release: over 3 years ago - 1 dependent repository - 35 downloads last month - 0 stars on GitHub - 2 maintainers
pyhadoop 0.1
Python based hadoop command-line interface
1 version - Latest release: about 10 years ago - 2 dependent repositories - 96 downloads last month - 3 stars on GitHub - 2 maintainers
sparkpickle 1.0.1
Provides functions for reading SequenceFile-s with Python pickles.
2 versions - Latest release: over 7 years ago - 1 dependent repository - 5.3 thousand downloads last month - 24 stars on GitHub - 1 maintainer
Top 5.0% on pypi.org
pydoop 2.0.0
Pydoop: a Python MapReduce and HDFS API for Hadoop
17 versions - Latest release: almost 5 years ago - 8 dependent repositories - 206 thousand downloads last month - 233 stars on GitHub - 4 maintainers
Top 4.6% on pypi.org
hive-thrift-py 0.0.1
Hive Python Thrift Libs
1 version - Latest release: over 11 years ago - 32 dependent repositories - 2.43 thousand downloads last month - 2 maintainers
hadeploy 0.6.1
A Hadoop application deployment tool
12 versions - Latest release: over 5 years ago - 1 dependent repository - 112 downloads last month - 10 stars on GitHub - 2 maintainers
Top 4.8% on pypi.org
dispy 4.15.2
Distributed and Parallel Computing with/for Python.
80 versions - Latest release: over 1 year ago - 2 dependent packages - 21 dependent repositories - 397 downloads last month - 255 stars on GitHub - 2 maintainers
hadoop-protoseq 0.0.1
Python library for Hadoop Streaming with support of protobuf sequences
1 version - Latest release: almost 3 years ago - 1 dependent repository - 17 downloads last month - 1 star on GitHub - 2 maintainers
hadoop-fs-wrapper 0.6.1
Python Wrapper for Hadoop Java API
8 versions - Latest release: 10 months ago - 1 dependent package - 1 dependent repository - 2.67 thousand downloads last month - 3 stars on GitHub - 2 maintainers
cornet 0.1.3
Easily generate Apache Sqoop commands based on YAML config file
5 versions - Latest release: almost 9 years ago - 2 dependent repositories - 44 downloads last month - 10 stars on GitHub - 2 maintainers
ls-thrift-py-hadoop 1-cdh4.3.0
Hadoop and Hive Python Thrift Libs
3 versions - Latest release: almost 11 years ago - 2 dependent repositories - 11 downloads last month - 10 stars on GitHub - 2 maintainers
cirruscluster 0.0.1-17
A batteries-included MapReduce cluster-in-a-can for scientists, researchers, and engineers.
5 versions - Latest release: almost 11 years ago - 3 dependent repositories - 8 downloads last month - 2 stars on GitHub - 2 maintainers
sqoopy 0.0.75
UNKNOWN
21 versions - Latest release: over 8 years ago - 2 dependent repositories - 86 downloads last month - 2 maintainers
bigdata 0.0.3
IPython magic for running Apache tools for Big Data
4 versions - Latest release: almost 5 years ago - 919 downloads last month - 2 maintainers
nutch 1.10.3 💰
Apache Nutch Python library
4 versions - Latest release: over 8 years ago - 2 dependent repositories - 40 downloads last month - 36 stars on GitHub - 2 maintainers
sdctool 0.11.0
Streamsets DataCollector API utility
3 versions - Latest release: about 6 years ago - 1 dependent repository - 13 downloads last month - 13 stars on GitHub - 2 maintainers
Top 7.8% on pypi.org
dist-keras 0.2.1
Distributed Deep learning with Apache Spark with Keras.
3 versions - Latest release: over 6 years ago - 1 dependent repository - 6.95 thousand downloads last month - 624 stars on GitHub - 2 maintainers
webhdfspy 0.3.5
A wrapper library to access Hadoop HTTP REST API
8 versions - Latest release: over 7 years ago - 2 dependent repositories - 95 downloads last month - 8 stars on GitHub - 2 maintainers
cassandralauncher 1.20
Command line utilities for launching Cassandra clusters in EC2
42 versions - Latest release: about 10 years ago - 48 downloads last month - 46 stars on GitHub - 1 maintainer
yarntf 0.0.3.dev3
Easy distributed TensorFlow on Hops Hadoop
3 versions - Latest release: almost 7 years ago - 1 dependent repository - 15 downloads last month - 33 stars on GitHub - 4 maintainers
hadopy 0.1.8
Easy parallel map-reduce command line tool
8 versions - Latest release: almost 3 years ago - 1 dependent repository - 24 downloads last month - 8 stars on GitHub - 2 maintainers
Top 7.7% on pypi.org
cluster-pack 0.3.7
A library on top of either pex or conda-pack to make your Python code easily available on a cluster
44 versions - Latest release: 10 days ago - 2 dependent packages - 5 dependent repositories - 683 downloads last month - 43 stars on GitHub - 14 maintainers
Top 6.1% on pypi.org
pyhdfs 0.3.1
Pure Python HDFS client
7 versions - Latest release: over 4 years ago - 11 dependent repositories - 7.36 thousand downloads last month - 88 stars on GitHub - 2 maintainers
pomsets-core 1.0.9
workflow management for the cloud
10 versions - Latest release: over 13 years ago - 2 dependent repositories - 36 downloads last month - 2 maintainers
tinyhdfs 1.1.4
Tiny client for HDFS, based on WebHDFS
1 version - Latest release: over 7 years ago - 1 dependent repository - 11 downloads last month - 2 stars on GitHub - 2 maintainers
python_hiveish 1.1.0
A hive-like interface wrapper around Hadoopy that allows SQL-like queries on top of MapReduce dire...
2 versions - Latest release: over 8 years ago - 2 dependent repositories - 11 downloads last month - 1 star on GitHub - 1 maintainer
pomsets-gui 1.0.10
GUI for workflow management for the cloud
11 versions - Latest release: over 13 years ago - 2 dependent repositories - 42 downloads last month - 2 maintainers
hivejdbc 0.2.3
Hive database driver via jdbc
5 versions - Latest release: over 3 years ago - 1 dependent repository - 73.5 thousand downloads last month - 5 stars on GitHub - 2 maintainers
Top 4.4% on pypi.org
skein 0.8.2
A simple tool and library for deploying applications on Apache YARN
23 versions - Latest release: about 2 years ago - 2 dependent packages - 12 dependent repositories - 25.2 thousand downloads last month - 138 stars on GitHub - 2 maintainers
datagen 1.0.1
Generate delimited sample data with a simple schema.
1 version - Latest release: 9 months ago - 2 dependent repositories - 8 stars on GitHub - 2 maintainers
Top 4.4% on pypi.org
nflx-genie-client 3.6.17
Genie Python Client.
102 versions - Latest release: 9 months ago - 2 dependent repositories - 78.3 thousand downloads last month - 1,679 stars on GitHub - 6 maintainers
Top 3.4% on pypi.org
yarn-api-client 1.0.3
Python client for Hadoop® YARN API
21 versions - Latest release: over 2 years ago - 3 dependent packages - 19 dependent repositories - 607 thousand downloads last month - 109 stars on GitHub - 8 maintainers
Top 4.6% on pypi.org
snakebite-py3 3.0.5
Pure Python HDFS client
5 versions - Latest release: over 4 years ago - 4 dependent packages - 18 dependent repositories - 152 thousand downloads last month - 22 stars on GitHub - 2 maintainers
dbt-doris 0.3.4
The doris adapter plugin for dbt
8 versions - Latest release: 8 months ago - 1 dependent repository - 132 downloads last month - 10,473 stars on GitHub - 2 maintainers
pydoris-client 1.0.4
Python interface to Doris
3 versions - Latest release: 6 months ago - 29 downloads last month - 11,272 stars on GitHub - 2 maintainers
pydoris 1.0.5
Python interface to Doris
8 versions - Latest release: 3 months ago - 24.2 thousand downloads last month - 10,473 stars on GitHub - 2 maintainers
Top 7.8% on pypi.org
hbase-python 0.5
User friendly HBase client for Python 3. (Pure python implementation)
5 versions - Latest release: almost 6 years ago - 6 dependent repositories - 6.06 thousand downloads last month - 41 stars on GitHub - 2 maintainers
ym-impyla 0.14.0
Python client for the Impala distributed query engine
1 version - Latest release: over 7 years ago - 1 dependent repository - 11 downloads last month - 1 star on GitHub - 2 maintainers
python3-lzo-indexer 0.3.0
Library for indexing LZO compressed files
4 versions - Latest release: over 5 years ago - 1 dependent repository - 69 downloads last month - 1 star on GitHub - 1 maintainer
splitlog 3.0.0
Utility to split aggregated logs from Apache Hadoop YARN applications into a folder hierarchy
10 versions - Latest release: 7 months ago - 52 downloads last month - 0 stars on GitHub - 2 maintainers
clusterdock 2.3.0
clusterdock is a framework for creating Docker-based container clusters
24 versions - Latest release: over 3 years ago - 1 dependent repository - 341 downloads last month - 29 stars on GitHub - 6 maintainers
thumbor_hbase 0.11
HBase image storage for Thumbor
11 versions - Latest release: almost 11 years ago - 2 dependent repositories - 25 downloads last month - 9 stars on GitHub - 2 maintainers
risk-command-center 1.0.37
Risk Command Center, manage your risk easily.
2 versions - Latest release: almost 2 years ago - 1 dependent repository - 10 downloads last month - 11 stars on GitHub - 1 maintainer
sparkdh 0.0.1
1 version - Latest release: about 2 years ago - 1 dependent repository - 6 downloads last month - 0 stars on GitHub - 2 maintainers
spooq 3.4.0
Spooq is a PySpark based helper library for ETL data ingestion pipeline in Data Lakes.
11 versions - Latest release: about 2 months ago - 1 dependent repository - 19.3 thousand downloads last month - 8 stars on GitHub - 2 maintainers
impyla-jz 0.16.3
Python client for the Impala distributed query engine
1 version - Latest release: almost 3 years ago - 1 dependent repository - 12 downloads last month - 0 stars on GitHub - 2 maintainers
dbsync 0.1.1
Sync database to hadoop
1 version - Latest release: about 9 years ago - 1 dependent repository - 17 downloads last month - 0 stars on GitHub - 2 maintainers
spark-yarn-submit 1.0.0
Library to handle Spark job submission on a YARN cluster in different environments
1 version - Latest release: about 7 years ago - 1 dependent repository - 3 downloads last month - 3 stars on GitHub - 2 maintainers
Top 9.2% on pypi.org
madoop 1.2.2
A lightweight MapReduce framework for education.
11 versions - Latest release: about 1 month ago - 2 dependent repositories - 1.45 thousand downloads last month - 9 stars on GitHub - 2 maintainers
knit 0.2.4 💰
Python wrapper for YARN Applications
6 versions - Latest release: over 6 years ago - 4 dependent repositories - 57 downloads last month - 53 stars on GitHub - 3 maintainers
Top 7.1% on pypi.org
starbase 0.3.3
Python client for HBase Stargate REST server
13 versions - Latest release: over 9 years ago - 14 dependent repositories - 840 downloads last month - 53 stars on GitHub - 2 maintainers
ambari-ldap-manager 0.7
A tool to manage Ambari users and groups when authentication uses LDAP.
5 versions - Latest release: almost 7 years ago - 1 dependent repository - 6 downloads last month - 0 stars on GitHub - 1 maintainer
pymrgeo 1.0.2
MrGeo (pronounced "Mister Geo") is an open source geospatial toolkit designed to provide raster-b...
3 versions - Latest release: almost 7 years ago - 1 dependent repository - 10 downloads last month - 201 stars on GitHub - 2 maintainers
Top 9.8% on pypi.org
pinball 0.2.12
Workflow manager and scheduler
13 versions - Latest release: about 8 years ago - 4 dependent repositories - 19 downloads last month - 1,047 stars on GitHub - 2 maintainers
Top 8.6% on pypi.org
h2o-client 3.46.0.1
H2O, Fast Scalable Machine Learning, for python
12 versions - Latest release: about 2 months ago - 3 dependent repositories - 587 downloads last month - 6,710 stars on GitHub - 1 maintainer
h2o-mlflow-flavor 0.1.0
A mlflow flavor for working with H2O-3 MOJO and POJO models
1 version - Latest release: 6 months ago - 20 downloads last month - 6,702 stars on GitHub - 2 maintainers
tf-yarn 0.7.0
Distributed TensorFlow or PyTorch on a YARN cluster
19 versions - Latest release: 6 months ago - 2 dependent repositories - 37 downloads last month - 86 stars on GitHub - 14 maintainers
tf-yarn-gpu 0.6.3
Distributed TensorFlow on a YARN cluster with GPU support
1 version - Latest release: over 2 years ago - 1 dependent repository - 6 downloads last month - 86 stars on GitHub - 2 maintainers
kiji-bento-cluster 2.0.10
CDH and Datastax Enterprise docker single-node development cluster.
6 versions - Latest release: over 9 years ago - 2 dependent repositories - 7 downloads last month - 1 maintainer
hivehoney 1.0.4
Client-less data retrieval from Hive.
5 versions - Latest release: over 5 years ago - 1 dependent repository - 18 downloads last month - 3 stars on GitHub - 2 maintainers
trustedanalytics 0.7.3.post20161020785
Trusted Analytics Toolkit
161 versions - Latest release: over 7 years ago - 2 dependent repositories - 279 downloads last month - 43 stars on GitHub - 2 maintainers
cobra-policytool 1.1.6
Tool for managing Hadoop access using Apache Atlas and Ranger.
9 versions - Latest release: almost 5 years ago - 1 dependent repository - 38 downloads last month - 16 stars on GitHub - 2 maintainers
streamsx.hdfs 1.5.9
HDFS integration for IBM Streams
18 versions - Latest release: over 3 years ago - 167 downloads last month - 9 stars on GitHub - 8 maintainers
hcompressor 1.0.0 (removed)
Hcompressor is a tool to compress files in HDFS
1 version - Latest release: over 1 year ago - 15 downloads last month - 1 star on GitHub - 1 maintainer