Ecosyste.ms: Packages

An open API service providing package, version and dependency metadata of many open source software ecosystems and registries.

pypi.org "bigdata" keyword

Top 5.5% on pypi.org
arvados-python-client 2.7.2
Arvados client library
689 versions - Latest release: about 1 month ago - 11 dependent repositories - 7.61 thousand downloads last month - 364 stars on GitHub - 1 maintainer
arvados_fuse 2.7.2
Arvados FUSE driver
503 versions - Latest release: about 1 month ago - 2 dependent repositories - 2.08 thousand downloads last month - 364 stars on GitHub - 1 maintainer
Top 9.9% on pypi.org
arvados-cwl-runner 2.7.2
Arvados Common Workflow Language runner
361 versions - Latest release: about 1 month ago - 5 dependent repositories - 1.96 thousand downloads last month - 364 stars on GitHub - 1 maintainer
Top 2.2% on pypi.org
uproot 5.3.7
ROOT I/O in pure Python and NumPy.
305 versions - Latest release: 8 days ago - 83 dependent packages - 240 dependent repositories - 137 thousand downloads last month - 220 stars on GitHub - 2 maintainers
databend 1.2.453
Databend Python Binding
260 versions - Latest release: 11 days ago - 2.1 thousand downloads last month - 7,150 stars on GitHub - 3 maintainers
Top 4.4% on pypi.org
nflx-genie-client 3.6.17
Genie Python Client.
102 versions - Latest release: 10 months ago - 2 dependent repositories - 80.4 thousand downloads last month - 1,679 stars on GitHub - 3 maintainers
Top 1.3% on pypi.org
vaex-core 4.17.1
Core of vaex
101 versions - Latest release: 10 months ago - 16 dependent packages - 64 dependent repositories - 67 thousand downloads last month - 8,171 stars on GitHub - 1 maintainer
Top 8.3% on pypi.org
visualpython 3.0.2 💰
Visual Python is a GUI-based Python code generator, developed on the Jupyter Notebook as an exten...
89 versions - Latest release: 3 days ago - 1 dependent repository - 1.53 thousand downloads last month - 755 stars on GitHub - 1 maintainer
Top 4.8% on pypi.org
optimuspyspark 2.2.32
Optimus is the missing framework for cleaning and pre-processing data in a distributed fashion wi...
83 versions - Latest release: almost 4 years ago - 8 dependent repositories - 10.7 thousand downloads last month - 1,441 stars on GitHub - 2 maintainers
cuallee 0.10.2
Python library for data validation on DataFrame APIs including Snowflake/Snowpark, Apache/PySpark...
75 versions - Latest release: 6 days ago - 1 dependent package - 1 dependent repository - 12.8 thousand downloads last month - 111 stars on GitHub - 2 maintainers
Top 3.3% on pypi.org
fastwarc 0.14.7
A high-performance WARC parsing library for Python written in C++/Cython.
72 versions - Latest release: 18 days ago - 6 dependent packages - 5 dependent repositories - 209 thousand downloads last month - 42 stars on GitHub - 1 maintainer
Top 8.1% on pypi.org
km3io 1.1.0
KM3NeT I/O library without ROOT
67 versions - Latest release: 2 months ago - 2 dependent packages - 2 dependent repositories - 789 downloads last month - 314 stars on GitHub - 2 maintainers
Top 4.3% on pypi.org
resiliparse 0.14.7
A collection of robust and fast processing tools for parsing and analyzing (not only) web archive...
66 versions - Latest release: 18 days ago - 2 dependent packages - 4 dependent repositories - 20.2 thousand downloads last month - 42 stars on GitHub - 1 maintainer
Top 1.4% on pypi.org
vaex 4.17.0
Out-of-Core DataFrames to visualize and explore big tabular datasets
58 versions - Latest release: 10 months ago - 24 dependent packages - 90 dependent repositories - 21.6 thousand downloads last month - 8,171 stars on GitHub - 1 maintainer
Top 8.1% on pypi.org
datafaker 0.7.6
A tool for generating batch test data or stream data.
51 versions - Latest release: almost 3 years ago - 2 dependent repositories - 158 downloads last month - 607 stars on GitHub - 1 maintainer
Top 1.1% on pypi.org
vispy 0.14.2
Interactive visualization in Python
38 versions - Latest release: 2 months ago - 73 dependent packages - 287 dependent repositories - 96.8 thousand downloads last month - 3,206 stars on GitHub - 3 maintainers
Top 1.5% on pypi.org
vaex-hdf5 0.14.1
hdf5 file support for vaex
36 versions - Latest release: over 1 year ago - 4 dependent packages - 58 dependent repositories - 27.3 thousand downloads last month - 8,171 stars on GitHub - 1 maintainer
Top 1.6% on pypi.org
vaex-ml 0.18.3
Machine learning support for vaex
34 versions - Latest release: 10 months ago - 2 dependent packages - 47 dependent repositories - 20.5 thousand downloads last month - 8,171 stars on GitHub - 1 maintainer
airavata-django-portal-sdk 1.8.4
The Airavata Django Portal SDK is a library that makes it easier to develop Airavata Django Porta...
33 versions - Latest release: 10 months ago - 3 dependent repositories - 407 downloads last month - 0 stars on GitHub - 3 maintainers
Top 9.4% on pypi.org
pyoptimus 0.1.0
Optimus is the missing framework for cleaning and pre-processing data in a distributed fashion.
32 versions - Latest release: over 1 year ago - 1 dependent repository - 371 downloads last month - 1,441 stars on GitHub - 2 maintainers
Top 0.3% on pypi.org
avro 1.11.3
Avro is a serialization and RPC framework.
32 versions - Latest release: 8 months ago - 59 dependent packages - 1,066 dependent repositories - 7.08 million downloads last month - 2,755 stars on GitHub - 8 maintainers
arvados-pam 2.0.4
Arvados PAM module
32 versions - Latest release: over 3 years ago - 136 downloads last month - 364 stars on GitHub - 1 maintainer
hdxcli 1.0rc51
Hydrolix command line utility to do CRUD operations on projects, tables, transforms and other res...
32 versions - Latest release: about 2 months ago - 142 downloads last month - 6 stars on GitHub - 1 maintainer
lettria 6.0.2
Lettria official SDK for python
31 versions - Latest release: 8 months ago - 1 dependent repository - 938 downloads last month - 11 stars on GitHub - 1 maintainer
Top 4.8% on pypi.org
uproot4 4.0.0
ROOT I/O in pure Python and NumPy.
30 versions - Latest release: over 3 years ago - 10 dependent packages - 11 dependent repositories - 20.7 thousand downloads last month - 199 stars on GitHub - 2 maintainers
athenacli 1.6.8
CLI for the Athena database, with auto-completion and syntax highlighting.
24 versions - Latest release: about 2 years ago - 1 dependent repository - 1.21 thousand downloads last month - 206 stars on GitHub - 2 maintainers
pd-helper 1.0.0
A helpful script to optimize a Pandas DataFrame.
24 versions - Latest release: about 3 years ago - 1 dependent repository - 133 downloads last month - 6 stars on GitHub - 1 maintainer
Top 1.5% on pypi.org
vaex-viz 0.5.4
Visualization for vaex
24 versions - Latest release: over 1 year ago - 3 dependent packages - 54 dependent repositories - 21.2 thousand downloads last month - 8,171 stars on GitHub - 1 maintainer
Top 1.5% on pypi.org
vaex-jupyter 0.8.2
Jupyter notebook and Jupyter lab support for vaex
23 versions - Latest release: 10 months ago - 4 dependent packages - 50 dependent repositories - 20.4 thousand downloads last month - 8,171 stars on GitHub - 1 maintainer
ukv 0.12.1
Python bindings for Unum's UStore.
23 versions - Latest release: about 1 year ago - 145 downloads last month - 478 stars on GitHub - 1 maintainer
Top 1.4% on pypi.org
cytoolz 0.12.3
Cython implementation of Toolz: High performance functional utilities
23 versions - Latest release: 4 months ago - 162 dependent packages - 11,437 dependent repositories - 2.43 million downloads last month - 964 stars on GitHub - 1 maintainer
Top 5.5% on pypi.org
jupyterlab-visualpython 3.0.2 💰
GUI-based Python code generator for Jupyter Lab as an extension
23 versions - Latest release: 3 days ago - 4 dependent repositories - 1.52 thousand downloads last month - 755 stars on GitHub - 1 maintainer
ustore 1.7.5
Python bindings for Unum's UStore.
21 versions - Latest release: almost 2 years ago - 1 dependent repository - 27 downloads last month - 478 stars on GitHub - 1 maintainer
datacatalog-util 0.11.6
A package to manage Google Cloud Data Catalog helper commands and scripts
21 versions - Latest release: over 3 years ago - 1 dependent repository - 354 downloads last month - 20 stars on GitHub - 1 maintainer
Top 1.5% on pypi.org
vaex-server 0.9.0
Webserver and client for vaex for a remote dataset
21 versions - Latest release: 10 months ago - 3 dependent packages - 51 dependent repositories - 20.4 thousand downloads last month - 8,171 stars on GitHub - 1 maintainer
crunchstat_summary 2.7.2
Arvados crunchstat-summary reads crunch log files and summarizes resource usage
20 versions - Latest release: about 1 month ago - 184 downloads last month - 364 stars on GitHub - 1 maintainer
dpark 0.5.0
Python clone of Spark, MapReduce like computing framework supporting iterative algorithms.
19 versions - Latest release: almost 6 years ago - 1 dependent repository - 98 downloads last month - 2,693 stars on GitHub - 2 maintainers
pysparkify 0.27.0
Spark-based ETL
18 versions - Latest release: 3 days ago - 1.01 thousand downloads last month - 1 star on GitHub - 1 maintainer
heppyness 2.1.0
ROOT I/O in pure Python and NumPy.
18 versions - Latest release: about 1 month ago - 1 dependent repository - 448 downloads last month - 314 stars on GitHub - 1 maintainer
Top 1.6% on pypi.org
vaex-astro 0.9.3
Astronomy related transformations and FITS file support
18 versions - Latest release: over 1 year ago - 2 dependent packages - 51 dependent repositories - 21 thousand downloads last month - 8,171 stars on GitHub - 1 maintainer
legend-pydataobj 1.7.0
LEGEND Python Data Objects
18 versions - Latest release: 9 days ago - 3 dependent packages - 2.46 thousand downloads last month - 0 stars on GitHub - 1 maintainer
pysparkgateway 0.0.22
Connect PySpark to remote clusters
18 versions - Latest release: about 3 years ago - 1 dependent repository - 18.1 thousand downloads last month - 3 stars on GitHub - 1 maintainer
grapetree 1.6.1
Web interface of GrapeTree, which is a program for phylogenetic analysis.
16 versions - Latest release: over 4 years ago - 1 dependent repository - 109 downloads last month - 74 stars on GitHub - 3 maintainers
mextractor 5.0.0 💰
mextractor can extract media metadata to YAML and read them
16 versions - Latest release: about 2 months ago - 117 downloads last month - 4 stars on GitHub - 1 maintainer
django-mass-migration 0.2.9
Django app for long-running data migrations
16 versions - Latest release: 4 months ago - 1.1 thousand downloads last month - 2 stars on GitHub - 1 maintainer
wendelin.core 0.2
Out-of-core NumPy arrays
16 versions - Latest release: 9 months ago - 2 dependent repositories - 38 downloads last month - 1 maintainer
datacatalog-tag-manager 2.2.0
A package to manage Google Cloud Data Catalog tags, loading metadata from external sources
15 versions - Latest release: over 3 years ago - 1 dependent repository - 409 downloads last month - 18 stars on GitHub - 1 maintainer
gigapipe 0.1.22
Gigapipe Python Client
13 versions - Latest release: over 1 year ago - 1 dependent repository - 73 downloads last month - 1 maintainer
Top 0.4% on pypi.org
avro-python3 1.10.2
Avro is a serialization and RPC framework.
12 versions - Latest release: about 3 years ago - 37 dependent packages - 742 dependent repositories - 6.64 million downloads last month - 2,755 stars on GitHub - 7 maintainers
Top 4.3% on pypi.org
vaex-arrow 0.5.1
Arrow support for vaex
12 versions - Latest release: almost 4 years ago - 1 dependent package - 17 dependent repositories - 371 downloads last month - 8,171 stars on GitHub - 1 maintainer
hadeploy 0.6.1
A Hadoop application deployment tool
12 versions - Latest release: over 5 years ago - 1 dependent repository - 112 downloads last month - 10 stars on GitHub - 1 maintainer
bdtool 9.0.0
Scripts for managing a big data technology stack
11 versions - Latest release: about 2 years ago - 1 dependent repository - 29 downloads last month - 0 stars on GitHub - 1 maintainer
cwlab 0.4.1
A platform-agnostic, cloud-ready framework for simplified deployment of the Common Workflow Langu...
10 versions - Latest release: about 3 years ago - 1 dependent repository - 53 downloads last month - 31 stars on GitHub - 1 maintainer
vulkn 19.0.10
The environmentally friendly petabyte scale Python eco-system built on Yandex ClickHouse
10 versions - Latest release: about 4 years ago - 1 dependent repository - 78 downloads last month - 43 stars on GitHub - 1 maintainer
pysparkproxy 0.0.17
Seamlessly execute pyspark code on remote clusters
9 versions - Latest release: over 5 years ago - 1 dependent repository - 43 downloads last month - 4 stars on GitHub - 1 maintainer
taospyudf 0.0.11
taos python udf
9 versions - Latest release: about 1 year ago - 926 downloads last month - 22,774 stars on GitHub - 1 maintainer
robotframework-dynamodbsqllibrary 0.3.1
An Amazon AWS DynamoDB big data testing library for Robot Framework with SQL-like DSL
9 versions - Latest release: about 1 year ago - 2 dependent repositories - 1.01 thousand downloads last month - 4 stars on GitHub - 1 maintainer
fdict 0.8.1
Easy out-of-core computing of recursive dict
9 versions - Latest release: almost 7 years ago - 1 dependent package - 1 dependent repository - 14.5 thousand downloads last month - 7 stars on GitHub - 1 maintainer
Top 9.3% on pypi.org
anovos 1.1.0
An Open Source tool for Feature Engineering in Machine Learning
8 versions - Latest release: over 1 year ago - 2 dependent repositories - 1.56 thousand downloads last month - 77 stars on GitHub - 1 maintainer
datacatalog-fileset-enricher 1.2.0
A package for enriching the content of a fileset Entry with Datacatalog Tags
8 versions - Latest release: about 4 years ago - 1 dependent repository - 216 downloads last month - 4 stars on GitHub - 1 maintainer
pytispark 1.0.1
TiSpark support for python
8 versions - Latest release: almost 6 years ago - 1 dependent repository - 79 downloads last month - 878 stars on GitHub - 3 maintainers
alphareader 0.0.7
A reader for large files with custom delimiters and encodings
7 versions - Latest release: about 4 years ago - 1 dependent repository - 27 downloads last month - 5 stars on GitHub - 1 maintainer
Top 3.7% on pypi.org
pybloom-live 4.0.0
Bloom filter: A Probabilistic data structure
7 versions - Latest release: over 1 year ago - 5 dependent packages - 65 dependent repositories - 1.37 million downloads last month - 157 stars on GitHub - 1 maintainer
Top 9.6% on pypi.org
vaex-ui 0.3.0
Graphical user interface for vaex based on Qt
7 versions - Latest release: over 4 years ago - 3 dependent repositories - 39 downloads last month - 8,171 stars on GitHub - 1 maintainer
pybda 0.1.0
Analysis of big biological data sets for distributed HPC clusters.
6 versions - Latest release: almost 5 years ago - 1 dependent repository - 54 downloads last month - 9 stars on GitHub - 1 maintainer
datacatalog-fileset-processor 0.1.5
A package to manage Google Cloud Data Catalog Fileset scripts
6 versions - Latest release: over 3 years ago - 1 dependent repository - 160 downloads last month - 3 stars on GitHub - 1 maintainer
lemuras 1.2.3
A small Python library to deal with big tables
6 versions - Latest release: about 4 years ago - 1 dependent repository - 68 downloads last month - 3 stars on GitHub - 1 maintainer
objetive 0.6
A mini-crawler that grabs text fragments from a website or IP address that responds to HTTP(S)
6 versions - Latest release: over 4 years ago - 3 dependent repositories - 177 downloads last month - 0 stars on GitHub - 1 maintainer
Top 2.7% on pypi.org
uproot3 3.14.4
ROOT I/O in pure Python and NumPy.
5 versions - Latest release: over 3 years ago - 15 dependent packages - 30 dependent repositories - 14.5 thousand downloads last month - 314 stars on GitHub - 1 maintainer
hivehoney 1.0.4
Client-less data retrieval from Hive.
5 versions - Latest release: over 5 years ago - 1 dependent repository - 18 downloads last month - 3 stars on GitHub - 1 maintainer
elasticsearch-partition 2.0.0
A Python library for creating Elasticsearch partitioned indexes by date range
5 versions - Latest release: about 5 years ago - 1 dependent repository - 68 downloads last month - 4 stars on GitHub - 1 maintainer
zqy 1.0.5
belongs to zqy.
4 versions - Latest release: over 6 years ago - 1 dependent repository - 45 downloads last month - 1 maintainer
datacatalog-custom-entries-manager 0.1.2
A package to manage Google Cloud Data Catalog custom entries
4 versions - Latest release: over 3 years ago - 1 dependent repository - 57 downloads last month - 2 stars on GitHub - 1 maintainer
bigdatacloudapi-client 1.0.3
A Python client for BigDataCloud API connectivity (https://www.bigdatacloud.com)
4 versions - Latest release: 8 months ago - 285 downloads last month - 10 stars on GitHub - 1 maintainer
bart-simulator 0.2.1
Send event views to Google Analytics and Generator customers or products
4 versions - Latest release: over 3 years ago - 1 dependent repository - 47 downloads last month - 0 stars on GitHub - 1 maintainer
elixirnote 4.0.0a30
go from data to knowledge
4 versions - Latest release: over 1 year ago - 19 downloads last month - 10 stars on GitHub - 1 maintainer
vector-lake 0.0.4
S3 vector database for bigdata
4 versions - Latest release: 9 months ago - 30 downloads last month - 20 stars on GitHub - 1 maintainer
Top 6.2% on pypi.org
vaex-graphql 0.2.0
GraphQL support for accessing vaex DataFrame
3 versions - Latest release: about 3 years ago - 9 dependent repositories - 344 downloads last month - 8,171 stars on GitHub - 1 maintainer
bigdatasml 0.1.3
This package calculates average student performances
3 versions - Latest release: over 2 years ago - 40 downloads last month - 1 maintainer
pscli 1.1.1
Unofficial Cisco Parstream Client with improved cli
3 versions - Latest release: over 3 years ago - 1 dependent repository - 17 downloads last month - 2 stars on GitHub - 1 maintainer
ckitoolz 0.1.1
Cython implementation of Kitoolz: High performance functional utilities
3 versions - Latest release: almost 3 years ago - 1 dependent repository - 46 downloads last month - 1 maintainer
pyspark-spy 1.0.2
Collect and aggregate on spark events for profitz. In 🐍 way!
3 versions - Latest release: about 3 years ago - 1 dependent repository - 218 downloads last month - 10 stars on GitHub - 1 maintainer
cybershuttle-sdk 0.0.1a4
Python SDK for Apache Airavata Cybershuttle
3 versions - Latest release: 8 months ago - 47 downloads last month - 100 stars on GitHub - 1 maintainer
Top 10.0% on pypi.org
timehash 1.2
Module to encode and decode timestamps to/from TimeHashes
3 versions - Latest release: over 2 years ago - 2 dependent repositories - 486 downloads last month - 37 stars on GitHub - 1 maintainer
vaex-contrib 0.1.3
Community contributed modules to vaex
3 versions - Latest release: over 1 year ago - 1 dependent repository - 44 downloads last month - 8,171 stars on GitHub - 1 maintainer
vaex-distributed 0.3.0
Distributed dataset for vaex
3 versions - Latest release: about 5 years ago - 1 dependent repository - 18 downloads last month - 8,171 stars on GitHub - 1 maintainer
spark-celery 0.1.1
A helper to allow Python Celery tasks to do work in a Spark job
3 versions - Latest release: over 6 years ago - 1 dependent repository - 360 downloads last month - 27 stars on GitHub - 1 maintainer
panel-vegafusion 0.0.3
Build interactive big data apps with Altair and Vega easily using Panel + VegaFusion.
3 versions - Latest release: over 2 years ago - 1 dependent repository - 24 downloads last month - 14 stars on GitHub - 1 maintainer
datacatalog-custom-model-manager 0.1.1
A package to load user-specified metadata models into Google Cloud Data Catalog
2 versions - Latest release: over 3 years ago - 1 dependent repository - 39 downloads last month - 6 stars on GitHub - 1 maintainer
dtshare 1.0.3
Open financial data
2 versions - Latest release: about 4 years ago - 1 dependent repository - 28 downloads last month - 514 stars on GitHub - 1 maintainer
risk-command-center 1.0.37
Risk Command Center: manage your risk easily.
2 versions - Latest release: about 2 years ago - 1 dependent repository - 10 downloads last month - 11 stars on GitHub - 1 maintainer
hdx-cli 1.0rc3 removed
Hydrolix command line utility to do CRUD operations on projects, tables, transforms and other res...
2 versions - Latest release: over 1 year ago - 17 downloads last month
bart-extract-ga 0.1.1
Extract event views to Google Analytics
2 versions - Latest release: over 3 years ago - 1 dependent repository - 7 downloads last month - 0 stars on GitHub - 1 maintainer
pycebes 0.10.2
Python client for Cebes HTTP server.
2 versions - Latest release: over 6 years ago - 1 dependent repository - 13 downloads last month - 2 stars on GitHub - 1 maintainer
iterdict 0.2.0
Dict that lazily populates itself with items from the iterator it was constructed with as keys ar...
2 versions - Latest release: about 12 years ago - 1 dependent repository - 11 downloads last month - 2 stars on GitHub - 1 maintainer
mister 0.0.2
Approachable map/reduce jobs
2 versions - Latest release: over 5 years ago - 1 dependent repository - 23 downloads last month - 0 stars on GitHub - 1 maintainer
dlopes7-avro 1.12.0
Avro is a serialization and RPC framework.
1 version - Latest release: 7 months ago - 39 downloads last month - 2,755 stars on GitHub - 1 maintainer
Top 5.0% on pypi.org
bigartm 0.9.2
BigARTM: the state-of-the-art platform for topic modeling
1 version - Latest release: over 4 years ago - 1 dependent package - 11 dependent repositories - 438 downloads last month - 661 stars on GitHub - 2 maintainers
akudu 0.0.1
Asyncio Client for Apache Kudu Database Engine (Storage System).
1 version - Latest release: about 1 year ago - 19 downloads last month - 1 star on GitHub - 1 maintainer
airconditioner 0.9.2
YAML-based DAG configurator for Airflow
1 version - Latest release: almost 5 years ago - 1 dependent repository - 19 downloads last month - 0 stars on GitHub - 1 maintainer