An open API service providing package, version and dependency metadata of many open source software ecosystems and registries.

pypi.org "data-processing" keyword

View the packages on the pypi.org package registry that are tagged with the "data-processing" keyword.

snow-workspace-extractor 0.1.1
Snowflake Workspace Extractor is a Python package that provides a simple way to extract and analyz...
2 versions - Latest release: 3 months ago - 33 downloads last month - 3 stars on GitHub - 1 maintainer
openforis-whisp 0.0.1
Whisp (What is in that plot) is an open-source solution which helps to produce relevant forest mo...
17 versions - Latest release: 7 months ago - 646 downloads last month - 30 stars on GitHub - 1 maintainer
xl2times 0.3.0
An open source tool to convert TIMES models specified in Excel to a format ready for processing b...
6 versions - Latest release: 4 months ago - 45 downloads last month - 18 stars on GitHub - 2 maintainers
niamoto 0.7.3
Niamoto is a command-line application and library focused on processing and publishing botanical ...
29 versions - Latest release: about 2 months ago - 207 downloads last month - 3 stars on GitHub - 1 maintainer
cratedb-toolkit 0.0.41
CrateDB Toolkit
39 versions - Latest release: about 1 month ago - 3 dependent packages - 1 dependent repositories - 4.27 thousand downloads last month - 11 stars on GitHub - 5 maintainers
Top 1.4% on pypi.org
pandera 0.26.1 💰
A light-weight and flexible data validation and testing tool for statistical data objects.
110 versions - Latest release: about 1 month ago - 97 dependent packages - 229 dependent repositories - 6.79 million downloads last month - 3,009 stars on GitHub - 3 maintainers
nemo-curator 1.0.0
Scalable Data Preprocessing Tool for Training Large Language Models
19 versions - Latest release: about 19 hours ago - 1.9 thousand downloads last month - 1,130 stars on GitHub - 5 maintainers
pybcsv 1.0.3
High-performance Python bindings for the BCSV (Binary CSV) library with pandas integration
2 versions - Latest release: about 19 hours ago - 98 downloads last month - 1 maintainer
pathway 1.3.1
Pathway is a data processing framework which takes care of streaming data updates for you.
85 versions - Latest release: over 15 years ago - 1 dependent package - 1 dependent repositories - 11.5 thousand downloads last month - 43,563 stars on GitHub - 4 maintainers
Top 9.6% on pypi.org
vip-hci 1.6.6
Package for astronomical high-contrast image processing.
49 versions - Latest release: 5 months ago - 1 dependent package - 2 dependent repositories - 3.33 thousand downloads last month - 76 stars on GitHub - 2 maintainers
Top 6.6% on pypi.org
libertem 0.15.2
Open pixelated STEM framework
42 versions - Latest release: 3 months ago - 2 dependent packages - 3 dependent repositories - 4.91 thousand downloads last month - 120 stars on GitHub - 3 maintainers
nvidia-dali-weekly-cuda130 1.52.0.dev20250928
NVIDIA DALI weekly for CUDA 13.0. Git SHA: 74f92e03f3082c286ab41fe6fc1500c2895fef0f
2 versions - Latest release: 3 days ago - 165 downloads last month - 5,512 stars on GitHub - 1 maintainer
nvidia-dali-weekly-cuda120 1.52.0.dev20250720
NVIDIA DALI weekly for CUDA 12.0. Git SHA: 67f2c79cbb2488d43757d94e30369464f2a516eb
37 versions - Latest release: 2 months ago - 163 downloads last month - 5,512 stars on GitHub - 1 maintainer
Top 6.1% on pypi.org
nvidia-dali-tf-plugin-cuda110 1.50.0
NVIDIA DALI TensorFlow plugin for CUDA 11.0. Git SHA: d5c7b54f776fcba58944048f984f5645e7d7d1bb
37 versions - Latest release: 4 months ago - 6 dependent repositories - 304 downloads last month - 5,512 stars on GitHub - 1 maintainer
nvidia-dali-cuda130 1.51.2
NVIDIA DALI for CUDA 13.0. Git SHA: 81b43417a7c0321aa9dc27e197410a16183c00d6
2 versions - Latest release: about 2 months ago - 85 downloads last month - 5,512 stars on GitHub - 1 maintainer
Top 4.0% on pypi.org
nvidia-dali-cuda120 1.51.2
NVIDIA DALI for CUDA 12.0. Git SHA: 81b43417a7c0321aa9dc27e197410a16183c00d6
32 versions - Latest release: about 2 months ago - 1 dependent package - 11 dependent repositories - 35.1 thousand downloads last month - 5,512 stars on GitHub - 1 maintainer
nvidia-dali-tf-plugin-weekly-cuda120 1.42.0.dev20240915
NVIDIA DALI weekly TensorFlow plugin for CUDA 12.0. Git SHA: 408c18bb0d8a7c1b300e02fd7f6bb58369f...
27 versions - Latest release: about 1 year ago - 99 downloads last month - 5,512 stars on GitHub - 1 maintainer
nvidia-dali-tf-plugin-cuda120 1.51.2
NVIDIA DALI TensorFlow plugin for CUDA 12.0. Git SHA: 81b43417a7c0321aa9dc27e197410a16183c00d6
31 versions - Latest release: about 2 months ago - 5.85 thousand downloads last month - 5,512 stars on GitHub - 1 maintainer
nvidia-dali-tf-plugin-cuda130 1.51.2
NVIDIA DALI TensorFlow plugin for CUDA 13.0. Git SHA: 81b43417a7c0321aa9dc27e197410a16183c00d6
2 versions - Latest release: about 2 months ago - 21 downloads last month - 5,512 stars on GitHub - 1 maintainer
Top 2.1% on pypi.org
nvidia-dali-cuda110 1.50.0
NVIDIA DALI for CUDA 11.0. Git SHA: d5c7b54f776fcba58944048f984f5645e7d7d1bb
37 versions - Latest release: 4 months ago - 5 dependent packages - 95 dependent repositories - 5.38 thousand downloads last month - 5,512 stars on GitHub - 1 maintainer
nvidia-dali-nightly-cuda130 1.52.0.dev20250930
NVIDIA DALI nightly for CUDA 13.0. Git SHA: 74f92e03f3082c286ab41fe6fc1500c2895fef0f
11 versions - Latest release: 1 day ago - 601 downloads last month - 5,512 stars on GitHub - 1 maintainer
nvidia-dali-tf-plugin-nightly-cuda110 1.43.0.dev20240919
NVIDIA DALI nightly TensorFlow plugin for CUDA 11.0. Git SHA: 94f02ad69abe149f345684ef2aba3e13d2...
152 versions - Latest release: about 1 year ago - 503 downloads last month - 4,992 stars on GitHub - 1 maintainer
nvidia-dali-tf-plugin-nightly-cuda120 1.43.0.dev20240919
NVIDIA DALI nightly TensorFlow plugin for CUDA 12.0. Git SHA: 94f02ad69abe149f345684ef2aba3e13d2...
170 versions - Latest release: about 1 year ago - 610 downloads last month - 4,992 stars on GitHub - 1 maintainer
nvidia-dali-nightly-cuda110 1.52.0.dev20250626
NVIDIA DALI nightly for CUDA 11.0. Git SHA: 2be08c56f2be9ec8055256256039eb534ab7a080
241 versions - Latest release: 3 months ago - 1 dependent repositories - 800 downloads last month - 4,992 stars on GitHub - 1 maintainer
nvidia-dali-nightly-cuda120 1.52.0.dev20250930
NVIDIA DALI nightly for CUDA 12.0. Git SHA: 74f92e03f3082c286ab41fe6fc1500c2895fef0f
269 versions - Latest release: 1 day ago - 1.52 thousand downloads last month - 4,992 stars on GitHub - 1 maintainer
stardust-sdk 0.1.0
Stardust SDK for AI/ML data processing and annotation workflows
1 version - Latest release: 1 day ago - 1 maintainer
pydpm-xl 0.1.14
Python library for DPM-XL data processing and analysis
15 versions - Latest release: 2 days ago - 866 downloads last month - 1 maintainer
cocoindex 0.2.16
With CocoIndex, users declare the transformation, CocoIndex creates & maintains an index, and kee...
111 versions - Latest release: 2 days ago - 12.5 thousand downloads last month - 2,864 stars on GitHub - 1 maintainer
Top 7.7% on pypi.org
heat 1.6.0
A framework for high-performance data analytics and machine learning.
24 versions - Latest release: 29 days ago - 8 dependent repositories - 1.32 thousand downloads last month - 223 stars on GitHub - 5 maintainers
dolma 1.2.1
Toolkit for pre-processing LLM training data.
41 versions - Latest release: 3 months ago - 6.01 thousand downloads last month - 1,314 stars on GitHub - 4 maintainers
dolma-rust-components 1.3.0
Rust components for Dolma - Toolkit for pre-processing LLM training data.
2 versions - Latest release: 13 days ago - 59 downloads last month - 1,314 stars on GitHub - 2 maintainers
qufe 0.5.11
A comprehensive Python utility library for data processing, file handling, database management, a...
17 versions - Latest release: 3 days ago - 1.84 thousand downloads last month - 0 stars on GitHub - 1 maintainer
ooflow 0.1.2
A lightweight Python framework for building asynchronous data processing pipelines with stateful ...
2 versions - Latest release: 11 days ago - 235 downloads last month - 1 stars on GitHub - 1 maintainer
fleetfluid 0.1.4
AI Agent Functions for ETL/Data Processing
4 versions - Latest release: 4 days ago - 41 downloads last month - 0 stars on GitHub - 1 maintainer
electiongraphs 0.3.4
Create graphs displaying the result of an election based on a CSV input file.
4 versions - Latest release: almost 2 years ago - 70 downloads last month - 0 stars on GitHub - 1 maintainer
minispark 0.1.10
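A lightweight Python library for reading data from multiple data sources and processing it efficiently on the local machine, similar in functionality to Apache Spark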
1 version - Latest release: about 1 month ago - 14 downloads last month - 0 stars on GitHub - 1 maintainer
hstreamdb-api 0.6.1
HStreamDB api for Python
11 versions - Latest release: about 2 years ago - 2 dependent repositories - 25 downloads last month - 727 stars on GitHub - 1 maintainer
pipevine 0.1.1
A high-performance async pipeline processing library for Python
2 versions - Latest release: 5 days ago - 119 downloads last month - 1 stars on GitHub - 1 maintainer
norve 0.1.1
Multi-Source Data Processing Language
2 versions - Latest release: 7 days ago - 479 downloads last month - 0 stars on GitHub - 1 maintainer
norvelang 0.1.1
Multi-Source Data Processing Language
2 versions - Latest release: 7 days ago - 401 downloads last month - 0 stars on GitHub - 1 maintainer
python-pyper 0.4.4
Concurrent Python made simple
11 versions - Latest release: 9 months ago - 709 downloads last month - 1,503 stars on GitHub - 1 maintainer
abracudabra 0.1.3
Convert Tensors, Arrays and DataFrames Between CPU and CUDA
4 versions - Latest release: 7 months ago - 90 downloads last month - 1 stars on gitlab.cern.ch - 1 maintainer
glassflow 3.2.0
GlassFlow Python SDK: Create GlassFlow pipelines between Kafka and ClickHouse
28 versions - Latest release: 6 days ago - 558 downloads last month - 9 stars on GitHub - 1 maintainer
splurge-data-profiler 2025.2.0
A data profiling tool for delimited and database sources.
4 versions - Latest release: 24 days ago - 127 downloads last month - 1 stars on GitHub - 1 maintainer
mcp-apache-spark-history-server 0.1.4
Model Context Protocol (MCP) server for Apache Spark History Server with job comparison and analy...
6 versions - Latest release: 14 days ago - 386 downloads last month - 81 stars on GitHub - 3 maintainers
pydatamax 0.2.0
Advanced Data Crawling and Processing Framework
20 versions - Latest release: 29 days ago - 310 downloads last month - 140 stars on GitHub - 1 maintainer
Top 7.9% on pypi.org
texar 0.2.4
Toolkit for Machine Learning and Text Generation
5 versions - Latest release: almost 6 years ago - 9 dependent repositories - 124 downloads last month - 2,389 stars on GitHub - 2 maintainers
paged-list 0.1.3
A disk-backed list implementation for handling large datasets efficiently
3 versions - Latest release: about 1 month ago - 368 downloads last month - 0 stars on GitHub - 1 maintainer
pytemporal 1.3.15
High-performance bitemporal data processing for Python
18 versions - Latest release: 25 days ago - 1.4 thousand downloads last month - 1 stars on GitHub - 1 maintainer
smallpond 0.15.0
A lightweight data processing framework built on DuckDB and shared file system.
3 versions - Latest release: 7 months ago - 117 downloads last month - 4,784 stars on GitHub - 1 maintainer
cala 0.1.0
Cala is a neural endoscope image processing tool designed for neuroscience research, with a focus...
4 versions - Latest release: 12 months ago - 18 downloads last month - 3 stars on GitHub - 1 maintainer
open-dataflow-adp 1.1.9
Modern Data Centric AI system for Large Language Models
20 versions - Latest release: 7 days ago - 500 downloads last month - 1,329 stars on GitHub - 1 maintainer
dagster-kafka 1.3.1
Enterprise-grade Kafka integration for Dagster with Confluent Connect, comprehensive serializatio...
9 versions - Latest release: about 1 month ago - 150 downloads last month - 9 stars on GitHub - 1 maintainer
Top 8.4% on pypi.org
forte 0.2.0
Forte is an extensible framework for building composable and modularized NLP workflows.
13 versions - Latest release: over 3 years ago - 6 dependent repositories - 550 downloads last month - 248 stars on GitHub - 1 maintainer
nvidia-nvimgcodec-cu12 0.6.0.32
NVIDIA nvimgcodec for CUDA 12. Git SHA:
6 versions - Latest release: about 2 months ago - 3 dependent packages - 62.7 thousand downloads last month - 119 stars on GitHub - 1 maintainer
nvidia-nvimgcodec-cu11 0.6.0.32
NVIDIA nvimgcodec for CUDA 11. Git SHA:
6 versions - Latest release: about 2 months ago - 2 dependent packages - 2.74 thousand downloads last month - 119 stars on GitHub - 1 maintainer
Top 8.3% on pypi.org
texar-pytorch 0.1.4
Toolkit for Machine Learning and Text Generation
5 versions - Latest release: over 3 years ago - 1 dependent package - 14 dependent repositories - 1.81 thousand downloads last month - 748 stars on GitHub - 1 maintainer
tasrif 0.1.0
Tasrif is a Python library for processing wearable data from fitness trackers and wearable hea...
7 versions - Latest release: over 3 years ago - 1 dependent repositories - 29 downloads last month - 15 stars on GitHub - 1 maintainer
postql 1.0.3
Python wrapper for Postgres
3 versions - Latest release: over 1 year ago - 14 downloads last month - 0 stars on GitHub - 1 maintainer
sofine 0.2.4
Lightweight framework for creating data-collection plugins and chaining together calls to them, f...
10 versions - Latest release: almost 11 years ago - 2 dependent repositories - 19 downloads last month - 7 stars on GitHub - 1 maintainer
splurge-typer 2025.0.1
Type Inference and Conversion Library for Python
2 versions - Latest release: 28 days ago - 308 downloads last month - 0 stars on GitHub - 1 maintainer
airow 0.1.0
AI-powered DataFrame processing made simple
3 versions - Latest release: 19 days ago - 425 downloads last month - 3 stars on GitHub - 1 maintainer
roll-rate-analysis 0.1.7
Roll Rate Analysis python package. Both month over month and snapshot roll rate functionalities a...
6 versions - Latest release: over 1 year ago - 9 downloads last month - 3 stars on GitHub - 1 maintainer
lineagentic-flow 1.0.2
Lineagentic-flow is an agentic AI approach for building data lineage across diverse data processing ...
4 versions - Latest release: about 1 month ago - 3 stars on GitHub - 1 maintainer
snowpark-checkpoints 0.4.0
Snowflake Snowpark Checkpoints
16 versions - Latest release: 3 months ago - 71 downloads last month - 5 stars on GitHub - 1 maintainer
oxidize 0.7.0
High-performance data processing tools for Python, built with Rust
8 versions - Latest release: 25 days ago - 316 downloads last month - 0 stars on GitHub - 1 maintainer
irdap 1.3.5
IRDAP is a highly-automated end-to-end pipeline to reduce SPHERE-IRDIS polarimetric data
17 versions - Latest release: over 2 years ago - 1 dependent repositories - 68 downloads last month - 6 stars on GitHub - 1 maintainer
tikara 0.1.6
The metadata and text content extractor for almost every file type.
6 versions - Latest release: 8 months ago - 55 downloads last month - 4 stars on GitHub - 1 maintainer
abpytools 0.3.2
Python package for antibody analysis
11 versions - Latest release: almost 7 years ago - 1 dependent repositories - 262 downloads last month - 25 stars on GitHub - 1 maintainer
laygo 0.1.2
A lightweight Python library for building resilient, in-memory data pipelines with elegant, chain...
3 versions - Latest release: 3 months ago - 32 downloads last month - 3 stars on GitHub - 1 maintainer
msdlib 1.1.13
msdlib is meant to make life easier for the typical data scientist/data analyst/ML engineer.
47 versions - Latest release: over 1 year ago - 1 dependent repositories - 241 downloads last month - 14 stars on GitHub - 1 maintainer
vaspy 0.8.12
A pure Python library designed to make it easy and quick to manipulate VASP files
19 versions - Latest release: over 6 years ago - 1 dependent repositories - 79 downloads last month - 286 stars on GitHub - 1 maintainer
dft-pipeline 0.3.24
Data Flow Tools - flexible ETL pipeline framework
36 versions - Latest release: 3 months ago - 231 downloads last month - 1 maintainer
kobo2pandas 0.9.0
From the Kobo API to pandas.DataFrame
1 version - Latest release: 4 months ago - 12 downloads last month - 0 stars on GitHub - 1 maintainer
align-utils 1.0.0
Utilities for parsing and processing align-system experiment data
1 version - Latest release: 20 days ago - 159 downloads last month - 0 stars on GitHub - 1 maintainer
datamaster-mcp 1.1.0
DataMaster MCP - AI-powered data analysis tool with MCP protocol support
10 versions - Latest release: 20 days ago - 856 downloads last month - 9 stars on GitHub - 1 maintainer
light-pipe 0.3.1
A high-level syntax for data pipelines, designed to make pipeline development quick and painless.
5 versions - Latest release: over 2 years ago - 40 downloads last month - 3 stars on GitHub - 1 maintainer
csv-cdc 0.1.3
A high-performance CSV Change Data Capture tool
4 versions - Latest release: 4 months ago - 34 downloads last month - 0 stars on GitHub - 1 maintainer
streamz-zmq 0.1.5
ZeroMQ integration for streamz - high-performance streaming data processing
6 versions - Latest release: 30 days ago - 224 downloads last month - 0 stars on GitHub - 1 maintainer
mathbox 0.0.8
A math toolbox.
6 versions - Latest release: about 3 years ago - 1 dependent repositories - 4 downloads last month - 5 stars on GitHub - 1 maintainer
spectroview 0.8.6
SPECTROView: A Tool for Spectroscopic Data Processing and Visualization.
33 versions - Latest release: 13 days ago - 853 downloads last month - 1 stars on GitHub - 1 maintainer
datasetops 0.0.6
Fluent dataset operations, compatible with your favorite libraries
4 versions - Latest release: over 5 years ago - 4 dependent repositories - 146 downloads last month - 11 stars on GitHub - 1 maintainer
buildify-api 2.0.0
Buildify API is a Python library for real estate data processing.
4 versions - Latest release: 8 months ago - 18 downloads last month - 0 stars on GitHub - 1 maintainer
relais 0.2.1
A practical tool for managing async pipelines.
4 versions - Latest release: 28 days ago - 37 downloads last month - 2 maintainers
flagged-csv 0.1.5
Convert XLSX files to CSV with visual formatting preserved as inline flags
6 versions - Latest release: 14 days ago - 264 downloads last month - 0 stars on GitHub - 1 maintainer
Top 5.2% on pypi.org
bonobo 0.6.4
Bonobo, a simple, modern and atomic extract-transform-load toolkit for Python 3.5+.
37 versions - Latest release: over 6 years ago - 35 dependent repositories - 22.8 thousand downloads last month - 1,592 stars on GitHub - 2 maintainers
sheetwise 2.2.0 💰
A Python package for encoding spreadsheets for Large Language Models, implementing the Spreadshee...
5 versions - Latest release: about 1 month ago - 163 downloads last month - 25 stars on GitHub - 1 maintainer
smartpipeline 0.7.3
A framework for fast developing scalable data pipelines following a simple design pattern
11 versions - Latest release: almost 2 years ago - 1 dependent repositories - 57 downloads last month - 26 stars on GitHub - 1 maintainer
csv-field-extractor 1.0.1
A simple utility to extract specific fields from CSV files
2 versions - Latest release: 4 months ago - 19 downloads last month - 0 stars on GitHub - 1 maintainer
nvidia-nvimgcodec-cu13 0.6.0.32
NVIDIA nvimgcodec for CUDA 13. Git SHA:
2 versions - Latest release: about 2 months ago - 2.31 thousand downloads last month - 117 stars on GitHub - 1 maintainer
aws-s3-controller 0.7.5
A collection of natural language-like utility functions to intuitively and easily control AWS's c...
18 versions - Latest release: 7 months ago - 191 downloads last month - 0 stars on GitHub - 1 maintainer
reki 2025.7.2
A data preparation tool for CEMC/CMA.
6 versions - Latest release: about 1 month ago - 1 dependent repositories - 94 downloads last month - 18 stars on GitHub - 1 maintainer
lumpur 0.0.6
learn to use methods for processing unclear response
6 versions - Latest release: 10 months ago - 32 downloads last month - 0 stars on GitHub - 1 maintainer
Top 5.8% on pypi.org
raydp-nightly 2025.7.14.dev0
RayDP: Distributed Data Processing on Ray
143 versions - Latest release: 3 months ago - 90 dependent repositories - 543 downloads last month - 346 stars on GitHub - 1 maintainer
tometo-tomato 0.0.1
A fuzzy join utility using DuckDB
1 version - Latest release: about 1 month ago - 121 downloads last month - 0 stars on GitHub - 1 maintainer
static-fire-toolkit 1.0.1
Command-line toolkit for static-fire test data processing and analysis
2 versions - Latest release: 13 days ago - 0 stars on GitHub - 1 maintainer
exonware-xnode 0.0.1
Node-based data processing and graph computation library
1 version - Latest release: 30 days ago - 1 maintainer
Top 4.0% on pypi.org
lithops 3.6.2
Lithops lets you transparently run your Python applications in the Cloud
56 versions - Latest release: 19 days ago - 2 dependent packages - 8 dependent repositories - 4.46 thousand downloads last month - 347 stars on GitHub - 2 maintainers
Top 6.4% on pypi.org
bonobo-sqlalchemy 0.6.1
Bonobo SQLAlchemy Extension
14 versions - Latest release: about 7 years ago - 2 dependent packages - 5 dependent repositories - 207 downloads last month - 25 stars on GitHub - 2 maintainers
Top 8.1% on pypi.org
bonobo-docker 0.6.0
Docker extension for Bonobo
18 versions - Latest release: over 7 years ago - 2 dependent packages - 3 dependent repositories - 223 downloads last month - 13 stars on GitHub - 2 maintainers