An open API service providing package, version, and dependency metadata for many open source software ecosystems and registries.

pypi.org "data-processing" keyword

stagecraft 0.1.9
A Python library for building robust ETL pipelines with declarative stages and data flow management
10 versions - Latest release: 16 days ago - 428 downloads last month - 0 stars on GitHub - 1 maintainer
pylib-searchalgo 0.1.0
Search & sort algorithms with performance metrics. Essential for AI and ML applications. Perfect ...
1 version - Latest release: 4 months ago - 13 downloads last month - 1 maintainer
cryoflow 0.2.2
Plug-in-driven column-oriented data processing CLI tool with Polars LazyFrame at its core.
2 versions - Latest release: about 11 hours ago - 1 maintainer
Top 8.3% on pypi.org
texar-pytorch 0.1.4
Toolkit for Machine Learning and Text Generation
5 versions - Latest release: almost 4 years ago - 1 dependent package - 14 dependent repositories - 135 downloads last month - 747 stars on GitHub - 1 maintainer
nvidia-nvimgcodec-cu12 0.7.0.11
NVIDIA nvimgcodec for CUDA 12.
8 versions - Latest release: 3 months ago - 3 dependent packages - 76.2 thousand downloads last month - 140 stars on GitHub - 1 maintainer
nvidia-nvimgcodec-cu13 0.7.0.11
NVIDIA nvimgcodec for CUDA 13.
4 versions - Latest release: 3 months ago - 6.94 thousand downloads last month - 140 stars on GitHub - 1 maintainer
nvidia-nvimgcodec-tegra-cu13 0.6.0.32
NVIDIA nvimgcodec tegra for CUDA 13. Git SHA:
2 versions - Latest release: 7 months ago - 18 downloads last month - 140 stars on GitHub - 1 maintainer
nvidia-nvimgcodec-tegra-cu12 0.7.0.11
NVIDIA nvimgcodec tegra for CUDA 12.
7 versions - Latest release: 3 months ago - 118 downloads last month - 140 stars on GitHub - 1 maintainer
nvidia-nvimgcodec-cu11 0.6.1.37
NVIDIA nvimgcodec for CUDA 11. Git SHA:
7 versions - Latest release: 4 months ago - 2 dependent packages - 11.2 thousand downloads last month - 140 stars on GitHub - 1 maintainer
cala 0.1.0
Cala is a neural endoscope image processing tool designed for neuroscience research, with a focus...
4 versions - Latest release: over 1 year ago - 42 downloads last month - 3 stars on GitHub - 1 maintainer
dalla-data-processing 0.0.11
data processing pipeline with deduplication, stemming, quality checking, and readability scoring,...
6 versions - Latest release: 2 months ago - 92 downloads last month - 1 maintainer
pylib-tzconvert 0.1.0
Time-zone conversions and aware datetimes. Timezone utilities.
1 version - Latest release: 4 months ago - 13 downloads last month - 1 maintainer
Top 1.4% on pypi.org
pandera 0.29.0 💰
A light-weight and flexible data validation and testing tool for statistical data objects.
116 versions - Latest release: about 1 month ago - 97 dependent packages - 229 dependent repositories - 8.63 million downloads last month - 3,009 stars on GitHub - 3 maintainers
cocoindex 999.0.0
With CocoIndex, users declare the transformation, CocoIndex creates & maintains an index, and kee...
175 versions - Latest release: 3 months ago - 48.7 thousand downloads last month - 6,148 stars on GitHub - 1 maintainer
pylib-aibox 0.1.0
Prompt templates & LLM orchestration helpers. Essential for AI agents and LLMs. Perfect for AI ag...
1 version - Latest release: 4 months ago - 10 downloads last month - 1 maintainer
splurge-data-profiler 2025.2.0
A data profiling tool for delimited and database sources.
4 versions - Latest release: 6 months ago - 23 downloads last month - 1 star on GitHub - 1 maintainer
redditdumps 0.1.0
Lightweight utilities for processing Reddit data dumps in ZST format
1 version - Latest release: about 1 month ago - 24 downloads last month - 1 maintainer
ehrax 2025.9.22
EHRs io+processing for JAX+equinox compatible tasks, e.g. mainly ML.
2 versions - Latest release: 5 months ago - 50 downloads last month - 0 stars on GitHub - 1 maintainer
auralith-data-pipeline 0.1.7
Production-grade data collection and processing pipeline for training LLMs and multimodal AI
4 versions - Latest release: 1 day ago - 1 maintainer
openforis-whisp 0.0.1
Whisp (What is in that plot) is an open-source solution which helps to produce relevant forest mo...
31 versions - Latest release: 12 months ago - 686 downloads last month - 30 stars on GitHub - 1 maintainer
Top 6.4% on pypi.org
bonobo-sqlalchemy 0.6.1
Bonobo SQLAlchemy Extension
14 versions - Latest release: over 7 years ago - 2 dependent packages - 5 dependent repositories - 518 downloads last month - 25 stars on GitHub - 2 maintainers
hstreamdb-api 0.6.1
HStreamDB api for Python
11 versions - Latest release: over 2 years ago - 2 dependent repositories - 41 downloads last month - 727 stars on GitHub - 1 maintainer
snowflake-data-exchange-agent 1.3.2
Data exchange agent for migrations and validation
15 versions - Latest release: 12 days ago - 332 downloads last month - 1 maintainer
table-toolkit 2025.11.9
A Python library for consistent preprocessing of tabular data with automatic type inference, cach...
9 versions - Latest release: 4 months ago - 113 downloads last month - 0 stars on GitHub - 1 maintainer
nvidia-dali-nightly-cuda110 1.52.0.dev20250626
NVIDIA DALI nightly for CUDA 11.0. Git SHA: 2be08c56f2be9ec8055256256039eb534ab7a080
241 versions - Latest release: 8 months ago - 1 dependent repository - 269 downloads last month - 4,992 stars on GitHub - 1 maintainer
Top 2.1% on pypi.org
nvidia-dali-cuda110 1.50.0
NVIDIA DALI for CUDA 11.0. Git SHA: d5c7b54f776fcba58944048f984f5645e7d7d1bb
37 versions - Latest release: 9 months ago - 5 dependent packages - 95 dependent repositories - 2.36 thousand downloads last month - 5,551 stars on GitHub - 1 maintainer
Top 4.0% on pypi.org
nvidia-dali-cuda120 1.53.0
NVIDIA DALI for CUDA 12.0. Git SHA: 55113c6cd54624aeebd7e3c0d93b4c4a68a34034
34 versions - Latest release: about 2 months ago - 1 dependent package - 11 dependent repositories - 60.3 thousand downloads last month - 5,531 stars on GitHub - 1 maintainer
spifpy 1.0.5
Single Particle Image Format (SPIF) data converter and interface
4 versions - Latest release: over 3 years ago - 24 downloads last month - 0 stars on GitHub - 1 maintainer
json-xlsx 0.1.0
Convert nested JSON data to formatted Excel files
3 versions - Latest release: 9 months ago - 42 downloads last month - 0 stars on GitHub - 1 maintainer
starlings 0.2.3
A Python package for comparing entity resolutions from different processes
4 versions - Latest release: 5 months ago - 380 downloads last month - 1 maintainer
qmm 0.19.0
Quadratic Majorize-Minimize Python toolbox
25 versions - Latest release: about 1 month ago - 1 dependent repository - 303 downloads last month - 17 stars on GitHub - 1 maintainer
ultrasav 0.2.14
A Python package for working with SPSS/SAV files with two-track architecture separating data and ...
25 versions - Latest release: 29 days ago - 1.31 thousand downloads last month - 1 maintainer
nvidia-dali-cuda130 1.53.0
NVIDIA DALI for CUDA 13.0. Git SHA: 55113c6cd54624aeebd7e3c0d93b4c4a68a34034
4 versions - Latest release: about 2 months ago - 793 downloads last month - 5,531 stars on GitHub - 1 maintainer
nvidia-dali-tf-plugin-nightly-cuda110 1.43.0.dev20240919
NVIDIA DALI nightly TensorFlow plugin for CUDA 11.0. Git SHA: 94f02ad69abe149f345684ef2aba3e13d2...
152 versions - Latest release: over 1 year ago - 120 downloads last month - 4,992 stars on GitHub - 1 maintainer
nvidia-dali-tf-plugin-nightly-cuda120 1.43.0.dev20240919
NVIDIA DALI nightly TensorFlow plugin for CUDA 12.0. Git SHA: 94f02ad69abe149f345684ef2aba3e13d2...
170 versions - Latest release: over 1 year ago - 256 downloads last month - 4,992 stars on GitHub - 1 maintainer
nvidia-dali-nightly-cuda130 1.53.0.dev20251017
NVIDIA DALI nightly for CUDA 13.0. Git SHA: e2db4d795524dd2274dec9fbe479d2e8e50c6f23
18 versions - Latest release: 5 months ago - 60 downloads last month - 5,531 stars on GitHub - 1 maintainer
nvidia-dali-weekly-cuda120 1.52.0.dev20250720
NVIDIA DALI weekly for CUDA 12.0. Git SHA: 67f2c79cbb2488d43757d94e30369464f2a516eb
37 versions - Latest release: 7 months ago - 52 downloads last month - 5,531 stars on GitHub - 1 maintainer
nvidia-dali-tf-plugin-cuda120 1.53.0
NVIDIA DALI TensorFlow plugin for CUDA 12.0. Git SHA: 55113c6cd54624aeebd7e3c0d93b4c4a68a34034
33 versions - Latest release: about 2 months ago - 225 downloads last month - 5,564 stars on GitHub - 1 maintainer
nvidia-dali-tf-plugin-cuda130 1.53.0
NVIDIA DALI TensorFlow plugin for CUDA 13.0. Git SHA: 55113c6cd54624aeebd7e3c0d93b4c4a68a34034
4 versions - Latest release: about 2 months ago - 23 downloads last month - 5,564 stars on GitHub - 1 maintainer
nvidia-dali-nightly-cuda120 1.53.0.dev20251017
NVIDIA DALI nightly for CUDA 12.0. Git SHA: e2db4d795524dd2274dec9fbe479d2e8e50c6f23
276 versions - Latest release: 5 months ago - 630 downloads last month - 4,992 stars on GitHub - 1 maintainer
nvidia-dali-weekly-cuda130 1.52.0.dev20251005
NVIDIA DALI weekly for CUDA 13.0. Git SHA: 4da8adfb6b58c3a3c352f98c6f431b49323ac518
3 versions - Latest release: 5 months ago - 15 downloads last month - 5,539 stars on GitHub - 1 maintainer
Top 6.1% on pypi.org
nvidia-dali-tf-plugin-cuda110 1.50.0
NVIDIA DALI TensorFlow plugin for CUDA 11.0. Git SHA: d5c7b54f776fcba58944048f984f5645e7d7d1bb
37 versions - Latest release: 9 months ago - 6 dependent repositories - 98 downloads last month - 5,578 stars on GitHub - 1 maintainer
nvidia-dali-tf-plugin-weekly-cuda120 1.42.0.dev20240915
NVIDIA DALI weekly TensorFlow plugin for CUDA 12.0. Git SHA: 408c18bb0d8a7c1b300e02fd7f6bb58369f...
27 versions - Latest release: over 1 year ago - 46 downloads last month - 5,551 stars on GitHub - 1 maintainer
open-dataflow 1.0.9
Modern Data Centric AI system for Large Language Models
13 versions - Latest release: 5 days ago - 1.26 thousand downloads last month - 1,329 stars on GitHub - 2 maintainers
aeroviz 0.1.25
Aerosol data processing and visualization toolkit. Read, QC, and analyze data from SMPS, APS, AE3...
33 versions - Latest release: 16 days ago - 690 downloads last month - 12 stars on GitHub - 1 maintainer
smallpond 0.15.0
A lightweight data processing framework built on DuckDB and shared file system.
3 versions - Latest release: about 1 year ago - 246 downloads last month - 4,806 stars on GitHub - 1 maintainer
glide 0.4.1
Easy ETL
45 versions - Latest release: over 3 years ago - 1 dependent repository - 11.7 thousand downloads last month - 18 stars on GitHub - 1 maintainer
acutils-python 0.1.1
Data processing library implemented by Acuzle.
2 versions - Latest release: about 2 years ago - 78 downloads last month - 4 stars on GitHub - 1 maintainer
rocketride 1.0.0
RocketRide Pipeline Python Client SDK
1 version - Latest release: 6 days ago - 71 downloads last month - 1 maintainer
pyspectrakit 1.9.6
Python toolkit for spectral data processing: baseline correction, normalization, smoothing, despi...
11 versions - Latest release: 6 days ago - 1 maintainer
Top 3.4% on pypi.org
bytewax 0.21.1
Python Stream Processing
34 versions - Latest release: over 1 year ago - 4 dependent packages - 20 dependent repositories - 22.4 thousand downloads last month - 1,844 stars on GitHub - 1 maintainer
miniduct 0.2.6
A small library for single-threaded orchestration of data pipelines
4 versions - Latest release: 8 months ago - 16 downloads last month - 1 maintainer
torchglyph 0.3.2
Data Processor Combinators for Natural Language Processing
8 versions - Latest release: about 4 years ago - 2 dependent repositories - 49 downloads last month - 7 stars on GitHub - 1 maintainer
cnpj-processor 4.3.1
Data processing system for CNPJ data from the Receita Federal do Brasil (the Brazilian federal revenue service)
17 versions - Latest release: 12 days ago - 1.13 thousand downloads last month - 1 maintainer
graphbook_huggingface 0.0.6
Graphbook Hugging Face Plugin for no-code Hugging Face AI pipelines
5 versions - Latest release: 11 months ago - 51 downloads last month - 41 stars on GitHub - 1 maintainer
graphbook 0.13.3
The AI-driven data pipeline and workflow framework for data scientists and machine learning engin...
23 versions - Latest release: 11 months ago - 154 downloads last month - 41 stars on GitHub - 1 maintainer
dagster-kafka 1.3.1
Enterprise-grade Kafka integration for Dagster with Confluent Connect, comprehensive serializatio...
9 versions - Latest release: 6 months ago - 276 downloads last month - 9 stars on GitHub - 1 maintainer
Top 8.1% on pypi.org
flowquery 1.0.38
A declarative query language for data processing pipelines
38 versions - Latest release: 9 days ago - 3.21 thousand downloads last month - 6 stars on GitHub - 1 maintainer
zephflow 0.3.1
Python SDK for ZephFlow data processing pipelines
13 versions - Latest release: 5 months ago - 94 downloads last month - 6 stars on GitHub - 1 maintainer
biosets 1.2.1
Bioinformatics datasets and tools
5 versions - Latest release: over 1 year ago - 92 downloads last month - 3 stars on GitHub - 1 maintainer
fleetfluid 0.1.6
AI Agent Functions for ETL/Data Processing
6 versions - Latest release: 5 months ago - 59 downloads last month - 0 stars on GitHub - 1 maintainer
ign-lidar-hd 4.0.1
IGN LiDAR HD Dataset Processing Library for Building LOD Classification
48 versions - Latest release: 3 months ago - 389 downloads last month - 1 star on GitHub - 1 maintainer
splurge-typer 2025.3.0
Type Inference and Conversion Library for Python
5 versions - Latest release: 4 months ago - 51 downloads last month - 0 stars on GitHub - 1 maintainer
align-utils 1.5.0
Utilities for parsing and processing align-system experiment data
8 versions - Latest release: about 2 months ago - 403 downloads last month - 1 star on GitHub - 1 maintainer
earthquake-selection 1.1.0
A library for earthquake selection
3 versions - Latest release: 14 days ago - 21 downloads last month - 0 stars on GitHub - 1 maintainer
niamoto 0.7.5
Niamoto is a command-line application and library focused on processing and publishing botanical ...
31 versions - Latest release: 4 months ago - 307 downloads last month - 3 stars on GitHub - 1 maintainer
psrecord-io 1.5.0
Python package to record activity from processes
1 version - Latest release: about 2 months ago - 106 downloads last month - 1 maintainer
light-pipe 0.3.1
A high-level syntax for data pipelines, designed to make pipeline development quick and painless.
5 versions - Latest release: almost 3 years ago - 22 downloads last month - 3 stars on GitHub - 1 maintainer
yt-framework 1.2.0
YTsaurus pipeline framework with utilities and common modules
5 versions - Latest release: 8 days ago - 414 downloads last month - 5 stars on GitHub - 1 maintainer
pylib-autogptkit 0.1.0
Build agent workflows with memory & tools. Autonomous AI agents toolkit. Perfect for AI agents an...
1 version - Latest release: 4 months ago - 21 downloads last month - 1 maintainer
pylib-fetcher 0.1.0
Async HTTP client with caching and metrics. Network utilities for AI agents. Perfect for AI agent...
1 version - Latest release: 4 months ago - 16 downloads last month - 1 maintainer
pylib-codefixer 0.1.0
AI-based code refactoring & linting assistant. Code quality AI tools. Perfect for AI agents and L...
1 version - Latest release: 4 months ago - 17 downloads last month - 1 maintainer
csv-cdc 0.1.3
A high-performance CSV Change Data Capture tool
4 versions - Latest release: 9 months ago - 42 downloads last month - 0 stars on GitHub - 1 maintainer
minispark 0.1.10
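A lightweight Python library for reading data from multiple data sources and processing it efficiently on a local machine, with functionality similar to Apache Spark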
1 version - Latest release: 7 months ago - 9 downloads last month - 0 stars on GitHub - 1 maintainer
ray-milvus 0.1.1
Ray Data datasource and datasink for Milvus Storage
2 versions - Latest release: about 1 month ago - 51 downloads last month - 2 maintainers
postit 0.1.4
A robust, extensible Python data tagging framework for dynamic processing and intelligent filteri...
7 versions - Latest release: about 1 year ago - 25 downloads last month - 0 stars on GitHub - 1 maintainer
postql 1.0.3
Python wrapper for Postgres
3 versions - Latest release: about 2 years ago - 34 downloads last month - 0 stars on GitHub - 1 maintainer
dataknobs-fsm 0.1.10
Finite State Machine framework with data modes, resource management, and streaming support
11 versions - Latest release: 10 days ago - 735 downloads last month - 1 maintainer
parllel 0.1.0
A high-performance async pipeline processing library for Python
1 version - Latest release: 5 months ago - 19 downloads last month - 1 star on GitHub - 1 maintainer
aparavi-client-python 1.1.1
Aparavi Pipeline Python Client SDK
2 versions - Latest release: about 2 months ago - 87 downloads last month - 1 maintainer
Top 6.9% on pypi.org
itertable 2.2.0
Iterable API for tabular datasets including CSV, XLSX, XML, & JSON.
4 versions - Latest release: over 2 years ago - 1 dependent package - 8 dependent repositories - 8.3 thousand downloads last month - 53 stars on GitHub - 1 maintainer
scriptcraft-python 1.6.3
Data processing and quality control tools for research workflows
18 versions - Latest release: 6 months ago - 45 downloads last month - 0 stars on GitHub - 1 maintainer
machine-learning-data-pipeline 1.0.3
Pipeline module for parallel real-time data processing for machine learning models development an...
2 versions - Latest release: over 7 years ago - 1 dependent repository - 33 downloads last month - 22 stars on GitHub - 1 maintainer
norvelang 0.2.2
Multi-Source Data Processing Language
5 versions - Latest release: 5 months ago - 31 downloads last month - 0 stars on GitHub - 1 maintainer
Top 6.2% on pypi.org
cdp-backend 4.1.3
Data storage utilities and processing pipelines used by CDP instances.
108 versions - Latest release: about 2 years ago - 3 dependent packages - 22 dependent repositories - 4.09 thousand downloads last month - 21 stars on GitHub - 1 maintainer
pytemporal 1.4.27
High-performance bitemporal data processing for Python
43 versions - Latest release: 3 months ago - 904 downloads last month - 1 star on GitHub - 1 maintainer
buelon 1.0.74
A scripting language to simply manage a very large amount of i/o heavy workloads. Such as API cal...
237 versions - Latest release: 5 months ago - 2.76 thousand downloads last month - 0 stars on GitHub - 1 maintainer
tasrif 0.1.0
Tasrif is a python library for processing of wearable data from fitness trackers and wearable hea...
7 versions - Latest release: almost 4 years ago - 1 dependent repository - 95 downloads last month - 15 stars on GitHub - 1 maintainer
tometo-tomato 0.0.1
A fuzzy join utility using DuckDB
1 version - Latest release: 6 months ago - 11 downloads last month - 0 stars on GitHub - 1 maintainer
yaml-workflow 0.4.0
A lightweight, powerful, and flexible workflow engine that executes tasks defined in YAML configu...
6 versions - Latest release: 11 months ago - 44 downloads last month - 2 stars on GitHub - 1 maintainer
ini2csv 1.0.0
A simple utility that converts and combines a folder of .ini files with identical keys into one c...
1 version - Latest release: over 2 years ago - 12 downloads last month - 1 star on GitHub - 2 maintainers
nhanes-pytool-api 0.1.1
A tool for programmatic access to NHANES downloadable datasets
2 versions - Latest release: over 2 years ago - 28 downloads last month - 10 stars on GitHub - 1 maintainer
mercury-dataschema 1.1.2
Mercury's DataSchema package allows the automatic recognition and validation of feature types.
2 versions - Latest release: about 1 year ago - 2 dependent packages - 34 downloads last month - 16 stars on GitHub - 1 maintainer
Top 5.5% on pypi.org
nonechucks 0.4.2
nonechucks is a library that provides wrappers for PyTorch's datasets, samplers, and transforms t...
18 versions - Latest release: over 4 years ago - 26 dependent repositories - 332 downloads last month - 377 stars on GitHub - 1 maintainer
Top 8.1% on pypi.org
bonobo-docker 0.6.0
Docker extension for Bonobo
18 versions - Latest release: about 8 years ago - 2 dependent packages - 3 dependent repositories - 576 downloads last month - 13 stars on GitHub - 2 maintainers
dpyp 1.0.0
A pandas convenience wrapper for small-scale data pipelines
1 version - Latest release: almost 2 years ago - 17 downloads last month - 1 star on GitHub - 1 maintainer
pandas-optimum 0.0.6
Optimised pandas, Best practices in-built
6 versions - Latest release: over 2 years ago - 37 downloads last month - 0 stars on GitHub - 1 maintainer
gmeterpy 0.0.2
Processing gravity measurements with Python
2 versions - Latest release: almost 7 years ago - 1 dependent repository - 19 downloads last month - 11 stars on GitHub - 1 maintainer
datagpu 0.1.1
Open-source data compiler for AI training datasets
2 versions - Latest release: 4 months ago - 38 downloads last month - 1 maintainer
undatum 1.1.1
A powerful command-line tool for data processing and analysis
20 versions - Latest release: about 1 month ago - 1 dependent repository - 3.97 thousand downloads last month - 48 stars on GitHub - 1 maintainer
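
Every entry above ends with a statistics line whose fields are separated by " - ". For readers who want to compare packages programmatically, the following is a minimal sketch of how such a line could be turned into a dictionary. It works only on the text as shown in this listing and is not part of the service's API. The function name `parse_stats` and the output key names are hypothetical, chosen for illustration.

```python
import re

# Multipliers for the abbreviated counts used in the listing,
# e.g. "76.2 thousand" or "8.63 million".
_MAGNITUDES = {"thousand": 1_000, "million": 1_000_000}

# Hypothetical output keys for the labels that appear in the listing.
_KEYS = {
    "version": "versions",
    "versions": "versions",
    "dependent package": "dependent_packages",
    "dependent packages": "dependent_packages",
    "dependent repository": "dependent_repositories",
    "dependent repositories": "dependent_repositories",
    "downloads last month": "downloads_last_month",
    "star on GitHub": "stars",
    "stars on GitHub": "stars",
    "maintainer": "maintainers",
    "maintainers": "maintainers",
}

# A number, an optional magnitude word, then the label.
_COUNT = re.compile(
    r"^(?P<num>[\d.,]+)(?: (?P<mag>thousand|million))? (?P<label>.+)$"
)


def parse_stats(line):
    """Parse one statistics line from the listing into a dict."""
    stats = {}
    for field in line.split(" - "):
        field = field.strip()
        if field.startswith("Latest release:"):
            # Keep the relative date as text; the listing gives no absolute date.
            stats["latest_release"] = field.split(":", 1)[1].strip()
            continue
        match = _COUNT.match(field)
        if match is None:
            continue  # Skip anything that is not a "<count> <label>" field.
        value = float(match.group("num").replace(",", ""))
        value *= _MAGNITUDES.get(match.group("mag"), 1)
        label = match.group("label")
        stats[_KEYS.get(label, label)] = round(value)
    return stats


if __name__ == "__main__":
    example = (
        "116 versions - Latest release: about 1 month ago - "
        "97 dependent packages - 229 dependent repositories - "
        "8.63 million downloads last month - 3,009 stars on GitHub - "
        "3 maintainers"
    )
    print(parse_stats(example))
```

Counts such as "8.63 million" are rounded on the page, so the parsed values are approximations. Fields that an entry omits (for example, stars for packages without a linked repository) are simply absent from the result.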