An open API service providing package, version and dependency metadata of many open source software ecosystems and registries.

pypi.org "data-processing" keyword

spectroview 26.9.6
SPECTROView: A Tool for Spectroscopic Data Processing and Visualization
57 versions - Latest release: 23 days ago - 899 downloads last month - 1 star on GitHub - 1 maintainer
flagged-csv 0.1.5
Convert XLSX files to CSV with visual formatting preserved as inline flags
6 versions - Latest release: 6 months ago - 38 downloads last month - 0 stars on GitHub - 1 maintainer
dataknobs-fsm 0.1.13
Finite State Machine framework with data modes, resource management, and streaming support
14 versions - Latest release: about 18 hours ago - 735 downloads last month - 1 maintainer
paged-list 0.1.3
A disk-backed list implementation for handling large datasets efficiently
3 versions - Latest release: 7 months ago - 156 downloads last month - 0 stars on GitHub - 1 maintainer
Top 8.1% on pypi.org
bonobo-docker 0.6.0
Docker extension for Bonobo
18 versions - Latest release: about 8 years ago - 2 dependent packages - 3 dependent repositories - 576 downloads last month - 13 stars on GitHub - 2 maintainers
taskflow-pipeline 0.1.1
A Python library for orchestrating RPA, data processing, and AI task pipelines from YAML/JSON con...
2 versions - Latest release: 5 months ago - 26 downloads last month - 1 maintainer
clean-csv-tool 1.0.0
A utility for cleaning and normalizing CSV files
1 version - Latest release: 3 months ago - 24 downloads last month - 0 stars on GitHub - 1 maintainer
postit 0.1.4
A robust, extensible Python data tagging framework for dynamic processing and intelligent filteri...
7 versions - Latest release: about 1 year ago - 50 downloads last month - 0 stars on GitHub - 1 maintainer
apiverve-metadataextractor 1.1.14
Metadata Extractor is a simple tool for extracting metadata from web pages. It returns the meta t...
5 versions - Latest release: about 1 month ago - 72 downloads last month - 0 stars on GitHub - 1 maintainer
datasetpipeline 0.2.1
A data processing and analysis pipeline designed to handle various jobs related to data transform...
11 versions - Latest release: 9 months ago - 48 downloads last month - 1 star on GitHub - 1 maintainer
exc-to-pdf 1.0.0
Excel to PDF converter optimized for Google NotebookLM
1 version - Latest release: 5 months ago - 67 downloads last month - 0 stars on GitHub - 1 maintainer
gmeterpy 0.0.2
Processing gravity measurements with Python
2 versions - Latest release: almost 7 years ago - 1 dependent repository - 13 downloads last month - 11 stars on GitHub - 1 maintainer
irdap 1.3.5
IRDAP is a highly-automated end-to-end pipeline to reduce SPHERE-IRDIS polarimetric data
17 versions - Latest release: over 2 years ago - 1 dependent repository - 87 downloads last month - 6 stars on GitHub - 1 maintainer
pylib-textai 0.1.0
Sentiment analysis, embeddings, summarization wrappers. AI text processing. Perfect for AI agents...
1 version - Latest release: 5 months ago - 11 downloads last month - 0 stars on GitHub - 1 maintainer
dataform 1.0.0
DataForm: Data processing and transformation tool.
1 version - Latest release: about 2 years ago - 23 downloads last month - 1 star on GitHub - 1 maintainer
openforis-whisp 0.0.1
Whisp (What is in that plot) is an open-source solution which helps to produce relevant forest mo...
33 versions - Latest release: about 1 year ago - 615 downloads last month - 30 stars on GitHub - 1 maintainer
nhanes-pytool-api 0.1.1
A tool for programmatic access to NHANES downloadable datasets
2 versions - Latest release: over 2 years ago - 30 downloads last month - 10 stars on GitHub - 1 maintainer
kothon 0.3.2
A Python library that brings Kotlin's Sequence class functionalities and the power of functional ...
8 versions - Latest release: almost 2 years ago - 47 downloads last month - 4 stars on GitHub - 1 maintainer
Top 9.8% on pypi.org
haupt 2.13.3
Lineage metadata API, artifacts streams, sandbox, ML-API, and spaces for Polyaxon.
184 versions - Latest release: 15 days ago - 1 dependent package - 2.35 thousand downloads last month - 451 stars on GitHub - 2 maintainers
datagpu 0.1.1
Open-source data compiler for AI training datasets
2 versions - Latest release: 4 months ago - 38 downloads last month - 1 maintainer
polars-istr 0.1.3 💰
Polars extension for general data science use cases
5 versions - Latest release: 7 months ago - 807 downloads last month - 493 stars on GitHub - 1 maintainer
pylib-compare 0.1.0
Deep diff for dicts/lists with patch generation. Data comparison utilities.
1 version - Latest release: 4 months ago - 14 downloads last month - 1 maintainer
sortdx 0.1.1
Universal sorting tool for files, data structures, and large datasets
2 versions - Latest release: 7 months ago - 16 downloads last month - 0 stars on GitHub - 1 maintainer
pylib-streams 0.1.0
Chainable functional API for list processing (map/filter/reduce). Great for data pipelines. Perfe...
1 version - Latest release: 4 months ago - 14 downloads last month - 1 maintainer
ror 0.1.1
Simple pipelining framework in Python
3 versions - Latest release: about 2 years ago - 22 downloads last month - 0 stars on GitHub - 1 maintainer
buildify-api 2.0.0
Buildify API is a Python library for real estate data processing.
4 versions - Latest release: about 1 year ago - 23 downloads last month - 0 stars on GitHub - 1 maintainer
Top 1.4% on pypi.org
pandera 0.30.1 💰
A light-weight and flexible data validation and testing tool for statistical data objects.
118 versions - Latest release: 6 days ago - 97 dependent packages - 229 dependent repositories - 9.17 million downloads last month - 3,009 stars on GitHub - 3 maintainers
unipipe 0.5.4
project_description
16 versions - Latest release: over 3 years ago - 56 downloads last month - 3 stars on GitHub - 1 maintainer
aws-s3-controller 0.7.5
A collection of natural language-like utility functions to intuitively and easily control AWS's c...
18 versions - Latest release: about 1 year ago - 63 downloads last month - 0 stars on GitHub - 1 maintainer
airow 0.1.0
AI-powered DataFrame processing made simple
3 versions - Latest release: 6 months ago - 54 downloads last month - 3 stars on GitHub - 1 maintainer
pylib-docgen 0.1.0
Generate README/API docs using AI summarization. AI-powered documentation. Perfect for AI agents ...
1 version - Latest release: 5 months ago - 20 downloads last month - 1 maintainer
pystream-pipeline 0.2.0
Python package to create and manage fast parallelized data processing pipeline for real-time appl...
5 versions - Latest release: over 2 years ago - 1 dependent repository - 35 downloads last month - 2 stars on GitHub - 1 maintainer
nvidia-nvimgcodec-cu13 0.7.0.11
NVIDIA nvimgcodec for CUDA 13.
4 versions - Latest release: 4 months ago - 6.72 thousand downloads last month - 143 stars on GitHub - 1 maintainer
nvidia-nvimgcodec-tegra-cu12 0.7.0.11
NVIDIA nvimgcodec tegra for CUDA 12.
7 versions - Latest release: 4 months ago - 141 downloads last month - 143 stars on GitHub - 1 maintainer
nvidia-nvimgcodec-cu11 0.6.1.37
NVIDIA nvimgcodec for CUDA 11. Git SHA:
7 versions - Latest release: 5 months ago - 2 dependent packages - 9.72 thousand downloads last month - 143 stars on GitHub - 1 maintainer
nvidia-nvimgcodec-tegra-cu13 0.6.0.32
NVIDIA nvimgcodec tegra for CUDA 13. Git SHA:
2 versions - Latest release: 7 months ago - 57 downloads last month - 143 stars on GitHub - 1 maintainer
nvidia-nvimgcodec-cu12 0.7.0.11
NVIDIA nvimgcodec for CUDA 12.
8 versions - Latest release: 4 months ago - 3 dependent packages - 86 thousand downloads last month - 143 stars on GitHub - 1 maintainer
pyvaspflow 0.0.3
Vasp Calculation
1 version - Latest release: about 6 years ago - 1 dependent repository - 5 downloads last month - 22 stars on GitHub - 1 maintainer
pds-benchmark 0.1.0
A minimal, Polars-focused data processing benchmark suite
1 version - Latest release: 3 months ago - 17 downloads last month - 1 maintainer
pipevine 0.1.5
A high-performance async pipeline processing library for Python
6 versions - Latest release: 6 months ago - 15 downloads last month - 1 star on GitHub - 1 maintainer
iflow-mcp_datamaster-mcp 1.0.4
DataMaster MCP - AI-powered data analysis tool with MCP protocol support
1 version - Latest release: 4 months ago - 1 maintainer
Top 5.8% on pypi.org
raydp-nightly 2025.12.4.dev0
RayDP: Distributed Data Processing on Ray
147 versions - Latest release: 4 months ago - 90 dependent repositories - 440 downloads last month - 348 stars on GitHub - 1 maintainer
Top 5.1% on pypi.org
padasip 1.2.2
Python Adaptive Signal Processing
13 versions - Latest release: over 3 years ago - 1 dependent package - 10 dependent repositories - 1.13 thousand downloads last month - 314 stars on GitHub - 1 maintainer
mathbox 0.0.8
A math toolbox.
6 versions - Latest release: over 3 years ago - 1 dependent repository - 17 downloads last month - 5 stars on GitHub - 1 maintainer
graphbook 0.13.3
The AI-driven data pipeline and workflow framework for data scientists and machine learning engin...
30 versions - Latest release: 12 months ago - 748 downloads last month - 47 stars on GitHub - 1 maintainer
graphbook_huggingface 0.0.6
Graphbook Hugging Face Plugin for no-code Hugging Face AI pipelines
5 versions - Latest release: 12 months ago - 25 downloads last month - 47 stars on GitHub - 1 maintainer
pylib-daterange 0.1.0
Generate date ranges, sequences, calendars. Date range utilities.
1 version - Latest release: 4 months ago - 14 downloads last month - 1 maintainer
sparklanes 0.2.4
A lightweight framework to build and execute data processing pipelines in pyspark (Apache Spark's...
5 versions - Latest release: about 7 years ago - 1 dependent repository - 25 downloads last month - 16 stars on GitHub - 1 maintainer
noob 1000.0.1
A graph processing library for processing graphs
4 versions - Latest release: 8 days ago - 253 downloads last month - 2 maintainers
pydatacore 1.1.2
A data library for handling temporal, frequency signals, and data pools.
12 versions - Latest release: over 1 year ago - 43 downloads last month - 1 star on GitHub - 1 maintainer
ssak 0.0.2
Toolbox for Speech Processing.
2 versions - Latest release: 5 months ago - 10 downloads last month - 5 stars on GitHub - 1 maintainer
dynamic-data-reduction 2025.12.1
Dynamic MapReduce framework for data processing
11 versions - Latest release: 4 months ago - 111 downloads last month - 0 stars on GitHub - 1 maintainer
pypabhiveagent 1.0.3
A Python package for querying Hive data and processing with AIGC applications
4 versions - Latest release: 4 months ago - 48 downloads last month - 1 maintainer
redis-message-queue 0.10.1
Python message queuing with Redis and message deduplication
14 versions - Latest release: 12 months ago - 210 downloads last month - 6 stars on GitHub - 1 maintainer
forklift-etl 0.1.4
A powerful data processing and schema generation tool with PyArrow streaming, validation, and S3 ...
3 versions - Latest release: 5 months ago - 33 downloads last month - 0 stars on GitHub - 1 maintainer
dtflow 0.5.13
A flexible data transformation tool for ML training formats (SFT, RLHF, Pretrain)
24 versions - Latest release: 12 days ago - 452 downloads last month - 1 maintainer
abracudabra 0.1.3
Convert Tensors, Arrays and DataFrames Between CPU and CUDA
4 versions - Latest release: about 1 year ago - 49 downloads last month - 1 star on gitlab.cern.ch - 1 maintainer
pathway 1.3.1
Pathway is a data processing framework which takes care of streaming data updates for you.
92 versions - Latest release: about 16 years ago - 1 dependent package - 1 dependent repository - 13.7 thousand downloads last month - 49,408 stars on GitHub - 4 maintainers
dft-pipeline 0.3.24
Data Flow Tools - flexible ETL pipeline framework
36 versions - Latest release: 9 months ago - 94 downloads last month - 1 maintainer
Top 5.2% on pypi.org
bonobo 0.6.4
Bonobo, a simple, modern and atomic extract-transform-load toolkit for python 3.5+.
37 versions - Latest release: almost 7 years ago - 35 dependent repositories - 20.8 thousand downloads last month - 1,592 stars on GitHub - 2 maintainers
gzeus 0.1.2
Polars IO plugin for reading compressed CSV/TSV files in a streaming fashion
3 versions - Latest release: 5 months ago - 103 downloads last month - 13 stars on GitHub - 1 maintainer
evaluation-service-base 0.1.4
A comprehensive framework for building evaluation services with progress tracking, task managemen...
5 versions - Latest release: 7 months ago - 19 downloads last month - 1 maintainer
pylib-serializer 0.1.0
Safe JSON/YAML serialization with circular-reference handling. Data processing utility.
1 version - Latest release: 4 months ago - 15 downloads last month - 1 maintainer
Top 6.6% on pypi.org
libertem 0.15.2
Open pixelated STEM framework
42 versions - Latest release: 8 months ago - 2 dependent packages - 3 dependent repositories - 1.51 thousand downloads last month - 122 stars on GitHub - 3 maintainers
xara-astro 1.0.1
A package for eXtreme Angular Resolution Astronomy
2 versions - Latest release: 3 months ago - 47 downloads last month - 11 stars on GitHub - 1 maintainer
datamaster-mcp-enhanced 1.0.0
DataMaster MCP Enhanced - AI-powered data analysis tool with MCP protocol support
1 version - Latest release: 6 months ago - 17 downloads last month - 9 stars on GitHub - 1 maintainer
mylib-dkj 0.0.2
A sample Python library for atmospheric data processing
2 versions - Latest release: 6 days ago - 1 maintainer
cotk 0.1.0
Conversational Toolkits
3 versions - Latest release: over 5 years ago - 2 dependent repositories - 38 downloads last month - 128 stars on GitHub - 1 maintainer
spiderkit 0.1.9
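A Python toolkit for web scraping and data processing scenarios, covering encryption and decryption, data storage, asynchronous downloading, font parsing, and hashing tools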
10 versions - Latest release: 2 months ago - 133 downloads last month - 1 maintainer
Top 3.8% on pypi.org
pysparkling 0.6.2
Pure Python implementation of the Spark RDD interface.
69 versions - Latest release: over 3 years ago - 1 dependent package - 34 dependent repositories - 45.8 thousand downloads last month - 271 stars on GitHub - 2 maintainers
lineagemd 0.0.0
Lineage metadata for ML/AI/Data.
1 version - Latest release: over 3 years ago - 10 downloads last month - 452 stars on GitHub - 2 maintainers
tikara 0.1.6
The metadata and text content extractor for almost every file type.
6 versions - Latest release: about 1 year ago - 244 downloads last month - 4 stars on GitHub - 1 maintainer
climate-zarr 0.1.0
Interactive CLI toolkit for processing climate data with NetCDF to Zarr conversion and county-lev...
1 version - Latest release: 9 months ago - 19 downloads last month - 0 stars on GitHub - 1 maintainer
datamaster-mcp 1.1.0
DataMaster MCP - AI-powered data analysis tool with MCP protocol support
10 versions - Latest release: 6 months ago - 58 downloads last month - 8 stars on GitHub - 1 maintainer
prosto 0.6.0
Data processing toolkit radically changing the way data is processed
5 versions - Latest release: over 4 years ago - 1 dependent repository - 28 downloads last month - 91 stars on GitHub - 1 maintainer
Top 8.1% on pypi.org
flowquery 1.0.46
A declarative query language for data processing pipelines
46 versions - Latest release: 10 days ago - 2.24 thousand downloads last month - 7 stars on GitHub - 1 maintainer
carp-analytics-python 0.1.0
A high-performance Python library for processing and analysing data from CARP (Copenhagen Researc...
1 version - Latest release: 4 months ago - 23 downloads last month - 1 maintainer
python-pyper 0.4.4
Concurrent Python made simple
11 versions - Latest release: about 1 year ago - 173 downloads last month - 1,503 stars on GitHub - 1 maintainer
oect-infra 3.3.0
OECT (Organic Electrochemical Transistor) data processing infrastructure for experiment managemen...
21 versions - Latest release: 4 months ago - 156 downloads last month - 0 stars on GitHub - 1 maintainer
electiongraphs 0.3.4
Create graphs displaying the results of an election based on a CSV input file.
4 versions - Latest release: over 2 years ago - 18 downloads last month - 0 stars on GitHub - 1 maintainer
dalla-data-p 0.1.4
Unified Arabic data processing pipeline with deduplication, stemming, quality checking, and reada...
1 version - Latest release: 4 months ago
reki 2025.7.2
A data preparation tool for CEMC/CMA.
6 versions - Latest release: 7 months ago - 1 dependent repository - 27 downloads last month - 18 stars on GitHub - 1 maintainer
pylib-optimize 0.1.0
Simple optimization solvers (gradient descent, LP). Machine learning utilities.
1 version - Latest release: 4 months ago - 15 downloads last month - 1 maintainer
sheetwise 2.8.0 💰
A Python package for encoding spreadsheets for Large Language Models, implementing the Spreadshee...
14 versions - Latest release: 3 months ago - 1.29 thousand downloads last month - 25 stars on GitHub - 1 maintainer
cwepr 0.5.1
Package for handling cw-EPR data.
10 versions - Latest release: about 2 years ago - 1 dependent repository - 60 downloads last month - 2 stars on GitHub - 2 maintainers
sofine 0.2.4
Lightweight framework for creating data-collection plugins and chaining together calls to them, f...
10 versions - Latest release: over 11 years ago - 2 dependent repositories - 27 downloads last month - 7 stars on GitHub - 1 maintainer
schemarrow 0.1.1a0 💰
A library for switching pandas backend to pyarrow
2 versions - Latest release: about 2 years ago - 15 downloads last month - 9 stars on GitHub - 1 maintainer
snow-workspace-extractor 0.1.2
Snowflake Workspace Extractor is a Python package that provides a simple way to extract and analyz...
3 versions - Latest release: 4 months ago - 27 downloads last month - 4 stars on GitHub - 1 maintainer
pyvectis 0.1.0
Async component pipeline framework for data processing
1 version - Latest release: 10 days ago - 1 maintainer
perke 0.4.4
A keyphrase extractor for Persian
13 versions - Latest release: over 2 years ago - 1 dependent repository - 60 downloads last month - 72 stars on GitHub - 1 maintainer
nekupload 1.1.1
Upload and validation pipeline for Nektar++ datasets to an online repository
7 versions - Latest release: 9 months ago - 16 downloads last month - 2 maintainers
antchain 0.0.7
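A functional-programming-style data processing pipeline library that supports method chaining and a variety of data processing operations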
7 versions - Latest release: 5 months ago - 85 downloads last month - 1 maintainer
fondant 1.0.0
Fondant - Large-scale data processing made easy and reusable
45 versions - Latest release: about 2 years ago - 1 dependent repository - 312 downloads last month - 354 stars on GitHub - 2 maintainers
vrl-python 0.1.0
Python bindings for Vector Remap Language (VRL)
1 version - Latest release: 6 months ago - 59 downloads last month - 1 maintainer
automatic-station 2.0.1
Automatic station data processing tool - for processing and modeling automatic station data
1 version - Latest release: 4 months ago - 27 downloads last month - 1 maintainer
thepipe 1.3.8
A lightweight, general purpose pipeline framework.
15 versions - Latest release: over 3 years ago - 1 dependent package - 2 dependent repositories - 461 downloads last month - 14 stars on GitHub - 2 maintainers
bonobo-selenium 0.1.1
Bonobo Selenium Extension
2 versions - Latest release: over 8 years ago - 51 downloads last month - 4 stars on GitHub - 2 maintainers
open-dataflow-adp 1.1.21
Modern Data Centric AI system for Large Language Models
32 versions - Latest release: 2 months ago - 390 downloads last month - 1,410 stars on GitHub - 1 maintainer
Top 7.7% on pypi.org
heat 1.7.0
A framework for high-performance data analytics and machine learning.
25 versions - Latest release: 3 months ago - 8 dependent repositories - 846 downloads last month - 223 stars on GitHub - 5 maintainers
smartpipeline 0.7.3
A framework for fast developing scalable data pipelines following a simple design pattern
11 versions - Latest release: over 2 years ago - 1 dependent repository - 89 downloads last month - 26 stars on GitHub - 1 maintainer