pypi.org "data-processing" keyword
spectroview 26.9.6
SPECTROView: A Tool for Spectroscopic Data Processing and Visualization
57 versions - Latest release: 23 days ago - 899 downloads last month - 1 star on GitHub - 1 maintainer
flagged-csv 0.1.5
Convert XLSX files to CSV with visual formatting preserved as inline flags
6 versions - Latest release: 6 months ago - 38 downloads last month - 0 stars on GitHub - 1 maintainer
dataknobs-fsm 0.1.13
Finite State Machine framework with data modes, resource management, and streaming support
14 versions - Latest release: about 18 hours ago - 735 downloads last month - 1 maintainer
paged-list 0.1.3
A disk-backed list implementation for handling large datasets efficiently
3 versions - Latest release: 7 months ago - 156 downloads last month - 0 stars on GitHub - 1 maintainer
Top 8.1% on pypi.org
bonobo-docker 0.6.0
Docker extension for Bonobo
18 versions - Latest release: about 8 years ago - 2 dependent packages - 3 dependent repositories - 576 downloads last month - 13 stars on GitHub - 2 maintainers
taskflow-pipeline 0.1.1
A Python library for orchestrating RPA, data processing, and AI task pipelines from YAML/JSON con...
2 versions - Latest release: 5 months ago - 26 downloads last month - 1 maintainer
clean-csv-tool 1.0.0
A utility for cleaning and normalizing CSV files
1 version - Latest release: 3 months ago - 24 downloads last month - 0 stars on GitHub - 1 maintainer
postit 0.1.4
A robust, extensible Python data tagging framework for dynamic processing and intelligent filteri...
7 versions - Latest release: about 1 year ago - 50 downloads last month - 0 stars on GitHub - 1 maintainer
apiverve-metadataextractor 1.1.14
Metadata Extractor is a simple tool for extracting metadata from web pages. It returns the meta t...
5 versions - Latest release: about 1 month ago - 72 downloads last month - 0 stars on GitHub - 1 maintainer
datasetpipeline 0.2.1
A data processing and analysis pipeline designed to handle various jobs related to data transform...
11 versions - Latest release: 9 months ago - 48 downloads last month - 1 star on GitHub - 1 maintainer
exc-to-pdf 1.0.0
Excel to PDF converter optimized for Google NotebookLM
1 version - Latest release: 5 months ago - 67 downloads last month - 0 stars on GitHub - 1 maintainer
gmeterpy 0.0.2
Processing gravity measurements with Python
2 versions - Latest release: almost 7 years ago - 1 dependent repository - 13 downloads last month - 11 stars on GitHub - 1 maintainer
irdap 1.3.5
IRDAP is a highly-automated end-to-end pipeline to reduce SPHERE-IRDIS polarimetric data
17 versions - Latest release: over 2 years ago - 1 dependent repository - 87 downloads last month - 6 stars on GitHub - 1 maintainer
pylib-textai 0.1.0
Sentiment analysis, embeddings, summarization wrappers. AI text processing. Perfect for AI agents...
1 version - Latest release: 5 months ago - 11 downloads last month - 0 stars on GitHub - 1 maintainer
dataform 1.0.0
DataForm: Data processing and transformation tool.
1 version - Latest release: about 2 years ago - 23 downloads last month - 1 star on GitHub - 1 maintainer
openforis-whisp 0.0.1
Whisp (What is in that plot) is an open-source solution which helps to produce relevant forest mo...
33 versions - Latest release: about 1 year ago - 615 downloads last month - 30 stars on GitHub - 1 maintainer
nhanes-pytool-api 0.1.1
A tool for programmatic access to NHANES downloadable datasets
2 versions - Latest release: over 2 years ago - 30 downloads last month - 10 stars on GitHub - 1 maintainer
kothon 0.3.2
A Python library that brings Kotlin's Sequence class functionalities and the power of functional ...
8 versions - Latest release: almost 2 years ago - 47 downloads last month - 4 stars on GitHub - 1 maintainer
Top 9.8% on pypi.org
haupt 2.13.3
Lineage metadata API, artifacts streams, sandbox, ML-API, and spaces for Polyaxon.
184 versions - Latest release: 15 days ago - 1 dependent package - 2.35 thousand downloads last month - 451 stars on GitHub - 2 maintainers
datagpu 0.1.1
Open-source data compiler for AI training datasets
2 versions - Latest release: 4 months ago - 38 downloads last month - 1 maintainer
polars-istr 0.1.3 💰
Polars extension for general data science use cases
5 versions - Latest release: 7 months ago - 807 downloads last month - 493 stars on GitHub - 1 maintainer
pylib-compare 0.1.0
Deep diff for dicts/lists with patch generation. Data comparison utilities.
1 version - Latest release: 4 months ago - 14 downloads last month - 1 maintainer
sortdx 0.1.1
Universal sorting tool for files, data structures, and large datasets
2 versions - Latest release: 7 months ago - 16 downloads last month - 0 stars on GitHub - 1 maintainer
pylib-streams 0.1.0
Chainable functional API for list processing (map/filter/reduce). Great for data pipelines. Perfe...
1 version - Latest release: 4 months ago - 14 downloads last month - 1 maintainer
ror 0.1.1
Simple pipelining framework in Python
3 versions - Latest release: about 2 years ago - 22 downloads last month - 0 stars on GitHub - 1 maintainer
buildify-api 2.0.0
Buildify API is a Python library for real estate data processing.
4 versions - Latest release: about 1 year ago - 23 downloads last month - 0 stars on GitHub - 1 maintainer
Top 1.4% on pypi.org
pandera 0.30.1 💰
A light-weight and flexible data validation and testing tool for statistical data objects.
118 versions - Latest release: 6 days ago - 97 dependent packages - 229 dependent repositories - 9.17 million downloads last month - 3,009 stars on GitHub - 3 maintainers
unipipe 0.5.4
project_description
16 versions - Latest release: over 3 years ago - 56 downloads last month - 3 stars on GitHub - 1 maintainer
aws-s3-controller 0.7.5
A collection of natural language-like utility functions to intuitively and easily control AWS's c...
18 versions - Latest release: about 1 year ago - 63 downloads last month - 0 stars on GitHub - 1 maintainer
airow 0.1.0
AI-powered DataFrame processing made simple
3 versions - Latest release: 6 months ago - 54 downloads last month - 3 stars on GitHub - 1 maintainer
pylib-docgen 0.1.0
Generate README/API docs using AI summarization. AI-powered documentation. Perfect for AI agents ...
1 version - Latest release: 5 months ago - 20 downloads last month - 1 maintainer
pystream-pipeline 0.2.0
Python package to create and manage fast parallelized data processing pipeline for real-time appl...
5 versions - Latest release: over 2 years ago - 1 dependent repository - 35 downloads last month - 2 stars on GitHub - 1 maintainer
nvidia-nvimgcodec-cu13 0.7.0.11
NVIDIA nvimgcodec for CUDA 13.
4 versions - Latest release: 4 months ago - 6.72 thousand downloads last month - 143 stars on GitHub - 1 maintainer
nvidia-nvimgcodec-tegra-cu12 0.7.0.11
NVIDIA nvimgcodec tegra for CUDA 12.
7 versions - Latest release: 4 months ago - 141 downloads last month - 143 stars on GitHub - 1 maintainer
nvidia-nvimgcodec-cu11 0.6.1.37
NVIDIA nvimgcodec for CUDA 11. Git SHA:
7 versions - Latest release: 5 months ago - 2 dependent packages - 9.72 thousand downloads last month - 143 stars on GitHub - 1 maintainer
nvidia-nvimgcodec-tegra-cu13 0.6.0.32
NVIDIA nvimgcodec tegra for CUDA 13. Git SHA:
2 versions - Latest release: 7 months ago - 57 downloads last month - 143 stars on GitHub - 1 maintainer
nvidia-nvimgcodec-cu12 0.7.0.11
NVIDIA nvimgcodec for CUDA 12.
8 versions - Latest release: 4 months ago - 3 dependent packages - 86 thousand downloads last month - 143 stars on GitHub - 1 maintainer
pyvaspflow 0.0.3
Vasp Calculation
1 version - Latest release: about 6 years ago - 1 dependent repository - 5 downloads last month - 22 stars on GitHub - 1 maintainer
pds-benchmark 0.1.0
A minimal, Polars-focused data processing benchmark suite
1 version - Latest release: 3 months ago - 17 downloads last month - 1 maintainer
pipevine 0.1.5
A high-performance async pipeline processing library for Python
6 versions - Latest release: 6 months ago - 15 downloads last month - 1 star on GitHub - 1 maintainer
iflow-mcp_datamaster-mcp 1.0.4
DataMaster MCP - AI-powered data analysis tool with MCP protocol support
1 version - Latest release: 4 months ago - 1 maintainer
Top 5.8% on pypi.org
raydp-nightly 2025.12.4.dev0
RayDP: Distributed Data Processing on Ray
147 versions - Latest release: 4 months ago - 90 dependent repositories - 440 downloads last month - 348 stars on GitHub - 1 maintainer
Top 5.1% on pypi.org
padasip 1.2.2
Python Adaptive Signal Processing
13 versions - Latest release: over 3 years ago - 1 dependent package - 10 dependent repositories - 1.13 thousand downloads last month - 314 stars on GitHub - 1 maintainer
mathbox 0.0.8
A math toolbox.
6 versions - Latest release: over 3 years ago - 1 dependent repository - 17 downloads last month - 5 stars on GitHub - 1 maintainer
graphbook 0.13.3
The AI-driven data pipeline and workflow framework for data scientists and machine learning engin...
30 versions - Latest release: 12 months ago - 748 downloads last month - 47 stars on GitHub - 1 maintainer
graphbook_huggingface 0.0.6
Graphbook Hugging Face Plugin for no-code Hugging Face AI pipelines
5 versions - Latest release: 12 months ago - 25 downloads last month - 47 stars on GitHub - 1 maintainer
pylib-daterange 0.1.0
Generate date ranges, sequences, calendars. Date range utilities.
1 version - Latest release: 4 months ago - 14 downloads last month - 1 maintainer
sparklanes 0.2.4
A lightweight framework to build and execute data processing pipelines in pyspark (Apache Spark's...
5 versions - Latest release: about 7 years ago - 1 dependent repository - 25 downloads last month - 16 stars on GitHub - 1 maintainer
noob 1000.0.1
A graph processing library for processing graphs
4 versions - Latest release: 8 days ago - 253 downloads last month - 2 maintainers
pydatacore 1.1.2
A data library for handling temporal, frequency signals, and data pools.
12 versions - Latest release: over 1 year ago - 43 downloads last month - 1 star on GitHub - 1 maintainer
ssak 0.0.2
Toolbox for Speech Processing.
2 versions - Latest release: 5 months ago - 10 downloads last month - 5 stars on GitHub - 1 maintainer
dynamic-data-reduction 2025.12.1
Dynamic MapReduce framework for data processing
11 versions - Latest release: 4 months ago - 111 downloads last month - 0 stars on GitHub - 1 maintainer
pypabhiveagent 1.0.3
A Python package for querying Hive data and processing with AIGC applications
4 versions - Latest release: 4 months ago - 48 downloads last month - 1 maintainer
redis-message-queue 0.10.1
Python message queuing with Redis and message deduplication
14 versions - Latest release: 12 months ago - 210 downloads last month - 6 stars on GitHub - 1 maintainer
forklift-etl 0.1.4
A powerful data processing and schema generation tool with PyArrow streaming, validation, and S3 ...
3 versions - Latest release: 5 months ago - 33 downloads last month - 0 stars on GitHub - 1 maintainer
dtflow 0.5.13
A flexible data transformation tool for ML training formats (SFT, RLHF, Pretrain)
24 versions - Latest release: 12 days ago - 452 downloads last month - 1 maintainer
abracudabra 0.1.3
Convert Tensors, Arrays and DataFrames Between CPU and CUDA
4 versions - Latest release: about 1 year ago - 49 downloads last month - 1 star on gitlab.cern.ch - 1 maintainer
pathway 1.3.1
Pathway is a data processing framework which takes care of streaming data updates for you.
92 versions - Latest release: about 16 years ago - 1 dependent package - 1 dependent repository - 13.7 thousand downloads last month - 49,408 stars on GitHub - 4 maintainers
dft-pipeline 0.3.24
Data Flow Tools - flexible ETL pipeline framework
36 versions - Latest release: 9 months ago - 94 downloads last month - 1 maintainer
Top 5.2% on pypi.org
bonobo 0.6.4
Bonobo, a simple, modern and atomic extract-transform-load toolkit for python 3.5+.
37 versions - Latest release: almost 7 years ago - 35 dependent repositories - 20.8 thousand downloads last month - 1,592 stars on GitHub - 2 maintainers
gzeus 0.1.2
Polars IO plugin for reading compressed CSV/TSV files in a streaming fashion
3 versions - Latest release: 5 months ago - 103 downloads last month - 13 stars on GitHub - 1 maintainer
evaluation-service-base 0.1.4
A comprehensive framework for building evaluation services with progress tracking, task managemen...
5 versions - Latest release: 7 months ago - 19 downloads last month - 1 maintainer
pylib-serializer 0.1.0
Safe JSON/YAML serialization with circular-reference handling. Data processing utility.
1 version - Latest release: 4 months ago - 15 downloads last month - 1 maintainer
Top 6.6% on pypi.org
libertem 0.15.2
Open pixelated STEM framework
42 versions - Latest release: 8 months ago - 2 dependent packages - 3 dependent repositories - 1.51 thousand downloads last month - 122 stars on GitHub - 3 maintainers
xara-astro 1.0.1
A package for eXtreme Angular Resolution Astronomy
2 versions - Latest release: 3 months ago - 47 downloads last month - 11 stars on GitHub - 1 maintainer
datamaster-mcp-enhanced 1.0.0
DataMaster MCP Enhanced - AI-powered data analysis tool with MCP protocol support
1 version - Latest release: 6 months ago - 17 downloads last month - 9 stars on GitHub - 1 maintainer
cotk 0.1.0
Conversational Toolkits
3 versions - Latest release: over 5 years ago - 2 dependent repositories - 38 downloads last month - 128 stars on GitHub - 1 maintainer
spiderkit 0.1.9
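A Python toolkit for web-scraping and data-processing scenarios, covering encryption and decryption, data storage, asynchronous downloading, font parsing, and hashing tools
10 versions - Latest release: 2 months ago - 133 downloads last month - 1 maintainer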
Top 3.8% on pypi.org
pysparkling 0.6.2
Pure Python implementation of the Spark RDD interface.
69 versions - Latest release: over 3 years ago - 1 dependent package - 34 dependent repositories - 45.8 thousand downloads last month - 271 stars on GitHub - 2 maintainers
lineagemd 0.0.0
Lineage metadata for ML/AI/Data.
1 version - Latest release: over 3 years ago - 10 downloads last month - 452 stars on GitHub - 2 maintainers
tikara 0.1.6
The metadata and text content extractor for almost every file type.
6 versions - Latest release: about 1 year ago - 244 downloads last month - 4 stars on GitHub - 1 maintainer
climate-zarr 0.1.0
Interactive CLI toolkit for processing climate data with NetCDF to Zarr conversion and county-lev...
1 version - Latest release: 9 months ago - 19 downloads last month - 0 stars on GitHub - 1 maintainer
datamaster-mcp 1.1.0
DataMaster MCP - AI-powered data analysis tool with MCP protocol support
10 versions - Latest release: 6 months ago - 58 downloads last month - 8 stars on GitHub - 1 maintainer
prosto 0.6.0
Data processing toolkit radically changing the way data is processed
5 versions - Latest release: over 4 years ago - 1 dependent repository - 28 downloads last month - 91 stars on GitHub - 1 maintainer
Top 8.1% on pypi.org
flowquery 1.0.46
A declarative query language for data processing pipelines
46 versions - Latest release: 10 days ago - 2.24 thousand downloads last month - 7 stars on GitHub - 1 maintainer
carp-analytics-python 0.1.0
A high-performance Python library for processing and analysing data from CARP (Copenhagen Researc...
1 version - Latest release: 4 months ago - 23 downloads last month - 1 maintainer
python-pyper 0.4.4
Concurrent Python made simple
11 versions - Latest release: about 1 year ago - 173 downloads last month - 1,503 stars on GitHub - 1 maintainer
oect-infra 3.3.0
OECT (Organic Electrochemical Transistor) data processing infrastructure for experiment managemen...
21 versions - Latest release: 4 months ago - 156 downloads last month - 0 stars on GitHub - 1 maintainer
electiongraphs 0.3.4
Create graphs for displaying the result of an election based on a CSV input file.
4 versions - Latest release: over 2 years ago - 18 downloads last month - 0 stars on GitHub - 1 maintainer
dalla-data-p 0.1.4
Unified Arabic data processing pipeline with deduplication, stemming, quality checking, and reada...
1 version - Latest release: 4 months ago
reki 2025.7.2
A data preparation tool for CEMC/CMA.
6 versions - Latest release: 7 months ago - 1 dependent repository - 27 downloads last month - 18 stars on GitHub - 1 maintainer
pylib-optimize 0.1.0
Simple optimization solvers (gradient descent, LP). Machine learning utilities.
1 version - Latest release: 4 months ago - 15 downloads last month - 1 maintainer
sheetwise 2.8.0 💰
A Python package for encoding spreadsheets for Large Language Models, implementing the Spreadshee...
14 versions - Latest release: 3 months ago - 1.29 thousand downloads last month - 25 stars on GitHub - 1 maintainer
cwepr 0.5.1
Package for handling cw-EPR data.
10 versions - Latest release: about 2 years ago - 1 dependent repository - 60 downloads last month - 2 stars on GitHub - 2 maintainers
sofine 0.2.4
Lightweight framework for creating data-collection plugins and chaining together calls to them, f...
10 versions - Latest release: over 11 years ago - 2 dependent repositories - 27 downloads last month - 7 stars on GitHub - 1 maintainer
schemarrow 0.1.1a0 💰
A library for switching pandas backend to pyarrow
2 versions - Latest release: about 2 years ago - 15 downloads last month - 9 stars on GitHub - 1 maintainer
snow-workspace-extractor 0.1.2
Snowflake Workspace Extractor is a Python package that provides a simple way to extract and analyz...
3 versions - Latest release: 4 months ago - 27 downloads last month - 4 stars on GitHub - 1 maintainer
pyvectis 0.1.0
Async component pipeline framework for data processing
1 version - Latest release: 10 days ago - 1 maintainer
perke 0.4.4
A keyphrase extractor for Persian
13 versions - Latest release: over 2 years ago - 1 dependent repository - 60 downloads last month - 72 stars on GitHub - 1 maintainer
nekupload 1.1.1
Upload and validation pipeline for Nektar++ datasets to an online repository
7 versions - Latest release: 9 months ago - 16 downloads last month - 2 maintainers
antchain 0.0.7
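A functional-programming-style data processing pipeline library that supports method chaining and a variety of data processing operations
7 versions - Latest release: 5 months ago - 85 downloads last month - 1 maintainer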
fondant 1.0.0
Fondant - Large-scale data processing made easy and reusable
45 versions - Latest release: about 2 years ago - 1 dependent repository - 312 downloads last month - 354 stars on GitHub - 2 maintainers
vrl-python 0.1.0
Python bindings for Vector Remap Language (VRL)
1 version - Latest release: 6 months ago - 59 downloads last month - 1 maintainer
automatic-station 2.0.1
Automatic station data processing tool - for processing and modeling automatic station data
1 version - Latest release: 4 months ago - 27 downloads last month - 1 maintainer
thepipe 1.3.8
A lightweight, general purpose pipeline framework.
15 versions - Latest release: over 3 years ago - 1 dependent package - 2 dependent repositories - 461 downloads last month - 14 stars on GitHub - 2 maintainers
bonobo-selenium 0.1.1
Bonobo Selenium Extension
2 versions - Latest release: over 8 years ago - 51 downloads last month - 4 stars on GitHub - 2 maintainers
open-dataflow-adp 1.1.21
Modern Data Centric AI system for Large Language Models
32 versions - Latest release: 2 months ago - 390 downloads last month - 1,410 stars on GitHub - 1 maintainer
Top 7.7% on pypi.org
heat 1.7.0
A framework for high-performance data analytics and machine learning.
25 versions - Latest release: 3 months ago - 8 dependent repositories - 846 downloads last month - 223 stars on GitHub - 5 maintainers
smartpipeline 0.7.3
A framework for fast developing scalable data pipelines following a simple design pattern
11 versions - Latest release: over 2 years ago - 1 dependent repository - 89 downloads last month - 26 stars on GitHub - 1 maintainer
Related Keywords
machine-learning 89
python 87
data-science 65
ai 49
pipeline 37
ml 37
deep-learning 36
pytorch 33
nlp 33
etl 32
data 30
pandas 30
utilities 29
data-analysis 27
image-processing 26
csv 25
json 23
gpu 21
fast-data-pipeline 19
llm 19
excel 18
workflow 17
audio-processing 15
data-engineering 15
image-augmentation 14
streaming 14
gpu-tensorflow 14
automation 14
mxnet 14
neural-network 14
data-augmentation 14
data-cleaning 14
paddle 14
analytics 14
async 14
data-visualization 13
data-pipeline 12
database 11
pipelines 11
polars 10
python3 10
dataset 9
validation 9
numpy 9
natural-language-processing 9
parquet 9
data-processing-pipelines 8
api 8
cli 8
deduplication 8
spark 8
research 7
big-data 7
distributed 7
data-validation 7
kubernetes 7
data-preparation 7
computer-vision 7
data-analytics 7
real-time 7
performance 7
visualization 7
rust 7
data-preprocessing 7
mcp 6
multiprocessing 6
cuda 6
dataframe 6
tensorflow 6
stream-processing 6
data-pipelines 6
data-transformation 6
large-language-models 6
nvidia 5
converter 5
data-quality 5
duckdb 5
dali 5
cpp 5
sqlite 5
kafka 5
sql 5
mlops 5
parallel 5
compression 5
ray 5
text-processing 5
preprocessing 5
framework 5
openpyxl 5
graph 5
business-intelligence 5
python-library 5
data-management 5
data science 5
machine learning 5
postgresql 5
s3 4
jupyter 4
cloud 4