An open API service providing package, version and dependency metadata of many open source software ecosystems and registries.

pypi.org "evaluation-metrics" keyword

View the packages on the pypi.org package registry that are tagged with the "evaluation-metrics" keyword.

ir-metrics 0.1.6
The most common information retrieval (IR) metrics
15 versions - Latest release: about 4 years ago - 5 dependent repositories - 5.04 thousand downloads last month - 5 stars on GitHub - 1 maintainer
repsys-framework 0.4.1
Framework for developing and analyzing recommender systems.
21 versions - Latest release: over 1 year ago - 2 dependent repositories - 745 downloads last month - 36 stars on GitHub - 1 maintainer
Top 9.3% on pypi.org
deepeval 2.7.5
The LLM Evaluation Framework
384 versions - Latest release: about 24 hours ago - 7 dependent packages - 1 dependent repository - 456 thousand downloads last month - 5,915 stars on GitHub - 1 maintainer
lighteval 0.8.1
A lightweight and configurable evaluation package
13 versions - Latest release: 26 days ago - 6.06 thousand downloads last month - 1,429 stars on GitHub - 3 maintainers
daze 0.1.1
Better multi-class confusion matrix plots for Scikit-Learn, incorporating per-class and overall e...
3 versions - Latest release: about 4 years ago - 1 dependent repository - 92 downloads last month - 5 stars on GitHub - 1 maintainer
ctc-score 0.1.3
CTC: A Unified Framework for Evaluating Natural Language Generation
6 versions - Latest release: almost 3 years ago - 1 dependent repository - 416 downloads last month - 96 stars on GitHub - 1 maintainer
coreference-eval 0.0.2
Common metrics and evaluation tools for coreference chains (jsonline format)
2 versions - Latest release: over 2 years ago - 1 dependent repository - 71 downloads last month - 4 stars on GitHub - 1 maintainer
classeval 0.2.2 💰
Python package classeval
17 versions - Latest release: about 1 year ago - 2 dependent packages - 2 dependent repositories - 1.94 thousand downloads last month - 7 stars on GitHub - 1 maintainer
Top 1.7% on pypi.org
jiwer 3.1.0
Evaluate your speech-to-text system with similarity measures such as word error rate (WER)
22 versions - Latest release: 3 months ago - 43 dependent packages - 1,125 dependent repositories - 826 thousand downloads last month - 549 stars on GitHub - 1 maintainer
metric-eval 1.0.2
A Python package for evaluating evaluation metrics
3 versions - Latest release: over 1 year ago - 58 downloads last month - 11 stars on GitHub - 1 maintainer
ir_evaluation 1.1.0
Information retrieval evaluation metrics in pure python with zero dependencies
5 versions - Latest release: 3 months ago - 218 downloads last month - 8 stars on GitHub - 1 maintainer
gleu 1.1.0
GLEU: evaluation metric for grammatical error correction
2 versions - Latest release: almost 2 years ago - 119 downloads last month - 3 stars on GitHub - 1 maintainer
top-pr 0.2.1
TopP&R: Robust Support Estimation Approach for Evaluating Fidelity and Diversity in Generative Mo...
3 versions - Latest release: over 1 year ago - 1 dependent package - 581 downloads last month - 103 stars on GitHub - 1 maintainer
agentops 0.4.6
Observability and DevTool Platform for AI Agents
91 versions - Latest release: 12 days ago - 1 dependent package - 1 dependent repository - 64.9 thousand downloads last month - 1,639 stars on GitHub - 2 maintainers
distfuse 0.1.4
Compute DistFuse similarity scores from embedding models and APIs
5 versions - Latest release: 10 months ago - 241 downloads last month - 5 stars on GitHub - 1 maintainer
athina 1.7.35
Python SDK to configure and run evaluations for your LLM-based application
183 versions - Latest release: 4 days ago - 11.5 thousand downloads last month - 139 stars on GitHub - 1 maintainer
corec 1.1.5
A Context-Aware Recommendation Framework for Python
15 versions - Latest release: 7 days ago - 817 downloads last month - 537 stars on GitHub - 1 maintainer
cd-fvd 0.1.1
FVD calculation in PyTorch with I3D or VideoMAE models
3 versions - Latest release: 9 months ago - 1.3 thousand downloads last month - 103 stars on GitHub - 1 maintainer
valor-lite 0.34.3
Evaluate machine learning models.
25 versions - Latest release: 5 days ago - 987 downloads last month - 38 stars on GitHub - 1 maintainer
Top 6.9% on pypi.org
nf1 0.0.4
NF1: Normalized F1 score for community evaluation against ground truth
2 versions - Latest release: almost 4 years ago - 1 dependent package - 14 dependent repositories - 1.04 thousand downloads last month - 22 stars on GitHub - 1 maintainer
boost-loss 0.5.5 💰
Utilities for easy use of custom losses in CatBoost, LightGBM, XGBoost
21 versions - Latest release: about 1 year ago - 697 downloads last month - 9 stars on GitHub - 1 maintainer
testllm 0.14.1
Deep eval provides an evaluation platform to accelerate development of LLMs and Agents
1 version - Latest release: over 1 year ago - 59 downloads last month - 5,915 stars on GitHub - 1 maintainer
Top 3.8% on pypi.org
nervaluate 0.2.0 💰
NER evaluation considering partial match scoring
7 versions - Latest release: 11 months ago - 2 dependent packages - 20 dependent repositories - 42.6 thousand downloads last month - 130 stars on GitHub - 1 maintainer
ward-metrics 0.9.5
Tools for event-based evaluation for activity recognition problems.
6 versions - Latest release: about 7 years ago - 2 dependent repositories - 147 downloads last month - 6 stars on GitHub - 1 maintainer
Top 9.9% on pypi.org
permetrics 2.0.0
PerMetrics: A Framework of Performance Metrics for Machine Learning Models
20 versions - Latest release: about 1 year ago - 8 dependent packages - 1 dependent repository - 19.2 thousand downloads last month - 73 stars on GitHub - 1 maintainer
hbb2obb 1.0.0
Toolkit for converting horizontal bounding boxes to oriented bounding boxes using segmentation mo...
1 version - Latest release: 15 days ago - 123 downloads last month - 0 stars on GitHub - 1 maintainer
llmevals 0.1.0
Eval
2 versions - Latest release: over 1 year ago - 70 downloads last month - 3,548 stars on GitHub - 1 maintainer
Top 5.3% on pypi.org
ranx 0.3.20
ranx: A Blazing-Fast Python Library for Ranking Evaluation, Comparison, and Fusion
46 versions - Latest release: 10 months ago - 4 dependent packages - 7 dependent repositories - 25.3 thousand downloads last month - 537 stars on GitHub - 1 maintainer
bokbokbok 0.6.1
Custom Losses and Metrics for XGBoost, LightGBM, CatBoost
7 versions - Latest release: almost 4 years ago - 1 dependent repository - 272 downloads last month - 36 stars on GitHub - 1 maintainer
regressormetricgraphplot 0.0.3
A simple package for comparing different Regression Models and Plotting with their most common ev...
3 versions - Latest release: about 4 years ago - 1 dependent repository - 65 downloads last month - 6 stars on GitHub - 1 maintainer
faster-coco-eval 1.6.5
Faster interpretation of the original COCOEval
21 versions - Latest release: 6 months ago - 1 dependent package - 1 dependent repository - 45.8 thousand downloads last month - 75 stars on GitHub - 1 maintainer
continuous-eval 0.3.14
Open-Source Evaluation for GenAI Applications.
28 versions - Latest release: 4 months ago - 3.11 thousand downloads last month - 446 stars on GitHub - 1 maintainer
fightin-words 1.0.5
An implementation of Monroe et al.'s Fightin' Words Analysis
2 versions - Latest release: about 6 years ago - 2 dependent repositories - 100 downloads last month - 12 stars on GitHub - 1 maintainer
Top 5.2% on pypi.org
fast-bss-eval 0.1.4
Package for fast computation of BSS Eval metrics for source separation
7 versions - Latest release: almost 3 years ago - 3 dependent packages - 9 dependent repositories - 19.6 thousand downloads last month - 126 stars on GitHub - 1 maintainer
subsonar 1.0
Evaluate the quality of SRT files using the multilingual multimodal SONAR model.
1 version - Latest release: 11 months ago - 63 downloads last month - 13 stars on GitHub - 1 maintainer
Top 7.1% on pypi.org
pynlpl 1.2.9
PyNLPl, pronounced as 'pineapple', is a Python library for Natural Language Processing. It contai...
102 versions - Latest release: about 6 years ago - 1 dependent package - 4 dependent repositories - 2.52 thousand downloads last month - 477 stars on GitHub - 1 maintainer
Top 3.5% on pypi.org
rliable 1.2.0
rliable: Reliable evaluation on reinforcement learning and machine learning benchmarks.
11 versions - Latest release: 8 months ago - 6 dependent packages - 15 dependent repositories - 1.92 thousand downloads last month - 708 stars on GitHub - 2 maintainers
rliable-fork 1.2.0
rliable: Reliable evaluation on reinforcement learning and machine learning benchmarks.
1 version - Latest release: 8 months ago - 22 downloads last month - 708 stars on GitHub - 1 maintainer
skflex 1.0.2
skflex provides a suite of flexible utility functions for use with the sklearn library
4 versions - Latest release: over 3 years ago - 1 dependent repository - 170 downloads last month - 0 stars on GitHub - 1 maintainer
gan-evaluator 1.15
GAN Evaluator for IS and FID
9 versions - Latest release: about 2 years ago - 49 downloads last month - 10 stars on GitHub - 1 maintainer
synthetic-eval 0.1.4
Package for Evaluation of Synthetic Tabular Data Quality
8 versions - Latest release: 5 months ago - 159 downloads last month - 1 star on GitHub - 1 maintainer
skloverlay 1.2.0
SKLearn Classification Interface
5 versions - Latest release: over 1 year ago - 137 downloads last month - 0 stars on GitHub - 1 maintainer
guardrails-ai-unbabel-comet 2.2.1
High-quality Machine Translation Evaluation
1 version - Latest release: over 1 year ago - 1 dependent package - 70 downloads last month - 566 stars on GitHub - 1 maintainer
Top 3.5% on pypi.org
unbabel-comet 2.2.5
High-quality Machine Translation Evaluation
36 versions - Latest release: 24 days ago - 4 dependent packages - 45 dependent repositories - 34.8 thousand downloads last month - 566 stars on GitHub - 2 maintainers
semalex 1.3.4
A comprehensive evaluation metric designed to measure the weighted similarity score by prioritizi...
7 versions - Latest release: 8 months ago - 170 downloads last month - 2 stars on GitHub - 1 maintainer
cleval 0.1.1
cleval
2 versions - Latest release: over 1 year ago - 104 downloads last month - 185 stars on GitHub - 1 maintainer
echoswift 1.1.3
LLM Inference Benchmarking Tool
8 versions - Latest release: 7 months ago - 291 downloads last month - 6 stars on GitHub - 1 maintainer
tvalmetrics 1.0.2
RAG evaluation metrics.
6 versions - Latest release: over 1 year ago - 1 dependent package - 264 downloads last month - 245 stars on GitHub - 1 maintainer
clayrs 0.5.1
Complexly represent contents, build recommender systems, evaluate them. All in one place!
12 versions - Latest release: almost 2 years ago - 234 downloads last month - 35 stars on GitHub - 1 maintainer
Top 8.9% on pypi.org
codebleu 0.7.0
Unofficial CodeBLEU implementation, available on PyPI, that supports Linux, macOS and Windows.
14 versions - Latest release: 11 months ago - 3 dependent repositories - 7.97 thousand downloads last month - 61 stars on GitHub - 1 maintainer
aqudem 0.2.0
Activity and Sequence Detection Performance Measures: A package to evaluate activity detection re...
3 versions - Latest release: 6 months ago - 156 downloads last month - 1 star on GitHub - 1 maintainer
deepevals 0.2.0
Eval
1 version - Latest release: over 1 year ago - 63 downloads last month - 3,548 stars on GitHub - 1 maintainer
kolena 1.61.0
Client for Kolena's machine learning testing platform.
113 versions - Latest release: about 1 month ago - 1 dependent repository - 12 thousand downloads last month - 48 stars on GitHub - 1 maintainer
kolena-client 1.61.0
Client for Kolena's machine learning testing platform.
118 versions - Latest release: about 1 month ago - 2.25 thousand downloads last month - 45 stars on GitHub - 1 maintainer
rke-score 0.0.7
Compute Renyi Kernel Entropy scores (RKE-MC and RRKE) for two sets of vectors.
5 versions - Latest release: over 1 year ago - 190 downloads last month - 11 stars on GitHub - 1 maintainer
mini-judge 0.4.1
Simple implementation of LLM-As-Judge for pairwise evaluation of Q&A models
6 versions - Latest release: over 1 year ago - 188 downloads last month - 3 stars on GitHub - 1 maintainer
nlptutti 0.0.2
NLP measurement package
10 versions - Latest release: over 2 years ago - 1 dependent repository - 964 downloads last month - 62 stars on GitHub - 1 maintainer
tvallogging 1.0.0
Logging for Tonic Validate
4 versions - Latest release: over 1 year ago - 146 downloads last month - 245 stars on GitHub - 1 maintainer
evalify 1.0.0
Evaluate your face or voice verification models literally in seconds.
6 versions - Latest release: 5 months ago - 1 dependent repository - 214 downloads last month - 19 stars on GitHub - 1 maintainer
nereval 0.2.5
Evaluation script for named entity recognition systems based on F1 score.
3 versions - Latest release: almost 7 years ago - 1 dependent repository - 250 downloads last month - 70 stars on GitHub - 1 maintainer
hmeasure 0.1.6
H-Measure Classification Metric
7 versions - Latest release: about 4 years ago - 447 downloads last month - 6 stars on GitHub - 1 maintainer
debobo 0.1.6
Package for evaluating object detection models
6 versions - Latest release: almost 6 years ago - 1 dependent repository - 157 downloads last month - 1 star on GitHub - 1 maintainer
pate 0.1.1
PATE: Proximity-Aware Time series anomaly Evaluation metric
2 versions - Latest release: 11 months ago - 88 downloads last month - 11 stars on GitHub - 1 maintainer
tonic-validate 6.2.0
RAG evaluation metrics.
26 versions - Latest release: 5 months ago - 2 dependent packages - 1.24 thousand downloads last month - 245 stars on GitHub - 1 maintainer
easy-lm-eval 0.1.2
A library for easy evaluation of language models
3 versions - Latest release: about 1 year ago - 86 downloads last month - 3 stars on GitHub - 1 maintainer
falcon-evaluate 0.1.6
Falcon Evaluate is an open-source Python library designed to simplify the process of evaluating a...
17 versions - Latest release: over 1 year ago - 1 dependent package - 584 downloads last month - 7 stars on GitHub - 1 maintainer
v-stream 0.1.2
STREAM: Spatio-TempoRal Evaluation and Analysis Metric for Video Generative Models
8 versions - Latest release: about 1 year ago - 268 downloads last month - 26 stars on GitHub - 1 maintainer
pytolemaic 0.15.4
Package for ML model analysis
56 versions - Latest release: almost 3 years ago - 1 dependent repository - 899 downloads last month - 11 stars on GitHub - 1 maintainer
Top 5.6% on pypi.org
prdc 0.2
Compute precision, recall, density, and coverage metrics for two sets of vectors.
1 version - Latest release: about 5 years ago - 2 dependent packages - 17 dependent repositories - 1.26 thousand downloads last month - 254 stars on GitHub - 1 maintainer
colortransferlib 2.0.1
This library provides color and style transfer algorithms which were published in scientific paper...
8 versions - Latest release: about 2 months ago - 289 downloads last month - 6 stars on GitHub - 1 maintainer
probability-calibration 0.0.1
Utilities to calibrate model outcome probability and evaluate calibration.
1 version - Latest release: about 4 years ago - 1 dependent repository - 56 downloads last month - 1 star on GitHub - 1 maintainer
Top 5.6% on pypi.org
octis 1.14.0
OCTIS: a library for Optimizing and Comparing Topic Models.
35 versions - Latest release: 9 months ago - 2 dependent packages - 3 dependent repositories - 3.24 thousand downloads last month - 690 stars on GitHub - 1 maintainer
Top 4.6% on pypi.org
pyrouge 0.1.3
A Python wrapper for the ROUGE summarization evaluation package.
4 versions - Latest release: over 1 year ago - 339 dependent repositories - 5.03 thousand downloads last month - 248 stars on GitHub - 1 maintainer
quica 0.2.5
Quick Inter Coder Agreement in Python
6 versions - Latest release: over 4 years ago - 1 dependent repository - 209 downloads last month - 23 stars on GitHub - 1 maintainer
frd-score 0.0.2
Package for calculating Fréchet Radiomics Distance (FRD)
2 versions - Latest release: 10 months ago - 74 downloads last month - 2 stars on GitHub - 1 maintainer
survivaleval 0.4.1
The most comprehensive Python package for evaluating survival analysis models.
9 versions - Latest release: about 1 month ago - 813 downloads last month - 33 stars on GitHub - 1 maintainer
rank-eval 0.1.3
rank_eval: A Blazing Fast Python Library for Ranking Evaluation and Comparison
5 versions - Latest release: over 3 years ago - 1 dependent repository - 389 downloads last month - 524 stars on GitHub - 1 maintainer
guap 0.1.4
Open-source evaluation metric for linking Machine Learning model outputs with Business outcomes
4 versions - Latest release: almost 4 years ago - 1 dependent repository - 125 downloads last month - 23 stars on GitHub - 1 maintainer
rankeval 0.8.2
Tool for the analysis and evaluation of Learning to Rank models based on ensembles of regression ...
8 versions - Latest release: over 5 years ago - 1 dependent repository - 756 downloads last month - 88 stars on GitHub - 1 maintainer
testclayrs 0.1.1
Complexly represent contents, build recommender systems, evaluate them. All in one place!
2 versions - Latest release: almost 3 years ago - 8 stars on GitHub
tutti-nlp 0.0.1
NLP measurement package
1 version - Latest release: over 2 years ago - 14 stars on GitHub
Related Keywords
evaluation (32), evaluation-framework (19), machine-learning (18), python (18), nlp (12), metrics (9), llm (8), llmops (7), rag (6), information-retrieval (6), llm-evaluation (6), natural-language-processing (5), large-language-models (5), classification (5), retrieval-augmented-generation (4), deep-learning (4), scikit-learn (4), generative-model (4), data-science (4), recommender-systems (4), llms (4), llm-evaluation-metrics (4), llm-evaluation-framework (4), recommender-system (3), word-error-rate (3), ai (3), recall (3), regression (3), evaluate (3), pytorch (3), comparison (3), data-fusion (3), information-retrieval-evaluation (3), information-retrieval-metrics (3), metasearch (3), numba (3), rank-fusion (3), ranking-metrics (3), score-fusion (3), object-detection (3), mlops (3), generative-adversarial-network (3), computer-vision (3), precision (3), artificial-intelligence (3), plot (3), evaluation-functions (3), speech-to-text (3), wer (3), mae (2), Unbabel (2), rmse (2), Evaluation (2), activity-recognition (2), activity recognition (2), Machine Translation (2), ner (2), named-entity-recognition (2), xgboost (2), synthetic-data (2), sklearn (2), linguistics (2), lightgbm (2), rl (2), image-segmentation (2), nlp-library (2), benchmarking (2), reproducibility (2), research (2), reinforcement (2), reinforcement-learning (2), machine (2), custom-loss-functions (2), learning (2), google (2), information retrieval (2), trec_eval (2), content-based-recommendation (2), graph-based-recommendation (2), ranking (2), hyperparameter-optimization (2), Kolena (2), machine-translation (2), ML (2), testing (2), evaluate-models (2), diversity (2), coefficient-of-determination (2), classification-report (2), amazon (2), aws (2), COMET (2), cer (2), character-error-rate (2), computing-error-rates (2), korean (2), normalization (2), speech-analysis (2), speech-recognition (2), test (2)