pypi.org "evaluation-metrics" keyword
View the packages on the pypi.org package registry that are tagged with the "evaluation-metrics" keyword.
ir-metrics 0.1.6
The most common information retrieval (IR) metrics
15 versions - Latest release: about 4 years ago - 5 dependent repositories - 5.04 thousand downloads last month - 5 stars on GitHub - 1 maintainer
repsys-framework 0.4.1
Framework for developing and analyzing recommender systems.
21 versions - Latest release: over 1 year ago - 2 dependent repositories - 745 downloads last month - 36 stars on GitHub - 1 maintainer
deepeval 2.7.5
The LLM Evaluation Framework
384 versions - Latest release: about 24 hours ago - 7 dependent packages - 1 dependent repository - 456 thousand downloads last month - 5,915 stars on GitHub - 1 maintainer - Top 9.3% on pypi.org
lighteval 0.8.1
A lightweight and configurable evaluation package
13 versions - Latest release: 26 days ago - 6.06 thousand downloads last month - 1,429 stars on GitHub - 3 maintainers
daze 0.1.1
Better multi-class confusion matrix plots for Scikit-Learn, incorporating per-class and overall e...
3 versions - Latest release: about 4 years ago - 1 dependent repository - 92 downloads last month - 5 stars on GitHub - 1 maintainer
ctc-score 0.1.3
CTC: A Unified Framework for Evaluating Natural Language Generation
6 versions - Latest release: almost 3 years ago - 1 dependent repository - 416 downloads last month - 96 stars on GitHub - 1 maintainer
coreference-eval 0.0.2
Common metrics and evaluation tools for coreference chains (jsonline format)
2 versions - Latest release: over 2 years ago - 1 dependent repository - 71 downloads last month - 4 stars on GitHub - 1 maintainer
classeval 0.2.2 💰
Python package classeval
17 versions - Latest release: about 1 year ago - 2 dependent packages - 2 dependent repositories - 1.94 thousand downloads last month - 7 stars on GitHub - 1 maintainer
jiwer 3.1.0
Evaluate your speech-to-text system with similarity measures such as word error rate (WER)
22 versions - Latest release: 3 months ago - 43 dependent packages - 1,125 dependent repositories - 826 thousand downloads last month - 549 stars on GitHub - 1 maintainer - Top 1.7% on pypi.org
metric-eval 1.0.2
A Python package for evaluating evaluation metrics
3 versions - Latest release: over 1 year ago - 58 downloads last month - 11 stars on GitHub - 1 maintainer
ir_evaluation 1.1.0
Information retrieval evaluation metrics in pure Python with zero dependencies
5 versions - Latest release: 3 months ago - 218 downloads last month - 8 stars on GitHub - 1 maintainer
gleu 1.1.0
GLEU: evaluation metric for grammatical error correction
2 versions - Latest release: almost 2 years ago - 119 downloads last month - 3 stars on GitHub - 1 maintainer
top-pr 0.2.1
TopP&R: Robust Support Estimation Approach for Evaluating Fidelity and Diversity in Generative Mo...
3 versions - Latest release: over 1 year ago - 1 dependent package - 581 downloads last month - 103 stars on GitHub - 1 maintainer
agentops 0.4.6
Observability and DevTool Platform for AI Agents
91 versions - Latest release: 12 days ago - 1 dependent package - 1 dependent repository - 64.9 thousand downloads last month - 1,639 stars on GitHub - 2 maintainers
distfuse 0.1.4
Compute DistFuse similarity scores from embedding models and APIs
5 versions - Latest release: 10 months ago - 241 downloads last month - 5 stars on GitHub - 1 maintainer
athina 1.7.35
Python SDK to configure and run evaluations for your LLM-based application
183 versions - Latest release: 4 days ago - 11.5 thousand downloads last month - 139 stars on GitHub - 1 maintainer
corec 1.1.5
A Context-Aware Recommendation Framework for Python
15 versions - Latest release: 7 days ago - 817 downloads last month - 537 stars on GitHub - 1 maintainer
cd-fvd 0.1.1
FVD calculation in PyTorch with I3D or VideoMAE models
3 versions - Latest release: 9 months ago - 1.3 thousand downloads last month - 103 stars on GitHub - 1 maintainer
valor-lite 0.34.3
Evaluate machine learning models.
25 versions - Latest release: 5 days ago - 987 downloads last month - 38 stars on GitHub - 1 maintainer
nf1 0.0.4
NF1: Normalized F1 score for community evaluation against ground truth
2 versions - Latest release: almost 4 years ago - 1 dependent package - 14 dependent repositories - 1.04 thousand downloads last month - 22 stars on GitHub - 1 maintainer - Top 6.9% on pypi.org
boost-loss 0.5.5 💰
Utilities for easy use of custom losses in CatBoost, LightGBM, XGBoost
21 versions - Latest release: about 1 year ago - 697 downloads last month - 9 stars on GitHub - 1 maintainer
testllm 0.14.1
Deep eval provides evaluation platform to accelerate development of LLMs and Agents
1 version - Latest release: over 1 year ago - 59 downloads last month - 5,915 stars on GitHub - 1 maintainer
nervaluate 0.2.0 💰
NER evaluation considering partial match scoring
7 versions - Latest release: 11 months ago - 2 dependent packages - 20 dependent repositories - 42.6 thousand downloads last month - 130 stars on GitHub - 1 maintainer - Top 3.8% on pypi.org
ward-metrics 0.9.5
Tools for event-based evaluation for activity recognition problems.
6 versions - Latest release: about 7 years ago - 2 dependent repositories - 147 downloads last month - 6 stars on GitHub - 1 maintainer
permetrics 2.0.0
PerMetrics: A Framework of Performance Metrics for Machine Learning Models
20 versions - Latest release: about 1 year ago - 8 dependent packages - 1 dependent repository - 19.2 thousand downloads last month - 73 stars on GitHub - 1 maintainer - Top 9.9% on pypi.org
hbb2obb 1.0.0
Toolkit for converting horizontal bounding boxes to oriented bounding boxes using segmentation mo...
1 version - Latest release: 15 days ago - 123 downloads last month - 0 stars on GitHub - 1 maintainer
llmevals 0.1.0
Eval
2 versions - Latest release: over 1 year ago - 70 downloads last month - 3,548 stars on GitHub - 1 maintainer
ranx 0.3.20
ranx: A Blazing-Fast Python Library for Ranking Evaluation, Comparison, and Fusion
46 versions - Latest release: 10 months ago - 4 dependent packages - 7 dependent repositories - 25.3 thousand downloads last month - 537 stars on GitHub - 1 maintainer - Top 5.3% on pypi.org
bokbokbok 0.6.1
Custom Losses and Metrics for XGBoost, LightGBM, CatBoost
7 versions - Latest release: almost 4 years ago - 1 dependent repository - 272 downloads last month - 36 stars on GitHub - 1 maintainer
regressormetricgraphplot 0.0.3
A simple package for comparing different Regression Models and Plotting with their most common ev...
3 versions - Latest release: about 4 years ago - 1 dependent repository - 65 downloads last month - 6 stars on GitHub - 1 maintainer
faster-coco-eval 1.6.5
Faster interpretation of the original COCOEval
21 versions - Latest release: 6 months ago - 1 dependent package - 1 dependent repository - 45.8 thousand downloads last month - 75 stars on GitHub - 1 maintainer
continuous-eval 0.3.14
Open-Source Evaluation for GenAI Applications.
28 versions - Latest release: 4 months ago - 3.11 thousand downloads last month - 446 stars on GitHub - 1 maintainer
fightin-words 1.0.5
An implementation of Monroe et al.'s Fightin' Words Analysis
2 versions - Latest release: about 6 years ago - 2 dependent repositories - 100 downloads last month - 12 stars on GitHub - 1 maintainer
fast-bss-eval 0.1.4
Package for fast computation of BSS Eval metrics for source separation
7 versions - Latest release: almost 3 years ago - 3 dependent packages - 9 dependent repositories - 19.6 thousand downloads last month - 126 stars on GitHub - 1 maintainer - Top 5.2% on pypi.org
subsonar 1.0
Evaluate the quality of SRT files using the multilingual multimodal SONAR model.
1 version - Latest release: 11 months ago - 63 downloads last month - 13 stars on GitHub - 1 maintainer
pynlpl 1.2.9
PyNLPl, pronounced as 'pineapple', is a Python library for Natural Language Processing. It contai...
102 versions - Latest release: about 6 years ago - 1 dependent package - 4 dependent repositories - 2.52 thousand downloads last month - 477 stars on GitHub - 1 maintainer - Top 7.1% on pypi.org
rliable 1.2.0
rliable: Reliable evaluation on reinforcement learning and machine learning benchmarks.
11 versions - Latest release: 8 months ago - 6 dependent packages - 15 dependent repositories - 1.92 thousand downloads last month - 708 stars on GitHub - 2 maintainers - Top 3.5% on pypi.org
rliable-fork 1.2.0
rliable: Reliable evaluation on reinforcement learning and machine learning benchmarks.
1 version - Latest release: 8 months ago - 22 downloads last month - 708 stars on GitHub - 1 maintainer
skflex 1.0.2
skflex provides a suite of flexible utility functions for use with the sklearn library
4 versions - Latest release: over 3 years ago - 1 dependent repository - 170 downloads last month - 0 stars on GitHub - 1 maintainer
gan-evaluator 1.15
GAN Evaluator for IS and FID
9 versions - Latest release: about 2 years ago - 49 downloads last month - 10 stars on GitHub - 1 maintainer
synthetic-eval 0.1.4
Package for Evaluation of Synthetic Tabular Data Quality
8 versions - Latest release: 5 months ago - 159 downloads last month - 1 star on GitHub - 1 maintainer
skloverlay 1.2.0
SKLearn Classification Interface
5 versions - Latest release: over 1 year ago - 137 downloads last month - 0 stars on GitHub - 1 maintainer
guardrails-ai-unbabel-comet 2.2.1
High-quality Machine Translation Evaluation
1 version - Latest release: over 1 year ago - 1 dependent package - 70 downloads last month - 566 stars on GitHub - 1 maintainer
unbabel-comet 2.2.5
High-quality Machine Translation Evaluation
36 versions - Latest release: 24 days ago - 4 dependent packages - 45 dependent repositories - 34.8 thousand downloads last month - 566 stars on GitHub - 2 maintainers - Top 3.5% on pypi.org
semalex 1.3.4
A comprehensive evaluation metric designed to measure the weighted similarity score by prioritizi...
7 versions - Latest release: 8 months ago - 170 downloads last month - 2 stars on GitHub - 1 maintainer
cleval 0.1.1
cleval
2 versions - Latest release: over 1 year ago - 104 downloads last month - 185 stars on GitHub - 1 maintainer
echoswift 1.1.3
LLM Inference Benchmarking Tool
8 versions - Latest release: 7 months ago - 291 downloads last month - 6 stars on GitHub - 1 maintainer
tvalmetrics 1.0.2
RAG evaluation metrics.
6 versions - Latest release: over 1 year ago - 1 dependent package - 264 downloads last month - 245 stars on GitHub - 1 maintainer
clayrs 0.5.1
Complexly represent contents, build recommender systems, evaluate them. All in one place!
12 versions - Latest release: almost 2 years ago - 234 downloads last month - 35 stars on GitHub - 1 maintainer
codebleu 0.7.0
Unofficial CodeBLEU implementation that supports Linux, MacOS and Windows available on PyPI.
14 versions - Latest release: 11 months ago - 3 dependent repositories - 7.97 thousand downloads last month - 61 stars on GitHub - 1 maintainer - Top 8.9% on pypi.org
aqudem 0.2.0
Activity and Sequence Detection Performance Measures: A package to evaluate activity detection re...
3 versions - Latest release: 6 months ago - 156 downloads last month - 1 star on GitHub - 1 maintainer
deepevals 0.2.0
Eval
1 version - Latest release: over 1 year ago - 63 downloads last month - 3,548 stars on GitHub - 1 maintainer
kolena 1.61.0
Client for Kolena's machine learning testing platform.
113 versions - Latest release: about 1 month ago - 1 dependent repository - 12 thousand downloads last month - 48 stars on GitHub - 1 maintainer
kolena-client 1.61.0
Client for Kolena's machine learning testing platform.
118 versions - Latest release: about 1 month ago - 2.25 thousand downloads last month - 45 stars on GitHub - 1 maintainer
rke-score 0.0.7
Compute Renyi Kernel Entropy scores (RKE-MC and RRKE) for two sets of vectors.
5 versions - Latest release: over 1 year ago - 190 downloads last month - 11 stars on GitHub - 1 maintainer
mini-judge 0.4.1
Simple implementation of LLM-As-Judge for pairwise evaluation of Q&A models
6 versions - Latest release: over 1 year ago - 188 downloads last month - 3 stars on GitHub - 1 maintainer
nlptutti 0.0.2
nlp measurement package
10 versions - Latest release: over 2 years ago - 1 dependent repository - 964 downloads last month - 62 stars on GitHub - 1 maintainer
tvallogging 1.0.0
Logging for Tonic Validate
4 versions - Latest release: over 1 year ago - 146 downloads last month - 245 stars on GitHub - 1 maintainer
evalify 1.0.0
Evaluate your face or voice verification models literally in seconds.
6 versions - Latest release: 5 months ago - 1 dependent repository - 214 downloads last month - 19 stars on GitHub - 1 maintainer
nereval 0.2.5
Evaluation script for named entity recognition systems based on F1 score.
3 versions - Latest release: almost 7 years ago - 1 dependent repository - 250 downloads last month - 70 stars on GitHub - 1 maintainer
hmeasure 0.1.6
H-Measure Classification Metric
7 versions - Latest release: about 4 years ago - 447 downloads last month - 6 stars on GitHub - 1 maintainer
debobo 0.1.6
Package for evaluating object detection models
6 versions - Latest release: almost 6 years ago - 1 dependent repository - 157 downloads last month - 1 star on GitHub - 1 maintainer
pate 0.1.1
PATE: Proximity-Aware Time series anomaly Evaluation metric
2 versions - Latest release: 11 months ago - 88 downloads last month - 11 stars on GitHub - 1 maintainer
tonic-validate 6.2.0
RAG evaluation metrics.
26 versions - Latest release: 5 months ago - 2 dependent packages - 1.24 thousand downloads last month - 245 stars on GitHub - 1 maintainer
easy-lm-eval 0.1.2
A library for easy evaluation of language models
3 versions - Latest release: about 1 year ago - 86 downloads last month - 3 stars on GitHub - 1 maintainer
falcon-evaluate 0.1.6
Falcon Evaluate is an open-source Python library designed to simplify the process of evaluating a...
17 versions - Latest release: over 1 year ago - 1 dependent package - 584 downloads last month - 7 stars on GitHub - 1 maintainer
v-stream 0.1.2
STREAM: Spatio-TempoRal Evaluation and Analysis Metric for Video Generative Models
8 versions - Latest release: about 1 year ago - 268 downloads last month - 26 stars on GitHub - 1 maintainer
pytolemaic 0.15.4
Package for ML model analysis
56 versions - Latest release: almost 3 years ago - 1 dependent repository - 899 downloads last month - 11 stars on GitHub - 1 maintainer
prdc 0.2
Compute precision, recall, density, and coverage metrics for two sets of vectors.
1 version - Latest release: about 5 years ago - 2 dependent packages - 17 dependent repositories - 1.26 thousand downloads last month - 254 stars on GitHub - 1 maintainer - Top 5.6% on pypi.org
colortransferlib 2.0.1
This library provides color and style transfer algorithms which were published in scientific paper...
8 versions - Latest release: about 2 months ago - 289 downloads last month - 6 stars on GitHub - 1 maintainer
probability-calibration 0.0.1
Utilities to calibrate model outcome probability and evaluate calibration.
1 version - Latest release: about 4 years ago - 1 dependent repository - 56 downloads last month - 1 star on GitHub - 1 maintainer
octis 1.14.0
OCTIS: a library for Optimizing and Comparing Topic Models.
35 versions - Latest release: 9 months ago - 2 dependent packages - 3 dependent repositories - 3.24 thousand downloads last month - 690 stars on GitHub - 1 maintainer - Top 5.6% on pypi.org
pyrouge 0.1.3
A Python wrapper for the ROUGE summarization evaluation package.
4 versions - Latest release: over 1 year ago - 339 dependent repositories - 5.03 thousand downloads last month - 248 stars on GitHub - 1 maintainer - Top 4.6% on pypi.org
quica 0.2.5
Quick Inter Coder Agreement in Python
6 versions - Latest release: over 4 years ago - 1 dependent repository - 209 downloads last month - 23 stars on GitHub - 1 maintainer
frd-score 0.0.2
Package for calculating Fréchet Radiomics Distance (FRD)
2 versions - Latest release: 10 months ago - 74 downloads last month - 2 stars on GitHub - 1 maintainer
survivaleval 0.4.1
The most comprehensive Python package for evaluating survival analysis models.
9 versions - Latest release: about 1 month ago - 813 downloads last month - 33 stars on GitHub - 1 maintainer
rank-eval 0.1.3
rank_eval: A Blazing Fast Python Library for Ranking Evaluation and Comparison
5 versions - Latest release: over 3 years ago - 1 dependent repository - 389 downloads last month - 524 stars on GitHub - 1 maintainer
guap 0.1.4
Open-source evaluation metric for linking Machine Learning model outputs with Business outcomes
4 versions - Latest release: almost 4 years ago - 1 dependent repository - 125 downloads last month - 23 stars on GitHub - 1 maintainer
rankeval 0.8.2
Tool for the analysis and evaluation of Learning to Rank models based on ensembles of regression ...
8 versions - Latest release: over 5 years ago - 1 dependent repository - 756 downloads last month - 88 stars on GitHub - 1 maintainer
testclayrs 0.1.1
Complexly represent contents, build recommender systems, evaluate them. All in one place!
2 versions - Latest release: almost 3 years ago - 8 stars on GitHub
tutti-nlp 0.0.1
nlp measurement package
1 version - Latest release: over 2 years ago - 14 stars on GitHub
Related Keywords
evaluation (32)
evaluation-framework (19)
machine-learning (18)
python (18)
nlp (12)
metrics (9)
llm (8)
llmops (7)
rag (6)
information-retrieval (6)
llm-evaluation (6)
natural-language-processing (5)
large-language-models (5)
classification (5)
retrieval-augmented-generation (4)
deep-learning (4)
scikit-learn (4)
generative-model (4)
data-science (4)
recommender-systems (4)
llms (4)
llm-evaluation-metrics (4)
llm-evaluation-framework (4)
recommender-system (3)
word-error-rate (3)
ai (3)
recall (3)
regression (3)
evaluate (3)
pytorch (3)
comparison (3)
data-fusion (3)
information-retrieval-evaluation (3)
information-retrieval-metrics (3)
metasearch (3)
numba (3)
rank-fusion (3)
ranking-metrics (3)
score-fusion (3)
object-detection (3)
mlops (3)
generative-adversarial-network (3)
computer-vision (3)
precision (3)
artificial-intelligence (3)
plot (3)
evaluation-functions (3)
speech-to-text (3)
wer (3)
mae (2)
Unbabel (2)
rmse (2)
Evaluation (2)
activity-recognition (2)
activity recognition (2)
Machine Translation (2)
ner (2)
named-entity-recognition (2)
xgboost (2)
synthetic-data (2)
sklearn (2)
linguistics (2)
lightgbm (2)
rl (2)
image-segmentation (2)
nlp-library (2)
benchmarking (2)
reproducibility (2)
research (2)
reinforcement (2)
reinforcement-learning (2)
machine (2)
custom-loss-functions (2)
learning (2)
google (2)
information retrieval (2)
trec_eval (2)
content-based-recommendation (2)
graph-based-recommendation (2)
ranking (2)
hyperparameter-optimization (2)
Kolena (2)
machine-translation (2)
ML (2)
testing (2)
evaluate-models (2)
diversity (2)
coefficient-of-determination (2)
classification-report (2)
amazon (2)
aws (2)
COMET (2)
cer (2)
character-error-rate (2)
computing-error-rates (2)
korean (2)
normalization (2)
speech-analysis (2)
speech-recognition (2)
test (2)