evaluation-metrics | pypi.org keywords

pypi.org "evaluation-metrics" keyword

View the packages on the pypi.org package registry that are tagged with the "evaluation-metrics" keyword.

ir-metrics 0.1.6

The most common information retrieval (IR) metrics
15 versions - Latest release: about 4 years ago - 5 dependent repositories - 5.04 thousand downloads last month - 5 stars on GitHub - 1 maintainer

repsys-framework 0.4.1

Framework for developing and analyzing recommender systems.
21 versions - Latest release: over 1 year ago - 2 dependent repositories - 745 downloads last month - 36 stars on GitHub - 1 maintainer

Top 9.3% on pypi.org

deepeval 2.7.5

The LLM Evaluation Framework
384 versions - Latest release: about 24 hours ago - 7 dependent packages - 1 dependent repository - 456 thousand downloads last month - 5,915 stars on GitHub - 1 maintainer

lighteval 0.8.1

A lightweight and configurable evaluation package
13 versions - Latest release: 26 days ago - 6.06 thousand downloads last month - 1,429 stars on GitHub - 3 maintainers

daze 0.1.1

Better multi-class confusion matrix plots for Scikit-Learn, incorporating per-class and overall e...
3 versions - Latest release: about 4 years ago - 1 dependent repository - 92 downloads last month - 5 stars on GitHub - 1 maintainer

ctc-score 0.1.3

CTC: A Unified Framework for Evaluating Natural Language Generation
6 versions - Latest release: almost 3 years ago - 1 dependent repository - 416 downloads last month - 96 stars on GitHub - 1 maintainer

coreference-eval 0.0.2

Common metrics and evaluation tools for coreference chains (jsonline format)
2 versions - Latest release: over 2 years ago - 1 dependent repository - 71 downloads last month - 4 stars on GitHub - 1 maintainer

classeval 0.2.2 💰

Python package classeval
17 versions - Latest release: about 1 year ago - 2 dependent packages - 2 dependent repositories - 1.94 thousand downloads last month - 7 stars on GitHub - 1 maintainer

Top 1.7% on pypi.org

jiwer 3.1.0

Evaluate your speech-to-text system with similarity measures such as word error rate (WER)
22 versions - Latest release: 3 months ago - 43 dependent packages - 1,125 dependent repositories - 826 thousand downloads last month - 549 stars on GitHub - 1 maintainer

metric-eval 1.0.2

A Python package for evaluating evaluation metrics
3 versions - Latest release: over 1 year ago - 58 downloads last month - 11 stars on GitHub - 1 maintainer

ir_evaluation 1.1.0

Information retrieval evaluation metrics in pure Python with zero dependencies
5 versions - Latest release: 3 months ago - 218 downloads last month - 8 stars on GitHub - 1 maintainer

gleu 1.1.0

GLEU: evaluation metric for grammatical error correction
2 versions - Latest release: almost 2 years ago - 119 downloads last month - 3 stars on GitHub - 1 maintainer

top-pr 0.2.1

TopP&R: Robust Support Estimation Approach for Evaluating Fidelity and Diversity in Generative Mo...
3 versions - Latest release: over 1 year ago - 1 dependent package - 581 downloads last month - 103 stars on GitHub - 1 maintainer

agentops 0.4.6

Observability and DevTool Platform for AI Agents
91 versions - Latest release: 12 days ago - 1 dependent package - 1 dependent repository - 64.9 thousand downloads last month - 1,639 stars on GitHub - 2 maintainers

distfuse 0.1.4

Compute DistFuse similarity scores from embedding models and APIs
5 versions - Latest release: 10 months ago - 241 downloads last month - 5 stars on GitHub - 1 maintainer

athina 1.7.35

Python SDK to configure and run evaluations for your LLM-based application
183 versions - Latest release: 4 days ago - 11.5 thousand downloads last month - 139 stars on GitHub - 1 maintainer

corec 1.1.5

A Context-Aware Recommendation Framework for Python
15 versions - Latest release: 7 days ago - 817 downloads last month - 537 stars on GitHub - 1 maintainer

cd-fvd 0.1.1

FVD calculation in PyTorch with I3D or VideoMAE models
3 versions - Latest release: 9 months ago - 1.3 thousand downloads last month - 103 stars on GitHub - 1 maintainer

valor-lite 0.34.3

Evaluate machine learning models.
25 versions - Latest release: 5 days ago - 987 downloads last month - 38 stars on GitHub - 1 maintainer

Top 6.9% on pypi.org

nf1 0.0.4

NF1: Normalized F1 score for community evaluation against ground truth
2 versions - Latest release: almost 4 years ago - 1 dependent package - 14 dependent repositories - 1.04 thousand downloads last month - 22 stars on GitHub - 1 maintainer

boost-loss 0.5.5 💰

Utilities for easy use of custom losses in CatBoost, LightGBM, XGBoost
21 versions - Latest release: about 1 year ago - 697 downloads last month - 9 stars on GitHub - 1 maintainer

testllm 0.14.1

Deep eval provides an evaluation platform to accelerate development of LLMs and agents
1 version - Latest release: over 1 year ago - 59 downloads last month - 5,915 stars on GitHub - 1 maintainer

Top 3.8% on pypi.org

nervaluate 0.2.0 💰

NER evaluation considering partial match scoring
7 versions - Latest release: 11 months ago - 2 dependent packages - 20 dependent repositories - 42.6 thousand downloads last month - 130 stars on GitHub - 1 maintainer

ward-metrics 0.9.5

Tools for event-based evaluation for activity recognition problems.
6 versions - Latest release: about 7 years ago - 2 dependent repositories - 147 downloads last month - 6 stars on GitHub - 1 maintainer

Top 9.9% on pypi.org

permetrics 2.0.0

PerMetrics: A Framework of Performance Metrics for Machine Learning Models
20 versions - Latest release: about 1 year ago - 8 dependent packages - 1 dependent repository - 19.2 thousand downloads last month - 73 stars on GitHub - 1 maintainer

hbb2obb 1.0.0

Toolkit for converting horizontal bounding boxes to oriented bounding boxes using segmentation mo...
1 version - Latest release: 15 days ago - 123 downloads last month - 0 stars on GitHub - 1 maintainer

llmevals 0.1.0

Eval
2 versions - Latest release: over 1 year ago - 70 downloads last month - 3,548 stars on GitHub - 1 maintainer

Top 5.3% on pypi.org

ranx 0.3.20

ranx: A Blazing-Fast Python Library for Ranking Evaluation, Comparison, and Fusion
46 versions - Latest release: 10 months ago - 4 dependent packages - 7 dependent repositories - 25.3 thousand downloads last month - 537 stars on GitHub - 1 maintainer

bokbokbok 0.6.1

Custom Losses and Metrics for XGBoost, LightGBM, CatBoost
7 versions - Latest release: almost 4 years ago - 1 dependent repository - 272 downloads last month - 36 stars on GitHub - 1 maintainer

regressormetricgraphplot 0.0.3

A simple package for comparing different Regression Models and Plotting with their most common ev...
3 versions - Latest release: about 4 years ago - 1 dependent repository - 65 downloads last month - 6 stars on GitHub - 1 maintainer

faster-coco-eval 1.6.5

Faster interpretation of the original COCOEval
21 versions - Latest release: 6 months ago - 1 dependent package - 1 dependent repository - 45.8 thousand downloads last month - 75 stars on GitHub - 1 maintainer

continuous-eval 0.3.14

Open-Source Evaluation for GenAI Applications.
28 versions - Latest release: 4 months ago - 3.11 thousand downloads last month - 446 stars on GitHub - 1 maintainer

fightin-words 1.0.5

An implementation of Monroe et al.'s Fightin' Words Analysis
2 versions - Latest release: about 6 years ago - 2 dependent repositories - 100 downloads last month - 12 stars on GitHub - 1 maintainer

Top 5.2% on pypi.org

fast-bss-eval 0.1.4

Package for fast computation of BSS Eval metrics for source separation
7 versions - Latest release: almost 3 years ago - 3 dependent packages - 9 dependent repositories - 19.6 thousand downloads last month - 126 stars on GitHub - 1 maintainer

subsonar 1.0

Evaluate the quality of SRT files using the multilingual multimodal SONAR model.
1 version - Latest release: 11 months ago - 63 downloads last month - 13 stars on GitHub - 1 maintainer

Top 7.1% on pypi.org

pynlpl 1.2.9

PyNLPl, pronounced as 'pineapple', is a Python library for Natural Language Processing. It contai...
102 versions - Latest release: about 6 years ago - 1 dependent package - 4 dependent repositories - 2.52 thousand downloads last month - 477 stars on GitHub - 1 maintainer

Top 3.5% on pypi.org

rliable 1.2.0

rliable: Reliable evaluation on reinforcement learning and machine learning benchmarks.
11 versions - Latest release: 8 months ago - 6 dependent packages - 15 dependent repositories - 1.92 thousand downloads last month - 708 stars on GitHub - 2 maintainers

rliable-fork 1.2.0

rliable: Reliable evaluation on reinforcement learning and machine learning benchmarks.
1 version - Latest release: 8 months ago - 22 downloads last month - 708 stars on GitHub - 1 maintainer

skflex 1.0.2

skflex provides a suite of flexible utility functions for use with the sklearn library
4 versions - Latest release: over 3 years ago - 1 dependent repository - 170 downloads last month - 0 stars on GitHub - 1 maintainer

gan-evaluator 1.15

GAN Evaluator for IS and FID
9 versions - Latest release: about 2 years ago - 49 downloads last month - 10 stars on GitHub - 1 maintainer

synthetic-eval 0.1.4

Package for Evaluation of Synthetic Tabular Data Quality
8 versions - Latest release: 5 months ago - 159 downloads last month - 1 star on GitHub - 1 maintainer

skloverlay 1.2.0

SKLearn Classification Interface
5 versions - Latest release: over 1 year ago - 137 downloads last month - 0 stars on GitHub - 1 maintainer

guardrails-ai-unbabel-comet 2.2.1

High-quality Machine Translation Evaluation
1 version - Latest release: over 1 year ago - 1 dependent package - 70 downloads last month - 566 stars on GitHub - 1 maintainer

Top 3.5% on pypi.org

unbabel-comet 2.2.5

High-quality Machine Translation Evaluation
36 versions - Latest release: 24 days ago - 4 dependent packages - 45 dependent repositories - 34.8 thousand downloads last month - 566 stars on GitHub - 2 maintainers

semalex 1.3.4

A comprehensive evaluation metric designed to measure the weighted similarity score by prioritizi...
7 versions - Latest release: 8 months ago - 170 downloads last month - 2 stars on GitHub - 1 maintainer

cleval 0.1.1

cleval
2 versions - Latest release: over 1 year ago - 104 downloads last month - 185 stars on GitHub - 1 maintainer

echoswift 1.1.3

LLM Inference Benchmarking Tool
8 versions - Latest release: 7 months ago - 291 downloads last month - 6 stars on GitHub - 1 maintainer

tvalmetrics 1.0.2

RAG evaluation metrics.
6 versions - Latest release: over 1 year ago - 1 dependent package - 264 downloads last month - 245 stars on GitHub - 1 maintainer

clayrs 0.5.1

Complexly represent contents, build recommender systems, evaluate them. All in one place!
12 versions - Latest release: almost 2 years ago - 234 downloads last month - 35 stars on GitHub - 1 maintainer

Top 8.9% on pypi.org

codebleu 0.7.0

Unofficial CodeBLEU implementation that supports Linux, macOS and Windows available on PyPI.
14 versions - Latest release: 11 months ago - 3 dependent repositories - 7.97 thousand downloads last month - 61 stars on GitHub - 1 maintainer

aqudem 0.2.0

Activity and Sequence Detection Performance Measures: A package to evaluate activity detection re...
3 versions - Latest release: 6 months ago - 156 downloads last month - 1 star on GitHub - 1 maintainer

deepevals 0.2.0

Eval
1 version - Latest release: over 1 year ago - 63 downloads last month - 3,548 stars on GitHub - 1 maintainer

kolena 1.61.0

Client for Kolena's machine learning testing platform.
113 versions - Latest release: about 1 month ago - 1 dependent repository - 12 thousand downloads last month - 48 stars on GitHub - 1 maintainer

kolena-client 1.61.0

Client for Kolena's machine learning testing platform.
118 versions - Latest release: about 1 month ago - 2.25 thousand downloads last month - 45 stars on GitHub - 1 maintainer

rke-score 0.0.7

Compute Renyi Kernel Entropy scores (RKE-MC and RRKE) for two sets of vectors.
5 versions - Latest release: over 1 year ago - 190 downloads last month - 11 stars on GitHub - 1 maintainer

mini-judge 0.4.1

Simple implementation of LLM-As-Judge for pairwise evaluation of Q&A models
6 versions - Latest release: over 1 year ago - 188 downloads last month - 3 stars on GitHub - 1 maintainer

nlptutti 0.0.2

NLP measurement package
10 versions - Latest release: over 2 years ago - 1 dependent repository - 964 downloads last month - 62 stars on GitHub - 1 maintainer

tvallogging 1.0.0

Logging for Tonic Validate
4 versions - Latest release: over 1 year ago - 146 downloads last month - 245 stars on GitHub - 1 maintainer

evalify 1.0.0

Evaluate your face or voice verification models literally in seconds.
6 versions - Latest release: 5 months ago - 1 dependent repository - 214 downloads last month - 19 stars on GitHub - 1 maintainer

nereval 0.2.5

Evaluation script for named entity recognition systems based on F1 score.
3 versions - Latest release: almost 7 years ago - 1 dependent repository - 250 downloads last month - 70 stars on GitHub - 1 maintainer

hmeasure 0.1.6

H-Measure Classification Metric
7 versions - Latest release: about 4 years ago - 447 downloads last month - 6 stars on GitHub - 1 maintainer

debobo 0.1.6

Package for evaluating object detection models
6 versions - Latest release: almost 6 years ago - 1 dependent repository - 157 downloads last month - 1 star on GitHub - 1 maintainer

pate 0.1.1

PATE: Proximity-Aware Time series anomaly Evaluation metric
2 versions - Latest release: 11 months ago - 88 downloads last month - 11 stars on GitHub - 1 maintainer

tonic-validate 6.2.0

RAG evaluation metrics.
26 versions - Latest release: 5 months ago - 2 dependent packages - 1.24 thousand downloads last month - 245 stars on GitHub - 1 maintainer

easy-lm-eval 0.1.2

A library for easy evaluation of language models
3 versions - Latest release: about 1 year ago - 86 downloads last month - 3 stars on GitHub - 1 maintainer

falcon-evaluate 0.1.6

Falcon Evaluate is an open-source Python library designed to simplify the process of evaluating a...
17 versions - Latest release: over 1 year ago - 1 dependent package - 584 downloads last month - 7 stars on GitHub - 1 maintainer

v-stream 0.1.2

STREAM: Spatio-TempoRal Evaluation and Analysis Metric for Video Generative Models
8 versions - Latest release: about 1 year ago - 268 downloads last month - 26 stars on GitHub - 1 maintainer

pytolemaic 0.15.4

Package for ML model analysis
56 versions - Latest release: almost 3 years ago - 1 dependent repository - 899 downloads last month - 11 stars on GitHub - 1 maintainer

Top 5.6% on pypi.org

prdc 0.2

Compute precision, recall, density, and coverage metrics for two sets of vectors.
1 version - Latest release: about 5 years ago - 2 dependent packages - 17 dependent repositories - 1.26 thousand downloads last month - 254 stars on GitHub - 1 maintainer

colortransferlib 2.0.1

This library provides color and style transfer algorithms which were published in scientific paper...
8 versions - Latest release: about 2 months ago - 289 downloads last month - 6 stars on GitHub - 1 maintainer

probability-calibration 0.0.1

Utilities to calibrate model outcome probability and evaluate calibration.
1 version - Latest release: about 4 years ago - 1 dependent repository - 56 downloads last month - 1 star on GitHub - 1 maintainer

Top 5.6% on pypi.org

octis 1.14.0

OCTIS: a library for Optimizing and Comparing Topic Models.
35 versions - Latest release: 9 months ago - 2 dependent packages - 3 dependent repositories - 3.24 thousand downloads last month - 690 stars on GitHub - 1 maintainer

Top 4.6% on pypi.org

pyrouge 0.1.3

A Python wrapper for the ROUGE summarization evaluation package.
4 versions - Latest release: over 1 year ago - 339 dependent repositories - 5.03 thousand downloads last month - 248 stars on GitHub - 1 maintainer

quica 0.2.5

Quick Inter Coder Agreement in Python
6 versions - Latest release: over 4 years ago - 1 dependent repository - 209 downloads last month - 23 stars on GitHub - 1 maintainer

frd-score 0.0.2

Package for calculating Fréchet Radiomics Distance (FRD)
2 versions - Latest release: 10 months ago - 74 downloads last month - 2 stars on GitHub - 1 maintainer

survivaleval 0.4.1

The most comprehensive Python package for evaluating survival analysis models.
9 versions - Latest release: about 1 month ago - 813 downloads last month - 33 stars on GitHub - 1 maintainer

rank-eval 0.1.3

rank_eval: A Blazing Fast Python Library for Ranking Evaluation and Comparison
5 versions - Latest release: over 3 years ago - 1 dependent repository - 389 downloads last month - 524 stars on GitHub - 1 maintainer

guap 0.1.4

Open-source evaluation metric for linking Machine Learning model outputs with Business outcomes
4 versions - Latest release: almost 4 years ago - 1 dependent repository - 125 downloads last month - 23 stars on GitHub - 1 maintainer

rankeval 0.8.2

Tool for the analysis and evaluation of Learning to Rank models based on ensembles of regression ...
8 versions - Latest release: over 5 years ago - 1 dependent repository - 756 downloads last month - 88 stars on GitHub - 1 maintainer

testclayrs 0.1.1

Complexly represent contents, build recommender systems, evaluate them. All in one place!
2 versions - Latest release: almost 3 years ago - 8 stars on GitHub

tutti-nlp 0.0.1

NLP measurement package
1 version - Latest release: over 2 years ago - 14 stars on GitHub

Related Keywords

evaluation (32), evaluation-framework (19), machine-learning (18), python (18), nlp (12), metrics (9), llm (8), llmops (7), rag (6), information-retrieval (6), llm-evaluation (6), natural-language-processing (5), large-language-models (5), classification (5), retrieval-augmented-generation (4), deep-learning (4), scikit-learn (4), generative-model (4), data-science (4), recommender-systems (4), llms (4), llm-evaluation-metrics (4), llm-evaluation-framework (4), recommender-system (3), word-error-rate (3), ai (3), recall (3), regression (3), evaluate (3), pytorch (3), comparison (3), data-fusion (3), information-retrieval-evaluation (3), information-retrieval-metrics (3), metasearch (3), numba (3), rank-fusion (3), ranking-metrics (3), score-fusion (3), object-detection (3), mlops (3), generative-adversarial-network (3), computer-vision (3), precision (3), artificial-intelligence (3), plot (3), evaluation-functions (3), speech-to-text (3), wer (3), mae (2), Unbabel (2), rmse (2), Evaluation (2), activity-recognition (2), activity recognition (2), Machine Translation (2), ner (2), named-entity-recognition (2), xgboost (2), synthetic-data (2), sklearn (2), linguistics (2), lightgbm (2), rl (2), image-segmentation (2), nlp-library (2), benchmarking (2), reproducibility (2), research (2), reinforcement (2), reinforcement-learning (2), machine (2), custom-loss-functions (2), learning (2), google (2), information retrieval (2), trec_eval (2), content-based-recommendation (2), graph-based-recommendation (2), ranking (2), hyperparameter-optimization (2), Kolena (2), machine-translation (2), ML (2), testing (2), evaluate-models (2), diversity (2), coefficient-of-determination (2), classification-report (2), amazon (2), aws (2), COMET (2), cer (2), character-error-rate (2), computing-error-rates (2), korean (2), normalization (2), speech-analysis (2), speech-recognition (2), test (2)