An open API service providing package, version and dependency metadata of many open source software ecosystems and registries.

pypi.org "evaluation" keyword

View the packages on the pypi.org package registry that are tagged with the "evaluation" keyword.

Top 1.8% on pypi.org
motmetrics 1.4.0
Metrics for multiple object tracker benchmarking.
9 versions - Latest release: over 2 years ago - 13 dependent packages - 398 dependent repositories - 168 thousand downloads last month - 1,335 stars on GitHub - 1 maintainer
alpaca-farm 0.2.0
An automatic evaluator for instruction-following language models. Human-validated, high-quality, ...
11 versions - Latest release: over 1 year ago - 181 downloads last month - 1,737 stars on GitHub - 1 maintainer
feedback-forensics 0.4.4
A tool to investigate your pairwise feedback data
14 versions - Latest release: about 3 hours ago - 215 downloads last month - 21 stars on GitHub - 2 maintainers
pate 0.1.1
PATE: Proximity-Aware Time series anomaly Evaluation metric
2 versions - Latest release: about 1 year ago - 26 downloads last month - 14 stars on GitHub - 1 maintainer
picai-eval 1.4.13
PICAI Evaluation
12 versions - Latest release: 7 months ago - 1 dependent package - 386 downloads last month - 22 stars on GitHub - 1 maintainer
easy-lm-eval 0.1.2
A library for easy evaluation of language models
3 versions - Latest release: over 1 year ago - 11 downloads last month - 3 stars on GitHub - 1 maintainer
antgo 0.1.24
machine learning experiment platform
46 versions - Latest release: about 2 years ago - 1 dependent repository - 73 downloads last month - 319 stars on GitHub - 1 maintainer
uval 0.2.2
This python package is meant to provide a high level interface to facilitate the evaluation of ob...
5 versions - Latest release: 6 months ago - 1 dependent repository - 54 downloads last month - 1 star on gitlab.com - 3 maintainers
examinationrag 0.1.4
XRAG: eXamining the Core - Benchmarking Foundational Component Modules in Advanced Retrieval-Augm...
5 versions - Latest release: 6 months ago - 60 downloads last month - 105 stars on GitHub - 1 maintainer
superoptix 0.1.0b5
Full Stack Agentic AI Framework
4 versions - Latest release: about 10 hours ago - 170 downloads last month - 3 stars on GitHub - 1 maintainer
Top 2.3% on pypi.org
pycm 0.9.5 💰
Multi-class confusion matrix library in Python
47 versions - Latest release: about 7 years ago - 4 dependent packages - 50 dependent repositories - 174 thousand downloads last month - 1,483 stars on GitHub - 3 maintainers
Top 1.8% on pypi.org
simpleeval 1.0.3 💰
A simple, safe single expression evaluator library.
22 versions - Latest release: 9 months ago - 59 dependent packages - 290 dependent repositories - 4.99 million downloads last month - 447 stars on GitHub - 1 maintainer
multivar-horner 3.1.0 💰
python package implementing a multivariate Horner scheme for efficiently evaluating multivariate ...
14 versions - Latest release: over 2 years ago - 1 dependent repository - 60 downloads last month - 29 stars on GitHub - 1 maintainer
oumi 0.2.1
Oumi - Modeling Platform
15 versions - Latest release: 20 days ago - 1.2 thousand downloads last month - 8,337 stars on GitHub - 1 maintainer
Top 7.9% on pypi.org
jury 2.3.1
Evaluation toolkit for neural language generation.
23 versions - Latest release: about 1 year ago - 1 dependent package - 2 dependent repositories - 1.66 thousand downloads last month - 183 stars on GitHub - 1 maintainer
opencompass 0.4.2
A comprehensive toolkit for large model evaluation
26 versions - Latest release: 4 months ago - 2.12 thousand downloads last month - 3,677 stars on GitHub - 1 maintainer
sacrebleu-macrof 2.0.1
Hassle-free computation of shareable, comparable, and reproducible BLEU, chrF, and TER scores
1 version - Latest release: almost 4 years ago - 1 dependent package - 1 dependent repository - 18 downloads last month - 907 stars on GitHub - 1 maintainer
bob.paper.tifs2018-dsu 1.0.2
Tools to reproduce the paper "Heterogeneous Face Recognition Using Domain Specific Units"
2 versions - Latest release: over 6 years ago - 3 downloads last month - 1 maintainer
htmleval 0.1.0
A Python package that facilitates the creation of HTML evaluations.
1 version - Latest release: 3 months ago - 15 downloads last month - 0 stars on GitHub - 1 maintainer
corec 1.1.5
A Context-Aware Recommendation Framework for Python
15 versions - Latest release: 4 months ago - 121 downloads last month - 546 stars on GitHub - 1 maintainer
Top 3.5% on pypi.org
evo 1.31.1
Python package for the evaluation of odometry and SLAM
110 versions - Latest release: 4 months ago - 18 dependent repositories - 72.1 thousand downloads last month - 3,023 stars on GitHub - 1 maintainer
Top 1.2% on pypi.org
evaluate 0.4.5
HuggingFace community-driven open-source library of evaluation
18 versions - Latest release: 21 days ago - 222 dependent packages - 2,474 dependent repositories - 2.93 million downloads last month - 1,762 stars on GitHub - 3 maintainers
zpyshell 0.1.2.0
Command line shell with script languages, like python
2 versions - Latest release: about 8 years ago - 1 dependent repository - 3 downloads last month - 7 stars on GitHub - 1 maintainer
bt4vt 1.0.1
Bias Tests for Voice Technologies
2 versions - Latest release: over 2 years ago - 26 downloads last month - 12 stars on GitHub - 2 maintainers
dyff-audit 0.11.1
Audit tools for the Dyff AI auditing platform.
46 versions - Latest release: about 22 hours ago - 1 dependent package - 387 downloads last month - 0 stars on GitLab.com - 5 maintainers
inspire 1.0.9
Helper library to participate in the INSPIRE challenge
12 versions - Latest release: over 10 years ago - 4 dependent repositories - 33 downloads last month - 2 stars on GitHub - 2 maintainers
multimedeval 1.0.0
A Python tool to evaluate the performance of VLM on the medical domain.
3 versions - Latest release: 8 days ago - 111 downloads last month - 76 stars on GitHub - 1 maintainer
lighthousellmm 0.3.14
Client library to connect to the lighthousellmm LLM Tracing and Evaluation Platform.
2 versions - Latest release: 5 months ago - 10 downloads last month - 1 maintainer
fico-itr 1.0.0
Fine-grained and Coarse-grained Image-Text Retrieval Evaluation
2 versions - Latest release: 2 months ago - 16 downloads last month - 2 stars on GitHub - 1 maintainer
panoptica 1.5.0
Panoptic Quality (PQ) computation for binary masks.
75 versions - Latest release: 1 day ago - 1 dependent repository - 1.77 thousand downloads last month - 26 stars on GitHub - 2 maintainers
promptfoo 0.1.0 💰
LLM evals and red teaming
1 version - Latest release: 12 months ago - 2.03 thousand downloads last month - 7,779 stars on GitHub - 2 maintainers
langwatch 0.2.9
LangWatch Python SDK, for monitoring your LLMs
92 versions - Latest release: 13 days ago - 113 thousand downloads last month - 2,255 stars on GitHub - 1 maintainer
evargs 1.0.0
"EvArgs" is a Python module designed for value assignment, easy expression parsing, and type cast...
7 versions - Latest release: 3 months ago - 82 downloads last month - 1 star on GitHub - 1 maintainer
docling-eval 0.7.0
Evaluation of Docling
4 versions - Latest release: 1 day ago - 189 downloads last month - 25 stars on GitHub
costra 1.1
Tool for automatic evaluation of Czech sentence embeddings using Costra 1.1 dataset
2 versions - Latest release: 8 months ago - 1 dependent repository - 25 downloads last month - 0 stars on GitHub - 1 maintainer
lmms_eval 0.4.0
A framework for evaluating large multi-modality language models
18 versions - Latest release: 1 day ago - 5.03 thousand downloads last month - 2,793 stars on GitHub - 1 maintainer
naeval 0.2.0
Comparing quality and performance of NLP systems for Russian language
1 version - Latest release: over 5 years ago - 2 dependent repositories - 28 downloads last month - 49 stars on GitHub - 1 maintainer
conff 0.5.0
Simple config parser with evaluator library.
4 versions - Latest release: about 7 years ago - 1 dependent repository - 9 downloads last month - 21 stars on GitHub - 1 maintainer
Top 8.6% on pypi.org
evalidate 2.0.5
Validation and secure evaluation of untrusted python expressions
28 versions - Latest release: 3 months ago - 3 dependent packages - 6 dependent repositories - 65.7 thousand downloads last month - 31 stars on GitHub - 1 maintainer
xturing 0.1.8
Fine-tuning, evaluation and data generation for LLMs
19 versions - Latest release: almost 2 years ago - 107 downloads last month - 1 maintainer
compare-qrels 0.0.3
Qualitatively compare the qrels results of two IR systems.
1 version - Latest release: almost 4 years ago - 1 dependent repository - 12 downloads last month - 0 stars on GitHub - 1 maintainer
novaeval 0.4.0
A comprehensive, open-source LLM evaluation framework for testing and benchmarking AI models
7 versions - Latest release: 9 days ago - 614 downloads last month - 12 stars on GitHub - 1 maintainer
Top 1.1% on pypi.org
configspace 1.2.1 💰
Creation and manipulation of parameter configuration spaces for automated algorithm configuration...
52 versions - Latest release: 8 months ago - 31 dependent packages - 56 dependent repositories - 71.3 thousand downloads last month - 212 stars on GitHub - 2 maintainers
ragmetrics-client 0.2.1
Monitor your LLM calls. Test your LLM app.
18 versions - Latest release: about 1 month ago - 146 downloads last month - 1 star on GitHub - 1 maintainer
take-ai-evaluation 0.2.3
Metrics and visualizations for evaluating chatbot's AI utilization.
5 versions - Latest release: about 4 years ago - 1 dependent repository - 20 downloads last month - 1 maintainer
evaluation-framework 1.3
Evaluation Framework for testing and comparing graph embedding techniques
2 versions - Latest release: over 5 years ago - 1 dependent repository - 12 downloads last month - 10 stars on GitHub - 1 maintainer
prediction-evaluation 0.0.1
evaluation of prediction of binary, multiclass and regression
2 versions - Latest release: over 6 years ago - 1 dependent repository - 3 downloads last month - 1 maintainer
seqscore 0.6.0
SeqScore: Scoring for named entity recognition and other sequence labeling tasks
9 versions - Latest release: 7 months ago - 1 dependent repository - 99 downloads last month - 23 stars on GitHub - 1 maintainer
leeroo-client 0.0.2
A client library for Leeroo Workflow Management
2 versions - Latest release: 12 months ago - 14 downloads last month - 1 star on GitHub - 1 maintainer
Top 6.7% on pypi.org
agenta 0.50.3
The SDK for agenta is an open-source LLMOps platform.
321 versions - Latest release: 2 days ago - 2 dependent repositories - 5.8 thousand downloads last month - 575 stars on GitHub - 1 maintainer
calldict 0.13
Protocol to markup and evaluate functions in data structures
11 versions - Latest release: 12 months ago - 1 dependent repository - 80 downloads last month - 2 stars on GitHub - 1 maintainer
helicone_helpers 1.0.5
A Python wrapper for some of Helicone's common functionalities
6 versions - Latest release: 7 days ago - 1.32 thousand downloads last month - 4,248 stars on GitHub - 2 maintainers
Top 5.7% on pypi.org
helicone 1.0.14
A Python wrapper for the OpenAI API that logs all requests to Helicone.
17 versions - Latest release: over 1 year ago - 2 dependent packages - 2 dependent repositories - 1.22 thousand downloads last month - 4,248 stars on GitHub - 1 maintainer
helicone-async 1.0.6
A Python wrapper for logging llm traces directly to Helicone, by passing the proxy, with OpenLLMe...
7 versions - Latest release: 5 months ago - 4.96 thousand downloads last month - 4,248 stars on GitHub - 1 maintainer
quotientai 0.4.6
Python library for tracing, logging, and detecting problems with AI Agents
40 versions - Latest release: 2 days ago - 787 downloads last month - 1 maintainer
v-stream 0.1.2
STREAM: Spatio-TempoRal Evaluation and Analysis Metric for Video Generative Models
8 versions - Latest release: over 1 year ago - 34 downloads last month - 26 stars on GitHub - 1 maintainer
Top 5.6% on pypi.org
prdc 0.2
Compute precision, recall, density, and coverage metrics for two sets of vectors.
1 version - Latest release: over 5 years ago - 2 dependent packages - 17 dependent repositories - 975 downloads last month - 256 stars on GitHub - 1 maintainer
Top 8.9% on pypi.org
ragas 0.3.0
69 versions - Latest release: 14 days ago - 25 dependent packages - 1 dependent repository - 534 thousand downloads last month - 1 maintainer
arkas 0.0.1a15
Library to evaluate ML model performances
16 versions - Latest release: 7 months ago - 20.1 thousand downloads last month - 0 stars on GitHub - 1 maintainer
trajectopy 3.1.0
Trajectory Evaluation in Python
83 versions - Latest release: 2 days ago - 298 downloads last month - 82 stars on GitHub - 1 maintainer
python-lilypad 0.0.34
An open-source prompt engineering framework.
37 versions - Latest release: 4 months ago - 282 downloads last month - 171 stars on GitHub - 1 maintainer
evalscope 0.17.1
EvalScope: Lightweight LLMs Evaluation Framework
35 versions - Latest release: 11 days ago - 9.55 thousand downloads last month - 160 stars on GitHub - 1 maintainer
math-verify 0.8.0
HuggingFace library for verifying mathematical answers
17 versions - Latest release: 29 days ago - 347 thousand downloads last month - 859 stars on GitHub - 1 maintainer
hbb2obb 1.0.0
Toolkit for converting horizontal bounding boxes to oriented bounding boxes using segmentation mo...
1 version - Latest release: 4 months ago - 19 downloads last month - 1 star on GitHub - 1 maintainer
chevy 0.2.2
Chess position Evaluation via hand-crafted features
5 versions - Latest release: about 3 years ago - 1 dependent repository - 18 downloads last month - 1 star on GitHub - 1 maintainer
evalne-gui 0.1.0
Plotly Dash based GUI for EvalNE
1 version - Latest release: about 3 years ago - 8 downloads last month - 3 stars on GitHub - 1 maintainer
fiddler-auditor 0.0.5
Auditing large language models made easy.
12 versions - Latest release: over 1 year ago - 1 dependent repository - 719 downloads last month - 183 stars on GitHub - 1 maintainer
boridge 0.1.10
A library of functions for selecting features using bootstrapped ridge regression
10 versions - Latest release: over 5 years ago - 1 dependent repository - 39 downloads last month - 1 star on GitHub - 1 maintainer
Top 3.1% on pypi.org
langsmith 0.4.8
Client library to connect to the LangSmith LLM Tracing and Evaluation Platform.
363 versions - Latest release: 13 days ago - 86 dependent packages - 2,234 dependent repositories - 108 million downloads last month - 605 stars on GitHub - 4 maintainers
zeroeval 0.5.0
ZeroEval SDK
27 versions - Latest release: 3 days ago - 781 downloads last month - 1 maintainer
gaico 0.2.0
GenAI Results Comparator, GAICo, is a Python library to help compare, analyze and visualize outpu...
6 versions - Latest release: 17 days ago - 120 downloads last month - 4 stars on GitHub - 2 maintainers
dyff-client 0.18.0
Python client for the Dyff AI auditing platform.
32 versions - Latest release: 3 days ago - 2 dependent packages - 686 downloads last month - 0 stars on GitLab.com - 5 maintainers
Top 6.5% on pypi.org
moverscore 1.0.3
MoverScore: Evaluating text generation with contextualized embeddings and earth mover distance
2 versions - Latest release: over 5 years ago - 10 dependent repositories - 851 downloads last month - 206 stars on GitHub - 1 maintainer
sensirion-uart-svm4x 2.0.3
SHDLC driver for the Sensirion SVM4X sensor family
1 version - Latest release: over 1 year ago - 8 downloads last month - 0 stars on GitHub - 1 maintainer
debobo 0.1.6
Package for evaluating object detection models
6 versions - Latest release: about 6 years ago - 1 dependent repository - 23 downloads last month - 1 star on GitHub - 1 maintainer
evalica 0.3.2 💰
Evalica, your favourite evaluation toolkit.
16 versions - Latest release: 8 months ago - 6.24 thousand downloads last month - 38 stars on GitHub - 1 maintainer
agenttest 0.1.0
A pytest-like testing framework for AI agents and prompts
1 version - Latest release: about 1 month ago - 50 downloads last month - 0 stars on GitHub - 1 maintainer
nubia-score 0.1.5
NUBIA (NeUral Based Interchangeability Assessor) is a SoTA evaluation metric for text generation
6 versions - Latest release: over 4 years ago - 1 dependent repository - 53 downloads last month - 51 stars on GitHub - 1 maintainer
fdatasets 1.12.1
HuggingFace/Datasets is an open library of NLP datasets.
1 version - Latest release: over 3 years ago - 20,388 stars on GitHub
tsml-eval 0.6.0
A package for benchmarking time series machine learning tools.
10 versions - Latest release: 3 months ago - 59 downloads last month - 52 stars on GitHub - 1 maintainer
live-bench 0.0.0.dev0
Live Bench
1 version - Latest release: about 1 year ago - 14 downloads last month - 2,793 stars on GitHub - 1 maintainer
xfinder 0.2.6
A Robust and Pinpoint Answer Extractor for LLM Evaluation
9 versions - Latest release: 7 months ago - 44 downloads last month - 175 stars on GitHub - 1 maintainer
modelsafety 0.1.2
ModelSafety SDK
2 versions - Latest release: 3 months ago - 14 downloads last month - 0 stars on GitHub - 1 maintainer
weavel 1.11.0
Weavel, automated prompt engineering and observability for LLM applications
35 versions - Latest release: 10 months ago - 111 downloads last month - 4 stars on GitHub - 2 maintainers
mobile-env 2.0.3
mobile-env: An Open Environment for Autonomous Coordination in Wireless Mobile Networks
15 versions - Latest release: 9 months ago - 1 dependent repository - 68 downloads last month - 129 stars on GitHub - 1 maintainer
Top 8.9% on pypi.org
codebleu 0.7.0
Unofficial CodeBLEU implementation that supports Linux, MacOS and Windows available on PyPI.
14 versions - Latest release: about 1 year ago - 3 dependent repositories - 5.49 thousand downloads last month - 95 stars on GitHub - 1 maintainer
daze 0.1.1
Better multi-class confusion matrix plots for Scikit-Learn, incorporating per-class and overall e...
3 versions - Latest release: over 4 years ago - 1 dependent repository - 31 downloads last month - 5 stars on GitHub - 1 maintainer
bob.bio.vein 5.0.1
Vein Recognition Library
17 versions - Latest release: about 1 year ago - 1 dependent package - 47 downloads last month - 1 maintainer
syntherela 0.0.4
SyntheRela - Synthetic Relational Data Generation Benchmark
4 versions - Latest release: 5 months ago - 26 downloads last month - 57 stars on GitHub - 2 maintainers
tno.sdg.tabular.eval.utility-metrics 0.4.1
Utility metrics for tabular data
3 versions - Latest release: 8 months ago - 42 downloads last month - 3 stars on GitHub - 1 maintainer
Top 7.7% on pypi.org
insight 1.0
A python library for monitoring, comparing and extracting insights from data.
23 versions - Latest release: over 1 year ago - 5 dependent repositories - 265 downloads last month - 11 stars on GitHub - 1 maintainer
Top 9.1% on pypi.org
synthesized-insight 0.7
Synthesized data insights and evaluation framework.
6 versions - Latest release: over 3 years ago - 1 dependent package - 1 dependent repository - 179 downloads last month - 11 stars on GitHub - 1 maintainer
bob.ip.dlib 1.0.9
Bob interface for dlib functions
9 versions - Latest release: almost 5 years ago - 18 downloads last month - 1 maintainer
Top 3.6% on pypi.org
coconut 3.1.2 💰
Simple, elegant, Pythonic functional programming.
43 versions - Latest release: 11 months ago - 3 dependent packages - 22 dependent repositories - 1.34 thousand downloads last month - 4,213 stars on GitHub - 1 maintainer
bocoel 0.1.4
Bayesian Optimization as a Coverage Tool for Evaluating Large Language Models
11 versions - Latest release: 8 months ago - 89 downloads last month - 285 stars on GitHub - 1 maintainer
bob.bio.face 8.0.1
Tools for running face recognition experiments
25 versions - Latest release: about 1 year ago - 3 dependent packages - 2 dependent repositories - 39 downloads last month - 10 maintainers
frd-score 0.0.2
Package for calculating Fréchet Radiomics Distance (FRD)
2 versions - Latest release: about 1 year ago - 17 downloads last month - 10 stars on GitHub - 1 maintainer
mandoline 0.4.0
Official Python client for the Mandoline API
7 versions - Latest release: 5 days ago - 102 downloads last month - 1 star on GitHub - 1 maintainer
topic999 1.0.1.dev1
Topic Model User Evaluation
1 version - Latest release: about 8 years ago - 1 dependent repository - 6 downloads last month - 1 maintainer
pyieoe 0.1.1
pyIEOE: a Python package to facilitate interpretable OPE evaluation
2 versions - Latest release: almost 4 years ago - 1 dependent package - 4 dependent repositories - 979 downloads last month - 32 stars on GitHub - 1 maintainer