hex.pm "statistical-analysis" keyword
crucible_bench 0.4.0
Comprehensive benchmarking framework for AI research. Measures latency, throughput, cost, and rel...
6 versions - Latest release: 4 months ago - 370 downloads total - 0 stars on GitHub - 1 maintainer
eval_ex 0.1.5
Model evaluation harness for standardized benchmarking with semantic similarity, exact match, and...
6 versions - Latest release: 4 months ago - 311 downloads total - 0 stars on GitHub - 1 maintainer
Related Keywords
beam (2)
benchmarking (2)
hypothesis-testing (2)
llm (2)
machine-learning (2)
research (2)
nshkr-crucible (2)
otp (2)
ai-research (1)
bleu (1)
confidence-intervals (1)
elixir (1)
evaluation (1)
f1-score (1)
metrics (1)
model-comparison (1)
north-shore-ai (1)
reproducibility (1)
rouge (1)
testing-framework (1)
t-test (1)
statistics (1)
statistical-testing (1)
reliability (1)
power-analysis (1)
mann-whitney (1)
ensemble-methods (1)
effect-size (1)
anova (1)
ai (1)