hex.pm "hypothesis-testing" keyword
eval_ex 0.1.5
Model evaluation harness for standardized benchmarking with semantic similarity, exact match, and...
6 versions - Latest release: 3 months ago - 311 downloads total - 0 stars on GitHub - 1 maintainer
crucible_bench 0.4.0
Comprehensive benchmarking framework for AI research. Measures latency, throughput, cost, and rel...
6 versions - Latest release: 3 months ago - 316 downloads total - 0 stars on GitHub - 1 maintainer
Related Keywords
statistical-analysis (2)
research (2)
otp (2)
nshkr-crucible (2)
machine-learning (2)
llm (2)
benchmarking (2)
beam (2)
reliability (1)
power-analysis (1)
statistical-testing (1)
statistics (1)
mann-whitney (1)
ensemble-methods (1)
effect-size (1)
t-test (1)
anova (1)
ai (1)
testing-framework (1)
rouge (1)
reproducibility (1)
north-shore-ai (1)
model-comparison (1)
metrics (1)
f1-score (1)
evaluation (1)
elixir (1)
confidence-intervals (1)
bleu (1)
ai-research (1)