fair-evaluation-baseline
by HomericIntelligence v1.0.0
Implement baseline pipeline capture and regression detection to distinguish agent-introduced failures from pre-existing issues in E2E evaluations
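A minimal sketch of the idea, not the skill's actual implementation: capture each pipeline step's pass/fail status before the agent runs, capture it again afterwards, and compare the two. The names `PipelineDiff` and `compare_to_baseline`, and the step-name-to-boolean result format, are hypothetical and chosen only for illustration. The sketch also assumes that a step missing from the baseline and failing afterwards counts as a regression, since there is no evidence it was already broken.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class PipelineDiff:
    """Outcome of comparing a post-agent pipeline run against the baseline."""
    regressions: list[str]   # passed (or absent) in baseline, failing now
    pre_existing: list[str]  # failing in baseline and still failing now
    fixed: list[str]         # failing in baseline, passing now


def compare_to_baseline(baseline: dict[str, bool], after: dict[str, bool]) -> PipelineDiff:
    """Classify each step of the post-agent run relative to the baseline.

    Both arguments map a step name to True (passed) or False (failed).
    This result format is an assumption made for the sketch.
    """
    regressions: list[str] = []
    pre_existing: list[str] = []
    fixed: list[str] = []
    for step in sorted(after):
        passed_before = baseline.get(step)  # None if the step has no baseline
        if after[step]:
            if passed_before is False:
                fixed.append(step)
        elif passed_before is False:
            pre_existing.append(step)
        else:
            # Passed before, or no baseline record: attribute to the agent.
            regressions.append(step)
    return PipelineDiff(regressions, pre_existing, fixed)


# Hypothetical results captured before and after the agent's change.
baseline_run = {"build": True, "lint": False, "test": True}
agent_run = {"build": True, "lint": False, "test": False}
diff = compare_to_baseline(baseline_run, agent_run)
```

Here only `test` would be held against the agent; `lint` was already failing when the baseline was captured, so a fair evaluation would not penalize it.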