Note
- https://addyosmani.com/blog/ai-evals/
- Baseline evaluation: first, create a test suite and run it to get a baseline score.
- Analyze failures: treat each failure like a bug report, then work through them one at a time.
- Propose an improvement.
- Re-evaluate.
- Repeat.
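The loop above can be sketched in a few lines of Python. This is only a toy illustration, not the method from the linked post: `TEST_SUITE`, `evaluate`, `baseline_system`, and `improved_system` are all hypothetical names, and a tiny arithmetic function stands in for the AI system under test.

```python
import operator
from typing import Callable

# Hypothetical test suite: (input, expected output) pairs.
TEST_SUITE = [("2+2", "4"), ("3*3", "9"), ("10-7", "3")]


def evaluate(system: Callable[[str], str], suite):
    """Run every case and return (score, failures).

    score is the fraction of passing cases; failures is a list of
    (input, expected, actual) tuples that can be read like bug reports.
    """
    failures = []
    for prompt, expected in suite:
        actual = system(prompt)
        if actual != expected:
            failures.append((prompt, expected, actual))
    score = (len(suite) - len(failures)) / len(suite)
    return score, failures


def baseline_system(prompt: str) -> str:
    # Toy stand-in for the system under test: only handles addition.
    if "+" in prompt:
        a, b = prompt.split("+")
        return str(int(a) + int(b))
    return "?"


def improved_system(prompt: str) -> str:
    # Proposed improvement after reading the failures: also handle * and -.
    ops = {"+": operator.add, "*": operator.mul, "-": operator.sub}
    for symbol, fn in ops.items():
        if symbol in prompt:
            a, b = prompt.split(symbol)
            return str(fn(int(a), int(b)))
    return "?"


# 1. Baseline evaluation.
baseline_score, baseline_failures = evaluate(baseline_system, TEST_SUITE)
# 2. Analyze failures one by one.
for prompt, expected, actual in baseline_failures:
    print(f"FAIL {prompt!r}: expected {expected!r}, got {actual!r}")
# 3-4. Propose an improvement, then re-evaluate against the same suite.
new_score, new_failures = evaluate(improved_system, TEST_SUITE)
print(f"baseline={baseline_score:.2f} new={new_score:.2f}")
```

Keeping the suite fixed between runs is what makes the two scores comparable; in practice the loop repeats until the remaining failures are acceptable.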
Done
- DONE Update tags for 100 test cases => list available + missing tags
/ 2025-07-28
Created Mon, 28 Jul 2025 00:00:00 +0000
Modified Mon, 25 May 2026 06:02:25 +0000