Chinh (lelouvincx) / 2025-07-28

Created Mon, 28 Jul 2025 00:00:00 +0000 Modified Mon, 25 May 2026 06:02:25 +0000
  • Note

    • https://addyosmani.com/blog/ai-evals/
      • Baseline evaluation: first create a test suite and run it to get a baseline score.
      • Analyze failures: treat each failure like a bug report, then work through them one by one.
      • Propose an improvement.
      • Re-evaluate.
      • Repeat.
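
The loop above can be sketched in a few lines of Python. This is only a toy illustration, not code from the linked post: `TestCase`, `LookupSystem`, `evaluate`, `patch_one`, and `improve_loop` are hypothetical names, the "system" is a plain lookup table standing in for a model, and scoring is exact string match, which a real eval would likely replace with something richer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TestCase:
    prompt: str
    expected: str


class LookupSystem:
    """Hypothetical stand-in for the system under test: answers from a table."""

    def __init__(self, table):
        self.table = dict(table)

    def __call__(self, prompt):
        return self.table.get(prompt, "")


def evaluate(system, suite):
    """Run every case; return (fraction passed, list of failing cases)."""
    if not suite:
        raise ValueError("test suite is empty")
    failures = [case for case in suite if system(case.prompt) != case.expected]
    return (len(suite) - len(failures)) / len(suite), failures


def patch_one(system, failure):
    """Toy 'improvement': teach the system the answer for one failing case."""
    return LookupSystem({**system.table, failure.prompt: failure.expected})


def improve_loop(system, suite, propose_fix, max_rounds=5):
    # 1. Baseline evaluation.
    score, failures = evaluate(system, suite)
    history = [score]
    for _ in range(max_rounds):
        if not failures:
            break
        # 2-3. Treat the first failure as a bug report and propose a fix.
        candidate = propose_fix(system, failures[0])
        # 4. Re-evaluate on the whole suite to catch regressions.
        new_score, new_failures = evaluate(candidate, suite)
        if new_score >= score:
            system, score, failures = candidate, new_score, new_failures
        history.append(score)
    # 5. The for-loop is the "repeat" step.
    return system, history


suite = [
    TestCase("2+2", "4"),
    TestCase("capital of France", "Paris"),
    TestCase("3*3", "9"),
]
final_system, history = improve_loop(LookupSystem({"2+2": "4"}), suite, patch_one)
print(history)
```

Re-running the full suite after each change, rather than only the case being fixed, is what lets the loop notice when a fix for one failure breaks a case that used to pass.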
  • Done

    • DONE Update tags for 100 test cases => list the available and missing tags
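
A tag audit like the one above could look roughly like the sketch below. It assumes one possible reading of the note: "available" means tags already used somewhere in the suite, and "missing" means expected tags that no case uses yet. The case layout (`id` and `tags` keys), the function name `summarize_tags`, and the sample tags are all hypothetical.

```python
from collections import Counter


def summarize_tags(cases, expected_tags):
    """Return (tags in use, expected tags never used, ids of untagged cases)."""
    used = Counter(tag for case in cases for tag in case.get("tags", []))
    available = sorted(used)
    missing = sorted(set(expected_tags) - set(used))
    untagged = [case["id"] for case in cases if not case.get("tags")]
    return available, missing, untagged


# Hypothetical sample data; a real suite would be loaded from its own files.
cases = [
    {"id": "tc-001", "tags": ["sql", "join"]},
    {"id": "tc-002", "tags": ["sql"]},
    {"id": "tc-003", "tags": []},
]
available, missing, untagged = summarize_tags(cases, ["sql", "join", "window"])
print("available:", available)
print("missing:", missing)
print("untagged cases:", untagged)
```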