Build evaluation systems that catch regressions before your users do. From simple assertions to LLM-as-judge — evaluate outputs systematically.