
Dev & eval

The surfaces that make AI products debuggable.

Prompts change, models change, latencies wobble, tokens leak. The interfaces that let a team diff a prompt, watch a trace, and run a small eval in-product are the ones that ship confident releases instead of prayerful ones.

4 patterns
  1. Prompt diff
    Two prompts, side by side, same inputs, different outputs.
    11 min
  2. The eval dock
    A live pass/fail grid across a frozen test set.
    12 min
  3. Latency trace
    A waterfall of where the seconds went.
    10 min
  4. Token map
    A picture of which tokens did what, and why.
    11 min
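To make the second pattern concrete, here is a minimal sketch of an eval dock: run a frozen test set against a model function and render a pass/fail grid. Everything here is hypothetical illustration (`run_eval`, `render_grid`, the toy model, and the substring-match pass criterion are assumptions, not a prescribed implementation); in practice `toy_model` would be a real completion call and the pass check would be whatever your product's eval defines.

```python
def run_eval(cases, model_fn):
    """Run a frozen test set; return (case_id, passed) pairs."""
    results = []
    for case in cases:
        output = model_fn(case["input"])
        # Toy pass criterion: expected string appears in the output.
        passed = case["expect"] in output
        results.append((case["id"], passed))
    return results

def render_grid(results):
    """One line per case, check for pass, cross for fail, plus a summary row."""
    lines = [f"{'PASS' if ok else 'FAIL'}  {cid}" for cid, ok in results]
    passed = sum(ok for _, ok in results)
    lines.append(f"{passed}/{len(results)} passing")
    return "\n".join(lines)

# Hypothetical stand-in for a model call, just to exercise the grid.
def toy_model(prompt):
    return prompt.upper()

cases = [
    {"id": "greeting", "input": "hello", "expect": "HELLO"},
    {"id": "empty-safe", "input": "", "expect": "?"},
]

print(render_grid(run_eval(cases, toy_model)))
```

The point of the dock is the frozen test set: because `cases` never changes between runs, a flipped cell in the grid can only mean the prompt or model changed, which is what makes the surface debuggable.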