The step receipt

An after-the-fact trail of what the agent actually did.

8 min

After the agent finishes, the user has questions. What did it read? What did it query? What took so long? The naive answer is to replay the whole transcript every log line, every intermediate token which is roughly as useful as handing someone the full stack trace when they ask what time it is. The step receipt is a small, structured audit after the fact.

"After the work: receipts, not transcripts. Collapsed by default, expandable on demand."

The pattern

One line per step, expandable for the detail.

The receipt is an ordered list of steps the agent took, each a single line with verb, target, and duration. Steps collapse by default. Clicking expands a short detail list for that step what was read, what was matched, what was decided. Total duration sits in the header.

Receipt · collapsed by default

Click a step to expand its detail

Receipt · 4 steps · 7.5s total

- Ran event-funnel query over April 1–14
- Returned 8,421 rows
- Filtered to retention cohort

After the work: receipts, not transcripts. Collapsed by default, expandable on demand.

The why

Audit is different from observability.

Observability is for operators debugging failures. Audit is for users deciding whether to trust the output. The two information needs are radically different operators want everything, users want the right summary. Mixing them gives operators a useless tool and users an intimidating one.

Three moves

Receipt hygiene.

Verbs first. Read, Query, Compare, Write. A short list of action verbs is faster to skim than a paragraph of explanation.
Durations in line. Each step shows how long it took. Long steps become visible for follow-up questions.
Expandable detail. Three to five bullets per step, max. The reader can drill in when they need to, ignore otherwise.

The trap

Receipts that are actually logs.

When the receipt balloons to fifteen lines per step, it stops being a summary. Users stop reading. The tool becomes an icon that's always ignored, and then a tool that's removed. A great receipt knows what to leave out.

Failure modes

What this pattern gets wrong when it gets wrong.

Ghost citation: A source is shown but doesn't actually back the claim, or links to a page that doesn't contain the quoted text.
Silent truncate: The response ran out of room or tokens and the product didn't tell the user where it stopped.
Confidence theater: Language or typography that performs certainty beyond what the model actually has.

Seen in the wild

Three shipping variants worth copying.

A ledger grouped by verb, with counts and costs
A link back to each tool call's receipt in place
A 'save this run' affordance that archives it for later