
The tool call, visible

When and how to show the model reaching for a tool.

10 min

The tool call is where the model reaches out and touches something real. A search index, a web page, a file on disk, a billing API, a Slack channel. This is also the point where most AI products become instantly less trustworthy, because the interface decides — usually by accident — how much of the tool call to show.

Two common failures: hide everything, or dump everything. A product that hides tool calls feels magical until it fails silently. A product that shows a raw JSON log feels transparent until the user glazes over and stops reading. Neither is a design.

"You don't read your bank statement. You glance at it. Design tool calls the same way."

The pattern

A receipt, not a log.

The right shape is a two-line receipt. A tool name, a status dot, a one-sentence summary of what happened. The user can glance at it in under a second. If they want the detail, they click to expand — at which point they get the arguments, the truncated result, and a rerun affordance.

Receipts are designed to be ignored. That's the point. The user should be able to read the model's response as prose, with receipts as monospace breadcrumbs in between, and their brain should be able to skip the breadcrumbs unless something looks wrong.
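The receipt described above can be sketched as a small data shape with two views. This is a minimal illustration, not a reference implementation: the type and field names, the `slack.post`-style tool naming, and the 280-character preview limit are all assumptions made for the example.

```typescript
// Hypothetical shapes for illustration; none of these names come from a real SDK.
type Status = "success" | "error" | "in_progress";

interface ToolCall {
  tool: string;                  // tool name, e.g. "slack.post"
  status: Status;                // drives the status dot
  summary: string;               // one sentence: what happened
  args: Record<string, unknown>; // shown only when expanded
  result: string;                // raw result; shortened for display
}

// Assumed cutoff for the expanded preview.
const RESULT_PREVIEW_CHARS = 280;

// Collapsed receipt: just enough to glance at.
function collapsed(call: ToolCall) {
  return { tool: call.tool, status: call.status, summary: call.summary };
}

// Expanded receipt: adds the arguments, a truncated result, and a rerun flag.
function expanded(call: ToolCall) {
  const truncated = call.result.length > RESULT_PREVIEW_CHARS;
  return {
    ...collapsed(call),
    args: JSON.stringify(call.args, null, 2),
    resultPreview: truncated
      ? call.result.slice(0, RESULT_PREVIEW_CHARS) + "…"
      : call.result,
    truncated,
    // Assumption: a call that is still running cannot be rerun.
    canRerun: call.status !== "in_progress",
  };
}
```

Keeping the collapsed view a strict subset of the expanded one means expanding a receipt only ever adds detail; it never changes what the user already glanced at.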

Example: three tool calls in one response.

Pulling the Q2 churn cohort, checking live billing data, and attempting to post a summary to the team channel. The Slack post failed; details below.

Two greens and a red. The red doesn't hide. The interface doesn't pretend the post went through.

Four rules

How to show a tool call.

  • One line, one sentence, one glance. The collapsed receipt should be readable at speed. Tool name in monospace, summary in the display font, status dot on the left.
  • Status as color, not text. Green for success, red for error, brass for in-progress. The word "error" appears only on hover or expand. Color plus dot carries it.
  • Expand reveals, never interrupts. Expanding a receipt never scrolls the answer away. It opens inline, and the rest of the message reflows gently. A modal is always wrong here.
  • Failures ship with a next step. A red receipt should tell the user what to do. Missing scope? "Reauthorize Slack." Rate-limited? "Retry in 34s." The error is the first surface where the agent asks for help.
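
Two of the rules above, status as color and failures that ship with a next step, reduce to small lookup functions. The sketch below assumes a hypothetical error taxonomy (`missing_scope`, `rate_limited`, `unknown`); a real product would map its own error types, and the fallback message is invented for the example.

```typescript
// Hypothetical status and error shapes; not taken from any real SDK.
type Status = "success" | "error" | "in_progress";

// Status is carried by the dot color, not by a text label.
const DOT_COLOR: Record<Status, string> = {
  success: "green",
  error: "red",
  in_progress: "brass",
};

type ToolError =
  | { kind: "missing_scope"; service: string }
  | { kind: "rate_limited"; retryAfterSeconds: number }
  | { kind: "unknown"; message: string };

// Every failure maps to a concrete action the user can take.
function nextStep(err: ToolError): string {
  switch (err.kind) {
    case "missing_scope":
      return `Reauthorize ${err.service}.`;
    case "rate_limited":
      return `Retry in ${err.retryAfterSeconds}s.`;
    default:
      // Assumed fallback wording for errors the product cannot classify.
      return "Retry, or report this error.";
  }
}
```

For example, `nextStep({ kind: "rate_limited", retryAfterSeconds: 34 })` yields the "Retry in 34s." label from the rule above. Modeling errors as a tagged union makes it hard to add a new failure kind without also deciding what the user should do about it.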

The why

Legibility is leverage.

Making tool calls visible but skimmable is the move that lets you ship longer agentic flows. Each receipt takes the user less than a second to process. A plan with twelve steps becomes readable. Hide them, and twelve invisible tool calls feel like the model is off-leash. Dump them, and twelve JSON blocks feel like a server log, not a product.

The middle path, the receipt, is a surface primitive. Once the team builds it, every future tool gets the same treatment. Users learn the shape and transfer trust across tools.

Failure modes

What this pattern gets wrong when it gets wrong.

Ghost citation
A source is shown but doesn't actually back the claim, or links to a page that doesn't contain the quoted text.
Phantom tool
A visible tool call that didn't happen, or happened but with different arguments than shown.
Silent truncate
The response ran out of room or tokens and the product didn't tell the user where it stopped.
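
One way to guard against the phantom tool and the silent truncate is to build the receipt from the execution record itself, never from what the model said it was about to do. The wrapper below is a sketch under simplifying assumptions: tools are synchronous functions that return a string, arguments are JSON-serializable, and the 500-character display limit is arbitrary. All names are hypothetical.

```typescript
// Hypothetical wrapper: the receipt records what actually ran.
type Status = "success" | "error";

interface Receipt {
  tool: string;
  args: Record<string, unknown>; // the arguments actually passed to the tool
  status: Status;
  output: string;
  truncated: boolean;            // true when output was cut for display
}

// Assumption: a tool is a synchronous function returning text.
type Tool = (args: Record<string, unknown>) => string;

// Assumed display limit.
const MAX_OUTPUT_CHARS = 500;

function runWithReceipt(
  name: string,
  tool: Tool,
  args: Record<string, unknown>
): Receipt {
  // Snapshot the arguments before the call so later mutation cannot alter the record.
  const sentArgs = JSON.parse(JSON.stringify(args)) as Record<string, unknown>;
  let status: Status = "success";
  let output = "";
  try {
    output = tool(args);
  } catch (e) {
    // A thrown error becomes a red receipt instead of disappearing.
    status = "error";
    output = e instanceof Error ? e.message : String(e);
  }
  // Record the cut explicitly so the interface can say where the output stopped.
  const truncated = output.length > MAX_OUTPUT_CHARS;
  return {
    tool: name,
    args: sentArgs,
    status,
    output: truncated ? output.slice(0, MAX_OUTPUT_CHARS) : output,
    truncated,
  };
}
```

Because the receipt only exists as the return value of a real call, the interface has nothing to render for a call that never happened, and the `truncated` flag gives the expanded view something honest to display when the result was cut.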

Seen in the wild

Three shipping variants worth copying.

  • A single-row receipt per call, with a status glyph
  • Click to reveal the arguments and the truncated result
  • A red mark on the failed call itself, not a toast