The thinking indicator

Everything you put on screen between enter and the first token.

Every AI product has a gap between pressing Enter and the first streamed token. Sometimes it's 400ms, sometimes it's 14 seconds. What you put on the screen during that gap is the clearest signal in the whole product about how seriously you've thought about your user.

Dots are a polite lie. They tell the user nothing, but reassure them something is happening. At 400ms, fine. At 4 seconds, the lie starts to hurt. At 14 seconds, it's malpractice.

"A three-dot shimmer is a polite lie. A six-word summary of what the model decided to do is a product."

The pattern

A précis, not a spinner.

The pattern I want is a single line of text, in the model's voice, that names what it is currently doing. Not "thinking." Not "working on it." A specific verb and a specific object. "Reading the PRD, checking for scope drift against stated goals." Six to twelve words, maximum.
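The length limit is the one part of this rule a machine can check. A minimal sketch of that guard, assuming whitespace-separated words and treating six to twelve as an inclusive range; the function names are mine, and whether the line has a specific verb and a specific object still needs a human eye.

```typescript
// Hypothetical helpers: names and the inclusive 6..12 range are assumptions
// drawn from "six to twelve words" in the text above.
const MIN_WORDS = 6;
const MAX_WORDS = 12;

// Count whitespace-separated words; punctuation stays attached to its word.
function wordCount(text: string): number {
  return text
    .trim()
    .split(/\s+/)
    .filter((word) => word.length > 0).length;
}

// True when the précis fits the length budget. This checks length only,
// not whether the sentence is specific.
function isWithinPrecisLength(text: string): boolean {
  const count = wordCount(text);
  return count >= MIN_WORDS && count <= MAX_WORDS;
}
```

The example précis above comes out at ten words and passes. "Thinking" fails on length alone, which is a convenient side effect rather than a real specificity check.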

The précis does three things at once. It confirms the model understood the question. It sets expectations for the kind of answer. And it earns the wait by showing the wait is being spent on the right thing.

[Interactive demo: three thinking indicators, side by side]

Layered

When to show what.

0 to 400ms: Nothing. The composer collapses. The cursor moves to the response. No indicator is needed at this scale: the user will see the first token before they've finished reading an indicator anyway.
400ms to 2s: A single-line précis in italic, in the model's voice. No animation. Nothing bouncing.
2s to 8s: Keep the précis. Add an elapsed counter, small, monospace, on the right. Add a cancel affordance. Never a retry button.
Past 8s: The précis should now update with a second sentence. "Still reading the PRD. This one is longer than most." The update itself is the reassurance.
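The four tiers above reduce to a pure function of elapsed time. Here is a minimal sketch, assuming each threshold is inclusive at its lower bound (the text doesn't say which side a boundary falls on) and using type and function names I made up for illustration.

```typescript
// Which parts of the indicator are visible. Field names are illustrative.
type IndicatorState = {
  showPrecis: boolean;    // single-line précis, from 400ms
  showElapsed: boolean;   // small monospace counter, from 2s
  showCancel: boolean;    // cancel affordance, from 2s
  showFollowUp: boolean;  // second sentence added to the précis, from 8s
};

// Thresholds in milliseconds, taken from the tiers above.
const PRECIS_AT_MS = 400;
const COUNTER_AT_MS = 2000;
const FOLLOW_UP_AT_MS = 8000;

// Map elapsed time since submit to what should be on screen.
// Assumption: lower bounds are inclusive.
function indicatorStateAt(elapsedMs: number): IndicatorState {
  return {
    showPrecis: elapsedMs >= PRECIS_AT_MS,
    showElapsed: elapsedMs >= COUNTER_AT_MS,
    showCancel: elapsedMs >= COUNTER_AT_MS,
    showFollowUp: elapsedMs >= FOLLOW_UP_AT_MS,
  };
}

// Format the counter as seconds with one decimal place.
function formatElapsed(elapsedMs: number): string {
  return `${(elapsedMs / 1000).toFixed(1)}s`;
}
```

Keeping this as a pure function means the rendering layer only has to call it on a timer tick, and the thresholds can be tested without mounting any UI.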

The why

Honesty is cheap, theater is expensive.

The reason thinking indicators go wrong is that teams treat them as branding surfaces. A bouncing mascot. A custom shimmer animation. A gradient orb that pulses. These are all attempts to turn latency into personality, and they all fail for the same reason: they compete with the answer for the user's attention.

A thinking indicator is a servant, not a performance. Its job is to keep the user oriented while the model works, then get out of the way. Typographic treatment carries everything. Animation, almost nothing.

Failure modes

How this pattern fails when it fails.

Latency lie: The interface fakes a speed the backend doesn't have. Spinners that animate faster than the real throughput.
Phantom tool: A visible tool call that didn't happen, or happened but with different arguments than shown.
Confidence theater: Language or typography that performs certainty beyond what the model actually has.
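Of the three, the phantom tool is the easiest to catch mechanically: compare what the interface displayed against what the backend actually ran. A rough sketch, assuming both sides can be expressed as a tool name plus JSON-like arguments; the `ToolCall` shape and both function names are hypothetical, not any particular framework's API.

```typescript
// Hypothetical shape for a tool call, as displayed or as executed.
type ToolCall = { name: string; args: Record<string, unknown> };

// Serialize with sorted object keys so key order does not affect comparison.
function stableStringify(value: unknown): string {
  if (Array.isArray(value)) {
    return `[${value.map(stableStringify).join(",")}]`;
  }
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const entries = Object.keys(obj)
      .sort()
      .map((key) => `${JSON.stringify(key)}:${stableStringify(obj[key])}`);
    return `{${entries.join(",")}}`;
  }
  return String(JSON.stringify(value));
}

// A displayed call is a phantom if nothing ran, a different tool ran,
// or the same tool ran with different arguments.
function isPhantomToolCall(shown: ToolCall, executed: ToolCall | undefined): boolean {
  if (executed === undefined) return true;
  if (shown.name !== executed.name) return true;
  return stableStringify(shown.args) !== stableStringify(executed.args);
}
```

A check like this belongs in logging or tests rather than in the render path. The point is to find out when the display and the backend disagree, not to paper over it.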

Seen in the wild

Three shipping variants worth copying.

A single-line précis of the chosen approach
An ambient elapsed counter, shown only past 4s
A cancel affordance that actually cancels
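"Actually cancels" means the button stops the underlying work, not just hides the indicator. One way to wire that is the standard `AbortController`. In this sketch, `fakeModelCall` is a hypothetical stand-in for the real request (a timer, not a network call), and `startRequest` is an illustrative name.

```typescript
// Hypothetical stand-in for a model request: resolves after delayMs
// unless the signal aborts first.
function fakeModelCall(signal: AbortSignal, delayMs: number): Promise<string> {
  return new Promise<string>((resolve, reject) => {
    if (signal.aborted) {
      reject(new Error("cancelled"));
      return;
    }
    const timer = setTimeout(() => resolve("first token"), delayMs);
    signal.addEventListener(
      "abort",
      () => {
        clearTimeout(timer); // stop the pending work, not just the UI
        reject(new Error("cancelled"));
      },
      { once: true },
    );
  });
}

// Start a request and hand back the promise plus a cancel function
// that the cancel affordance can call.
function startRequest(delayMs: number): {
  result: Promise<string>;
  signal: AbortSignal;
  cancel: () => void;
} {
  const controller = new AbortController();
  return {
    result: fakeModelCall(controller.signal, delayMs),
    signal: controller.signal,
    cancel: () => controller.abort(),
  };
}
```

With a real backend, the same signal can be passed to `fetch` through its `signal` option, so the cancel button also tears down the HTTP request. Whether the server stops generating when the connection closes is a separate question that depends on the backend.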