The rate-limit display

Making invisible quotas visible without anxiety.

Rate limits are the quiet tax of every AI product. They exist for good reasons (models cost money, infrastructure is finite, abuse is real), but they're hidden until they trip. The user is halfway through an agent run when the product stops and says 'please try again in a few minutes'. There's no context, no budget bar, no sense of what's left.

The fix is a live budget display: three bars, one per axis (tokens, wall-clock, tool credits), updated as the agent works. The user sees what's left. The panic becomes planning. The budget becomes part of the product, not a surprise.

"You can't ration what you can't see. Every invisible limit is an excuse for bad feedback loops."

The pattern

Three axes, always visible.

Token budget. Wall-clock budget. Tool-credit budget. One bar each, one label each. The bars live in the agent panel, not on a settings page. They update in real time. When any axis drops below 10% remaining, its bar turns brass. Never red. Red is for catastrophic failures; brass is for 'plan ahead'.

Demo: live budgets while the agent runs. Run the agent and watch the bars deplete.

  • Tokens: 182,000 / 500,000 tok (healthy)
  • Wall time: 42 / 180 s (healthy)
  • Tool credits: 12 / 25 calls (healthy)

Three axes, shown at once. The bar turns brass before red: a warning, not a panic.
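
As a rough illustration of the rule above, here is a minimal Python sketch of the state behind the three bars. The `Budget` class, the `render` helper, and the "stopped" status name are assumptions made for this example, not part of any particular product. The 10% brass threshold and the sample figures come from the text. The sketch assumes every limit is greater than zero.

```python
from dataclasses import dataclass

BRASS_THRESHOLD = 0.10  # from the text: below 10% remaining, the bar turns brass


@dataclass(frozen=True)
class Budget:
    """One axis of the display (hypothetical shape): usage against a fixed limit."""
    label: str
    used: float
    limit: float  # assumed to be > 0
    unit: str

    @property
    def remaining_fraction(self) -> float:
        # Clamp at zero so an overshoot never renders as a negative bar.
        return max(0.0, (self.limit - self.used) / self.limit)

    @property
    def status(self) -> str:
        if self.remaining_fraction == 0.0:
            return "stopped"  # a real stop: the only case where red would apply
        if self.remaining_fraction < BRASS_THRESHOLD:
            return "brass"    # 'plan ahead'
        return "healthy"


def render(budgets: list[Budget]) -> list[str]:
    """Return one plain-text line per axis so all three are shown together."""
    return [
        f"{b.label}: {b.used:,.0f} / {b.limit:,.0f} {b.unit} ({b.status})"
        for b in budgets
    ]


budgets = [
    Budget("Tokens", 182_000, 500_000, "tok"),
    Budget("Wall time", 42, 180, "s"),
    Budget("Tool credits", 12, 25, "calls"),
]
for line in render(budgets):
    print(line)
```

Keeping the status a pure function of `used` and `limit` means the color can never drift from the number printed next to it.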

The why

Invisible caps are the cruelest UX.

A cap the user can't see is a cap they spend time testing. Every retry probes the invisible wall. That probing is expensive, both in literal cost and in user trust. A visible cap reverses the polarity: the user decides how to spend the budget they have. The agent becomes a collaborator the user can steer.

Three moves

Budget UI that doesn't cause anxiety.

  • Three axes, not one. Tokens, time, and tool calls fail independently. A single bar always lies about one of them.
  • Brass before red. Brass is the warning color. Red is reserved for real stops. A constant red bar becomes wallpaper.
  • 'Why I stopped' card. When a cap trips, show which axis hit zero and what wasn't done because of it. The user should never have to guess.
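
The 'why I stopped' card from the list above could be sketched roughly as follows. `StopCard`, `why_i_stopped`, the shape of the `usage` mapping, and the sample step names are all hypothetical, chosen only for illustration. The one idea taken from the text is that the card names the axis that hit zero and lists what wasn't done because of it.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class StopCard:
    """What the user sees when a cap trips (hypothetical shape)."""
    axis: str    # which budget hit zero
    used: float
    limit: float
    unit: str
    unfinished: list[str] = field(default_factory=list)  # work that was skipped

    def text(self) -> str:
        lines = [
            f"Stopped: {self.axis} budget exhausted "
            f"({self.used:,.0f} / {self.limit:,.0f} {self.unit})."
        ]
        if self.unfinished:
            lines.append("Not done because of it:")
            lines.extend(f"  - {step}" for step in self.unfinished)
        else:
            lines.append("All planned steps finished before the cap.")
        return "\n".join(lines)


def why_i_stopped(
    usage: dict[str, tuple[float, float, str]],
    pending_steps: list[str],
) -> Optional[StopCard]:
    """Return a card for the first exhausted axis, or None if no cap tripped."""
    for axis, (used, limit, unit) in usage.items():
        if used >= limit:
            return StopCard(axis, used, limit, unit, list(pending_steps))
    return None


# Illustrative numbers only: tool credits run out while two steps remain.
usage = {
    "Tokens": (310_000, 500_000, "tok"),
    "Wall time": (95, 180, "s"),
    "Tool credits": (25, 25, "calls"),
}
card = why_i_stopped(usage, ["summarize results", "open pull request"])
if card is not None:
    print(card.text())
```

Returning `None` when nothing tripped keeps the card from appearing on ordinary completions, so its presence always means a budget caused the stop.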

The trap

Budgets that lie about their shape.

The worst failure mode is a budget bar that moves nonlinearly: the first half takes twenty seconds and the second half takes four. Or a bar that 'resets' silently. Both teach the user the display is decorative. Once that lesson lands, the display is worse than no display at all.
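
One way to guard against both traps, sketched under assumptions: the bar is a strictly linear function of usage, and a reset is impossible without a reason that lands in a user-visible log. `BudgetMeter` and its method names are invented for this example. Treating "linear in usage" as the honest shape is one reading of the paragraph above, not a definitive rule.

```python
from dataclasses import dataclass, field


@dataclass
class BudgetMeter:
    """A single budget axis whose display cannot drift from its data (hypothetical)."""
    limit: float  # assumed to be > 0
    used: float = 0.0
    events: list[str] = field(default_factory=list)  # log shown to the user

    def consume(self, amount: float) -> None:
        if amount < 0:
            raise ValueError("consume() only spends; use reset() to restore budget")
        self.used = min(self.limit, self.used + amount)

    def displayed_fraction(self) -> float:
        # Linear by construction: no easing, no smoothing, no invented progress.
        return 1.0 - self.used / self.limit

    def reset(self, reason: str) -> None:
        # A reset without an explanation is exactly the silent reset to avoid.
        if not reason.strip():
            raise ValueError("a reset must carry a reason the user can read")
        self.events.append(f"Budget reset: {reason}")
        self.used = 0.0


meter = BudgetMeter(limit=100)
meter.consume(25)
print(meter.displayed_fraction())  # 0.75
meter.reset("new billing window")
print(meter.events)                # ['Budget reset: new billing window']
```

Because `displayed_fraction` reads straight from `used` and `limit`, equal spending always moves the bar by an equal amount, and every jump back to full leaves a line the interface can show.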

Failure modes

What this pattern gets wrong when it gets wrong.

  • Throttle silence. A rate limit, queue, or budget cap silently slows or stops the product without telling the user why.
  • Ambiguous state. Running, done, errored, and paused all look the same. The user has to infer the state from context.
  • Latency lie. The interface implies a speed the backend doesn't have, such as spinners that animate faster than the real throughput.

Seen in the wild

Three shipping variants worth copying.

  • A tri-axis budget bar: tokens, wall-clock, tool credits
  • The bar turns brass at 10% remaining, never red
  • A 'why I stopped' card when any axis hits zero