Skip to content

Budget guard

A hard cap the agent cannot silently blow through.

10 min

A budget without teeth is a wish. Every agentic product makes some version of the same promise: "you can set a spending limit." The products that keep the promise are the ones where the agent actually stops. The products that break it are the ones where the agent keeps going with a polite warning and the invoice arrives next month.

Budget guard is the small, boring, correct pattern for bounding a long-running agent's cost. Soft warning early, hard halt at the cap, one-click lift with a logged reason, and a receipt that names what the spend bought.

"The first thing a user should be able to do is turn your product off, in one motion."
The pattern

80, 100, lift, log.

Four states. Below 80 percent of cap, the budget is silent. At 80, the budget enters a soft-warn state visible in the chrome — a color shift, not a modal. At 100, the agent halts. Not "finishes the current step" or "tries to be graceful," a halt. The user gets a lift-cap affordance that requires a brief reason, which writes to the audit log.

The receipt for any cap breach shows what work the overage bought. If the answer is "a better draft of the same thing," the user learns to cap tighter. If the answer is "three subagents exploring in parallel," the user learns that their prompt was ambiguous. The receipt teaches.

A working budget guard
Watch the bar. Hit the cap. Lift with a reason. Read the audit.
Budget guard · one hand on the brake
Spent / cap
$0.000of $0.50
soft-warn @ 80%hard-halt @ 100%
The why

A cap is a contract.

A user who sets a cap is making a prediction: this task should cost less than X. When the agent halts at the cap, the halt is the moment to renegotiate the prediction, not a bug to route around. Pretending the cap is advisory is disrespectful to the user and eventually catastrophic to the bill.

The best budget guards also name the unit. Tokens, minutes, dollars, tool calls — users care about different units at different times and a guard that only offers one unit is one bad conversation away from feeling useless.

Three moves

The guard I'd ship.

  • The cap halts, full stop. No "grace runs." No "finishing the thought." The agent stops and the interface invites the user to continue, lift, or cancel.
  • Lift captures a reason. The user types one sentence to lift the cap. That sentence goes into the audit. Next month when the bill is bigger than expected, the audit explains why.
  • Receipts are itemized, not lump-sum. "Research subagent used 40% of total spend" is useful. "Spent $12" is not. Users cap tighter when they can see what each dollar bought.

The trap

The bleed nobody watches.

The quiet failure mode is a low background spend that never trips the cap but accumulates month over month. Chat-based products with retrieval and long system prompts are particularly prone to this. The guard needs a daily or weekly rollup view to catch bleeds that a per-task cap can't see.

Budgets are not just limits. They are a shape. Show the shape.

Failure modes

What this pattern gets wrong when it gets wrong.

Silent cost bleed
The product spends time, tokens, or money without surfacing the cost at the moment it is incurred.
Runaway agent
An agent that loops, spends, or edits past the user's intent with no visible cap.
Throttle silence
A rate limit, queue, or budget cap that silently slows or stops the product without telling the user why.
Seen in the wild

Three shipping variants worth copying.

  • A burn-down bar with spent, remaining, and ETA to cap
  • A soft-warn at 80% that requires an explicit dismiss
  • A lift-cap affordance that writes an audit line naming who lifted it