The reasoning trace

Showing how the model got there, without overwhelming.

12 min

Reasoning traces are suddenly everywhere. Models now routinely produce thousands of tokens of thought before they speak. Most products dump that thought into a collapsible 'thinking' panel and call it a day. It's almost always too much, in the wrong shape, unreadable past the first few steps.

A useful trace is a curated path. Five to seven titled steps, each collapsible, each scored for confidence. The user can audit the answer the same way they'd audit a colleague's work: read until you trust it, then skip to the conclusion.

"Reasoning is worth showing only if it's readable. A dumped chain-of-thought is noise pretending to be trust."

The pattern

Titled steps. Collapsible. Scored.

Each step has a title the user can read in one breath. The body is optional: a paragraph that explains what the step did, why. A dot next to each step shows confidence. Discarded branches live as faded siblings, collapsible, labeled. The reader can expand all, collapse all, or hover a dot to see confidence color.

A reasoning trace, curated

Expand / collapse / reveal discarded

How I got here

Identify the core claim

The user asked whether the enterprise tier drove the Q2 margin shift. That's a two-variable question: segment contribution and margin delta.

Pull segment P&L

Fetched revenue_by_segment for Q1 and Q2. Enterprise grew from 41% to 47% of gross revenue.

Sanity-check against invoices

Compute margin delta

Gross margin moved from 62.4% to 61.1%, a 1.3pt drop. The enterprise tier's margin was flat at 58%.

Reconcile: did enterprise drive the drop?

Enterprise grew as a share of revenue. Its margin is below blended. So mix alone pushes blended margin down. I estimate mix contributes roughly 0.9pt of the 1.3pt drop.

Answer

Yes, but partially. Roughly 70% of the margin compression is attributable to enterprise mix; the remaining 30% is from mid-market discounting.

High confidenceMediumUncertain / discarded

A staircase, not a dump. Each step is collapsible. Discarded branches are visible on demand.

The why

Raw thinking is harder to trust than the answer.

There's a clever counterintuition at work. Users assume a visible chain of thought will make them trust the model more. Research keeps finding the opposite: raw thinking, uncurated, tends to erode confidence because it's hard to parse and full of dead ends. A curated trace inverts the effect. Five titled steps feel like craftsmanship; three thousand tokens feel like mess.

Three moves

Traces that build trust instead of confusion.

Title every step. A step with no title is a step the user skips. The title is the contract.
Score confidence. One dot, three states: high, medium, uncertain. Anything more precise is theater.
Show discarded branches. The most-underused honesty move in AI UX. Users trust a model that visibly considered and rejected an option.

The trap

Traces that invent their own story.

The traces in many products are post-hoc. The model generates the answer, then generates a plausible-looking rationalization, then shows the rationalization as the trace. This is worse than no trace. It teaches the user a story that may or may not match what actually happened.

Failure modes

What this pattern gets wrong when it gets wrong.

Confidence theater: Language or typography that performs certainty beyond what the model actually has.
Unverified claim: A figure or fact presented without provenance, in a place where the reader will treat it as cited.
Citation overload: So many citations that the user stops reading them, which defeats the purpose of having them at all.

Seen in the wild

Three shipping variants worth copying.

A collapsible 'how I got here' panel with titled steps
Each step has a confidence notch and a jump-to-source link
Discarded branches shown as faded siblings, expandable