Hedge calibration
Saying 'probably' in a way the reader can actually parse.
'Likely' means 70%. 'Plausibly' means 55%. 'Almost certainly' means above 95%. Every product has a hedge vocabulary; almost no product publishes it. The result is a language that looks calibrated but is, in practice, poetic. A user reading 'likely' has to guess whether it's a coin flip or near-certain.
The fix is small and publishable. Attach each hedge word to a visible band. A dot notation, three-to-four levels of fill, a one-line legend. The vocabulary becomes a notation. Users learn it in twenty seconds and carry it forever.
"Hedge words are a notation system. Publish the key. The rest is usable typography."
Dot marks. Three to four bands. Pinned legend.
Four bands cover most hedge space: high (almost certainly), likely, possible, uncertain. Each hedge word in the answer carries a small dot cluster next to it, filled to match its band. A pinned legend maps each pattern to a probability range. Hover any hedge; the matching band lights up in the legend.
Retention almost certainly improved in the new cohort, and likely by more than 2 points. The driver was most plausibly the onboarding redesign, though it's possible the pricing change contributed. It is unclear whether the effect persists past 30 days.
Hover any hedge word; the matching band lights up in the key. A system every reader can learn in 20 seconds.
Hedging is language. Language needs a key.
When 'likely' means different things in different sentences, the user stops parsing hedges. Eventually they ignore them, which defeats the whole point. A shared key turns hedges into measurable signals. Suddenly the model is calibrated, in a way the user can actually verify.
Hedges that mean something.
- Pick four bands, not seven. More bands sound precise; they're harder to remember. Four fits the working memory of every reader.
- Use typography, not color. Dots scale better than background tints, especially in long answers.
- Pin the key. The legend shouldn't live in a help doc. Show it under the response, once, so readers can learn it inline.
Hedges that don't match the math.
The worst failure mode is a calibration key that doesn't match the model's behavior. The model says 'likely' in sentences that are actually 95% certain and other sentences that are actually 60%. The key becomes a promise you can't keep. Users feel the mismatch over time, then distrust every hedge.
What this pattern gets wrong when it gets wrong.
- Confidence theater
- Language or typography that performs certainty beyond what the model actually has.
- Empty disclaimer
- A legal-feeling warning that carries no specific information about this particular response.
- Unverified claim
- A figure or fact presented without provenance, in a place where the reader will treat it as cited.
Three shipping variants worth copying.
- A 3-dot confidence mark next to every hedge word
- A pinned legend explaining what each mark means
- Hedges that can't be calibrated are explicitly omitted