Token map
A picture of which tokens did what, and why.
A token map is a photograph of a prompt. It takes the invisible, shape-shifting, surprisingly expensive thing you send to the model and renders it as a highlighted inventory: system prompt in one color, retrieval context in another, user turn in a third, the response in a fourth. The moment a team has a token map on the wall, prompts get shorter and retrieval tightens on its own.
Most teams ship blind to their token spend. They know the monthly bill. They don't know which tokens are buying them results and which are buying them nothing. The map fixes that.
"You can't optimize a prompt you haven't counted."
Color the prompt, count the color.
The token map splits a prompt into semantic regions and counts each region's tokens. System instructions, tool definitions, few-shot examples, retrieved chunks, chat history, the user's actual turn. Each region gets a color. The sidebar shows "system: 420 tokens, retrieval: 2,800 tokens, user turn: 38 tokens," and the total.
The map is static. That is the whole point. The team reads it, understands it, argues about it, and then edits the prompt. It is not a live dashboard. It is a calm, legible diagram of where the words are going.
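The counting step is small enough to sketch. The `count_tokens` below is a whitespace stand-in for illustration only; a real map would call the model's own tokenizer so the counts are exact, not estimated.

```python
# Minimal token-map sketch: label each semantic region, count it,
# and report each region's share of the total.
# count_tokens is a whitespace stand-in; a real map would use the
# model's own tokenizer so counts are exact.

def count_tokens(text: str) -> int:
    return len(text.split())

def token_map(regions: dict) -> dict:
    counts = {label: count_tokens(text) for label, text in regions.items()}
    total = sum(counts.values())
    # each region maps to (token count, share of total)
    return {label: (n, n / total) for label, n in counts.items()}

prompt = {
    "system": "You are an assistant for a billing platform. Keep answers concise.",
    "retrieval": "Enterprise pricing is a per-seat minimum of $72/month.",
    "user turn": "What's my current pricing for enterprise?",
}
for label, (tokens, share) in token_map(prompt).items():
    print(f"{label}: {tokens} tokens ({share:.0%})")
```

The output is exactly the sidebar: one line per region, count and share, nothing live.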
- System prompt · 38 tokens
You are an assistant for a billing platform. Keep answers concise. Cite the internal doc when making numeric claims. Refuse any request to share PII.
- Retrieved context · 1,140 tokens
[doc:pricing.md · 380 tokens] · [doc:billing-faq.md · 520 tokens] · [doc:legal-notices.md · 240 tokens]
- User turn · 18 tokens
What's my current pricing for enterprise? Mention anything that changed this quarter.
- Model response · 68 tokens
Enterprise pricing is a per-seat minimum of $72/month, with the audit-log bundle included at tiers above 500 seats. Two changes this quarter: the pricing floor moved from $68 → $72, and onboarding services are no longer bundled below 250 seats. (Source: pricing.md · line 41; billing-faq.md · line 19)
Drop a source to see the counterfactual. Token count is exact, not estimated.
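The counterfactual falls straight out of the per-region counts. A sketch using the illustrative numbers from the example map above:

```python
# Counterfactual totals for the example map: what the prompt costs
# if one retrieved source is dropped. Numbers match the sample above.
chunks = {"pricing.md": 380, "billing-faq.md": 520, "legal-notices.md": 240}
fixed = 38 + 18  # system prompt + user turn

total = sum(chunks.values()) + fixed  # 1196 tokens with everything in
for doc, size in chunks.items():
    print(f"drop {doc}: {total - size} tokens (saves {size})")
```

No model call is needed for the ghost total; it is pure arithmetic over counts the map already has.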
Cost and quality are the same axis.
Long prompts cost more, add latency, and often produce worse outputs because the model drowns in context. Short, well-structured prompts cost less, stream faster, and produce tighter answers. The token map is the tool that turns this trade-off from a hand-wave into a number.
Retrieval is often the biggest offender. Teams using RAG often discover that their top-k is too generous, their chunks are too long, or their re-ranker isn't earning its keep. A map makes these costs visible in a way that the bill never does.
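One way to make the top-k question concrete is to price each rank position cumulatively. The chunk sizes below are made up for illustration:

```python
# Hypothetical top-k audit: the cumulative token cost of each extra
# retrieved chunk, in rank order. Chunk sizes are illustrative.
chunk_tokens = [380, 520, 240, 410, 390]

running = 0
for k, size in enumerate(chunk_tokens, start=1):
    running += size
    print(f"k={k}: +{size} tokens, {running} total")
```

Read alongside eval results per k, this turns "is top-k too generous?" into a per-step number.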
The map I'd ship.
- Semantic regions, not just turns. "System prompt, retrieval, user turn" is what the team needs to argue about. A region labeled "message 1" is useless.
- Budget bars per region. Show the token count and the share of the total. "Retrieval is 62% of your prompt" is a sentence that changes minds.
- Diff across map versions. A prompt change should show its token delta. "This edit saved 180 tokens per call at no quality cost" is the kind of result that ships.
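The version diff is a per-region subtraction. A sketch with made-up before/after counts:

```python
# Per-region token delta between two versions of a prompt map.
def map_diff(before: dict, after: dict) -> dict:
    labels = set(before) | set(after)
    return {l: after.get(l, 0) - before.get(l, 0) for l in sorted(labels)}

v1 = {"system": 420, "retrieval": 2800, "user turn": 38}
v2 = {"system": 420, "retrieval": 2620, "user turn": 38}

delta = map_diff(v1, v2)
print(delta)                # retrieval shrank by 180, everything else flat
print(sum(delta.values()))  # net: -180 tokens per call
```

The net number is the one that goes in the commit message; the per-region breakdown is the one that wins the argument.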
Optimizing tokens over quality.
The predictable failure mode is a team that cuts tokens too aggressively and ships a cheaper, worse product. The map shows what's expensive; it doesn't say what's load-bearing. Pair the map with the eval dock. A token cut that passes eval is a real improvement. A token cut that doesn't is a regression dressed up as a savings.
Cheap and good are not opposites, but they are not the same axis either.
What this pattern gets wrong when it gets wrong.
- Citation overload
- So many citations that the user stops reading them, which defeats the purpose of having them at all.
- Silent cost bleed
- The product spends time, tokens, or money without surfacing the cost at the moment it is incurred.
Three shipping variants worth copying.
- A color-stripe per paragraph that names the token source
- A hover count that shows exact tokens per segment
- A 'what if I dropped retrieval' ghost view with the counterfactual total