Mario Alexandre  ·  March 26, 2026  ·  token-savings llm-costs sinc-llm

I Measured Every Token for 7 Days — Here's What I Found

I thought my LLM costs were too high. But a feeling is not proof. So I built a logging layer. I recorded every single prompt and every response for 7 days. Here is what I found.

21,194 user prompts
33,133 assistant responses
7.14B total tokens
12.9M output tokens
$967.01 actual 7-day cost
4.2 responses/prompt (before)

The Number That Stopped Me Cold

I sent 21,194 prompts. The model sent back 33,133 responses. That works out to 1.56 responses per prompt averaged over the whole week. But at the start of the week the number was 4.2 responses per prompt.

Think about what that means. For every message I typed, the model replied 4.2 times before I got what I needed. That means 3.2 of those replies were not useful. They were fixes, extra questions, or partial answers. They cost money and added no value.

At that rate, my weekly cost would have been $2,597.96. I got it down to $967.01. The gross difference is $1,630.95. After subtracting the $42.39 that the scatter hook itself cost, the net saving is $1,588.56. All of it came from cutting out those extra back-and-forth replies.
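Here is a minimal sketch of that arithmetic. It assumes cost scales linearly with responses per prompt, which is a simplification I am adding for illustration. The variable names are hypothetical. The inputs are the figures reported in this post.

```python
# Figures reported in this post.
prompts = 21_194
responses = 33_133
actual_cost = 967.01   # USD, measured 7-day cost
rate_before = 4.2      # responses per prompt at the start of the week
hook_cost = 42.39      # USD, scatter hook cost for the week

# Measured exchange rate over the week.
rate_after = responses / prompts

# Assumption: spend is proportional to responses per prompt.
projected_cost = actual_cost * rate_before / rate_after
gross_savings = projected_cost - actual_cost
net_savings = gross_savings - hook_cost

print(f"exchange rate:  {rate_after:.2f}")
print(f"projected cost: ${projected_cost:,.2f}")
print(f"gross savings:  ${gross_savings:,.2f}")
print(f"net savings:    ${net_savings:,.2f}")
```

Under that assumption, the projected cost comes out at about $2,597.96, and the net figure lands at about $1,588.56 once the hook cost is subtracted.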

What I Was Actually Measuring

I added a logging hook to Claude Code. It recorded every prompt going in and every response coming out. It counted tokens on both sides. I tracked five things:

Exchange rate — responses per prompt
Input token volume — how much context each request carried
Output token volume — how much the model generated
SNR (signal-to-noise ratio) — quality of prompt vs. noise
Haiku overhead — cost of the scatter hook per call
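The first three metrics can be sketched from a log like this. The `LogRecord` shape and the `summarize` function are hypothetical and only for illustration. They are not the actual hook, and SNR and Haiku overhead are left out.

```python
from dataclasses import dataclass


@dataclass
class LogRecord:
    role: str           # "user" for a prompt, "assistant" for a response
    input_tokens: int   # tokens sent to the model for this record
    output_tokens: int  # tokens the model generated for this record


def summarize(records):
    """Return exchange rate and token volumes for a list of LogRecord."""
    prompts = sum(1 for r in records if r.role == "user")
    responses = sum(1 for r in records if r.role == "assistant")
    return {
        # Responses per prompt; 0.0 when the log has no prompts.
        "exchange_rate": responses / prompts if prompts else 0.0,
        "input_tokens": sum(r.input_tokens for r in records),
        "output_tokens": sum(r.output_tokens for r in records),
    }


sample = [
    LogRecord("user", 0, 0),
    LogRecord("assistant", 1200, 300),
    LogRecord("assistant", 1500, 200),
]
print(summarize(sample))
```

In this made-up sample, one prompt needed two responses, so the exchange rate is 2.0.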

The SNR result surprised me most. Before I added the auto-scatter hook, my average prompt SNR was 0.003. That is almost zero. My raw prompts were nearly all noise from the model's point of view. They were vague. They were missing context. They had no limits set.

After I turned on scatter, the average SNR jumped to 0.855. That is a 285x improvement. And it coincided with the drop in exchange rate.

sinc-LLM signal formula — prompt as sampled signal
x(t) = Σ x(nT) · sinc((t - nT) / T)

The 80K Token Incident

On Day 3, one conversation grew to 80,000 tokens before I noticed. I was stuck in a debug loop. Each response added more context. Each clarification added more back-and-forth. The conversation just kept getting longer.

That one conversation cost more than a full normal day. It shows how exchange rate can compound fast. A conversation that should have been 5 replies at 1,000 tokens each (5,000 tokens total) turned into 80 replies. Every reply carried all the old context forward.
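A rough sketch of why that compounds. It assumes every turn adds the same 1,000 tokens and that each request re-sends the full history. Both are simplifications of mine, and it ignores prompt caching. The function name is hypothetical.

```python
def cumulative_input_tokens(replies, tokens_per_turn):
    """Total tokens processed when each request carries all earlier turns.

    Request k carries k turns of context, so the total is
    tokens_per_turn * (1 + 2 + ... + replies).
    """
    return sum(k * tokens_per_turn for k in range(1, replies + 1))


short = cumulative_input_tokens(5, 1_000)
long = cumulative_input_tokens(80, 1_000)

print(short)          # 15000
print(long)           # 3240000
print(long // short)  # 216
```

The conversation is 16 times longer, but under these assumptions the tokens processed grow 216 times, because the total grows with the square of the reply count.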

That incident is why I built the scatter hook right away. The data made the problem impossible to ignore.

The Haiku Math

Here is what the scatter hook itself cost. I made 21,194 scatter calls at Claude Haiku pricing. The total was $42.39 for the week. That is $0.002 per call.

Each scatter call saves about $0.08 in exchange overhead that I avoid. That is the $2,597.96 projected cost minus the $967.01 actual cost, spread over 21,194 calls. It works out to roughly a 38x return: for every $1 I spend on Haiku, I save about $38 on the main model. That is a good trade.
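The per-call numbers can be checked in a few lines. This is a sketch using only the figures in this post. The variable names are mine.

```python
calls = 21_194
hook_cost = 42.39          # USD, Haiku scatter calls for the week
projected_cost = 2597.96   # USD, weekly cost at 4.2 responses per prompt
actual_cost = 967.01       # USD, measured weekly cost

cost_per_call = hook_cost / calls
saved_per_call = (projected_cost - actual_cost) / calls
return_ratio = saved_per_call / cost_per_call

print(f"cost per call:  ${cost_per_call:.4f}")
print(f"saved per call: ${saved_per_call:.4f}")
print(f"return:         {return_ratio:.1f}x")
```

The unrounded saving is closer to $0.077 per call, which is where the 38x comes from. Dividing the rounded $0.08 by $0.002 would give 40x instead.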

The hook breaks each raw prompt into 6 sinc frequency bands: PERSONA, CONTEXT, DATA, CONSTRAINTS, FORMAT, TASK. That structured JSON goes in as system context before the main model sees the prompt. The model gets the full picture on the very first reply. Back-and-forth dropped from 4.2 responses per prompt to 1.56.
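To make the shape concrete, here is a hypothetical sketch of what such a six-band payload could look like. The band names come from this post. The `build_system_context` function, the validation rule, and the example values are assumptions for illustration, not the real hook's schema.

```python
import json

# Band names as listed in this post.
BANDS = ("PERSONA", "CONTEXT", "DATA", "CONSTRAINTS", "FORMAT", "TASK")


def build_system_context(bands):
    """Serialize a six-band prompt to JSON, rejecting empty or missing bands."""
    missing = [b for b in BANDS if not bands.get(b)]
    if missing:
        raise ValueError(f"missing bands: {', '.join(missing)}")
    # Emit the bands in a fixed order so payloads are easy to compare.
    return json.dumps({b: bands[b] for b in BANDS}, indent=2)


# Made-up example values.
example = {
    "PERSONA": "Senior Python reviewer",
    "CONTEXT": "Flask service, failing integration test",
    "DATA": "Traceback and the test file",
    "CONSTRAINTS": "Do not change the public API",
    "FORMAT": "Unified diff plus a one-paragraph summary",
    "TASK": "Find and fix the cause of the failure",
}
print(build_system_context(example))
```

Rejecting a payload with an empty band is one possible design choice: it forces the gap to be filled before the main model is called, rather than after a clarifying reply.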

What Would Have Happened Without Measurement

I would have kept spending $2,597.96 a week. I would have thought that was just the price of this kind of work. Or I would have tried random fixes: switching models, shortening system prompts, cutting context windows. None of that would have told me which change actually helped.

Measurement showed me the exact thing to fix: exchange rate. Everything else was secondary. Once I knew that, the fix was clear. Structure the prompt so the model does not need to ask follow-up questions.

If you run any production LLM workflow and you have not measured your exchange rate, do it this week. The number may surprise you.
