I had a feeling my LLM costs were higher than they should be. But feelings are useless when you're debugging a system. So I built a measurement layer, logged everything, and ran it for 7 days straight. Here's what I found.
33,133 assistant responses for 21,194 user prompts. That's 1.56 responses per prompt averaged out — but that was after I'd already started making fixes. The baseline at the start of the week was 4.2 responses per prompt.
Think about that. For every time I typed something, the model was responding 4.2 times on average before I got what I needed. That means 3.2 of those responses were clarifications, corrections, partial answers, or follow-up exchanges. They weren't delivering value — they were overhead.
At scale, that overhead is catastrophic. My projected cost at the 4.2 baseline would have been $2,597.96 for the week. I got it down to $967.01. The difference, $1,630.95, came entirely from reducing unnecessary exchanges.
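A quick arithmetic check of those weekly totals (the dollar figures are the ones measured above; nothing else is assumed):

```python
# Sanity check of the week's cost figures.
baseline_rate = 4.2       # responses per prompt at the start of the week
final_rate = 1.56         # responses per prompt after fixes
projected_cost = 2597.96  # projected weekly cost at the 4.2 baseline ($)
actual_cost = 967.01      # measured weekly cost ($)

savings = projected_cost - actual_cost
print(f"Savings: ${savings:.2f}")
print(f"Cost ratio: {projected_cost / actual_cost:.2f}x")
```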
I instrumented Claude Code with a logging hook. Every prompt in, every response out, with token counts on both sides. I tracked:
Exchange rate — responses per prompt
Input token volume — how much context each request carried
Output token volume — how much the model generated
SNR (signal-to-noise ratio) — how much of each prompt was usable signal vs. noise
Haiku overhead — cost of the scatter hook per call
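A minimal sketch of how a measurement layer like this computes its core metric from a log of events. The `Event` schema and field names here are illustrative, not the actual hook's format:

```python
from dataclasses import dataclass

@dataclass
class Event:
    role: str        # "user" or "assistant"
    tokens_in: int   # context tokens carried into the request
    tokens_out: int  # tokens the model generated

def exchange_rate(events: list[Event]) -> float:
    """Responses per prompt: the core overhead metric."""
    prompts = sum(1 for e in events if e.role == "user")
    responses = sum(1 for e in events if e.role == "assistant")
    return responses / prompts if prompts else 0.0

# Toy log: two prompts, three responses -> exchange rate 1.5.
log = [Event("user", 1200, 0), Event("assistant", 1200, 300),
       Event("assistant", 1500, 250), Event("user", 900, 0),
       Event("assistant", 900, 400)]
print(exchange_rate(log))
```

Input and output token volumes fall out of the same log as sums over `tokens_in` and `tokens_out`.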
The SNR measurement was the most surprising. Before adding the auto-scatter hook, my average prompt SNR was 0.003. That's near-zero signal. The raw prompts I was sending were almost entirely noise from the model's perspective — full of ambiguity, missing context, no constraints specified.
After scatter, average SNR: 0.855. That's a 285x improvement. And it directly tracks with the exchange rate improvement.
Day 3, I hit a wall. One conversation ballooned to 80,000 tokens before I realized what was happening. I was in a debugging loop — each response added more context, each clarification request added more back-and-forth, and the whole conversation just kept growing.
That single conversation cost me more than a full day of normal usage. It was a perfect illustration of how exchange rate compounds. A conversation that should have been 5 exchanges at 1,000 tokens each (5,000 tokens) became 80 exchanges at various sizes, carrying all the prior context forward.
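The compounding is easy to model. In a simplified version where every exchange is the same size and each new exchange re-sends all prior context as input, cumulative input tokens grow quadratically with exchange count, which is why a long debugging loop costs far more than its length alone suggests:

```python
def total_input_tokens(n_exchanges: int, tokens_per_exchange: int) -> int:
    """Cumulative input tokens when exchange k carries the previous k-1
    exchanges forward: tokens * (1 + 2 + ... + n), i.e. quadratic growth."""
    return tokens_per_exchange * n_exchanges * (n_exchanges + 1) // 2

print(total_input_tokens(5, 1000))   # a short, structured conversation
print(total_input_tokens(80, 1000))  # the same loop left to balloon
```

At 1,000 tokens per exchange, 16x the exchanges means roughly 256x the cumulative input processed, not 16x.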
This is why I built the scatter hook urgently after that. The measurement data made the problem impossible to ignore.
Here's the cost breakdown for the scatter hook itself. 21,194 scatter calls at Claude Haiku pricing. Total: $42.39 for the week. That's $0.002 per scatter call.
Per scatter call, I save roughly $0.077 in avoided exchange overhead. That's a 38x return on every dollar I spend on the hook: for every $1 of Haiku cost, I save $38 in main model cost. That's a pretty good trade.
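Working from the weekly totals rather than rounded per-call figures, the ROI arithmetic looks like this:

```python
# ROI check for the scatter hook, using the week's measured figures.
calls = 21_194
hook_cost_per_call = 0.002        # Haiku cost per scatter call ($)
total_hook_cost = calls * hook_cost_per_call
total_savings = 2597.96 - 967.01  # projected minus actual weekly cost ($)

print(f"Hook cost:        ${total_hook_cost:.2f}")
print(f"Savings per call: ${total_savings / calls:.3f}")
print(f"Return:           {total_savings / total_hook_cost:.0f}x")
```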
The hook decomposes each raw prompt into 6 sinc frequency bands: PERSONA, CONTEXT, DATA, CONSTRAINTS, FORMAT, TASK. That structured JSON gets injected as system context before the main model processes the original prompt. The model gets a complete picture on every first response. Back-and-forth drops dramatically.
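The band names above come from the spec; everything else in this sketch is hypothetical. It shows the shape of the decomposition, with a toy classifier standing in for the Haiku call that does the real routing:

```python
import json

BANDS = ["PERSONA", "CONTEXT", "DATA", "CONSTRAINTS", "FORMAT", "TASK"]

def scatter(raw_prompt: str, classify) -> str:
    """Hypothetical sketch: route each sentence of the raw prompt into one
    of the six bands, then emit structured JSON for injection as system
    context ahead of the original prompt."""
    bands = {b: [] for b in BANDS}
    for sentence in raw_prompt.split(". "):
        bands[classify(sentence)].append(sentence)
    return json.dumps({b: " ".join(s) for b, s in bands.items()}, indent=2)

# Toy stand-in for the Haiku classification call.
def toy_classify(sentence: str) -> str:
    return "CONSTRAINTS" if "must" in sentence else "TASK"

print(scatter("Refactor the parser. Output must stay JSON", toy_classify))
```

The key property is that the main model sees constraints, format, and task separated out on the very first request, instead of discovering them through follow-up questions.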
I would have kept spending $2,597.96 per week and assumed it was the cost of doing this kind of work. Or I would have tried random optimizations — switching models, shortening system prompts, reducing context windows — without knowing which ones actually moved the needle.
Measurement told me the exact lever to pull: exchange rate. Everything else was secondary. And once I knew that, the fix was obvious: structure the input so the model doesn't need to ask follow-up questions.
If you're running any kind of production LLM workflow and you haven't measured your exchange rate, do it this week. The number might surprise you.
Try sinc-LLM free — sincllm.com
The scatter spec is open. Leave a comment for the hook code.