Hey. You've probably looked at your Anthropic or OpenAI bill and thought — why is this so expensive? And you've probably tried all the obvious things. Switching to smaller models. Caching responses. Shortening your system prompts. Maybe even writing better prompts for a week before giving up on that.
None of that fixed it, did it? Because the actual problem isn't what you think it is.
The problem is your exchange rate. And most people have never even heard of it as a metric.
Exchange rate, as I'm using the term here, is the number of assistant responses it takes to complete one task. In a perfect world, every prompt you send gets one good response and that's it: an exchange rate of 1.0.
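The metric is easy to compute from your own logs. A minimal sketch, under two assumptions of mine (one conversation equals one task, and messages carry a "role" field, as in most chat APIs):

```python
def exchange_rate(conversations):
    """Assistant turns per task; 1.0 is the ideal."""
    tasks = len(conversations)
    responses = sum(
        sum(1 for m in convo if m["role"] == "assistant")
        for convo in conversations
    )
    return responses / tasks if tasks else 0.0

# One task that needed a clarifying round trip costs two exchanges.
convo = [
    {"role": "user", "content": "Refactor this function."},
    {"role": "assistant", "content": "Which function do you mean?"},
    {"role": "user", "content": "parse_config in config.py."},
    {"role": "assistant", "content": "Done. Here's the diff."},
]
print(exchange_rate([convo]))  # 2.0
```

Run that over a week of logs and you have your number.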
In the real world — especially with complex technical work — the model doesn't have enough information to answer correctly the first time. So it asks a clarifying question. Or it answers a slightly different question and you have to correct it. Or it does half the task and you prompt again for the other half. Each of these extra exchanges multiplies your cost.
I measured my own exchange rate over 7 days, 21,194 prompts. It was 4.2. That means every task that should cost 1 unit of tokens was actually costing 4.2 units. That's not 4.2x the bill — it's worse, because each exchange also carries the growing conversation context forward, compounding the input token cost.
I want to be direct about this because I see people constantly blaming the model. "Claude is too verbose." "GPT-4 asks too many questions." "Gemini doesn't follow instructions."
The model is doing exactly what it should do. When it doesn't have enough information, it asks. That's good behavior. The problem is that your prompts aren't giving it enough information to not ask.
A structured prompt — one that explicitly covers who the model should be, what the context is, what data is relevant, what constraints apply, what format the output should take, and what the task actually is — eliminates most of those questions before they're asked. The model gets everything it needs in one shot and gives you what you need in one shot.
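The six bands lend themselves to a template. The band names are from above; the builder itself is my illustration, not any particular library:

```python
BANDS = ["PERSONA", "CONTEXT", "DATA", "CONSTRAINTS", "FORMAT", "TASK"]

def build_prompt(bands: dict) -> str:
    # Refuse to send a prompt with a missing band -- an empty band
    # is exactly the gap the model would have to ask about.
    missing = [b for b in BANDS if not bands.get(b)]
    if missing:
        raise ValueError(f"fill these before sending: {missing}")
    return "\n\n".join(f"## {b}\n{bands[b]}" for b in BANDS)

prompt = build_prompt({
    "PERSONA": "You are a senior Python reviewer.",
    "CONTEXT": "Flask app, Python 3.12, deployed behind nginx.",
    "DATA": "The traceback and the failing handler are pasted below.",
    "CONSTRAINTS": "No new dependencies. Keep the public API stable.",
    "FORMAT": "A unified diff, then a two-sentence summary.",
    "TASK": "Fix the 500 error in the /billing endpoint.",
})
```

Forcing every band to be filled in is the point: the friction moves from the conversation to the prompt, where it's cheap.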
Here's why the cost compounds faster than you'd expect. Each exchange in a conversation carries the full context of all previous exchanges. So if you have a 10-exchange conversation to accomplish one task, the last few exchanges are paying input tokens for all 10 turns of history.
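The arithmetic is worth seeing. A back-of-envelope sketch, with an assumed 500 new tokens per turn (my number, purely illustrative) and the full history re-sent as input each turn:

```python
def total_input_tokens(turns, tokens_per_turn=500):
    # Turn k re-sends all prior turns plus the new prompt,
    # so it pays input for tokens_per_turn * k tokens.
    return sum(tokens_per_turn * k for k in range(1, turns + 1))

one_shot = total_input_tokens(1)    # 500 input tokens
ten_turns = total_input_tokens(10)  # 27,500 input tokens
print(ten_turns / one_shot)         # 55.0
```

Ten exchanges don't cost 10x the input tokens of one; in this toy model they cost 55x, because the history grows quadratically.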
In my measurement: 12.9 million output tokens in 7 days. 7.14 billion total tokens across the whole system. The output tokens are the expensive ones per token, but the input token volume from compounding context is where the surprise bill comes from.
At 4.2 exchanges/prompt vs. 1.6 exchanges/prompt, my projected cost at the old rate was $2,597.96 over 7 days. Actual cost at the new rate: $967.01. That's $1,630.95 saved, a 63% reduction.
A lot of people try to solve the cost problem by switching to cheaper models. That's partially valid but it misses the core issue. If your exchange rate is 4.2, switching from Claude Sonnet to Claude Haiku cuts your per-token cost but not your exchange count. You still get 4.2 responses per prompt — they're just cheaper per token.
The real fix is reducing exchange rate. And the way to do that is to give the model complete information on the first prompt, every time. That requires a structural change to how prompts are built, not just how they're priced.
I built an auto-scatter hook — a tiny server that intercepts every prompt before it reaches the main model, calls Claude Haiku to decompose it into 6 structured bands (PERSONA, CONTEXT, DATA, CONSTRAINTS, FORMAT, TASK), and injects that structure as system context. The hook costs $0.002 per call. It saves $0.08 per call in avoided exchange overhead. ROI: 38x.
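In outline, the flow looks like this. To be clear, everything below is my sketch of the idea, not the actual sinc-LLM code: decompose stands in for the cheap-model (Claude Haiku) call, stubbed out so the example runs offline.

```python
BANDS = ("PERSONA", "CONTEXT", "DATA", "CONSTRAINTS", "FORMAT", "TASK")

def auto_scatter(raw_prompt, decompose):
    # decompose() is where the real hook would call the cheap model.
    bands = decompose(raw_prompt)
    structured = "\n\n".join(
        f"[{name}]\n{bands.get(name, '(unspecified)')}" for name in BANDS
    )
    # The real hook injects this as system context for the main model;
    # here we just return both pieces.
    return {"system": structured, "user": raw_prompt}

def stub_decompose(prompt):
    # Stand-in for the Haiku call; returns only what it can infer.
    return {"TASK": prompt, "FORMAT": "plain text"}

msg = auto_scatter("Fix the failing test in ci.yml", stub_decompose)
print(msg["system"].splitlines()[0])  # [PERSONA]
```

The design choice that matters: the expensive model never sees the raw, ambiguous prompt. It sees the six bands, with the gaps made explicit, so it answers instead of asking.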
The code is open source. Leave a comment and I'll drop the link. It takes about 15 minutes to set up. If your bill is anywhere near what mine was, it pays for itself in the first hour of use.
Try sinc-LLM free — sincllm.com
Open source. 15 minutes to install.