Mario Alexandre · March 26, 2026 · token-savings auto-scatter claude-code

How I Saved $1,588.56 in One Week With a Prompt Hook

Last week I built a tiny hook. It catches every prompt I send to Claude before the model sees it. It rewrites the prompt into a structured JSON format. Then it injects that structure as context. In 7 days, it saved me $1,588.56. Not as a guess. For real.

I know that sounds like a clickbait headline. It is not. I will show you the exact numbers. I will show you how the hook works and why it works. Then I will give you the code for free.

$1,588.56 saved in 7 days: from $2,597.96 projected to $967.01 actual

The Problem I Was Trying to Solve

I run a big Claude Code setup. It uses multi-agent pipelines, vault memory, and tool calls. My token bills were getting very high. I was not doing anything obviously wrong. The problem was harder to see than that.

I was averaging 4.2 assistant responses per user prompt. That means each thing I typed took, on average, 4.2 replies from Claude before the task was actually done. It asked follow-up questions. It misunderstood and then corrected itself.

That back-and-forth costs a lot of money. Every exchange burns tokens on both sides. I ran 21,194 prompts in one week. At that scale, those extra rounds add up to a huge bill.

What I Built

I built what I call an auto-scatter hook. It is a small Python server that sits between me and Claude. The hook catches every prompt I type and sends it to Claude Haiku (the cheap model). Haiku breaks the prompt into 6 structured frequency bands. That structured JSON is then injected as system context before the main Claude model ever sees my original message.

sinc-LLM signal formula

x(t) = Σ x(nT) · sinc((t - nT) / T)

The 6 bands are: PERSONA, CONTEXT, DATA, CONSTRAINTS, FORMAT, TASK. Each band captures a different type of information from your prompt. CONSTRAINTS carries 42.7% of the quality weight. That is a lot. Most people write one-sentence prompts with zero constraints. Then they wonder why the model does strange things.
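To make the six bands concrete, here is a minimal Python sketch of what a scattered prompt could look like. The top-level keys (formula, T, fragments) come from the pass-through rule described later in this post. Everything else is an assumption for illustration: the fragment keys "band" and "text", the placeholder value for T, and the function name build_sinc_payload are not taken from the published spec.

```python
import json

# The six bands named in the post, in the order the post lists them.
BANDS = ["PERSONA", "CONTEXT", "DATA", "CONSTRAINTS", "FORMAT", "TASK"]

SINC_FORMULA = "x(t) = Σ x(nT) · sinc((t - nT) / T)"


def build_sinc_payload(band_text: dict) -> dict:
    """Assemble one scattered prompt.

    Hypothetical layout: the real sinc format may use different
    fragment keys and a different meaning for T.
    """
    missing = [band for band in BANDS if band not in band_text]
    if missing:
        raise ValueError(f"missing bands: {missing}")
    return {
        "formula": SINC_FORMULA,
        "T": 1,  # placeholder value, assumed for this sketch
        "fragments": [
            {"n": n, "band": band, "text": band_text[band]}
            for n, band in enumerate(BANDS)
        ],
    }


example = build_sinc_payload({
    "PERSONA": "Senior backend engineer",
    "CONTEXT": "Login service, token refresh path",
    "DATA": "Stack trace from the failing request",
    "CONSTRAINTS": "Do not change the public API. Touch only auth code.",
    "FORMAT": "A single patch plus a one-line summary",
    "TASK": "Fix the auth bug",
})

print(json.dumps(example, indent=2, ensure_ascii=False))
```

A one-sentence prompt like "fix the auth bug" fills only the TASK band. The other five are what the model otherwise has to guess.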

The Numbers That Blew My Mind

Here is what I measured over 7 days:

Before: 4.2 assistant responses per user prompt. Projected cost: $2,597.96.
After: 1.6 assistant responses per user prompt. Actual cost: $967.01.
Difference: $1,630.95 before the hook's own cost. Subtract the $42.39 I spent on Haiku calls and the net saving is $1,588.56. A 61% reduction.

The hook cost me $42.39 in Haiku API calls over the same 7 days. Each scatter call cost about $0.002 and saved about $0.08 in avoided back-and-forth. Across the whole week, that is roughly a 38x return on the Haiku spend.
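Here is the arithmetic as a short Python check. It rests on two assumptions: that the headline figure is net of the Haiku spend, and that each of the 21,194 prompts triggered exactly one scatter call. The variable names are only for illustration.

```python
from decimal import Decimal

projected = Decimal("2597.96")   # projected cost at 4.2 responses per prompt
actual = Decimal("967.01")       # actual cost at 1.6 responses per prompt
haiku_spend = Decimal("42.39")   # what the scatter calls cost
prompts = 21194                  # prompts sent during the week

gross_saved = projected - actual          # saving before the hook's own cost
net_saved = gross_saved - haiku_spend     # the headline figure
net_reduction_pct = net_saved / projected * 100

cost_per_call = haiku_spend / prompts     # assumes one scatter call per prompt
saved_per_call = gross_saved / prompts
return_multiple = gross_saved / haiku_spend

print(f"gross saved:     ${gross_saved}")
print(f"net saved:       ${net_saved}")
print(f"net reduction:   {net_reduction_pct:.1f}%")
print(f"cost per call:   ${cost_per_call:.4f}")
print(f"saved per call:  ${saved_per_call:.4f}")
print(f"return multiple: {return_multiple:.1f}x")
```

The per-call figures round to $0.002 and $0.08. Dividing those rounded values gives 40x, while the unrounded weekly totals give the 38x quoted above.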

Total tokens in 7 days: 7.14 billion. Output tokens alone: 12.9 million. Output tokens are the expensive ones. Going from 4.2 to 1.6 responses per prompt means far fewer of them are spent on clarifying questions and corrections.

Why It Works

Here is the key idea. When you send a raw prompt like "fix the auth bug", the model has to figure out many things on its own: who you are, what context you are working in, what data matters, what constraints apply, what format you want, and what the actual task is. If it gets any of those wrong, it asks a clarifying question. You answer. It asks another. You answer again. This keeps going.

The auto-scatter hook does that sorting work upfront for $0.002. It reads your raw prompt and figures out all 6 bands. It hands the model a complete, structured picture before the model starts generating. The model just does the task. No back-and-forth needed.
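The flow can be sketched in a few lines of Python. This is a simplified stand-in, not the open-sourced hook. The classifier is passed in as a plain function so the sketch runs without an API key, and stub_classifier is a hypothetical placeholder for the Haiku call. Returning an empty string when the classifier fails is also an assumption about how a hook like this should degrade.

```python
import json
from typing import Callable, Dict

BANDS = ("PERSONA", "CONTEXT", "DATA", "CONSTRAINTS", "FORMAT", "TASK")


def scatter(raw_prompt: str, classify: Callable[[str], Dict[str, str]]) -> str:
    """Return the text to inject as context ahead of the raw prompt."""
    try:
        bands = classify(raw_prompt)
    except Exception:
        # Assumed fallback: inject nothing rather than block the prompt.
        return ""
    # Emit all six bands every time, so a missing band shows up as empty.
    structured = {band: bands.get(band, "") for band in BANDS}
    return json.dumps(structured, indent=2)


def stub_classifier(prompt: str) -> Dict[str, str]:
    """Stand-in for the cheap-model call; a real one would infer every band."""
    return {
        "TASK": prompt,
        "CONSTRAINTS": "Change only what the task names.",
    }


context = scatter("fix the auth bug", stub_classifier)
print(context)
```

Keeping the classifier as a parameter also makes it easy to swap the API call for a local model later.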

The Catch-22 I Had to Solve

There was a funny problem early on. The hook was catching every prompt, including the prompts I used to debug the hook itself. When I typed "fix the hook", the hook would catch it and misread it. I ended up in a strange loop: the hook kept mangling the very prompts I wrote to fix it.

I fixed it with a pass-through rule. If the input is already valid sinc JSON (it has the formula, the T field, and the fragments array), the hook does not scatter it. It passes it straight through. That way I could write raw sinc JSON to talk directly to the model when I needed to skip the hook.
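The pass-through rule is easy to sketch in Python. The three required keys come straight from the description above. The function names is_sinc_json and route are hypothetical, and the real check may validate more than this.

```python
import json
from typing import Callable

# Keys that mark an input as already-scattered sinc JSON.
REQUIRED_KEYS = ("formula", "T", "fragments")


def is_sinc_json(text: str) -> bool:
    """True if text parses as a JSON object with all three sinc keys."""
    try:
        obj = json.loads(text)
    except (ValueError, TypeError):
        return False
    if not isinstance(obj, dict):
        return False
    if not all(key in obj for key in REQUIRED_KEYS):
        return False
    return isinstance(obj["fragments"], list)


def route(text: str, scatter: Callable[[str], str]) -> str:
    """Pass sinc JSON through untouched; scatter everything else."""
    return text if is_sinc_json(text) else scatter(text)


raw = "fix the hook"
already_structured = json.dumps({"formula": "sinc", "T": 1, "fragments": []})

print(is_sinc_json(raw))                 # plain text gets scattered
print(is_sinc_json(already_structured))  # sinc JSON skips the hook
```

Checking for all three keys, not just valid JSON, keeps ordinary JSON snippets pasted into a prompt from skipping the hook by accident.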

You Can Have This for Free

I open-sourced the whole thing: the scatter server, the hook code, and the sinc format specification. Leave a comment below and I will drop the link. I also fine-tuned a local 7B model (Qwen2.5 base, trained in 107 seconds on an RTX 5090) that does the same scatter at zero API cost. The local model runs at 290 tok/s and the GGUF is 4.7GB.

If you run any heavy Claude or LLM workflow, the savings alone will pay for the 15 minutes it takes to set this up. $1,588.56 in a week is real money.
