Mario Alexandre  ·  March 26, 2026  ·  token-savings auto-scatter claude-code

How I Saved $1,588.56 in One Week With a Prompt Hook

Hey, I want to show you something wild that happened to me last week. I built a tiny hook that intercepts every prompt I send to Claude before it reaches the model, rewrites it into a structured JSON format, and injects that structure as context. In 7 days, it saved me $1,588.56. Not hypothetically. Actually.

I know that sounds like a clickbait headline. I promise you it's not. I'm going to show you the exact numbers, how the hook works, and why it works. Then I'm going to give you the code for free.

$1,588.56
saved in 7 days, net of the hook's own cost — from $2,597.96 projected to $967.01 actual

The Problem I Was Trying to Solve

I run a pretty heavy Claude Code setup. Multi-agent pipelines, vault memory, tool calls — the whole thing. My token bills were getting insane. And not because I was doing anything obviously wrong. The problem was subtler than that.

I was averaging 4.2 assistant responses per user prompt. That means for every time I typed something, Claude was going back and forth with me 4.2 times before actually doing the thing. Clarifying. Asking follow-ups. Misunderstanding and self-correcting.

That back-and-forth is incredibly expensive. Each exchange burns tokens on both sides. And when you're running this at scale — 21,194 prompts in a week — those clarification loops add up to a catastrophic bill.

What I Built

I built what I call an auto-scatter hook. It's a small Python server that sits between me and Claude. Every single prompt I type gets intercepted by the hook, sent to Claude Haiku (the cheap model), decomposed into 6 structured frequency bands, and then that structured JSON gets injected as system context before the main Claude model ever sees my original message.
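Stripped of the API plumbing, the flow looks something like this. This is a minimal sketch, not the real server: the band names come from the spec, but `scatter_stub`, `build_injected_context`, and the placeholder band contents are mine, and a real version would call Claude Haiku where the stub is:

```python
import json

# The 6 frequency bands from the sinc format spec.
BANDS = ["PERSONA", "CONTEXT", "DATA", "CONSTRAINTS", "FORMAT", "TASK"]

def scatter_stub(prompt: str) -> dict:
    """Stand-in for the cheap-model call that decomposes a raw prompt.
    A real implementation would send `prompt` to Claude Haiku and parse
    its JSON reply; here we just fill every band with a placeholder."""
    return {band: f"<inferred from: {prompt}>" for band in BANDS}

def build_injected_context(prompt: str, scatter=scatter_stub) -> str:
    """Intercept a raw prompt and return the structured JSON that gets
    injected as system context before the main model sees the message."""
    bands = scatter(prompt)
    return json.dumps({"original": prompt, "bands": bands}, indent=2)

context = build_injected_context("fix the auth bug")
```

Swapping `scatter_stub` for a real Haiku call is the only piece missing; everything else is just interception and injection.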

sinc-LLM signal formula
x(t) = Σₙ x(nT) · sinc((t − nT) / T)

The 6 bands are: PERSONA, CONTEXT, DATA, CONSTRAINTS, FORMAT, TASK. Each band captures a different frequency of information from your prompt. CONSTRAINTS carries 42.7% of the quality weight — which is wild if you think about it. Most people write one sentence prompts with zero constraints and wonder why the model does weird things.

The Numbers That Blew My Mind

Here's what I measured over 7 days:

Before: 4.2 assistant responses per user prompt. Extrapolated cost: $2,597.96.
After: 1.6 assistant responses per user prompt. Actual cost: $967.01.
Difference: $1,630.95 gross, which nets out to $1,588.56 once you subtract the hook's own $42.39 Haiku bill. A 61% reduction.

The hook itself cost me $42.39 in Haiku API calls over the same period. That's about $0.002 per scatter call, against roughly $0.08 saved per scatter call in avoided back-and-forth. That's a 38x ROI on the Haiku spend.
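The arithmetic checks out if the headline number is read net of the Haiku spend. Here it is spelled out, using only figures from this post:

```python
prompts = 21_194                        # user prompts over the 7 days
projected = 2_597.96                    # extrapolated cost without the hook
actual = 967.01                         # measured cost with the hook
haiku = 42.39                           # the hook's own Haiku API bill

gross_saved = projected - actual        # saved on the main model
net_saved = gross_saved - haiku         # headline savings, net of hook cost
haiku_per_call = haiku / prompts        # ~$0.002 per scatter call
saved_per_call = gross_saved / prompts  # ~$0.08 saved per call
roi = saved_per_call / haiku_per_call   # ~38x

print(round(gross_saved, 2), round(net_saved, 2), round(roi))
```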

Total tokens in 7 days: 7.14 billion. Output tokens alone: 12.9 million. Those output tokens are the expensive ones, and scatter cut the wasted ones drastically.

Why It Works

Here's the insight. When you send a raw prompt like "fix the auth bug", the model has to figure out on its own: who you are, what context you're working in, what data is relevant, what constraints apply, what format you want, and what the actual task is. If it gets any of those wrong, it asks a clarifying question. You answer. It asks another. You answer. And so on.

The auto-scatter hook does that disambiguation work upfront for $0.002. It reads your raw prompt, infers all 6 bands, and hands the model a complete structured picture before it starts generating. The model just... does the thing. No back-and-forth needed.
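For a prompt like "fix the auth bug", the scattered structure might look something like this. The band names and the top-level formula/T/fragments shape come from the spec; every value below is an invented illustration, not real hook output:

```json
{
  "formula": "x(t) = sum_n x(nT) * sinc((t - nT) / T)",
  "T": 1.0,
  "fragments": [
    { "band": "PERSONA",     "value": "senior backend dev on this repo" },
    { "band": "CONTEXT",     "value": "login requests 500ing since yesterday's deploy" },
    { "band": "DATA",        "value": "auth/session.py, stack trace in the logs" },
    { "band": "CONSTRAINTS", "value": "don't touch the token schema; keep tests green" },
    { "band": "FORMAT",      "value": "a diff plus a one-line summary" },
    { "band": "TASK",        "value": "find and fix the bug causing the 500s" }
  ]
}
```

All six bands answered up front means the model has nothing left to ask about.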

The Catch-22 I Had to Solve

There was a hilarious problem early on. The hook was intercepting every prompt — including the prompts I was using to debug the hook itself. So when I typed "fix the hook", the hook would intercept it, misinterpret it, and I'd end up in a weird loop where the very prompts meant to fix the tool were being mangled by the tool.

I solved it with a pass-through rule: if the input is already valid sinc JSON (has the formula, the T field, and the fragments array), don't scatter it. Pass it straight through. That meant I could construct raw sinc JSON to communicate directly with the model when I needed to bypass the hook.
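The pass-through check is small. Here's a sketch of it, assuming the three markers named above (the formula, the T field, and the fragments array); the function names and any field semantics beyond those three are my own:

```python
import json

def is_sinc_json(text: str) -> bool:
    """True if the input is already valid sinc JSON: a JSON object with
    a formula, a T field, and a fragments array. Such input bypasses
    the scatter step entirely."""
    try:
        doc = json.loads(text)
    except (json.JSONDecodeError, TypeError):
        return False
    return (
        isinstance(doc, dict)
        and "formula" in doc
        and "T" in doc
        and isinstance(doc.get("fragments"), list)
    )

def route(prompt: str, scatter) -> str:
    """Pass valid sinc JSON straight through; scatter everything else."""
    return prompt if is_sinc_json(prompt) else scatter(prompt)
```

The nice side effect is the one from the post: hand-written sinc JSON becomes a back channel that talks to the model without the hook touching it.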

You Can Have This for Free

I open-sourced the whole thing. The scatter server, the hook code, the sinc format specification — all of it. Leave a comment below and I'll drop the link. I also fine-tuned a local 7B model (Qwen2.5 base, trained in 107 seconds on an RTX 5090) that does the same scatter at zero API cost. The local model runs at 290 tok/s and the GGUF is 4.7GB.

If you're running any kind of heavy Claude or LLM workflow, the drop in back-and-forth exchanges alone will pay for the 15 minutes it takes to set this up. $1,588.56 in a week is not nothing.

Try the sinc format free — sincllm.com

Open source. No signup required to read the spec.