Let me explain this from scratch. I have been calling it an "auto-scatter hook" for weeks. But I never explained what it actually means.
An auto-scatter hook is a piece of code that sits between you and your LLM. Every time you type a prompt, the hook grabs it. It rewrites it into a structured JSON format. Then it adds that structured version as context before the model sees your original words. You never see this happen. It is invisible. But the model behaves very differently because of it.
The name comes from signal processing. I treat prompts like signals. I "scatter" them, which means I break them into their parts. The idea borrows from the Nyquist-Shannon sampling theorem: I model a prompt as a sampled signal with 6 distinct frequency bands. Each band carries a different type of information.
The 6 bands are: PERSONA (who the model is), CONTEXT (what situation you are in), DATA (relevant facts), CONSTRAINTS (what the model cannot do), FORMAT (how output should look), and TASK (what you are actually asking). Each band has a different impact on quality. CONSTRAINTS alone carries 42.7% of output quality. That is a huge number.
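Here is a minimal sketch of how the six bands could be represented in code. The band names are the ones listed above. The `Fragment` class, the `make_fragments` helper, and the example prompt text are hypothetical names chosen for illustration, not the hook's actual schema.

```python
from dataclasses import dataclass

# Band names as listed in the post; the fixed ordering is an assumption.
BANDS = ("PERSONA", "CONTEXT", "DATA", "CONSTRAINTS", "FORMAT", "TASK")


@dataclass
class Fragment:
    band: str  # one of BANDS
    text: str  # the part of the prompt assigned to that band


def make_fragments(parts: dict[str, str]) -> list[Fragment]:
    """Build one fragment per band; bands missing from `parts` stay empty."""
    unknown = set(parts) - set(BANDS)
    if unknown:
        raise ValueError(f"unknown bands: {sorted(unknown)}")
    return [Fragment(band, parts.get(band, "")) for band in BANDS]


# Hypothetical example: a prompt where only three bands were filled in.
fragments = make_fragments({
    "TASK": "Summarise the incident report.",
    "CONSTRAINTS": "Do not speculate about root cause.",
    "FORMAT": "Three bullet points.",
})
```

Keeping every band present, even when empty, makes it easy to see which parts of a prompt were never specified.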
The "auto" part matters a lot. I tried manual scatter first. That meant writing the structured JSON (I call it sinc JSON) myself for every prompt. It worked well for quality. But it was not sustainable at all. Nobody structures their prompts by hand every single time. You are in the middle of work. You type what you want. You hit enter.
The auto-scatter hook makes all of that invisible. You still type whatever you want. The hook handles the structure for you. No effort required. It just runs.
In Claude Code, I built it as a UserPromptSubmit hook, which is the hook event that fires on every user message (PreToolUse fires before tool calls, not on prompts). The hook calls a Python server running locally on port 8461. The hook overhead is 2ms. The Haiku API call takes 300 to 800ms on top of that. The hook fails open. If the scatter call fails, it passes your prompt through unchanged. Your workflow does not break.
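The pass-through behaviour on failure can be sketched like this. It is a simplified illustration, not the hook's real code: `apply_hook`, `fake_scatter`, and `broken_scatter` are hypothetical names, and the real scatter step would be the call to the local server rather than a stub.

```python
from typing import Callable


def apply_hook(prompt: str, scatter: Callable[[str], str]) -> str:
    """Prepend the structured version as context; on any error, pass through."""
    try:
        structured = scatter(prompt)
    except Exception:
        # Any failure in the scatter call must not block the user,
        # so the original prompt goes through unchanged.
        return prompt
    return f"{structured}\n\n{prompt}"


def fake_scatter(prompt: str) -> str:
    # Stand-in for the real scatter call (assumption for this example).
    return '{"fragments": []}'


def broken_scatter(prompt: str) -> str:
    # Simulates the scatter service being down.
    raise TimeoutError("scatter service unavailable")


ok = apply_hook("fix the bug", fake_scatter)
fallback = apply_hook("fix the bug", broken_scatter)
```

In the failure case `fallback` is exactly the text that was typed, which is the property that keeps the workflow from breaking.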
One important edge case: if the prompt is already valid sinc JSON (meaning it has the formula field, T field, and fragments array), the hook passes it straight through. It does not scatter it again. This matters for two reasons.
First, it prevents double-scattering. If you or another agent already structured the prompt correctly, the hook leaves it alone. Second, it gives me an escape hatch. If I need to talk to the model directly, bypassing the hook, I can write sinc JSON by hand. It will reach the model without any changes.
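A minimal sketch of that pass-through check, assuming the prompt arrives as a string and that "valid" means a JSON object containing the three fields named above. The function name `is_sinc_json` and the example field values are hypothetical.

```python
import json

# Field names as described in the post.
REQUIRED_FIELDS = ("formula", "T", "fragments")


def is_sinc_json(prompt: str) -> bool:
    """Return True if the prompt is a JSON object with all required fields."""
    try:
        doc = json.loads(prompt)
    except (json.JSONDecodeError, TypeError):
        return False  # plain text or malformed JSON: scatter it as usual
    return (
        isinstance(doc, dict)
        and all(field in doc for field in REQUIRED_FIELDS)
        and isinstance(doc["fragments"], list)
    )


# Hypothetical inputs: one already-structured prompt, one plain prompt.
already_structured = is_sinc_json('{"formula": "x", "T": 1, "fragments": []}')
plain_text = is_sinc_json("fix the bug")
partial = is_sinc_json('{"formula": "x"}')
```

A prompt that passes this check would skip the scatter call entirely. Everything else, including partial JSON, would be scattered as normal.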
7 days. 21,194 prompts scattered. The exchange rate, meaning how many responses it takes to get what I wanted from one prompt, dropped from 4.2 to 1.6. Cost savings: $1,588.56. Haiku overhead: $42.39. Net gain: $1,546.17. ROI: about 36x on net gain (37x on gross savings).
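The arithmetic behind those figures can be checked in a few lines. The input numbers are the ones quoted above. The variable names are mine, and the return multiple depends on whether you divide gross savings or net gain by the overhead.

```python
# Figures quoted in the post.
prompts = 21_194
gross_savings = 1588.56   # USD saved over 7 days
haiku_overhead = 42.39    # USD spent on scatter calls

# Net gain is savings minus what the scatter calls cost.
net_gain = round(gross_savings - haiku_overhead, 2)   # 1546.17

# Two ways to express the return multiple.
gross_multiple = gross_savings / haiku_overhead       # about 37.5
net_multiple = net_gain / haiku_overhead              # about 36.5

# Overhead per scattered prompt, in USD.
overhead_per_prompt = haiku_overhead / prompts        # about 0.002

print(f"net gain: ${net_gain:.2f}")
print(f"multiple: {net_multiple:.1f}x net, {gross_multiple:.1f}x gross")
print(f"overhead per prompt: ${overhead_per_prompt:.4f}")
```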
I also fine-tuned a local Qwen2.5-7B model to do the scatter at zero API cost. Training took 107 seconds on an RTX 5090. Inference runs at 290 tok/s. The model file is 4.7GB GGUF. If you have a local GPU, the Haiku cost drops to zero. Your savings become 97% month-over-month.
The code is open source. Leave a comment and I will drop the GitHub link.