Mario Alexandre  ·  March 26, 2026  ·  auto-scatter claude-code token-savings

How a 2ms Hook Eliminates Clarification Loops in Claude

The most expensive part of working with Claude is not the answer to your question. It is the clarifying questions that come first. Each clarification is an extra round-trip: one more output, one more input, a bigger conversation, and more time before you get what you wanted.

I fixed this with a hook. Here is how it works and why 2ms of overhead is all it takes.

Why Claude Asks Clarifying Questions

Claude is not asking questions to be difficult. It asks because it does not have enough information to give you a good answer. When you send "fix the bug", Claude faces real uncertainty: Which bug? In what context? With what limits? What format do you want?

A well-trained, honest model will ask rather than guess. That is the right behavior. But if we give it enough information upfront (the full picture it needs), it does not need to ask. It just does.

That is the whole idea behind the auto-scatter hook.

The 2ms Breakdown

When I say "2ms hook", I mean the time the hook itself takes to capture your prompt and complete the HTTP round trip to the Python server process on localhost:8461. That part is about 2ms. Parsing the JSON response and injecting it into the system message adds roughly 3ms more.
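To make the local hop concrete, here is a minimal sketch of what the hook side could look like. Only the port (8461) comes from the description above. The `/scatter` path, the `{"prompt": ...}` payload, the shape of the reply, and the function names are assumptions made for illustration, not the actual implementation.

```python
import json
import time
import urllib.request

# Port 8461 is from the post; the "/scatter" path is a hypothetical endpoint.
SCATTER_URL = "http://localhost:8461/scatter"


def build_request(prompt: str) -> urllib.request.Request:
    """Package the raw user prompt as a JSON POST for the local server."""
    body = json.dumps({"prompt": prompt}).encode("utf-8")  # payload shape is assumed
    return urllib.request.Request(
        SCATTER_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_hook(prompt: str, send=None, timeout: float = 5.0):
    """Send the prompt and return (decoded JSON reply, elapsed milliseconds).

    `send` is an injectable transport so the function can be exercised
    without a running server; by default it uses urllib.
    """
    request = build_request(prompt)
    if send is None:
        def send(req):
            with urllib.request.urlopen(req, timeout=timeout) as resp:
                return resp.read()
    start = time.perf_counter()
    raw = send(request)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return json.loads(raw), elapsed_ms
```

If the local server calls Haiku inside the same request, the elapsed time measured here includes that call. To isolate the local hop on its own, you would time `call_hook` against a stub server that returns a canned reply.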

The Haiku API call that does the actual scatter decomposition adds 300-800ms, depending on Anthropic's response time. Total time before your prompt hits Claude Sonnet: about 305-805ms. For the quality improvement that produces, that is a great trade.

Step                                      Latency
Hook capture + HTTP to localhost          ~2ms
Haiku API call (scatter decomposition)    300-800ms
JSON parse + system message injection     ~3ms
Total hook overhead                       ~305-805ms
Saved: 3.2 clarification rounds avoided   -5,000 to -15,000ms

Net effect: adding about 500ms upfront saves multiple 2-5 second clarification round trips. On average, each hook call is a net time gain, not a cost.
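The latency budget can be checked with a few lines of arithmetic. This is a sketch using the stage timings quoted above; the function names are my own, and the break-even calculation is an illustration rather than a measured result.

```python
def total_overhead_ms(haiku_ms: float,
                      capture_http_ms: float = 2.0,
                      parse_inject_ms: float = 3.0) -> float:
    """Sum the three hook stages (all values in milliseconds)."""
    return capture_http_ms + haiku_ms + parse_inject_ms


def net_saving_ms(haiku_ms: float, rounds_avoided: float, round_trip_ms: float) -> float:
    """Time saved by avoided clarification rounds, minus the hook overhead."""
    return rounds_avoided * round_trip_ms - total_overhead_ms(haiku_ms)


def break_even_rounds(haiku_ms: float, round_trip_ms: float) -> float:
    """Average number of rounds the hook must avoid per prompt to pay for itself."""
    return total_overhead_ms(haiku_ms) / round_trip_ms
```

In the worst case quoted here (an 800ms Haiku call and a 2-second clarification round trip), `break_even_rounds(800, 2000)` comes out at about 0.4. The hook pays for itself as long as it avoids, on average, a bit less than half a clarification round per prompt.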

sinc-LLM — 6-band prompt decomposition
x(t) = Σ x(nT) · sinc((t - nT) / T)

What the Hook Actually Injects

The Haiku call breaks your prompt into 6 sinc bands. Together, those 6 bands give the model everything it needs to answer correctly on the first try:

PERSONA — who the model should be. Sets the right expertise level, tone, and assumptions.
CONTEXT — what situation you are in. Places the problem in its setting.
DATA — relevant facts and numbers. Gives the model solid anchors.
CONSTRAINTS — what the model cannot do (42.7% of quality weight). Cuts wrong answers before they are even generated.
FORMAT — how the output should look (26.3% of quality weight). Stops "here is an explanation" when you wanted "here is a diff".
TASK — the actual ask. Makes the goal clear.

When CONSTRAINTS and FORMAT are fully specified, the two biggest sources of clarification questions go away. "Should I refactor or just patch?" is no longer asked, because CONSTRAINTS says "minimal footprint". "Do you want code or an explanation?" is no longer asked, because FORMAT says "code diff only". The model just does the task.
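One way to picture the injected payload is as a small record with one field per band. The sketch below is an assumption about shape only: the class name, the `BAND: value` rendering, and the example values for PERSONA and CONTEXT are hypothetical. The CONSTRAINTS and FORMAT values are the ones quoted above.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ScatterBands:
    """The six bands, in the order listed above."""
    persona: str
    context: str
    data: str
    constraints: str
    format: str
    task: str

    def to_system_message(self) -> str:
        """Render each band as an upper-case label followed by its value."""
        return "\n".join(
            f"{f.name.upper()}: {getattr(self, f.name)}" for f in fields(self)
        )

    def missing(self) -> list:
        """Names of bands left empty, i.e. likely sources of clarifying questions."""
        return [f.name.upper() for f in fields(self) if not getattr(self, f.name).strip()]


# Illustrative values only; "minimal footprint" and "code diff only" come from the text.
example = ScatterBands(
    persona="Senior Python engineer",
    context="Mid-session in an existing codebase",
    data="",
    constraints="minimal footprint",
    format="code diff only",
    task="fix the bug",
)
```

Calling `example.missing()` flags DATA as the only empty band, which is the kind of check a hook could run before deciding whether a prompt is specified well enough to skip clarification.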

The Measurement

Before the hook: 4.2 assistant responses per user prompt. After the hook: 1.6. That 2.6-exchange reduction across 21,194 prompts in 7 days equals $1,588.56 in avoided API cost. The hook overhead (Haiku API) cost $42.39. Net gain: $1,546.17 in one week.
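The same figures can be broken down per prompt. This sketch only rearranges the numbers quoted above; the variable names are my own, and `Decimal` is used so the dollar amounts subtract exactly.

```python
from decimal import Decimal

PROMPTS = 21_194                      # prompts over the 7-day window
AVOIDED_COST = Decimal("1588.56")     # avoided API cost, in dollars
HOOK_COST = Decimal("42.39")          # Haiku overhead, in dollars

# Assistant responses per user prompt, before and after the hook.
exchanges_avoided = Decimal("4.2") - Decimal("1.6")

net_gain = AVOIDED_COST - HOOK_COST           # dollars over the week
hook_cost_per_prompt = HOOK_COST / PROMPTS    # dollars per prompt
saved_per_prompt = AVOIDED_COST / PROMPTS     # dollars per prompt
return_ratio = AVOIDED_COST / HOOK_COST       # dollars avoided per dollar spent

print(f"net gain: ${net_gain}")
print(f"hook cost per prompt: ${hook_cost_per_prompt:.4f}")
print(f"saved per prompt: ${saved_per_prompt:.4f}")
print(f"return ratio: {return_ratio:.1f}x")
```

Per prompt, the Haiku overhead works out to about a fifth of a cent, against several cents of avoided cost.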

The clarification loop is not just a cost problem. It is also a flow problem. When you are deep in a coding session and Claude asks a clarification question, you have to stop, re-read, think, respond, and wait for the next response. Each loop breaks your focus. The hook removes most of those breaks.
