The most expensive thing Claude does isn't answering your question. It's asking you clarifying questions. Each clarification is an extra round-trip: one more output, one more input, a growing conversation context, and more time before you get what you actually wanted.
I fixed this with a hook. Here's how it works and why 2ms of overhead is all it takes.
Claude isn't asking questions to be difficult. It's asking because it doesn't have enough information to give you a confident, useful answer. When you send "fix the bug", Claude faces genuine uncertainty: Which bug? In what context? With what constraints? What format do you want the fix in?
A well-trained, honest model will ask rather than guess. That's the right behavior. But if we give it enough information upfront — the complete picture it needs — it doesn't need to ask. It just does.
That's the entire premise of the auto-scatter hook.
When I say "2ms hook", I mean the overhead attributable to the hook infrastructure itself — the Python server process, the HTTP roundtrip to localhost:8461, the JSON parsing, the injection into the system message. That part is 2ms.
The Haiku API call that does the actual scatter decomposition adds 300-800ms depending on Anthropic's response time. Total end-to-end before your prompt hits Claude Sonnet: about 400-900ms. For the quality improvement that produces, that's an excellent trade.
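The moving parts in that 2ms path can be sketched roughly like this. This is an illustrative sketch, not the sinc-LLM source: only the port 8461 comes from this post; the `/scatter` route, payload shape, and function names are my assumptions.

```python
import json
import urllib.request

def scatter_via_local_server(prompt: str) -> dict:
    """POST the raw prompt to the local hook server, which runs the
    Haiku scatter call and returns the six bands as JSON."""
    body = json.dumps({"prompt": prompt}).encode()
    req = urllib.request.Request(
        "http://localhost:8461/scatter",  # port from this post; path assumed
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=2.0) as resp:
        return json.load(resp)

def inject_bands(system_message: str, bands: dict) -> str:
    """Append the scatter bands to the system message before the
    prompt reaches the main model."""
    rendered = "\n".join(f"{k}: {v}" for k, v in bands.items())
    return f"{system_message}\n\n[scatter]\n{rendered}"
```

The local HTTP hop and the string injection are the only work the hook itself does, which is why its own overhead stays in the low single-digit milliseconds.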
| Step | Latency |
|---|---|
| Hook capture + HTTP to localhost | ~2ms |
| Haiku API call (scatter decomposition) | 300-800ms |
| JSON parse + system message injection | ~3ms |
| Total hook overhead | ~305-805ms |
| Saved: 3.2 clarification rounds avoided | −5,000 to −15,000ms |
Net effect: adding 500ms upfront saves multiple 2-5 second clarification round trips. Every hook invocation is a net time gain, not a cost.
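As a sanity check on that claim, here is the worst-case pairing of the table's own numbers:

```python
# Worst-case check of the latency trade, using the table's figures.
overhead_ms = (305, 805)     # total hook overhead, min and max
saved_ms = (5_000, 15_000)   # time recovered by avoided rounds

# Even pairing the maximum overhead with the minimum saving
# leaves a clear net gain per prompt.
worst_net = saved_ms[0] - overhead_ms[1]   # 4195 ms
best_net = saved_ms[1] - overhead_ms[0]    # 14695 ms
print(worst_net, best_net)
```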
The Haiku call decomposes your prompt into 6 sinc bands. Those 6 bands together constitute a complete specification of everything the model needs to respond correctly on the first try:
- **PERSONA** — who the model should be. Calibrates expertise level, tone, assumptions.
- **CONTEXT** — what situation you're in. Locates the problem in its environment.
- **DATA** — relevant facts and numbers. Gives the model concrete anchors.
- **CONSTRAINTS** — what the model cannot do (42.7% of quality weight). Eliminates wrong-direction answers before they're generated.
- **FORMAT** — how output should look (26.3% of quality weight). Eliminates "here's an explanation" when you wanted "here's a diff".
- **TASK** — the actual ask. Disambiguates the target.
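A minimal sketch of the decomposition step, assuming Haiku is asked to reply with a JSON object keyed by band name. The band names are from this post; the system prompt wording and the injected `complete` callable (standing in for the actual Haiku request) are my assumptions, kept abstract so the logic is testable without an API key:

```python
import json

BANDS = ["PERSONA", "CONTEXT", "DATA", "CONSTRAINTS", "FORMAT", "TASK"]

SYSTEM = (
    "Decompose the user's prompt into six bands: "
    + ", ".join(BANDS)
    + ". Reply with a JSON object keyed by band name."
)

def scatter(prompt: str, complete) -> dict:
    """`complete(system, prompt)` is any small-model call, e.g. a
    Haiku request via the Anthropic SDK, injected as a callable."""
    raw = complete(SYSTEM, prompt)
    bands = json.loads(raw)
    # Default any band the model omitted to an empty string, so the
    # injection step always sees all six keys.
    return {b: bands.get(b, "") for b in BANDS}
```

Wiring in the real model is then one lambda that calls the SDK and returns the response text.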
When CONSTRAINTS and FORMAT are fully specified, the two biggest sources of clarification questions disappear. "Should I refactor or just patch?" — no longer asked, because CONSTRAINTS says "minimal footprint". "Do you want code or an explanation?" — no longer asked, because FORMAT says "code diff only". The model just does the task.
Before the hook: 4.2 assistant responses per user prompt. After the hook: 1.6. That 2.6-exchange reduction across 21,194 prompts in 7 days equals $1,588.56 in avoided API cost. The hook overhead (Haiku API) cost $42.39. Net: $1,546.17 gain in one week.
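The week's arithmetic checks out:

```python
# Reproducing the one-week cost math from the measurements above.
prompts = 21_194
avoided_api_cost = 1_588.56   # USD saved by 2.6 fewer exchanges per prompt
hook_cost = 42.39             # Haiku overhead for the same week

net_gain = avoided_api_cost - hook_cost
print(f"${net_gain:,.2f}")    # $1,546.17
print(f"${net_gain / prompts:.4f} per prompt")
```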
The clarification loop isn't just a cost problem — it's a flow problem. When you're deep in a coding session and Claude asks you a clarification question, you have to context-switch, re-read, think, respond, wait for the next response. Each loop breaks your concentration. The hook eliminates those breaks almost entirely.
Try sinc-LLM free — sincllm.com
Open source. Leave a comment for the hook code.