Mario Alexandre · March 26, 2026 · open-source token-savings auto-scatter

Free Tool to Cut LLM Costs 61% — Auto-Scatter Hook

I built a tool that saved me $1,588.56 in one week. I am releasing it for free. This article tells you what is in the package, how it works, and how to get it.

Leave a comment and I will drop the GitHub link. I want you to read this first. Understanding what you are getting means you will use it the right way.

What It Is

The auto-scatter hook is a local Python server (scatter_server.py). It catches every Claude Code prompt before the main model sees it. It calls Claude Haiku to break your raw prompt into 6 structured frequency bands (sinc JSON). Then it injects that structure as system context. The main model gets a fully specified prompt. You get a first-try answer instead of a back-and-forth loop.

sinc-LLM — the mathematical basis

x(t) = Σ x(nT) · sinc((t - nT) / T)

What's in the Package

scatter_server.py: FastAPI server, runs locally on port 8461
scatter_hook.py: Claude Code PreToolUse hook that calls the server
settings.json snippet: hook registration for Claude Code
sinc_spec.json: the sinc-LLM 6-band format specification
local_model/: Qwen2.5-7B GGUF fine-tuned for scatter (4.7GB, optional)
test_scatter.py: validation tests to confirm the hook is working
requirements.txt: fastapi, uvicorn, anthropic

The Two Operating Modes

Haiku API mode: uses Claude Haiku for scatter. Cost: $0.002 per call. Saves $0.08 per call. Net: $0.078 saved per prompt. Most people will use this mode. All you need is an Anthropic API key.

Local model mode: uses the fine-tuned Qwen2.5-7B GGUF (included in the package). Zero API cost. Requires an NVIDIA GPU (RTX 3090 or better recommended). Runs at 290 tok/s on an RTX 5090. Scatter quality is slightly lower than Haiku on edge cases, but it is excellent for standard prompts. At zero marginal scatter cost, monthly savings project to $1,500+ for my workflow volume.

What to Expect After Installing

If your current exchange rate is near 4.2 (typical for unstructured Claude Code workflows), expect it to drop to 1.6-1.8 within the first day. The drop is immediate, not gradual. Every prompt gets structured from the very first hook call.

Total latency increase per prompt: 400-900ms (Haiku API call). This is front-loaded. You wait a little longer for the first response. But you wait much less overall because you skip the clarification loops. Net time per task goes down.

The hook adds no observable memory overhead. It does not modify any Claude Code files. It runs as a side-car process. Hook registration lives in settings.json only. To uninstall, remove those 3 lines.

Prerequisites

# Requirements
Python 3.9+
Anthropic API key (for Haiku mode)
Claude Code (for hook integration)

# Install
pip install -r requirements.txt
uvicorn scatter_server:app --port 8461 &

# Add to Claude Code settings.json
# (see README for exact snippet)

Numbers from 7 Days of Production Use

21,194 prompts scattered. Exchange rate: 4.2 to 1.6. Cost saved: $1,588.56. Haiku overhead: $42.39. SNR improvement: 0.003 to 0.855. ROI: 38x. The hook has been running without a stop since I deployed it. No crashes. No blocking incidents. The non-blocking fallback has never been needed.

Leave a comment and I will send you the GitHub link. I am releasing this for free because it is genuinely useful. Also, the more people use it, the better the training data gets for local model fine-tuning.

// Production AI Engineering

Build AI systems that hold up in production.

sinc-LLM designs, audits, and stabilises production AI infrastructure: from vendor evaluation and cost accountability to incident controls and MCP architecture.

See what we do →