Mario Alexandre  ·  March 26, 2026  ·  auto-scatter prompt-hook open-source

I Built a Prompt Interceptor That Pays for Itself

Most tools you build cost money to run. This one earns. I built a small Python server that intercepts every prompt I send to Claude, restructures it in real time, and injects the structured version as context before the main model processes it. The server costs $0.002 per invocation to run. Each invocation saves me about $0.08 in avoided back-and-forth, a roughly 38x return. In a week it saved me $1,588.56 on $42.39 of running cost.

Let me show you how I built it.

The Architecture

The interceptor is a Python FastAPI server running locally. Claude Code (and any Claude workflow) is configured with a UserPromptSubmit hook that fires on every user message. That hook makes an HTTP POST to localhost:8461/scatter with the raw prompt text. The server processes it and returns structured sinc JSON, which gets injected as system-message context before Claude processes the original prompt.

# Simplified core of scatter_server.py
from fastapi import FastAPI
import anthropic
import json

app = FastAPI()
client = anthropic.Anthropic()

SCATTER_SYSTEM = """
Decompose the user's prompt into sinc JSON with 6 fragments:
n=0 PERSONA, n=1 CONTEXT, n=2 DATA, n=3 CONSTRAINTS (longest),
n=4 FORMAT, n=5 TASK. Return ONLY valid JSON.
"""

def is_sinc_json(text: str) -> bool:
    # True if the text already parses as a JSON list of n/t/x fragments.
    try:
        data = json.loads(text)
    except (ValueError, TypeError):
        return False
    return isinstance(data, list) and all(
        isinstance(f, dict) and {"n", "t", "x"} <= set(f) for f in data
    )

@app.post("/scatter")
async def scatter(payload: dict):
    prompt = payload.get("prompt", "")

    # Pass through if the prompt is already sinc JSON
    if is_sinc_json(prompt):
        return {"scattered": prompt, "passthrough": True}

    # Ask Haiku to decompose the prompt into the six fragments
    response = client.messages.create(
        model="claude-haiku-4-5",
        max_tokens=1024,
        system=SCATTER_SYSTEM,
        messages=[{"role": "user", "content": prompt}]
    )

    sinc_json = json.loads(response.content[0].text)
    return {"scattered": json.dumps(sinc_json), "passthrough": False}

The Sinc Format

The structured output follows the sinc-LLM format. Six fragments, each with an index (n), a type (t), and the content (x). The formula is embedded as a header field — it's the mathematical foundation of the whole approach.

sinc-LLM signal reconstruction
x(t) = Σ x(nT) · sinc((t - nT) / T)

Each fragment maps to one dimension of what the model needs to produce a great first-try response. CONSTRAINTS (n=3) gets the most content — it carries 42.7% of quality weight. If the model knows what it cannot do, it doesn't waste a response doing the wrong thing and asking whether that was what you meant.
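To make the format concrete, here's what a scattered prompt might look like. The n/t/x field names and the six fragment types come from the description above; the content itself is an invented example, not real server output.

```python
import json

# Illustrative sinc JSON for the prompt "Rewrite fetch() as an async function".
# Field names (n, t, x) follow the format above; the values are made up.
fragments = [
    {"n": 0, "t": "PERSONA", "x": "Senior Python reviewer"},
    {"n": 1, "t": "CONTEXT", "x": "FastAPI codebase, Python 3.12"},
    {"n": 2, "t": "DATA", "x": "def fetch(url): return requests.get(url).json()"},
    {"n": 3, "t": "CONSTRAINTS",
     "x": "Keep the public signature. No new dependencies. "
          "Do not change error handling semantics. Preserve type hints."},
    {"n": 4, "t": "FORMAT", "x": "Return only the rewritten function"},
    {"n": 5, "t": "TASK", "x": "Rewrite fetch() as an async function"},
]

scattered = json.dumps(fragments)
print(scattered)
```

Notice that CONSTRAINTS (n=3) carries the most text, matching the weighting described above.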

The Failure I Didn't Anticipate

My first version of the server blocked every prompt while waiting for the Haiku API response. That meant if the API was slow or down, everything stopped. I'd sit there watching Claude Code frozen while the scatter call timed out.

The fix was two-part: a 25-second timeout on the scatter call, and an instant heuristic fallback if the API fails. The fallback does template-based scatter using keyword inference — not as good as Haiku, but it never blocks the prompt. The prompt goes through no matter what.
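Here's a sketch of that fallback path, with the Haiku call abstracted as a callable so the pattern is visible. The keyword table and helper names are illustrative, not the production code:

```python
import json

# Illustrative keyword table for the heuristic fallback; the real mapping is richer.
KEYWORD_TYPES = {"bug": "DATA", "must": "CONSTRAINTS", "json": "FORMAT"}

def heuristic_scatter(prompt: str) -> str:
    """Template-based fallback: build all six fragments with no API call."""
    fragments = [
        {"n": 0, "t": "PERSONA", "x": ""},
        {"n": 1, "t": "CONTEXT", "x": ""},
        {"n": 2, "t": "DATA", "x": ""},
        {"n": 3, "t": "CONSTRAINTS", "x": ""},
        {"n": 4, "t": "FORMAT", "x": ""},
        {"n": 5, "t": "TASK", "x": prompt},  # default: the whole prompt is the task
    ]
    by_type = {f["t"]: f for f in fragments}
    for word, ftype in KEYWORD_TYPES.items():
        # Crude keyword inference: copy the prompt into fragments it seems to touch.
        if word in prompt.lower() and not by_type[ftype]["x"]:
            by_type[ftype]["x"] = prompt
    return json.dumps(fragments)

def scatter_with_fallback(prompt: str, haiku_call, timeout_s: float = 25.0) -> str:
    """Try the Haiku scatter first; on timeout or error, never block the prompt."""
    try:
        return haiku_call(prompt, timeout_s)  # real version passes timeout_s to the SDK
    except Exception:
        return heuristic_scatter(prompt)  # instant fallback, prompt always goes through
```

The key design choice is that the fallback is pure local computation: the worst case is a slightly worse scatter, never a frozen prompt.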

I also added special pass-through rules: if the prompt starts with @agent (routing to a sub-agent), or /slash (a slash command), or is already valid sinc JSON, skip scatter entirely. No point structuring something that's already structured or already routed.
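Those rules reduce to a small predicate the server checks before doing any work. The `is_sinc_json` here is a stand-in for the server's own validity check:

```python
import json

def is_sinc_json(text: str) -> bool:
    # Stand-in for the server's validity check: a JSON list of n/t/x fragments.
    try:
        data = json.loads(text)
    except (ValueError, TypeError):
        return False
    return isinstance(data, list) and all(
        isinstance(f, dict) and {"n", "t", "x"} <= set(f) for f in data
    )

def should_passthrough(prompt: str) -> bool:
    """Skip scatter for sub-agent routes, slash commands, and pre-structured prompts."""
    stripped = prompt.lstrip()
    if stripped.startswith("@"):  # @agent routing to a sub-agent
        return True
    if stripped.startswith("/"):  # /slash command
        return True
    return is_sinc_json(prompt)   # already valid sinc JSON
```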

The Claude Code Hook Integration

In Claude Code, hooks are configured in settings.json. Here's the hook entry that wires the scatter server into every user message. The UserPromptSubmit event fires once per prompt, which is exactly the interception point we need:

{
  "hooks": {
    "UserPromptSubmit": [
      {
        "hooks": [{
          "type": "command",
          "command": "py -X utf8 C:/Users/Mario/scatter_hook.py"
        }]
      }
    ]
  }
}

The scatter_hook.py script reads the prompt from stdin, POSTs to localhost:8461/scatter, and writes the sinc JSON to the system message injection point. The whole thing adds 300-800ms of latency per prompt (Haiku API call time). For me that's completely acceptable given the ROI.
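A minimal sketch of what scatter_hook.py does, assuming the hook receives JSON on stdin with a prompt field and that whatever the script prints is added as context for the model. The URL comes from the article; the helper names are mine:

```python
import json
import sys
import urllib.request

SCATTER_URL = "http://localhost:8461/scatter"

def extract_prompt(hook_input: str) -> str:
    """Pull the raw prompt text out of the hook's stdin JSON."""
    return json.loads(hook_input).get("prompt", "")

def post_scatter(prompt: str, url: str = SCATTER_URL, timeout_s: float = 30.0) -> str:
    """POST the prompt to the scatter server; return the sinc JSON string."""
    req = urllib.request.Request(
        url,
        data=json.dumps({"prompt": prompt}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout_s) as resp:
        return json.loads(resp.read())["scattered"]

if __name__ == "__main__":
    prompt = extract_prompt(sys.stdin.read())
    # What we print here becomes the injected context for the main model.
    print(post_scatter(prompt))
```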

What 7 Days Proved

21,194 prompts intercepted. The average number of exchanges per task dropped from 4.2 to 1.6. $1,588.56 saved. The screenshot in my other article shows the actual numbers from my measurement dashboard. This isn't hypothetical: it worked exactly as designed and continues to work every day I use it.

The code is open source on GitHub. Leave a comment and I'll drop the link — I want to make sure people actually find it useful before I start pushing it everywhere.

Try sinc-LLM free — sincllm.com

Full spec on the site. Hook code on GitHub.