How I Reduced LLM Costs by 97% With Structured Prompts

Published March 27, 2026 · By Mario Alexandre
Before (monthly): $1,500 · After (monthly): $45 · Cost reduction: 97%

In January 2026, our OpenAI bill was $1,500. By March, it was $45. We did not switch to a cheaper model. We did not reduce our call volume. We did not build a caching layer. We restructured our prompts using sinc-LLM's 6-band format, and the token waste disappeared.

This is the story of how I discovered that most LLM costs are caused by prompt inefficiency, not model pricing.

Where the $1,500 Was Going

I audited our API usage and found a pattern I had not expected. Our average API call was not one request and one response. It was one request, one response that was wrong or incomplete, a follow-up clarification, another response, another correction, and finally a usable output. The average task took 3.2 API calls to complete.

Each "clarification loop" re-sent the entire conversation history plus the new instruction. A task that should have cost $0.02 in tokens was costing $0.08-0.12 because of accumulated context in the retry loops. Multiply that by 15,000 tasks per month, and you get $1,500.
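The arithmetic above can be sanity-checked with a short script. The per-call figures are the rounded averages from the audit quoted above, not exact billing data:

```python
# Rough cost model for the January audit numbers.
# Per-task figures are the rounded audit averages, not exact billing data.

TASKS_PER_MONTH = 15_000
IDEAL_COST_PER_TASK = 0.02   # one well-specified call, no retries
ACTUAL_COST_PER_TASK = 0.10  # ~3.2 calls, each re-sending the full history

ideal_bill = TASKS_PER_MONTH * IDEAL_COST_PER_TASK
actual_bill = TASKS_PER_MONTH * ACTUAL_COST_PER_TASK

print(f"Ideal monthly bill:  ${ideal_bill:,.0f}")                        # $300
print(f"Actual monthly bill: ${actual_bill:,.0f}")                       # $1,500
print(f"Waste from clarification loops: ${actual_bill - ideal_bill:,.0f}")
```

Five-sixths of the January bill was retry overhead, not useful work.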

The root cause was underspecified prompts. Our developers were sending raw prompts like "generate a summary of this document" with no constraints on length, format, tone, or what to include. The model produced a 500-word essay when we wanted a 3-sentence abstract, so the developer sent a follow-up: "Make it shorter." Then another: "Format it as bullet points." Each follow-up doubled the token count because the full conversation had to be re-sent.

The Realization: Prompts Are Signals

I found sinc-LLM while researching prompt engineering best practices. The core idea hit me immediately: a prompt is a specification signal, and underspecified prompts cause aliasing — the model fills gaps with its own interpretation, which rarely matches your intent.

The sinc-LLM framework models this with the sampling theorem:

x(t) = Σ x(nT) · sinc((t - nT) / T)

If you sample your specification at the Nyquist rate — 6 bands (PERSONA, CONTEXT, DATA, CONSTRAINTS, FORMAT, TASK) — the model can reconstruct your intent perfectly. If you under-sample (send only 2-3 bands), the reconstruction aliases. The model's output looks wrong, and you enter a clarification loop to correct it.
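The analogy borrows the Whittaker–Shannon interpolation formula. As a concrete illustration of why sampling density matters, here is a minimal reconstruction of a 1 Hz sine from samples taken above the Nyquist rate. This demonstrates the signal-processing formula itself, not anything LLM-specific:

```python
import math

def sinc(u: float) -> float:
    """Normalized sinc: sin(pi*u) / (pi*u), with sinc(0) = 1."""
    return 1.0 if u == 0 else math.sin(math.pi * u) / (math.pi * u)

def reconstruct(t: float, x, T: float, n_terms: int = 200) -> float:
    """Whittaker-Shannon: x(t) = sum over n of x(nT) * sinc((t - nT) / T)."""
    return sum(x(n * T) * sinc((t - n * T) / T)
               for n in range(-n_terms, n_terms + 1))

signal = lambda t: math.sin(2 * math.pi * t)  # 1 Hz sine; Nyquist rate = 2 Hz
T = 0.25                                      # 4 samples/sec: above Nyquist

t = 0.1
approx = reconstruct(t, signal, T)
exact = signal(t)
# Sampled above the Nyquist rate, the truncated sum closely matches the
# true value; under-sample (larger T) and the reconstruction aliases.
print(abs(approx - exact))
```

The framework's claim is the analogous statement for prompts: supply all 6 bands and the model can reconstruct your intent; drop bands and it aliases.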

I realized that our clarification loops were literally us manually supplying the missing bands after the fact. "Make it shorter" was supplying the missing CONSTRAINTS band. "Format as bullet points" was supplying the missing FORMAT band. We were paying to transmit the full conversation context just to supply information we could have included in the initial prompt.

The Fix: 6-Band Structured Prompts

I converted our 20 most common prompt templates to the sinc-LLM 6-band format. Here is what the "document summary" prompt looked like before and after:

Before (raw prompt, average 3.2 calls):

"Summarize this document: [document text]"

After (sinc-LLM structured, average 1.0 calls):

{
  "formula": "x(t) = Σ x(nT) · sinc((t - nT) / T)",
  "T": "specification-axis",
  "fragments": [
    {
      "n": 0, "t": "PERSONA",
      "x": "Technical writer producing executive summaries. Concise, factual, no opinions."
    },
    {
      "n": 1, "t": "CONTEXT",
      "x": "Internal document summary for the engineering team's weekly digest. Readers are senior engineers who need to quickly assess relevance."
    },
    {
      "n": 2, "t": "DATA",
      "x": "[document text inserted here]"
    },
    {
      "n": 3, "t": "CONSTRAINTS",
      "x": "Exactly 3 sentences. First sentence: what the document is about. Second sentence: the key finding or recommendation. Third sentence: action items or next steps. Do not include background information. Do not include the author's qualifications. Do not use the phrase 'this document.' No bullet points. Under 80 words total."
    },
    {
      "n": 4, "t": "FORMAT",
      "x": "Plain text paragraph. No markdown headers. No bullet points. Three sentences separated by periods."
    },
    {
      "n": 5, "t": "TASK",
      "x": "Write the 3-sentence executive summary of the provided document following all constraints."
    }
  ]
}
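To use a template like this, something has to flatten the bands into an actual chat request. Here is one possible assembly helper; the band-to-message split is my own sketch, not an official sinc-LLM API:

```python
def assemble_messages(template: dict, document_text: str) -> list[dict]:
    """Flatten a 6-band template into chat messages.

    The static bands (PERSONA, CONTEXT, CONSTRAINTS, FORMAT) go into the
    system message; the per-request DATA and the TASK go into the user
    message. This split is an assumption of this sketch, chosen so the
    system portion stays identical across calls of the same task type.
    """
    bands = {frag["t"]: frag["x"] for frag in template["fragments"]}
    system = "\n\n".join(
        f"{name}: {bands[name]}"
        for name in ("PERSONA", "CONTEXT", "CONSTRAINTS", "FORMAT")
    )
    user = f"DATA:\n{document_text}\n\nTASK: {bands['TASK']}"
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user},
    ]
```

The returned list plugs straight into any chat-completions-style client.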

The Results: Month by Month

January: $1,500. All raw prompts. Average 3.2 calls per task. 15,000 tasks.

February: $320. Top 10 templates converted to 6-band format. Average 1.4 calls per task. Same 15,000 tasks. The remaining high-call-count tasks were unconverted templates.

March: $45. All 20 templates converted. Average 1.05 calls per task. Same 15,000 tasks. The 0.05 excess is from genuine edge cases where the output needed human feedback, not from specification gaps.
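The per-task arithmetic behind those three months is simple enough to check:

```python
TASKS = 15_000  # constant across all three months

bills = {"January": 1_500, "February": 320, "March": 45}

for month, bill in bills.items():
    print(f"{month:9s} ${bill:>5,} -> ${bill / TASKS:.4f} per task")

reduction = 1 - bills["March"] / bills["January"]
print(f"Total reduction: {reduction:.0%}")  # 97%
```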

Why Structured Prompts Save Money

I identified four mechanisms by which the 6-band structure reduced our costs:

1. Elimination of clarification loops (80% of savings): With all 6 bands specified, the model produces the correct output on the first call. No "make it shorter" follow-ups. No "wrong format" retries. This alone cut our average from 3.2 calls to 1.05 calls — a 67% reduction in API calls.

2. Reduced output tokens (12% of savings): Without a FORMAT and CONSTRAINTS band, the model defaults to verbose outputs. Our summaries went from an average of 280 words (model's default) to 75 words (our specification). Fewer output tokens = lower cost per call.

3. Smaller model sufficiency (5% of savings): With a fully specified 6-band prompt, GPT-4o-mini produces outputs equivalent to what GPT-4o produced with raw prompts. We moved 60% of our tasks to the mini model. The structured prompt compensates for the smaller model's lower capability.

4. Prompt caching (3% of savings): The sinc-LLM system prompt (bands 0, 1, 3, 4) is identical across calls for the same task type. Only the DATA band changes per document. OpenAI's prompt caching kicks in on the repeated system portion, reducing input token costs for subsequent calls.
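Automatic prompt caching matches on an identical leading portion of the prompt, so the win in mechanism 4 depends on keeping the static bands first and byte-for-byte stable across calls. A crude way to estimate the cacheable share, using character counts as a stand-in for tokens (an assumption of this sketch):

```python
def cacheable_fraction(static_prefix: str, data_band: str) -> float:
    """Crude estimate of the input share eligible for prefix caching.

    Assumes the static bands form the leading, unchanging portion of the
    prompt and only the DATA band varies; uses character counts as a
    rough proxy for tokens.
    """
    total = len(static_prefix) + len(data_band)
    return len(static_prefix) / total

# Hypothetical 1,000-character static portion (PERSONA + CONTEXT +
# CONSTRAINTS + FORMAT), against documents of varying size.
static = "x" * 1_000
for doc_len in (500, 2_000, 8_000):
    frac = cacheable_fraction(static, "d" * doc_len)
    print(f"{doc_len:5d}-char document -> {frac:.0%} of input cacheable")
```

The shorter the per-call DATA band relative to the static bands, the larger the cached share, which is why this mechanism contributed the smallest slice of savings for long documents.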

Lessons Learned

I came away from this project with a conviction I did not have before: LLM cost optimization is primarily a prompt engineering problem, not an infrastructure problem. Caching, batching, and model selection matter — but they are second-order effects compared to getting your prompts right.

The 6-band structure from sinc-LLM is not just a formatting convention. It is a forcing function that makes you think about what you actually want before you send the request. That upfront thinking is what eliminates the expensive clarification loops downstream.

If your LLM bill is higher than you think it should be, do not start by looking at caching or cheaper models. Start by auditing your prompts. Count the average number of API calls per completed task. If it is above 1.2, you have a specification problem, and the sinc-LLM 6-band format will fix it.
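That audit is easy to automate if your API gateway logs carry a task or conversation identifier. The log shape below is illustrative:

```python
from collections import Counter

def avg_calls_per_task(call_log: list[str]) -> float:
    """call_log: one task/conversation id per API call, in any order."""
    per_task = Counter(call_log)
    return sum(per_task.values()) / len(per_task)

# Illustrative log: task "a" needed 3 calls, "b" needed 1, "c" needed 2.
log = ["a", "a", "b", "a", "c", "c"]
avg = avg_calls_per_task(log)
print(f"average calls per task: {avg:.2f}")  # 2.00
if avg > 1.2:
    print("specification problem: prompts are likely underspecified")
```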
