How I Reduced LLM Costs by 97% With Structured Prompts
In January 2026, our OpenAI bill was $1,500. By March it was $45. We did not switch to a cheaper model. We did not cut the number of calls. We did not build a caching layer. We changed our prompts using sinc-LLM's 6-band format, and the wasted tokens went away.
This is the story of how I learned that most LLM costs come from bad prompts, not from the price of the model.
Where the $1,500 Was Going
I looked at our API usage and found something I did not expect. A typical task was not one request and one reply. It was one request, then a reply that was wrong. Then a fix request. Then another reply. Then another fix. The average task took 3.2 API calls to finish.
Each fix sent the whole conversation again, plus the new instruction. A task that should cost $0.02 in tokens ended up costing $0.08 to $0.12 because of all that extra context. Multiply that by 15,000 tasks per month and you get $1,500.
The real problem was vague prompts. Our developers sent things like "generate a summary of this document" with no rules about length, format, or tone. The model wrote a 500-word essay when we wanted 3 sentences. So the developer sent a follow-up: "Make it shorter." Then another: "Format it as bullet points." Each follow-up sent the whole conversation again. That doubled the token count every time.
The Realization: Prompts Are Signals
I found sinc-LLM while reading about prompt engineering. The main idea was clear to me right away: a prompt is a signal. A vague prompt causes aliasing. The model fills in the gaps with its own guesses, and those guesses rarely match what you want.
The sinc-LLM framework uses the sampling theorem to show this:
If you fill all 6 bands (PERSONA, CONTEXT, DATA, CONSTRAINTS, FORMAT, TASK), the model can rebuild your intent with no gaps. If you only fill 2 or 3 bands, the model guesses the rest. The output comes out wrong, and you have to send fix requests.
I saw that our fix requests were just supplying the missing bands after the fact. "Make it shorter" was the CONSTRAINTS band. "Format as bullet points" was the FORMAT band. We were paying to send the full conversation again just to add information we could have put in the first prompt.
The Fix: 6-Band Structured Prompts
I changed our 20 most-used prompt templates to the sinc-LLM 6-band format. Here is what the "document summary" prompt looked like before and after:
Before (raw prompt, average 3.2 calls):
"Summarize this document: [document text]"
After (sinc-LLM structured, average 1.0 calls):
{
"formula": "x(t) = Σ x(nT) · sinc((t - nT) / T)",
"T": "specification-axis",
"fragments": [
{
"n": 0, "t": "PERSONA",
"x": "Technical writer producing executive summaries. Concise, factual, no opinions."
},
{
"n": 1, "t": "CONTEXT",
"x": "Internal document summary for the engineering team's weekly digest. Readers are senior engineers who need to quickly assess relevance."
},
{
"n": 2, "t": "DATA",
"x": "[document text inserted here]"
},
{
"n": 3, "t": "CONSTRAINTS",
"x": "Exactly 3 sentences. First sentence: what the document is about. Second sentence: the key finding or recommendation. Third sentence: action items or next steps. Do not include background information. Do not include the author's qualifications. Do not use the phrase 'this document.' No bullet points. Under 80 words total."
},
{
"n": 4, "t": "FORMAT",
"x": "Plain text paragraph. No markdown headers. No bullet points. Three sentences separated by periods."
},
{
"n": 5, "t": "TASK",
"x": "Write the 3-sentence executive summary of the provided document following all constraints."
}
]
}
The Results: Month by Month
January: $1,500. All prompts were raw. Average 3.2 calls per task. 15,000 tasks.
February: $320. The top 10 templates were changed to 6-band format. Average 1.4 calls per task. Still 15,000 tasks. The tasks still using old templates were the ones with high call counts.
March: $45. All 20 templates were changed. Average 1.05 calls per task. Still 15,000 tasks. The extra 0.05 calls came from real edge cases that needed a human to review the output, not from vague prompts.
Why Structured Prompts Save Money
I found four reasons why the 6-band structure cut our costs:
1. Elimination of clarification loops (80% of savings): With all 6 bands filled in, the model gets it right on the first call. No "make it shorter" follow-ups. No "wrong format" retries. This one change cut our average from 3.2 calls to 1.05 calls. That is a 67% drop in API calls.
2. Reduced output tokens (12% of savings): Without a FORMAT and CONSTRAINTS band, the model writes long replies by default. Our summaries went from 280 words on average to 75 words. Fewer output tokens means a lower cost per call.
3. Smaller model sufficiency (5% of savings): With a full 6-band prompt, GPT-4o-mini gives results as good as GPT-4o did with raw prompts. We moved 60% of our tasks to the mini model. The structured prompt makes up for the smaller model's weaker ability.
4. Prompt caching (3% of savings): The sinc-LLM system prompt (bands 0, 1, 3, 4) stays the same for every call of the same task type. Only the DATA band changes per document. OpenAI's prompt caching applies to the repeated part, so input token costs drop on later calls.
Lessons Learned
I finished this project with a clear belief I did not have before: cutting LLM costs is mainly a prompt problem, not an infrastructure problem. Caching, batching, and model choice all matter. But they are small gains compared to writing good prompts.
The 6-band structure from sinc-LLM is not just a style rule. It forces you to think about what you really want before you send the request. That thinking up front is what kills the expensive fix loops later.
If your LLM bill is too high, do not start by looking at caching or cheaper models. Start by checking your prompts. Count the average number of API calls per finished task. If that number is above 1.2, you have a vague-prompt problem. The sinc-LLM 6-band format will fix it.
// Production AI Engineering
Build AI systems that hold up in production.
sinc-LLM designs, audits, and stabilises production AI infrastructure: from vendor evaluation and cost accountability to incident controls and MCP architecture.
See what we do →