For a long time my workflow with LLMs looked like this: I had a problem, so I opened a chat window and started typing. I typed the first sentence. Then I saw where it went. Then I added more. The prompt grew as I went, the way a conversation grows when you figure out what to say while you are saying it.
This felt natural. It mirrored how I would explain a problem to a coworker: start talking, fix it as you go, answer their questions. But there is one big difference. A coworker asks questions first. A model does not. It starts executing right away, and if it asks questions at all, it asks them after. By then it has already spent tokens going the wrong way. I only saw this clearly after wasting 80,000 tokens on one vague request.
Real-time prompting has a core problem. I was thinking and typing at the same time. My thinking was not done when I started, so my prompt was not done when the model started. The model got an incomplete spec. It filled the missing pieces with its own best guesses. Some guesses were right. Some were wrong. I never knew which until the output arrived. By then the tokens were gone.
Real-time prompting is like writing code with no plan. The code works for the first feature. Then you add a second feature and patch around the first. Then a third, and you patch around the second. Soon the patches are load-bearing in ways you never intended. The system piles up technical debt as fast as you improvise.
Real-time prompting piles up specification debt. Each message patches the gaps from the last one. You say something vague. The model picks one meaning. You clarify. The model adjusts, but it still carries its first reading. That first reading shapes how it reads your fix. You are not building a spec. You are debugging one, live, in an expensive context window.
I tracked this for a month. On average, I needed 4.2 exchanges with the model to get a useful output, and every exchange cost tokens. The clarification loop (prompt, wrong output, clarification, closer output, correction, final output) used 3 to 5 times as many tokens as one precise prompt would have needed. I was paying a specification debt interest rate of 200-400% per task.
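The interest-rate figure is simple arithmetic, shown here as a minimal Python sketch. It assumes "3x to 5x" means the clarification loop consumed 3 to 5 times the tokens of a single precise prompt, so the extra spend (the "interest") is the multiplier minus one. The function name `interest_rate_pct` is illustrative only, not part of any tool.

```python
def interest_rate_pct(token_multiplier: float) -> float:
    """Extra tokens spent, as a percentage of the single precise prompt's cost.

    Assumption: token_multiplier = tokens used by the clarification loop
    divided by tokens a single precise prompt would have needed.
    """
    return (token_multiplier - 1.0) * 100.0


# The low and high ends of the measured range.
for multiplier in (3.0, 5.0):
    print(f"{multiplier:g}x tokens -> {interest_rate_pct(multiplier):g}% interest")
# 3x tokens -> 200% interest
# 5x tokens -> 400% interest
```

Put differently, a 3x loop spends two extra tokens for every token the precise prompt actually needed.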
The fix was not to get better at real-time prompting. The fix was to stop real-time prompting entirely for anything harder than a one-line lookup.
I changed my workflow. Now I do not open a chat window until I am done thinking. I write the prompt the way an architect writes a blueprint: before building starts, with a full spec of what needs to be built, how it must behave, and what limits the structure must respect.
The order is: understand the problem, think through the approach, write the spec, then call the model. The model is the last step, not the first. It is the contractor who builds from the blueprint. It is not the architect who designs it.
This felt slow at first. I was spending 5 or 10 minutes writing a prompt before the model did anything. But I found something important: that pre-thinking time was not extra cost. It was relocated cost. The time I used to spend in clarification loops (4.2 exchanges, each several minutes) was now spent up front, as thinking instead of as correction. Total time stayed about the same. Output quality went up sharply. Token cost fell by 97%.
Pre-thinking gave me one more benefit I did not expect. It sometimes showed me that I did not know what I wanted. When I tried to fill the CONSTRAINTS band and could not, it meant I had not yet decided what the constraints were. A blank band was not a prompt problem. It was a thinking problem. The format surfaced that before I wasted a whole context window finding it out.
The sinc format turned pre-thinking into a structured process. Six bands. Each one is a question I must answer before I send the prompt.
PERSONA: Who is answering this? What skill level, what voice, what role? If I cannot answer this, I have not thought about who the best responder would be for this task.
CONTEXT: What does the model need to know about the situation? Not everything. Just the context that changes how the task should be done. If I write nothing here, I am asking the model to guess the situation from the task description alone.
DATA: What facts, references, or examples does the model need to do this right? If I leave this empty, the model uses its training data. That data may not match my specific situation.
CONSTRAINTS: What must not happen? What are the scope limits? What would I reject in the output? This is the hardest band to fill. It requires negative specification: thinking about what the output is NOT, not just what it is. I found that CONSTRAINTS carries 42.7% of output quality. That is no coincidence: it is the band most people skip when the thinking gets hard, so it is where the largest gaps in a spec usually hide.
FORMAT: What does the output look like? Length, structure, code vs prose, headers or no headers, examples or just explanation? Without this, the model falls back on its own default format, which is often generic and does not fit my needs.
TASK: What is the actual request? This is the only band I used to fill. It is the smallest band by quality contribution: 2.8%.
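The six bands map naturally onto a small data structure. The Python sketch below is hypothetical: the class `SincPrompt` and its methods are my own illustration, not the sinc-LLM implementation. It stores one string per band, reports which bands are still blank, and refuses to render a prompt until every band is filled, which is the "blank band means unfinished thinking" check in code form. Emitting the bands in the order listed above is also an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass, fields


@dataclass
class SincPrompt:
    """Hypothetical six-band prompt spec: one string per band."""

    persona: str = ""
    context: str = ""
    data: str = ""
    constraints: str = ""
    format: str = ""
    task: str = ""

    def blank_bands(self) -> list[str]:
        """Return the names of bands that are empty or whitespace-only."""
        return [
            band.name.upper()
            for band in fields(self)
            if not getattr(self, band.name).strip()
        ]

    def render(self) -> str:
        """Join the bands into one prompt, refusing while any band is blank."""
        missing = self.blank_bands()
        if missing:
            # A blank band signals unfinished thinking, not a prompt problem.
            raise ValueError("blank bands: " + ", ".join(missing))
        # Assumption: bands are emitted in the order they are declared above.
        return "\n\n".join(
            f"{band.name.upper()}:\n{getattr(self, band.name).strip()}"
            for band in fields(self)
        )


spec = SincPrompt(task="Summarise the attached incident report.")
print(spec.blank_bands())
# ['PERSONA', 'CONTEXT', 'DATA', 'CONSTRAINTS', 'FORMAT']
```

Calling `spec.render()` here raises `ValueError` listing the five missing bands. That is the moment to go back and think, rather than to start typing into a chat window.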
The checklist forces me to think before I ask the model to work. This is the right order. Thinking should come before execution. When I was doing real-time prompting, I had the order backwards: executing first, thinking second (in reaction to what the model produced). That backwards order is expensive and produces weak output.
The most surprising change was not the token savings. It was the quality of my own thinking.
When I started writing blueprints before calling the model, I began to notice gaps in my own understanding that real-time prompting had hidden. In the clarification loop, the model fills the gaps for me. It guesses, I react, and together we land on something. In the blueprint workflow, there is no model to fill the gaps. If there is a gap, the blank band sits in front of me until I fill it myself.
This is uncomfortable in a useful way. It forces me to fully understand what I want before I ask for it. Fully understanding what I want often shows that the problem is simpler than I thought, or harder, or different in kind. Sometimes I discover during blueprint writing that the task I planned to give the model is not the task that actually solves my problem. I catch that before spending a single token.
The workflow shift also changed how I see the model's role. It is not a thinking partner. It is a precision execution engine. When I treat it as a thinking partner, I hand my thinking off to it, and my thinking gets worse over time, the way any skill you consistently outsource tends to weaken. When I treat it as an execution engine, I do my thinking first. The thinking gets better, and so does the execution.
My prompts are my blueprints now. I write them with the same care I give a function signature or a system design document. They exist as artifacts before the model ever sees them. The model does not shape the blueprint. It executes it.
That order (thinking first, prompting second, execution third) is the single biggest change I have made to how I work with AI tools. Not the framework. Not the fine-tuned model. Not the sinc formula. The order of operations. Think, then wish. The Genie is waiting. Make the wish precise before you say it aloud.
// Production AI Engineering
sinc-LLM designs, audits, and stabilises production AI infrastructure: from vendor evaluation and cost accountability to incident controls and MCP architecture.