AI Does Not Speak English — You Just Forced It To

By Mario Alexandre · March 23, 2026 · 13 min read · Intermediate · AI Architecture · Signal Processing

The Illusion of Conversation

When you open ChatGPT, you see a text box. It looks like a messaging app. It behaves like a messaging app. You type in English and get English back. The entire interface is designed to make you believe you are having a conversation.

You are not.

You are encoding a signal using one of the most ambiguous, redundant, and lossy encoding systems ever created — natural language — and transmitting it to a numerical processor that must decode it through multiple translation layers before it can do any useful computation. Every step of that translation introduces error probability. Every ambiguous word, every implicit assumption, every unstated context is a point where the signal degrades.

The chat interface is a concession to human comfort. It is not how the model works. It is not what the model needs. And it is actively degrading the quality of every interaction you have.

The Actual Processing Pipeline

Here is what actually happens when you type "Write me a marketing strategy" into an LLM:

  1. Tokenization. Your sentence is split into tokens. "Write me a marketing strategy" becomes approximately 5 tokens. Each token is an integer ID from a vocabulary of 32,000-128,000 entries. The word "marketing" maps to a single token. The phrase "marketing strategy" maps to 2 tokens that the model must learn to associate. Information is already being lost: the concept of "marketing strategy" as a unified idea is split across 2 numerical indices.
  2. Embedding. Each token ID is converted into a high-dimensional vector (768 to 12,288 dimensions depending on model size). This vector captures statistical relationships between this token and every other token the model has seen in training. It does not capture your intent. It captures distributional semantics — what words tend to appear near this one.
  3. Positional encoding. The model adds information about where each token sits in the sequence. It does not know that "Write" is a command and "strategy" is its object. It knows that the token at position 0 is followed by the token at position 1. Order information, not semantic roles.
  4. Attention computation. Through 32 to 128 attention layers, each token vector is updated based on its relationship to every other token vector. This is where the model infers what "marketing strategy" means given the surrounding tokens. But with only 5 tokens of input, the attention mechanism has almost nothing to attend to. It fills the gaps from parametric memory — the training distribution.
  5. Probability distribution. The final layer produces a probability distribution over the entire vocabulary for the next token. The model selects the highest-probability token and begins generating. This selection is based on the 5-token input signal plus billions of parameters encoding statistical patterns from training data.

Count the translations: natural language to tokens (lossy), tokens to embeddings (statistical, not semantic), embeddings through attention (gap-filling from training), attention to probability distribution (statistical selection). Four translation layers, each introducing noise. I mapped this pipeline in detail while building sinc-LLM.
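The first two stages can be sketched in a few lines. This is a toy illustration, not a real tokenizer: the vocabulary below is invented for the sketch (real models learn subword vocabularies of 32,000-128,000 entries), and the integer IDs are simply reused from the article's example prompt.

```python
# Toy illustration of stage 1: mapping text to integer token IDs.
# The vocabulary and IDs are invented placeholders, not a real model's.
toy_vocab = {"Write": 16594, " me": 757, " a": 264,
             " marketing": 8661, " strategy": 8446}

def tokenize(text: str, vocab: dict) -> list[int]:
    """Greedy longest-match split against the toy vocabulary."""
    ids, rest = [], text
    while rest:
        for piece in sorted(vocab, key=len, reverse=True):
            if rest.startswith(piece):
                ids.append(vocab[piece])
                rest = rest[len(piece):]
                break
        else:
            raise ValueError(f"no token covers: {rest!r}")
    return ids

print(tokenize("Write me a marketing strategy", toy_vocab))
# [16594, 757, 264, 8661, 8446]
```

Note what survives this step: only the IDs. "Marketing strategy" as a unified concept is already gone; the model sees two adjacent integers.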

The Five Translations Your Prompt Undergoes

Beyond the mechanical pipeline, your natural language prompt requires semantic translations that compound error:

Translation | What Happens | Error Source
Ambiguity resolution | Model guesses which meaning of each word you intended | "Strategy" could mean military, business, game, or communication strategy
Implicit context inference | Model infers unstated context from training distribution | Assumes your company size, industry, budget, timeline from statistical averages
Intent decomposition | Model decomposes your vague request into sub-tasks | Decides what "marketing strategy" includes/excludes without guidance
Constraint inference | Model invents boundaries you did not specify | Picks a length, tone, format, detail level, and scope arbitrarily
Output format selection | Model decides how to structure the response | Chooses between bullet points, paragraphs, headers, tables with no direction

Each translation has an accuracy rate. If each is 90% accurate — a generous estimate for ambiguous natural language — then 5 translations at 90% each yield: 0.9 × 0.9 × 0.9 × 0.9 × 0.9 = 59% final accuracy. This is the translation tax. You pay it on every conversational prompt. And 90% per step is optimistic. For truly ambiguous prompts, individual translation accuracy can drop to 70%, yielding: 0.7^5 = 16.8% final accuracy.
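The compounding is just exponentiation, and it is worth seeing how fast it collapses. The per-step accuracies here are the article's illustrative figures, not measured values:

```python
# The translation tax: per-step accuracy compounds across 5 lossy
# semantic translations. Figures are illustrative, not measured.
def compound(per_step: float, steps: int = 5) -> float:
    return per_step ** steps

print(f"{compound(0.9):.1%}")  # generous estimate per step
print(f"{compound(0.7):.1%}")  # truly ambiguous prompt
```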

What the Model Actually Sees

Strip away the chat interface and look at what the model actually processes. It does not see your sentence. It sees a sequence of integer IDs:

[16594, 757, 264, 8661, 8446]

That is your "Write me a marketing strategy." Five numbers. No grammar, no syntax, no meaning, no intent. Just five indices into an embedding table. The model must reconstruct everything else — your intent, your context, your constraints, your desired format — from the statistical relationships between these 5 numbers and the billions of parameters in its weights.
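The "indices into an embedding table" claim is literal, and a sketch makes it concrete. The vectors below are random placeholders (real models learn them during training), and the dimensions are from the ranges quoted earlier in the article:

```python
import random

random.seed(0)
DIM = 768  # lower end of the 768-12,288 range cited above

prompt_ids = [16594, 757, 264, 8661, 8446]

# A toy embedding table: one vector per token ID. Real models learn
# these vectors; here they are random stand-ins.
embedding_table = {tid: [random.gauss(0, 0.02) for _ in range(DIM)]
                   for tid in prompt_ids}

vectors = [embedding_table[tid] for tid in prompt_ids]
print(len(vectors), len(vectors[0]))  # 5 vectors, 768 dimensions each
```

Everything the model "knows" about your request at this point is five rows of floats. Intent, context, and constraints must be reconstructed statistically from there.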

Now compare that to what a structured sinc prompt — the format I designed — provides. Instead of 5 ambiguous tokens, the model receives 150-200 tokens organized into explicit key-value structures where each band is labeled, typed, and bounded. The model does not need to translate. It does not need to infer. It does not need to guess. The signal arrives pre-decoded.
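To make "labeled, typed, and bounded" concrete, here is what a key-value prompt of that shape could look like. The six band names and field values below are hypothetical placeholders to illustrate the idea; they are not the sinc prompt format's actual schema:

```python
import json

# Hypothetical key-value band structure. The band names are invented
# for illustration and are NOT the real sinc-LLM schema.
structured_prompt = {
    "task": "Write a marketing strategy",
    "context": {"company_size": "12-person B2B SaaS startup",
                "industry": "developer tools"},
    "constraints": {"budget_usd": 5000, "timeline_weeks": 12},
    "audience": "technical founders evaluating paid acquisition",
    "output_format": "markdown, 3 sections, max 800 words",
    "scope": {"include": ["channels", "metrics"], "exclude": ["branding"]},
}
print(json.dumps(structured_prompt, indent=2))
```

Every field that the model would otherwise have to guess (size, budget, format, scope) is stated explicitly, so there is nothing left to infer.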

Structured Input Eliminates Translation Layers

JSON-structured input eliminates translation layers because it maps directly to how the model processes information.

With structured input, the 5 translation steps collapse to approximately 1 (tokenization, which is unavoidable). Accuracy goes from 59% (5 steps at 90%) to 90% (1 step at 90%). If we account for the reduced ambiguity of structured tokens, it is closer to 95-98%.

The Native Language of AI

People ask me what language AI "thinks" in. My answer is: it does not think. But the closest analog to its processing format is structured key-value data. JSON. Not English. Not any natural language.

The transformer architecture was built to process sequences of tokens with attention-weighted relationships. Structured data provides those relationships explicitly. Natural language forces the model to discover them through statistical inference. One of these is efficient. The other wastes compute, tokens, and money on translations that should not need to happen.

When you talk to AI in English, you are forcing a numerical signal processor to perform 5 lossy translations before it can begin working on your problem. When you talk to AI in structured JSON, you are speaking something close to its native processing format. The difference in output quality is the difference between a whispered request and a clear specification.

Implications for How You Communicate

This is not an argument against natural language interfaces. They exist for good reason — most people cannot and should not write JSON. My argument is that the interface between you and the model should handle the translation for you, converting your natural language into 6-band structured input before it reaches the model.

That is what my sinc-LLM framework does. You give it a raw prompt. It decomposes it into 6 specification bands. It validates completeness. It computes the signal-to-noise ratio. And it delivers a structured signal to the model that minimizes translation loss.
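The "validates completeness" step can be sketched in a few lines. To be clear, this is not the sinc-llm package's API; the band names and the naive coverage score below are invented for illustration only:

```python
# Minimal sketch of a band-completeness check. Invented band names and
# scoring; this is an illustration, NOT the sinc-llm package's API.
REQUIRED_BANDS = ["task", "context", "constraints",
                  "audience", "output_format", "scope"]

def completeness(bands: dict) -> float:
    """Fraction of required bands that are present and non-empty."""
    filled = sum(1 for b in REQUIRED_BANDS if bands.get(b))
    return filled / len(REQUIRED_BANDS)

draft = {"task": "Write a marketing strategy",
         "audience": "technical founders"}
print(f"{completeness(draft):.0%}")  # 2 of 6 bands filled
```

A score like this tells you, before a single token reaches the model, how much of the signal you are leaving for it to guess.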

The model does not speak English. You just forced it to. And every time you force it to, you pay the translation tax in accuracy, in tokens, and in money. The alternative exists. It is free. And it works.

Transform any prompt into 6 Nyquist-compliant bands

Try sinc-LLM Free

Or install: pip install sinc-llm