AI Does Not Speak English — You Just Forced It To

By Mario Alexandre | March 23, 2026 | 13 min read | Intermediate | AI Architecture, Signal Processing

The Illusion of Conversation

When you open ChatGPT, you see a text box. It looks like a chat app. You type in English and get English back. The whole design makes you think you are having a real conversation.

You are not.

You are sending a message in natural language, one of the most confusing and noisy systems ever made. A number-based processor gets that message. It must decode it through many steps before it can do any real work. Every step can go wrong. Every unclear word, every hidden assumption, every missing detail is a place where the signal gets worse.

The chat box exists for your comfort. It is not how the model works. It is not what the model needs. And it makes every interaction worse than it could be.

The Actual Processing Pipeline

Here is what really happens when you type "Write me a marketing strategy" into an LLM:

  1. Tokenization. Your sentence gets split into tokens. "Write me a marketing strategy" becomes about 5 tokens. Each token is a whole-number ID from a fixed vocabulary, typically 32,000 to 200,000 entries (32,000 for Llama 2, about 128,000 for Llama 3). The word "marketing" maps to one token. The phrase "marketing strategy" maps to 2 tokens that the model must learn to link. The encoding itself is reversible, so no text is lost. But the idea of "marketing strategy" as a single concept is now split across 2 number indices that the model has to put back together.
  2. Embedding. Each token ID is turned into a high-dimensional vector (from 768 dimensions in GPT-2 small to 12,288 in GPT-3 175B). This vector is learned during training and reflects statistical relationships between this token and the rest of the vocabulary. At this stage it is the same vector every time the token appears, whatever the context. It does not capture your intent. It captures which words tend to appear near this one.
  3. Positional encoding. The model adds information about where each token sits in the sequence, either by adding a position vector to each embedding (as in the original Transformer and GPT-2) or by rotating the query and key vectors inside attention (RoPE, used by Llama). It does not know that "Write" is a command and "strategy" is the object. It only knows that the token at position 0 comes before the token at position 1. That is order information, not meaning.
  4. Attention computation. Across dozens of transformer layers (32 in Llama 2 7B, 96 in GPT-3 175B), each token vector is updated based on its relationship to every earlier token vector. Chat models use causal attention, so no token can look ahead. This is where the model tries to infer what "marketing strategy" means given the nearby tokens. But with only 5 tokens of input, the attention mechanism has almost nothing to work with. It fills the gaps from parametric memory, the patterns it learned during training.
  5. Probability distribution. The final layer produces a probability distribution over the whole vocabulary for the next token. A decoding strategy then picks a token: greedy decoding takes the most probable one, while chat products usually sample with a temperature or top-p setting. That pick is based on the 5-token input plus billions of parameters that encode patterns from training data.
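The five stages above can be sketched at toy scale in plain Python. This is a minimal illustration under loud assumptions, not how any production model is implemented: the sizes are tiny, the embedding table is random instead of learned, there is a single attention step with no learned query, key, or value projections, the output layer reuses the embedding table (weight tying, which GPT-2 uses but not every model does), and the token IDs are hypothetical.

```python
import math
import random

# Toy, hypothetical sizes; real models use far larger vocabularies and widths.
VOCAB_SIZE = 10
DIM = 4
rng = random.Random(0)

# Stage 2: one vector per token ID. Real tables are learned during training;
# random values stand in for them here.
EMBED = [[rng.gauss(0.0, 1.0) for _ in range(DIM)] for _ in range(VOCAB_SIZE)]


def embed(ids):
    """Look up one vector per token ID (the same vector regardless of context)."""
    return [list(EMBED[i]) for i in ids]


def add_positions(vectors):
    """Stage 3: add sinusoidal position vectors, as in the original Transformer."""
    out = []
    for pos, vec in enumerate(vectors):
        pe = [
            math.sin(pos / 10000 ** (k / DIM)) if k % 2 == 0
            else math.cos(pos / 10000 ** ((k - 1) / DIM))
            for k in range(DIM)
        ]
        out.append([v + p for v, p in zip(vec, pe)])
    return out


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def causal_attention(vectors):
    """Stage 4: one causal self-attention step.

    Simplification: queries, keys, and values are the input vectors themselves;
    real layers apply learned projections and stack many heads and layers.
    """
    out = []
    for t, query in enumerate(vectors):
        visible = vectors[: t + 1]  # position t attends only to itself and earlier positions
        weights = softmax([dot(query, key) / math.sqrt(DIM) for key in visible])
        out.append([sum(w * v[k] for w, v in zip(weights, visible)) for k in range(DIM)])
    return out


def next_token_distribution(vectors):
    """Stage 5: score every vocabulary entry from the last position, then normalize."""
    logits = [dot(vectors[-1], EMBED[i]) for i in range(VOCAB_SIZE)]  # tied embeddings
    return softmax(logits)


if __name__ == "__main__":
    ids = [3, 1, 4, 1, 5]  # hypothetical token IDs standing in for stage 1
    probs = next_token_distribution(causal_attention(add_positions(embed(ids))))
    greedy = max(range(VOCAB_SIZE), key=lambda i: probs[i])
    sampled = rng.choices(range(VOCAB_SIZE), weights=probs, k=1)[0]
    print(f"greedy pick: {greedy}, sampled pick: {sampled}")
```

The greedy and sampled picks can differ from run to run of the sampler, which is the decoding choice described in stage 5. Because attention is causal, changing the last token leaves the outputs at every earlier position untouched.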

Count the stages: words become token IDs (reversible, but concepts get fragmented), IDs become vectors (based on statistics, not meaning), position is encoded as order, the vectors flow through attention layers (filling gaps from training data), and out comes a probability distribution for the next token. Five stages, and any ambiguity in the input carries through every one of them. I mapped this pipeline while building sinc-LLM.

The Five Translations Your Prompt Undergoes

On top of these mechanical stages, your natural language prompt also goes through five semantic translations, and their errors compound:

  1. Ambiguity resolution. The model guesses which meaning of each word you intended. Error source: "strategy" could mean military, business, game, or communication strategy.
  2. Implicit context inference. The model infers unstated context from its training distribution. Error source: it assumes your company size, industry, budget, and timeline from statistical averages.
  3. Intent decomposition. The model breaks your vague request into sub-tasks. Error source: it decides what "marketing strategy" includes and excludes without guidance.
  4. Constraint inference. The model invents boundaries you did not specify. Error source: it picks a length, tone, format, detail level, and scope arbitrarily.
  5. Output format selection. The model decides how to structure the response. Error source: it chooses between bullet points, paragraphs, headers, and tables with no direction.

Each step has an accuracy rate. If each step is 90% accurate (a generous guess for unclear natural language) and the steps fail independently, then 5 steps at 90% each give: 0.9 × 0.9 × 0.9 × 0.9 × 0.9 = 0.9^5 ≈ 59% final accuracy. This is the translation tax. You pay it on every chat prompt. And 90% per step is optimistic. For truly unclear prompts, each step can drop to 70%, giving: 0.7^5 ≈ 16.8% final accuracy.
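The translation-tax arithmetic can be checked in a few lines. This is a sketch of the article's own illustrative model (independent steps, guessed per-step rates), not a measurement of any real model; the function name `compound_accuracy` is hypothetical.

```python
# Illustrative model from the text: a prompt passes through n independent
# translation steps, each correct with probability p, so end-to-end
# accuracy is p ** n. The per-step rates are guesses, not measurements.


def compound_accuracy(per_step: float, steps: int) -> float:
    """Probability that all `steps` independent steps succeed."""
    if not 0.0 <= per_step <= 1.0:
        raise ValueError("per_step must be a probability in [0, 1]")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return per_step ** steps


if __name__ == "__main__":
    scenarios = [
        ("chat prompt, 90% per step", 0.9, 5),
        ("chat prompt, 70% per step", 0.7, 5),
        ("structured prompt, 1 step at 90%", 0.9, 1),
    ]
    for label, p, n in scenarios:
        print(f"{label}: {compound_accuracy(p, n):.1%}")
```

Running it prints 59.0%, 16.8%, and 90.0% for the three scenarios. The independence assumption is what makes the rates multiply; correlated errors would change the numbers.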

What the Model Actually Sees

Set aside the chat box. Look at what the model really processes. It does not see your sentence. It sees a list of whole numbers:

[16594, 757, 264, 8661, 8446]

That is your "Write me a marketing strategy" as one tokenizer encodes it; other tokenizers assign different IDs. Five numbers. No grammar, no meaning, no goal. Just five positions in a lookup table. The model must rebuild everything else from those 5 numbers and billions of stored patterns. It has to guess your intent, your context, your limits, and the format you want.
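The idea that the model sees only integers can be illustrated with a toy tokenizer. This is a minimal sketch under stated assumptions: the vocabulary and its IDs are hypothetical, and greedy longest-match is a simplification, since real BPE tokenizers learn their vocabulary and merge rules from data. It shows that the prompt becomes a short list of integers and that decoding restores the text exactly.

```python
# Hypothetical toy vocabulary. The IDs are invented for illustration and do
# not match any real tokenizer. As in GPT-style byte-level BPE vocabularies,
# a leading space is part of the token.
VOCAB = {
    "Write": 0,
    " me": 1,
    " a": 2,
    " marketing": 3,
    " strategy": 4,
    " market": 5,
}
# Single printable ASCII characters as a fallback, so any ASCII text encodes.
for code in range(32, 127):
    VOCAB.setdefault(chr(code), 1000 + code)

ID_TO_TOKEN = {token_id: token for token, token_id in VOCAB.items()}
MAX_TOKEN_LEN = max(len(token) for token in VOCAB)


def encode(text: str) -> list[int]:
    """Greedy longest-match tokenization: text -> list of integer IDs."""
    ids, pos = [], 0
    while pos < len(text):
        for length in range(min(MAX_TOKEN_LEN, len(text) - pos), 0, -1):
            piece = text[pos : pos + length]
            if piece in VOCAB:
                ids.append(VOCAB[piece])
                pos += length
                break
        else:
            raise ValueError(f"cannot encode {text[pos]!r}")
    return ids


def decode(ids: list[int]) -> str:
    """Inverse mapping: concatenate the token strings for each ID."""
    return "".join(ID_TO_TOKEN[token_id] for token_id in ids)


if __name__ == "__main__":
    prompt = "Write me a marketing strategy"
    ids = encode(prompt)
    print(ids)  # [0, 1, 2, 3, 4]
    assert decode(ids) == prompt  # the encoding is reversible
    print(encode(" marketer"))  # [5, 1101, 1114]: an unseen word falls apart
```

The second call shows the fragmenting effect: a word missing from the vocabulary is split into a known prefix plus single characters, and the model has to reassemble the concept from those pieces.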

Now compare that to a structured sinc prompt, the format I designed. Instead of 5 unclear tokens, the model gets 150-200 tokens in clear key-value pairs. Each band is labeled, typed, and bounded. The model still tokenizes the input, but it no longer has to guess your intent, context, limits, or format. That part of the signal arrives already decoded.
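For a concrete picture, here is what a key-value prompt for the same request might look like. The keys and values are hypothetical: this section does not name the 6 sinc bands, so these are not sinc-LLM's band names, and the company details are made-up example inputs. Each key answers one of the five translations listed earlier, so the model no longer has to guess it.

```python
import json

# Hypothetical keys for illustration; NOT the actual sinc-LLM band names.
# Each key settles one of the five translations the model would otherwise guess.
structured_prompt = {
    "task": "Write a marketing strategy",
    "domain": "business marketing",  # ambiguity resolution: which "strategy"
    "context": {  # implicit context (example values, not real data)
        "industry": "B2B SaaS",
        "company_stage": "seed",
        "budget_usd": 20000,
        "timeline_weeks": 12,
    },
    "scope": {  # intent decomposition: what is in and out
        "include": ["positioning", "channels", "90-day plan"],
        "exclude": ["brand redesign"],
    },
    "constraints": {"max_words": 800, "tone": "direct"},  # constraint inference
    "output_format": "headed sections with bullet points",  # format selection
}

payload = json.dumps(structured_prompt, indent=2)

if __name__ == "__main__":
    print(payload)
```

The payload is far longer than the 5-token original, but every extra token carries a decision the model would otherwise have filled in from training averages.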

Structured Input Eliminates Translation Layers

JSON-structured input removes most of these translation layers because it matches how the model processes information: each key states outright what the model would otherwise have to infer.

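With structured input, the 5 semantic translations shrink to about 1, because the keys already settle which meaning you intend, the context, the scope, the constraints, and the format. Tokenization still happens, but it is a mechanical, reversible step rather than a guess. Under the same illustrative model, accuracy goes from 59% (5 steps at 90%) to 90% (1 step at 90%). If the lower ambiguity of structured tokens also makes that remaining step more reliable, the estimate rises to roughly 95-98%.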

The Native Language of AI

People ask me what language AI "thinks" in. My answer: it does not think. But the closest match to its processing format is structured key-value data. JSON. Not English. Not any natural language.

The transformer was built to process token sequences with attention-weighted links. Structured data gives those links directly. Natural language makes the model hunt for them through statistics. One way is efficient. The other wastes compute, tokens, and money on translations that should not be needed.

When you talk to AI in English, you are making a number-based processor do 5 lossy translations before it can even start on your problem. When you talk to AI in structured JSON, you are using something close to its native processing format. The difference in output quality is like the difference between a whisper and a clear order.

Implications for How You Communicate

This is not an argument against natural language interfaces. They exist for a good reason. Most people cannot and should not write JSON. My point is that the layer between you and the model should handle the translation, converting your natural language into 6-band structured input before it reaches the model.

That is what my sinc-LLM framework does. You give it a raw prompt. It breaks that prompt into 6 specification bands. It checks that nothing is missing. It computes the signal-to-noise ratio. Then it sends a structured signal to the model that cuts translation loss.
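The check-before-send part of that flow can be sketched as follows. This is a minimal sketch, not sinc-LLM's actual API: the band names and the functions `missing_bands` and `to_payload` are hypothetical, it assumes the raw prompt has already been split into bands, and it leaves out the signal-to-noise computation because this section does not define its formula.

```python
import json

# Hypothetical band names for illustration only. The article says sinc-LLM
# uses 6 bands but does not name them in this section.
REQUIRED_BANDS = ("task", "context", "audience", "scope", "constraints", "output_format")


def missing_bands(spec: dict) -> list[str]:
    """Return the required bands that are absent or empty in `spec`."""
    return [band for band in REQUIRED_BANDS if not spec.get(band)]


def to_payload(spec: dict) -> str:
    """Serialize a complete spec; refuse an incomplete one so gaps surface
    before the model is left to fill them from training averages."""
    gaps = missing_bands(spec)
    if gaps:
        raise ValueError("missing bands: " + ", ".join(gaps))
    return json.dumps({band: spec[band] for band in REQUIRED_BANDS}, indent=2)


if __name__ == "__main__":
    raw = {"task": "Write a marketing strategy"}
    print(missing_bands(raw))
```

For the bare chat prompt, every band except `task` comes back as missing, which is exactly the list of things the model would otherwise have to guess.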

The model does not speak English. You just made it try. Every time you do that, you pay the translation tax in accuracy, in tokens, and in money. A better way exists. It is free. And it works.

sinc-LLM transforms any prompt into 6 Nyquist-compliant bands. To try it, install it with: pip install sinc-llm
