Embedding Distance: Linear Algebra Applied to Prompt Variation
You write a prompt. It works. You change two words for clarity. The output completely changes. You revert the changes. The output reverts. You stare at the screen and wonder what just happened.
What just happened is that those two words moved your prompt to a different region of the model's embedding space — the high-dimensional vector space where the transformer represents meaning. Small changes in human language can produce large changes in vector position. Large changes in human language can produce small changes in vector position. The mapping is not linear and it is not obvious from reading.
You can read it with the right tool.
What the Embedding Distance Visualizer Does
The free Embedding Distance Visualizer takes two prompts, embeds them, and compares the resulting vectors. It reports:
- Cosine similarity (0.0 to 1.0) — how closely the two prompt vectors point in the same direction in embedding space
- Semantic distance label — a human-readable category (identical / near / moderate / distant / orthogonal)
- Shared concepts — what both prompts encode
- Unique to A — concepts present only in the first prompt
- Unique to B — concepts present only in the second prompt
- Semantic shift — qualitative description of what changed in meaning
- Intent match (0.0 to 1.0) — whether the underlying request is the same
- Likely same response — boolean prediction of whether the model will produce equivalent output
The visualization renders the cosine similarity as a colored bar — red (orthogonal / different intent) → yellow (moderate divergence) → green (high similarity / same response expected). You see immediately whether your "small wording change" is actually small.
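The core of that readout is one formula. Here is a minimal sketch of cosine similarity plus the category mapping, using toy 4-dimensional vectors in place of real model embeddings; the `distance_label` thresholds are illustrative assumptions, not the tool's actual cutoffs:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine of the angle between two embedding vectors:
    # dot product divided by the product of their lengths.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def distance_label(sim: float) -> str:
    # Hypothetical thresholds for the human-readable categories.
    if sim >= 0.99:
        return "identical"
    if sim >= 0.90:
        return "near"
    if sim >= 0.75:
        return "moderate"
    if sim >= 0.40:
        return "distant"
    return "orthogonal"

# Toy 4-dimensional "embeddings" standing in for real prompt vectors.
a = np.array([0.8, 0.1, 0.3, 0.5])
b = np.array([0.7, 0.2, 0.3, 0.6])
sim = cosine_similarity(a, b)
print(round(sim, 3), distance_label(sim))  # ≈ 0.985, labeled "near"
```

Note that the direction of the vectors is all that matters here; scaling either vector by a constant leaves the similarity unchanged.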
Why Cosine Similarity, Not Edit Distance
Edit distance — the minimum number of character edits needed to turn one prompt into the other — is how humans instinctively judge "how different" two strings are. It is irrelevant to the model. The model does not see characters. It sees tokens, those tokens are embedded as vectors, and those vectors are processed through layer after layer of matrix multiplication.
What matters to the model is the direction of your prompt vector in the embedding space. Two prompts that point in the same direction (high cosine similarity) will produce similar outputs even if they are written completely differently. Two prompts that point in different directions (low cosine similarity) will produce different outputs even if they differ by only one word.
This is why "rewrite this email professionally" and "make this email sound business-appropriate" produce nearly identical outputs (cosine ~0.95) — they point in the same direction. And why "summarize this" and "summarize this in one sentence" produce wildly different outputs (cosine ~0.6) — the constraint moves the prompt to a different region.
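The mismatch between the two metrics is easy to demonstrate on the surface side. A minimal Levenshtein edit-distance sketch (the cosine figures above come from an embedding model and cannot be reproduced with string operations):

```python
def edit_distance(s: str, t: str) -> int:
    # Classic Levenshtein dynamic program over two rows.
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        curr = [i]
        for j, ct in enumerate(t, 1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (cs != ct))) # substitution
        prev = curr
    return prev[-1]

# Large surface distance, but the article's cosine ~0.95 says same intent:
d1 = edit_distance("rewrite this email professionally",
                   "make this email sound business-appropriate")
# Small surface distance (16 appended characters), but cosine ~0.6 —
# the added constraint shifts the meaning:
d2 = edit_distance("summarize this",
                   "summarize this in one sentence")
print(d1, d2)
```

Character-level distance ranks the second pair as far closer than the first; the embedding space ranks them the other way around.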
What Linear Algebra Adds
The matrix operations inside a transformer are not magic. They are linear algebra:
- Embedding: each token becomes a vector via the embedding matrix lookup
- Q/K/V projections: the input vectors are projected through learned matrices into Query, Key, and Value spaces
- Attention: the dot product of Query and Key vectors determines attention weights — which is to say, which tokens "look at" which other tokens
- Output: the attention-weighted Value vectors get summed and projected through the output matrix
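The four steps above fit in a few lines of numpy. This is a single-head sketch with tiny toy dimensions and random stand-ins for the learned matrices — nothing here is a real model's weights:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_head, seq = 8, 4, 3   # toy sizes, not real model dimensions

# Embedding: each of 3 tokens is already a d_model vector after lookup.
X = rng.normal(size=(seq, d_model))

# Learned projection matrices (random stand-ins here).
W_q = rng.normal(size=(d_model, d_head))
W_k = rng.normal(size=(d_model, d_head))
W_v = rng.normal(size=(d_model, d_head))
W_o = rng.normal(size=(d_head, d_model))

# Q/K/V projections: linear maps into Query, Key, and Value spaces.
Q, K, V = X @ W_q, X @ W_k, X @ W_v

# Attention: scaled dot product of Queries and Keys, then softmax.
scores = Q @ K.T / np.sqrt(d_head)
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)  # each row sums to 1

# Output: attention-weighted Values, projected back to d_model.
out = (weights @ V) @ W_o
print(out.shape)  # (3, 8) — one d_model vector per token
```

Every line is a matrix multiply or a normalization, which is the whole point of the section: the pipeline is linear algebra with one softmax in the middle.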
Every step is matrix multiplication. The principal directions of the embedding matrix — its top singular vectors, since the matrix is rectangular — span the dominant directions of meaning the model can represent. The singular values from an SVD of the projection matrices show which of those dimensions carry the most weight for the task. LoRA fine-tuning works precisely because it adds low-rank matrices to the existing projections — small, targeted changes along the directions that matter, leaving the rest of the model untouched.
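The low-rank idea is compact enough to show directly. A minimal LoRA-style sketch with toy dimensions: the frozen weight `W` stays untouched, and training would only update the small factors `A` and `B` (the zero initialization of `B` is the standard choice so the update starts as a no-op):

```python
import numpy as np

rng = np.random.default_rng(1)
d, r = 8, 2   # toy weight size and low rank, r << d

W = rng.normal(size=(d, d))          # frozen pretrained projection

# LoRA: learn a rank-r update delta = B @ A instead of a full d x d matrix.
A = rng.normal(size=(r, d)) * 0.01   # trainable
B = np.zeros((d, r))                 # trainable, zero-init so delta starts at 0

def forward(x):
    # Original path plus the low-rank correction; W itself never changes.
    return x @ W.T + x @ (B @ A).T

x = rng.normal(size=(1, d))
print(np.allclose(forward(x), x @ W.T))  # True at init: B @ A is zero
print(A.size + B.size, W.size)           # 32 trainable vs 64 frozen params
```

Even in this toy, the trainable parameter count is `2 * r * d` instead of `d * d` — the saving that makes LoRA practical grows dramatically at real model sizes.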
You do not need to understand the math to use the model. You need to understand the math to engineer the model.
From a wiki synthesis I built mapping linear algebra to transformer internals: "Matrix multiplication = every transformer layer. Eigenvalues = principal directions in embedding space. Q/K/V projections = linear projections. SVD = LoRA fine-tuning. Dot product = attention score. Understanding this math = understanding what the model actually does with your prompt."
How to Use the Visualizer
- Compare prompt variants before A/B testing — if cosine similarity is 0.97, you do not need to A/B test; the outputs will be near-identical. If it is 0.6, you have a meaningful variant worth testing.
- Diagnose surprising output changes — if a small wording edit broke production, paste the before and after. The cosine similarity will tell you whether the model genuinely sees them as different.
- Engineer prompt minimization — find the shortest prompt that has high cosine similarity to your full prompt. That is your compressed prompt with minimal information loss.
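The third workflow is a simple search loop: keep the shortest candidate whose similarity to the full prompt clears a threshold. A runnable sketch — the `embed` function here is a toy character-frequency stand-in for a real embedding model, and the 0.9 threshold is an illustrative assumption:

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    # Toy bag-of-characters "embedding"; a real workflow would call an
    # embedding model here instead.
    v = np.zeros(26)
    for ch in text.lower():
        if ch.isalpha():
            v[ord(ch) - ord("a")] += 1
    return v

def cos(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

full = "please summarize the following report in plain language"
candidates = [
    "summarize the following report in plain language",
    "summarize this report plainly",
    "summarize report",
]

ref = embed(full)
# Shortest candidate that stays close to the full prompt in embedding space.
kept = min((c for c in candidates if cos(embed(c), ref) >= 0.9),
           key=len, default=full)
print(kept)
```

With a real embedding model plugged in, `kept` is the compressed prompt with minimal information loss relative to the original.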
From Visualization to Engineering
The free Visualizer reads the surface — it gives you cosine similarity and a semantic narrative. The paid service goes deeper: actual attention-head analysis on your real model, embedding-space probing of your task distribution, SVD-based LoRA fine-tuning calibrated to the directions that matter for your workload. When prompt-tuning has plateaued and you need to change the model itself, this is the layer of intervention.
Compare Two Prompts in Embedding Space
Paste two prompts. The tool returns cosine similarity, semantic-shift analysis, shared concepts, what is unique to each, intent match, and the likelihood of producing the same response.
Transformer Internals Audit & Fine-Tuning — Service #40
Read what your model actually does. Attention-head analysis, embedding-space probing, SVD-based LoRA design. When prompt-tuning has plateaued, this is the layer beneath.