I ran 150 identical tasks through ChatGPT (GPT-4o), Claude (Sonnet 4), and Gemini (2.5 Pro), each with raw prompts and sinc-LLM structured prompts. The headline finding: the difference between models is smaller than the difference between raw and structured prompts. Prompt structure matters more than model selection.
150 tasks across 5 categories: writing (30), analysis (30), coding (30), research (30), and creative (30). Each task was run both raw and structured on all three models, for a total of 900 runs, each scored on accuracy, completeness, format compliance, and constraint adherence.
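The write-up does not say how the four scoring dimensions combine into the percentages reported below. A minimal sketch, assuming an unweighted average of 0-1 ratings per dimension (the function names and rubric weighting here are my assumptions, not the benchmark's published method):

```python
from statistics import mean

# Assumed rubric: each run rated 0-1 on four dimensions, averaged,
# then reported as a percentage. The equal weighting is an assumption.
DIMENSIONS = ("accuracy", "completeness", "format_compliance", "constraint_adherence")

def score_run(ratings: dict) -> float:
    """Average the four per-dimension ratings (each 0-1) into one run score."""
    return mean(ratings[d] for d in DIMENSIONS)

def aggregate(runs: list) -> float:
    """Mean run score across a batch, expressed as a percentage."""
    return 100 * mean(score_run(r) for r in runs)

runs = [
    {d: 0.8 for d in DIMENSIONS},
    {d: 0.9 for d in DIMENSIONS},
]
print(round(aggregate(runs), 1))  # 85.0
```

A weighted average (e.g. accuracy counting double) would shift the headline numbers but not the raw-vs-structured gap, which dominates every dimension.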
| Model | Raw Score | Structured Score | Improvement |
|---|---|---|---|
| GPT-4o | 41% | 82% | +100% |
| Claude Sonnet 4 | 46% | 89% | +93% |
| Gemini 2.5 Pro | 39% | 80% | +105% |
The gap between the best and worst model on raw prompts: 7 points (46% vs 39%). The gap between raw and structured on the same model: at least 41 points (GPT-4o +41, Claude +43, Gemini +41). By that measure, prompt structure has roughly six times the impact of model selection.
Claude leads on writing with structured prompts (92% vs 85% GPT-4o vs 82% Gemini). Claude's writing is more nuanced, follows tone and style constraints more precisely, and produces more natural-sounding prose. GPT-4o tends toward a recognizable "GPT style" that is harder to override. Gemini produces good writing but occasionally ignores length constraints.
Gemini leads on analysis with structured prompts (88% vs 84% Claude vs 81% GPT-4o). Gemini's 1M+ token context window makes it better at analyzing large documents, and its reasoning on complex multi-step analysis tasks is surprisingly strong. The DATA band is critical here — all three models perform dramatically better when given specific data to analyze rather than asked to generate data.
Claude leads on coding with structured prompts (86% vs 78% GPT-4o vs 75% Gemini). Detailed results in the ChatGPT vs Claude coding comparison. Gemini lags slightly on coding but has improved significantly since early 2025.
Gemini leads on research with structured prompts (87% vs 85% GPT-4o vs 83% Claude). Gemini's advantage here comes from its web-connected capabilities — when allowed to search, Gemini produces more current and better-sourced research output. Without web access, the three models are within 3 points of each other.
Claude leads on creative tasks with structured prompts (91% vs 84% GPT-4o vs 78% Gemini). Claude's creative output is more varied, more willing to take risks, and more responsive to creative constraints. Gemini tends toward safer, more conventional creative output.
GPT-4o: Strong PERSONA adherence. Good at maintaining character throughout long outputs. Weak on CONSTRAINTS — tends to "soft-follow" constraints (approximately following rather than strictly adhering). Best used when you need consistency across long outputs.
Claude: Strongest CONSTRAINTS adherence by far. When you say "maximum 500 words," Claude produces 480-500 words. When you say "no bullet points," there are no bullet points. Best used when precision and constraint following matter more than creativity in interpretation.
Gemini: Strongest DATA processing. Handles large context windows (1M+ tokens) better than either competitor. Weaker on FORMAT adherence — tends to add its own structural elements even when a specific format is requested. Best used for analysis of large documents or data-heavy tasks.
The 6-band structure at sinc-LLM works with all three models because it captures universal specification dimensions, not model-specific quirks. The same sinc JSON produces high-quality output from GPT-4o, Claude, and Gemini. You can switch models without rewriting your prompts.
This is the key advantage of the signal processing approach. Just as a properly sampled signal can be reconstructed by any decoder, a properly specified prompt can be interpreted by any LLM. The 6 bands are the Nyquist rate for specification signals.
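For readers unfamiliar with the sampling analogy, the interpolation formula quoted in the spec below (x(t) = Σ x(nT) · sinc((t − nT) / T)) really does rebuild a signal from its samples. A quick numeric illustration — this is just the math, not anything from the benchmark:

```python
import math

def sinc(x: float) -> float:
    """Normalized sinc: sin(pi*x) / (pi*x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

# Sample x(t) = sin(2*pi*t) at T = 0.1 s (10 Hz, comfortably above the 2 Hz Nyquist rate).
T = 0.1
samples = {n: math.sin(2 * math.pi * n * T) for n in range(-100, 101)}

def reconstruct(t: float) -> float:
    """Whittaker-Shannon interpolation: x(t) = sum_n x(nT) * sinc((t - nT) / T)."""
    return sum(x * sinc((t - n * T) / T) for n, x in samples.items())

# Even halfway between two samples, the reconstruction matches the original signal.
err = abs(reconstruct(0.05) - math.sin(2 * math.pi * 0.05))
print(err < 0.01)  # True
```

Drop below the Nyquist rate — or drop one of the six bands — and reconstruction fails: the decoder has to guess what the missing samples were.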
| If you need... | Best model | Critical sinc-LLM band |
|---|---|---|
| Precise writing | Claude | PERSONA + CONSTRAINTS |
| Data analysis | Gemini | DATA + FORMAT |
| Coding | Claude | CONSTRAINTS + DATA |
| Research | Gemini | CONTEXT + CONSTRAINTS |
| Creative work | Claude | PERSONA + TASK |
| Quick prototyping | GPT-4o | TASK + FORMAT |
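If you route tasks between models programmatically, the table above reduces to a lookup. The dictionary below just transcribes the table; the task-type keys and the `route` helper are illustrative names, not a sinc-LLM API:

```python
# Transcription of the decision table: task type -> (best model, critical bands).
ROUTING = {
    "precise_writing": ("Claude", ["PERSONA", "CONSTRAINTS"]),
    "data_analysis":   ("Gemini", ["DATA", "FORMAT"]),
    "coding":          ("Claude", ["CONSTRAINTS", "DATA"]),
    "research":        ("Gemini", ["CONTEXT", "CONSTRAINTS"]),
    "creative":        ("Claude", ["PERSONA", "TASK"]),
    "prototyping":     ("GPT-4o", ["TASK", "FORMAT"]),
}

def route(task_type: str) -> tuple:
    """Return (model, bands to invest most effort in) for a task type."""
    return ROUTING[task_type]

model, bands = route("coding")
print(model, bands)  # Claude ['CONSTRAINTS', 'DATA']
```

The "critical band" column tells you where extra specification effort pays off most; all six bands should still be present in every prompt.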
Here is an example sinc-LLM specification:

```json
{
  "formula": "x(t) = \u03a3 x(nT) \u00b7 sinc((t - nT) / T)",
  "T": "specification-axis",
  "fragments": [
    {"n": 0, "t": "PERSONA", "x": "Expert data scientist with 10 years ML experience"},
    {"n": 1, "t": "CONTEXT", "x": "Building a recommendation engine for an e-commerce platform"},
    {"n": 2, "t": "DATA", "x": "Dataset: 2M user interactions, 50K products, sparse matrix"},
    {"n": 3, "t": "CONSTRAINTS", "x": "Must use collaborative filtering. Latency under 100ms. No PII in logs. Python 3.11+. Must handle cold-start users with content-based fallback"},
    {"n": 4, "t": "FORMAT", "x": "Python module with type hints, docstrings, and pytest tests"},
    {"n": 5, "t": "TASK", "x": "Implement the recommendation engine with train/predict/evaluate methods"}
  ]
}
```
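The article does not show how sinc-LLM turns this JSON into text for the model, so here is one plausible renderer, a minimal sketch under my own assumptions (the `## BAND` header style and the `render_prompt` name are invented for illustration):

```python
# Assumed band order along the "specification axis"; matches the fragment
# indices n = 0..5 in the example spec above.
BAND_ORDER = ["PERSONA", "CONTEXT", "DATA", "CONSTRAINTS", "FORMAT", "TASK"]

def render_prompt(spec: dict) -> str:
    """Flatten a sinc-LLM fragment list into one labeled, model-agnostic prompt."""
    bands = {f["t"]: f["x"] for f in spec["fragments"]}
    return "\n\n".join(f"## {b}\n{bands[b]}" for b in BAND_ORDER if b in bands)

# Abbreviated two-band spec for demonstration.
spec = {
    "fragments": [
        {"n": 0, "t": "PERSONA", "x": "Expert data scientist"},
        {"n": 5, "t": "TASK", "x": "Implement the recommendation engine"},
    ]
}
print(render_prompt(spec))
```

Because the rendered prompt is plain labeled text, the same spec can be sent verbatim to GPT-4o, Claude, or Gemini — which is the model-portability claim made above.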
Choose the right model for the right task. But always structure the prompt with all 6 bands. The model matters less than the structure. Start at sincllm.com.