AI System Stability: Pole-Zero Analysis for Multi-Agent Workflows
An agent calls another agent. The second agent calls a third. On failure, the first retries until the budget is exhausted. The user opens a ticket: "the AI system is stuck." Engineers stare at logs. Nobody can name the failure mode.
The failure mode has a name. It has had a name since 1932. It is called a positive feedback loop with gain greater than 1, and control engineers solve it every day in mechanical, electrical, and process-control systems. The math is identical when the system is built from LLM agents instead of analog components.
Every Agent Is a Transfer Function
This is the core insight that turns AI-workflow chaos into engineerable systems: an agent takes input, produces output, and carries internal state. That is the definition of a transfer function — H(z) in discrete-time control parlance. The poles of H(z) are the failure modes: the dynamics where output grows without bound or the system's state drifts beyond observation. The zeros are suppression points: where input gets blocked or output gets silenced.
When you connect agents in series — one agent's output becomes the next agent's input — you are composing transfer functions. Two individually stable agents can still produce an unstable composite once feedback closes the loop around them. Three agents in a feedback loop where each amplifies the previous agent's output produce runaway. These are not surprises. They are predictable from the math.
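The runaway case can be sketched in a few lines. Treating each agent as a scalar gain is a deliberate simplification of H(z), and the gain values below are illustrative assumptions, not measurements from any real system:

```python
# Simplification: model each agent as a scalar gain rather than a full
# transfer function H(z). All gain values are illustrative.

def loop_response(loop_gain: float, x0: float = 1.0, steps: int = 20) -> float:
    """Iterate x -> loop_gain * x: one multiplication per trip around the loop."""
    x = x0
    for _ in range(steps):
        x *= loop_gain
    return x

# Two agents in a loop, each amplifying its input 1.3x (e.g. appending retry
# context). Each call is bounded, but the closed loop has gain 1.3 * 1.3 = 1.69.
runaway = loop_response(1.3 * 1.3)  # grows without bound: an unstable pole
settled = loop_response(0.5 * 1.2)  # decays toward zero: stable
```

The only quantity that matters is the product of gains around the loop: above 1, every trip around the loop amplifies the last one.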
What the Auditor Actually Checks
The free AI System Stability Auditor reads your workflow description and applies control-theory checks:
- Unstable poles — Where will the output run away? Recursive calls without termination conditions. Retries without exponential backoff. Cost loops with no budget cap. Each gets named, located ("right-half plane unstable" / "marginal" / "stable"), and characterized.
- Zeros (suppression points) — Where does input get dropped? Where does intermediate output get silenced? These are the spots where your system "loses" data and you cannot trace why.
- Gain margin (in dB) — How close to instability is your current configuration? A gain margin of 2 dB means a small parameter change tips you into runaway. A gain margin of 20 dB means you are robustly stable.
- Feedback loops — Each loop is classified: positive_runaway (will explode), negative_stable (self-corrects), or none. Positive runaway loops are flagged with the specific cascade pattern.
- PID-style fixes — The auditor returns three classes of remediation matched to the proportional-integral-derivative controller analogy.
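For a single loop, the dB figures above follow from the standard definition of gain margin. A minimal sketch — the scalar-loop-gain model is an assumption; a real analysis uses the full frequency response:

```python
import math

# Gain margin for a feedback loop modeled as a scalar loop gain g.
# For |g| < 1 the loop is stable; the margin is the extra gain, in dB,
# the loop can absorb before |g| reaches 1.

def gain_margin_db(loop_gain: float) -> float:
    if loop_gain <= 0:
        raise ValueError("loop gain must be positive")
    return 20 * math.log10(1 / loop_gain)

gain_margin_db(0.8)  # ~1.9 dB: a small parameter change tips into runaway
gain_margin_db(0.1)  # 20.0 dB: robustly stable
gain_margin_db(1.5)  # negative: already unstable
```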
The PID Mapping for AI Workflows
The PID controller has been the workhorse of industrial control since the 1940s. Its three terms map cleanly to AI agent corrections:
- P (Proportional) — immediate fix: the action you take RIGHT NOW when the error is detected. For an LLM agent, this is the immediate retry, the immediate output validation, the immediate fallback to a safer model.
- I (Integral) — accumulated learning: the action you take based on accumulated error history. For an LLM agent, this is the vault of past failures — patterns that recurred, prompts that consistently produced bad output, edge cases worth remembering across sessions.
- D (Derivative) — predictive guard: the action you take based on the rate of change. For an LLM agent, this is the pre-emptive halt: "the budget is being consumed at 3x normal rate; stop before we hit the cap."
A workflow that has all three forms of correction is robust against the failure modes that crash naive workflows. Most production AI systems implement P (retry), partially implement I (some logging), and completely skip D (no rate-of-change monitoring). That is why they crash in unforeseen ways: they lack the predictive guard.
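The three correction classes can be sketched as guards around an agent call. Everything here (`PIDGuards`, `normal_rate`, the thresholds) is a hypothetical illustration, not the auditor's API:

```python
from collections import Counter

class PIDGuards:
    def __init__(self, normal_rate: float):
        self.failure_patterns = Counter()  # I: accumulated error history
        self.normal_rate = normal_rate     # expected spend per second
        self.spent = 0.0

    def proportional(self, error: str) -> str:
        # P: act on the current error immediately.
        return "retry_once_then_fallback"

    def integral(self, error: str) -> bool:
        # I: remember recurring failures; flag prompts that keep failing.
        self.failure_patterns[error] += 1
        return self.failure_patterns[error] >= 3  # known-bad pattern

    def derivative(self, cost: float, elapsed_s: float) -> bool:
        # D: halt when the spend *rate* exceeds 3x normal, before the cap.
        self.spent += cost
        return (self.spent / elapsed_s) > 3 * self.normal_rate
```

With all three in place the failure is caught at three scales: P catches the single bad call, I stops the repeat offender, D stops the cascade before it reaches the budget cap.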
Why This Matters Right Now
Every AI startup is shipping multi-agent systems. Most of them are going to fail under load in ways that look mysterious to teams without control-theory background. The failures will be diagnosed as "the AI is buggy" and patched with bespoke try/except blocks. The engineering substrate that would catch the failures structurally — stability analysis, gain margins, observability of poles and zeros — is missing because the engineers building these systems were trained on web frameworks, not control systems.
From a wiki synthesis I built mapping control systems to AI orchestration: "Rotating Bowl IS a feedback control system. Attempt → evaluate predicates → adjust. PID maps to: P = immediate retry, I = accumulated pattern learning (vault), D = predictive growth (halt before cascade)."
Try It on Your Real Workflow
Paste a real description of your AI workflow into the auditor. Even a paragraph is enough — the tool will identify the structural risks. Where you see an unstable pole flagged with no fix in your current implementation, you have found a future incident. Where you see a missing D term, you have found the failure mode that will exhaust your budget at 3 AM.
The free version uses Nemotron 120B (with Gemma 31B fallback) to produce the analysis. The output is structured JSON — pole list, zero list, feedback-loop catalog, PID recommendations, stability verdict and score. It is a control engineer's report on your AI system, generated by an AI system that was prompted by a control engineer.
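A hypothetical shape for that report — the field names below are assumptions inferred from the categories described above, not the tool's actual schema:

```python
import json

# Illustrative report structure; field names are assumptions.
report = json.loads("""
{
  "poles": [{"location": "right-half plane unstable",
             "source": "retry loop without backoff"}],
  "zeros": [{"source": "summarizer silently drops tool-call errors"}],
  "feedback_loops": [{"type": "positive_runaway",
                      "path": "planner -> executor -> planner"}],
  "pid_recommendations": {"P": "validate output, fall back to a safer model",
                          "I": "persist recurring failure patterns",
                          "D": "halt at 3x normal budget burn rate"},
  "verdict": "marginal",
  "score": 0.45
}
""")
```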
From Diagnosis to Production
Diagnosing a stability problem takes minutes. Engineering it out of the system takes weeks. For production multi-agent systems where the cost of failure is real — runaway budgets, cascading retries, customer-visible outages — see the paid service. The pattern: every production agent has a defined transfer function, every connection has measured gain, every loop has a documented stability margin, and every failure mode has a named pole with a documented mitigation. That is what BSEE-grade orchestration looks like applied to AI.
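The pattern can be made concrete as a deployment gate: agents declare measured per-call gains, and every loop must clear a documented margin before it ships. The gains and the 3 dB policy below are illustrative assumptions, not the paid service's implementation:

```python
import math

# Hypothetical registry of measured per-call agent gains.
agent_gains = {"planner": 1.1, "executor": 0.7, "critic": 0.9}

def loop_margin_db(path: list[str]) -> float:
    # Loop gain is the product of agent gains around the path.
    loop_gain = math.prod(agent_gains[name] for name in path)
    return 20 * math.log10(1 / loop_gain)

def loop_is_deployable(path: list[str], required_db: float = 3.0) -> bool:
    # Documented policy: no loop ships below the required margin.
    return loop_margin_db(path) >= required_db

loop_is_deployable(["planner", "executor", "critic"])  # ~3.2 dB margin: ships
loop_is_deployable(["planner", "critic"])              # ~0.1 dB: blocked
```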
Audit Your Workflow's Stability
Paste an AI workflow or agent prompt. Returns pole-zero analysis, gain margins, identified feedback loops, and PID-style fixes. Cites specific text from your workflow.
Multi-Agent Orchestration Architecture — Service #35
Production multi-agent system with control-theory rigor — feedback loops, stability margins, circuit breakers, emergence detection, full BSEE-grade reliability engineering.