Most compliance auditors check that an AI exists. Few audit how it fails. This is the 12-control audit your security team runs before the first agent touches production code, the audit your board chair will recognize from real engineering practice, and the audit your auditor will cite next year. Built from production engineering, not vendor pitches.
I learned safety engineering on electrical systems in Luanda, where the difference between a fault that trips a breaker and a fault that destroys a transformer is whether the protection was specified before commissioning, not after the smoke. The grid does not give you a chance to retrofit safety after a fault. Neither does an AI agent with tool access. The first incident is also the last opportunity to have prevented it.
I have audited AI deployments where the engineering team had thought hard about output quality, and not at all about kill switches. I have audited deployments where the agent had read access to the production database "for context" and the team had no logging on what it read. I have audited deployments where prompt injection defenses were the literal phrase "we tell users not to do that" in the system prompt. None of these were stupid teams. They were teams whose security posture had not yet been forced to consider AI as a system.
The pattern is not that AI is unsafe. The pattern is that most security audits inherit from the pre-AI checklist, which assumes deterministic systems with bounded behavior, and AI agents are neither. The 12 controls below are the engineering practices that close the gap. Each one names the failure mode, the readiness state, the gap state, and the mitigation. None of them are theoretical. Every one is scarred into me by an incident, mine or one that hit someone I worked with.
Each control has three states: ready (control is in place and tested), gap (the control is partial or untested), mitigation (what to install if you are in the gap state). The PDF includes the install playbook for each.
Can any operator on any shift stop a running agent in under 60 seconds, without a deploy? Most teams have a "we can stop it" answer that requires a senior engineer, a laptop, and shell access. That is not a kill switch. That is a wish.
Ready: documented kill-switch, drilled quarterly, available to on-call
Gap: kill-switch exists in theory, not drilled
Mitigation: feature flag + runbook + quarterly drill
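A minimal sketch of the mitigation, assuming a file-based flag for illustration; the flag path and the `AGENT_ENABLED` variable are hypothetical names, and a real deployment would use your feature-flag service instead:

```python
import os

# Hypothetical flag locations; substitute your feature-flag service.
KILL_SWITCH_FILE = "/tmp/agent_kill_switch.demo.flag"

def kill_switch_engaged() -> bool:
    """Any on-call operator can stop the agent: touch the flag file or flip the env var."""
    return os.path.exists(KILL_SWITCH_FILE) or os.environ.get("AGENT_ENABLED") == "0"

def run_agent_loop(steps):
    """Check the switch before every step, so a stop takes effect within one step, not one deploy."""
    completed = []
    for step in steps:
        if kill_switch_engaged():
            break
        completed.append(step())
    return completed
```

The point is the placement: the check runs before every step, so flipping the flag stops the agent mid-run without a senior engineer, a laptop, or shell access.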
For every tool the agent can call, is there a written allow-list of what targets, parameters, and contexts are permitted? "Allowed to delete" without specifying which tables, which environments, which conditions, is not a boundary. It is an invitation.
Ready: per-tool allow-list, version-controlled, code-enforced
Gap: tool list exists, allow-list does not
Mitigation: tool wrapper with hard-coded allow-list per tool
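The tool-wrapper mitigation can be sketched like this; the tool names, tables, and environments are invented for illustration:

```python
# Version-controlled allow-list: which targets each tool may touch, per environment.
ALLOW_LIST = {
    "delete_rows": {"environments": {"staging"}, "tables": {"scratch_results"}},
}

class ToolBoundaryError(Exception):
    """Raised when the agent asks for a target outside its documented boundary."""

def guarded_delete(environment: str, table: str) -> str:
    """Wrapper the agent calls instead of the raw tool; the boundary is code, not prose."""
    rules = ALLOW_LIST["delete_rows"]
    if environment not in rules["environments"] or table not in rules["tables"]:
        raise ToolBoundaryError(f"delete_rows not permitted: {environment}/{table}")
    return f"deleted rows in {environment}.{table}"  # the real call would go here
```

Because the list lives in code and version control, a boundary change is a reviewed diff, not a prompt edit.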
Does every state-changing tool call get logged with input, output, agent identity, and timestamp, retained per your policy? An incident response that reads "we are not sure what the agent did" has answered the question for you.
Ready: structured logs, queryable, retention policy met
Gap: partial logging, missing the irreversible operations
Mitigation: PostToolUse hook → JSONL ledger → log retention
Does the agent run with the minimum privileges it needs, in an environment that cannot reach production data unless explicitly granted? "Read access for context" is the sentence that begins most data-exposure incident reports.
Ready: least-privilege roles, prod isolated, explicit grants only
Gap: agent runs in shared service account or dev environment with prod credentials
Mitigation: dedicated service account + network isolation + explicit grants
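The "explicit grants only" posture reduces to a deny-by-default table; the agent and resource names here are hypothetical:

```python
# Deny-by-default grant table: the agent touches nothing it was not explicitly granted.
GRANTS = {
    ("support-agent", "tickets_db:read"),
    ("support-agent", "kb_search:read"),
}

def authorized(agent: str, resource: str) -> bool:
    """No entry means no access; there is no wildcard and no inherited role."""
    return (agent, resource) in GRANTS
```

The real enforcement belongs in your IAM layer; the sketch shows the shape of the policy, which is a finite list you can put in front of an auditor.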
Does the agent have direct access to long-lived credentials it could exfiltrate, or only to short-lived signed tokens scoped to specific operations? An agent that holds an AWS key holds the keys to an entire account.
Ready: ephemeral tokens, short TTL, scoped to operation
Gap: long-lived keys in environment variables
Mitigation: token broker + per-call signing
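A token broker can be sketched with HMAC-signed, operation-scoped claims; the secret and operation names are placeholders, and a real broker would live in a separate service the agent cannot read:

```python
import base64
import hashlib
import hmac
import json
import time

BROKER_SECRET = b"rotate-me"  # held by the broker only, never by the agent

def mint_token(operation: str, ttl_s: int = 60) -> str:
    """Short-lived token scoped to one operation; the agent never sees a long-lived key."""
    claims = {"op": operation, "exp": time.time() + ttl_s}
    body = base64.urlsafe_b64encode(json.dumps(claims).encode()).decode()
    sig = hmac.new(BROKER_SECRET, body.encode(), hashlib.sha256).hexdigest()
    return f"{body}.{sig}"

def verify_token(token: str, operation: str) -> bool:
    """Reject forged, rescoped, or expired tokens."""
    body, sig = token.rsplit(".", 1)
    expected = hmac.new(BROKER_SECRET, body.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        return False
    claims = json.loads(base64.urlsafe_b64decode(body))
    return claims["op"] == operation and claims["exp"] > time.time()
```

A stolen token now buys one operation for sixty seconds, not an account.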
For every input source the agent reads (user messages, web pages, emails, files), is there a defense that survives instruction-override attacks? "We tell the user not to do that" is not a defense. It is documentation of the attack surface.
Ready: input sanitization, source labeling, output validation, untrusted-input flag
Gap: defenses live in the system prompt, not in code
Mitigation: structured input parsing + output schema validation + untrusted-source flag
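A sketch of source labeling, assuming the enforcement lives in code around the model rather than in the prompt; the source names and fence format are illustrative:

```python
TRUSTED_SOURCES = {"system", "operator"}

def label_input(source: str, text: str) -> dict:
    """Carry provenance with every message so downstream checks can treat
    web/email text as data, never as instructions."""
    return {"source": source, "trusted": source in TRUSTED_SOURCES, "text": text}

def render_for_model(messages) -> str:
    """Fence and label untrusted content; the tool layer, not the prompt,
    decides what a message from that source may trigger."""
    parts = []
    for m in messages:
        if m["trusted"]:
            parts.append(m["text"])
        else:
            parts.append(f"<untrusted source={m['source']}>\n{m['text']}\n</untrusted>")
    return "\n".join(parts)
```

The label is not the defense by itself; it is what lets the tool wrapper refuse destructive calls whose instruction chain runs through an untrusted source.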
Does the agent verify intent before executing irreversible operations (delete, send, transfer)? The 30-second annoyance of a discipline check is worth more than the 8-hour incident report when the agent did the wrong thing in the wrong place.
Ready: structured pre-call check on destructive ops, hard-blocked on uncertainty
Gap: no pre-call gate, agent trusts its own intent
Mitigation: PreToolUse hook with intent verification on destructive ops
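A minimal pre-call gate that fails closed; the operation names and the shape of the intent record are assumptions, not a specific framework's PreToolUse API:

```python
DESTRUCTIVE_OPS = {"delete", "send", "transfer"}

class Blocked(Exception):
    """Raised when a destructive call arrives without verified intent."""

def pre_tool_gate(tool: str, declared_intent: str = "", confirmed: bool = False) -> str:
    """Non-destructive ops pass through; destructive ops require a declared
    intent and a confirmation, and uncertainty is a hard block."""
    if tool not in DESTRUCTIVE_OPS:
        return "allow"
    if not declared_intent or not confirmed:
        raise Blocked(f"{tool} blocked: intent or confirmation missing")
    return "allow"
```

The 30-second cost lives in the `confirmed` step; the 8-hour cost lives in its absence.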
For every workflow the agent owns, is there an offline eval suite that runs before deploy and a production traffic monitor that flags drift? An agent without evals is an agent whose quality regressions ship to customers.
Ready: deploy gate on eval pass, prod traffic sampled and scored
Gap: evals exist for happy path, drift undetected
Mitigation: CI eval gate + statistical drift alarm on prod sample
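Both halves of the mitigation fit in a few lines; the thresholds are illustrative, not recommendations:

```python
def eval_gate(results, threshold: float = 0.95) -> bool:
    """CI deploy gate: block when the offline eval pass rate drops below the bar."""
    return sum(results) / len(results) >= threshold

def drift_alarm(baseline_mean: float, window, tolerance: float = 0.05) -> bool:
    """Production monitor: alarm when the mean score of sampled, scored
    traffic drifts beyond tolerance from the offline baseline."""
    mean = sum(window) / len(window)
    return abs(mean - baseline_mean) > tolerance
```

The gate catches regressions before deploy; the alarm catches the ones that only appear on real traffic.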
Can a bad agent action be reversed within the SLA your incident-response policy requires? A row deleted, an email sent, a transfer initiated. Some operations are reversible. Some are not. The audit names which is which.
Ready: reversible ops have rollback, irreversible ops have human-gate
Gap: classification of reversible/irreversible has not been done
Mitigation: per-tool reversibility classification + rollback or human-gate
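The classification plus human-gate can be sketched as a dispatch table; the tool names and the approval callback are hypothetical:

```python
# Hypothetical per-tool classification; the audit does this once per tool.
TOOL_REVERSIBILITY = {
    "update_row": "reversible",      # rollback: restore prior value from the audit log
    "send_email": "irreversible",
    "wire_transfer": "irreversible",
}

def dispatch(tool, execute, request_human_approval):
    """Reversible ops run; irreversible ops run only past a human gate.
    Unknown tools fail to the safe side."""
    kind = TOOL_REVERSIBILITY.get(tool, "irreversible")
    if kind == "irreversible" and not request_human_approval(tool):
        return "held for human review"
    return execute()
```

The classification is the audit artifact; the dispatch is the enforcement.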
Are training data, eval data, and live traffic strictly separated, with audit on each crossing? The most expensive incidents start with "we used a snapshot of prod for testing".
Ready: three environments, signed crossing, PII scrub before downgrade
Gap: live data used in eval, no audit on copy
Mitigation: environment policy + automated PII scrubber + crossing log
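A sketch of the scrub-and-log step at the environment boundary; the single email regex stands in for a real PII scrubber, which needs far more than one pattern:

```python
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def downgrade(record: str, crossing_log: list, src: str = "prod", dst: str = "eval") -> str:
    """Scrub obvious PII before data crosses environments,
    and log the crossing itself so every copy is auditable."""
    scrubbed = EMAIL.sub("<email>", record)
    crossing_log.append({"from": src, "to": dst, "scrubbed": scrubbed != record})
    return scrubbed
```

The crossing log is the part teams skip; it is also the part that answers "how did prod data get into the eval set" without forensics.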
If the model vendor or one of their dependencies is breached, what data of yours is exposed? The vendor cannot indemnify you out of an incident their contract says they will not cover. The audit asks the question while you can still negotiate.
Ready: data classification + vendor DPA + breach-notification SLA documented
Gap: data sent to vendor includes PII or trade secrets, no DPA
Mitigation: data classification + minimization at vendor boundary
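Minimization at the vendor boundary reduces to a classification filter; the field names and classes here are invented:

```python
# Hypothetical classification of every field the agent might forward.
FIELD_CLASSIFICATION = {
    "ticket_text": "internal",
    "customer_ssn": "restricted",
    "pricing_model": "trade_secret",
}
SENDABLE = {"public", "internal"}

def minimize_for_vendor(payload: dict) -> dict:
    """Strip anything not classified as safe to leave the boundary;
    unclassified fields are treated as restricted and dropped."""
    return {k: v for k, v in payload.items()
            if FIELD_CLASSIFICATION.get(k, "restricted") in SENDABLE}
```

If the vendor is breached, the exposure is exactly the `SENDABLE` set, which is a list you can hand to legal before signing, not after.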
When the agent fails, do you know in real time, or do you find out from a customer ticket? The mean time between agent failure and customer-visible incident is shorter than most monitoring dashboards' refresh interval.
Ready: structured error events, alerting threshold, on-call assignment
Gap: errors logged but not alerted; on-call learns from customers
Mitigation: error event stream + threshold alarm + on-call rotation
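A sliding-window threshold alarm is the smallest version of the mitigation; the threshold and window values are placeholders to tune against your traffic:

```python
import time
from collections import deque

class ErrorAlarm:
    """Fire when the error count in a sliding window crosses the threshold,
    instead of waiting for a customer ticket."""

    def __init__(self, threshold: int = 5, window_s: float = 60.0):
        self.threshold = threshold
        self.window_s = window_s
        self.events = deque()

    def record(self, now: float = None) -> bool:
        """Record one error event; return True when on-call should be paged."""
        now = now if now is not None else time.time()
        self.events.append(now)
        while self.events and self.events[0] < now - self.window_s:
            self.events.popleft()
        return len(self.events) >= self.threshold
```

Wire `record()` into the agent's error path and route `True` to your paging system; the dashboard stays for diagnosis, not detection.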
Security and engineering have a shared scorecard naming which of the 12 controls are ready, which are gaps, and which mitigations are required. The CISO has the document the next board update will reference.
Audit trail (control 3), pre-tool-call gate (control 7), and failure-mode visibility (control 12) are installed. These are the engineering-hours controls; no architecture changes required. The team has live monitoring on agent failures within two weeks.
Sandbox separation (control 4), tool boundary documentation (control 2), prompt injection defenses (control 6) are installed. These need engineering planning and security review. The deployment posture changes from "agent runs in shared infra with broad credentials" to "agent runs with documented boundaries and engineered limits".
The completed scorecard becomes the AI-specific supplement to your next SOC 2 cycle. It also becomes the document that survives a CISO transition or a board-level inquiry, because it names the controls in the language your auditor and your insurer will both understand.
One email. The PDF, the editable scorecard, and the install playbook for each control. No drip sequence, no nurture funnel, no tactics.
Get the audit