Most AI agencies ship demos. Few engineer for production. This is the 10-point checklist that tells you which is which, in 15 minutes, before you commit a six-figure budget. Built on BSEE-grade engineering fundamentals: redundancy, fault tolerance, monitoring, ownership.
For a while, I assumed the AI agency landscape was uniform. They all ship working demos. They all promise integrations. They all show charts at the kickoff. Picking one was a coin flip on price and personality.
Then I watched a $50K AI project break in week 3 because nobody had built monitoring. The agency had wired a Claude prompt to a Stripe webhook and called it shipped. When the Stripe API rate-limited under real volume, the project went silent. No alarm. No fallback. No log of what failed. Just a stack of refund tickets and a confused operations team.
That's when I realized the agencies-are-the-same story is wrong, and the fix isn't shopping harder. It's learning to spot the engineering layer the agency doesn't talk about. Production AI has a structural shape. Demos don't. The 10 points below make that shape visible. Run them on any vendor in 15 minutes. The ones who pass are the ones engineered for production.
Each criterion is a yes/no with a single specific question to ask. If your vendor can't answer it in plain English, that's the fail signal. You don't have to evaluate the answer's merit; you only have to notice whether they have an answer at all.
Production AI fails silently more often than it fails loudly. Without monitoring, the first signal is angry customers. Every external API call, every model invocation, every webhook trigger needs an observability hook.
Ask: "Show me the dashboard for the last 24 hours of production traffic. If you can't, that's a fail."
Engineering teams that work in production write down what "acceptable failure" looks like. Without a stated availability target, you can't tell whether an outage is normal or not, and you can't decide when to roll back.
Ask: "What is the system's stated availability target, and what happens operationally when it's breached?"
If you can't see the code that runs your workload, you don't own it. Period. No "we'll send a copy on request". No "it's hosted on our platform". Real ownership means git access today, not a promise.
Ask: "Walk me through the git history of the production deploy. Who committed what, when?"
An AI system that worked on a demo dataset can quietly degrade as real input distributions shift. Without drift detection, you discover the degradation through customer complaints. Through quality-of-service tickets. Through churn.
Ask: "How does the system detect that the distribution of inputs has shifted from training time?"
Every LLM API will return a 5xx eventually. Every model you depend on will eventually be deprecated. Production code knows what to do when the primary call fails. Demo code throws an exception and stops.
Ask: "Trace the code path that runs when OpenAI or Anthropic returns a 5xx. Show me the line."
A misconfigured retry loop or a bug in a prompt template can multiply your monthly bill 10x in an afternoon. You need a hard ceiling, an alert on unusual spend, and a kill switch. Not on next month's invoice. Right now.
Ask: "What dollar threshold of unexpected spend triggers an alert, and to whom?"
Anthropic ships a new Claude every few months. OpenAI ships a new GPT every few months. Each is a behavioral change. Without a documented evaluation procedure, model updates either ship blind or never ship at all.
Ask: "When the next major model ships, what's the documented procedure for evaluating, deploying, and rolling back?"
If your AI system goes down at 2 AM on a Saturday, somebody has to wake up. If nobody is named, nobody is responsible. The agency that says "we monitor business hours" is telling you exactly what happens at 2 AM: nothing.
Ask: "Who gets paged when the system goes down at 2 AM, and what's the documented response time?"
Your customer data flows through the AI vendor's stack into model APIs and back. Each hop is a privacy decision. Without a clear, documented map of those hops, your compliance posture is a guess.
Ask: "What customer data leaves your infrastructure, where does it go, and how is it handled at each hop?"
Every consulting engagement ends. The good ones leave you operating, with full source, deployment scripts, runbooks, credentials. The bad ones leave you locked out of a dashboard you can't access without the vendor's login.
Ask: "If I terminate our contract tomorrow, exactly what do I receive, and what do I keep operating?"
Not "I have a bad feeling about this". Specific gaps, in writing, that you brought to a vendor meeting and they couldn't fill. That's a different conversation than "is this working".
The audit shifts the conversation from "tell me about your AI" to "show me your monitoring dashboard". Vendors who can answer the second question want different clients than vendors who only handle the first.
If something breaks 6 months from now, the audit notes are the difference between "we never asked" and "we asked, and they said it was handled". That difference matters legally, contractually, and politically inside your organization.
Each failed criterion is a specific gap with a specific fix. That's not a "we need to rebuild everything" panic. It's a prioritized list, sized by effort, that your engineering team can execute against. Or that you can hand to me, if you want it done.
One email. The PDF, the editable scorecard, and the list of follow-up questions for vendor meetings. No drip sequence, no nurture funnel, no tactics.
Get the audit