Most CFOs see the AI line item double in 12 months and cannot name what doubled. This is the 9-question audit your finance team can run on Friday morning, your engineering team can fix in six weeks, and your board chair will not push back on. Each question has a healthy state, a watch state, a bleeding state, and the lever that recovers the dollars.
I learned cost discipline running electrical systems in Luanda for seven years, where every kilowatt-hour you waste shows up on a real invoice that a real CFO has to defend at a real board meeting. The grid does not forgive sloppy load profiles. Neither does production AI infrastructure. The difference is that AI vendors do not send you a load profile. They send you a total. And the total grows.
I have audited AI bills for production teams shipping live workloads. Every one of them had recoverable spend. The cheapest finding is usually idle GPU time on reserved capacity. The highest-leverage finding is usually model-tier mismatch (calls running on a $15-per-million model that would have been just as good on a $0.50-per-million model). The most embarrassing finding is usually shadow AI tools nobody in finance has ever seen.
The pattern is not that AI is expensive. The pattern is that most companies have never run an audit specifically on the AI line item, because AI feels new and inscrutable, and so the cost is treated as a tax instead of a measurable system. This audit treats it as a measurable system. The 9 questions below cover where the money goes, what healthy looks like, what bleeding looks like, and what to do about each one. Run it in one Friday morning. Hand it to your engineering team. Watch the next quarterly close.
The three states for each question are listed below. The lever that recovers the spend on each is in the full PDF, along with a one-page scorecard you can take into your next finance review.
Take total monthly AI spend and divide it by the number of successful AI-touched outcomes (not API calls). If you cannot compute the metric, that is the diagnosis. AI delivers value per outcome, not per call. Most teams track the numerator (spend arrives on an invoice) and not the denominator (outcomes have to be instrumented), so the ratio is invisible. A sketch of the computation follows the states below.
Healthy: ratio is computable, trending flat or down vs revenue per outcome
Watch: cost per outcome rising 10%+ MoM with stagnant outcome quality
Bleeding: outcomes are not instrumented, ratio is uncomputable
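A minimal sketch of the computation, assuming you can pull the monthly total from your invoices and count successful outcomes from your own logs. The numbers and function name are hypothetical, not a standard:

```python
def cost_per_outcome(monthly_ai_spend_usd: float, successful_outcomes: int) -> float:
    """Dollars of AI spend per successful AI-touched outcome (not per API call)."""
    if successful_outcomes == 0:
        # Outcomes not instrumented means the ratio is uncomputable:
        # that is the "bleeding" state, not a divide-by-zero bug.
        raise ValueError("outcomes not instrumented")
    return monthly_ai_spend_usd / successful_outcomes

# Hypothetical example: $42,000/month across 14,000 resolved tickets.
print(cost_per_outcome(42_000, 14_000))  # 3.0 dollars per outcome
```

Track the ratio month over month against revenue per outcome; the trend is the signal, not the absolute number.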
If you run dedicated AI infrastructure (reserved GPU instances, fine-tuned model deployments, vector databases), how many hours per month is the infrastructure billed and idle? "Reserved capacity" is a polite phrase for "paid whether used or not". The serverless alternative bills only on use.
Healthy: below 5% idle, or fully serverless
Watch: 5–20% idle, especially overnight or weekend
Bleeding: above 20% idle, reserved instances on unused capacity
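A hedged sketch of the idle-capacity check, assuming you can read billed hours from the invoice and busy hours from your GPU utilization metrics; the thresholds mirror the states above:

```python
def idle_pct(billed_hours: float, busy_hours: float) -> float:
    """Percent of billed infrastructure hours that did no work."""
    return 100.0 * (billed_hours - busy_hours) / billed_hours

def state(pct: float) -> str:
    # Thresholds from the healthy / watch / bleeding states above.
    if pct < 5:
        return "healthy"
    return "watch" if pct <= 20 else "bleeding"

# Hypothetical example: a reserved instance billed for a 720-hour month,
# busy for 510 of those hours.
pct = idle_pct(720, 510)
print(f"{pct:.0f}% idle -> {state(pct)}")  # 29% idle -> bleeding
```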
What percent of your premium-tier model calls (Opus, GPT-5, Gemini-Ultra) are doing work the mid-tier (Sonnet, GPT-4o, Gemini-Pro) or low-tier (Haiku, Flash) would handle just as well? The cost ratio between tiers is 5x to 30x. The work-quality ratio on bounded tasks is rarely more than 1.1x. The math nearly always favors downshifting.
Healthy: below 20% premium-tier, tier-routing rule documented
Watch: 20–50% premium-tier, no routing rule in place
Bleeding: above 50% premium-tier, "we just use the best model for everything"
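What a documented tier-routing rule can look like, as a sketch. The task classes and model identifiers are placeholders for whatever your stack actually uses:

```python
# Route by task class, defaulting down rather than up. Bounded tasks rarely
# justify the 5x-30x premium-tier cost ratio cited above.
TIER_FOR_TASK = {
    "classification": "low",
    "extraction": "low",
    "summarization": "mid",
    "drafting": "mid",
    "open_ended_reasoning": "premium",
}

MODEL_FOR_TIER = {
    "low": "low-tier-model",        # placeholder identifiers: substitute your
    "mid": "mid-tier-model",        # vendor's actual low / mid / premium models
    "premium": "premium-tier-model",
}

def route(task_class: str) -> str:
    """Pick a model for a call; unknown tasks land on mid, never premium."""
    return MODEL_FOR_TIER[TIER_FOR_TASK.get(task_class, "mid")]

print(route("classification"))  # low-tier-model, not the premium default
```

The rule matters more than the exact table: once it is written down, the percent of premium-tier calls becomes a number someone can be asked about.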
The major LLM vendors offer prompt caching, typically with a TTL window of around five minutes. The cache key is the prompt prefix: the system prompt and the early turns of the conversation. Make 10 calls in 4 minutes with a stable prefix and you can hit 90% cache. Space the same 10 calls more than five minutes apart and you hit 0%. Most teams do not measure the difference. Their bill does.
Healthy: above 80% cache hit on repeat workflows
Watch: 30–80% cache hit, opportunity unaddressed
Bleeding: below 30% cache hit, system prompts rebuilt every call
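A sketch for measuring the hit rate, assuming your vendor's usage metadata distinguishes cached from uncached input tokens per call. The field names below are placeholders, since each vendor reports this differently:

```python
def cache_hit_rate(calls: list[dict]) -> float:
    """Percent of input tokens served from the prompt cache across a workflow."""
    cached = sum(c["cached_input_tokens"] for c in calls)
    total = cached + sum(c["uncached_input_tokens"] for c in calls)
    return 100.0 * cached / total if total else 0.0

# Hypothetical workflow: ten calls inside the TTL window with a stable
# system prompt, 9,000 of every 10,000 input tokens served from cache.
calls = [{"cached_input_tokens": 9_000, "uncached_input_tokens": 1_000}] * 10
print(f"{cache_hit_rate(calls):.0f}% cache hit")  # 90% cache hit
```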
What percent of your total AI spend goes to your largest single vendor? A high concentration is not a sin. A high concentration with no priced-out alternative is. The vendor sets your pricing if you have not credibly tested a substitute in the last 12 months.
Healthy: below 60% to top vendor, alternatives tested in last year
Watch: 60–80% to top vendor, alternatives known but untested
Bleeding: above 80% to single vendor, no documented swap path
List every AI vendor contract. Mark the dollar value and renewal date of each. How much spend renews in the next 90 days without an explicit re-up decision? Auto-renewals are how vendor pricing escalates without anyone defending the increase. The pre-renewal window is the only leverage you have.
Healthy: zero auto-renewals, or all reviewed and intentionally re-upped
Watch: below 25% of AI spend on un-reviewed auto-renewals
Bleeding: above 50% on auto-renewals not priced against alternatives
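The renewal scan is a spreadsheet exercise, but a sketch makes the computation concrete. The contract register below is hypothetical and hand-maintained, not pulled from any procurement API:

```python
from datetime import date, timedelta

today = date.today()

# Hypothetical register: vendor, annual dollars, renewal date, auto-renew flag.
contracts = [
    {"vendor": "vendor-a", "annual_usd": 120_000, "renews": today + timedelta(days=40),  "auto_renew": True},
    {"vendor": "vendor-b", "annual_usd": 36_000,  "renews": today + timedelta(days=200), "auto_renew": True},
    {"vendor": "vendor-c", "annual_usd": 60_000,  "renews": today + timedelta(days=20),  "auto_renew": False},
]

# Spend that renews inside 90 days with no explicit re-up decision.
horizon = today + timedelta(days=90)
at_risk = sum(c["annual_usd"] for c in contracts if c["auto_renew"] and c["renews"] <= horizon)
print(f"${at_risk:,} renews in the next 90 days without a decision")  # $120,000
```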
How many AI tools are billed to corporate cards, department budgets, or individual subscriptions that do not appear in your central vendor consolidation? Shadow AI is the gap between what finance sees and what the company spends. It is usually 10 to 30 percent of the real number.
Healthy: zero shadow tools, SSO-required AI policy, central procurement
Watch: below 10% of spend on shadow tools
Bleeding: above 20% shadow, CISO has no visibility, data routes to unknown processors
How many hours per month does your team spend redoing AI-generated work that turned out to be wrong? Rework cost is the dark cost of AI. It does not show up on the AI invoice. It shows up as engineer time, marketing time, customer-success time, legal time. It is real money the spreadsheet does not see.
Healthy: below 5% of AI-touched work needs rework
Watch: 5–15% rework rate
Bleeding: above 15%, AI outputs require full re-verification by a human
How many engineer-hours per month go to fixing AI tool errors that no AI vendor will fix for you? When AI fails in production, somebody pays the debug bill. Usually it is your engineering team, in hours that nobody attributes back to the AI line item. If an engineer is effectively a full-time AI-fixer, that is real headcount cost on the wrong P&L row.
Healthy: below 10 engineer-hours/month on AI debugging
Watch: 10–40 hours/month, undetected drift
Bleeding: above 40 hours/month, an engineer is effectively an AI fixer
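Questions 8 and 9 share one back-of-envelope formula: hidden hours times a loaded hourly rate. A sketch, with the rate as a placeholder you should replace with your own number:

```python
LOADED_HOURLY_RATE_USD = 120  # placeholder: salary + benefits + overhead per hour

def dark_cost(rework_hours: float, debug_hours: float) -> float:
    """Monthly labor cost of redoing and debugging AI output, in dollars."""
    return (rework_hours + debug_hours) * LOADED_HOURLY_RATE_USD

# Hypothetical month: 30 hours of rework plus 40 hours of debugging.
print(f"${dark_cost(30, 40):,.0f}/month off-invoice")  # $8,400/month off-invoice
```

None of this appears on the vendor invoice, which is exactly why it belongs in the audit.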
The first deliverable is a shared spreadsheet: finance and engineering looking at the same dollar exposure per question. The CFO knows, for the first time, where the AI bill is going. Arguments about "what the AI bill bought" stop, because the document now exists.
The quick wins come next: idle infrastructure (question 2) and tier mismatch (question 3). No architecture changes required, and typically 10 to 20 percent of spend recovered. That confirms the audit was directionally right, and budget for the harder questions follows.
Then the structural fixes: caching discipline (question 4), vendor concentration (question 5), and auto-renewal renegotiation (question 6), worth another 15 to 30 percent. These are the ones that need engineering effort and finance buy-in. They are also the ones that compound year over year.
Quarterly P&L shows AI cost flat or declining despite usage growth. The CFO has a defensible narrative for the board chair. The engineering team has a recurring metric they can be measured against. Next quarter the audit gets re-run, and the bleeding questions become watch questions, and the watch questions become healthy.
One email. The PDF, the editable scorecard, and the lever-by-question recovery playbook. No drip sequence, no nurture funnel, no tactics.
Get the cost check