AI Vendor Exit Checklist: Migrating Off a Provider Without Losing Production Continuity
Table of Contents
- Why Vendor Exits Fail (and It Is Not the New Provider's Fault)
- Before You Migrate: What You Need From the Exiting Vendor
- During Migration: Running Parallel Without Breaking Production
- After Cutover: Verifying the New Provider Before Retiring the Old
- The Checklist in One View
- Evaluate Your Replacement Vendor With the Same Rigor
- Conclusion
Why Vendor Exits Fail (and It Is Not the New Provider's Fault)
Every AI vendor migration that breaks production shares the same three structural failures: no code ownership, no data export path, and no tested fallback. These are not execution failures at migration time. They are contract failures from the signing date.
When an engineering team begins a vendor exit, it often discovers that the pipeline code lives in the vendor's private repository, the fine-tuned model weights are stored in a proprietary format the new provider cannot read, and the incident runbook references escalation paths that only the vendor's staff can execute. None of this was documented in the original contract. The migration cannot proceed until each gap is resolved, and the time needed to resolve them was never budgeted.
The underlying cause is the absence of Audit Criterion 10 from sincllm's 10-Point AI Vendor Audit: documented handover with no lock-in. Most AI contracts address runtime reliability and SLOs. They omit exit rights entirely. A vendor can have 99.9% uptime and still own your code, your data, and your runbooks in ways that make exit expensive. Operational reliability and exit rights are independent contract properties. The SLA governs runtime; the handover clause governs exit.
The NIST AI RMF 1.0 GOVERN function addresses third-party AI vendor management and exit planning as a required element of responsible AI deployment (see NIST AI RMF 1.0). ISO/IEC 42001:2023 includes supplier relationship management requirements covering transition and handover (see ISO/IEC 42001:2023). Both treat exit planning as a front-of-contract obligation, not a post-migration cleanup task.
Running this checklist before your next vendor relationship prevents the contract gaps you just spent weeks resolving.
Download the 10-Point AI Vendor Audit
Before You Migrate: What You Need From the Exiting Vendor
1. Demand a Complete Handover Package
The handover package is not a data export. It is a transfer of operational ownership. A complete package includes: a git repository URL with your team as owner (not a zip archive), all model configuration files in a portable format, integration scripts and API client code with documented dependencies, API keys and credential transfer procedures with defined revocation timelines, and written documentation of every customization the vendor made that is not in the public API.
The practical test: can your team stand up the integration from the handover materials without contacting the vendor? If the answer is no, the handover is incomplete.
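One way to make the completeness check mechanical is to track the package as a manifest. The sketch below is illustrative only: the artifact names are hypothetical labels for the items listed above, and the `stood_up_without_vendor` flag is assumed to be set by whoever ran the practical test.

```python
# Hypothetical labels, one per item in the handover package described above.
REQUIRED_ARTIFACTS = (
    "git_repository_owned_by_team",
    "model_config_files_portable",
    "integration_scripts_with_dependencies",
    "credential_transfer_and_revocation_plan",
    "customization_documentation",
)


def missing_artifacts(received: set[str]) -> list[str]:
    """Return the required artifacts that were not received, in checklist order."""
    return [name for name in REQUIRED_ARTIFACTS if name not in received]


def handover_is_complete(received: set[str], stood_up_without_vendor: bool) -> bool:
    """Complete only if every artifact arrived AND the practical test passed."""
    return not missing_artifacts(received) and stood_up_without_vendor


if __name__ == "__main__":
    received = {"git_repository_owned_by_team", "model_config_files_portable"}
    print("Still missing:", missing_artifacts(received))
    print("Complete:", handover_is_complete(received, stood_up_without_vendor=False))
```

The second argument matters as much as the first: a package can contain every file and still fail if the integration cannot be stood up without a call to the vendor.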
This maps to Audit Criterion 3 (source-code ownership and audit trail). The engineering proof that code ownership makes a vendor switch feasible is documented in sincllm's own production work: see engineering proof that replacing an API provider is feasible when the code is yours. Without code ownership, the migration timeline is set by the vendor's cooperation level, not your team's execution speed.
For teams concerned about whether the next vendor relationship will repeat this gap, the source-code ownership contract clause analysis covers the specific language to require before signing.
2. Confirm Data Portability and Export Format
Training data, fine-tune datasets, and logged inference history are often stored in the vendor's cloud infrastructure with no documented export path. The contract may have a data-residency clause that blocks export to specific regions. Discovering this during migration, rather than before, adds weeks to the timeline.
Request the following in writing before migration begins: the export format for all training data and fine-tune datasets, the export format for logged inputs and outputs (with date range coverage), any data-residency restrictions that apply to the export, and the vendor's data retention and deletion schedule after contract termination.
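Once an export of logged inputs and outputs arrives, the date range coverage can be checked rather than taken on trust. This is a minimal sketch that assumes each exported record carries a calendar date and that the workload normally produces traffic every day; a day with genuinely zero requests would also show up as a gap and needs a manual look.

```python
from datetime import date, timedelta


def missing_days(record_dates: list[date], start: date, end: date) -> list[date]:
    """List every day in [start, end] that has no exported record."""
    present = set(record_dates)
    gaps = []
    day = start
    while day <= end:
        if day not in present:
            gaps.append(day)
        day += timedelta(days=1)
    return gaps


if __name__ == "__main__":
    exported = [date(2025, 1, 1), date(2025, 1, 3)]
    print(missing_days(exported, date(2025, 1, 1), date(2025, 1, 4)))
```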
This maps to Audit Criterion 9 (data-handling and privacy boundaries). The EU AI Act (Regulation 2024/1689) places documentation and handover obligations on providers of high-risk AI systems, including data portability requirements at termination (see EU AI Act). For teams evaluating the replacement provider's data handling before committing, the Build vs Buy Framework covers data sensitivity and residency as Criterion 3 of the replacement evaluation.
3. Get the Incident History and Runbook
The incident history is a migration-critical artifact, not a courtesy document. The new team inherits the failure modes of the old system. Without the incident log, they will rediscover each failure in production, without the context of how it was resolved the first time.
A concrete example of the risk: if the exiting vendor handled a prompt-injection incident quietly by adding a server-side filter, the new team may not know the filter exists, may not replicate it in the new integration, and will encounter the same injection pattern in production without the defense layer in place. OWASP LLM Top 10 (2025) lists supply chain vulnerabilities (LLM03:2025) among its risk categories, and vendor-exit planning addresses that category directly (see OWASP LLM Top 10 2025).
Request the full incident log for the past 12 months, the current on-call runbook with escalation paths, and documentation of any known failure modes and their current mitigations. This maps to Audit Criterion 8 (on-call and incident response).
During Migration: Running Parallel Without Breaking Production
4. Stand Up a Shadow Environment Before Cutting Over
A shadow environment sends a copy of a controlled percentage of production requests to the new provider while the old integration continues serving every live request. The shadow responses are logged and compared; they do not affect the response returned to the end user. This is not a full parallel system. It is a read-only routing tap on existing infrastructure.
Start at 5% of traffic routed to the new provider. Define your success criteria before the shadow run begins, not after: target latency at p95, accuracy rate on your evaluation set, and cost per resolved task. A shadow run without defined pass/fail criteria is an observation exercise, not a migration gate.
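The sampling and the pass/fail gate described above can be sketched as follows. Everything here is an assumption made for illustration: the hash-based sampling, the nearest-rank percentile, the `SuccessCriteria` fields, and the threshold values are not prescribed by the checklist, and real thresholds are workload-specific.

```python
import hashlib
import math
from dataclasses import dataclass


def in_shadow_sample(request_id: str, percent: float = 5.0) -> bool:
    """Deterministically pick roughly `percent`% of requests by hashing the ID."""
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000  # 0..9999
    return bucket < percent * 100


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty list."""
    if not values:
        raise ValueError("percentile() needs at least one value")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


@dataclass(frozen=True)
class SuccessCriteria:
    # Hypothetical fields mirroring the three criteria named in the text.
    max_p95_latency_ms: float
    min_accuracy: float
    max_cost_per_task_usd: float


def shadow_gate(latencies_ms, correct, evaluated, total_cost_usd,
                resolved_tasks, criteria: SuccessCriteria) -> dict[str, bool]:
    """Return a pass/fail flag per criterion; cutover needs all of them True."""
    return {
        "p95_latency": percentile(latencies_ms, 95) <= criteria.max_p95_latency_ms,
        "accuracy": correct / evaluated >= criteria.min_accuracy,
        "cost_per_task": total_cost_usd / resolved_tasks <= criteria.max_cost_per_task_usd,
    }
```

Hashing the request ID, rather than drawing a random number per request, keeps the same request in or out of the sample on retries, which makes old-versus-new comparisons easier to reproduce.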
This protocol is grounded in the same parallel-run and shadow-testing discipline used in sincllm's own production work. The production migration example of a local model replacing a vendor API without downtime used a controlled traffic-routing approach before cutover.
5. Establish a Fallback Path Back to the Exiting Vendor
The old vendor integration must stay live and tested until the new one passes a defined evaluation period. This is Audit Criterion 5 (fallback paths). A fallback that is not tested is not a fallback.
"Live and tested" means: a synthetic request routed to the old provider daily, with an alert that fires if that path returns an error. Not "we kept the old API keys." A fallback path that has not been exercised in two weeks will fail when you need it, and you will need it at the worst possible moment.
Define the evaluation period before migration begins. A reasonable baseline for most production workloads is 5 business days of shadow traffic passing the defined success criteria before cutover. Adjust the period based on the criticality of the workload and the available evaluation data.
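The daily synthetic request to the old provider described above might look like the sketch below. The two callables are placeholders: `send_synthetic_request` is assumed to wrap whatever client the old integration uses and return an HTTP status code, and `alert` is assumed to page or notify the on-call channel. Scheduling (cron, a workflow runner, or similar) is left outside the sketch.

```python
from typing import Callable


def probe_fallback(send_synthetic_request: Callable[[], int],
                   alert: Callable[[str], None]) -> bool:
    """Exercise the fallback path once; alert and return False on any failure."""
    try:
        status = send_synthetic_request()
    except Exception as exc:  # network errors, timeouts, expired credentials
        alert(f"fallback probe raised {type(exc).__name__}: {exc}")
        return False
    if not 200 <= status < 300:
        alert(f"fallback probe returned HTTP {status}")
        return False
    return True


if __name__ == "__main__":
    # Stand-ins for a real client call and a real alerting hook.
    probe_fallback(lambda: 503, alert=print)
```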
6. Migrate Monitoring and Alerting First, Not Last
The most common migration sequencing error is moving the model first and adding monitoring later. A migration that moves the model but not the observability layer fails silently. By the time an unmonitored error rate climbs to a level that triggers a user complaint, the blast radius is already large.
Every critical path needs a monitor before the cutover date. This is Audit Criterion 1 (monitoring on every critical path). The reliability-engineering principle is the same as testing circuit breakers before restoring power: the monitoring layer is the diagnostic surface, and a system without it is a system that fails without warning.
The specific monitors to stand up before cutover: latency at p95 and p99, error rate by endpoint, cost-per-task (to catch runaway token consumption), and accuracy on a synthetic evaluation set run on a defined schedule.
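As a rough illustration of three of the monitors listed above, the sketch below derives them from a batch of request records. The `RequestRecord` fields are hypothetical; in practice these numbers would usually come from an existing metrics stack rather than hand-rolled code, and the accuracy monitor needs its own evaluation harness, which is out of scope here.

```python
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class RequestRecord:
    # Hypothetical log fields, assumed for this sketch.
    endpoint: str
    latency_ms: float
    ok: bool
    cost_usd: float
    task_resolved: bool


def nearest_rank(values: list[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(records: list[RequestRecord]) -> dict:
    """Compute latency percentiles, error rate by endpoint, and cost per task."""
    latencies = [r.latency_ms for r in records]
    counts = defaultdict(lambda: [0, 0])  # endpoint -> [errors, total]
    for r in records:
        counts[r.endpoint][1] += 1
        if not r.ok:
            counts[r.endpoint][0] += 1
    resolved = sum(1 for r in records if r.task_resolved)
    total_cost = sum(r.cost_usd for r in records)
    return {
        "latency_p95_ms": nearest_rank(latencies, 95),
        "latency_p99_ms": nearest_rank(latencies, 99),
        "error_rate_by_endpoint": {ep: e / t for ep, (e, t) in counts.items()},
        # None when nothing was resolved, to avoid dividing by zero.
        "cost_per_task_usd": total_cost / resolved if resolved else None,
    }
```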
After Cutover: Verifying the New Provider Before Retiring the Old
7. Run the 10-Point AI Vendor Audit on the Replacement Provider
You are replacing one vendor with another. The only way to avoid the same contract gaps on the next relationship is to audit the replacement provider against the same 10 production-engineering criteria before signing. All 10 criteria from sincllm's 10-Point AI Vendor Audit apply equally to the vendor you are moving to: monitoring, SLOs, source-code ownership, drift detection, fallback paths, cost-anomaly alarms, rollback cadence, incident response, data-handling, and documented handover.
The contract signed at the start of a vendor relationship governs the worst-case scenario. The demo and the SLA govern the expected case. Most teams evaluate only the expected case. The 10-Point Audit evaluates the worst case, which is where vendor exits live.
Run the same production-engineering checklist on your replacement vendor before you sign.
The 10-Point AI Vendor Audit translates these questions into a repeatable production-engineering checklist: source-code ownership, audit trail, SLOs, fallback paths, and exit clause. Free 16-page PDF, 15 minutes per vendor.
→ Download the 10-Point AI Vendor Audit
8. Test Rollback Before You Need It
Rollback is not a contingency plan. It is an engineering step with a defined trigger and a verified execution path. Define the rollback trigger in metrics before cutover: latency above a defined threshold at p95, error rate above a defined percentage, or cost-per-task above a defined ceiling. These numbers are specific to your workload; they are not set by the vendor.
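A rollback trigger expressed in metrics can be as small as the sketch below. The threshold names and the example numbers are placeholders, not recommendations; the point is only that the trigger is written down and evaluated by code rather than decided in the middle of an incident.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RollbackThresholds:
    # Placeholder limits; real values are specific to the workload.
    max_p95_latency_ms: float
    max_error_rate: float
    max_cost_per_task_usd: float


def rollback_reasons(p95_latency_ms: float, error_rate: float,
                     cost_per_task_usd: float,
                     limits: RollbackThresholds) -> list[str]:
    """Return every breached threshold; a non-empty list means roll back."""
    reasons = []
    if p95_latency_ms > limits.max_p95_latency_ms:
        reasons.append("p95 latency above threshold")
    if error_rate > limits.max_error_rate:
        reasons.append("error rate above threshold")
    if cost_per_task_usd > limits.max_cost_per_task_usd:
        reasons.append("cost per task above ceiling")
    return reasons


if __name__ == "__main__":
    limits = RollbackThresholds(800.0, 0.02, 0.10)
    print(rollback_reasons(950.0, 0.01, 0.08, limits))
```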
Verify the rollback path is live and tested before decommissioning the old provider. This maps to Audit Criterion 7 (model-update cadence and rollback). The verification test is simple: execute the rollback procedure in a staging environment, confirm the old integration responds correctly, and confirm the routing switch completes within your defined maximum acceptable switchover time.
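The staging verification can be wrapped in a small timing harness. In this sketch, `switch_to_old` and `old_responds_correctly` are hypothetical hooks for whatever mechanism flips the route (a feature flag, a config push, a DNS change) and whatever health check the team trusts; the time budget is an assumed parameter.

```python
import time
from typing import Callable


def verify_rollback(switch_to_old: Callable[[], None],
                    old_responds_correctly: Callable[[], bool],
                    max_switch_seconds: float) -> dict:
    """Run the rollback once in staging and record whether it met the budget."""
    start = time.monotonic()
    switch_to_old()
    elapsed = time.monotonic() - start
    return {
        "old_integration_ok": old_responds_correctly(),
        "switch_seconds": elapsed,
        "within_budget": elapsed <= max_switch_seconds,
    }


if __name__ == "__main__":
    route = {"target": "new"}
    result = verify_rollback(
        switch_to_old=lambda: route.update(target="old"),
        old_responds_correctly=lambda: route["target"] == "old",
        max_switch_seconds=60.0,
    )
    print(result)
```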
9. Document the New Architecture and Update On-Call Runbooks
Every architectural change that is not documented is a future incident. The migration is complete from an operational standpoint only when the new architecture is documented: an updated architecture diagram, new on-call runbooks with the replacement provider's escalation paths, and any new failure modes discovered during the shadow period documented with their mitigations.
This is not post-migration cleanup. It is an exit criterion for the migration itself. A migration that ends without updated runbooks transfers the institutional knowledge gap from the old vendor to the new one.
10. Close the Old Vendor Relationship: Data Deletion and Contract Exit Clauses
After cutover is verified and rollback is confirmed no longer needed, close the old vendor relationship with the same rigor applied to opening it. Confirm in writing that the vendor has deleted your data from their systems. This is required for compliance under GDPR and the EU AI Act for most production workloads.
Revoke all API credentials and rotate any keys the vendor had access to, including keys used for integrations the vendor operated. Check whether the exiting contract has an auto-renewal clause: many AI vendor contracts renew automatically unless written termination notice is given 30 to 90 days before the contract end date, so the notice may have to be submitted well before the migration is finished. Discovering this after the notice deadline means paying for another contract term on a provider you have already migrated off.
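The notice deadline is simple date arithmetic, and it is worth computing early. The sketch below assumes the notice period is counted in calendar days back from the contract end date; the actual contract wording (business days, inclusive or exclusive counting, delivery method) governs and should be confirmed with legal.

```python
from datetime import date, timedelta


def latest_notice_date(contract_end: date, notice_period_days: int) -> date:
    """Last day on which termination notice can still be submitted."""
    return contract_end - timedelta(days=notice_period_days)


def days_until_notice_deadline(today: date, contract_end: date,
                               notice_period_days: int) -> int:
    """Days remaining before the deadline; negative means it has passed."""
    return (latest_notice_date(contract_end, notice_period_days) - today).days


if __name__ == "__main__":
    end = date(2025, 12, 31)
    print(latest_notice_date(end, 90))                              # 2025-10-02
    print(days_until_notice_deadline(date(2025, 9, 2), end, 90))    # 30
```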
The Checklist in One View
| Step | What to Obtain or Verify | Audit Criterion | Owner | Status |
|---|---|---|---|---|
| 1 | Complete handover package: git repo, model configs, integration scripts, credentials, customization docs | Criterion 3: Source-code ownership | Eng | ☐ |
| 2 | Data portability confirmed: export format, data-residency restrictions, deletion schedule | Criterion 9: Data-handling boundaries | Eng / Legal | ☐ |
| 3 | Incident history and runbook received: 12-month log, on-call procedures, known failure modes | Criterion 8: On-call and incident response | Eng | ☐ |
| 4 | Shadow environment live: 5% traffic routed to new provider, success criteria defined | Criterion 1: Monitoring on every critical path | Eng | ☐ |
| 5 | Fallback path live and tested: daily synthetic request to old provider, alert on failure | Criterion 5: Fallback paths | Eng | ☐ |
| 6 | Monitoring and alerting stood up on new provider before cutover: latency, error rate, cost, accuracy | Criterion 1: Monitoring on every critical path | Eng | ☐ |
| 7 | 10-Point AI Vendor Audit run on replacement provider before signing new contract | Criterion 10: Documented handover, no lock-in | Eng / Legal | ☐ |
| 8 | Rollback trigger defined in metrics and rollback path verified in staging | Criterion 7: Rollback cadence | Eng | ☐ |
| 9 | Architecture diagram updated, on-call runbooks updated with new escalation paths | Criterion 8: On-call and incident response | Eng | ☐ |
| 10 | Old vendor: data deletion confirmed in writing, credentials revoked, auto-renewal clause verified | Criterion 10: Documented handover, no lock-in | Legal / Finance | ☐ |
The table above also serves as a red-flag detector during the handover negotiation. The following vendor responses to exit requests indicate structural blockers:
| Exit Request | Red Flag Response | What It Means Operationally |
|---|---|---|
| Source code repository transfer | "We will export your data when you close the account" | Code was never contractually yours. Migration timeline is set by vendor cooperation, not your team. |
| Fine-tune dataset export | "That data is stored in our proprietary format. We can provide a summary." | Fine-tune work cannot be ported to the replacement provider. You are starting from scratch. |
| Incident history and runbook | "Our runbooks are internal. We can provide a summary of uptime." | New team will rediscover every failure mode in production without mitigation context. |
| Fallback path during migration | "Once you cancel, access is immediately revoked." | No overlap window. A cutover failure has no recovery path. Do not cut over until this is resolved in writing. |
| Data deletion confirmation | "We retain data per our standard retention policy." | Vendor retains your data after contract end. GDPR and EU AI Act compliance exposure remains open. |
Evaluate Your Replacement Vendor With the Same Rigor
You have just completed a painful exit. Every step in this checklist corresponds to a protection that was absent from the original contract. The only way to avoid repeating the same gaps on the next vendor relationship is to run the 10-Point AI Vendor Audit on the replacement provider before signing.
The 10-Point Audit covers all the criteria named in this checklist: monitoring, SLOs, source-code ownership, drift detection, fallback paths, cost-anomaly alarms, rollback cadence, incident response, data-handling boundaries, and documented handover. For teams also working through the build-versus-buy decision on the replacement provider, the Build vs Buy Framework provides a structured evaluation across 10 criteria including data residency and vendor lock-in tolerance.
Note: this checklist is an engineering operations guide. Contract termination, data deletion obligations, and regulatory compliance requirements vary by jurisdiction and industry. Teams in regulated sectors (healthcare, financial services) should involve legal counsel on the contract-exit steps, particularly around data deletion confirmation and auto-renewal exposure.
Know what you are buying before you sign.
→ Download the 10-Point AI Vendor Audit
Conclusion
A clean vendor exit is an engineering operation, not a procurement event. The checklist in this article maps each migration step to the production-engineering controls that should have been in the original contract. The teams that complete vendor exits without breaking production are the ones that treated exit rights as a first-class engineering requirement at the start of the relationship, not a negotiation detail to revisit at termination.
The ISO/IEC 42001:2023 AI Management System standard addresses supplier relationship management for AI systems, including transition and handover requirements, at the organizational governance level (see ISO/IEC 42001:2023). The standard treats exit planning as part of the supplier lifecycle, not an afterthought. Production AI teams benefit from the same discipline.
Bring your current AI setup. We will tell you what is production-ready and what is not.
A focused 30-minute audit call with a production AI engineer (7 years EE, BSEE University of South Florida, sincllm-mcp v2.0.0 in production). Built for CTOs who need a fast read on migration risk before the exit window opens. No pitch deck. You bring the architecture; we bring the checklist.