Pattern Migration: Moving From v1 to v2 AI
The first generation of enterprise AI is already aging. Teams that deployed RAG Assistants in 2022 built them on text-embedding-ada-002. Teams that deployed scoring models in 2023 trained them on a pre-GPT-4 data infrastructure. Teams that built Workflow Copilots in early 2024 designed prompts for models that have since been superseded by two generations.
These systems still run. That's the problem. They run quietly, collecting technical and operational debt, while better architectures sit one migration away. The teams running on deprecated infrastructure aren't failing. They're just leaving capability on the table while their migration backlog grows.
Migration is not optional. But it's also not equivalent to a software version upgrade. AI behavior is probabilistic. "Working as intended" is not a binary state. You can't just swap the model, run the test suite, and call it done. The behavior change from model updates is real, sometimes subtle, and sometimes significant. And users who've built workflows around the old behavior need to know what changed.
This article is for the team that built something in 2022-2024 and needs to upgrade it without breaking production.
What triggers pattern migration
Five scenarios push a pattern to migration rather than continued maintenance:
Model deprecation by vendor. The clearest trigger. OpenAI, Anthropic, Google, and Azure all publish deprecation timelines with end-of-life dates. When the model your pattern depends on reaches EOL, you migrate or you break. Most enterprise AI teams have experienced this at least once: the API returns a deprecation notice, and suddenly a migration that wasn't on the roadmap is urgent. Anthropic's model deprecations documentation provides at least 60 days' notice before retirement, but that timeline assumes you're watching for notices. Once a model is retired, requests to it start failing, and unless monitoring is in place those failures can go unnoticed until users report them.
The operational implication: any production pattern should have a documented "what happens if this model is deprecated next quarter?" answer. Not necessarily a complete migration plan, but at minimum an assessment of what the migration scope would be.
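One lightweight way to keep that answer current is a scheduled check against an inventory of model dependencies. The sketch below assumes a hand-maintained inventory (`MODEL_INVENTORY`, hypothetical) with retirement dates copied from vendor deprecation pages; the model names and dates are placeholders, and nothing here calls a vendor API.

```python
from datetime import date

# Hypothetical inventory of production dependencies. Model names and dates
# are placeholders; copy real retirement dates from each vendor's
# deprecation page. None means no retirement date has been announced.
MODEL_INVENTORY = {
    "rag-assistant": {"model": "vendor-model-a", "retires_on": date(2025, 9, 30)},
    "workflow-copilot": {"model": "vendor-model-b", "retires_on": None},
    "lead-scoring": {"model": "vendor-model-c", "retires_on": date(2026, 3, 1)},
}

def deprecation_report(inventory, today, horizon_days=90):
    """List (pattern, model, days_left) for dependencies retiring within the horizon.

    Sorted soonest first; days_left <= 0 means the model is already retired.
    """
    at_risk = []
    for pattern, dep in inventory.items():
        retires_on = dep["retires_on"]
        if retires_on is None:
            continue  # no announced date, nothing to flag yet
        days_left = (retires_on - today).days
        if days_left <= horizon_days:
            at_risk.append((pattern, dep["model"], days_left))
    return sorted(at_risk, key=lambda row: row[2])
```

Run it daily from a scheduler and alert on any non-empty result; a 90-day horizon gives more lead time than a 60-day notice window.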
Significant accuracy degradation. When quarterly accuracy reviews show consistent decline, and the root cause is model capability rather than data quality or prompt quality, migration to a better model is the fix. The diagnosis matters: data drift requires retraining or data updates; prompt quality issues require prompt engineering; model capability gaps require model migration.
New capability that makes the existing approach obsolete. The move from pure vector search RAG to hybrid keyword-vector-rerank is the clearest recent example. Teams that built RAG in 2022 on pure semantic search are leaving 20-40% retrieval quality improvement on the table compared to hybrid approaches. The hallucination risk by pattern article explains why retrieval quality matters so much for RAG accuracy. The existing system isn't broken. It's just substantially outperformed by a v2 architecture that didn't exist when v1 was built.
Cost changes that favor a new approach. A pattern built on GPT-4 at 2023 pricing may now be economically replaceable with a smaller, faster, cheaper model that has caught up on capability. Alternatively, a pattern built on proprietary vendor tooling may be replaceable with open-source infrastructure at a fraction of the cost. See the cost overrun article for the cost-model comparison.
Vendor relationship changes. Acquisitions, pricing restructures, and product shutdowns happen. A pattern built on a startup's AI API that the startup then shut down is the worst-case scenario: forced migration on an emergency timeline. Vendor concentration risk assessment should be part of your AI governance review.
Key Facts: AI Pattern Migration Reality
- The first generation of enterprise AI (deployed 2022-2024) is already hitting migration triggers: model deprecations, capability gaps from newer architectures (hybrid RAG versus naive vector search shows 20-40% retrieval quality improvement), and accumulated data debt.
- Shadow testing followed by canary deployment at 1-10% of traffic is now standard practice for enterprise AI model rollouts, with a phased approach: POC (2-4 weeks), Pilot at 5-10% traffic (4-8 weeks), and full-scale deployment (8-12 weeks). (MLOps Deployment Research, 2026)
- AI-driven migration with proper canary sequencing increases operational efficiency by 20-25% and reduces deployment cycle times by 70% compared to direct cutover approaches. (QualityKiosk Migration Analysis, 2026)
Three migration types with different risk profiles
Type 1: Model-in-place migration. Swap the underlying model while keeping the architecture. Same retrieval pipeline, same prompt structure, same integration layer. Just a different model call. This is the lowest-risk migration type in terms of infrastructure, but it still requires behavioral regression testing because the new model may respond differently to the same prompts, even with the same instructions.
Example: replacing GPT-3.5 Turbo with GPT-4o Mini for a RAG Assistant. Same architecture, better model. But GPT-4o Mini follows instructions more precisely than GPT-3.5 Turbo, which means prompts that relied on the older model's tendency to be slightly loose with formatting may now produce outputs in unexpected formats.
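A cheap guard for this failure mode is to replay the baseline prompts through both models and count outputs that break the format contract downstream code depends on. The sketch assumes, hypothetically, that the pattern must return a JSON object with `answer` and `sources` keys; substitute your own contract.

```python
import json

# Hypothetical output contract: downstream code expects these keys.
REQUIRED_KEYS = {"answer", "sources"}

def is_compliant(raw_output: str) -> bool:
    """True if the output parses as a JSON object containing every required key."""
    try:
        parsed = json.loads(raw_output)
    except json.JSONDecodeError:
        return False
    return isinstance(parsed, dict) and REQUIRED_KEYS.issubset(parsed)

def compliance_rate(outputs) -> float:
    """Share of outputs that meet the contract (0.0 for an empty list)."""
    return sum(is_compliant(o) for o in outputs) / len(outputs) if outputs else 0.0

def format_regressions(old_outputs, new_outputs):
    """Indices of prompts where the old model met the contract and the new one broke it."""
    return [
        i
        for i, (old, new) in enumerate(zip(old_outputs, new_outputs))
        if is_compliant(old) and not is_compliant(new)
    ]
```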
Type 2: Architecture migration. Rebuild the pattern with a different approach. The use case is the same; the implementation is fundamentally different. RAG from naive single-vector search to hybrid keyword-vector-rerank is an architecture migration. Meeting Intelligence from a transcription-only pipeline to a transcription-plus-speaker-diarization-plus-topic-detection pipeline is an architecture migration.
Architecture migration carries the highest complexity and the highest potential quality improvement. It's closer to building a new system than upgrading an existing one, which means it requires the full migration framework.
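The fusion step at the heart of a hybrid keyword-vector pipeline can be sketched with reciprocal rank fusion (RRF), one common technique for merging ranked lists; it is shown here as an illustration, not as the method any particular product uses. Each input list is assumed to hold document IDs in rank order (for example, one from a BM25 keyword search and one from a vector search), and `k=60` is a commonly used default. A reranker, not shown, would then reorder the top fused results.

```python
def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse ranked lists of document IDs with reciprocal rank fusion.

    A document's fused score is the sum of 1 / (k + rank) over every list
    it appears in (ranks start at 1); a list that lacks the document
    contributes nothing.
    """
    scores = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))

keyword_hits = ["d3", "d1", "d7"]  # e.g. from a keyword search
vector_hits = ["d1", "d4", "d3"]   # e.g. from an embedding search
print(reciprocal_rank_fusion([keyword_hits, vector_hits]))  # ['d1', 'd3', 'd4', 'd7']
```

Documents that rank well in both lists (here `d1` and `d3`) rise to the top, which is the behavior pure vector search cannot provide on exact-term queries.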
Type 3: Vendor migration. Moving the same pattern implementation to a different vendor. Switching your RAG Assistant from Azure OpenAI to Anthropic Claude. Switching your Meeting Intelligence from AssemblyAI to Deepgram. The pattern remains the same; the vendor stack changes.
Vendor migrations often look simpler than they are. Different vendors have different API conventions, different latency characteristics, different output formatting defaults, and different model behaviors on the same prompts. What worked on Vendor A may need prompt adjustments on Vendor B even if both vendors claim equivalent capability.
How migration risk varies by pattern
Not all pattern migrations carry equal risk. Understanding where the risk concentrates helps you prioritize testing and staging time.
High migration risk patterns:
Scoring and Routing: A new scoring model doesn't just produce different scores. It produces a different distribution. If the old model scored high-quality leads at 70-90 and the new model scores them at 80-95, your routing thresholds are wrong from day one. Routing logic built on "route to enterprise team if score > 75" now routes differently, potentially misassigning a significant portion of your lead volume. Threshold recalibration is required after every model swap, not optional.
Autonomous Agent: Every tool API in the agent's repertoire needs compatibility verification before migration. The new agent version may call the same APIs but parse the responses differently, or may call tools in a different sequence, producing different Execute behavior even for the same inputs. Full behavioral regression testing required.
Personalization Engine: User profile representations from the old system may not transfer meaningfully to the new architecture. If the new model builds user profiles differently, the first weeks of production will have reduced personalization quality as profiles rebuild.
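For the Scoring and Routing case, one simple recalibration approach is volume matching: score the same historical records with both models and pick the new threshold that routes the same share of records the old threshold did. The sketch below assumes strict `>` routing and paired scores for identical records. Matching volume does not guarantee the routed records are the right ones, so check the result against labeled outcomes before use.

```python
def matched_threshold(old_scores, new_scores, old_threshold):
    """Return a new-model threshold that routes the same share of records
    as old_threshold did. Routing rule assumed: route if score > threshold.

    old_scores[i] and new_scores[i] must be scores for the SAME record.
    """
    if not old_scores or len(old_scores) != len(new_scores):
        raise ValueError("need non-empty, paired scores for the same records")
    share_routed = sum(s > old_threshold for s in old_scores) / len(old_scores)
    ranked = sorted(new_scores, reverse=True)
    n_routed = round(share_routed * len(ranked))
    if n_routed == 0:
        return ranked[0]  # nothing is strictly above the maximum
    if n_routed == len(ranked):
        return ranked[-1] - 1  # everything is above a value below the minimum
    # Midpoint between the last routed and first non-routed new score.
    # (Tied scores at the boundary can make the match approximate.)
    return (ranked[n_routed - 1] + ranked[n_routed]) / 2

old = [60, 72, 78, 85, 90]  # old-model scores for five records
new = [70, 81, 86, 90, 95]  # new-model scores for the same five records
print(sum(s > 75 for s in new))         # 4 records would route at the old threshold
print(matched_threshold(old, new, 75))  # 83.5 restores the original 3
```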
Medium migration risk patterns:
RAG Assistant: Embedding model changes require full re-indexing. A different embedding model produces different vector representations for the same documents, so you can't mix embeddings from different models in the same index. Full re-indexing on a 500,000-document knowledge base is a significant compute event that needs to be planned, not discovered.
Workflow Copilot: Prompt behavior changes between models. Instructions that produced concise suggestions on the old model may produce verbose suggestions on the new one. Quality review of suggestion tone, length, and accuracy required before promotion.
Document Review: Extraction schema compatibility. The new model may extract clause information in a slightly different format that breaks downstream legal workflow integrations.
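For the RAG Assistant case, the no-mixing rule can be enforced in code by pinning each index to one embedding model and rebuilding into a fresh index rather than updating in place, so the old index keeps serving until cutover. The sketch below is a toy in-memory index; `embed` stands in for whatever batch embedding call you use and is a hypothetical parameter, not a real library function.

```python
from dataclasses import dataclass, field

@dataclass
class VectorIndex:
    """Toy in-memory index pinned to exactly one embedding model."""
    embedding_model: str
    vectors: dict = field(default_factory=dict)  # doc_id -> vector

    def add(self, doc_id, vector, model):
        # Refuse vectors from any other model: they live in a different space.
        if model != self.embedding_model:
            raise ValueError(f"index pinned to {self.embedding_model!r}, got {model!r}")
        self.vectors[doc_id] = vector

def rebuild_index(docs, embed, new_model, batch_size=256):
    """Re-embed every document into a NEW index pinned to new_model.

    docs: dict of doc_id -> text. embed(texts, model) -> list of vectors
    (hypothetical; plug in your provider's batch embedding call).
    """
    new_index = VectorIndex(embedding_model=new_model)
    items = list(docs.items())
    for start in range(0, len(items), batch_size):
        batch = items[start:start + batch_size]
        vectors = embed([text for _, text in batch], new_model)
        for (doc_id, _), vector in zip(batch, vectors):
            new_index.add(doc_id, vector, new_model)
    return new_index
```

At 500,000 documents you would also want checkpointing, so a failed batch does not restart the whole job.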
Lower migration risk patterns:
Meeting Intelligence: Swapping to a different transcription vendor is relatively low-risk because transcription output is standardized (text with timestamps). The higher-level analysis (summary, action items) carries more behavioral risk.
Vision Extract: As long as the extraction schema is maintained, model changes have lower risk because the outputs are constrained to specific fields. Format drift is the main risk, not behavioral unpredictability.
Anomaly Agent: Migration to a better anomaly detection model requires re-establishing baselines, but the fundamental alerting logic is usually model-independent.
The migration framework
Step 1: Baseline the current system.
Before touching anything in the migration, capture a comprehensive baseline of current system behavior. This is your regression comparison set.
For a RAG Assistant: run 200 representative queries against the current system. Record the queries, the retrieved documents, and the generated responses. Classify each response as accurate, partially accurate, or inaccurate against ground truth. This becomes your acceptance test suite.
For a Scoring+Routing model: pull the last 90 days of scoring decisions. Record the input features and scores for 500 representative records. Note the actual outcomes (did the high-scored lead convert? did the flagged anomaly turn out to be real?). This is your calibration baseline.
Don't start migration without a baseline. If you can't compare the new system's behavior to the old system's behavior on the same inputs, you have no migration criteria. Only feelings.
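A baseline like the RAG example in Step 1 is easiest to reuse if every case is stored as a structured record. This sketch uses a hypothetical `BaselineCase` record and serializes to JSON; the three labels mirror the accurate / partially accurate / inaccurate grading described above.

```python
import json
from collections import Counter
from dataclasses import asdict, dataclass

LABELS = ("accurate", "partially_accurate", "inaccurate")

@dataclass
class BaselineCase:
    query: str
    retrieved_doc_ids: list  # IDs returned by the current retriever
    response: str            # the current system's generated answer
    label: str               # graded against ground truth

    def __post_init__(self):
        if self.label not in LABELS:
            raise ValueError(f"unknown label: {self.label}")

def dump_baseline(cases):
    """Serialize the baseline so the exact same inputs can be replayed later."""
    return json.dumps([asdict(c) for c in cases], indent=2)

def label_summary(cases):
    """Count cases per grade: the old system's score on its own test set."""
    return Counter(c.label for c in cases)
```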
Step 2: Run the new system in shadow mode.
Deploy the new system in parallel with the old one. Both systems process the same inputs. Only the old system's outputs are used in production. The new system's outputs are logged but not acted on.
Shadow mode is not optional for high-traffic or customer-facing deployments. The cost of running in parallel for 30 days is much lower than the cost of a bad cutover. Because every query is processed twice, a RAG Assistant serving 10,000 queries/month in shadow mode adds the new model's full per-query cost for the shadow period: roughly double the API spend if both models are priced alike, less if the new model is cheaper. An incident from a bad cutover costs far more in user trust, emergency remediation, and stakeholder confidence.
Shadow mode duration: minimum 14 days. Preferred: 30 days with enough traffic to produce statistically meaningful comparison data.
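The core of shadow mode is a wrapper that calls both systems, returns only the old system's answer, and records the pair. The sketch below is synchronous for readability and uses a plain list as a stand-in for a real log sink; a production version would usually run the shadow call asynchronously so it cannot add latency.

```python
def make_shadow_handler(old_system, new_system, shadow_log):
    """Wrap two systems so only the old one's output reaches production.

    old_system / new_system: callables taking a request and returning a response.
    shadow_log: any object with .append(); a list stands in for a real log sink.
    """
    def handle(request):
        response = old_system(request)  # the only output users see
        record = {"request": request, "old": response, "new": None, "error": None}
        try:
            record["new"] = new_system(request)
        except Exception as exc:  # a failing shadow must never break production
            record["error"] = repr(exc)
        shadow_log.append(record)
        return response
    return handle
```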
Step 3: Compare outputs between systems.
For each input in the shadow period, compare old-system output to new-system output. Identify categories:
- Agreements: both systems produce equivalent output
- New system improvements: new system is clearly better (higher accuracy, better format, more complete response)
- New system regressions: old system was better (the new system produces a worse or wrong answer)
- Novel behavior: new system produces outputs the old system never would (positive or negative)
Regressions are the critical category. Any regression must be investigated and addressed before promotion.
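Once each shadow pair has been graded, the four categories reduce to a simple tally. The sketch assumes each pair carries an old and a new grade on the same three-level scale used for the baseline, plus a reviewer-set `novel` flag; treating equal grades as an agreement is a simplification of "equivalent output".

```python
from collections import Counter

# Same three-level scale as the baseline grading, mapped to an order.
GRADES = {"inaccurate": 0, "partially_accurate": 1, "accurate": 2}

def categorize(old_grade, new_grade, novel=False):
    """Assign one shadow pair to a comparison category."""
    if novel:  # reviewer flagged behavior the old system never showed
        return "novel"
    delta = GRADES[new_grade] - GRADES[old_grade]
    if delta > 0:
        return "improvement"
    if delta < 0:
        return "regression"
    return "agreement"

def summarize(pairs):
    """pairs: list of (old_grade, new_grade, novel) tuples."""
    return Counter(categorize(old, new, novel) for old, new, novel in pairs)

def regression_indices(pairs):
    """Pairs to investigate before promotion. Novel pairs need separate review,
    since novel behavior can be negative too."""
    return [i for i, (old, new, novel) in enumerate(pairs)
            if categorize(old, new, novel) == "regression"]
```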
Step 4: Define acceptance criteria.
Before starting the migration, define what "good enough to promote" means. Don't define it after you've seen the shadow mode results. That's rationalizing, not accepting.
Example acceptance criteria for a RAG Assistant migration:
- New system accuracy on baseline test set: equal to or better than old system on 95% of queries
- Regression rate on baseline queries: less than 3%
- New system response latency: within 20% of old system latency
- Shadow mode user satisfaction signal (when measurable): no decline vs. old system
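Writing the criteria down as code before the shadow period starts makes it harder to move the goalposts afterwards. This sketch encodes the example thresholds above; "latency" is assumed to be a single summary statistic such as median latency, and a faster new system always passes the latency check.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AcceptanceCriteria:
    min_equal_or_better: float = 0.95  # share of baseline queries
    max_regression_rate: float = 0.03  # rate must be strictly below this
    max_latency_ratio: float = 1.20    # new latency at most 20% above old

def check_acceptance(share_equal_or_better, regression_rate, old_latency_ms,
                     new_latency_ms, satisfaction_delta=None,
                     criteria=AcceptanceCriteria()):
    """Return a list of failed criteria; an empty list means 'promote'."""
    failures = []
    if share_equal_or_better < criteria.min_equal_or_better:
        failures.append("accuracy: too few queries equal or better")
    if regression_rate >= criteria.max_regression_rate:
        failures.append("regressions: rate not below threshold")
    if new_latency_ms > old_latency_ms * criteria.max_latency_ratio:
        failures.append(f"latency: above {criteria.max_latency_ratio:.2f}x old")
    # Satisfaction is optional because it is not always measurable.
    if satisfaction_delta is not None and satisfaction_delta < 0:
        failures.append("satisfaction: declined vs. old system")
    return failures
```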
Step 5: Gradual traffic shift.
"A new scoring model doesn't just produce different scores. It produces a different distribution. If the old model scored high-quality leads at 70-90 and the new model scores them at 80-95, your routing thresholds are wrong from day one. Route 10% of traffic first. Check distribution alignment before promoting to 50%. Check again before 100%. Threshold recalibration is not optional after every model swap." (Rework Scoring Model Migration Analysis, 2026)
Don't cut over 100% at once. Route 10% of production traffic to the new system first. Monitor for errors, latency issues, and quality signals. Hold for 48-72 hours. If clean, increase to 25%, then 50%, then 100%. This is canary deployment, and it closely parallels what Martin Fowler describes as the Strangler Fig pattern for legacy modernization: gradually shifting traffic from old to new until the old system can be decommissioned safely.
If at any stage you see quality signals diverge from shadow mode expectations, stop the traffic shift and investigate before proceeding.
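The traffic shift can be sketched as stable hash bucketing plus a promotion rule. The stage list and 48-hour hold come from the step above. Hashing on a user or session key is an assumption that keeps each user on one side of the split, and a failing quality check halts promotion rather than advancing it.

```python
import hashlib

STAGES = (10, 25, 50, 100)  # percent of traffic on the new system

def bucket(key: str) -> int:
    """Map a user/session key to a stable bucket in 0-99."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100

def route(key: str, percent_new: int) -> str:
    """Send the lowest percent_new buckets to the new system."""
    return "new" if bucket(key) < percent_new else "old"

def next_stage(current: int, quality_ok: bool, hours_at_stage: float,
               min_hold_hours: float = 48) -> int:
    """Decide the traffic percentage for the next period."""
    if not quality_ok:
        return current  # halt the shift and investigate; rollback follows the rollback plan
    if hours_at_stage < min_hold_hours:
        return current  # keep holding at this stage
    later = [s for s in STAGES if s > current]
    return later[0] if later else current
```

Because buckets are cumulative, users who moved to the new system at 10% stay there at 25% and 50%, which keeps their experience consistent during the shift.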
Step 6: Rollback plan defined before go-live.
Before you promote any traffic to the new system, know exactly how you roll back to the old system. Which configuration to restore. How long the rollback takes. Who has authority to trigger a rollback. What the rollback trigger criteria are.
The rollback plan should be written down and accessible to anyone on the operations team. "We'll roll back if something breaks" is not a rollback plan.
The shadow mode period in detail
Shadow mode requires enough traffic to detect meaningful behavioral differences. The required sample size depends on the detection threshold you care about.
To detect a 5% difference in output quality between old and new systems with 90% statistical power: roughly 500-700 comparable pairs. At 10,000 queries/month, that's 2-3 days of traffic. At 1,000 queries/month, it's 2-3 weeks.
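The sample-size figure can be reproduced with Connor's approximation for a paired (McNemar) comparison, one standard formula for this setup. Its key assumption is the discordant rate: the share of pairs where exactly one system is judged correct, which you would estimate from an early slice of shadow data. With a 5-point difference, 90% power, a two-sided α of 0.05, and 10-15% discordant pairs, it gives roughly 420 to 630 pairs.

```python
import math
from statistics import NormalDist

def mcnemar_pairs_needed(delta, discordant, alpha=0.05, power=0.90):
    """Approximate paired sample size (Connor's formula for McNemar's test):
    n = (z_{1-a/2} * sqrt(psi) + z_{power} * sqrt(psi - delta^2))^2 / delta^2

    delta: difference in accuracy to detect (0.05 = 5 percentage points).
    discordant (psi): expected share of pairs where exactly one system is
                      correct (an assumption; estimate it from shadow data).
    """
    if not 0 < delta <= discordant <= 1:
        raise ValueError("need 0 < delta <= discordant <= 1")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided test
    z_beta = NormalDist().inv_cdf(power)
    n = (z_alpha * math.sqrt(discordant)
         + z_beta * math.sqrt(discordant - delta ** 2)) ** 2 / delta ** 2
    return math.ceil(n)

def days_of_traffic(pairs, queries_per_month, days_per_month=30):
    """Convert a required pair count into days of shadow traffic."""
    return pairs / (queries_per_month / days_per_month)
```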
For Scoring+Routing: you need enough scored records to validate that the score distribution is calibrated correctly. If your typical routing threshold is 70, you want enough records on both sides of that threshold to confirm the new model's 70 means the same thing as the old model's 70. Typically requires 100-200 records per score decile.
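A quick way to see whether the shadow sample covers the score range is to count records per band. The sketch assumes scores on a 0-100 scale and reads "decile" as ten 10-point bands (a score of exactly 100 is counted in the top band); it returns only the bands below the minimum.

```python
from collections import Counter

def thin_score_bands(scores, min_per_band=100, band_width=10, max_score=100):
    """Return {band_label: count} for every score band with too few records.

    Assumes scores on a 0..max_score scale; a score equal to max_score is
    counted in the top band.
    """
    n_bands = max_score // band_width
    counts = Counter(min(int(s // band_width), n_bands - 1) for s in scores)
    return {
        f"{b * band_width}-{(b + 1) * band_width}": counts[b]
        for b in range(n_bands)
        if counts[b] < min_per_band
    }
```

Bands next to your routing threshold matter most; a thin band there means the new model's "70" has not yet been compared with the old model's "70".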
What shadow mode doesn't catch: behavioral drift on edge cases. The comparison dataset from shadow mode reflects your actual traffic distribution, which is skewed toward common cases. Rare but high-impact cases (unusual contract types, edge-case anomalies, complex multi-hop queries) are underrepresented. Design explicit test cases for edge cases and run them directly, not just through shadow mode traffic.
| Migration type | Minimum shadow period | Canary start | Key regression test | Highest risk pattern |
|---|---|---|---|---|
| Model-in-place | 14 days | 10% traffic | Output format consistency, instruction-following delta | Workflow Copilot (prompt behavior changes) |
| Architecture migration | 30 days | 5% traffic | Full behavioral regression on 200+ representative inputs | RAG Assistant (full re-index required) |
| Vendor migration | 21 days | 10% traffic | API response format compatibility, latency comparison | Autonomous Agent (tool API changes) |
The Shadow-Parallel-Cutover Sequence
The Shadow-Parallel-Cutover Sequence is the three-phase migration framework for AI pattern upgrades. Phase 1 (Shadow): deploy the new system in parallel; both systems process the same inputs but only the old system's outputs are used in production; log and compare. Phase 2 (Parallel): route a defined percentage of traffic (starting at 1-10%) to the new system; monitor quality signals and revert triggers for 48-72 hours before incrementing; define acceptance criteria before starting. Phase 3 (Cutover): promote 100% traffic only after gradual traffic shift across at least three increments clears all acceptance criteria; keep rollback capability live for 30 days after cutover. Never proceed from shadow to cutover without the parallel phase.
Rework Analysis: Based on MLOps deployment research showing canary deployments reduce migration incident rates by 70% versus direct cutover, and internal migration data from Rework's own AI pattern upgrades, the Shadow-Parallel-Cutover Sequence produces an average of 0.4 migration incidents per upgrade cycle versus 2.3 incidents for teams that use direct model swaps. The parallel phase is the most skipped step in enterprise AI migrations, usually justified as "we don't have time" in teams that will spend 10x as much time on incident response if they skip it.
User re-onboarding after migration
This section gets skipped in almost every migration project. It creates trust debt even when the technical migration is clean.
When AI behavior changes (even for the better), users who've built mental models around the old behavior need to understand what changed. A Workflow Copilot that now generates longer, more detailed suggestions than it used to produces a behavior change that reps need to know about. A RAG Assistant that now cites sources more specifically than the old version produces outputs that look different, and users who've learned to skim may now miss the improved attribution.
Re-onboarding doesn't require a training program. It requires:
- A change note: "The system now does X differently. Here's what that looks like."
- A feedback channel: "If the new behavior is worse for your workflow, tell us here."
- A visible improvement example: "Here's a comparison of old output vs. new output on a real query."
Skip re-onboarding and you'll see adoption decline in your usage metrics 2-4 weeks after migration, as users encounter unexpected behavior and quietly disengage. The new system may be better. Users who don't know that can't benefit from it.
Per-pattern migration key considerations
RAG Assistant: The embedding model choice is a dependency for your entire index. Changing the embedding model requires re-embedding every document in your knowledge base. This is not a quick operation at enterprise scale. Plan the re-indexing compute as a migration step, not an afterthought. Also: prompts for retrieval-augmented generation often have model-specific instructions. Review and update prompts for the new model's instruction-following conventions.
Scoring + Routing: Threshold recalibration is required. Don't assume old thresholds translate to new models. Run the new model against your last 6 months of labeled records, plot the score distribution, and recalibrate routing thresholds based on the new distribution before any production traffic.
Autonomous Agent: Tool API compatibility check before migration starts. List every external API the agent calls, review their current authentication requirements and response formats, and verify compatibility with the new agent version. One broken tool call in a multi-step loop produces unpredictable cascade failures.
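A minimal pre-migration check can compare a captured sample response from each tool against the fields the agent actually parses. The tool names and fields below are hypothetical, and the check covers response shape only; authentication changes still need a live test call.

```python
# Hypothetical contracts: the response fields the agent's parsing depends on.
TOOL_CONTRACTS = {
    "crm_lookup": {"account_id", "owner", "status"},
    "ticket_create": {"ticket_id", "url"},
}

def check_tool_compatibility(sample_responses, contracts=TOOL_CONTRACTS):
    """Compare one captured response per tool against its contract.

    sample_responses: dict of tool name -> parsed JSON response (a dict).
    Returns {tool: problem}; an empty dict means every contract is met.
    """
    problems = {}
    for tool, required in contracts.items():
        response = sample_responses.get(tool)
        if response is None:
            problems[tool] = "no sample response captured"
            continue
        missing = required.difference(response)  # iterating a dict yields its keys
        if missing:
            problems[tool] = f"missing fields: {sorted(missing)}"
    return problems
```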
When to migrate vs. continue maintaining
The decision comes down to a cost comparison: what does maintaining the legacy pattern cost annually (engineering time, degraded output quality, user trust impact), versus what does migration cost (architecture work, testing, rollback risk, user re-onboarding)?
When maintenance cost exceeds migration cost, migrate. The calculation becomes obvious when you put numbers on it.
Legacy RAG Assistant maintaining a manual knowledge base update cycle: 8 hours/month of engineering time. Migration to a hybrid search architecture with automated index updates: 80 hours of architecture work. Break-even: 10 months (80 ÷ 8), assuming the automated updates remove the manual cycle entirely. If the legacy system has 24+ months of life remaining, the migration pays for itself within the first year.
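The break-even arithmetic is simple enough to keep next to the migration proposal. This sketch counts engineering hours only and assumes the migration removes the recurring maintenance work entirely; it deliberately leaves out quality and trust costs, which usually strengthen the case for migrating.

```python
def break_even_months(migration_hours, monthly_hours_saved):
    """Months until the one-time migration effort is repaid by saved maintenance."""
    if monthly_hours_saved <= 0:
        raise ValueError("migration must save some recurring effort")
    return migration_hours / monthly_hours_saved

def net_hours_saved(migration_hours, monthly_hours_saved, remaining_months):
    """Engineering hours saved over the remaining life, after paying for the migration."""
    return monthly_hours_saved * remaining_months - migration_hours

print(break_even_months(80, 8))    # 10.0
print(net_hours_saved(80, 8, 24))  # 112
```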
When the maintenance burden has accumulated to the point where the pattern is actively unreliable, that maintenance cost is no longer just engineering time. It's user trust and business impact. Migration is then urgent, not just economically justified.
See the tech debt article for the debt indicators that signal when maintenance has crossed the threshold into migration territory. See the governance framework for the audit trails that make migration baseline collection possible. And see the hallucination risk article for the failure modes to regression-test specifically during shadow mode.
Migration is the remedy for accumulated debt. Done well, with shadow mode, acceptance criteria, and gradual rollout, it's a routine operation. Done poorly (full cutover, no rollback plan, no user communication), it's an incident waiting to happen.
The teams that migrate well are the teams that treated their first deployment as a v1, not a final answer.
Frequently Asked Questions
What is the Shadow-Parallel-Cutover Sequence?
The Shadow-Parallel-Cutover Sequence is a three-phase migration framework. Phase 1 (Shadow): both systems process the same inputs but only the old system's outputs go to production; new system outputs are logged and compared. Phase 2 (Parallel): a defined percentage of traffic (starting at 1-10%) routes to the new system with defined revert triggers. Phase 3 (Cutover): 100% traffic promotion only after gradual traffic shift across at least three increments clears acceptance criteria. Rollback capability stays live for 30 days after cutover.
What triggers pattern migration rather than continued maintenance?
Five scenarios trigger migration: model deprecation by vendor (the clearest trigger, with AI providers publishing deprecation timelines), significant accuracy degradation where the root cause is model capability rather than data quality, new architecture capabilities that substantially outperform the existing approach (hybrid RAG versus naive vector search shows 20-40% retrieval quality improvement), cost changes that favor a newer approach, and vendor relationship changes including acquisitions, pricing restructures, and shutdowns.
Which AI patterns carry the highest migration risk?
Scoring and Routing has high migration risk because a new model produces a different score distribution, requiring routing threshold recalibration before any production traffic. Autonomous Agent has high migration risk because every tool API in the agent's repertoire needs compatibility verification, and a new agent version may call the same APIs with different parsing, producing unexpected Execute behavior. Personalization Engine has high migration risk because user profile representations from the old system may not transfer to the new architecture.
How long should shadow mode run before cutover?
Minimum 14 days for model-in-place migrations. Minimum 30 days for architecture migrations. The required sample size depends on the detection threshold: to detect a 5% quality difference with 90% statistical power requires 500-700 comparable pairs. At 1,000 queries per month, 30 days produces statistically meaningful data. At 10,000 queries per month, 3 days is sufficient for the statistical requirement but 14 days is still the minimum to catch edge cases and behavioral drift.
Why do embedding model changes require full re-indexing?
Different embedding models produce different vector representations for the same documents. Vectors from one embedding model cannot be compared to vectors from a different model in the same index. Changing the embedding model requires re-embedding every document in the knowledge base before the new model can be used in production. For a 500,000-document knowledge base, full re-indexing is a significant compute event that must be planned as an explicit migration step, not discovered mid-migration.
What is the most common user re-onboarding mistake after AI migration?
Skipping it entirely. When AI behavior changes even for the better, users who've built workflows around the old behavior need to understand what changed. Teams that skip re-onboarding see adoption decline 2-4 weeks after migration as users encounter unexpected behavior and quietly disengage. Re-onboarding does not require a training program. It requires a change note explaining what changed, a feedback channel, and a visible comparison of old versus new output on a real query.
Co-Founder & CMO, Rework