AI Rules for Monolith to Microservices Migration

AI rewrites the entire monolith at once and calls it a migration. Rules for strangler fig pattern, incremental extraction, data migration strategies, dual-write periods, and rollback safety.

8 min read·February 26, 2025

18-month rewrite, 60% feature parity, monolith kept evolving, project cancelled

Strangler fig, incremental extraction, dual-write data sync, percentage cutover, rollback safety

AI Rewrites Instead of Migrating

AI approaches monolith-to-microservices as: a complete rewrite (build the new system from scratch, switch over on launch day), a big-bang cutover (all traffic moves to microservices at once, with no rollback if it fails), no incremental path (the monolith and microservices cannot coexist), data migration as a one-time event (stop the monolith, migrate data, start microservices), and feature parity required before switching (the rewrite must match 100% of monolith features before any traffic moves). Each of these patterns concentrates all the risk in a single launch, which is why big-bang rewrites so often fail.

Modern migration is: incremental (strangler fig pattern — new functionality in microservices, old functionality migrated piece by piece), coexistent (monolith and microservices run simultaneously during transition), rollback-safe (traffic can be routed back to the monolith at any time), data-synchronized (dual-write or change data capture during transition), and feature-flag-controlled (percentage-based traffic routing for gradual cutover). AI generates none of these.

These rules cover: the strangler fig pattern, module extraction sequencing, data migration with dual-write, API gateway routing, feature flag cutover, and rollback safety during migration.

Rule 1: Strangler Fig Pattern for Incremental Migration

The rule: 'Place an API gateway in front of the monolith. Route new features to new microservices. Gradually migrate existing features one module at a time. The monolith shrinks as microservices grow — like a strangler fig tree growing around the host tree. At no point is there a big-bang switch. At every point, the system works. If a microservice has problems, route traffic back to the monolith.'

For the gateway routing: 'API gateway routes by path: /api/orders/* → order-service (migrated), /api/payments/* → payment-service (migrated), /api/users/* → monolith (not yet migrated), /api/products/* → monolith (not yet migrated). As each module is extracted, the gateway route changes. The client sees one API — it does not know which requests go to the monolith and which go to microservices. Migration is invisible to consumers.'
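
The routing table described above can be sketched in a few lines. This is a minimal illustration, not a real gateway configuration: `ROUTES`, `DEFAULT_BACKEND`, and `resolve_backend` are hypothetical names, and a production gateway would express the same idea in its own config format.

```python
from __future__ import annotations

# Hypothetical route table: path prefix -> backend that owns it.
ROUTES = {
    "/api/orders/": "order-service",      # migrated
    "/api/payments/": "payment-service",  # migrated
}
DEFAULT_BACKEND = "monolith"  # everything that has not been migrated yet


def resolve_backend(path: str, routes: dict[str, str] = ROUTES) -> str:
    """Return the backend for a request path using longest-prefix match."""
    matches = [prefix for prefix in routes if path.startswith(prefix)]
    if not matches:
        return DEFAULT_BACKEND
    return routes[max(matches, key=len)]


print(resolve_backend("/api/orders/42"))  # order-service
print(resolve_backend("/api/users/7"))    # monolith
```

Extracting a module means adding one entry to the table; sending its traffic back to the monolith means removing that entry. The client-facing paths never change.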

AI generates: "Let us rewrite the entire application in microservices and switch over when it is done." Timeline: 18 months. Outcome: the rewrite has 60% of features, the monolith has continued evolving, feature parity is impossible, the project is cancelled. Strangler fig: extract the first module in 2 weeks, route traffic, validate, extract the next module. Each extraction delivers value. The migration can pause or stop at any point without waste.

  • API gateway in front of monolith — routes requests to monolith or microservice
  • Migrate one module at a time — each extraction is independently valuable
  • New features go directly to microservices — stop adding to the monolith
  • Migration is invisible to API consumers — same URLs, different backends
  • Can pause or stop at any point — the system works at every stage

💡 First Module in 2 Weeks, Not 18 Months

Big-bang rewrite: 18 months, 60% features, cancelled. Strangler fig: extract the first module in 2 weeks, route traffic, validate, extract the next. Each extraction delivers value independently. The migration can pause at any point without waste.

Rule 2: Module Extraction Sequencing

The rule: 'Prioritize modules for extraction by: (1) least coupled (modules with few dependencies on other modules are easiest to extract), (2) highest change frequency (modules that change often benefit most from independent deployment), (3) clearest domain boundary (modules that map to a bounded context extract cleanly), (4) highest team pain (modules that cause deployment conflicts or slow down the team). Do not start with the most critical module. Start with the easiest one, to build confidence and refine the extraction process.'

For dependency analysis: 'Before extracting, map module dependencies: which modules call this module? Which modules does this module call? What data does it share? Tools: code analysis (find cross-module imports), database analysis (find cross-module table joins), and runtime analysis (trace cross-module API calls). High-dependency modules are extracted later — after their dependencies are already extracted or wrapped behind stable interfaces.'
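
One way to turn the dependency map into an extraction order is to count how many other modules depend on each module and extract the least depended-on first. The sketch below assumes the map has already been produced by code or runtime analysis; the module names and the `DEPENDS_ON` structure are made up for illustration.

```python
from __future__ import annotations

# Hypothetical dependency map: module -> modules it calls.
DEPENDS_ON = {
    "notifications": {"auth"},
    "reporting": {"auth", "orders"},
    "orders": {"auth", "payments"},
    "payments": {"auth"},
    "auth": set(),
}


def extraction_order(depends_on: dict[str, set[str]]) -> list[str]:
    """Sort modules by how many other modules depend on them, fewest first."""
    fan_in = {module: 0 for module in depends_on}
    for callees in depends_on.values():
        for callee in callees:
            fan_in[callee] = fan_in.get(callee, 0) + 1
    # Ties are broken alphabetically so the result is deterministic.
    return sorted(fan_in, key=lambda module: (fan_in[module], module))


print(extraction_order(DEPENDS_ON))
# ['notifications', 'reporting', 'orders', 'payments', 'auth']
```

Leaf modules such as notifications and reporting come first, and the module everything depends on comes last. Fan-in is only one of the criteria in the rule, so treat the output as a starting point rather than a final plan.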

AI generates: "Let us start by extracting the user authentication module" — the module that every other module depends on. Extracting it first requires every other module to be updated simultaneously. Start with a leaf module (notifications, analytics, reporting) that other modules do not depend on. Extract, validate the process, then tackle higher-dependency modules with experience.

Rule 3: Data Migration with Dual-Write

The rule: 'During extraction, the monolith and microservice both need the data. Dual-write pattern: (1) the monolith writes to both the old database and the new service database, (2) validate that both databases stay in sync, (3) gradually shift reads from the old database to the new service API, (4) when all reads use the new service, stop writing to the old database, (5) remove the old database tables. The dual-write period provides: protection against data loss, the ability to roll back, and time to validate correctness. The two writes are not atomic unless they share a transaction, so the validation in step (2) must detect and repair drift rather than assume the databases match.'
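
Steps (1) and (2) can be sketched with two in-memory stores standing in for the databases. This is an illustrative sketch under assumptions: `DualWriter` and its methods are hypothetical names, and treating the old store as the source of truth when the second write fails is one possible policy, not the only one.

```python
class DualWriter:
    """Writes to the monolith store first, then mirrors to the new store."""

    _MISSING = object()  # sentinel so a stored None differs from "absent"

    def __init__(self, old_store, new_store):
        self.old_store = old_store
        self.new_store = new_store
        self.failed_keys = []  # keys whose mirror write failed

    def write(self, key, value):
        # The old store stays the source of truth during the transition.
        self.old_store[key] = value
        try:
            self.new_store[key] = value
        except Exception:
            # Assumed policy: do not fail the user request, repair later.
            self.failed_keys.append(key)

    def find_drift(self):
        """Return the keys whose values differ between the two stores."""
        keys = set(self.old_store) | set(self.new_store)
        return sorted(
            key for key in keys
            if self.old_store.get(key, self._MISSING)
            != self.new_store.get(key, self._MISSING)
        )


old_db, new_db = {}, {}
writer = DualWriter(old_db, new_db)
writer.write("order:1", {"total": 10})
print(writer.find_drift())  # []
```

In a real system `find_drift` would be a scheduled reconciliation job comparing rows or checksums, and reads would move to the new service only after it reports no drift for a sustained period.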

For change data capture (CDC): 'Alternative to dual-write: use CDC (Debezium, AWS DMS) to capture changes from the monolith database and replicate them to the new service database. Advantages over dual-write: no code changes in the monolith (CDC reads the database transaction log), complete capture (every committed change is recorded, including writes made by other systems that share the database), and lower risk (the monolith does not need to know about the new service). Disadvantages: more infrastructure to manage, and replication is asynchronous, so the new database can briefly lag behind the monolith.'
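
On the consuming side, CDC boils down to applying a stream of change events to the new database in order. The sketch below uses a simplified, hypothetical event shape (`op`, `key`, `after`); it is not the actual message schema of Debezium or AWS DMS, and a dict stands in for the new service database.

```python
def apply_change(replica, event):
    """Apply one change event (hypothetical shape) to the replica store."""
    op = event["op"]
    if op in ("create", "update"):
        # Upsert, so replaying the same event twice is harmless.
        replica[event["key"]] = event["after"]
    elif op == "delete":
        replica.pop(event["key"], None)
    else:
        raise ValueError(f"unknown op: {op!r}")


def replay(replica, events):
    """Apply events in log order and return the replica."""
    for event in events:
        apply_change(replica, event)
    return replica


events = [
    {"op": "create", "key": "order:1", "after": {"status": "new"}},
    {"op": "update", "key": "order:1", "after": {"status": "paid"}},
    {"op": "create", "key": "order:2", "after": {"status": "new"}},
    {"op": "delete", "key": "order:2", "after": None},
]
print(replay({}, events))  # {'order:1': {'status': 'paid'}}
```

Writing the handler as an idempotent upsert matters because CDC pipelines commonly redeliver events after a restart.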

AI generates: "Stop the monolith, run a migration script, start the microservice", which means a maintenance window that could last hours. During that window the application is down, the migration script may fail halfway, and rollback means restoring a database backup. Dual-write: zero downtime, continuous validation, and instant rollback (just route traffic back to the monolith). The migration happens while users are actively using the system.

⚠️ Zero Downtime Migration

Stopping the monolith, running a migration script, and starting the microservice means hours of downtime, and rollback means restoring the database. Dual-write: zero downtime, continuous validation, instant rollback by routing traffic back. The migration happens while users actively use the system.

Rule 4: Feature Flag and Percentage-Based Cutover

The rule: 'Route traffic to the new microservice gradually using feature flags or percentage-based routing. Phase 1: 1% of traffic to the microservice (canary: detect critical bugs with minimal impact). Phase 2: 10% (validate performance under moderate load). Phase 3: 50% (A/B comparison: compare latency, error rates, and business metrics). Phase 4: 100% (full cutover). At each phase: compare metrics between monolith and microservice. Roll back to the previous phase if metrics degrade.'
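
Percentage-based routing is often implemented by hashing a stable identifier into a bucket, so the same user always lands on the same backend and raising the percentage only ever adds users. The sketch below assumes routing by user ID; `bucket_for` and `route_for` are hypothetical names.

```python
import hashlib


def bucket_for(user_id: str) -> int:
    """Map a user ID to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def route_for(user_id: str, rollout_percent: int) -> str:
    """Send users whose bucket falls below the rollout percentage to the microservice."""
    if bucket_for(user_id) < rollout_percent:
        return "microservice"
    return "monolith"


# Setting rollout_percent to 0 is the rollback: everyone returns to the monolith.
print(route_for("user-123", 0))    # monolith
print(route_for("user-123", 100))  # microservice
```

Hashing instead of picking randomly per request keeps each user's experience consistent within a phase, which also makes the metric comparison between the two groups cleaner.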

For shadow traffic: 'Before routing any real traffic, use shadow mode: the gateway sends a copy of real requests to the microservice but never returns the microservice responses to clients. Instead, it records them and compares them to the monolith responses. Discrepancies reveal: missing edge cases, data mapping errors, and behavior differences. Shadow testing validates the microservice with real production data patterns without affecting any users.'
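
Shadow mode can be sketched as a wrapper that always answers from the monolith and records any difference in the microservice's answer. The names here are hypothetical, the two backends are passed in as plain callables, and the shadow call is synchronous for brevity; a real gateway would typically mirror the request asynchronously so it adds no latency.

```python
def handle_with_shadow(request, monolith, microservice, mismatches):
    """Serve from the monolith; call the microservice only for comparison."""
    primary = monolith(request)  # the only response the user ever sees
    try:
        shadow = microservice(request)
        if shadow != primary:
            mismatches.append(
                {"request": request, "monolith": primary, "microservice": shadow}
            )
    except Exception as exc:
        # A crash in the shadow path must never affect the real response.
        mismatches.append(
            {"request": request, "monolith": primary, "error": repr(exc)}
        )
    return primary


mismatches = []
old_impl = lambda req: {"total": req["qty"] * 2}
new_impl = lambda req: {"total": req["qty"] * 2}
print(handle_with_shadow({"qty": 3}, old_impl, new_impl, mismatches))  # {'total': 6}
print(mismatches)  # []
```

This is safest for read-only requests. Shadowing requests that write data or trigger side effects (emails, payments) needs the microservice to run against isolated storage or with those side effects stubbed out.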

AI generates: deploy the microservice, switch all traffic at once, hope it works. If it fails: every user is affected, rollback is manual and slow, and the team discovers issues under full production load. Percentage-based cutover: issues affect 1% of users (canary), are detected in minutes (metric comparison), and rollback is instant (route back to monolith). Risk is contained at every step.

  • Canary: 1% traffic to microservice — detect critical bugs with minimal blast radius
  • Gradual increase: 1% → 10% → 50% → 100% with metric validation at each step
  • Shadow traffic: duplicate requests, compare responses, zero user impact
  • Rollback = route change: instant, no redeployment, no data migration reversal
  • Compare: latency, error rate, and business metrics between old and new at each phase
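
The metric comparison at each phase can be automated as a simple gate that either advances the rollout or steps it back. This is a sketch under assumptions: the metric names (`error_rate`, `p95_latency_ms`) and both thresholds are illustrative values, not recommendations.

```python
PHASES = [1, 10, 50, 100]  # rollout percentages from the rule above


def next_rollout_percent(current, monolith, service,
                         max_error_delta=0.005, max_latency_ratio=1.2):
    """Advance one phase if the microservice looks healthy, else step back."""
    degraded = (
        service["error_rate"] - monolith["error_rate"] > max_error_delta
        or service["p95_latency_ms"] > monolith["p95_latency_ms"] * max_latency_ratio
    )
    index = PHASES.index(current)
    if degraded:
        # Step back one phase; from the canary phase, go back to 0%.
        return PHASES[index - 1] if index > 0 else 0
    return PHASES[min(index + 1, len(PHASES) - 1)]


baseline = {"error_rate": 0.001, "p95_latency_ms": 100.0}
failing = {"error_rate": 0.05, "p95_latency_ms": 100.0}
print(next_rollout_percent(1, baseline, baseline))  # 10
print(next_rollout_percent(10, baseline, failing))  # 1
```

Business metrics such as conversion or order completion rates would be added to the same comparison in practice; they are omitted here to keep the sketch short.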

Rule 5: Rollback Safety at Every Stage

The rule: 'At every point during migration, you must be able to roll back to the previous state within minutes. This means: the monolith is still running and receiving traffic (or can be restarted quickly), the monolith database still has current data (dual-write ensures this), the API gateway can route traffic back (one configuration change), and no irreversible changes have been made (data formats remain compatible). The rollback plan is not a document. It is a tested procedure, run in staging before every migration phase.'
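
The four conditions in this rule can be checked mechanically before each phase starts. The sketch below is one hypothetical way to encode them; the `MigrationState` fields and the 5-second staleness threshold are assumptions for illustration, and the values would come from health checks and monitoring in a real setup.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class MigrationState:
    monolith_healthy: bool           # monolith is running and can take traffic
    old_db_lag_seconds: float        # how stale the monolith database is
    gateway_can_route_back: bool     # the route change has been rehearsed
    irreversible_changes: int        # e.g. dropped columns, removed code paths


def rollback_blockers(state: MigrationState,
                      max_lag_seconds: float = 5.0) -> list[str]:
    """Return the reasons a rollback would not be safe right now."""
    blockers = []
    if not state.monolith_healthy:
        blockers.append("monolith is not healthy")
    if state.old_db_lag_seconds > max_lag_seconds:
        blockers.append("monolith database is stale")
    if not state.gateway_can_route_back:
        blockers.append("gateway rollback route is untested")
    if state.irreversible_changes > 0:
        blockers.append("irreversible changes have been applied")
    return blockers


state = MigrationState(True, 0.4, True, 0)
print(rollback_blockers(state))  # []
```

An empty list means the next phase can proceed; anything else should stop the rollout until the blocker is resolved.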

For data compatibility: 'During migration, both the monolith and microservice must understand the same data format. If the microservice writes data in a new format that the monolith cannot read, rollback is impossible. Rule: expand-then-contract. Phase 1 (expand): add new fields/formats while keeping old ones. Phase 2 (migrate): both systems work with both formats. Phase 3 (contract): remove old fields/formats after full cutover. The expand phase ensures rollback safety.'
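
As a concrete illustration of expand-then-contract, suppose a single `name` field is being split into `first_name` and `last_name`. The field names and helper functions below are hypothetical; the point is that during the expand phase every write carries both formats and every reader tolerates either.

```python
def write_user_expanded(first_name, last_name):
    """Expand phase: write the new fields and keep the old one populated."""
    return {
        "name": f"{first_name} {last_name}",  # old format, still read by the monolith
        "first_name": first_name,             # new format
        "last_name": last_name,
    }


def read_display_name(record):
    """Tolerant reader: prefer the new fields, fall back to the old one."""
    if "first_name" in record and "last_name" in record:
        return f"{record['first_name']} {record['last_name']}"
    return record["name"]


new_record = write_user_expanded("Ada", "Lovelace")
old_record = {"name": "Grace Hopper"}  # written before the expand phase
print(read_display_name(new_record))  # Ada Lovelace
print(read_display_name(old_record))  # Grace Hopper
```

Because the old `name` field is still written, routing traffic back to the monolith at any time finds data it can read. Only the contract phase, after full cutover, removes it.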

AI generates: a migration that immediately changes database schemas, removes old code paths, and deploys incompatible data formats. Rollback requires: restoring database backups (hours), redeploying old code (if the branch still exists), and manually reconciling data written during the failed migration. With rollback safety: gateway route change (seconds), monolith processes traffic immediately, dual-written data is already current. Total rollback time: under 1 minute.

ℹ️ Rollback in Under 1 Minute

Without rollback safety: restore database backups (hours), redeploy old code (if it still exists), reconcile data manually. With rollback safety: change one gateway route (seconds), monolith processes traffic immediately, dual-written data is already current. Under 1 minute.

Complete Monolith to Microservices Rules Template

Consolidated rules for monolith to microservices migration.

  • Strangler fig: API gateway routes traffic, migrate one module at a time
  • Extract easiest modules first — build confidence before tackling high-dependency modules
  • Dual-write during transition: monolith and microservice databases stay in sync
  • CDC alternative: Debezium captures changes without monolith code changes
  • Percentage-based cutover: 1% canary → 10% → 50% → 100% with metrics at each step
  • Shadow traffic: compare microservice responses to monolith without affecting users
  • Rollback within minutes at every stage — tested in staging, not just documented
  • Expand-then-contract for data changes: ensure rollback compatibility at every phase