AI Generates Migrations That Lock Tables and Lose Data
Database migrations are the most operationally dangerous code AI generates. A migration that adds a NOT NULL column with a default rewrites and locks the entire table on PostgreSQL 10 and earlier (and on any version when the default is volatile, e.g. gen_random_uuid()); PostgreSQL 11+ handles constant defaults instantly, but AI rarely distinguishes the cases. A migration that renames a column breaks the running application (it still queries the old name). A migration that drops a column permanently destroys data. AI generates all three casually, as if migrations were just code changes rather than operations on live production data.
The fundamental principle: schema changes and application changes never deploy atomically. Either the database changes first and the application adapts later, or vice versa. A migration that only works once the new application code is live will break during the deploy window, while old code is still running against the new schema.
These rules cover: zero-downtime migration patterns, safe column operations, index creation, data backfills, rollback strategies, and migration testing. They apply to any migration tool: Prisma Migrate, Drizzle Kit, Alembic, Rails Migrations, Flyway, or raw SQL.
Rule 1: Two-Phase Column Changes
The rule: 'Never add a NOT NULL column to a populated table in one step. Phase 1: add the column as nullable (ALTER TABLE users ADD COLUMN role TEXT). Deploy the application to write the new column. Phase 2: backfill existing rows in batches (UPDATE users SET role = 'member' WHERE role IS NULL, batched by id range). Phase 3: add the NOT NULL constraint (ALTER TABLE users ALTER COLUMN role SET NOT NULL). Three deploys, zero downtime.'
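The three phases as PostgreSQL statements. The batch pattern and the NOT VALID check-constraint trick (available on PostgreSQL 12+) are sketches using the rule's table and column names:

```sql
-- Phase 1 (migration, deploy 1): nullable column, effectively instant.
ALTER TABLE users ADD COLUMN role TEXT;

-- Phase 2 (script, after the app writes the column): batched backfill.
UPDATE users
SET role = 'member'
WHERE id IN (SELECT id FROM users WHERE role IS NULL LIMIT 5000);
-- repeat until zero rows are updated

-- Phase 3 (migration, deploy 3): enforce NOT NULL. On PostgreSQL 12+,
-- a validated CHECK constraint lets SET NOT NULL skip its full-table
-- scan, so the exclusive lock is held only briefly.
ALTER TABLE users ADD CONSTRAINT users_role_not_null
  CHECK (role IS NOT NULL) NOT VALID;                       -- instant
ALTER TABLE users VALIDATE CONSTRAINT users_role_not_null;  -- scans without blocking writes
ALTER TABLE users ALTER COLUMN role SET NOT NULL;           -- brief lock only
ALTER TABLE users DROP CONSTRAINT users_role_not_null;
```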
For column renames: 'Never rename a column directly. Phase 1: add the new column, write to both old and new. Phase 2: backfill new column from old. Phase 3: update application to read from new column. Phase 4: stop writing to old column. Phase 5: drop old column. Five phases — but zero downtime and zero data loss. A direct rename breaks every query using the old name.'
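A sketch of the five rename phases as PostgreSQL statements; the old column `name` and new column `full_name` are hypothetical names for illustration:

```sql
-- Phase 1 (migration): add the new column; then deploy code that
-- writes both old and new columns.
ALTER TABLE users ADD COLUMN full_name TEXT;

-- Phase 2 (background script): copy old values in batches; rows
-- touched by the dual-write code are already populated.
UPDATE users
SET full_name = name
WHERE id IN (SELECT id FROM users WHERE full_name IS NULL LIMIT 5000);
-- repeat until zero rows are updated

-- Phases 3 and 4 are application deploys: read the new column,
-- then stop writing the old one.

-- Phase 5 (migration): drop the old column once nothing references it.
ALTER TABLE users DROP COLUMN name;
```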
For column drops: 'Never drop a column until the application no longer references it. Phase 1: stop reading the column in code (deploy). Phase 2: stop writing to the column (deploy). Phase 3: drop the column (migration). If you drop before the code stops referencing it, queries fail during the deploy window.'
- Add column: nullable first → backfill → add NOT NULL — three phases
- Rename column: add new → write both → backfill → read new → drop old — five phases
- Drop column: stop reading → stop writing → drop — never drop while code references
- Each phase is a separate deploy — zero downtime between them
- Direct ADD COLUMN ... NOT NULL DEFAULT rewrites and locks the table on PostgreSQL 10 and earlier (volatile defaults on any version) — avoid
ALTER TABLE users ADD COLUMN role TEXT NOT NULL DEFAULT 'member' rewrites and locks the entire table while backfilling every row on PostgreSQL 10 and earlier; even on 11+, ALTER COLUMN ... SET NOT NULL scans the whole table under an exclusive lock. Three phases: add nullable, backfill in batches, add the constraint. Zero downtime, zero long lock.
Rule 2: Create Indexes Without Locking
The rule: 'For PostgreSQL: always use CREATE INDEX CONCURRENTLY — it builds the index without locking writes. Standard CREATE INDEX locks the table against writes for the entire build duration (minutes to hours on large tables). For MySQL (InnoDB 5.6+): most secondary-index builds run online (ALTER TABLE ... ADD INDEX with ALGORITHM=INPLACE, LOCK=NONE), but the brief metadata lock can queue behind long-running transactions — schedule during low traffic, or use pt-online-schema-change or gh-ost for unsupported cases. For SQLite: CREATE INDEX still scans the whole table, but SQLite is single-writer, so there are no concurrency concerns.'
For large tables: 'On tables with millions of rows, even CONCURRENTLY takes time. Schedule during low traffic. Monitor: replication lag (index builds increase write amplification), query latency (index builds consume I/O), and lock waits (CONCURRENTLY still takes brief locks at start and end while waiting for in-flight transactions). Cancel if impacting production: check pg_stat_activity for blocked queries. A cancelled or failed CONCURRENTLY build leaves an INVALID index behind — drop it before retrying.'
AI generates CREATE INDEX users_email_idx ON users(email) — which locks the table on PostgreSQL. One word (CONCURRENTLY) prevents the lock. This is the most impactful single-word addition in any migration rule.
CREATE INDEX CONCURRENTLY prevents table locking during index build. Without it, writes are blocked for the entire build duration — minutes to hours on large tables. One word, one massive production impact.
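A sketch of the safe build plus the two monitoring queries mentioned above, for PostgreSQL (index and table names follow the example in the text):

```sql
-- Build without blocking writes. Caveats: CONCURRENTLY cannot run
-- inside a transaction block (disable your migration tool's transaction
-- wrapper for this step), and a failed build leaves an INVALID index.
CREATE INDEX CONCURRENTLY IF NOT EXISTS users_email_idx ON users (email);

-- Check for INVALID leftovers from a failed or cancelled build:
SELECT indexrelid::regclass AS index_name
FROM pg_index
WHERE NOT indisvalid;

-- While the build runs, watch for queries stuck waiting on locks:
SELECT pid, state, wait_event_type, query
FROM pg_stat_activity
WHERE wait_event_type = 'Lock';
```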
Rule 3: Separate Data Migrations from Schema Migrations
The rule: 'Schema migrations (DDL: CREATE, ALTER, DROP) and data migrations (DML: UPDATE, INSERT, DELETE) are separate operations. Schema migrations run in the deployment pipeline — fast, reversible, no data transformation. Data migrations run as background jobs or scripts — batched, resumable, monitored. Never mix DDL and DML in one migration file — a slow UPDATE in a migration blocks the deployment pipeline.'
For batched backfills: 'Process data in batches: UPDATE users SET role = 'member' WHERE role IS NULL AND id BETWEEN :start AND :end. Batch size: 1000-10000 rows per batch. Sleep between batches (100ms) to reduce lock contention. Log progress: processed 50000/1000000 rows. Make the script resumable: track the last processed ID so you can restart from where you left off.'
AI puts UPDATE ... SET ... in the migration file — which runs during deployment, blocks the migration runner, and can timeout on large tables. Separate the backfill into a script that runs after the migration, in batches, with monitoring.
- Schema (DDL) in migration files — data (DML) in separate scripts/jobs
- Batch updates: 1000-10000 rows — sleep between batches — log progress
- Resumable: track last processed ID — restart from where you left off
- Never UPDATE all rows in one transaction — timeout, lock contention, replication lag
- Migration file: fast, reversible DDL only — data scripts: batched, monitored
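One way to sketch the batched, resumable backfill in plain PostgreSQL is a plpgsql loop; the role IS NULL predicate makes it idempotent, so a restart simply resumes. COMMIT inside a DO block requires PostgreSQL 11+ and running outside an explicit transaction (an assumption to verify against your setup):

```sql
-- Run as a script after the schema migration, never from the
-- migration file itself.
DO $$
DECLARE
  batch_size CONSTANT int := 5000;
  rows_done  int;
BEGIN
  LOOP
    UPDATE users
    SET role = 'member'
    WHERE id IN (
      SELECT id FROM users WHERE role IS NULL ORDER BY id LIMIT batch_size
    );
    GET DIAGNOSTICS rows_done = ROW_COUNT;
    EXIT WHEN rows_done = 0;
    RAISE NOTICE 'backfilled % rows', rows_done;  -- progress logging
    COMMIT;                 -- release locks between batches (PG 11+)
    PERFORM pg_sleep(0.1);  -- throttle to limit replication lag
  END LOOP;
END $$;
```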
Rule 4: Rollback Strategies
The rule: 'Every migration must have a rollback plan. Additive changes (add column, add table, add index) are safe — rollback means the column exists but is unused. Destructive changes (drop column, rename, change type) are not safely reversible — the data is gone. For destructive changes: take a backup before migrating, test on a staging copy of production data, and have a manual rollback SQL script ready.'
For migration tools: 'Prisma: prisma migrate resolve for fixing migration state. Drizzle: manual rollback SQL (Drizzle does not generate down migrations). Alembic: alembic downgrade -1 (runs the migration's downgrade() function — review autogenerated ones). Rails: rails db:rollback (runs the down method, or reverses change). Always test the rollback — an untested rollback is not a rollback plan.'
For the deploy sequence: 'Pre-deploy: take backup. Deploy: run migration. Verify: check application health, query correctness, and data integrity. If broken: rollback migration + rollback application. Post-verify: run data migration (if needed). The entire sequence is documented and practiced — never ad-hoc during an incident.'
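For tools without down migrations (Drizzle, raw SQL), a hand-written rollback script kept next to the migration works. The file names and database name below are illustrative:

```sql
-- forward migration: 0042_add_role.sql
--   ALTER TABLE users ADD COLUMN role TEXT;

-- rollback script: 0042_add_role.down.sql
-- Additive change: rollback is safe, no data at risk.
ALTER TABLE users DROP COLUMN IF EXISTS role;

-- For destructive changes, the real "rollback" is the backup taken
-- pre-deploy, e.g.:
--   pg_dump --format=custom --table=users app_db > users_pre_0042.dump
--   pg_restore --dbname=app_db --table=users --clean users_pre_0042.dump
```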
Rule 5: Migration Testing on Production Data Copy
The rule: 'Test every migration on a copy of production data before running on production. A migration that passes on an empty development database may: lock production tables (millions of rows vs. hundreds), timeout (data backfill takes hours, not seconds), or fail on data that exists in production but not in development (NULLs, edge cases, Unicode, legacy data). Clone production → run migration → verify → measure duration → then run on production.'
For CI: 'Run migrations in CI: spin up a database, apply all migrations from scratch, verify the final schema matches expectations. This catches: migration ordering issues, missing dependencies between migrations, and syntax errors. Use Testcontainers for a real database in CI — never SQLite when production is PostgreSQL.'
For time estimation: 'Measure migration duration on the production data copy. If the migration takes >5 minutes, schedule a maintenance window and notify stakeholders. If it takes >30 minutes, reconsider the approach — can you batch? Can you use CONCURRENTLY? Can you split into smaller migrations? Never surprise-deploy a 2-hour migration.'
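Two PostgreSQL checks help estimate before running; table and column names follow the running example, and \timing is a psql meta-command:

```sql
-- How big is the table this migration touches?
SELECT pg_size_pretty(pg_total_relation_size('users')) AS total_size,
       (SELECT reltuples::bigint
        FROM pg_class WHERE relname = 'users')         AS approx_rows;

-- Then time the real statement on the production copy (in psql):
-- \timing on
ALTER TABLE users ALTER COLUMN role SET NOT NULL;
```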
- Test on production data copy — never just on empty dev database
- Measure duration — if >5min, schedule maintenance window
- CI: apply all migrations from scratch — verify final schema
- Testcontainers for real DB in CI — never SQLite for PostgreSQL schemas
- Staged rollout: staging (prod copy) → canary (subset) → production (full)
A migration that takes 1 second on your 100-row dev database takes 2 hours on your 50-million-row production table. Always test on a production data copy. Measure duration. If >5 minutes, schedule a maintenance window.
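The CI schema check can be a plain query against the information schema after all migrations run from scratch; the expected table is the running example, and the exact assertions belong to your schema:

```sql
-- Assert the final shape of the users table.
SELECT column_name, data_type, is_nullable
FROM information_schema.columns
WHERE table_name = 'users'
ORDER BY ordinal_position;

-- Assert the expected indexes exist.
SELECT indexname
FROM pg_indexes
WHERE tablename = 'users'
ORDER BY indexname;
```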
Complete Database Migration Rules Template
Consolidated rules for database migrations.
- Two-phase columns: nullable → backfill → NOT NULL — never add NOT NULL in one step
- Column rename: add new → write both → backfill → switch reads → drop old
- Column drop: stop reading → stop writing → drop — never while code references
- CREATE INDEX CONCURRENTLY on PostgreSQL — never standard CREATE INDEX on large tables
- Schema (DDL) in migrations — data (DML) in separate batched scripts
- Rollback plan for every migration — backup before destructive changes
- Test on production data copy — measure duration — schedule if >5min
- CI: all migrations from scratch — Testcontainers — verify final schema