AI Rules for Database Design

AI Designs Databases by Intuition

AI generates database schemas with: no normalization (duplicate data across tables, update anomalies), no indexes (full table scans on every query), no foreign keys (orphaned records, referential integrity violations), no migration strategy (ALTER TABLE in production with no rollback plan), and no query analysis (N+1 queries, missing joins, full scans on million-row tables). The schema works for 100 rows. At 100,000 rows, every page load takes 5 seconds.

Modern database design is: normalized (3NF minimum, denormalize intentionally for read performance), indexed (strategic indexes based on query patterns, not guesswork), constrained (foreign keys, unique constraints, NOT NULL — the database enforces data integrity), migration-managed (versioned migrations, reversible, tested in staging), and query-analyzed (EXPLAIN on every new query, index usage verified). AI generates none of these.

These rules cover: normalization to third normal form, strategic index design, foreign key and constraint enforcement, migration-first workflows, and EXPLAIN-driven query optimization.

Rule 1: Normalize to 3NF, Denormalize Intentionally

The rule: 'Start with third normal form (3NF): every non-key column depends on the key, the whole key, and nothing but the key. 1NF: no repeating groups (arrays in columns). 2NF: no partial dependencies (every column depends on the full primary key). 3NF: no transitive dependencies (column A depends on column B which depends on the key — move A to a separate table). Denormalize only when: a specific query needs it, you have measured the performance gain, and you accept the update complexity.'

For intentional denormalization: 'Denormalization trades write complexity for read speed. Examples: storing a computed total on the order (avoid SUM query on every read), storing the author name on the article (avoid JOIN on every read), caching a count on the parent (avoid COUNT query). Each denormalization requires: a mechanism to keep it in sync (trigger, application logic, or materialized view) and documentation explaining why it exists. Without documentation, the next developer normalizes it back.'

AI generates: a users table with: name, email, address_street, address_city, address_state, address_zip, company_name, company_phone, company_address — 3 entities crammed into 1 table. Adding a second address requires adding address2_street, address2_city columns. Normalized: users table, addresses table (one-to-many), companies table (many-to-one). Adding addresses: INSERT a new row. Clean, extensible, no column explosion.

3NF minimum: every column depends on the key, the whole key, nothing but the key
Denormalize intentionally: measured performance gain, documented reason, sync mechanism
Computed columns (order total, item count): acceptable denormalization with triggers or app logic
One-to-many: separate table with foreign key, not array columns or numbered columns
If you denormalize, document why — prevent the next developer from re-normalizing

💡 Adding Addresses: INSERT, Not ALTER

AI crams address fields into the users table: address_street, address_city, address2_street... Adding a third address means more columns. Normalized: addresses table with foreign key to users. Adding addresses: INSERT a new row. Clean, extensible, no column explosion.

Rule 2: Strategic Index Design

The rule: 'Create indexes based on query patterns, not table structure. Index: columns in WHERE clauses, columns in JOIN conditions, columns in ORDER BY, and columns in GROUP BY. Composite indexes: order columns by selectivity (most selective first) and include all columns needed for covering indexes. Do not index: columns with low cardinality (boolean, status with 3 values), columns rarely queried, and small tables (under 1000 rows — sequential scan is faster).'

For composite index design: 'An index on (user_id, created_at) supports: WHERE user_id = X (uses the index), WHERE user_id = X AND created_at > Y (uses the full index), and WHERE user_id = X ORDER BY created_at (index scan, no sort). It does NOT efficiently support: WHERE created_at > Y (leading column not specified). Column order matters: the leftmost column must be in the query for the index to be useful. Design composite indexes to match your most common query patterns.'

AI generates: no indexes, or an index on every column individually. No indexes: every query is a full table scan. Index on every column: write performance degrades (each INSERT updates every index), storage doubles, and the query planner may choose wrong indexes. Strategic indexes: 3-5 indexes per table covering the actual query patterns. EXPLAIN confirms they are used.

Rule 3: Foreign Keys and Constraint Enforcement

The rule: 'Enforce data integrity at the database level, not just in application code. Foreign keys: orders.user_id REFERENCES users(id) ON DELETE CASCADE — the database prevents orphaned orders. Unique constraints: users.email UNIQUE — the database prevents duplicate emails even if two requests arrive simultaneously. NOT NULL: every required field is NOT NULL — the database rejects incomplete records. CHECK constraints: age CHECK (age >= 0 AND age <= 150) — the database rejects invalid values.'

For why database-level constraints: 'Application-level validation can be bypassed: direct database access (admin tools, migrations, data fixes), race conditions (two concurrent requests both check uniqueness, both pass, both insert — duplicate), and bugs (a code path that skips validation). Database constraints are absolute — no code path can violate them. Application validation is for user-friendly error messages. Database constraints are for data integrity guarantees.'

AI generates: no foreign keys ("the ORM handles relationships"), no unique constraints ("we check in the application"), nullable everything ("we will validate later"). One direct database INSERT (migration script, admin tool, debugging session) creates an orphaned order, a duplicate email, or a null required field. The application validation was not there. The database constraint would have been.

Foreign keys with ON DELETE CASCADE or RESTRICT — prevent orphaned records
UNIQUE constraints on business identifiers: email, slug, api_key_hash
NOT NULL on every required field — the database rejects incomplete records
CHECK constraints for value ranges: age >= 0, price >= 0, status IN ('draft', 'published')
Database constraints are absolute — application validation is bypassable

⚠️ Application Validation Is Bypassable

Two concurrent requests both check email uniqueness in code, both pass, both insert — duplicate email. A UNIQUE constraint: the database rejects the second insert regardless of application logic. Database constraints are absolute; application validation is for user-friendly errors.

Rule 4: Migration-First Schema Changes

The rule: 'Every schema change goes through a versioned migration. Never ALTER TABLE directly in production. Migration tools: Drizzle Kit (drizzle-kit generate, drizzle-kit migrate), Prisma Migrate, Flyway, or Liquibase. Each migration: has a version number (sequential or timestamp), is forward-only in production (no editing applied migrations), has been tested in staging (run the migration against a copy of production data), and is reversible (down migration or documented rollback procedure).'

For safe schema changes: 'Expand-then-contract for non-breaking changes: (1) add the new column (nullable or with default), (2) deploy code that writes to both old and new columns, (3) backfill the new column from old data, (4) deploy code that reads from the new column, (5) drop the old column. This sequence has zero downtime and is rollback-safe at every step. Dangerous: renaming a column (breaks reads until code deploys), dropping a column (loses data), and changing a type (may lose precision).'

AI generates: ALTER TABLE users ADD COLUMN phone TEXT NOT NULL — fails if the table has existing rows (no default value). Or worse: ALTER TABLE users DROP COLUMN legacy_field — data gone, no rollback. Migration-first: the change is versioned, tested, and reversible. The expand-then-contract pattern ensures every step is safe to rollback.

Rule 5: EXPLAIN-Driven Query Optimization

The rule: 'Run EXPLAIN ANALYZE on every new query before deploying. Check: is an index being used? (Index Scan, not Seq Scan on large tables), what is the estimated vs actual row count? (large discrepancy means stale statistics), are there nested loops on large tables? (may need a join index), and what is the total execution time? (set a budget: under 50ms for API queries, under 500ms for reports). Fix before deploying, not after users complain.'

For N+1 query detection: 'The most common performance problem AI creates: fetch a list of orders (1 query), then for each order fetch the user (N queries). Total: N+1 queries. Fix: JOIN in the original query, or use the ORM include/with syntax: db.select().from(orders).leftJoin(users). Monitor with: query logging that counts queries per request. Alert if a single request executes more than 10 queries. N+1 detection tools: pganalyze, express-query-counter.'

AI generates: const orders = await db.select().from(ordersTable); for (const order of orders) { order.user = await db.select().from(usersTable).where(eq(usersTable.id, order.userId)); } — 100 orders = 101 queries. With a JOIN: 1 query. Same data, 100x fewer queries, 50x faster. EXPLAIN shows the difference: 101 sequential scans vs 1 index join. Always EXPLAIN before deploying.

ℹ️ 101 Queries to 1 Query

Fetch 100 orders, then fetch each user separately: 101 queries, 100 sequential scans. One JOIN: 1 query, 1 index scan. Same data, 100x fewer queries, 50x faster. EXPLAIN shows the difference instantly. Always EXPLAIN before deploying.

Complete Database Design Rules Template

Consolidated rules for database design.

3NF minimum: normalize first, denormalize intentionally with documentation
Strategic indexes: based on query patterns, composite with selectivity order
Foreign keys, UNIQUE, NOT NULL, CHECK: database-level integrity, not just application
Migration-first: versioned, tested in staging, reversible, never direct ALTER in production
Expand-then-contract: add column, dual-write, backfill, switch reads, drop old column
EXPLAIN ANALYZE on every new query: verify index usage, check execution time
N+1 detection: JOIN or include, monitor queries per request, alert on >10 queries
Query budget: under 50ms for API queries, under 500ms for reports