AI Rules for Data Modeling

AI Maps Objects to Tables Without Thinking

AI generates data models with: one-to-one class-to-table mapping (every TypeScript interface becomes a database table regardless of relationships), no cardinality analysis (is it one-to-one, one-to-many, or many-to-many? AI does not ask), no polymorphic strategy (different entity types crammed into one table or spread across many with no clear pattern), hard deletes (DELETE FROM users — data gone, audit trail broken, foreign keys cascading), and no temporal awareness (only current state stored, history lost on every UPDATE).

Modern data modeling is: relationship-aware (ER diagrams before schema, cardinality defined and constrained), polymorphism-patterned (single-table inheritance, class-table inheritance, or discriminated unions — chosen intentionally), soft-delete-capable (deleted_at timestamp, filtered by default, restorable), temporally-aware (valid_from/valid_to for historical queries, or event sourcing for full history), and evolution-planned (additive changes, backward compatibility, migration-first). AI generates none of these.

These rules cover: entity-relationship modeling, cardinality constraints, polymorphic data patterns, soft delete strategies, temporal data modeling, and schema evolution planning.

Rule 1: Entity-Relationship Analysis Before Schema

The rule: 'Draw an ER diagram before writing any CREATE TABLE. Identify: entities (nouns — User, Order, Product), relationships (verbs — User places Order, Order contains Product), cardinality (one User has many Orders, many Products belong to many Categories), and attributes (which fields belong to which entity — not which class they were in). The ER diagram is the blueprint; the schema implements it.'

For cardinality notation: 'One-to-one (1:1): User has one Profile — consider merging into one table unless they change at different rates. One-to-many (1:N): User has many Orders — foreign key on the many side (orders.user_id). Many-to-many (M:N): Product belongs to many Categories, Category has many Products — junction table (product_categories with product_id + category_id). Every relationship has a cardinality; every cardinality has a schema pattern.'

AI generates: tables based on the TypeScript interfaces with no relationship analysis. A User interface has an orders array — AI creates a users table with a JSON orders column instead of a separate orders table with a foreign key. The ER diagram would have caught this: User 1:N Order requires a separate table. Five minutes of diagramming prevents five hours of refactoring.

ER diagram before schema: entities, relationships, cardinality, attributes
1:1 — consider merging unless entities change at different rates
1:N — foreign key on the many side (orders.user_id references users.id)
M:N — junction table (product_categories with composite primary key)
JSON columns for truly unstructured data — not for relationships

💡 Five Minutes Saves Five Hours

AI creates a users table with a JSON orders column instead of a separate orders table with a foreign key. An ER diagram would catch this: User 1:N Order requires a separate table. Five minutes of diagramming prevents five hours of schema refactoring.

Rule 2: Polymorphic Data Patterns

The rule: 'When multiple entity types share common fields but have type-specific fields, choose a polymorphic pattern. Single-Table Inheritance (STI): one table, discriminator column (type), nullable type-specific columns. Best for: few types, mostly shared fields. Class-Table Inheritance (CTI): base table with shared fields, type-specific tables with foreign key to base. Best for: many type-specific fields, frequent type-specific queries. Discriminated union in application: base type with type field, type narrowing in code.'

For STI example: 'A notifications table with: id, user_id, type (email, sms, push), message, email_subject (null for sms/push), phone_number (null for email/push), device_token (null for email/sms). Three types in one table. Queries: all notifications for a user (one query, no joins), all email notifications (WHERE type = email). Trade-off: nullable columns for type-specific fields, wasted space, but simple queries and no joins.'

AI generates: separate email_notifications, sms_notifications, and push_notifications tables with duplicated shared columns. Querying all notifications for a user requires UNION across three tables. Adding a new notification type means a new table and updating every query. With STI: add a new type value and its specific columns. One table, one query, extensible without schema restructuring.

Rule 3: Soft Delete Patterns

The rule: 'Use soft deletes for data that may need recovery, has audit requirements, or is referenced by other records. Pattern: add deleted_at TIMESTAMP DEFAULT NULL to the table. Active records: WHERE deleted_at IS NULL. Deleted records: WHERE deleted_at IS NOT NULL. Restore: UPDATE SET deleted_at = NULL. The application default query filter excludes soft-deleted records. Admin queries can include them explicitly.'

For implementation: 'ORM/query builder level: create a default scope that filters deleted_at IS NULL. Drizzle: create a wrapper function: const activeUsers = () => db.select().from(users).where(isNull(users.deletedAt)). Every query uses activeUsers() instead of db.select().from(users). Deleted records are invisible by default but accessible when needed. Foreign key considerations: soft-deleted parents should not cascade — the child records remain active and may reference the deleted parent.'

AI generates: DELETE FROM users WHERE id = 123 — the user is gone. Their orders reference a non-existent user. Their activity history is lost. Compliance requires: who deleted what, when, and the ability to restore. Soft delete: the user record stays with deleted_at set. Orders still reference a valid row. History is preserved. Restoration is one UPDATE away.

deleted_at TIMESTAMP DEFAULT NULL — null means active, non-null means deleted
Default query filter: WHERE deleted_at IS NULL on every read query
Restore: UPDATE SET deleted_at = NULL — one query recovery
Index on deleted_at for efficient filtering of active records
Hard delete only for: GDPR erasure, test data cleanup, truly ephemeral data

⚠️ DELETE = Gone Forever

DELETE FROM users WHERE id = 123: the user is gone, orders reference nothing, activity history is lost, compliance is violated. Soft delete (deleted_at timestamp): record stays, references valid, history preserved, restoration is one UPDATE. Hard delete only for GDPR erasure.

Rule 4: Temporal Data Modeling

The rule: 'For data that changes over time and where history matters, use temporal columns: valid_from TIMESTAMP NOT NULL, valid_to TIMESTAMP DEFAULT NULL (null means current). A price change: INSERT new row with valid_from = now, UPDATE previous row with valid_to = now. Current price: WHERE valid_to IS NULL. Price on a specific date: WHERE valid_from <= date AND (valid_to IS NULL OR valid_to > date). Full price history: ORDER BY valid_from.'

For slowly changing dimensions: 'Type 1 (overwrite): UPDATE the value, lose history. Type 2 (add row): INSERT new row with version dates, keep full history. Type 3 (add column): add previous_value column, keep one level of history. Type 2 is the most common for: pricing (what was the price when the order was placed?), addresses (where did we ship to?), and roles (when was the user promoted to admin?). The temporal query answers: what was the state at time T?'

AI generates: UPDATE products SET price = 29.99 — the previous price is overwritten. A customer disputes a charge: what was the price when they ordered? Unknown. With temporal data: the price at order time is preserved. The current price is in the row with valid_to = NULL. Historical prices are in rows with valid_to set. Every question about past state has an answer.

ℹ️ What Was the Price on March 15th?

UPDATE products SET price = 29.99 overwrites the previous price. Customer disputes a charge: what was the price when they ordered? Unknown. Temporal columns (valid_from/valid_to): every past state is preserved. Every historical question has an answer.

Rule 5: Schema Evolution Planning

The rule: 'Design schemas for evolution: prefer additive changes (new tables, new nullable columns) over destructive changes (drop column, rename column, change type). Every schema change should pass the backward compatibility test: can the previous version of the application still work with this schema? If yes, deploy the schema first, then the code. If no, use expand-then-contract to make it backward-compatible.'

For naming conventions: 'Establish naming conventions from day one: snake_case for all identifiers (tables, columns, indexes), plural table names (users, orders, products), _id suffix for foreign keys (user_id, order_id), _at suffix for timestamps (created_at, updated_at, deleted_at), _count suffix for denormalized counts (comment_count), and is_ prefix for booleans (is_active, is_verified). Conventions prevent: naming inconsistencies that accumulate over years, confusion between similar fields, and schema documentation overhead.'

AI generates: inconsistent naming (userId in one table, user_id in another, UserID in a third), no convention for timestamps (created vs createdAt vs creation_date), and no evolution strategy (direct ALTER TABLE in production). Conventions established on day one and enforced in migrations: every developer writes consistent schemas, every query is predictable, and schema diffs in PRs are easy to review.

Complete Data Modeling Rules Template

Consolidated rules for data modeling.

ER diagram before schema: entities, relationships, cardinality, attributes
Polymorphic patterns: STI for few types + mostly shared fields, CTI for many type-specific fields
Soft delete: deleted_at timestamp, default filter, restorable, audit-preserving
Temporal data: valid_from/valid_to for historical queries, Type 2 SCD for full history
Additive schema evolution: new tables, nullable columns — backward compatible by default
Naming conventions from day one: snake_case, plural tables, _id/_at/_count suffixes
JSON columns for truly unstructured data — not a substitute for proper relationships
Five minutes of ER diagramming prevents five hours of refactoring