AI Uses NoSQL Like a Relational Database
AI generates NoSQL schemas with:
- Relational thinking: separate collections for every entity, then joining in application code
- No access pattern analysis: data modeled by entity relationships, not by how it will be queried
- No denormalization: user IDs stored instead of embedded user data, then fetching users in a second query
- Schemaless chaos: documents with inconsistent shapes, missing fields, and type mismatches
- Wrong consistency model: eventual consistency for financial transactions, strong consistency for read-heavy analytics
NoSQL is not schema-less SQL. It is a fundamentally different paradigm.
Modern NoSQL patterns are:
- Access-pattern-driven: the data model is designed around query patterns, not entity relationships
- Denormalized by design: related data is embedded to avoid multi-query lookups
- Schema-validated: Zod or JSON Schema enforced at the application boundary
- Consistency-aware: strong or eventual consistency chosen per use case, not by default
- Single-table where appropriate: DynamoDB single-table design for related access patterns
AI generates none of these.
These rules cover: access-pattern-driven data modeling, embedding vs referencing decisions, single-table design for DynamoDB, schema validation for schemaless databases, consistency model selection, and SQL vs NoSQL decision criteria.
Rule 1: Model Data by Access Pattern, Not Entity
The rule: 'In NoSQL, the data model follows the queries — not the other way around. Start by listing every access pattern: (1) get user by ID, (2) get all orders for a user, (3) get order with items, (4) get recent orders across all users. Design the data shape to serve each pattern in a single query. If pattern 3 needs order + items in one query, embed items inside the order document. If pattern 4 needs a global index, create a GSI on created_at.'
For the modeling process: 'Step 1: list all access patterns with expected frequency and latency requirements. Step 2: group patterns that access related data. Step 3: design documents/items that serve each group in a single read. Step 4: identify patterns that require secondary indexes. Step 5: validate that no pattern requires a full table scan. This process is the opposite of relational design: relational starts with entities and normalizes; NoSQL starts with queries and denormalizes.'
AI generates: a users collection, an orders collection, and an items collection — three separate collections requiring three queries and application-level joins to display an order. In NoSQL: embed items inside the order document. One query returns the order with all its items. The data is duplicated (item names stored in both the catalog and the order) but the read is a single operation. Read optimization over write normalization.
- List all access patterns first — before designing any data structure
- Design documents to serve each pattern in a single read operation
- Embed related data: order contains items, user contains recent addresses
- Secondary indexes for patterns that cross document boundaries
- No pattern should require a full table/collection scan in production
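The query-first process above can be sketched in TypeScript. Everything here is illustrative (an in-memory Map stands in for the document store, and all names are made up); the point is that the document shape is derived from the listed patterns, not from an entity diagram.

```typescript
// Step 1: list the access patterns the model must serve (illustrative).
const accessPatterns = [
  "get user by ID",
  "get all orders for a user",
  "get order with items",       // drives embedding items in the order
  "get recent orders globally", // drives a secondary index on createdAt
];

// Step 3: the order document embeds its items so pattern 3 is one read.
interface OrderDoc {
  orderId: string;
  userId: string;
  createdAt: string;
  items: { sku: string; name: string; qty: number }[]; // denormalized copies
}

// A stand-in for a document store keyed by orderId.
const orders = new Map<string, OrderDoc>();
orders.set("order-456", {
  orderId: "order-456",
  userId: "user-123",
  createdAt: "2026-03-29",
  items: [{ sku: "sku-1", name: "Widget", qty: 2 }],
});

// Pattern 3 is a single lookup: no join, no second query for items.
const order = orders.get("order-456");
```

A relational design would need a second query (or an application-level join) to attach the items; here they arrive with the order.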
Relational: design entities, normalize, then figure out queries. NoSQL: list every access pattern first, then design documents to serve each in a single read. The data model follows the queries — not the other way around. This inversion is the core paradigm shift.
Rule 2: Embedding vs Referencing Decisions
The rule: 'Embed when: the data is always accessed together (order + items), the embedded data is bounded (an order has at most 50 items), and the embedded data belongs to the parent (items do not exist without the order). Reference when: the data is accessed independently (user profile accessed without orders), the referenced data is unbounded (a user may have 10,000 orders), or the referenced data is shared (a product referenced by many orders should not be duplicated in each).'
For the size boundary: 'MongoDB document limit: 16MB. DynamoDB item limit: 400KB. Embedding unbounded arrays eventually hits these limits. A user with 10,000 orders embedded: the document grows to megabytes, every read loads all 10,000 orders, and updates are expensive (rewrite the entire document). Reference: store order IDs on the user (bounded array of recent IDs) or query orders by user_id with a secondary index. Bounded embedding for small sets; referencing for large or growing sets.'
AI generates: everything embedded (document grows unbounded until it hits the size limit) or everything referenced (every read requires multiple queries, defeating the purpose of NoSQL). The decision framework: accessed together + bounded + owned = embed; independent, unbounded, or shared = reference. Apply it per relationship, not globally.
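The framework can be captured as a small helper. This is a sketch, not library code: the three booleans are judgment calls you make per relationship.

```typescript
// Decide embed vs reference for one relationship (illustrative sketch).
type Decision = "embed" | "reference";

function modelRelationship(opts: {
  accessedTogether: boolean; // always read with the parent?
  bounded: boolean;          // hard upper bound on count (e.g. <= 50 items)?
  owned: boolean;            // child cannot exist without the parent?
}): Decision {
  // Embed only when all three hold; any independent, unbounded,
  // or shared relationship is referenced instead.
  return opts.accessedTogether && opts.bounded && opts.owned
    ? "embed"
    : "reference";
}

// Order -> items: read together, at most 50, owned by the order.
modelRelationship({ accessedTogether: true, bounded: true, owned: true });   // "embed"

// User -> orders: unbounded (a user may have 10,000 orders).
modelRelationship({ accessedTogether: false, bounded: false, owned: true }); // "reference"
```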
Rule 3: Single-Table Design for DynamoDB
The rule: 'In DynamoDB, use single-table design: store multiple entity types in one table with structured partition and sort keys. Pattern: PK = USER#123, SK = PROFILE for the user profile. PK = USER#123, SK = ORDER#2026-03-29#456 for a user order. PK = ORDER#456, SK = ITEM#789 for an order item. One table serves: get user profile, get user orders (sorted by date), and get order items — all with single-table queries using PK and SK conditions.'
For GSI overloading: 'Global Secondary Indexes (GSIs) enable additional access patterns. GSI1: PK = ORDER#456, SK = timestamp for fetching orders by ID with timeline. GSI2: PK = STATUS#shipped, SK = timestamp for fetching all shipped orders. The same table and items serve multiple access patterns through different key projections. Design the key schema to support all access patterns without table scans.'
AI generates: separate DynamoDB tables for users, orders, and items — three tables, three query operations, no transactional consistency across tables. Single-table design: one table, one query for related data, TransactWriteItems for multi-entity consistency. The table design is more complex, but every query is a single operation with predictable performance at any scale.
- Structured keys: PK = ENTITY#id, SK = TYPE#sort_value — enables range queries
- Multiple entity types in one table — related data co-located for single queries
- GSI overloading: different access patterns via different index key projections
- TransactWriteItems for multi-entity consistency within one table
- Predictable performance at any scale — no JOINs, no full scans, O(1) key lookups
Separate DynamoDB tables for users, orders, items: three tables, three queries, no cross-table transactions. Single-table design: PK=USER#123 SK=ORDER#date serves user orders in one query. Same table, same items, different key projections via GSIs. Predictable O(1) performance at any scale.
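The key schema can be sketched as plain key-builder functions. The names are illustrative; a real application would feed these values into the AWS SDK's GetItem/Query calls, which are not shown here.

```typescript
// Structured key builders for a single-table design (illustrative sketch).
const userProfileKey = (userId: string) => ({
  PK: `USER#${userId}`,
  SK: "PROFILE",
});

const userOrderKey = (userId: string, isoDate: string, orderId: string) => ({
  PK: `USER#${userId}`,
  // Date-prefixed sort key: a begins_with or BETWEEN condition on SK
  // returns a user's orders already sorted by date.
  SK: `ORDER#${isoDate}#${orderId}`,
});

const orderItemKey = (orderId: string, itemId: string) => ({
  PK: `ORDER#${orderId}`,
  SK: `ITEM#${itemId}`,
});

// GSI overloading (sketch): the same item carries extra attributes that
// a GSI projects as its own keys, serving a different access pattern.
const orderStatusKeys = (status: string, isoTimestamp: string) => ({
  GSI2PK: `STATUS#${status}`, // e.g. "get all shipped orders"
  GSI2SK: isoTimestamp,
});

// "Get all orders for user 123" becomes one Query:
// KeyConditionExpression: PK = :pk AND begins_with(SK, :prefix)
const queryParams = {
  pk: userProfileKey("123").PK, // USER#123
  skPrefix: "ORDER#",
};
```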
Rule 4: Schema Validation for Schemaless Databases
The rule: 'Schemaless does not mean schema-free. Define schemas at the application boundary: Zod schemas validate documents before writing and after reading. MongoDB supports JSON Schema validation at the collection level: db.createCollection("orders", { validator: { $jsonSchema: { ... } } }). Every document conforms to a known shape. Schema evolution: add optional fields (backward compatible), never remove or rename fields without a migration, version documents for multi-schema support.'
For schema evolution: 'Version your document schemas: { schemaVersion: 2, ... }. On read: check the version, apply a migration function if needed. Lazy migration: documents are upgraded to the latest schema when read and written back. This avoids: bulk migration downtime (upgrade documents one at a time as they are accessed) and backward compatibility issues (old schema documents are readable and upgradeable). The schema version is the migration version for each document.'
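Lazy migration can be sketched as a chain of per-version upgrade functions. The two migrations shown (a name-to-username rename, a backfilled createdAt) are hypothetical examples, not a schema from this document.

```typescript
// Lazy schema migration on read (sketch). Each migration upgrades a
// document by one version; documents are upgraded only when touched.
type Doc = { schemaVersion: number; [key: string]: unknown };

const LATEST = 3;

// Hypothetical per-version migration steps: v1 -> v2 -> v3.
const migrations: Record<number, (d: Doc) => Doc> = {
  1: (d) => ({ ...d, schemaVersion: 2, username: d.name, name: undefined }),
  2: (d) => ({ ...d, schemaVersion: 3, createdAt: d.createdAt ?? "unknown" }),
};

function upgradeOnRead(doc: Doc): Doc {
  let current = doc;
  while (current.schemaVersion < LATEST) {
    current = migrations[current.schemaVersion](current);
  }
  // A real store would write `current` back here, so the document is
  // permanently upgraded the next time it is read.
  return current;
}

const legacy: Doc = { schemaVersion: 1, name: "Alice" };
const upgraded = upgradeOnRead(legacy);
```

Old documents stay readable, new documents skip the loop entirely, and no bulk migration ever has to take the database offline.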
AI generates: db.collection.insertOne({ name: 'Alice', age: '25' }) in one place and db.collection.insertOne({ username: 'Alice', age: 25 }) in another — inconsistent field names, inconsistent types. Three months later: querying by name returns partial results because some documents use username. Zod validation at the write boundary prevents this entirely. One schema definition, enforced on every write.
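A minimal sketch of write-boundary validation. In practice a Zod schema (z.object({ username: z.string(), age: z.number() })) plays this role; the hand-written validator below keeps the sketch dependency-free while enforcing the same single shape on every write.

```typescript
// One canonical document shape, enforced at the write boundary (sketch).
interface UserDoc {
  username: string; // one field name, never name-or-username
  age: number;      // a number, never the string "25"
}

function validateUser(input: unknown): UserDoc {
  const doc = input as Record<string, unknown>;
  const username = doc?.["username"];
  const age = doc?.["age"];
  if (typeof username !== "string") throw new Error("username: string required");
  if (typeof age !== "number") throw new Error("age: number required");
  return { username, age };
}

// Every insert goes through the validator; inconsistent writes fail loudly
// instead of silently corrupting the collection. The array stands in for
// a real collection handle.
function insertUser(collection: UserDoc[], input: unknown): void {
  collection.push(validateUser(input));
}

const users: UserDoc[] = [];
insertUser(users, { username: "Alice", age: 25 });  // ok
// insertUser(users, { name: "Alice", age: "25" }); // would throw
```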
Rule 5: SQL vs NoSQL Decision Criteria
The rule: 'Choose NoSQL when: (1) access patterns are known and stable (design the schema around them), (2) horizontal scaling is needed (millions of reads/writes per second), (3) data is naturally document-shaped (nested, hierarchical, variable structure), (4) low-latency key-value access is critical (sub-millisecond reads), (5) schema flexibility is genuinely needed (not just laziness). Choose SQL when: (1) access patterns will evolve (ad-hoc queries, analytics, reporting), (2) data has complex relationships (many-to-many, recursive), (3) transactional integrity across entities is required, (4) the team has relational expertise, (5) data integrity constraints are critical.'
For the common mistake: 'Choosing NoSQL because it is trendy or because SQL seems old leads to: reimplementing JOINs in application code, building ad-hoc query layers on top of NoSQL, and losing transactional guarantees that the application actually needs. Most web applications: SQL (Postgres) is the right choice. Content management, IoT telemetry, user sessions, real-time analytics: NoSQL may be the right choice. The decision is use-case-specific, not technology-trend-driven.'
AI generates: MongoDB for a CRUD application with complex relationships and ad-hoc reporting needs — then builds application-level JOINs, aggregation pipelines for simple queries, and discovers that transactional consistency across collections is painful. Postgres with JSONB columns: relational structure where needed, document flexibility where needed, full SQL query power, and ACID transactions. The best of both worlds for most applications.
Most apps need relational structure somewhere and document flexibility somewhere. Postgres with JSONB columns: SQL JOINs for relationships, JSON for flexible nested data, ACID transactions, and full query power. One database engine, both paradigms.
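The hybrid can be sketched with illustrative SQL, shown here as string constants (the table and column names are assumptions, not a schema from this document): relational columns carry the relationships and constraints, a jsonb column carries the flexible nested data.

```typescript
// Illustrative Postgres DDL: relational integrity plus a jsonb column.
const createOrders = `
  CREATE TABLE orders (
    id       serial PRIMARY KEY,
    user_id  integer NOT NULL REFERENCES users(id), -- real FK, real JOINs
    status   text NOT NULL,
    metadata jsonb NOT NULL                         -- flexible nested data
  )`;

// One query mixes both paradigms: a SQL JOIN on the relation plus a
// filter on a field inside the JSON document (->> extracts as text,
// -> extracts as json).
const recentGiftOrders = `
  SELECT o.id, u.name, o.metadata->'items' AS items
  FROM orders o
  JOIN users u ON u.id = o.user_id
  WHERE o.metadata->>'gift' = 'true'
  ORDER BY o.id DESC
  LIMIT 20`;
```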
Complete NoSQL Patterns Rules Template
Consolidated rules for NoSQL patterns.
- Model by access pattern: list queries first, design documents to serve each in one read
- Embed when: accessed together + bounded + owned. Reference when: independent, unbounded, or shared
- DynamoDB single-table: structured PK/SK, GSI overloading, TransactWriteItems
- Schema validation: Zod at app boundary, JSON Schema at collection level — schemaless is not schema-free
- Document versioning: schemaVersion field, lazy migration on read, no bulk migration downtime
- SQL for: evolving queries, complex relationships, transactions, reporting
- NoSQL for: known access patterns, horizontal scale, document-shaped data, sub-ms reads
- Postgres + JSONB: relational structure + document flexibility for most applications