AI Writes Serverless Like It Is a Server
AI generates serverless functions with:

- Persistent database connections (connection pools that exhaust database limits)
- In-memory state between invocations (global variables that disappear on cold start)
- No cold start awareness (heavy imports and initialization on every invocation)
- No idempotency (retries cause duplicate operations)
- Synchronous long-running operations (functions that time out before completing)

Each of these patterns works on a traditional server but fails in serverless environments.

Modern serverless patterns are:

- Stateless: no shared state between invocations; use external stores
- Cold-start-optimized: minimal imports, lazy initialization, provisioned concurrency
- Connection-pooled: HTTP-based drivers or external poolers like PgBouncer
- Idempotent: safe to retry, because a retried request returns the same result without duplicating side effects
- Timeout-aware: break long operations into steps; use queues for async processing

AI generates none of these.
These rules cover: stateless function design, cold start mitigation, database connection strategies, idempotent operations, timeout handling, and cost-aware architecture.
Rule 1: Stateless Function Design
The rule: 'Every serverless function invocation is independent. Do not rely on: global variables persisting between invocations (they might, via warm instances, but this is not guaranteed), filesystem state (the /tmp directory persists across warm invocations of the same instance, but it is ephemeral and not shared between instances), or in-memory caches (lost on cold start). Use external stores for state: Redis for caching, DynamoDB/Postgres for persistence, S3/R2 for files.'
For warm instance reuse: 'Serverless platforms reuse warm instances for subsequent invocations — global variables and connections may persist. This is an optimization, not a guarantee. Code must work correctly on both: cold start (nothing persists) and warm invocation (globals from previous call exist). Pattern: initialize lazily and check validity: let db: DB | null = null; function getDB() { if (!db) db = createConnection(); return db; }.'
AI generates: let requestCount = 0; handler = () => { requestCount++; return { count: requestCount }; } — expecting requestCount to track across requests. On a warm instance: it increments. On a cold start: it resets to 0. On concurrent invocations: each instance has its own counter. Global mutable state in serverless is a race condition factory. External state (Redis, database) is the only reliable source of truth.
- No global mutable state — each invocation is independent
- External stores for state: Redis (cache), Postgres (persistence), S3 (files)
- Warm instance reuse is an optimization, not a guarantee — code for cold starts
- Lazy initialization: create connections on first use, reuse on warm invocations
- Check connection validity on reuse — connections may have timed out between invocations
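The lazy-initialization pattern above can be sketched as follows. This is a minimal sketch: `createConnection` and the `alive` flag are placeholders standing in for a real driver's connect call and health check.

```typescript
// Lazy-singleton sketch: the connection is created on first use (cold
// start) and reused on warm invocations, with a validity check because
// idle connections can be closed by the database between calls.
type Conn = { alive: boolean; query: (sql: string) => string };

let connectionsCreated = 0;

// Placeholder for a real driver's connect call.
function createConnection(): Conn {
  connectionsCreated++;
  return { alive: true, query: (sql) => `result of ${sql}` };
}

let db: Conn | null = null;

function getDB(): Conn {
  // Recreate if never initialized (cold start) or gone stale (timed out).
  if (!db || !db.alive) db = createConnection();
  return db;
}

// Warm invocations reuse the same connection...
getDB();
getDB();
// ...until it goes stale, which forces exactly one reconnect.
getDB().alive = false;
getDB();
```

The important part is the check on reuse: a warm instance may hold a connection that the database already closed, so the singleton must be able to rebuild itself, not just cache forever.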
Rule 2: Cold Start Mitigation
The rule: 'Minimize cold start time: (1) reduce bundle size (tree-shake imports, avoid importing entire SDKs), (2) lazy-load heavy dependencies (import inside the handler, not at module level), (3) use lighter runtimes (Node.js 20 starts faster than Java; isolate-based edge runtimes start in single-digit milliseconds), (4) provisioned concurrency (keep N instances warm for latency-sensitive endpoints), (5) avoid VPC attachment unless necessary (VPC cold starts once added 1-5 seconds on AWS Lambda; the 2019 Hyperplane ENI rollout cut this dramatically, but VPC networking still adds overhead).'
For import optimization: 'Module-level imports run on every cold start. import AWS from 'aws-sdk' imports the entire AWS SDK (3MB) even if you use only S3. Replace with: import { S3Client } from '@aws-sdk/client-s3' (200KB). Or lazy-import inside the handler: const { S3Client } = await import('@aws-sdk/client-s3'). The lazy import runs only when that code path is hit, not on every cold start.'
AI generates: import everything at the top level — 20 imports, 5MB of dependencies, 2-second cold start. After optimization: 5 essential imports (500KB), heavy dependencies lazy-loaded, cold start drops to 200ms. The function handles 95% of requests from warm instances (zero cold start); the 5% cold starts are 200ms instead of 2 seconds.
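The lazy-load step can be captured in a small helper. Here `loadHeavyDep` is a stand-in for a dynamic `import()` of a large SDK; the helper itself is generic and the returned object is purely illustrative.

```typescript
// Memoize an async loader so the heavy dependency is resolved at most
// once per instance, and only on the code path that actually needs it.
function lazy<T>(load: () => Promise<T>): () => Promise<T> {
  let cached: Promise<T> | null = null;
  return () => (cached ??= load());
}

let loads = 0;
// Stand-in for: lazy(() => import("@aws-sdk/client-s3"))
const loadHeavyDep = lazy(async () => {
  loads++;
  return { name: "heavy-sdk" };
});

// Cold start pays nothing; the first use pays once; later uses are free.
const a = await loadHeavyDep();
const b = await loadHeavyDep();
```

Caching the promise (rather than the resolved value) also means concurrent first calls share one load instead of racing.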
Rule 3: Serverless Database Connection Strategies
The rule: 'Do not create a new database connection per invocation — databases have connection limits (100-500 typically). Strategies: (1) HTTP-based drivers (@neondatabase/serverless, PlanetScale serverless) — no persistent connections, each query is an HTTP request. (2) External connection pooler (PgBouncer, Neon pooler, Supabase pooler) — 100 serverless functions share 10 database connections. (3) Lazy singleton — reuse the connection on warm invocations, create on cold start.'
For Neon serverless: 'import { neon } from "@neondatabase/serverless"; const sql = neon(DATABASE_URL); const users = await sql`SELECT * FROM users`; (note the tagged-template call, which is the driver's query API). Each query is an HTTP request: no connection to manage, no pool to configure, no limit to hit. It works at the edge (no TCP), scales to unlimited concurrency (no connection pool), and cold starts are instant (no connection establishment). The trade-off: slightly higher per-query latency (HTTP vs TCP), offset by zero connection overhead.'
AI generates: const pool = new Pool({ connectionString, max: 10 }) at module level. 1000 concurrent Lambda invocations = 10,000 connection attempts (10 per instance). The database allows 100. 9,900 connections fail. HTTP-based drivers: unlimited concurrency, zero connection management, zero database connection exhaustion.
- HTTP-based drivers: @neondatabase/serverless, PlanetScale — no connection limits
- External pooler: PgBouncer, Neon pooler — many functions share few DB connections
- Lazy singleton: reuse on warm, create on cold — check connection validity
- Never new Pool() per invocation — 1000 functions x 10 connections = database exhaustion
- Edge-compatible: HTTP drivers work everywhere, TCP pools work only in Node.js runtime
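The HTTP-driver idea reduces to: every query is a stateless HTTP request, so there is nothing to pool or exhaust. A minimal sketch, with a hypothetical endpoint and payload shape (real drivers like @neondatabase/serverless define their own wire protocol) and an injected fetch so the example is self-contained:

```typescript
// Each query is a stateless HTTP POST: no connection to open, pool, or
// leak. The URL and JSON body here are hypothetical, not a real protocol.
type FetchLike = (
  url: string,
  init: { method: string; body: string },
) => Promise<{ json: () => Promise<unknown> }>;

async function queryViaHTTP(
  sql: string,
  params: unknown[],
  fetchImpl: FetchLike,
): Promise<unknown> {
  const res = await fetchImpl("https://db.example.com/query", {
    method: "POST",
    body: JSON.stringify({ sql, params }),
  });
  return res.json();
}

// Stub transport for the sketch: echoes the request back as "rows".
const stubFetch: FetchLike = async (_url, init) => ({
  json: async () => ({ rows: [JSON.parse(init.body)] }),
});

const result = await queryViaHTTP("SELECT 1", [], stubFetch);
```

Because no state survives the request, 1,000 concurrent invocations are just 1,000 HTTP requests, which the database-side proxy can multiplex over a handful of real connections.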
Rule 4: Idempotent Operations
The rule: 'Every serverless function must be safe to retry. Serverless platforms retry on: timeout, infrastructure error, and explicit retry configuration. If your function charges a credit card and the platform retries, the customer is charged twice. Idempotency: use an idempotency key (unique request ID) to detect and skip duplicate operations. Pattern: check if the operation with this key was already completed; if yes, return the cached result; if no, execute and store the result with the key.'
For implementation: 'The caller sends an idempotency key (Idempotency-Key header or request body field). The function: (1) checks the idempotency store (Redis or database) for the key, (2) if found, returns the stored result (no re-execution), (3) if not found, executes the operation, stores the result with the key, and returns it. TTL on the idempotency key: 24-48 hours (long enough for retries, short enough to not grow unbounded). Stripe uses this pattern for all payment API calls.'
AI generates: handler = async (event) => { await chargeCard(event.amount); await sendEmail(event.email); } — no idempotency. Platform retries on timeout: customer charged twice, two confirmation emails sent. With idempotency key: the second invocation sees the key in Redis, returns the cached result, skips the charge and email. Same outcome as if the first invocation succeeded cleanly.
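A minimal sketch of the idempotency-key flow. The in-memory Map stands in for Redis; a real deployment needs a shared store with a TTL (e.g. SET key value EX 86400), since a per-instance Map is itself lost on cold start.

```typescript
// Idempotency sketch: the result is stored under the caller-supplied key,
// so a retried invocation replays the stored result instead of
// re-executing the side effect.
const idempotencyStore = new Map<string, unknown>(); // stand-in for Redis

let charges = 0;
async function chargeCard(amount: number): Promise<{ charged: number }> {
  charges++; // the side effect we must not duplicate
  return { charged: amount };
}

async function handleCharge(key: string, amount: number): Promise<unknown> {
  if (idempotencyStore.has(key)) return idempotencyStore.get(key); // replay
  const result = await chargeCard(amount);
  idempotencyStore.set(key, result);
  return result;
}

// First call executes; the platform's retry replays the stored result.
const first = await handleCharge("req-123", 50);
const retry = await handleCharge("req-123", 50);
```

The check-then-execute sequence shown here has a small race window between check and store; production implementations close it with an atomic operation (e.g. Redis SET NX).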
Rule 5: Timeout-Aware Execution
The rule: 'Serverless functions have hard timeout limits: AWS Lambda 15 minutes, Vercel Functions 60 seconds (Hobby) or 300 seconds (Pro), Cloudflare Workers 30 seconds of CPU time (wall-clock time waiting on I/O does not count); limits vary by plan and change over time, so check current platform docs. Design for timeouts: (1) break long operations into steps (Step Functions, queues), (2) check remaining time and checkpoint progress (context.getRemainingTimeInMillis() on Lambda), (3) set client timeouts shorter than function timeouts (client retries before the function dies), (4) use queues for async processing (return 202 Accepted, process in background).'
For the queue pattern: 'For operations longer than your timeout: (1) accept the request, (2) enqueue the work (SQS, BullMQ, Inngest), (3) return 202 Accepted with a job ID, (4) process the work in a separate function invocation (or multiple invocations for steps), (5) client polls for completion or receives a webhook. This pattern handles: image processing, report generation, batch imports, and any operation that may exceed the timeout.'
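The queue pattern above can be sketched with an in-memory array standing in for SQS or BullMQ; the job-id format and the worker trigger are illustrative, since in a real system the queue service invokes the worker.

```typescript
// 202-Accepted sketch: the HTTP handler only enqueues and returns a job
// id; a separate worker invocation does the slow work, free of the HTTP
// request's timeout.
type Job = { id: string; payload: string; status: "queued" | "done" };
const queue: Job[] = []; // stand-in for SQS / BullMQ / Inngest

function acceptRequest(payload: string): { status: 202; jobId: string } {
  const id = `job-${queue.length + 1}`; // hypothetical id scheme
  queue.push({ id, payload, status: "queued" });
  return { status: 202, jobId: id }; // client polls or receives a webhook
}

// Runs as a separate invocation, triggered by the queue service.
function workerTick(): void {
  const job = queue.find((j) => j.status === "queued");
  if (job) job.status = "done";
}

const accepted = acceptRequest("generate-report");
workerTick();
```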
AI generates: a single function that processes 10,000 records synchronously. At record 5,000, the function times out. No checkpoint, no progress saved, no partial result. On retry: starts from record 1 again. With checkpointing: save progress after each batch of 100, resume from the last checkpoint on retry. Or with a queue: each record is a separate job, processed independently, no timeout risk.
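The checkpointing approach can be sketched like this. The checkpoint lives in a variable here for the sake of the example; in practice it must be persisted (DynamoDB, Postgres) so a retried invocation can read it, and remainingTimeMs would come from the platform (e.g. context.getRemainingTimeInMillis on Lambda).

```typescript
// Checkpointed batch processing: save progress after each batch so a
// timed-out run resumes from the last checkpoint instead of record 1.
let checkpoint = 0; // index of the first unprocessed record
const processed: number[] = [];

function processRecords(
  records: number[],
  batchSize: number,
  remainingTimeMs: () => number, // platform-provided in real code
): boolean {
  while (checkpoint < records.length) {
    if (remainingTimeMs() < 1000) return false; // stop before the hard timeout
    const batch = records.slice(checkpoint, checkpoint + batchSize);
    processed.push(...batch);
    checkpoint += batch.length; // persist this in a real store
  }
  return true; // all records done
}

const records = Array.from({ length: 10 }, (_, i) => i);
// First invocation "times out" after two batches; the retry resumes.
let budget = 2;
const firstRun = processRecords(records, 2, () => (budget-- > 0 ? 5000 : 0));
const retryRun = processRecords(records, 2, () => 5000);
```

Returning early instead of being killed mid-batch is the key property: the function always exits at a known checkpoint, so a retry is a resume, not a restart.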
Complete Serverless Patterns Rules Template
Consolidated rules for serverless patterns.
- Stateless design: no global mutable state — external stores for all state
- Cold start optimization: minimal imports, lazy-load heavy deps, provisioned concurrency
- HTTP-based database drivers: no connection pool exhaustion, edge-compatible
- Idempotent operations: idempotency key prevents duplicate side effects on retry
- Timeout-aware: break long work into steps, checkpoint progress, use queues
- Return 202 Accepted for async work — process via queue, notify via webhook
- Warm instance reuse is optimization, not guarantee — always code for cold starts
- Cost awareness: pay per invocation + duration — optimize both execution time and memory