Scale-to-Zero vs Always-On Compute
Lambda (serverless functions): your code runs only when invoked. Between invocations: no compute is running, no cost is accruing. A request arrives: Lambda starts a new instance (cold start) or reuses a warm instance (warm start). After the response: the instance may stay warm for minutes, then shuts down. Billing: per invocation + execution duration. 1 million invocations at 200ms each: pay for 200,000 seconds of compute. Zero traffic: $0. Platforms: AWS Lambda, Vercel Functions, Cloudflare Workers, Google Cloud Functions.
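The pay-per-use billing model can be sketched as a small cost function. This is a rough illustration: the price constants below are assumptions chosen for the example, not current vendor list prices, and real bills add data transfer, memory rounding, and free tiers.

```typescript
// Illustrative per-invocation billing model (prices are assumptions).
const PRICE_PER_MILLION_INVOCATIONS = 0.2; // USD, assumed
const PRICE_PER_GB_SECOND = 0.0000166667; // USD, assumed

function lambdaMonthlyCostUSD(
  invocations: number,
  avgDurationMs: number,
  memoryGB: number,
): number {
  const requestCost = (invocations / 1_000_000) * PRICE_PER_MILLION_INVOCATIONS;
  // Compute is billed as duration x allocated memory (GB-seconds).
  const gbSeconds = invocations * (avgDurationMs / 1000) * memoryGB;
  return requestCost + gbSeconds * PRICE_PER_GB_SECOND;
}

// The scale-to-zero property: zero invocations means zero cost.
console.log(lambdaMonthlyCostUSD(0, 200, 0.128));
```

Zero traffic produces a bill of exactly $0; a container with the same workload would bill for every idle hour.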
Containers: your code runs persistently in Docker containers orchestrated by ECS, EKS, or Kubernetes. The containers: are always running, always ready to handle requests (no cold start). Between requests: the container idles but still uses resources (CPU and memory allocated). Billing: per container-hour or per resource allocation (CPU + memory x uptime). A container running 24/7: costs the same whether it handles 0 or 10,000 requests. Platforms: Docker + ECS/EKS, Kubernetes, Cloud Run, Fly.io.
Without compute model rules: the AI generates Lambda-style stateless handlers in a container project (missing the persistent connection advantage), container-style persistent state in Lambda (state is lost between invocations), heavyweight initialization in Lambda (slow cold starts), or always-on pricing assumptions in Lambda cost estimates (Lambda pricing is per-invocation, not per-hour). The compute model determines: initialization strategy, connection management, state handling, and cost optimization.
Cold Starts: Lambda Problem, Container Non-Issue
Lambda cold start: the first invocation after idle starts a new instance. Cold start includes: loading the runtime (Node.js: 50-100ms), loading your code (bundle size matters: 1MB = 100ms, 10MB = 500ms), and initialization (database connection, config loading: varies). Total cold start: 200ms-2s depending on language, bundle size, and initialization. Warm invocations: instant (the instance is already running). AI rule: 'Lambda: minimize bundle size (tree-shake, no unused imports). Lazy-load heavy dependencies. Initialize connections outside the handler (reuse on warm invocations). Cold start budget: under 500ms.'
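The "initialize connections outside the handler" rule can be shown as a small pattern. This is a sketch: `connectToDb` and `handler` are hypothetical names, and the synchronous connect is a simplification (a real driver connect is async and would be awaited the same way).

```typescript
// Lambda pattern sketch: module-scope state survives warm invocations.
type Db = { query: (sql: string) => string };

let coldInits = 0; // counts how often we actually "connect"

function connectToDb(): Db {
  coldInits++; // a real implementation would open the connection here
  return { query: (sql) => `result of ${sql}` };
}

// Module scope: created once per instance, reused across warm invocations.
// Lazy, so a cold start only pays for it when the handler actually runs.
let db: Db | null = null;

function handler(): string {
  db ??= connectToDb(); // cold start: connect; warm start: reuse
  return db.query("SELECT 1");
}
```

Calling `handler()` repeatedly on the same instance connects only once; a new instance (cold start) resets the module scope and connects again.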
Container cold start: effectively zero for running containers. The container: starts once (during deployment), stays running, and handles all requests without cold start. Container restart: only on deployment, crash, or scaling up new instances. Scaling up new instances: 5-30 seconds (container image pull + startup), but existing instances handle traffic during scale-up. AI rule: 'Containers: initialize everything at startup (database pools, config, caches). Cold start is a deployment concern, not a per-request concern. Optimize startup time for deployment speed, not invocation speed.'
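The container rule is the mirror image: do everything eagerly at module load, before the first request. A minimal sketch, where `loadConfig` and `buildPool` are hypothetical placeholders for real config loading and pool construction:

```typescript
// Container startup sketch: eager initialization at module load.
type Config = { dbUrl: string };
type Pool = { size: number };

function loadConfig(): Config {
  return { dbUrl: "postgres://example" }; // assumed value for the sketch
}

function buildPool(config: Config): Pool {
  return { size: 10 }; // a real pool would open TCP connections here
}

// Runs once, at container startup. Slow work here only delays deployment,
// never an individual request.
const config = loadConfig();
const pool = buildPool(config);

function handleRequest(): string {
  // Nothing to initialize per request; everything is already warm.
  return `served with pool of ${pool.size}`;
}
```

In Lambda this eager top-level work would be paid on every cold start; in a container it is paid once per deployment.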
The cold start rule prevents: the AI adding heavyweight initialization inside the Lambda handler (runs on every cold start; move it outside the handler), lazy-loading dependencies in a container (unnecessary; load everything at startup for fastest request handling), or assuming instant response in Lambda (the first request after idle has cold start latency). Cold start optimization is: critical for Lambda, irrelevant for containers.
- Lambda cold start: 200ms-2s (runtime + code + init). Minimize bundle, lazy-load, init outside handler
- Container cold start: zero for running containers. Initialize everything at startup
- Lambda: cold start is a per-request concern (first request after idle). Container: deployment concern only
- Lambda optimization: tree-shake imports, lazy-load heavy deps, connection outside handler
- Container optimization: startup speed for deployment, not per-request (already warm)
Lambda cold start: runs on every first invocation. Minimize bundle, lazy-load, init connections outside the handler. Container cold start: runs once at deployment. Initialize everything at startup (pools, caches, config) for fastest per-request handling. Different optimization targets for different compute models.
Database Connections: HTTP per-Query vs Persistent Pool
Lambda database connections: the biggest Lambda challenge. Each Lambda instance: may open a database connection. 1000 concurrent Lambda invocations: 1000 database connections (database limit typically 100-500). Solutions: HTTP-based drivers (@neondatabase/serverless: no persistent connection, each query is HTTP), connection poolers (PgBouncer, Neon pooler, RDS Proxy: many Lambda instances share a few database connections), or connection reuse (initialize outside the handler, reuse on warm invocations; the connection may still time out between invocations). AI rule: 'Lambda: HTTP database driver (@neondatabase/serverless) or external pooler. Never new Pool() per invocation. Lazy connection init outside the handler.'
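The connection-count arithmetic can be made concrete with a toy simulation. `FakeDb` just counts open connections; the pool size of 20 is an assumed pooler setting, not a recommendation:

```typescript
// Toy model: count database connections under the two strategies.
class FakeDb {
  open = 0;
  connect(): void {
    this.open++;
  }
}

const DB_LIMIT = 500; // typical managed-database connection cap

// Anti-pattern: 1000 concurrent invocations each open their own connection.
const direct = new FakeDb();
for (let i = 0; i < 1000; i++) direct.connect();
// direct.open is now 1000: double the database limit.

// Pooler pattern: all 1000 invocations share a fixed upstream pool.
const pooled = new FakeDb();
const POOL_SIZE = 20; // pooler's upstream connections, assumed
for (let i = 0; i < POOL_SIZE; i++) pooled.connect();
// pooled.open stays at 20, well under the limit.
```

Same traffic, two orders of magnitude difference in database load: that is what the pooler (or an HTTP driver, which opens no persistent connections at all) buys you.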
Container database connections: standard connection pools. const pool = new Pool({ min: 2, max: 10 }) at application startup. The pool: persists for the container lifetime. 10 containers with max 10 connections each: 100 total connections (predictable, manageable). Connection reuse: handled by the pool (connections are: borrowed, used, returned). No per-request connection overhead. AI rule: 'Containers: connection pool at startup (min: 2, max: 10). Pool persists for container lifetime. Standard pg/mysql2/ioredis drivers. Connection reuse is automatic.'
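The container math is a simple capacity check: containers x max pool size must stay under the database's connection limit. A minimal sketch of that check (function names are hypothetical):

```typescript
// Capacity check: total connections a container fleet can open.
function totalConnections(containers: number, maxPerPool: number): number {
  return containers * maxPerPool;
}

function fitsDbLimit(
  containers: number,
  maxPerPool: number,
  dbLimit: number,
): boolean {
  return totalConnections(containers, maxPerPool) <= dbLimit;
}
```

Ten containers with `max: 10` pools total 100 connections and fit a 500-connection database; scale the fleet to 100 containers and the same pool config no longer fits, so either the pool max shrinks or a shared pooler goes in front.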
The connection rule prevents: the AI creating new Pool() inside a Lambda handler (connection pool created and destroyed per invocation; the pool never reuses connections), using HTTP database drivers in containers (unnecessary; TCP pools are faster and standard), or assuming unlimited connections in Lambda (1000 Lambdas x new connection each = database overwhelmed). The connection strategy is: the most critical difference between Lambda and container architectures.
- Lambda: HTTP driver or external pooler. Never per-invocation Pool(). Connection outside handler
- Containers: TCP connection pool at startup. min: 2, max: 10. Persistent for container lifetime
- Lambda risk: 1000 instances x 1 connection = 1000 connections (DB limit: 100-500)
- Container predictable: 10 containers x 10 max = 100 connections (manageable)
- Most critical difference: connection management strategy determines reliability at scale
1000 concurrent Lambda invocations each creating a new database connection: 1000 connections. Database limit: 100-500. Database overwhelmed. Fix: HTTP database driver (each query = HTTP request, no persistent connection) or external pooler (PgBouncer, RDS Proxy). The connection strategy determines: Lambda reliability at scale.
State: Stateless by Design vs Persistent State
Lambda state handling: stateless by design. Each invocation: should be independent (no shared state between invocations guaranteed). Global variables: may persist on warm instances (useful for connection reuse) but are NOT guaranteed (cold start resets everything). State must be: external (Redis, DynamoDB, S3, database). AI rule: 'Lambda: stateless handlers. External state: Redis for cache, DynamoDB for session, S3 for files. Global variables: for connection reuse only (not reliable state). Each invocation: must work independently.'
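A stateless handler keeps all state behind a store interface, so it works identically on a cold instance, a warm instance, or a hundred concurrent instances. In this sketch a Map stands in for Redis or DynamoDB; `SessionStore`, `makeHandler`, and the visit counter are hypothetical names for illustration:

```typescript
// Stateless Lambda-style handler: all state lives in an external store.
interface SessionStore {
  get(key: string): string | undefined;
  set(key: string, value: string): void;
}

function makeHandler(store: SessionStore) {
  // No module-level mutable state: every invocation reads and writes
  // the store, so no warm-instance reuse is ever assumed.
  return (sessionId: string): string => {
    const count = Number(store.get(sessionId) ?? "0") + 1;
    store.set(sessionId, String(count));
    return `visit ${count}`;
  };
}

// Map-backed store for local testing; production would inject a
// Redis or DynamoDB client behind the same interface.
const mapStore = new Map<string, string>();
const handler = makeHandler({
  get: (k) => mapStore.get(k),
  set: (k, v) => void mapStore.set(k, v),
});
```

Because the handler owns no state, losing the instance between invocations loses nothing: the next invocation, on any instance, sees the same store.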
Container state handling: in-memory state persists for the container lifetime. Caches (Map, LRU cache): persist across requests (useful for: config caching, frequently-accessed data). Session state: can be in-memory (if sticky sessions are used) or external (Redis for multi-container). File system: writable (temp files persist across requests). AI rule: 'Containers: in-memory caching at the process level (LRU cache, Map). File system writable for temp files. External state: Redis for multi-container consistency. In-memory state: fastest but lost on container restart.'
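The process-level LRU cache a container can rely on is small enough to sketch directly. This version exploits the fact that a JavaScript Map preserves insertion order, so the first key is always the least recently used:

```typescript
// Minimal process-level LRU cache for a container. In-memory only:
// fast, shared across requests, lost on container restart.
class LruCache<K, V> {
  private map = new Map<K, V>();

  constructor(private capacity: number) {}

  get(key: K): V | undefined {
    if (!this.map.has(key)) return undefined;
    const value = this.map.get(key)!;
    this.map.delete(key); // re-insert to mark as most recently used
    this.map.set(key, value);
    return value;
  }

  set(key: K, value: V): void {
    if (this.map.has(key)) {
      this.map.delete(key);
    } else if (this.map.size >= this.capacity) {
      // Evict least recently used: the first key in insertion order.
      this.map.delete(this.map.keys().next().value!);
    }
    this.map.set(key, value);
  }
}
```

In Lambda this cache would silently reset on every cold start and diverge across instances; in a container it is a legitimate, fast optimization for config and hot reads.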
The state rule prevents: the AI relying on in-memory state in Lambda (lost on cold start, not shared between instances), using file system in Lambda for persistent data (ephemeral /tmp, lost on cold start, small size limit), or assuming external state in containers when in-memory would suffice (Redis for data that could be a process-level Map; over-engineering). Lambda: always external state. Containers: in-memory for performance, external for durability and multi-container consistency.
- Lambda: stateless. External state: Redis, DynamoDB, S3. Global vars: connection reuse only
- Containers: in-memory state persists across requests. LRU cache, Map for process-level caching
- Lambda /tmp: ephemeral, small, lost on cold start. Container filesystem: writable, persists for the container lifetime (lost on restart)
- Lambda: every invocation must work independently (no guaranteed warm instance reuse)
- Containers: in-memory for performance, external (Redis) for multi-container consistency
When to Choose Each Compute Model
Choose Lambda when: traffic is unpredictable or bursty (scale-to-zero during quiet periods, scale up during peaks), you want pay-per-use (zero traffic = $0 โ ideal for: side projects, staging environments, and infrequent APIs), the functions are: short-lived (under 15 minutes), stateless, and lightweight, or you use: Next.js on Vercel (Vercel Functions are Lambda-based, optimized for Next.js). Lambda is: the cost-optimized choice for variable traffic and stateless workloads.
Choose containers when: traffic is steady and predictable (always-on containers are: cheaper per request than Lambda at high, consistent traffic), you need: persistent connections (database pools, WebSocket connections, long-lived processes), the application has: startup overhead (heavy initialization that would cause Lambda cold starts), or you need: full control (custom runtimes, GPU access, persistent state, filesystem access). Containers are: the control-maximized choice for predictable, stateful workloads.
The break-even: Lambda is cheaper until approximately 1 million requests per day at typical durations. Above that: containers with reserved capacity are cheaper. Below that: Lambda's per-invocation pricing wins. The AI rule should specify: which compute model the project uses, so the AI generates: Lambda-appropriate handlers (stateless, connection-aware, cold-start-optimized) or container-appropriate code (stateful, pooled connections, startup-optimized).
- Lambda: unpredictable traffic, pay-per-use, stateless, under 15min execution, Vercel/Netlify
- Containers: steady traffic, persistent connections, heavy init, full control, GPU access
- Break-even: ~1M requests/day. Below = Lambda cheaper. Above = containers cheaper
- Lambda default: side projects, staging, APIs with variable traffic, Next.js on Vercel
- Container default: production APIs with steady traffic, WebSocket apps, stateful services
Below ~1M requests/day: Lambda is cheaper (pay per invocation, zero traffic = $0). Above: containers with reserved capacity are cheaper per request. Side projects and variable traffic: Lambda. Production APIs with steady traffic: containers. The compute model rule tells the AI: which billing model to optimize for.
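The break-even point above can be reproduced with back-of-envelope arithmetic. The constants here are illustrative assumptions (roughly a sub-microsecond-scale per-request cost and a small always-on service), not vendor list prices, but they land the crossover near the ~1M requests/day figure in the text:

```typescript
// Back-of-envelope break-even between per-invocation and always-on pricing.
const LAMBDA_COST_PER_REQUEST = 0.0000009; // USD, assumed (request + compute)
const CONTAINER_COST_PER_MONTH = 30; // USD, assumed small always-on service

function cheaperModel(requestsPerDay: number): "lambda" | "container" {
  const lambdaMonthly = requestsPerDay * 30 * LAMBDA_COST_PER_REQUEST;
  // Container cost is flat regardless of traffic.
  return lambdaMonthly < CONTAINER_COST_PER_MONTH ? "lambda" : "container";
}
```

With these assumed prices the crossover sits at roughly 1.1M requests/day; the exact number moves with memory size, duration, and reserved-capacity discounts, which is why the rule should name the project's actual compute model rather than rely on a generic threshold.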
Compute Model Rule Summary
Summary of Lambda vs container AI rules.
- Cold start: Lambda 200ms-2s (minimize bundle, lazy-load). Containers: zero (init at startup)
- Connections: Lambda HTTP driver or pooler (never per-invocation pool). Containers: TCP pool at startup
- State: Lambda stateless (external only). Containers: in-memory + external for durability
- Billing: Lambda per-invocation (zero = $0). Containers per-uptime (always running = fixed cost)
- Scaling: Lambda per-invocation (near-instant when warm, cold start latency otherwise). Containers: 5-30s per new instance
- Lambda for: variable traffic, stateless, side projects. Containers for: steady traffic, stateful, persistent
- Break-even: ~1M requests/day. Lambda below, containers above
- Connection management is the critical rule: Lambda + wrong driver = database overwhelmed at scale