AI Exposes Every Service Directly
AI generates microservice architectures with: every service publicly accessible (users-api.company.com, orders-api.company.com, payments-api.company.com, and so on — 6 different endpoints for the client to manage), no centralized rate limiting (each service implements its own, inconsistently), no centralized auth (each service validates tokens independently — 6 implementations of the same logic), no response aggregation (client makes 3 API calls to render one dashboard page), and no request transformation (clients must know the internal API structure of each service).
An API gateway solves this by: providing a single entry point (api.company.com for all services), centralizing cross-cutting concerns (rate limiting, auth, logging, CORS — configured once), aggregating responses (one client request, gateway fans out to multiple services, returns a combined response), transforming requests (public API shape differs from internal service shape), and enabling service evolution (move endpoints between services without changing the client). AI generates none of these.
These rules cover: request routing to backend services, rate limiting at the gateway, authentication offloading, response aggregation, request/response transformation, and gateway selection criteria.
Rule 1: Centralized Request Routing
The rule: 'Route all client requests through a single API gateway endpoint. The gateway routes by path prefix: /api/users/* → users-service, /api/orders/* → orders-service, /api/payments/* → payments-service. The client knows one hostname (api.company.com); the gateway knows where each service lives. Service URLs can change (new deployment, different port, different cluster) without any client changes — update the gateway routing table.'
For path-based vs header-based routing: 'Path-based (/api/v1/users → users-service): simple, visible, cacheable by CDN. Header-based (X-Service: users → users-service): flexible but invisible in URLs and harder to debug. Version-based (/api/v1/* → v1-services, /api/v2/* → v2-services): enables version-level routing for API migrations. Use path-based as the default; header-based for advanced scenarios like canary deployments (route 5% of traffic to a new service version by header).'
AI generates: the client hardcodes 6 different service URLs. Service A moves to a new port: update the client, deploy, wait for users to refresh. With a gateway: update the routing table, zero client changes. The gateway is the indirection layer that decouples clients from service topology. Service refactoring (splitting one service into two) changes the routing table, not the client.
- Single entry point: api.company.com for all services — one hostname for clients
- Path-based routing: /api/users/* → users-service, /api/orders/* → orders-service
- Service URL changes: update gateway routing, zero client changes needed
- Version routing: /api/v1/* vs /api/v2/* for API migration periods
- Canary routing: 5% of traffic to new service version via header or weight
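The routing table above can be sketched as a longest-prefix lookup. This is a minimal illustration, assuming three hypothetical internal service URLs — real gateways (Kong, Traefik, Nginx) express the same table as configuration rather than code:

```typescript
// Routing table: public path prefix → internal service URL.
// Hostnames and ports here are illustrative assumptions.
const routes: Record<string, string> = {
  "/api/users": "http://users-service.internal:3001",
  "/api/orders": "http://orders-service.internal:3002",
  "/api/payments": "http://payments-service.internal:3003",
};

// Resolve a public path to the internal URL the gateway proxies to.
// Longest-prefix match, so a more specific prefix wins if one exists.
function resolveRoute(path: string): string | null {
  const prefix = Object.keys(routes)
    .filter((p) => path === p || path.startsWith(p + "/"))
    .sort((a, b) => b.length - a.length)[0];
  return prefix ? routes[prefix] + path.slice(prefix.length) : null;
}
```

When users-service moves to a new port or cluster, only the `routes` map changes — no client is redeployed.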
Rule 2: Rate Limiting at the Gateway
The rule: 'Implement rate limiting at the API gateway, not in individual services. The gateway is the first point of contact — it rejects excessive requests before they reach any service. Rate limit by: API key (100 requests/minute per key), IP address (fallback for unauthenticated requests), endpoint (different limits for search vs CRUD), and tier (free: 100/min, pro: 1000/min, enterprise: 10000/min). Return 429 Too Many Requests with Retry-After header.'
For distributed rate limiting: 'Use Redis for rate limit counters across multiple gateway instances. Pattern: INCR rate:{apiKey}:{minute} with EXPIRE 60. If count > limit, reject. Redis is shared across all gateway instances — a user cannot bypass the limit by hitting different instances. Sliding window algorithm: more accurate than fixed window (prevents burst at window boundaries). Token bucket: allows controlled bursts while enforcing average rate.'
AI generates: rate limiting in each service independently. Users-service allows 100/min, orders-service allows 100/min — a client making 100 user requests and 100 order requests is under each limit but sending 200 requests total. Gateway rate limiting: one limit per API key across all services. 100 total requests, regardless of which endpoints they hit. Consistent, centralized, impossible to game by spreading requests across services.
Per-service rate limiting: 100/min on users + 100/min on orders = 200 total requests bypass your intent. Gateway rate limiting: 100 total per API key across all services. One counter, impossible to game by spreading requests across endpoints.
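The fixed-window counter described above can be sketched with an in-memory Map standing in for Redis — same key shape, same logic. In production the Map is replaced by `INCR rate:{apiKey}:{minute}` plus `EXPIRE 60` so the counter is shared across gateway instances:

```typescript
// In-memory stand-in for the Redis fixed-window counter: one counter
// per {apiKey, minute} window. The 100/min limit is the document's
// example free tier; `now` is injectable for testing.
const counters = new Map<string, number>();

function allowRequest(apiKey: string, limit = 100, now = Date.now()): boolean {
  const minute = Math.floor(now / 60_000);
  const key = `rate:${apiKey}:${minute}`; // Redis key: rate:{apiKey}:{minute}
  const count = (counters.get(key) ?? 0) + 1; // Redis: INCR (atomic there)
  counters.set(key, count);
  // Redis adds EXPIRE 60 so stale windows clean themselves up.
  return count <= limit; // over the limit → reject with 429 + Retry-After
}
```

Note the caveat the text mentions: fixed windows allow a burst at window boundaries; a sliding-window or token-bucket variant smooths that out.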
Rule 3: Authentication Offloading
The rule: 'Validate authentication at the gateway, not in each service. The gateway: extracts the token from the Authorization header, validates it (JWT signature verification, expiration check, token denylist check), and forwards the request with a trusted header (X-User-Id, X-User-Role). Backend services trust the gateway headers — they do not validate the token themselves. This eliminates: token validation code in every service, shared secret distribution to every service, and inconsistent validation across services.'
For the trust boundary: 'The gateway is the trust boundary. Traffic that reaches a backend service has already been authenticated — the service can trust the gateway headers. Requirement: backend services are not publicly accessible (only the gateway can reach them, via internal network or service mesh). If a service is publicly accessible, an attacker can send fake X-User-Id headers directly. Network isolation: the gateway is public, services are private, headers are trusted within the private network.'
AI generates: every service validates the JWT independently. The JWT secret must be distributed to all 6 services. One service has a validation bug: security hole. JWT secret rotated: update all 6 services simultaneously. Gateway auth offloading: one validation implementation, one secret, one place to update. Services receive pre-validated user context in headers. Six implementations reduced to one.
- Gateway validates token: JWT signature, expiration, denylist check
- Forward trusted headers: X-User-Id, X-User-Role to backend services
- Services trust gateway headers: no token validation code in each service
- Network isolation: services are private, only gateway is publicly accessible
- One validation implementation, one secret, one place to update on rotation
Every service validates JWT independently: 6 implementations, 6 copies of the secret, one validation bug = security hole. Gateway auth offloading: one validation, one secret, services receive trusted X-User-Id headers. Secret rotation: update one place, not six.
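The single gateway-side validation step can be sketched as follows — verify an HS256 signature and expiry, then emit the trusted headers. This is a simplified illustration using Node's built-in crypto; the secret, claim names, and helper are assumptions, and a real gateway would also check a denylist and the `alg` header:

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Gateway-side check: verify signature + expiry, return trusted headers.
// Backend services receive these headers and never see the raw token.
function validateAndForward(
  token: string,
  secret: string,
  now = Math.floor(Date.now() / 1000),
): Record<string, string> | null {
  const [header, payload, sig] = token.split(".");
  if (!header || !payload || !sig) return null; // malformed token
  const expected = createHmac("sha256", secret)
    .update(`${header}.${payload}`)
    .digest("base64url");
  const a = Buffer.from(sig);
  const b = Buffer.from(expected);
  if (a.length !== b.length || !timingSafeEqual(a, b)) return null; // bad signature
  const claims = JSON.parse(Buffer.from(payload, "base64url").toString());
  if (claims.exp !== undefined && claims.exp < now) return null; // expired
  return { "X-User-Id": String(claims.sub), "X-User-Role": String(claims.role ?? "user") };
}

// Test helper only — real tokens are minted by an auth service.
function signToken(claims: object, secret: string): string {
  const enc = (o: object) => Buffer.from(JSON.stringify(o)).toString("base64url");
  const body = `${enc({ alg: "HS256", typ: "JWT" })}.${enc(claims)}`;
  const sig = createHmac("sha256", secret).update(body).digest("base64url");
  return `${body}.${sig}`;
}
```

This function lives in exactly one place. Rotating the secret touches the gateway, not six services.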
Rule 4: Response Aggregation
The rule: 'Aggregate responses from multiple services into a single client response. The dashboard needs: user profile (users-service), recent orders (orders-service), and notification count (notifications-service). Without aggregation: the client makes 3 API calls, handles 3 loading states, and coordinates the display. With gateway aggregation: one client request (GET /api/dashboard), the gateway fans out to 3 services in parallel, combines the responses, and returns one JSON object.'
For the aggregation pattern: 'Gateway endpoint: GET /api/dashboard. Gateway logic: const [user, orders, notifications] = await Promise.all([fetch(usersService + "/me"), fetch(ordersService + "/recent"), fetch(notificationsService + "/count")]); return { user: await user.json(), recentOrders: await orders.json(), notificationCount: await notifications.json() }. Parallel fan-out: total latency = max(individual latencies), not sum. If user takes 50ms, orders takes 100ms, and notifications takes 30ms: aggregated response returns in 100ms, not 180ms.'
AI generates: the client fetches from 3 services sequentially. 50ms + 100ms + 30ms = 180ms total (waterfall). Or the client fetches in parallel but manages 3 Promise states, 3 error handlers, and 3 loading indicators. Gateway aggregation: one request, one response, one loading state, one error handler on the client. 100ms total (parallel fan-out at the gateway). Simpler client code, faster response, fewer round trips.
Client fetches 3 services sequentially: 50+100+30 = 180ms waterfall. Gateway aggregation with parallel fan-out: max(50,100,30) = 100ms. One request, one response, one loading state. Simpler client code, 44% faster response.
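The fan-out handler can be sketched like this. The three "services" are simulated with delayed promises (50/100/30 ms, matching the numbers above) so the sketch is self-contained; in a real gateway each would be a `fetch()` to an internal service URL:

```typescript
// Simulated backend services — stand-ins for fetch() calls to
// users-service, orders-service, and notifications-service.
const delay = <T>(ms: number, value: T): Promise<T> =>
  new Promise((resolve) => setTimeout(() => resolve(value), ms));

const fetchUser = () => delay(50, { id: "42", name: "Ada" });
const fetchRecentOrders = () => delay(100, [{ orderId: "o-1" }]);
const fetchNotificationCount = () => delay(30, 3);

// GET /api/dashboard: one client request, three parallel upstream calls.
async function getDashboard() {
  const start = Date.now();
  const [user, recentOrders, notificationCount] = await Promise.all([
    fetchUser(),
    fetchRecentOrders(),
    fetchNotificationCount(),
  ]);
  // Total latency ≈ max(50, 100, 30) ≈ 100 ms, not the 180 ms sum.
  return { user, recentOrders, notificationCount, tookMs: Date.now() - start };
}
```

The client sees one JSON object and one loading state; the latency cost is the slowest service, not the sum.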
Rule 5: API Gateway Selection Criteria
The rule: 'Choose a gateway based on your infrastructure: Managed cloud (AWS API Gateway, Google Cloud Endpoints, Azure API Management): zero infrastructure to manage, pay per request, integrates with cloud IAM. Good for: cloud-native applications, serverless backends. Self-hosted (Kong, Traefik, APISIX): full control, plugin ecosystem, open-source options. Good for: on-premises, multi-cloud, or complex routing needs. Next.js middleware: lightweight gateway for Next.js applications (rewrites, auth checks, rate limiting). Good for: monolithic Next.js apps that need gateway-like features without a separate service.'
For the decision matrix: 'Under 10 services, cloud-native: managed API Gateway (least operational overhead). Over 10 services, complex routing: Kong or APISIX (rich plugin ecosystem, custom plugins). Serverless backend: AWS API Gateway + Lambda (native integration). Next.js full-stack: middleware.ts for auth, rewrites, and rate limiting (no additional infrastructure). Do not over-engineer: if you have 3 services behind a reverse proxy (Nginx, Caddy), that may be sufficient. A full API gateway is justified when you need: rate limiting, auth offloading, response aggregation, or API versioning at the routing layer.'
AI generates: either no gateway (services directly exposed) or an enterprise API gateway for a 2-service application (massive operational overhead for minimal benefit). The right gateway: matches the infrastructure complexity. 2 services: Nginx reverse proxy. 5 services: Next.js middleware or lightweight Traefik. 20 services: Kong or managed API Gateway. Match the gateway investment to the architecture complexity.
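The decision matrix above can be codified as a toy helper. The thresholds and option names come straight from the text; any real choice also weighs team experience and existing infrastructure, so treat this as a summary, not a rule engine:

```typescript
// Inputs mirror the document's decision matrix; all fields are
// illustrative simplifications of a real infrastructure assessment.
type Stack = { services: number; serverless?: boolean; nextjsMonolith?: boolean };

function recommendGateway({ services, serverless, nextjsMonolith }: Stack): string {
  if (nextjsMonolith) return "Next.js middleware";       // no extra infrastructure
  if (serverless) return "AWS API Gateway + Lambda";     // native integration
  if (services <= 3) return "Nginx/Caddy reverse proxy"; // don't over-engineer
  if (services <= 10) return "managed API Gateway";      // least operational overhead
  return "Kong or APISIX";                               // rich plugin ecosystem
}
```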
Complete API Gateway Rules Template
Consolidated rules for API gateway patterns.
- Single entry point: one hostname, path-based routing to backend services
- Rate limiting at gateway: per API key, per IP, per endpoint, per tier — Redis-backed
- Auth offloading: validate token once at gateway, forward trusted headers to services
- Network isolation: services private, only gateway public — trust boundary at the edge
- Response aggregation: parallel fan-out, combined response, one client request
- Managed (AWS API Gateway) for serverless, self-hosted (Kong) for complex routing
- Next.js middleware for lightweight gateway in monolithic Next.js apps
- Match gateway investment to architecture complexity: Nginx for 2 services, Kong for 20