Best Practices

AI Rules for Microservice Patterns

AI creates distributed monoliths with synchronous chains and shared databases. Rules for service boundaries, API contracts, circuit breakers, service discovery, and observability across services.

8 min read·February 25, 2025

The AI default: shared databases, synchronous chains, no circuit breakers — a distributed monolith with extra network hops.

The fix: service boundaries, contract testing, circuit breakers, distributed tracing, a monolith-first approach.

AI Builds Distributed Monoliths

AI generates microservices with: synchronous call chains (service A calls B calls C calls D — one failure cascades), shared databases (three services reading and writing the same tables), no circuit breakers (a slow downstream service makes every upstream service slow), no distributed tracing (an error occurs somewhere in a 6-service call chain — where?), and no contract testing (service A deploys a breaking change, service B discovers it in production). The result: all the operational complexity of microservices with all the coupling of a monolith.

Modern microservice patterns provide: domain-aligned boundaries (each service owns a bounded context), independent data stores (no shared databases), resilience patterns (circuit breakers, bulkheads, retries with backoff), observability (distributed tracing, centralized logging, health checks), and contract testing (consumer-driven contracts catch breaking changes before deployment). AI generates none of these.

These rules cover: service boundary design, API contract testing, circuit breaker resilience, distributed tracing and observability, health checks, and when microservices are actually appropriate.

Rule 1: Domain-Aligned Service Boundaries

The rule: 'Each microservice owns a bounded context from the domain model. The service owns: its data (private database or schema — no other service reads or writes directly), its API (the only way other services interact with it), and its deployment (independently deployable without coordinating with other teams). Boundaries align with: team ownership (one team per service), business capability (orders, payments, shipping), and data ownership (the service that creates the data owns the data).'

For the shared database anti-pattern: 'Two services reading the same database table are not microservices — they are a monolith with extra network hops. When service A changes the table schema, service B breaks. With owned data: service A exposes an API, service B calls the API. Service A can change its internal schema freely — the API is the contract, not the database schema. Data duplication across services is acceptable (eventual consistency) — shared tables are not.'

AI generates: three services sharing a users table, an orders table, and a products table. Changing the orders schema requires coordinating three service deployments. With owned data: the order service owns the orders table, exposes GET /orders/:id. The shipping service calls the API, not the database. The order service can migrate its schema at will. Zero coordination.
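This boundary can be sketched in code: the API is the only thing other services may depend on, while the internal schema stays private. A minimal sketch — the names (OrderApi, OrderRow, shippingLabel) are illustrative, not from any particular codebase:

```typescript
// Public contract — the only thing other services may depend on.
interface OrderApi {
  getOrder(id: string): { id: string; status: string; total: number };
}

// Private schema — the order service can change this freely
// (cents, uppercase states) without breaking any consumer.
type OrderRow = { order_id: string; state: string; total_cents: number };

class OrderService implements OrderApi {
  private rows = new Map<string, OrderRow>([
    ["o-1", { order_id: "o-1", state: "SHIPPED", total_cents: 4999 }],
  ]);

  // Internal representation is translated at the boundary.
  getOrder(id: string) {
    const row = this.rows.get(id);
    if (!row) throw new Error(`order ${id} not found`);
    return {
      id: row.order_id,
      status: row.state.toLowerCase(),
      total: row.total_cents / 100,
    };
  }
}

// The shipping service depends only on OrderApi — never on OrderRow
// or the underlying table.
function shippingLabel(orders: OrderApi, orderId: string): string {
  const order = orders.getOrder(orderId);
  return `label for ${order.id} (${order.status})`;
}

console.log(shippingLabel(new OrderService(), "o-1")); // label for o-1 (shipped)
```

The order service can rename `state`, switch to a different database, or split the table — as long as `getOrder` keeps its shape, no consumer notices.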

  • One service = one bounded context = one team = one private data store
  • No shared databases — the API is the contract, not the schema
  • Data duplication acceptable (eventual consistency) — shared tables are not
  • Deploy independently: changing one service never requires redeploying another
  • If two services must deploy together, they should be one service
💡 If They Deploy Together, They Are One Service

Two services sharing a database table: changing the schema requires coordinating both deployments. That is a monolith with extra network hops. True microservices deploy independently. If two services must deploy together, merge them — you get the simplicity back without losing anything.

Rule 2: Consumer-Driven Contract Testing

The rule: 'Use consumer-driven contracts (Pact or similar) to verify API compatibility between services before deployment. The consumer defines the contract: "I call GET /orders/123 and expect { id, status, total }". The provider verifies: "I satisfy all consumer contracts." If the provider makes a breaking change (removes the status field), the contract test fails before deployment — not in production at 3 AM.'

For the contract workflow: '(1) Consumer writes a contract (expected request and response). (2) Contract is shared via a Pact Broker or artifact. (3) Provider runs contract verification in CI. (4) If verification passes: both can deploy independently. (5) If verification fails: the provider knows which consumer will break and can negotiate the change. This replaces: integration environments where all services must be deployed together to test compatibility.'

AI generates: no contract testing. Service A adds a required field to its request body. Service B does not send it. Discovery: production 500 errors. With contracts: service A's CI fails because the new required field breaks consumer B's contract. Service A knows the change is breaking before deploying. The conversation happens in a PR review, not a production incident.
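The mechanics can be shown with a hand-rolled sketch — a simplified stand-in for what Pact automates (contract publishing, broker, provider verification). All names here are illustrative:

```typescript
// The consumer publishes what it expects from GET /orders/:id.
const consumerContract = {
  request: { method: "GET", path: "/orders/123" },
  expectedFields: ["id", "status", "total"] as const,
};

// In provider CI: run the real handler, then verify its response
// against every consumer contract. Returns the fields the provider
// no longer sends — a non-empty result fails the build.
function verifyContract(
  contract: typeof consumerContract,
  providerResponse: Record<string, unknown>,
): string[] {
  return contract.expectedFields.filter((f) => !(f in providerResponse));
}

// Compatible response: verification passes, both deploy independently.
const ok = verifyContract(consumerContract, { id: "123", status: "paid", total: 49.99 });
console.log(ok.length === 0); // verification passes

// Breaking change: provider removed `status` — CI fails before deploy,
// naming exactly which consumer expectation broke.
const broken = verifyContract(consumerContract, { id: "123", total: 49.99 });
console.log(broken); // the missing 'status' field
```

Real Pact contracts also pin types, status codes, and matching rules — but the core idea is the same: the provider's CI fails the moment it stops satisfying a consumer's recorded expectations.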

Rule 3: Circuit Breaker Pattern

The rule: 'Wrap every external service call in a circuit breaker. States: Closed (normal — requests flow through), Open (tripped — requests fail immediately without calling the service, returning a fallback), Half-Open (testing — allow one request through to check if the service recovered). Libraries: opossum (Node.js), resilience4j (Java). Configuration: trip after 5 failures in 30 seconds, stay open for 60 seconds, then half-open.'

For cascading failure prevention: 'Without a circuit breaker: service A calls service B (slow, 30s timeout). Service A threads are all waiting on B. Service A stops responding. Service C calls service A (now slow). Service C threads fill up. Three services down from one slow service. With a circuit breaker: after 5 timeouts, service A circuit opens. Subsequent calls to B return a fallback immediately (10ms, not 30s). Service A stays healthy. Cascade prevented.'

AI generates: await fetch('http://service-b/api/data') with no timeout, no retry, no fallback. Service B goes down: service A hangs on every request. With a circuit breaker: 5 failures trip the circuit, subsequent requests return a cached response or degraded result in milliseconds. Service A continues working with reduced functionality instead of being completely unavailable.
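A minimal breaker can be sketched in a few dozen lines — in production, use a library such as opossum rather than rolling your own. The thresholds mirror the rule above: trip after 5 failures, stay open for 60 seconds, then allow one trial call:

```typescript
type State = "closed" | "open" | "half-open";

// Minimal circuit breaker sketch. The `now` parameter exists so the
// clock can be injected; defaults follow the rule's configuration.
class CircuitBreaker {
  private state: State = "closed";
  private failures = 0;
  private openedAt = 0;

  constructor(
    private readonly maxFailures = 5,
    private readonly openMs = 60_000,
    private readonly now: () => number = Date.now,
  ) {}

  async call<T>(fn: () => Promise<T>, fallback: () => T): Promise<T> {
    if (this.state === "open") {
      // Fail fast: return the fallback without touching the downstream.
      if (this.now() - this.openedAt < this.openMs) return fallback();
      this.state = "half-open"; // open window elapsed: allow one trial
    }
    try {
      const result = await fn();
      this.state = "closed"; // success closes the circuit
      this.failures = 0;
      return result;
    } catch {
      // A half-open trial failure, or hitting the threshold, opens it.
      if (this.state === "half-open" || ++this.failures >= this.maxFailures) {
        this.state = "open";
        this.openedAt = this.now();
      }
      return fallback();
    }
  }
}

// Usage: wrap the downstream call; the fallback is cached/degraded data.
const breaker = new CircuitBreaker();
// breaker.call(() => fetch("http://service-b/api/data").then(r => r.json()),
//              () => cachedData);
```

The fallback path is what keeps service A answering in milliseconds while B is down — reduced functionality instead of hung threads.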

  • Circuit breaker states: Closed (normal), Open (fail fast), Half-Open (testing recovery)
  • Trip threshold: 5 failures in 30 seconds — open for 60 seconds, then test
  • Fallback response: cached data, default values, or graceful degradation
  • Prevents cascading failures: one slow service does not bring down everything
  • opossum (Node.js), resilience4j (Java) — battle-tested libraries
⚠️ One Slow Service Kills Three

Without circuit breakers: service B is slow (30s timeout), service A threads fill up waiting, service A stops responding, service C calls slow A, cascade continues. Circuit breaker: after 5 timeouts, fail fast with fallback in 10ms. Service A stays healthy. Cascade prevented.

Rule 4: Distributed Tracing and Observability

The rule: 'Implement distributed tracing across all services: every request gets a trace ID (generated at the edge or first service), every service propagates the trace ID in headers (traceparent for W3C Trace Context), and every service reports spans (operation name, duration, status) to a tracing backend (Jaeger, Zipkin, Datadog, Honeycomb). A single trace shows the complete request journey across 6 services — which service was slow, which failed, and why.'

For the three pillars: 'Observability requires: (1) Distributed traces (request flow across services with timing), (2) Structured logs (JSON logs with traceId, serviceId, and context — correlatable across services), (3) Metrics (request rate, error rate, latency percentiles per service — RED metrics). All three share the trace ID: a latency spike in metrics leads to traces showing which service is slow, which leads to logs showing the error details. Without that correlation, debugging microservices means searching haystacks.'

AI generates: console.log('Error processing order') in one of six services. Which service? Which request? What was the input? What failed? Without distributed tracing, debugging requires: searching logs across 6 services, guessing which request matches, and correlating timestamps manually. With tracing: click the error, see the full trace, identify the slow span, read the correlated logs. Minutes instead of hours.
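The propagation mechanics are small. A sketch of W3C Trace Context header handling plus correlated structured logs — the log field names are illustrative, but the traceparent layout (`version-traceId-spanId-flags`) follows the W3C spec:

```typescript
import { randomBytes } from "node:crypto";

const hex = (bytes: number) => randomBytes(bytes).toString("hex");

// At the edge: start a new trace. In downstream services: parse the
// incoming traceparent, keep its trace ID, mint a fresh span ID.
function nextTraceparent(incoming?: string): {
  header: string; traceId: string; spanId: string;
} {
  const traceId = incoming?.split("-")[1] ?? hex(16); // 32 hex chars
  const spanId = hex(8); // 16 hex chars — new for every hop
  return { header: `00-${traceId}-${spanId}-01`, traceId, spanId };
}

// Structured log: JSON carrying the trace ID, so log lines from all
// six services can be joined on one key.
function log(service: string, traceId: string, msg: string, extra: object = {}) {
  console.log(JSON.stringify({
    ts: new Date().toISOString(), service, traceId, msg, ...extra,
  }));
}

// Service A starts the trace and forwards the header; service B
// continues the same trace under a new span.
const a = nextTraceparent();
log("order-service", a.traceId, "processing order", { orderId: "o-1" });
const b = nextTraceparent(a.header); // same traceId, new spanId
log("shipping-service", b.traceId, "creating label");
```

With the header forwarded on every outbound call (and spans reported to Jaeger, Zipkin, or a vendor backend), that console.log('Error processing order') becomes one line in a correlated trace instead of a needle in six haystacks.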

Rule 5: When Microservices Are Appropriate

The rule: 'Use microservices when: (1) multiple teams need to deploy independently (team autonomy is the primary driver), (2) different services have different scaling requirements (compute-heavy image processing vs I/O-heavy API serving), (3) different services need different technology stacks (ML service in Python, API in Node.js), (4) the domain is large enough for meaningful bounded contexts (5+ distinct domains). Do not use for: small teams (under 10 developers), simple applications, startups validating product-market fit, or applications without clear domain boundaries.'

For the monolith-first approach: 'Start with a well-structured monolith. Extract services only when: a specific module needs independent scaling, a team boundary demands independent deployment, or technology requirements diverge. The modular monolith (clear module boundaries within one deployment) gives 80% of microservice benefits with 20% of the operational complexity. Extract modules into services when the monolith provably constrains you.'

AI defaults to one of two extremes: microservices from day one (massive operational overhead for a 3-person team) or a tangled monolith (no module boundaries, impossible to extract later). The pragmatic path: a modular monolith with clean boundaries. When team size, scale, or technology requirements demand it, extract modules into services. The boundaries are already clean — the extraction is mechanical, not architectural.

ℹ️ Modular Monolith = 80% of the Benefits

A well-structured modular monolith with clean boundaries gives 80% of microservice benefits (team autonomy, clear ownership) with 20% of the operational complexity (one deployment, one database, one log stream). Extract to services only when the monolith provably constrains you.

Complete Microservice Patterns Rules Template

Consolidated rules for microservice patterns.

  • Domain-aligned boundaries: one service = one bounded context = one private data store
  • No shared databases — API is the contract, data duplication over shared tables
  • Consumer-driven contract testing: catch breaking changes before deployment, not in production
  • Circuit breakers on every external call: fail fast, return fallback, prevent cascading failure
  • Distributed tracing: trace ID propagated through all services, visualize full request journey
  • Three pillars: traces + structured logs + metrics, correlated by trace ID
  • Monolith-first: start modular, extract services when team/scale/tech demands it
  • Not for small teams or simple apps — operational complexity must be justified by real constraints