AI Leaves Service Communication Unmanaged
AI generates microservice communication with:
- plain HTTP between services: no encryption in transit, so anyone on the network can read inter-service traffic
- no mutual authentication: any process on the network can call any service, with no identity verification
- no traffic management: no way to gradually shift traffic to a new version or retry failed requests at the infrastructure level
- no centralized observability: each service logs independently, with no unified view of request flow
- retry logic reimplemented in every service: inconsistent, buggy, duplicated
A service mesh solves this by:
- injecting a sidecar proxy next to each service: an Envoy proxy handles all network communication
- encrypting all inter-service traffic with mutual TLS (mTLS): both sides verify identity
- managing traffic: canary deployments, A/B testing, and circuit breaking at the infrastructure level
- providing unified observability: distributed tracing, metrics, and access logs from every sidecar
- centralizing resilience policies: retries, timeouts, and circuit breakers configured once, applied everywhere

AI generates none of these.
These rules cover: sidecar proxy architecture, mutual TLS for zero-trust networking, traffic splitting for deployments, observability integration, resilience policies, and criteria for when a service mesh is justified.
Rule 1: Sidecar Proxy Architecture
The rule: 'A service mesh deploys a sidecar proxy (typically Envoy) alongside each service instance. All inbound and outbound network traffic routes through the sidecar. The service talks to localhost; the sidecar handles: TLS termination and origination, load balancing across instances of the target service, retry logic and timeouts, circuit breaking, and metrics collection. The service code has zero networking logic — it makes plain HTTP calls to localhost, and the sidecar handles everything else.'
For the sidecar injection: 'In Kubernetes: the service mesh control plane automatically injects the sidecar container into every pod (Istio: istio-injection=enabled label on the namespace, Linkerd: linkerd.io/inject=enabled annotation). The application container is unaware of the sidecar — it makes normal HTTP calls. The sidecar intercepts the traffic transparently via iptables rules. No code changes, no library dependencies, no SDK. The mesh is infrastructure-level, not application-level.'
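As a sketch of how little is involved, Istio's automatic injection is a single namespace label (the namespace name `shop` is illustrative):

```yaml
# Hypothetical namespace; the istio-injection label tells the Istio
# control plane to inject an Envoy sidecar into every new pod here.
apiVersion: v1
kind: Namespace
metadata:
  name: shop
  labels:
    istio-injection: enabled
```

Linkerd uses an annotation instead: `linkerd.io/inject: enabled` on the namespace or pod template. Either way, existing deployments pick up the sidecar on their next pod restart.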
AI generates: each service implements its own HTTP client with retry logic, timeout handling, circuit breaking, and TLS configuration. 10 services: 10 implementations of the same networking patterns, each with slightly different bugs. Sidecar proxy: zero networking code in any service. The sidecar handles all cross-cutting networking concerns. Add a new service: inject the sidecar, get mTLS, retries, tracing, and circuit breaking automatically.
- Envoy sidecar: handles TLS, load balancing, retries, circuit breaking, metrics
- Auto-injection in Kubernetes: label namespace, sidecars injected automatically
- Zero code changes: service makes plain HTTP to localhost, sidecar intercepts
- No networking libraries in application code: sidecar handles everything
- New service: inject sidecar, immediately gets all mesh policies
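A sketch of what "configured once, applied everywhere" looks like in Istio; the service name `orders` and the specific thresholds are illustrative, not prescribed values:

```yaml
# Retries and a timeout for every caller of the orders service;
# no retry code lives in any client.
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: orders
spec:
  hosts:
    - orders
  http:
    - route:
        - destination:
            host: orders
      timeout: 5s
      retries:
        attempts: 3
        perTryTimeout: 1s
        retryOn: 5xx,connect-failure
---
# Circuit breaking: eject instances that keep returning 5xx errors.
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: orders
spec:
  host: orders
  trafficPolicy:
    outlierDetection:
      consecutive5xxErrors: 5
      interval: 30s
      baseEjectionTime: 60s
      maxEjectionPercent: 50
```

Every sidecar in the mesh enforces these two resources; adding an eleventh service changes nothing here.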
Rule 2: Mutual TLS for Zero-Trust Networking
The rule: 'Enable mutual TLS (mTLS) between all services. Regular TLS: the client verifies the server identity. mTLS: both sides verify each other. The sidecar proxy: presents a certificate proving its service identity (issued by the mesh certificate authority), verifies the caller certificate, and encrypts all traffic. Result: no unauthorized service can communicate within the mesh. A compromised container on the network cannot call other services — it does not have a valid mesh certificate.'
For certificate management: 'The service mesh control plane manages certificates automatically: issues short-lived certificates to each sidecar (24-hour validity), rotates certificates before expiry (no manual certificate management), and provides a trust chain (all certificates signed by the mesh CA). Istio: Citadel issues and rotates certificates. Linkerd: the identity controller manages certificates. The application never touches certificates — the sidecar handles all TLS operations transparently.'
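In Istio, mesh-wide strict mTLS is one small resource, assuming the default root namespace `istio-system`:

```yaml
# Applied in the root namespace, this policy covers the whole mesh:
# sidecars reject any plaintext connection.
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system
spec:
  mtls:
    mode: STRICT
```

The same resource scoped to a single namespace (or with a `selector`) allows a gradual migration: `PERMISSIVE` mode first, then `STRICT` once every workload has a sidecar.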
AI generates: plain HTTP between services inside the Kubernetes cluster ("the network is trusted"). An attacker who compromises one container can: read all inter-service traffic (no encryption), impersonate any service (no authentication), and move laterally to any service (no authorization). mTLS: every service call is encrypted and authenticated. Lateral movement requires a valid mesh certificate — which compromised containers do not have.
Rule 3: Traffic Splitting for Canary Deployments
The rule: 'Use the service mesh for traffic splitting: route a percentage of traffic to a new service version while monitoring metrics. Canary: 5% of traffic to v2, 95% to v1. Monitor: error rate, latency, success rate. If v2 is healthy: increase to 25%, then 50%, then 100%. If v2 has issues: roll back to 0% instantly (no redeployment, just a config change). Istio VirtualService: spec: http: [{ route: [{ destination: { host: myservice, subset: v1 }, weight: 95 }, { destination: { host: myservice, subset: v2 }, weight: 5 }] }].'
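Expanded into full manifests, the 95/5 split from the rule above looks like this (subset labels `version: v1` / `version: v2` are the conventional pod labels, assumed here):

```yaml
# Subsets map mesh-level names (v1, v2) to pod labels.
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: myservice
spec:
  host: myservice
  subsets:
    - name: v1
      labels:
        version: v1
    - name: v2
      labels:
        version: v2
---
# 95/5 weighted split; rollback is setting the weights back to 100/0.
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: myservice
spec:
  hosts:
    - myservice
  http:
    - route:
        - destination:
            host: myservice
            subset: v1
          weight: 95
        - destination:
            host: myservice
            subset: v2
          weight: 5
```

Promotion is editing two integers; no pods are created or destroyed by the traffic shift itself.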
For automated rollout: 'Tools like Flagger or Argo Rollouts automate canary analysis: deploy v2, Flagger shifts 5% traffic, monitors error rate for 5 minutes, if healthy shifts to 10%, monitors again, continues to 100%. If any metric exceeds threshold: automatic rollback to v1. The entire deployment is hands-free — metrics-driven promotion or rollback without human intervention. The developer merges the PR; the mesh handles the safe rollout.'
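A minimal Flagger sketch of the rollout described above; the deployment name, port, and thresholds are illustrative and would need tuning per service:

```yaml
# Flagger watches the Deployment; on a new image it runs the
# canary analysis automatically and promotes or rolls back.
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: myservice
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myservice
  service:
    port: 80
  analysis:
    interval: 5m       # check metrics every 5 minutes
    threshold: 5       # failed checks before automatic rollback
    stepWeight: 5      # 5% -> 10% -> 15% ... traffic increments
    maxWeight: 50      # hand off to the new primary after 50%
    metrics:
      - name: request-success-rate
        interval: 1m
        thresholdRange:
          min: 99      # roll back if success rate drops below 99%
```

`request-success-rate` is one of Flagger's built-in mesh metrics; custom Prometheus queries can be added alongside it.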
AI generates: kubectl apply the new deployment, all traffic switches immediately. If v2 has a bug: 100% of users affected. Rollback: kubectl apply the old deployment (30 seconds to minutes). With traffic splitting: 5% affected during canary, automatic rollback in seconds if metrics degrade. The blast radius is contained to the canary percentage at every stage.
- Canary: 5% to v2, monitor metrics, promote gradually to 100%
- Instant rollback: config change, not redeployment — seconds, not minutes
- Istio VirtualService weight-based routing: declarative traffic control
- Flagger / Argo Rollouts: automated canary analysis with metric-driven promotion
- Blast radius contained: 5% affected during canary vs 100% with direct deploy
Rule 4: Integrated Observability
The rule: 'The service mesh provides observability without application code changes. The sidecar proxy emits: RED metrics per service (Request rate, Error rate, Duration), distributed traces (each sidecar adds a span, full trace assembled across services), and access logs (every inter-service request logged with source, destination, status, latency). Integration: Prometheus scrapes sidecar metrics, Jaeger/Zipkin collects traces, Grafana dashboards visualize everything. Zero instrumentation code in the application.'
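As a sketch, the RED metrics fall out of standard PromQL queries over Istio's sidecar metrics (`istio_requests_total`, `istio_request_duration_milliseconds`); the service name `orders` is illustrative:

```promql
# R — request rate for the orders service
sum(rate(istio_requests_total{destination_service_name="orders"}[5m]))

# E — error rate: share of 5xx responses
sum(rate(istio_requests_total{destination_service_name="orders", response_code=~"5.."}[5m]))
  / sum(rate(istio_requests_total{destination_service_name="orders"}[5m]))

# D — duration: p99 latency from the sidecar histogram
histogram_quantile(0.99,
  sum(rate(istio_request_duration_milliseconds_bucket{destination_service_name="orders"}[5m])) by (le))
```

These queries work for every service in the mesh without touching application code, because every sidecar emits the same labels.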
For service topology visualization: 'Mesh observability tools (Kiali for Istio, Linkerd dashboard) generate a live service topology map: which services talk to which, request rates on each edge, error rates highlighted in red, and latency percentiles. This map is generated automatically from sidecar traffic data — no manual documentation of service dependencies. When a new service is added, it appears on the map. When a service is removed, it disappears. The topology is always current.'
AI generates: each service with its own logging and metrics library, custom trace propagation, and no unified view. Debugging a request: check the gateway logs, then the auth service logs, then the order service logs, then the payment service logs — manually correlating by timestamp. Mesh observability: click the trace, see every service the request touched, with timing for each hop. Minutes of investigation reduced to one click.
Rule 5: When a Service Mesh Is Justified
The rule: 'Adopt a service mesh when: (1) you have 10+ services with inter-service communication (the operational overhead is justified), (2) zero-trust networking is required (compliance, regulated industry), (3) canary deployments are needed (gradual rollout with metric-driven promotion), (4) consistent resilience policies are needed across all services (centralized retries, timeouts, circuit breakers), (5) the team operates Kubernetes (meshes are Kubernetes-native). Do not adopt for: fewer than 5 services, simple architectures, non-Kubernetes deployments, or teams without Kubernetes expertise.'
For lightweight alternatives: 'Under 10 services: application-level retries and circuit breaking (e.g. the opossum circuit breaker for Node.js), TLS certificates issued and rotated by cert-manager on Kubernetes, and direct Prometheus instrumentation. These cover 80% of mesh features with 20% of the operational complexity. Linkerd is lighter than Istio: simpler configuration, lower resource overhead, easier to debug. If you need a mesh, evaluate Linkerd first — it covers the common cases (mTLS, metrics, retries) without Istio's complexity (VirtualServices, DestinationRules, EnvoyFilters).'
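For the no-mesh TLS path, a cert-manager Certificate gives short-lived, auto-rotated certificates much like a mesh CA would; the issuer `internal-ca` is an assumed pre-existing CA issuer, and the names are illustrative:

```yaml
# cert-manager issues a keypair into the named Secret and renews it
# automatically; the service mounts the Secret for TLS.
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: orders-tls
spec:
  secretName: orders-tls
  duration: 24h            # short-lived, like a mesh-issued cert
  renewBefore: 8h          # rotate well before expiry
  dnsNames:
    - orders.shop.svc.cluster.local
  issuerRef:
    name: internal-ca      # assumed CA Issuer in the same namespace
    kind: Issuer
```

Unlike a mesh, the application must still load the certificate and enforce client verification itself; this covers encryption, not transparent mTLS.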
AI generates: either no service mesh (plain HTTP, no observability, no traffic management) or Istio for a 3-service application (massive operational overhead, 2-3 GB of memory for the control plane). The right choice: match the mesh investment to the architecture complexity. 3 services: application-level resilience. 10 services: Linkerd. 30+ services with complex traffic patterns: Istio. The mesh is infrastructure — invest when the infrastructure complexity justifies it.
Complete Service Mesh Rules Template
Consolidated rules for service mesh patterns.
- Sidecar proxy (Envoy): handles TLS, retries, circuit breaking, metrics — zero app code
- mTLS everywhere: both sides verify identity, all traffic encrypted, no lateral movement
- Auto-certificate management: mesh CA issues and rotates short-lived certs automatically
- Traffic splitting: canary 5% → 25% → 50% → 100% with metric-driven promotion
- Automated rollout: Flagger/Argo Rollouts monitor metrics, auto-rollback on degradation
- Integrated observability: RED metrics, distributed traces, access logs — zero instrumentation code
- Service topology map: auto-generated from sidecar traffic, always current
- Justified at 10+ services: Linkerd for simplicity, Istio for complex traffic patterns