Best Practices

AI Rules for Distributed Transactions

AI wraps calls to three services in one database transaction and hopes for consistency. Rules for saga patterns, compensating actions, outbox pattern, two-phase commit alternatives, and eventual consistency strategies.

8 min read·March 27, 2025

Payment charged but inventory not reserved — the database transaction cannot roll back a network call

Saga patterns, compensating actions, transactional outbox, 2PC trade-offs, eventual consistency

AI Pretends Distributed Transactions Work Like Local Ones

AI generates distributed transactions with: a single database transaction wrapping calls to multiple services (BEGIN, call payment API, call inventory API, COMMIT — if the payment succeeds but inventory fails, the payment is committed but inventory is not reserved), two-phase commit across services (locks held for the duration of the slowest participant — kills throughput), no compensation logic (if step 3 of 5 fails, steps 1 and 2 are not rolled back), and synchronous chaining (each service call waits for the previous one — total latency = sum of all service latencies).

Modern distributed transaction patterns are: saga-based (each step is a local transaction with a compensating action for rollback), outbox-enabled (local transaction writes to both the business table and an outbox table — a relay publishes the outbox events), eventually consistent (accept that cross-service consistency is achieved over time, not instantly), and compensation-aware (every forward action has a defined reverse action). AI generates none of these.

These rules cover: saga choreography vs orchestration, compensating action design, the transactional outbox pattern, two-phase commit trade-offs, eventual consistency strategies, and conflict resolution.

Rule 1: Saga Patterns Instead of Distributed Transactions

The rule: 'Replace distributed transactions with sagas: a sequence of local transactions where each step has a compensating action. Order saga: (1) Create order (compensate: cancel order), (2) Reserve inventory (compensate: release inventory), (3) Process payment (compensate: refund payment), (4) Confirm order (no compensation needed — final step). If step 3 fails: execute compensations for steps 2 and 1 in reverse order. Each step is a local ACID transaction within one service — no cross-service transaction coordination.'

For choreography vs orchestration: 'Choreography: each service listens for events and decides what to do next. Order service publishes order.created, inventory service hears it and reserves stock, publishes inventory.reserved, payment service hears it and charges the card. No central coordinator — each service knows its role. Orchestration: a saga orchestrator sends commands to each service in sequence, tracks the state, and triggers compensations on failure. Use choreography for simple linear flows (3-4 steps). Use orchestration for complex flows with conditional logic, branches, or many steps.'
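As a sketch, the choreographed flow above reduces to an event bus where each service subscribes to the event it cares about and publishes the next one. The bus, handlers, and event names below are illustrative in-process stand-ins, not a real broker:

```typescript
// Choreography sketch: no coordinator, the flow emerges from subscriptions.
type Handler = (payload: unknown) => void;

const subscribers = new Map<string, Handler[]>();
const log: string[] = []; // records the order events were published in

function subscribe(event: string, handler: Handler): void {
  subscribers.set(event, [...(subscribers.get(event) ?? []), handler]);
}

function publish(event: string, payload: unknown): void {
  log.push(event);
  for (const handler of subscribers.get(event) ?? []) handler(payload);
}

// Inventory service: reserve stock when an order is created.
subscribe("order.created", (order) => publish("inventory.reserved", order));
// Payment service: charge the card once stock is reserved.
subscribe("inventory.reserved", (order) => publish("payment.charged", order));

publish("order.created", { orderId: 1 });
// log now holds: order.created, inventory.reserved, payment.charged
```

With a real broker the handlers live in separate processes, but the shape is the same: the flow is defined by subscriptions, not by a coordinator, which is also why choreography becomes hard to trace once branches and conditions appear.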

AI generates: BEGIN; await paymentService.charge(amount); await inventoryService.reserve(items); COMMIT; — the payment is a network call, not a database operation. If the payment succeeds but the inventory call fails, the COMMIT rolls back the local database changes but cannot un-charge the payment. The payment service already committed its local transaction. Sagas: each service commits locally. On failure: compensating actions reverse previous steps. No cross-service transaction, no locks, no coordination.

  • Each saga step: local ACID transaction + defined compensating action
  • Choreography: event-driven, no central coordinator — simple linear flows
  • Orchestration: central coordinator sends commands, tracks state — complex flows
  • Compensation in reverse order: step 3 fails → compensate step 2 → compensate step 1
  • No cross-service transactions: each service owns its own ACID boundary
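The orchestrated variant can be sketched as a small in-process saga runner that pairs every forward action with its compensation and unwinds completed steps in reverse on failure. All names and the in-memory state are illustrative; a production orchestrator would also persist saga state:

```typescript
// Each step is one service's local transaction plus its semantic reverse.
type SagaStep = {
  name: string;
  action: () => void;     // local transaction in one service
  compensate: () => void; // semantic reverse of the action
};

function runSaga(steps: SagaStep[]): { ok: boolean; compensated: string[] } {
  const done: SagaStep[] = [];
  const compensated: string[] = [];
  for (const step of steps) {
    try {
      step.action();
      done.push(step);
    } catch {
      // Unwind completed steps in reverse order.
      for (const prev of done.reverse()) {
        prev.compensate();
        compensated.push(prev.name);
      }
      return { ok: false, compensated };
    }
  }
  return { ok: true, compensated };
}

// Toy state standing in for two services' local databases.
const state = { order: "none", stock: 10 };

const result = runSaga([
  { name: "createOrder",
    action: () => { state.order = "pending"; },
    compensate: () => { state.order = "cancelled"; } },
  { name: "reserveInventory",
    action: () => { state.stock -= 1; },
    compensate: () => { state.stock += 1; } },
  { name: "processPayment",
    action: () => { throw new Error("card declined"); }, // simulated failure
    compensate: () => { /* refund would go here */ } },
]);
```

Note that the failed step itself is not compensated: its local transaction never committed, so only the steps that completed are reversed (here: release inventory, then cancel the order).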
💡 No Cross-Service Locks

2PC: locks held across services for the duration of the slowest participant. Sagas: each service commits locally and immediately. No locks, no coordination, no blocking. Failure? Compensating actions reverse completed steps. The same end state, reached eventually instead of atomically, with dramatically better throughput.

Rule 2: Compensating Action Design

The rule: 'Every saga step that has side effects must have a compensating action. The compensation is: semantically the reverse of the original action (charge → refund, reserve → release, create → cancel), idempotent (safe to execute multiple times — compensating a compensation that already ran does nothing), and recorded (logged with the saga state — which compensations were executed, which succeeded, which failed). Not all actions are perfectly reversible: a sent email cannot be un-sent. For irreversible actions: delay execution until the saga is committed (send the email in the final step, not step 2).'

For compensation failures: 'What if the compensation itself fails? Refund payment fails because the payment service is down. Options: (1) Retry the compensation with exponential backoff (most common — the compensation eventually succeeds). (2) Store in a compensation queue (dead letter queue for failed compensations — manual resolution). (3) Alert operations team (human intervention for unresolvable compensations). Compensation failures are rare (the service was healthy enough for the forward action) but must be handled — an uncompensated failed saga is an inconsistent state.'

AI generates: no compensation logic. Step 3 of 5 fails: steps 1 and 2 remain committed. Inventory is reserved for an order that will never be fulfilled. Payment is charged for an order that will never ship. The system is in an inconsistent state that requires manual intervention to fix. Compensating actions: inventory is released, payment is refunded, order is cancelled. The system returns to a consistent state automatically.
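A compensation that follows these rules, idempotent via a record of what was already undone, retried with backoff, and escalated to a dead letter queue when retries are exhausted, might look like this sketch (the payment service outage is simulated with a counter; all names are hypothetical):

```typescript
// Idempotency record: payment ids that have already been refunded.
const refunded = new Set<string>();
let serviceFailuresLeft = 2; // simulate a payment service that is down twice

function refundPayment(paymentId: string): void {
  if (refunded.has(paymentId)) return; // compensating twice does nothing
  if (serviceFailuresLeft > 0) {
    serviceFailuresLeft--;
    throw new Error("payment service down");
  }
  refunded.add(paymentId);
}

// Retry the compensation; escalate to a dead letter queue if it never succeeds.
function compensateWithRetry(
  fn: () => void,
  maxAttempts: number,
  deadLetter: string[],
): number {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      fn();
      return attempt; // number of attempts it took
    } catch {
      // Real code would sleep here: exponential backoff, e.g. 100ms * 2^(attempt-1).
    }
  }
  deadLetter.push("refund"); // unresolved: queue for manual resolution
  return maxAttempts;
}

const deadLetter: string[] = [];
const attempts = compensateWithRetry(() => refundPayment("pay_1"), 5, deadLetter);
// Running the same compensation again is a no-op thanks to the idempotency record.
const again = compensateWithRetry(() => refundPayment("pay_1"), 5, deadLetter);
```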

Rule 3: Transactional Outbox Pattern

The rule: 'Use the transactional outbox pattern to reliably publish events after a local transaction. Problem: the service commits to its database and then publishes an event to the message queue. If the publish fails (queue is down), the event is lost — other services never learn about the change. Outbox solution: write both the business data and the event to the database in the same transaction. A separate relay process reads the outbox table and publishes events to the queue. The event is guaranteed to be published because it is in the same ACID transaction as the business data.'

For the outbox table: 'Table: outbox { id, aggregate_type, aggregate_id, event_type, payload, created_at, published_at }. In the business transaction: INSERT INTO orders (...) and INSERT INTO outbox (event_type: "order.created", payload: {...}) in the same transaction. Relay process: SELECT * FROM outbox WHERE published_at IS NULL ORDER BY created_at. For each: publish to the message queue, UPDATE outbox SET published_at = now(). If the relay crashes: unpublished events remain in the outbox and are picked up on restart. No events are lost.'

AI generates: await db.insert(orders).values(order); await queue.publish('order.created', order); — two separate operations. The database insert succeeds, the queue publish fails (network error). The order exists in the database but no other service knows about it. With the outbox: both writes are in one transaction. The relay guarantees eventual publication. The only failure mode is: the event is published with a delay (relay lag), never that it is lost entirely.

  • Same transaction: business data + outbox event — both committed or neither
  • Relay process: reads unpublished outbox events, publishes to queue, marks as published
  • Crash recovery: unpublished events remain in outbox, picked up on relay restart
  • At-least-once delivery: the relay may publish duplicates — consumers must be idempotent
  • Debezium CDC: an alternative to a polling relay that tails the database WAL and publishes outbox rows as they are written
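The pattern can be sketched with in-memory stand-ins for the orders table, the outbox table, and the queue. The names are hypothetical, and a real implementation would use actual database transactions plus row locking in the relay; the point here is only the shape of the writes and the relay pass:

```typescript
type OutboxRow = {
  id: number;
  eventType: string;
  payload: unknown;
  publishedAt: Date | null;
};

const orders: { id: number }[] = [];
const outbox: OutboxRow[] = [];
const queue: { eventType: string; payload: unknown }[] = [];

// One local "transaction": both inserts succeed together or not at all.
function createOrder(id: number): void {
  orders.push({ id });
  outbox.push({
    id: outbox.length + 1,
    eventType: "order.created",
    payload: { orderId: id },
    publishedAt: null,
  });
}

// Relay pass: publish unpublished rows in creation order, then mark them.
// If the relay crashes mid-pass, unmarked rows are retried on the next pass
// (at-least-once delivery: consumers must be idempotent).
function relayPass(): number {
  let published = 0;
  for (const row of outbox.filter((r) => r.publishedAt === null)) {
    queue.push({ eventType: row.eventType, payload: row.payload });
    row.publishedAt = new Date();
    published++;
  }
  return published;
}

createOrder(1);
createOrder(2);
const publishedCount = relayPass(); // both events reach the queue
```

A second relay pass publishes nothing, because both rows are already marked; that is the crash-recovery property in miniature.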
⚠️ Event Lost = Permanent Inconsistency

db.insert(order) then queue.publish('order.created') — if the publish fails, the order exists but nothing else knows. Transactional outbox: both in one transaction. The relay guarantees eventual publication. Events can be delayed but never lost.

Rule 4: Two-Phase Commit Trade-Offs

The rule: 'Two-phase commit (2PC) provides strong consistency across services but at significant cost. Phase 1 (prepare): coordinator asks all participants to prepare (acquire locks, validate). Phase 2 (commit): if all prepared, coordinator tells all to commit. If any failed, coordinator tells all to abort. Trade-offs: locks held during both phases (throughput drops under contention), coordinator is a single point of failure (coordinator crash = participants blocked with locks held), and latency = 2 network round trips + slowest participant. Use 2PC only when: strong consistency is legally required AND the participant set is small (2-3 services) AND throughput is low.'

For when to avoid 2PC: 'Avoid 2PC when: (1) high throughput is needed (locks kill performance), (2) participants are unreliable (one slow participant blocks all), (3) the network is unreliable (coordinator-participant communication failures leave locks hanging), (4) the system must be available during partitions (2PC sacrifices availability for consistency — CAP theorem). For most web applications: sagas with eventual consistency are the right choice. The brief inconsistency window (milliseconds to seconds) is acceptable in exchange for: higher throughput, no locks, no coordinator dependency, and no blocking on slow participants.'

AI generates: either no consistency mechanism (cross-service state drifts silently) or 2PC for everything (performance collapses under load). The pragmatic middle ground: sagas with eventual consistency for 95% of use cases (orders, user actions, content updates), 2PC or database-level transactions for the 5% that require strong consistency (financial transfers between accounts in the same database, inventory count adjustments).
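A toy coordinator makes the two phases, and the lock-holding cost, concrete. The participants here are in-memory objects with hypothetical names; real 2PC runs over a network and persists coordinator state so it can recover from a crash:

```typescript
type Participant = {
  name: string;
  locked: boolean;    // locks are held from prepare until commit or abort
  committed: boolean;
  prepare: () => boolean; // acquire locks, validate; true = ready to commit
};

function twoPhaseCommit(participants: Participant[]): boolean {
  // Phase 1: prepare. Each participant locks and validates.
  const prepared: Participant[] = [];
  for (const p of participants) {
    p.locked = true;
    if (!p.prepare()) {
      // Any failure aborts everyone; release all locks taken so far.
      for (const q of [...prepared, p]) q.locked = false;
      return false;
    }
    prepared.push(p);
  }
  // Phase 2: commit. Locks are released only now, after every
  // participant has prepared, hence the throughput cost under contention.
  for (const p of prepared) {
    p.committed = true;
    p.locked = false;
  }
  return true;
}

const payments: Participant = { name: "payments", locked: false, committed: false, prepare: () => true };
const inventory: Participant = { name: "inventory", locked: false, committed: false, prepare: () => true };
const ok = twoPhaseCommit([payments, inventory]); // both prepare, both commit

const healthy: Participant = { name: "healthy", locked: false, committed: false, prepare: () => true };
const flaky: Participant = { name: "flaky", locked: false, committed: false, prepare: () => false };
const aborted = twoPhaseCommit([healthy, flaky]); // flaky vetoes: nobody commits
```

Even in this toy version, `healthy` held its lock while waiting on `flaky`; scale that to real network latencies and the throughput argument against 2PC falls out directly.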

Rule 5: Eventual Consistency Strategies

The rule: 'Design for eventual consistency: accept that cross-service data is consistent within a bounded time window (typically milliseconds to seconds). Strategies: (1) Read-your-writes consistency: after a write, the user sees their own change immediately (local cache or sticky sessions), even if other users see the old data briefly. (2) Monotonic reads: a user never sees older data after seeing newer data (version numbers or timestamps). (3) Conflict resolution: when two services have conflicting data, resolve with: last-writer-wins (simplest, acceptable for most data), merge (combine both changes), or manual resolution (flag for human review).'

For UI patterns: 'Optimistic updates: show the change immediately in the UI, sync in the background. If the background sync fails: revert the UI change with a notification. The user perceives instant response. Behind the scenes: the saga is executing across services. 99% of the time: the saga completes and the optimistic update becomes the real state. 1% of the time: the saga fails, the UI reverts, and the user sees an error. This pattern makes eventual consistency invisible to the user in the common case.'

AI generates: either synchronous consistency (blocks until all services confirm — slow, fragile) or no consistency (services drift, data is contradictory, nobody knows the truth). Eventual consistency with optimistic UI: the user sees instant responses, the system converges to consistency within seconds, and the rare failure case is handled gracefully. The user experience is better than synchronous consistency (faster) and safer than no consistency (convergence is guaranteed).
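The optimistic-update pattern reduces to a few lines once the saga outcome is abstracted away. For illustration the outcome is passed in synchronously here; in real code it arrives asynchronously when the background saga completes or fails:

```typescript
type Ui = { status: string; notifications: string[] };

function optimisticUpdate(ui: Ui, next: string, sagaSucceeded: boolean): void {
  const previous = ui.status;
  ui.status = next; // the user sees the change instantly
  // Meanwhile the saga runs across services; its outcome is simulated here.
  if (!sagaSucceeded) {
    ui.status = previous; // revert on saga failure
    ui.notifications.push("Could not place order, change reverted");
  }
}

const ui: Ui = { status: "idle", notifications: [] };
// Common case: the saga completes, the optimistic update becomes the real state.
optimisticUpdate(ui, "order-placed", true);
const afterSuccess = ui.status;
// Rare case: the saga fails, the UI reverts and notifies the user.
optimisticUpdate(ui, "order-cancelled", false);
```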

ℹ️ 99% Instant, 1% Graceful Revert

Optimistic UI: show the change immediately, saga runs in background. 99% of the time: saga completes, optimistic update becomes real. 1%: saga fails, UI reverts with notification. The user perceives instant response. Eventual consistency is invisible in the common case.

Complete Distributed Transaction Rules Template

Consolidated rules for distributed transactions.

  • Sagas over 2PC: local transactions + compensating actions, no cross-service locks
  • Choreography for simple flows (3-4 steps), orchestration for complex branching logic
  • Compensating actions: idempotent, logged, retry on failure, alert on persistent failure
  • Delay irreversible actions: send email in final step, not mid-saga
  • Transactional outbox: business data + event in same transaction, relay publishes
  • 2PC only when legally required + small participant set + low throughput
  • Eventual consistency: read-your-writes, monotonic reads, bounded convergence window
  • Optimistic UI: show change immediately, sync in background, revert on saga failure