AI Governance for SaaS Platforms

Multi-Tenancy Is the Core SaaS Challenge

SaaS platforms serve multiple customers (tenants) from the same application and infrastructure. The fundamental challenge: every query, every API call, and every background job must be scoped to the correct tenant. A missing WHERE tenant_id = ? clause exposes one customer's data to another. This is tenant data leakage, the most critical bug class in SaaS. AI rule: 'Every database query that touches tenant data must include a tenant_id filter. No exceptions. The AI must never generate a SELECT * FROM orders without a tenant scope.'
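A minimal sketch of how the rule above could be enforced in code rather than by convention. The helper name selectForTenant and the ScopedQuery shape are hypothetical, and the $1-style placeholders assume a PostgreSQL-style driver. Table and column names are expected to come from application code, never from user input.

```typescript
// Hypothetical helper: every SELECT it builds carries a tenant_id filter.
interface ScopedQuery {
  text: string;
  values: unknown[];
}

function selectForTenant(
  table: string,
  tenantId: string,
  filters: Record<string, unknown> = {},
): ScopedQuery {
  // Refuse to build an unscoped query at all.
  if (!tenantId) {
    throw new Error("tenantId is required for every tenant-data query");
  }
  const conditions = ["tenant_id = $1"];
  const values: unknown[] = [tenantId];
  for (const [column, value] of Object.entries(filters)) {
    values.push(value);
    conditions.push(`${column} = $${values.length}`);
  }
  return {
    text: `SELECT * FROM ${table} WHERE ${conditions.join(" AND ")}`,
    values,
  };
}

const q = selectForTenant("orders", "tenant_42", { status: "paid" });
console.log(q.text);
// SELECT * FROM orders WHERE tenant_id = $1 AND status = $2
```

Because the tenant filter is always the first condition, a caller cannot forget it; the only way to get a query is to supply a tenant.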

Tenant isolation strategies: database-per-tenant (each tenant gets its own database: strongest isolation, highest cost), schema-per-tenant (shared database, separate schemas: good isolation, moderate cost), and row-level security (shared tables with a tenant_id column and RLS policies: efficient, but requires careful implementation). Many SaaS platforms choose row-level isolation for cost efficiency. AI rule: 'Detect the tenant isolation strategy from existing code. Match all new queries to that strategy. If row-level: always include tenant_id in WHERE clauses and INSERT statements.'

The tenant context: in a well-designed SaaS application, the tenant ID is established once (from the authentication token, subdomain, or API key) and propagated through the entire request lifecycle. The AI should use the existing tenant context mechanism (middleware, request context, AsyncLocalStorage) rather than threading tenant_id through every layer as a function parameter.
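As an illustration of the context mechanism described above, here is a minimal sketch using Node's AsyncLocalStorage. The names runWithTenant, currentTenantId, and listOrderIds are hypothetical; a real project would call its own equivalent from its authentication middleware.

```typescript
import { AsyncLocalStorage } from "node:async_hooks";

interface TenantContext {
  tenantId: string;
}

const tenantStorage = new AsyncLocalStorage<TenantContext>();

// Called once per request, for example from middleware after authentication.
function runWithTenant<T>(tenantId: string, fn: () => T): T {
  return tenantStorage.run({ tenantId }, fn);
}

// Called anywhere deeper in the call stack; fails loudly if no context was set.
function currentTenantId(): string {
  const ctx = tenantStorage.getStore();
  if (!ctx) {
    throw new Error("No tenant context: refusing to run unscoped");
  }
  return ctx.tenantId;
}

// A data-access function that needs no tenantId parameter.
async function listOrderIds(): Promise<string[]> {
  await Promise.resolve(); // the context survives async boundaries
  return [`${currentTenantId()}:order-1`];
}

runWithTenant("tenant_42", () => listOrderIds()).then((ids) => console.log(ids));
```

Throwing when the context is missing is a deliberate choice in this sketch: an unscoped call fails immediately instead of silently reading across tenants.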

Subscription Tiers and Feature Gating

SaaS platforms have subscription tiers (free, pro, enterprise) with different feature sets and usage limits. The AI must enforce these boundaries in generated code. AI rule: 'Every feature has a tier requirement. Before enabling a feature: check the tenant's subscription tier. Use the project's existing feature flag or tier check mechanism. Never expose a paid feature without a tier check.'

Feature gating patterns: feature flags (a boolean per feature per tier), entitlements (a list of permissions per subscription), and usage quotas (numeric limits per tier, such as API calls, storage, or users). The AI should check which pattern the project already uses and follow it. If the project uses feature flags (isFeatureEnabled('advanced-analytics', tenantId)), gate new features the same way. If it uses entitlements (tenant.hasEntitlement('export-csv')), use that pattern.
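A minimal sketch of the feature-flag style of gating, assuming a simple minimum-tier table. The isFeatureEnabled signature follows the call shown above; the flag table, the tier ranking, and the in-memory tenant lookup are hypothetical stand-ins for whatever the project already has.

```typescript
type Tier = "free" | "pro" | "enterprise";

// Hypothetical flag table: the minimum tier that unlocks each feature.
const FEATURE_MIN_TIER: Record<string, Tier> = {
  "export-csv": "pro",
  "advanced-analytics": "enterprise",
};

const TIER_RANK: Record<Tier, number> = { free: 0, pro: 1, enterprise: 2 };

// Stand-in for a subscription lookup; a real project would query its own store.
const TENANT_TIERS = new Map<string, Tier>([
  ["tenant_free", "free"],
  ["tenant_pro", "pro"],
]);

function isFeatureEnabled(feature: string, tenantId: string): boolean {
  const required = FEATURE_MIN_TIER[feature];
  const tier = TENANT_TIERS.get(tenantId);
  // Unknown features and unknown tenants are denied by default.
  if (required === undefined || tier === undefined) return false;
  return TIER_RANK[tier] >= TIER_RANK[required];
}

console.log(isFeatureEnabled("export-csv", "tenant_pro")); // true
console.log(isFeatureEnabled("advanced-analytics", "tenant_pro")); // false
```

Denying by default means a feature that was never registered in the table cannot be exposed by accident.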

Upgrade prompts: when a user hits a tier limit, the AI should generate a graceful, informative response rather than a bare error. Pattern: check the limit, show the current usage, explain what the limit is, and provide an upgrade path. AI rule: 'Tier limit reached: return a structured response with current usage, limit, and upgrade URL. Never return a raw 403 for a tier limit, because that looks like a bug, not a business boundary.'
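A minimal sketch of such a structured response. The field names, the error code tier_limit_reached, and the /billing/upgrade URL are all hypothetical; the sketch builds only the response body and leaves the HTTP status to the project's own conventions.

```typescript
// Hypothetical response shape for a tier limit; field names are illustrative.
interface TierLimitResponse {
  error: "tier_limit_reached";
  message: string;
  plan: string;
  currentUsage: number;
  limit: number;
  upgradeUrl: string;
}

// Returns null while the tenant is under the limit, otherwise a structured body.
function checkTierLimit(
  plan: string,
  resource: string,
  currentUsage: number,
  limit: number,
  upgradeUrl: string,
): TierLimitResponse | null {
  if (currentUsage < limit) return null;
  return {
    error: "tier_limit_reached",
    message:
      `You have reached your ${limit}-${resource} limit on the ${plan} plan. ` +
      `Upgrade to raise this limit.`,
    plan,
    currentUsage,
    limit,
    upgradeUrl,
  };
}

const blocked = checkTierLimit("Free", "project", 3, 3, "/billing/upgrade");
console.log(JSON.stringify(blocked, null, 2));
```

The client can render the message directly and link the upgrade URL, so the limit reads as a plan boundary instead of a failure.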

💡 Graceful Upgrade Prompts Convert Users

When a free-tier user hits the 3-project limit, showing 'HTTP 403 Forbidden' loses them. Showing 'You have reached your 3-project limit on the Free plan. Upgrade to Pro for unlimited projects.' converts them. The AI should generate tier limit responses that include the current usage, the limit, the plan name, and an upgrade URL. This is both better UX and better business.

Usage Metering and Billing Integration

Usage-based SaaS (API calls, storage, compute, seats) requires accurate metering. Every metered action must emit a usage event. AI rule: 'Metered features: emit a usage event after the action succeeds (not before, not on failure). Include: tenant_id, feature_id, quantity, timestamp. The billing system aggregates these events into invoices.'

Metering accuracy: usage events must be durable (written to a queue or database, not held only in memory). If the application crashes between the action and the usage event, the customer gets free usage. If the event fires before the action and the action then fails, the customer is overcharged. AI rule: 'Usage event ordering: action first, then emit event. Use a transactional outbox pattern if the event must be guaranteed (write the event to the database in the same transaction as the action).'
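The transactional outbox idea can be sketched with a toy in-memory store. Everything here is hypothetical: InMemoryDb only imitates all-or-nothing commits, and recordExport stands in for any metered action. In a real system the same shape would use one database transaction covering both the business write and the outbox insert.

```typescript
interface UsageEvent {
  tenantId: string;
  featureId: string;
  quantity: number;
  timestamp: string;
}

interface DbState {
  exports: string[]; // business data written by the metered action
  outbox: UsageEvent[]; // usage events waiting to be forwarded to billing
}

// Toy store with all-or-nothing commits, standing in for a database transaction.
class InMemoryDb {
  private state: DbState = { exports: [], outbox: [] };

  transaction(work: (draft: DbState) => void): void {
    const draft: DbState = {
      exports: [...this.state.exports],
      outbox: [...this.state.outbox],
    };
    work(draft); // if this throws, the draft is discarded (rollback)
    this.state = draft; // commit both writes together
  }

  snapshot(): DbState {
    return { exports: [...this.state.exports], outbox: [...this.state.outbox] };
  }
}

// The action and its usage event are written in the same transaction.
function recordExport(db: InMemoryDb, tenantId: string, fileName: string, fail = false): void {
  db.transaction((draft) => {
    draft.exports.push(`${tenantId}/${fileName}`);
    if (fail) throw new Error("export failed");
    draft.outbox.push({
      tenantId,
      featureId: "export-csv",
      quantity: 1,
      timestamp: new Date().toISOString(),
    });
  });
}

const db = new InMemoryDb();
recordExport(db, "tenant_42", "report.csv");
console.log(db.snapshot().outbox.length); // 1
```

A failed action leaves neither the export nor the event behind, so the two failure modes described above (free usage and overcharging) cannot occur at this step. Forwarding outbox rows to the billing provider is left to a separate worker and is not shown.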

Billing integration: most SaaS platforms use Stripe Billing, Chargebee, or a similar provider. The AI should integrate with the existing billing provider's SDK instead of building custom billing logic. AI rule: 'Send usage events to the billing provider's API (Stripe usage records, Chargebee usage events). Do not calculate invoices manually. The billing provider handles proration, tax, currency, and invoice generation.'

⚠️ Usage Events Must Fire After Success

If the usage event fires before the action and the action fails, the customer is charged for something that did not happen. If the action succeeds but the event is lost (app crash, queue failure), the customer gets free usage. The safest pattern is to write the usage event to the database in the same transaction as the action (transactional outbox). A background worker then forwards events to the billing provider.

Rate Limiting and Tenant Security

Per-tenant rate limiting prevents one tenant from consuming all shared resources (the noisy neighbor problem). Limits should be per-tenant (not per-IP, which breaks for enterprise tenants behind NAT), per-tier (enterprise gets higher limits than free), and per-endpoint (write endpoints have lower limits than read endpoints). AI rule: 'API rate limiting: per-tenant-per-endpoint. Use the existing rate limiter (Redis-based, middleware). New endpoints: inherit the tier's default rate limit unless specifically configured.'
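To make the per-tenant, per-tier, per-endpoint keying concrete, here is a minimal fixed-window sketch. It is in-memory and for illustration only; the limits, the class name TenantRateLimiter, and the RateDecision shape are hypothetical, and a production system would keep using its existing shared limiter as the rule above says.

```typescript
type Tier = "free" | "pro" | "enterprise";

// Hypothetical per-minute limits; real values belong in configuration.
const LIMITS_PER_MINUTE: Record<Tier, number> = { free: 2, pro: 5, enterprise: 10 };
const WINDOW_MS = 60_000;

interface RateDecision {
  allowed: boolean;
  status: 200 | 429;
  retryAfterSeconds: number; // value for the Retry-After header when rejected
}

class TenantRateLimiter {
  private windows = new Map<string, { start: number; count: number }>();

  check(tenantId: string, tier: Tier, endpoint: string, nowMs: number): RateDecision {
    // Key by tenant and endpoint, never by IP address.
    const key = `${tenantId}:${endpoint}`;
    let w = this.windows.get(key);
    if (!w || nowMs - w.start >= WINDOW_MS) {
      w = { start: nowMs, count: 0 }; // start a fresh window
      this.windows.set(key, w);
    }
    if (w.count >= LIMITS_PER_MINUTE[tier]) {
      const retryAfterSeconds = Math.ceil((w.start + WINDOW_MS - nowMs) / 1000);
      return { allowed: false, status: 429, retryAfterSeconds };
    }
    w.count += 1;
    return { allowed: true, status: 200, retryAfterSeconds: 0 };
  }
}

const limiter = new TenantRateLimiter();
limiter.check("tenant_a", "free", "POST /orders", 0);
limiter.check("tenant_a", "free", "POST /orders", 500);
console.log(limiter.check("tenant_a", "free", "POST /orders", 1_000));
// { allowed: false, status: 429, retryAfterSeconds: 59 }
```

Passing the clock in as nowMs keeps the sketch deterministic and easy to test. A fixed window has no burst allowance; a token bucket would be the usual swap if bursts matter.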

Tenant data security goes beyond query scoping. API keys should be scoped to a single tenant, webhooks should be signed with a per-tenant secret, file uploads should be stored in tenant-namespaced paths, and background jobs should log the tenant context. AI rule: 'Every system that handles tenant data: verify the tenant context. Background jobs: load tenant context at the start. Webhooks: validate the tenant-specific signing secret.'
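A minimal sketch of per-tenant webhook signing, assuming HMAC-SHA256 over the raw payload with a hex-encoded signature. The secret store, the secrets themselves, and the function names are hypothetical; the source does not prescribe a signing scheme.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical per-tenant secrets; real ones belong in a secrets manager.
const TENANT_WEBHOOK_SECRETS = new Map<string, string>([
  ["tenant_a", "secret-for-a"],
  ["tenant_b", "secret-for-b"],
]);

// Sign a payload with the secret that belongs to exactly one tenant.
function signPayload(tenantId: string, payload: string): string {
  const secret = TENANT_WEBHOOK_SECRETS.get(tenantId);
  if (!secret) {
    throw new Error(`No webhook secret configured for tenant ${tenantId}`);
  }
  return createHmac("sha256", secret).update(payload).digest("hex");
}

// Verify in constant time so the comparison does not leak signature bytes.
function verifySignature(tenantId: string, payload: string, signature: string): boolean {
  const expected = Buffer.from(signPayload(tenantId, payload), "hex");
  const received = Buffer.from(signature, "hex");
  if (expected.length !== received.length) return false;
  return timingSafeEqual(expected, received);
}

const body = JSON.stringify({ event: "order.created", orderId: "o_1" });
const sig = signPayload("tenant_a", body);
console.log(verifySignature("tenant_a", body, sig)); // true
console.log(verifySignature("tenant_b", body, sig)); // false
```

Because each tenant has its own secret, a signature produced for one tenant never validates for another.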

Tenant offboarding: when a tenant cancels, their data must be handled according to the data retention policy. AI rule: 'Tenant deletion: soft-delete (mark as cancelled, retain data for the retention period) then hard-delete (remove all data after retention expires). Never immediately delete tenant data — they may reactivate. Never leave orphaned tenant data after the retention period.'
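The two-phase deletion above can be sketched as a small state model. The 30-day retention period, the TenantRecord shape, and the function names are hypothetical; the actual period comes from the data retention policy.

```typescript
type TenantStatus = "active" | "cancelled";

interface TenantRecord {
  id: string;
  status: TenantStatus;
  cancelledAtMs: number | null; // epoch milliseconds of cancellation
}

// Hypothetical retention period; the real value comes from policy.
const RETENTION_MS = 30 * 24 * 60 * 60 * 1000;

// Phase 1: mark as cancelled and keep the data.
function softDelete(tenant: TenantRecord, nowMs: number): TenantRecord {
  return { ...tenant, status: "cancelled", cancelledAtMs: nowMs };
}

// Reactivation within the retention period simply clears the marker.
function reactivate(tenant: TenantRecord): TenantRecord {
  return { ...tenant, status: "active", cancelledAtMs: null };
}

// Phase 2: a scheduled job asks which tenants are past retention.
function dueForHardDelete(tenants: TenantRecord[], nowMs: number): string[] {
  return tenants
    .filter(
      (t) =>
        t.status === "cancelled" &&
        t.cancelledAtMs !== null &&
        nowMs - t.cancelledAtMs >= RETENTION_MS,
    )
    .map((t) => t.id);
}

const cancelled = softDelete({ id: "tenant_42", status: "active", cancelledAtMs: null }, 0);
console.log(dueForHardDelete([cancelled], RETENTION_MS - 1)); // []
console.log(dueForHardDelete([cancelled], RETENTION_MS)); // [ 'tenant_42' ]
```

The hard-delete step itself (removing rows, files, and keys for each returned ID) depends on the isolation strategy and is not shown.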

ℹ️ Noisy Neighbor = One Tenant Ruins It for All

Without per-tenant rate limiting, a single tenant running a data migration can consume all API capacity, causing timeouts for every other tenant. Per-tenant limits ensure that each tenant gets a fair share of resources. Enterprise tenants get higher limits (they pay more). Burst allowances handle legitimate traffic spikes. The rate limiter should return 429 Too Many Requests with a Retry-After header.

SaaS AI Governance Summary

Summary of AI governance rules for SaaS platform development teams.

Tenant isolation: every query scoped by tenant_id. No cross-tenant data access. Use existing isolation strategy
Subscription enforcement: feature gating by tier. Graceful upgrade prompts, not raw errors
Usage metering: emit events after successful actions. Transactional outbox for guarantees
Billing: integrate with existing provider (Stripe, Chargebee). Do not build custom billing
Rate limiting: per-tenant-per-endpoint. Tier-based limits. Prevent noisy neighbor
Tenant context: established at request start, propagated throughout. Use the existing context mechanism instead of threading tenant_id through every layer as a parameter
Data security: tenant-scoped API keys, namespaced storage, signed webhooks
Offboarding: soft-delete then hard-delete after retention period. No orphaned data