AI Rules for Feature Flags

AI Hardcodes Feature Checks — No Flag System

AI generates feature toggles as hardcoded if-statements: if (process.env.ENABLE_NEW_CHECKOUT === "true") { ... }. These are scattered across the codebase, not centrally managed, not gradually rollable, and never cleaned up. After six months, the codebase has 50 if-statements checking environment variables — nobody knows which are still needed, which are fully rolled out, and which are dead code.

Feature flags are a deployment strategy: deploy code to production behind a flag, enable for internal users, gradually roll out to percentages of users, monitor for issues, and either fully enable or remove. AI treats them as environment variable checks — missing the entire lifecycle: creation, rollout, monitoring, and cleanup.

These rules cover: flag services (LaunchDarkly, Unleash, Flagsmith, or custom), flag lifecycle, gradual rollouts, kill switches for incidents, and the critical practice of cleaning up stale flags.

Rule 1: Centralized Flag Service, Not Environment Variables

The rule: 'Use a feature flag service for all flags: LaunchDarkly, Unleash, Flagsmith, PostHog, or a custom service. Never use environment variables as feature flags — they require a redeploy to change, cannot target specific users, and cannot do gradual rollouts. Flag services provide: instant toggle (no deploy), user targeting (enable for specific users), percentage rollouts (10% → 50% → 100%), and analytics (which users see which variant).'

For the SDK: 'Initialize the flag client at app startup: const flagClient = new FlagClient({ apiKey: env.FLAG_API_KEY }). Check flags in code: if (await flagClient.isEnabled("new-checkout", { userId })) { ... }. The flag client handles: caching (does not call the API per check), streaming updates (flags change in real-time without restart), and default values (flag evaluates to false if the service is unreachable).'

AI generates process.env.FEATURE_NEW_CHECKOUT — changing it requires: update the env var, redeploy, wait for rollout, hope nothing breaks. A flag service change: toggle in dashboard, instant propagation, no deploy. If something breaks: toggle off in 1 second vs. rollback deploy in 5 minutes.

Flag service: LaunchDarkly, Unleash, Flagsmith, PostHog — not env vars
Instant toggle — no deploy needed — changes propagate in seconds
User targeting: enable for userId, orgId, email domain, percentage
Caching + streaming: client handles caching, updates in real-time
Default values: flag evaluates to false if service is unreachable — safe degradation

💡 1 Second vs 5 Minutes

Flag service toggle: 1 second, no deploy, instant propagation. Environment variable change: update env, redeploy, wait for rollout, hope nothing breaks. During an incident, that 5-minute difference is the entire impact window.

Rule 2: Typed Flag Definitions and Defaults

The rule: 'Define all flags in a central registry: const FLAGS = { NEW_CHECKOUT: { key: "new-checkout", type: "boolean", default: false, description: "New checkout flow with Stripe Elements" }, PRICE_ALGORITHM: { key: "price-algorithm", type: "string", default: "v1", values: ["v1", "v2", "v3"], description: "Pricing algorithm version" } } as const. Use the registry for all flag checks — never hardcode flag keys as strings.'

For type safety: 'Create a typed wrapper: function isEnabled(flag: keyof typeof FLAGS, context: FlagContext): boolean. This ensures: only defined flags are checked (typos are compile errors), every flag has a default value (new environments work without configuration), and the flag purpose is documented (description field).'

AI scatters magic strings throughout the codebase: if (flags.isEnabled("new-chekout")) — a typo that silently evaluates to false. A typed registry catches the typo at compile time. It also serves as the flag inventory: every active flag is documented in one place.

Rule 3: Gradual Rollout with Monitoring

The rule: 'Roll out features gradually: internal users (dogfooding) → beta users (opted in) → 10% of all users → 25% → 50% → 100%. At each stage, monitor: error rate (does the new code break?), performance (is it slower?), business metrics (does conversion change?), and user feedback. If any metric degrades, pause the rollout. If it degrades significantly, toggle off (kill switch).'

For consistent assignment: 'Users must see the same variant consistently — not randomly different on each page load. Use the user ID as the hash input: hash(userId + flagKey) % 100 < rolloutPercentage. This ensures: user A always sees the new checkout (or always the old one), enabling meaningful A/B comparison and preventing confusing UX where the feature appears and disappears.'

AI deploys features as all-or-nothing — either everyone gets it or nobody does. Gradual rollout limits blast radius: if the new checkout has a payment bug, 10% of users are affected (pause and fix), not 100% (incident and rollback). The most important deployment safety mechanism after automated tests.

⚠️ 10% Limits Blast Radius

All-or-nothing deployment: if the new checkout has a payment bug, 100% of users are affected. 10% rollout: 10% affected, you catch it, pause, fix. Gradual rollout is the most important deployment safety mechanism after tests.

Rule 4: Kill Switches for Incidents

The rule: 'Every major feature behind a flag is a kill switch — toggle off instantly during an incident without deploying. Identify which flags are kill-switch-capable: features that can be safely disabled without data loss or user confusion. Test the kill switch before the rollout: enable the feature, disable it, verify the old behavior works correctly. A kill switch that has never been tested is not a kill switch.'

For incident response: 'Step 1: detect the issue (monitoring alerts). Step 2: toggle off the flag (1 second — no deploy). Step 3: confirm the issue is resolved. Step 4: investigate root cause. Step 5: fix, test, and re-enable gradually. The flag buys you time: the incident is mitigated in seconds, the fix can take hours without user impact.'

For operational flags: 'Some flags are permanent operational controls, not feature rollouts: maintenance mode (disable non-critical features during maintenance), circuit breakers (disable a flaky integration automatically), and rate limit overrides (increase limits for specific customers). These are never cleaned up — they are infrastructure.'

Every major feature = a kill switch — toggle off in 1 second
Test the kill switch before rollout — enable → disable → verify old behavior
Incident: detect → toggle off → confirm fixed → investigate → fix → re-enable
Kill switch buys time: mitigate in seconds, fix in hours — zero user impact
Permanent operational flags: maintenance mode, circuit breakers — never cleaned up

Rule 5: Clean Up Stale Flags

The rule: 'When a flag is fully rolled out (100% enabled for >2 weeks with no issues): remove the flag check from code, remove the flag from the registry, and delete from the flag service. Create a cleanup task when creating the flag — the flag has a planned end date. A codebase with 200 stale flags has 200 unnecessary if-statements — each one is a code path that can confuse developers and hide bugs.'

For the cleanup process: 'Phase 1: change the flag default to the winning variant (all new environments get the new behavior). Phase 2: remove the flag check from code (the if-branch becomes the only path). Phase 3: remove the flag definition from the registry. Phase 4: delete from the flag service dashboard. Phase 5: remove the losing code path (the else-branch).'

AI creates flags but never cleans them up. After a year, the codebase has if-statements for features that shipped 11 months ago. The flag check serves no purpose — the feature is enabled everywhere. But nobody removes it because nobody knows if it is still needed. A cleanup date at creation prevents this accumulation.

Clean up after 100% rollout for 2+ weeks — no issues = remove the flag
Flag has a planned end date — create cleanup task at creation time
Phase: default to winner → remove check → remove registry → remove service → remove dead code
200 stale flags = 200 unnecessary branches — confusion and hidden bugs
Review flags quarterly — any flag older than 3 months and 100% enabled = stale

ℹ️ Create with Expiry Date

When you create a flag, set its planned cleanup date. A flag at 100% for 3+ months with no issues is stale — the if-statement serves no purpose. Quarterly reviews + cleanup tasks prevent the 200-stale-flag codebase.

Complete Feature Flags Rules Template

Consolidated rules for feature flag management.

Flag service (LaunchDarkly/Unleash/Flagsmith) — never env vars as feature toggles
Typed flag registry: central definitions, defaults, descriptions — no magic strings
Gradual rollout: internal → beta → 10% → 25% → 50% → 100% — monitor at each stage
Consistent assignment: hash(userId + flagKey) — same user, same variant, always
Kill switches: toggle off in 1 second — test before rollout — incident mitigation
Permanent operational flags: maintenance mode, circuit breakers — never cleaned up
Cleanup after 100% for 2 weeks: remove check → remove registry → remove dead code
Quarterly flag review: stale flags older than 3 months at 100% = remove