How to Review AI Rule Changes in PRs

Reviewing a rule change PR requires a different lens than reviewing code: evaluate the impact on AI behavior, check for conflicts, verify with test prompts, and assess the blast radius across all projects.

5 min read·July 5, 2025

A rule PR changes AI behavior for every developer on every future prompt. Review blast radius, check conflicts, and test before approving.

Problem validation, specificity check, conflict scan, test prompt verification, and staged rollout for high-impact changes

Reviewing Rules Is Not Like Reviewing Code

A code PR changes one feature's behavior. A rule PR changes how the AI generates code for every developer on every future prompt, so the blast radius is much larger. A code bug affects one feature until it is fixed; a rule bug affects every AI-generated line of code until the rule is reverted. This asymmetry demands a different review approach: rule reviews focus on impact, conflicts, and AI behavior — not just the quality of the text.

The reviewer's mindset: when reviewing a rule change, ask: (1) Will this improve AI-generated code? (Not just 'is this a good convention?' but 'will the AI follow this rule and produce better code?') (2) Does this conflict with any existing rule? (Search for contradictions.) (3) What is the blast radius? (How many projects and developers are affected?) (4) Is this reversible? (Can we roll back easily if the rule causes problems?) These questions are specific to rule reviews; they have no counterpart in code reviews.

The review cost: a rule change PR should take 5-10 minutes to review (read the change, check for conflicts, run a test prompt). That is more time than most code PRs receive for the same number of changed lines, but the impact justifies the investment: 5 minutes of review prevents potential hours of debugging incorrect AI behavior across the team.

Step 1: The Rule Change Review Checklist

Check 1 — Problem validation: does the PR description explain what problem the rule change solves? Examples: 'The AI generates try-catch instead of our Result pattern because the rule was too vague.' 'Missing rule: the AI does not know about our custom error classes.' Without a problem statement, the rule change may be a personal preference rather than a team need. Ask: 'What behavior does this fix?' AI rule: 'Every rule change PR must include the problem it solves. No problem statement = no approval.'
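As an illustration, a PR description that passes this check might look like the following (a hypothetical sketch built from the examples above, not a prescribed template):

```markdown
## Problem
The AI generates try-catch blocks instead of our Result pattern
because the current error handling rule was too vague.

## Change
The rule now names the pattern explicitly: catch errors with the
AppError class and return a structured error response.

## Test prompt
"Create a function that fetches and validates user data" (verified
that the AI now generates the expected pattern).
```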

Check 2 — Specificity assessment: is the new rule specific enough for the AI to follow? Read the rule and imagine: if you were the AI, would you know exactly what code to generate? A rule that says 'handle errors properly' is too vague. A rule that says 'catch errors with the AppError class and return a structured error response with code, message, and details' is specific enough. The specificity test: could two different developers interpret this rule and write the same code? If yes, it is specific. If no, it needs more detail.
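To make the specificity test concrete, here is the same rule written both ways, as it might appear in a rules file (a hypothetical sketch; the pattern names come from the example above):

```markdown
<!-- Too vague: two developers would write different code -->
- Handle errors properly.

<!-- Specific: the AI knows exactly what to generate -->
- Catch errors with the AppError class and return a structured
  error response with code, message, and details.
```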

Check 3 — Conflict scan: does the new rule contradict any existing rule? Search the full rule file for opposing keywords (the new rule says 'use classes' — does an existing rule say 'use functions'?), overlapping scope (the new rule covers error handling — does an existing rule also cover error handling differently?), and implicit conflicts (the new rule changes the import pattern — does this affect the file structure rules?). A 2-minute scan prevents the most common rule quality issue.
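The opposing-keyword part of the scan can be roughed out in a few lines of code. A minimal sketch, assuming the rules live in one text blob and the team maintains its own list of known tensions (the pairs below are illustrative, not a standard list):

```python
# Pairs of terms that often signal contradictory rules.
# Extend this list with your team's own vocabulary.
OPPOSING_PAIRS = [
    ("use classes", "use functions"),
    ("try-catch", "Result pattern"),
    ("default export", "named export"),
]

def scan_conflicts(rule_text: str) -> list[tuple[str, str]]:
    """Return the opposing pairs where BOTH sides appear in the rules."""
    lowered = rule_text.lower()
    return [
        (a, b)
        for a, b in OPPOSING_PAIRS
        if a.lower() in lowered and b.lower() in lowered
    ]

rules = """
- Use functions for all services.
- New utilities: use classes with static methods.
"""
print(scan_conflicts(rules))  # [('use classes', 'use functions')]
```

This only flags candidates for a human look; it cannot tell a real contradiction from a scoped exception, so the reviewer still reads the surrounding lines.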

💡 One Test Prompt Per Rule Change = 2 Minutes Well Spent

The PR changes the error handling rule. Test prompt: 'Create a function that fetches and validates user data.' Run it once with the new rule. Does the AI generate the expected error handling pattern? If yes, the rule works. If no, the rule needs revision. This 2-minute test is the most valuable step in the entire review. A rule that reads well but that the AI does not follow is worthless; a rule that reads awkwardly but that the AI follows perfectly is valuable.

Step 2: Test Prompt Verification

The most effective rule review step is to run a test prompt with the changed rule. Ask the AI to generate code that should be affected by the change. Does the AI follow the new rule? Does the output match the expected pattern? This verification takes 2-3 minutes and confirms that the rule works as intended, not just that the text reads well. A well-written rule that the AI ignores is a failed rule, regardless of how good the prose is.

The reviewer's test: if the PR adds a new error handling rule, prompt: 'Create a function that fetches a user by ID and handles the case where the user is not found.' Check: does the AI use the new error handling pattern? If the PR updates a naming convention, prompt: 'Create a new service with CRUD operations.' Check: does the AI follow the updated naming? One prompt per rule change is sufficient for most reviews.

When the AI does not follow the rule, the rule needs revision before approval. Common reasons: the rule is too vague (the AI interprets it differently), the rule conflicts with another rule (the AI follows the other one), or the rule uses terminology the AI does not understand (project-specific jargon without definition). The fix: revise the rule in the PR, then re-test. AI rule: 'A rule that the AI does not follow is not ready for approval. Test, iterate, then approve. Never merge an untested rule change.'
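The 'does the output match the expected pattern?' check can itself be partly automated. A minimal sketch, assuming the new rule mandates the AppError class and a structured error response (pattern names come from the examples above; the sample output is stand-in text, not real AI output):

```python
import re

# What the new error handling rule should produce in generated code.
EXPECTED_PATTERNS = [
    r"\bAppError\b",           # the custom error class is used
    r"\bcode\b.*\bmessage\b",  # structured response fields appear
]

def follows_rule(generated_code: str) -> bool:
    """True if every expected pattern appears in the AI's output."""
    return all(re.search(p, generated_code) for p in EXPECTED_PATTERNS)

# Stand-in for the output of the reviewer's test prompt.
sample = """
if (!user) {
  throw new AppError({ code: "USER_NOT_FOUND", message: "No such user" });
}
"""
print(follows_rule(sample))  # True: the output matches the new rule
```

A regex check is a coarse filter; it catches 'the AI ignored the rule entirely' but not 'the AI used the pattern badly', so the reviewer still reads the generated code.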

⚠️ Never Merge an Untested Rule Change

The PR looks good. The text is clear. The rationale makes sense. But was it tested? Did anyone verify the AI actually follows the new rule? Merging an untested rule is the equivalent of merging code without running tests: the rule may look correct but produce wrong AI behavior. The cost of testing: 2 minutes. The cost of debugging a bad rule deployed to 50 projects: hours. The rule: test before merge. Always.

Step 3: Blast Radius Assessment

Blast radius: how many developers and projects are affected by this rule change. A team-level rule change affects the team's 5-10 developers. An organization-level rule change affects all 200 developers across 50 projects. The larger the blast radius, the more carefully the change should be reviewed, tested, and communicated. Organization-level changes should go through the governance board, not just a team PR review.

Communication requirement: after approval, the rule change should be communicated to affected developers. For a team-level change: a Slack message to the team channel. For an organization-level change: an announcement in the engineering Slack, the monthly newsletter, or the all-hands. The communication covers what changed, why, and how it affects AI behavior. Developers who are surprised by a rule change (the AI suddenly generates different code) lose trust in the rules system.

Staged rollout for high-impact changes: if the blast radius is large (50+ projects) and the change is significant (a new pattern, a removed rule), consider a staged rollout: deploy to 2-3 canary projects first, monitor for 2-3 days, then deploy to all projects. The canary catches issues at small scale before they affect the entire organization. AI rule: 'Blast radius determines review rigor. Team change: team review + Slack message. Org change: governance board + canary rollout + engineering announcement.'
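The blast-radius rules above can be sketched as a small decision helper (the thresholds, stage sizes, and project names are illustrative assumptions, not a prescribed policy):

```python
from dataclasses import dataclass

@dataclass
class RolloutPlan:
    review: str            # who must approve
    stages: list[list[str]]  # deployment waves, in order
    announce: str          # where the change is communicated

def plan_rollout(projects: list[str], significant: bool) -> RolloutPlan:
    """Team-sized changes ship at once; large significant ones get a canary."""
    if len(projects) >= 50 and significant:
        # Canary wave first, monitor 2-3 days, then everything else.
        return RolloutPlan(
            review="governance board",
            stages=[projects[:3], projects[3:]],
            announce="engineering-wide announcement",
        )
    return RolloutPlan(
        review="team PR review",
        stages=[projects],
        announce="team Slack channel",
    )

projects = [f"project-{i}" for i in range(60)]
plan = plan_rollout(projects, significant=True)
print(plan.review, len(plan.stages[0]))  # governance board 3
```

The point of encoding it at all is consistency: the same blast radius always triggers the same review rigor, instead of each team deciding ad hoc.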

ℹ️ Surprised Developers Lose Trust in Rules

Monday morning: a developer opens their IDE and prompts the AI. It generates a different error handling pattern than last week. They did not change anything, and nobody told them the rules changed. They think the AI is broken, or that the rules cannot be trusted. They start overriding. Trust erodes. The fix: communicate rule changes before deployment. A Slack message suffices: 'The error handling rule was updated — the AI now uses AppError instead of generic Error. See PR #42 for details.' The developer understands and trusts the change.

Rule Review Summary

Complete rule change review checklist.

  • Mindset: rules have larger blast radius than code. 5-10 minutes of review prevents hours of debugging
  • Check 1: problem validation. What behavior does this fix? No problem = no approval
  • Check 2: specificity. Would two people interpret the rule the same way? If not: needs detail
  • Check 3: conflict scan. Search for opposing keywords and overlapping scope. 2 minutes
  • Test prompt: run one prompt per rule change. Does the AI follow the new rule? If not: revise
  • Untested rules: never merge. A rule the AI ignores is a failed rule regardless of prose quality
  • Blast radius: team change (team review). Org change (governance board + canary rollout)
  • Communication: announce changes before deployment. Surprised developers lose trust in rules