How to Review AI-Generated Code Effectively

Reviewing AI-generated code requires a different focus than reviewing human-written code. This guide covers what to check, what to skip, and the review checklist that catches AI-specific issues.

7 min read · July 5, 2025

AI-generated code looks clean — that is what makes bugs harder to catch. Review logic, not conventions. Trace with examples, not assumptions.

In this guide: the shifted review focus, plausible-but-wrong detection, hallucinated API checks, and a risk-ordered review workflow.

The Review Shift: From Conventions to Logic

Before AI rules: 40-50% of review comments were about conventions (naming, formatting, patterns). Reviewers spent half their time on decisions that should have been made once and encoded in a rule. After AI rules: convention comments drop to near zero. The AI handled them. The reviewer's time: freed to focus on what actually matters — logic correctness, edge case handling, architectural decisions, security implications, and performance characteristics.

The shifted review checklist: (1) Is the logic correct? (Does the code do what the PR description says?) (2) Are edge cases handled? (What happens with null input, empty arrays, concurrent requests, network failures?) (3) Are there security implications? (New endpoints: authenticated? User input: validated? Sensitive data: not logged?) (4) Are there performance concerns? (N+1 queries, unbounded loops, missing indexes, large payloads?) (5) Does the architecture make sense? (Is this the right abstraction level? Does it belong in this module?)
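The performance item above is worth a concrete shape. A minimal sketch of the N+1 query pattern, using a hypothetical in-memory "database" that counts calls (the `db` object and its methods are invented for illustration):

```javascript
// Hypothetical in-memory "database" that counts queries, to make N+1 visible.
const db = {
  calls: 0,
  userById(id) { this.calls++; return { id }; },
  usersById(ids) { this.calls++; return ids.map((id) => ({ id })); },
};

const orderUserIds = [1, 2, 3, 4, 5];

// N+1 shape: one query per record — easy to miss in clean-looking generated code.
db.calls = 0;
orderUserIds.forEach((id) => db.userById(id));
console.log(db.calls); // 5 queries

// Batched shape: one query for all ids.
db.calls = 0;
db.usersById(orderUserIds);
console.log(db.calls); // 1 query
```

The reviewer's question is not "does this return the right users" (both shapes do) but "how many round trips does this make at production volume".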

What NOT to review when AI rules are in place: naming conventions (the rules handle this), import ordering (the rules handle this), formatting (Prettier/ESLint handle this), error handling pattern (the rules specify this), and test structure (the rules specify this). If you find yourself commenting on conventions that should be in the rules: add the rule instead of making the comment. AI rule: 'Every convention comment in a review: is a missing rule. Add the rule so the comment never needs to be made again.'

AI-Specific Pitfalls to Watch For

Pitfall 1 — Plausible but wrong logic: AI-generated code looks correct at a glance. Variable names are good. The structure is clean. But: the sorting algorithm sorts in the wrong direction. The date comparison uses > instead of <. The filter condition excludes records it should include. These bugs are harder to catch because the code reads well. Reviewer technique: trace through the logic with a specific example. If the PR adds a filter: mentally apply the filter to 3 test records and verify the expected results.
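The filter case above, as code. A sketch of an inverted condition that reads cleanly but does the opposite of its name (names and records are invented for illustration):

```javascript
// Three test records to trace against, per the reviewer technique above.
const records = [
  { id: 1, active: true },
  { id: 2, active: false },
  { id: 3, active: true },
];

// AI-generated filter: good name, clean structure — but the condition is
// inverted, so it EXCLUDES active records instead of including them.
const activeUsers = records.filter((r) => !r.active);

// Tracing with the three records: expected [1, 3], actual [2]. Bug found.
console.log(activeUsers.map((r) => r.id)); // [2]
```

Reading the line, the brain pattern-matches `activeUsers = records.filter(...)` and moves on; tracing record 1 through the predicate catches it immediately.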

Pitfall 2 — Hallucinated APIs: the AI generates code calling a function or API that does not exist. It looks like it should exist: the name is reasonable, the parameters make sense. But the function was never defined. Common in: library API calls (the AI uses a method that existed in a previous version or a different library), internal function calls (the AI invents a helper function that it expects to exist but does not), and configuration options (the AI uses a config key that looks valid but is not supported). Reviewer technique: verify that every function call, import, and config reference actually exists.
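A minimal sketch of the internal-helper variant: the generated code calls a plausibly named function that was never defined anywhere (`formatRelativeDate` is a hypothetical name, not a real API):

```javascript
const user = { name: "Ada", createdAt: "2024-01-15" };

// AI-generated call: the name is reasonable, the parameter makes sense,
// but no such helper exists in the codebase — it was hallucinated.
let failed = false;
try {
  formatRelativeDate(user.createdAt);
} catch (e) {
  failed = e instanceof ReferenceError;
}
console.log(failed); // true — the call fails at runtime, not at a glance
```

In dynamically resolved code paths this can survive review and even a partial test run; a type checker or a deliberate "does this symbol exist" pass catches it cheaply.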

Pitfall 3 — Subtly incomplete error handling: the AI adds try-catch blocks but does not handle all error types correctly. The happy path works. The error path: catches the error, logs it, but does not propagate it correctly (swallowing errors that should bubble up) or returns a generic error when a specific error would help the client. Reviewer technique: trace the error path. If a database query fails: what does the user see? Is it a helpful error message or a generic 500?
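The swallowed-error shape above, sketched with a hypothetical `fetchUser` standing in for a failing database query:

```javascript
// Stand-in for a DB query that fails (hypothetical function for illustration).
async function fetchUser(id) {
  throw new Error("connection refused");
}

// AI-generated handler: catches, logs, returns undefined. The happy path
// works; the error path leaves the caller unable to tell "user not found"
// from "database is down".
async function getUser(id) {
  try {
    return await fetchUser(id);
  } catch (err) {
    console.error("getUser failed:", err.message);
    return undefined; // swallowed — should rethrow or return a typed error
  }
}

getUser(42).then((user) => {
  console.log(user); // undefined — the failure silently disappeared
});
```

Tracing the error path here answers the reviewer question directly: the user would see whatever the caller does with `undefined`, most likely a confusing "not found" instead of a 503.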

⚠️ AI Code Looks Clean — That Makes Bugs Harder to Spot

Human-written buggy code often has obvious tells: inconsistent indentation, unclear variable names, long functions. You instinctively review more carefully. AI-generated buggy code: perfectly formatted, clear variable names, well-structured functions. Your brain relaxes. But the sort direction is wrong, or the filter excludes instead of includes, or the date comparison uses greater-than instead of less-than. Review AI code with the same scrutiny you would give code from a confident junior developer — clean surface, potentially flawed logic.

The AI Code Review Checklist

Logic verification: read the PR description. Does the code match the described behavior? Trace through the main flow with a concrete example (specific input → expected output). Trace through the error flow (what happens when each external dependency fails?). Check boundary conditions (first item, last item, empty collection, maximum value).
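One boundary condition worth tracing explicitly: `Array.prototype.reduce` without an initial value throws on an empty collection, a case that clean-looking generated code often misses (`total` is an invented example function):

```javascript
// Looks complete — main flow works, but there is no initial value.
function total(amounts) {
  return amounts.reduce((sum, a) => sum + a);
}

console.log(total([10, 20, 30])); // 60 — the main flow checks out

let threw = false;
try {
  total([]); // empty collection: TypeError, not 0
} catch (e) {
  threw = true;
}
console.log(threw); // true — the boundary case was unhandled

// Fixed form: amounts.reduce((sum, a) => sum + a, 0) returns 0 for [].
```

This is exactly what "check boundary conditions" means in practice: run the empty-collection case, not just the three-item case.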

Completeness check: does the PR include everything it should? Tests that cover the new behavior (not just the happy path). Updated documentation if the change affects public APIs. Database migration if the change affects the schema. Updated types if the change affects the data model. AI rule: 'AI often generates the feature code correctly but forgets: updating related tests, adding the migration, or updating the type exports. Check for completeness beyond the primary files.'

Security scan: new endpoints: are they authenticated and authorized? User input: validated before processing? Database queries: parameterized (not string concatenation)? Sensitive data: not appearing in logs, responses, or error messages? New dependencies: from trusted sources with active maintenance? AI rule: 'Security review applies to every PR that: adds an endpoint, accepts user input, queries a database, or introduces a dependency. With AI rules encoding security patterns: most security issues are prevented. But verify that the AI followed the security rules correctly.'
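The parameterized-query item, sketched without a real driver. The `{ text, values }` shape mirrors parameterized-query APIs such as node-postgres, but here it is purely illustrative:

```javascript
const userInput = "1 OR 1=1"; // attacker-controlled value

// String concatenation: the attacker's condition becomes part of the SQL.
const concatenated = `SELECT * FROM users WHERE id = ${userInput}`;
console.log(concatenated); // SELECT * FROM users WHERE id = 1 OR 1=1

// Parameterized form: the value travels separately from the SQL text,
// so the driver never interprets it as syntax.
const parameterized = {
  text: "SELECT * FROM users WHERE id = $1",
  values: [userInput],
};
console.log(parameterized.text); // SELECT * FROM users WHERE id = $1
```

The review check is mechanical: any query built with template literals or `+` on user input fails; any query whose values arrive through a placeholder list passes.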

💡 Trace Logic with Specific Examples, Not by Reading

Reading AI-generated code: 'This looks correct.' Tracing with an example: 'Input: user with role=viewer. Step 1: permission check returns false. Step 2: the function continues anyway because the check result is not used. Bug found.' Tracing forces you to verify each step with concrete values. Reading lets you accept the code's apparent correctness at face value. For every non-trivial function: pick 2-3 input examples and trace through mentally or on paper.
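The traced bug above, as code (function and field names are invented for illustration):

```javascript
function canEdit(user) {
  return user.role === "admin";
}

function updateDocument(user, doc) {
  canEdit(user); // Step 1: the check runs, returns false for a viewer…
  doc.content = "edited"; // Step 2: …but the result is never used
  return doc;
}

// Trace with a concrete input: role=viewer should be rejected, but is not.
const result = updateDocument({ role: "viewer" }, { content: "original" });
console.log(result.content); // "edited" — bug found by tracing, not reading
```

Read casually, the function "has a permission check"; traced with `role=viewer`, the unused return value is obvious.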

Review Efficiency Tips

Read the PR description first: the description should explain what changed, why, and how to test. If the description is missing or unclear: ask for it before reviewing the code. Reviewing code without context: wastes time guessing the intent. AI rule: 'AI-generated PR descriptions should include: what changed (summary), why (motivation or ticket), and how to test (verification steps). Review the description before the code.'

Review in order of risk: start with the highest-risk changes (new endpoints, database migrations, authentication changes). Then review business logic. Then review UI changes. Then review tests and documentation. If you run out of time: the high-risk areas were reviewed first. AI rule: 'Risk-ordered review: ensures the most important code is always reviewed, even under time pressure.'

Use the AI as a review assistant: ask the AI to explain complex code sections. Prompt: 'Explain what this function does and identify any potential issues.' The AI can: summarize the logic (saving you the mental effort of tracing through it), flag potential issues (it may notice things you would miss on a quick scan), and verify that the code matches the rules (it can check its own output against the rules). AI rule: 'The AI that generated the code can also help review it. Use it as a second pair of eyes, not a replacement for human judgment.'

ℹ️ Every Convention Comment = A Missing Rule

You write a review comment: 'Please use our logger instead of console.log.' This is a convention, not a logic issue. The AI should have generated the correct logger. Why did it not? Because the rule is missing or vague. Instead of making the comment: add the rule ('Logging: use logger from @/lib/logger. Never console.log in production code.'). Now: the comment never needs to be made again. For any developer. On any PR. The 30 seconds to add the rule: saves minutes of review comments forever.
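Where tooling can enforce the convention outright, encode it there too. A sketch using ESLint's built-in `no-console` rule (the config object shape is what would be exported from an `.eslintrc.js`; shown standalone here):

```javascript
// Encode the convention as lint config so it is enforced, never reviewed.
const eslintConfig = {
  rules: {
    // Ban console.log in production code; keep warn/error for diagnostics.
    "no-console": ["error", { allow: ["warn", "error"] }],
  },
};

// In a real project: module.exports = eslintConfig; in .eslintrc.js.
```

The AI rule tells the model which logger to generate; the lint rule guarantees that any `console.log` that slips through fails CI instead of consuming a review comment.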

AI Code Review Summary

Summary of effective AI-generated code review practices.

  • Shifted focus: with rules, skip convention comments. Focus on logic, edge cases, security, performance
  • Plausible but wrong: trace logic with specific examples. AI code looks clean but may be subtly incorrect
  • Hallucinated APIs: verify every function call, import, and config reference actually exists
  • Incomplete error handling: trace the error path. What does the user see when the database fails?
  • Completeness: check for tests, migrations, type updates, documentation beyond the main feature code
  • Security: authenticated endpoints, validated input, parameterized queries, no sensitive data in logs
  • Risk-ordered: review high-risk changes first (endpoints, migrations, auth). Then logic, UI, tests
  • AI as assistant: ask the AI to explain complex sections and flag potential issues