The AI Is Following Instructions — Just the Wrong Ones
When the AI generates wrong code: the instinct is to blame the AI ('the AI is not smart enough'). The reality: the AI is following the instructions it has — the rules file, the context in the conversation, and its training data. Bad output usually means: a rule is missing (the convention is not encoded), a rule is vague (the AI interprets it differently than intended), rules conflict (two rules give contradictory guidance), or the prompt is ambiguous (the developer's request was unclear). Debugging AI output: is debugging the instructions, not the AI.
The debugging flowchart:
1. Is the output wrong in convention or in logic? Convention wrong: rule issue. Logic wrong: prompt issue.
2. Does a rule exist for this convention? No: add the rule. Yes: go to 3.
3. Is the rule specific enough? No: make it more specific. Yes: go to 4.
4. Does the rule conflict with another rule? Yes: resolve the conflict. No: go to 5.
5. Is the context window too large (rule file very long)? Yes: prioritize or restructure rules. No: the issue may be a genuine AI limitation; work around with a more explicit prompt.
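The flowchart is mechanical enough to sketch as a decision function. This is purely illustrative (the type and field names are invented for this sketch, not part of any real tool), but it makes the branch order explicit:

```typescript
// Illustrative sketch of the debugging flowchart. All names are hypothetical.
type Diagnosis =
  | "prompt issue: rephrase the request"
  | "missing rule: add it"
  | "vague rule: make it specific"
  | "conflicting rules: add scope"
  | "context overflow: prioritize or restructure"
  | "genuine AI limitation: use a more explicit prompt";

function diagnose(check: {
  conventionWrong: boolean; // step 1: convention vs. logic
  ruleExists: boolean;      // step 2
  ruleIsSpecific: boolean;  // step 3
  ruleConflicts: boolean;   // step 4
  ruleFileTooLong: boolean; // step 5
}): Diagnosis {
  if (!check.conventionWrong) return "prompt issue: rephrase the request";
  if (!check.ruleExists) return "missing rule: add it";
  if (!check.ruleIsSpecific) return "vague rule: make it specific";
  if (check.ruleConflicts) return "conflicting rules: add scope";
  if (check.ruleFileTooLong) return "context overflow: prioritize or restructure";
  return "genuine AI limitation: use a more explicit prompt";
}
```

Note the ordering: rule existence is checked before rule quality, which matches the 80/20 observation below — most failures exit at the second branch.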
The 80/20 of debugging: 80% of bad AI output is caused by missing or vague rules. Fixing the rule: fixes the issue permanently for everyone. The remaining 20%: prompt-specific issues that require rephrasing the request. AI rule: 'Always fix the rule before working around with a prompt hack. A rule fix: permanent. A prompt hack: single-use.'
Cause 1: Missing Rule (Most Common)
Symptom: the AI generates code that works but does not follow the team's convention. Example: the team uses a custom Result type for error handling, but the AI generates try-catch blocks. Diagnosis: search the rules file for the convention. If not found: the rule is missing. The AI does not know about the convention and falls back to generic patterns.
Fix: add the missing rule with: what to do (use the Result type from @/lib/result), why (consistent error handling across all services), when (all business logic functions — try-catch only for framework boundaries), and an example (showing the correct pattern). Verify: re-prompt the AI with the same request. The output should now use the Result type.
Prevention: every time you encounter a missing rule, ask: is this a one-off or will the AI get this wrong every time? If it is recurring: add the rule immediately. Track missing rules in a list and add the top 3 at each quarterly review. AI rule: 'A missing rule is the most common cause of bad AI output and the easiest to fix. One line in the rules file: fixes the issue for every developer on every future prompt.'
The AI generates try-catch instead of your Result pattern. You could: fix it manually every time (minutes per occurrence, forever). Or: add one rule ('Error handling: use Result<T> from @/lib/result. Never throw in business logic.'). Time to add: 30 seconds. Effect: every future AI generation uses the Result pattern. Every developer benefits. The fix is permanent. Manual correction is temporary. Always fix the rule.
Cause 2: Vague Rule
Symptom: the AI generates code that partially follows the convention but gets details wrong. Example: the rule says 'handle errors properly.' The AI generates error handling — but not the team's specific pattern. Diagnosis: the rule exists but is too vague for the AI to interpret consistently. Different prompts produce different error handling approaches.
Fix: make the rule more specific. Before: 'Handle errors properly.' After: 'Error handling: wrap business logic in try-catch. Catch specific error types (ValidationError, NotFoundError, DatabaseError). Return structured error response: { success: false, error: { code: string, message: string, details?: unknown } }. Log the full error server-side. Never expose stack traces to the client.' The specific version: leaves no room for interpretation.
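The specific rule above is concrete enough to implement directly. A sketch of what conforming code might look like (the error classes and helper are hypothetical examples, not a prescribed implementation):

```typescript
// Illustrative sketch of the specific rule. Error classes are hypothetical.
class ValidationError extends Error { code = "VALIDATION_ERROR"; }
class NotFoundError extends Error { code = "NOT_FOUND"; }

type ErrorResponse = {
  success: false;
  error: { code: string; message: string; details?: unknown };
};

function toErrorResponse(e: unknown): ErrorResponse {
  console.error(e); // log the full error server-side
  if (e instanceof ValidationError || e instanceof NotFoundError) {
    // known error types: safe to expose code and message
    return { success: false, error: { code: e.code, message: e.message } };
  }
  // unknown errors: never expose stack traces or internals to the client
  return {
    success: false,
    error: { code: "INTERNAL", message: "Unexpected error" },
  };
}
```

Two developers handed the specific rule would both produce something close to this; handed 'handle errors properly,' they would not.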
The specificity test: if two developers read the rule and write different code: the rule is too vague. If both write essentially the same code: the rule is specific enough. Test with a colleague: show them the rule, ask what code they would write. If their answer matches your expectation: the rule passes. AI rule: 'The colleague test: the simplest way to validate rule specificity. If a human interprets the rule differently: the AI will too.'
The rule says: 'Handle errors properly.' You think: Result pattern with structured error responses. Your colleague thinks: try-catch with console.error. The AI: picks one approach randomly each time. The rule is too vague for humans AND for AI. The colleague test: show the rule to a teammate. Ask: 'What code would you write?' If their answer does not match your expectation: the rule needs more specificity. This takes 60 seconds and catches 90% of vague rules.
Cause 3: Conflicting Rules and Context Issues
Symptom: the AI generates code that follows one convention but violates another. Example: Rule A says 'use functional components.' Rule B (in a NestJS section) says 'use class-based controllers with decorators.' The AI encounters a NestJS controller and is confused: functional or class? Diagnosis: two rules give contradictory guidance for the same context.
Fix: add scope to conflicting rules. Before: 'Use functional components.' After: 'React components: functional only (no class components). NestJS: class-based controllers with decorators (framework requirement).' The scoped rules: apply to their respective contexts without conflict. AI rule: 'When adding a new rule: check if it conflicts with any existing rule. Search for contradictory terms (class vs function, sync vs async, throw vs return error). Resolve conflicts by adding context scope.'
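In the rules file itself, the scoped version might look like this (headings and wording are illustrative):

```markdown
## Components and controllers
- React components: functional only (no class components).
- NestJS: class-based controllers with decorators (framework requirement).
```

Because each rule names its context, the AI generating a NestJS controller matches the second line, not the first, and the contradiction disappears.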
Context window issues: very long rule files (100+ rules, 5,000+ words) may exceed the AI's effective context. Rules at the end of a long file: may receive less attention than rules at the beginning. Fix: put the most important rules first (security, error handling, architecture), group related rules under clear headings, and keep the total file under 3,000 words if possible. If more rules are needed: split into base rules (in the rule file) and reference documentation (linked but not in the main file). AI rule: 'If the rule file is very long and rules near the end are not followed: restructure. Most important rules first. Consider splitting into a concise rule file and a detailed reference document.'
AI tools process rule files sequentially. Rules at the top of the file: receive the most attention. Rules at the bottom of a very long file: may receive less emphasis. Structure your CLAUDE.md with: security rules first (most critical), architecture patterns second (most impactful), coding conventions third (most frequent), and nice-to-have preferences last. If the AI is ignoring a rule: try moving it higher in the file before rewriting it.
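A skeleton of that ordering, as it might appear in a CLAUDE.md (section names and example rules are illustrative, not prescribed):

```markdown
# CLAUDE.md

## Security (most critical)
- Never log secrets, tokens, or PII.

## Architecture (most impactful)
- Business logic lives in services, not controllers.

## Conventions (most frequent)
- Error handling: use Result<T> from @/lib/result. Never throw in business logic.

## Preferences (nice to have)
- Prefer named exports over default exports.
```

The ordering doubles as a triage tool: if a preference near the bottom is ignored, that is tolerable; if a security rule were down there and ignored, it would not be.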
Debugging Summary
Quick reference for diagnosing and fixing bad AI output.
- Step 1: convention wrong or logic wrong? Convention → rule issue. Logic → prompt issue
- Missing rule (80% of cases): convention not in rules file. Fix: add the rule. One-line fix, permanent effect
- Vague rule: rule exists but AI interprets it differently each time. Fix: add specifics and examples
- Specificity test: show the rule to a colleague. If they write different code: too vague
- Conflicting rules: two rules give contradictory guidance. Fix: add context scope to each rule
- Context window: very long rule files lose effectiveness at the end. Fix: prioritize, restructure, or split
- Prompt issues (20%): the request was ambiguous. Fix: rephrase with more specific intent
- Always fix the rule, not the prompt. Rule fix: permanent for all developers. Prompt fix: single-use