Your Codebase Already Has Rules — They Are Just Implicit
Every established codebase has conventions: naming patterns (functions are camelCase, components are PascalCase), file organization (tests next to source, routes in a specific directory), error handling approach (the team settled on try-catch or the Result pattern through practice), and architectural patterns (services call repositories, controllers call services). These conventions exist implicitly: developers learn them by reading code and absorbing patterns over weeks. AI tools cannot learn implicitly; they need the conventions written explicitly in a rules file.
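For example, in a TypeScript codebase several of these implicit conventions can be visible in a single file. A hypothetical service file (the names and the Result type are illustrative, not from any specific project):

```typescript
// Hypothetical example: implicit conventions visible in one file.
// Naming: functions are camelCase, types are PascalCase.
// Error handling: the team settled on a Result pattern instead of throwing.

type Result<T> = { ok: true; value: T } | { ok: false; error: string };

interface UserProfile { // PascalCase type name
  id: string;
  displayName: string;
}

// camelCase function name; explicit return type; Result instead of throw.
function parseUserProfile(raw: string): Result<UserProfile> {
  try {
    const data = JSON.parse(raw);
    if (typeof data.id !== "string" || typeof data.displayName !== "string") {
      return { ok: false, error: "missing id or displayName" };
    }
    return { ok: true, value: { id: data.id, displayName: data.displayName } };
  } catch {
    return { ok: false, error: "invalid JSON" };
  }
}
```

None of this is written down anywhere; a new developer infers it from reading files like this one, which is exactly what the extraction process makes explicit.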
The extraction process: read the codebase systematically, identify the patterns that are consistent across files, and write each pattern as a rule. The result is a CLAUDE.md that encodes what the codebase already does, so new AI-generated code matches the existing code because the rules were extracted from it. This is the opposite of writing rules from scratch: instead of deciding what the conventions should be, you document what they already are.
When to extract: when adopting AI rules for an existing project (the rules should match the existing code, not impose a new style), when onboarding a new team to maintain a codebase (the extracted rules capture the original team's conventions), and when creating a template from a successful project (the project's conventions become a reusable ruleset). AI rule: 'Extraction is for existing codebases. From-scratch writing is for new projects. Extraction preserves existing consistency. From-scratch writing creates new consistency.'
Step 1: Analyze Code Patterns (30 Minutes)
Read 5 representative files from different parts of the codebase: an API endpoint, a service function, a UI component, a test file, and a utility function. For each file: note the patterns. Naming: how are variables, functions, types, and files named? Error handling: try-catch, Result pattern, or something else? Imports: how are they grouped and ordered? Types: explicit annotations or inferred? Tests: which framework, what naming convention, how are assertions structured?
Look for consistency: if all 5 files use the same error handling pattern: that is a convention (write a rule). If 3 of 5 use one pattern and 2 use another: the majority pattern is the convention (write a rule for the majority). If all 5 use different patterns: there is no convention for this area (decide whether to establish one or leave it flexible). The key: you are documenting existing practice, not inventing new conventions.
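The majority check above can even be made mechanical. A minimal sketch (the pattern labels are hypothetical; the function just tallies observations across the sampled files):

```typescript
// Tally the pattern observed in each sampled file and report the majority;
// a tie or no clear majority means there is no convention for this area yet.
function majorityPattern(observations: string[]): string | null {
  const counts = new Map<string, number>();
  for (const p of observations) {
    counts.set(p, (counts.get(p) ?? 0) + 1);
  }
  let best: string | null = null;
  let bestCount = 0;
  let tied = false;
  for (const [pattern, count] of counts) {
    if (count > bestCount) {
      best = pattern;
      bestCount = count;
      tied = false;
    } else if (count === bestCount) {
      tied = true;
    }
  }
  // Require a clear majority of the sampled files.
  return !tied && bestCount > observations.length / 2 ? best : null;
}
```

With 3 of 5 files using the Result pattern, `majorityPattern(["result", "result", "result", "try-catch", "try-catch"])` reports the majority; with all 5 files differing it reports none.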
AI-assisted extraction: ask the AI to analyze the codebase. Prompt: 'Read these 5 files [paste or reference them]. Identify the coding conventions they share: naming patterns, error handling, import ordering, type annotations, file structure, and testing patterns. List each convention as a rule.' The AI: identifies patterns across the files and outputs a draft rule set. Review the AI's output: some conventions it identifies are real (keep). Others: coincidental patterns that are not intentional conventions (remove). AI rule: 'AI-assisted extraction: the fastest approach. The AI identifies patterns in minutes that would take a human 30+ minutes. Human review: validates which patterns are intentional conventions.'
A worked example: given that prompt, the AI analyzes 5 files in seconds and outputs 15-20 draft rules. You review them: 12 are real conventions (keep), 3 are coincidental patterns (remove). Total time: 5 minutes. Manual extraction of the same 12 conventions takes 30+ minutes of reading and comparing files, making the AI roughly 6x faster for the initial extraction. You validate and refine.
Step 2: Analyze File Structure and Commit History (15 Minutes)
File structure analysis: the directory layout is a convention. Where do tests live (co-located or in a separate directory)? How are routes organized (by feature or by HTTP method)? Where are types defined (per-file, per-module, or in a central types directory)? Where are utilities (src/utils/ or src/lib/)? Each structural decision: becomes a rule. 'Test files: co-located with source (user-service.test.ts next to user-service.ts).' 'Routes: organized by feature (src/routes/users/, src/routes/orders/).'
Commit history conventions: `git log --oneline -50` reveals the commit message format. Does the team use conventional commits (feat:, fix:, refactor:)? Are commit messages descriptive or terse? Is there a ticket number convention (PROJ-123)? The commit format: becomes a rule. 'Commit messages: conventional format (feat, fix, refactor, test, docs, chore). Include ticket number: feat(auth): add MFA support [AUTH-456].'
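A commit-format rule like this can be checked mechanically once extracted. A sketch of a validator (the type list and the [AUTH-456]-style ticket suffix follow the rule quoted above; treat the exact regex as an assumption, not a team standard):

```typescript
// Validate first lines like "feat(auth): add MFA support [AUTH-456]".
// The type must be one of the conventional-commit types the team uses;
// the scope and the trailing ticket tag are treated as optional here.
const COMMIT_RE =
  /^(feat|fix|refactor|test|docs|chore)(\([a-z0-9-]+\))?: .+?( \[[A-Z]+-\d+\])?$/;

function isValidCommitMessage(firstLine: string): boolean {
  return COMMIT_RE.test(firstLine);
}
```

A check like this could back the extracted rule in CI, so the convention stays enforced after it is documented.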
Configuration file analysis: the project's ESLint, Prettier, TypeScript, and bundler configurations encode conventions. ESLint: which rules are enabled (these are the team's code quality conventions). TypeScript: is strict mode on (the team values type safety)? Prettier: what formatting preferences are set? These configs: already documented in machine-readable format. Translate the most important ones into AI rules. AI rule: 'Config files are machine-readable conventions. Translate the important ones into AI rules so the AI follows the same conventions the linter enforces.'
A cautionary example: the AI reports 'All functions in the codebase use 2 parameters or fewer.' This looks like a convention, but it is accidental: the functions happen to be simple, and it is not a rule the team decided on. If you encode it, the AI will avoid generating functions with 3+ parameters, breaking legitimate use cases. The team validation step (Step 3) catches these accidental patterns. For each extracted rule, ask: 'Did we intentionally decide this, or did it just happen to be consistent?' Intentional: keep. Accidental: remove.
Step 3: Compile and Validate the Extracted Rules
Compile: organize the extracted conventions into CLAUDE.md format. Categories: project context (tech stack, architecture), naming conventions, error handling, file structure, testing, imports, and any other patterns identified. Each convention: written as a clear rule with the what-why-when format. The why: 'because this is the existing codebase convention' (sufficient for extracted rules — the codebase itself is the justification).
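A sketch of what the compiled file might look like, drawing only on the example conventions used earlier in this piece (the specific rules are illustrative):

```markdown
# CLAUDE.md (excerpt)

## Project context
- TypeScript service. Services call repositories; controllers call services.

## Naming
- Functions: camelCase. Components and types: PascalCase.
- Why: existing codebase convention.

## Error handling
- Use the Result pattern; do not throw from service functions.
- Why: existing codebase convention.

## File structure
- Test files: co-located with source (user-service.test.ts next to user-service.ts).
- Routes: organized by feature (src/routes/users/, src/routes/orders/).

## Commits
- Conventional format (feat, fix, refactor, test, docs, chore) with ticket number:
  feat(auth): add MFA support [AUTH-456].
```

Each rule carries the same why ('existing codebase convention'), which is exactly what distinguishes an extracted ruleset from one written from scratch.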
Validate with a comparison test: generate code with the new rules. Compare the AI output against existing code in the codebase. Do they look the same? Same naming? Same error handling? Same file structure? Same test style? If yes: the extraction was successful — the rules accurately describe the codebase. If no: identify which convention was missed or incorrectly extracted. Refine and re-test. AI rule: 'The validation test: generate new code and compare it to existing code. If they match: the rules correctly describe the codebase. If they differ: a convention was missed.'
Team validation: share the extracted rules with the team. Ask: 'Do these rules accurately describe how we code? Is anything missing? Is anything incorrect?' The team: provides the final validation. They may identify: conventions the extraction missed (patterns that are consistent but not obvious from reading 5 files), conventions that are accidental (a pattern that happened to be consistent but was not intentional), and conventions that are outdated (the codebase follows a pattern from 2 years ago that the team has since moved away from). AI rule: 'Team validation: the final step. The team confirms the extracted rules match their actual practice. Without team validation: the rules may encode accidental patterns as intentional conventions.'
Extract rules from the codebase. Generate a new API endpoint using the rules. Compare: does the generated endpoint look like the existing endpoints in the codebase? Same naming? Same error handling? Same response format? If yes: the extraction captured the conventions accurately. If no: something was missed. The comparison: the most practical validation. You are not checking if the rules are theoretically correct — you are checking if they produce code that matches what already exists.
Codebase Extraction Summary
Summary of extracting AI rules from an existing codebase.
- Concept: document existing conventions, do not invent new ones. The codebase is the source of truth
- Pattern analysis: read 5 representative files. Note naming, error handling, imports, types, tests
- Consistency check: all 5 files use the same pattern = convention. Mixed = majority pattern is the convention
- AI-assisted: ask the AI to identify patterns across files. Human review validates which are intentional
- File structure: directory layout, test location, route organization. Each decision becomes a rule
- Commit history: commit message format from git log. Config files: ESLint/TypeScript/Prettier conventions
- Validation: generate code with extracted rules. Compare to existing code. Do they match?
- Team validation: share with team. Confirm accuracy. Identify missed, accidental, or outdated patterns