The AI Testing Problem
Ask an AI to 'write tests for this function' and you'll get one of two extremes. Either the tests are trivially shallow — testing that a function returns its input, that true is truthy, that an empty array has length 0 — or they're painfully over-mocked, replacing every dependency with a mock so the test verifies nothing about real behavior.
Neither extreme is useful. Shallow tests give false confidence. Over-mocked tests break on every refactor without catching real bugs. What you want is meaningful tests that verify behavior through realistic scenarios — and the AI can write these, but only with the right rules.
Rule 1: What to Test (and What Not To)
The most impactful testing rule is defining what deserves a test. Without this rule, the AI either tests everything (including trivial getters and simple pass-through functions) or tests nothing beyond the happy path.
The rule: 'Write tests for business logic, data transformations, and edge cases. Do not test framework behavior, simple property access, or functions that only delegate to a single dependency. Test the behavior, not the implementation — tests should verify what the function does, not how it does it.'
Add specific guidance for your project: 'Test all functions in src/services/ and src/lib/. Do not test React component rendering unless it contains conditional logic. Test API route handlers through HTTP-level integration tests, not unit tests.'
AI tests are either trivially shallow (testing that true is true) or painfully over-mocked (replacing every dependency). Rules push the AI toward the useful middle ground: meaningful behavior tests.
Rule 2: Mock Boundaries, Not Internals
AI assistants default to mocking everything because it's the safe choice — mocked tests never fail due to environment issues. But they also never catch integration bugs, which are the bugs that matter most.
The rule: 'Mock external boundaries only: HTTP APIs, third-party services, and time-dependent operations. Never mock your own code (internal functions, services, repositories). For database tests, use a real test database — not mocked queries. Use dependency injection to make boundaries replaceable, not to mock internals.'
This rule prevents the 'all green, nothing works' anti-pattern where tests pass because they're testing mocks, not code.
Mock external services and APIs. Never mock your own code — it creates tests that pass while the real integration is broken. Use a real test database, not mocked queries.
Rule 3: Test File Organization
How tests are organized affects whether the AI generates tests at all. If your project has a clear pattern, the AI follows it. If there's no pattern, the AI invents one — and it'll be different every time.
The rule: 'Place test files adjacent to source files with a .test.ts suffix (or your language's convention: a _test.go suffix, a test_*.py prefix). Use describe blocks for the function/class being tested. Use it/test blocks for specific behaviors. Name tests as sentences: "returns user when email exists", not "test1" or "should work".'
'returns 404 when user not found' tells you exactly what broke. 'test1' or 'should work' tells you nothing. Sentence-style test names are self-documenting.
Testing Rules Template
Consolidated testing rules for any project. Adapt the framework-specific references to your stack.
- Test business logic and edge cases — not trivial wrappers or framework behavior
- Mock external boundaries only — never mock your own functions or database queries
- Test files adjacent to source with .test.ts suffix — one test file per source file
- Describe/it structure with sentence-style names: 'returns 404 when user not found'
- Integration tests for API routes — test through HTTP, not by calling handler directly
- No snapshot tests for dynamic content — only for stable UI components
- Test error paths explicitly — every try/catch should have a test that triggers the catch
- Use factories for test data — never hardcode user objects across multiple tests