AI-Generated Tests: Coverage vs Quality
AI can generate tests instantly. The problem: AI-generated tests often optimize for coverage (every line is executed) instead of quality (real bugs are caught). Common AI test anti-patterns: empty assertions (the test runs but asserts nothing meaningful), testing implementation instead of behavior (the test breaks when you refactor even though behavior is preserved), and happy-path-only testing (the test covers the success case but misses error handling, edge cases, and boundary conditions).
Good tests answer one question: if this test fails, what bug did it catch? If the answer is unclear, the test is not useful. AI rules for testing encode: what to test (behavior, not implementation), how to name tests (describe expected behavior), what to assert (specific values, not just 'does not throw'), and what edge cases to cover (empty inputs, null values, boundary conditions, concurrent access).
The goal: AI-generated tests that a senior developer would approve without modification. Not tests that pass CI and inflate the coverage number. Tests that document the expected behavior, catch regressions when code changes, and verify edge cases that a developer might forget.
Prompting Techniques for Better Tests
Technique 1 — Specify what to test, not how: Bad prompt: 'Write tests for the createUser function.' (The AI writes tests that exercise the function but may not verify meaningful behavior.) Better prompt: 'Write tests for createUser that verify: valid user is created and returned, duplicate email returns a specific error, missing required fields return validation errors, and the password is hashed before storage.' The specific behaviors guide the AI to test what matters.
Technique 2 — Request edge cases explicitly: The AI defaults to happy-path tests unless asked otherwise. Prompt addition: 'Include edge cases: empty string name, email without @ symbol, password shorter than minimum length, and concurrent creation with the same email.' AI rule: 'When generating tests: always include edge cases. Prompt pattern: describe the happy path, then list 3-5 edge cases to test. The AI generates both.'
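The prompt addition above can be sketched as a validator plus one test per requested edge case. The validateNewUser function and its rules (non-empty name, email must contain '@', password of at least 8 characters) are assumptions chosen to match the edge cases in the prompt.

```javascript
// Hypothetical validator; the rules mirror the edge cases requested in the prompt.
function validateNewUser({ name, email, password }) {
  const errors = [];
  if (!name || name.trim() === '') errors.push('name is required');
  if (!email || !email.includes('@')) errors.push('email is invalid');
  if (!password || password.length < 8) errors.push('password too short');
  return errors;
}

// Happy path first, then one test per explicitly requested edge case.
console.assert(validateNewUser({ name: 'Alice', email: 'a@b.com', password: 'longenough' }).length === 0);
console.assert(validateNewUser({ name: '', email: 'a@b.com', password: 'longenough' }).includes('name is required'));
console.assert(validateNewUser({ name: 'Alice', email: 'not-an-email', password: 'longenough' }).includes('email is invalid'));
console.assert(validateNewUser({ name: 'Alice', email: 'a@b.com', password: 'short' }).includes('password too short'));
```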
Technique 3 — Specify the assertion style: Prompt: 'Assert specific values, not just truthiness. Not: expect(result).toBeTruthy(). Instead: expect(result.id).toBeDefined(), expect(result.email).toBe(input.email), expect(result.createdAt).toBeInstanceOf(Date).' This prevents the most common AI test anti-pattern: tests that pass but assert nothing meaningful. AI rule: 'Tests must assert specific values. No toBeTruthy() on objects. No toEqual({}) for empty responses. Every assertion verifies a specific, meaningful property.'
The AI generates: expect(result).toBeTruthy(). This passes if result is any non-null, non-undefined value. An object with wrong data: truthy. An empty string: falsy (catches this one case). The number 0: falsy (catches this one case). For almost any realistic result, toBeTruthy passes regardless of correctness. Replace with: expect(result.email).toBe('alice@test.com'), expect(result.role).toBe('admin'). Specific assertions catch specific bugs. Truthy assertions catch almost nothing.
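The gap between the two assertion styles can be shown directly. The expected/wrong objects below are made-up illustration data; truthyCheck stands in for what toBeTruthy() verifies, and specificCheck for what the specific assertions verify.

```javascript
// An object with entirely wrong data is still truthy, so a truthy "test" passes.
const expected = { email: 'alice@test.com', role: 'admin' };
const wrong = { email: 'mallory@evil.com', role: 'superadmin' }; // wrong data, still truthy

const truthyCheck = (r) => Boolean(r);                 // what expect(r).toBeTruthy() verifies
const specificCheck = (r) =>
  r.email === 'alice@test.com' && r.role === 'admin';  // what specific assertions verify

console.assert(truthyCheck(wrong) === true);     // the wrong result passes the truthy check
console.assert(specificCheck(wrong) === false);  // specific assertions catch the bug
console.assert(specificCheck(expected) === true);
```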
AI Rules for Test Generation
Rule 1 — Test behavior, not implementation: 'Tests verify what the function does (returns correct data, throws correct errors), not how it does it (which internal methods are called, in what order). Tests should not break when the implementation is refactored but the behavior is preserved.' This rule prevents tests that are tightly coupled to the current implementation and break during refactoring.
Rule 2 — Name tests with expected behavior: 'Test names describe what should happen: it("returns empty array when no users match the filter"). Not: it("test getUsersByFilter"). The test name is the documentation — when it fails, the name explains what broke.' This rule makes test failures immediately understandable without reading the test code.
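A small sketch of behavior-describing names, using a hypothetical getUsersByFilter and a plain object as a stand-in for a test runner: each key states the expected behavior, so a failure message reads as a sentence.

```javascript
// Hypothetical function under test: match users against every field in a filter.
function getUsersByFilter(users, filter) {
  return users.filter(u => Object.entries(filter).every(([k, v]) => u[k] === v));
}

// Each test name is a sentence describing expected behavior,
// not "test getUsersByFilter".
const tests = {
  'returns empty array when no users match the filter': () =>
    getUsersByFilter([{ role: 'user' }], { role: 'admin' }).length === 0,
  'returns only users matching every filter field': () =>
    getUsersByFilter(
      [{ role: 'admin', active: true }, { role: 'admin', active: false }],
      { role: 'admin', active: true }
    ).length === 1,
};

for (const [name, test] of Object.entries(tests)) {
  if (!test()) throw new Error(`FAIL: ${name}`); // the failure message explains what broke
}
```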
Rule 3 — Cover the error path: 'Every test suite includes: happy path (correct input → correct output), error path (invalid input → specific error), edge cases (empty input, null, boundary values), and authorization (unauthorized access → 401/403). The AI generates all four categories by default, not just the happy path.' This rule addresses the most common gap in AI-generated tests: missing error and edge case coverage.
Rule 4 — Use factories for test data: 'Create test data with factories (createTestUser({ role: "admin" })). Never hardcode test data across multiple tests. The factory generates complete, valid data with sensible defaults. Tests override only the fields relevant to the test case.' This rule prevents hidden dependencies between tests and makes test data management sustainable.
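A minimal factory sketch. The field set, defaults, and id counter are assumptions; the pattern is what matters: complete valid defaults, with the test overriding only the fields it cares about.

```javascript
// Hypothetical factory: complete, valid defaults; unique ids per call.
let nextId = 1;
function createTestUser(overrides = {}) {
  const id = nextId++;
  return {
    id,
    name: 'Test User',
    email: `user${id}@test.com`,
    role: 'user',
    active: true,
    ...overrides, // the test states only the fields relevant to the test case
  };
}

const admin = createTestUser({ role: 'admin' });
console.assert(admin.role === 'admin');  // overridden field
console.assert(admin.active === true);   // sensible default, not hardcoded in the test
const other = createTestUser();
console.assert(other.id !== admin.id);   // no hidden collisions between tests
```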
Prompt: 'Write tests for createUser.' AI generates: 3 tests, all happy path (valid input → success). Missing: what happens with invalid email? Missing required fields? Duplicate email? Empty password? Prompt addition: 'Include edge cases: invalid email format, missing name field, duplicate email address, password under 8 characters.' Now the AI generates: 3 happy path tests + 4 edge case tests. The edge case tests catch the bugs that happy-path tests miss.
Reviewing AI-Generated Tests
The review checklist for AI-generated tests: (1) Does each test assert something meaningful? (Watch for: toBeTruthy, toEqual(expect.anything()), and tests that just verify 'no error thrown'.) (2) Are edge cases covered? (If only the happy path is tested: the AI did the minimum.) (3) Would this test catch a real bug? (Imagine changing the function's behavior — would this test fail?) (4) Is the test independent? (Does it depend on another test running first? Does it share state with other tests?) (5) Is the test deterministic? (Does it use fixed dates, seeded random values, and mocked external dependencies?)
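Checklist item 5 (determinism) is the easiest to get wrong. One sketch, under the assumption of a hypothetical makeReceipt function: inject the clock as a parameter instead of calling Date.now() inside the code under test, so the test can pin it to a fixed value.

```javascript
// Hypothetical function under test; the clock is injected so tests control time.
function makeReceipt(amount, now = () => Date.now()) {
  return { amount, issuedAt: now() };
}

// Deterministic test: a fixed clock makes the assertion exact and repeatable,
// instead of fuzzy checks like "issuedAt is roughly now".
const fixedClock = () => 1700000000000;
const receipt = makeReceipt(42, fixedClock);
console.assert(receipt.issuedAt === 1700000000000);
console.assert(receipt.amount === 42);
```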
Common AI test issues to fix during review: test that calls the function but does not assert the return value (coverage without verification), test that mocks the thing it is testing (the test passes regardless of the real implementation), test with assertion expect(result).not.toBeNull() when it should be expect(result.name).toBe('Alice') (too weak), and test that depends on database state from a previous test (breaks when tests run in parallel or shuffled order).
The ultimate test quality check: comment out or delete the function being tested, then run the tests. If all tests fail with clear messages, they are testing real behavior. If some tests still pass, those tests are testing mocks or testing nothing (assertions that pass regardless of the function's existence). This 30-second check reveals which tests provide real coverage and which are theater. Run it on every AI-generated test suite before approving. AI rule: 'A test that passes when the code it tests is deleted is not a test. It is an illusion of coverage.'
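The delete-the-function check can be simulated in a few lines. Everything here is hypothetical illustration: realDouble stands in for the code under test, and runSuite contains one meaningful test plus one test that builds and asserts on its own mock.

```javascript
// A tiny "suite" with one real test and one mock-theater test.
function runSuite(double) {
  const results = {};
  // Test A asserts on the real function's output.
  results.meaningful = (() => {
    try { return double(2) === 4; } catch { return false; }
  })();
  // Test B replaces the function with a mock and asserts on the mock:
  // it verifies nothing about the real code.
  results.mockTheater = (() => { const mock = () => 4; return mock(2) === 4; })();
  return results;
}

const realDouble = (x) => x * 2;
const withCode = runSuite(realDouble);
console.assert(withCode.meaningful && withCode.mockTheater); // both pass with the code present

const deleted = runSuite(undefined);           // "delete" the function under test
console.assert(deleted.meaningful === false);  // real test fails → it tested real behavior
console.assert(deleted.mockTheater === true);  // still passes → illusion of coverage
```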
Test Generation Summary
A summary of generating good tests with AI:
- Quality > coverage: tests that catch bugs > tests that increase the coverage number
- Prompt technique: specify behaviors to test, request edge cases, demand specific assertions
- Rule 1: test behavior, not implementation. Tests survive refactoring
- Rule 2: name tests with expected behavior. Failure messages are documentation
- Rule 3: cover happy path + error path + edge cases + authorization. All four by default
- Rule 4: factory-generated test data. No hardcoded values across tests
- Review checklist: meaningful assertions, edge cases, would catch a real bug, independent, deterministic
- Ultimate test: delete the function. Do the tests fail? If not: the tests are illusions