Comparisons

E2E vs Unit Tests: AI Rules for Each

Unit tests verify functions. E2E tests verify user journeys. AI needs rules for when to generate each: unit tests for logic, E2E tests for critical paths. Generating only one type leaves dangerous gaps in test coverage.

7 min read·June 16, 2025

500 unit tests, zero E2E tests. High coverage, no confidence that login works or checkout completes.

Unit for logic, E2E for critical paths, the test pyramid, identifying critical journeys, and proactive E2E suggestion

Different Confidence Levels from Different Tests

Unit tests give confidence that: individual functions work correctly in isolation. formatCurrency(49.99) returns "$49.99". calculateDiscount(100, 0.2) returns 80. validateEmail('test@example.com') returns true. This confidence is: narrow (one function, specific inputs) and fast (milliseconds per test). Unit tests do NOT give confidence that: the login flow works end-to-end, the checkout process charges the correct amount, or the dashboard loads after authentication. These are: user journeys that span multiple components, API calls, and state changes.

E2E tests give confidence that: the application works as a user would experience it. The user: opens the login page, enters credentials, clicks submit, sees the dashboard, navigates to settings, changes their email, and sees the confirmation. This confidence is: broad (the entire stack from browser to database) and slow (seconds per test). E2E tests catch: integration bugs (the API returns data but the frontend does not render it), deployment issues (the environment is misconfigured), and workflow bugs (step 3 depends on step 2 which fails silently).

Without both types of rules: the AI generates only unit tests (they are easier to generate, faster to run, and produce high coverage numbers) while leaving the critical user journeys untested. A project with 500 unit tests and zero E2E tests: has high coverage but no confidence that the login works, the checkout completes, or the onboarding flow succeeds. The AI needs rules for both: unit tests for logic and E2E tests for critical paths.

⚠️ High Coverage, Zero Journey Confidence

500 unit tests, zero E2E: coverage report says 90%. But does login work? Does checkout complete? Does onboarding succeed? Unknown. Unit tests verify: individual functions. E2E tests verify: what the user actually experiences. Without E2E: the most critical paths are exactly the ones left untested.

Unit Test Rules: Logic in Isolation

Unit test rule: "Generate unit tests for: utility functions (formatCurrency, slugify, calculateTotal), validators (isValidEmail, parseFormInput), transformers (mapUserToDTO, formatApiResponse), hooks with pure logic (useDebounce return value, useMediaQuery result), and state reducers (given state + action = new state). Unit tests: no browser, no API calls, no database, no file system. Test: input → output. Fast (< 5ms per test). Run: on every file save."

What unit tests verify: the smallest units of logic work correctly with various inputs (happy path, edge cases, error cases, boundary conditions). What they cannot verify: whether those units work together correctly (integration), whether the UI renders properly (component tests), or whether the full user journey succeeds (E2E). Unit tests are: the foundation of the test pyramid. They catch: logic bugs quickly and cheaply. They miss: integration, rendering, and workflow bugs entirely.

The AI unit test generation advantage: the AI reads a function and generates tests for every branch and edge case. A 20-line utility function: the AI generates 8-12 test cases covering null, empty, boundary, error, and happy paths in 30 seconds. A human: may generate 4-5 test cases in 5 minutes. AI-generated unit tests: are often more comprehensive than human-written ones for pure functions. The AI rule ensures: every utility gets thorough unit tests. The coverage is: genuinely valuable for logic correctness.

  • Unit for: utilities, validators, transformers, pure hooks, reducers — input → output
  • No browser, no API, no database: pure function testing only
  • AI advantage: 8-12 test cases per function in 30 seconds (humans: 4-5 in 5 minutes)
  • Catches: logic bugs (wrong calculation, missed edge case, incorrect condition)
  • Cannot catch: integration bugs, rendering issues, workflow failures

E2E Test Rules: Critical User Journeys

E2E test rule: "Generate E2E tests for critical user journeys: authentication (register, login, logout, password reset), core workflows (create resource, edit, delete, list with pagination), checkout/payment (add to cart, enter payment, confirm order), onboarding (first-time user setup, tutorial completion), and data-critical operations (export, import, backup, restore). Use Playwright. Test the full stack: browser → frontend → API → database. E2E tests: in e2e/ or tests/e2e/ directory, separate from unit tests."
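A minimal Playwright config matching this rule might look like the following sketch. The directory name, base URL, and dev-server command are example assumptions, not prescriptions:

```typescript
// playwright.config.ts: a minimal sketch keeping E2E tests in e2e/,
// separate from the unit test suite.
import { defineConfig } from "@playwright/test";

export default defineConfig({
  testDir: "./e2e",               // E2E only; unit tests live elsewhere
  use: {
    baseURL: "http://localhost:3000",
  },
  webServer: {
    command: "npm run dev",       // boot the app before the suite runs
    url: "http://localhost:3000",
    reuseExistingServer: true,
  },
});
```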

E2E test pattern rule: "Playwright pattern: test('user can login and see dashboard', async ({ page }) => { await page.goto('/login'); await page.fill('[name=email]', 'test@example.com'); await page.fill('[name=password]', 'password123'); await page.click('button[type=submit]'); await expect(page.getByText('Dashboard')).toBeVisible(); }). Test: real browser, real API calls, real database state. Use: test fixtures for user data, API seeding for test state, and cleanup after each test."

Why the AI skips E2E: E2E tests are harder to generate (the AI needs to know: the page URLs, the form field names, the expected text, and the user flow sequence), slower to run (seconds vs milliseconds), and require infrastructure (Playwright, a running app, a seeded database). The AI defaults to: the easier unit test. The E2E rule: explicitly tells the AI to generate E2E tests for critical paths. Without the rule: the AI generates zero E2E tests. With the rule: the AI generates E2E tests for login, checkout, and core workflows.

  • E2E for: auth flow, checkout, onboarding, core CRUD, data operations — critical paths
  • Playwright: real browser, real API, real database — tests what the user experiences
  • Pattern: goto, fill, click, expect visible — follows user actions step by step
  • AI skips E2E by default: harder to generate, slower to run, needs infrastructure
  • The rule explicitly requests: E2E for critical paths. Without it: zero E2E tests generated

💡 AI Must Be Explicitly Told to Generate E2E

AI defaults to unit tests (easier, faster, higher coverage numbers). Without the E2E rule: zero E2E tests generated. With: "Generate E2E tests for login, checkout, and onboarding using Playwright" — the AI creates browser-based tests for critical paths. Explicit prompting needed because AI skips the harder test type.
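Expanded out of the one-liner pattern above, a generated login journey might look like this sketch. The routes, selectors, and seeding helpers are assumptions about the app under test, not Playwright APIs:

```typescript
import { test, expect } from "@playwright/test";

// Hypothetical seeding helpers: in a real suite these would call the
// app's own API to create and remove the test account.
async function seedTestUser(user: { email: string; password: string }) {
  // e.g. POST to a test-only seeding endpoint
}
async function cleanupTestUser(email: string) {
  // e.g. DELETE the seeded account after each test
}

test.beforeEach(async () => {
  await seedTestUser({ email: "test@example.com", password: "password123" });
});

test.afterEach(async () => {
  await cleanupTestUser("test@example.com");
});

test("user can login and see dashboard", async ({ page }) => {
  // Real browser, real API calls, real database state.
  await page.goto("/login");
  await page.fill("[name=email]", "test@example.com");
  await page.fill("[name=password]", "password123");
  await page.click("button[type=submit]");
  await expect(page.getByText("Dashboard")).toBeVisible();
});
```

Fixtures seed state before each test and clean it up after, so the journey runs against known data every time.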

The Test Pyramid: How Many of Each

The test pyramid rule: "Test distribution: many unit tests (fast, cheap, catch logic bugs) → fewer integration tests (moderate speed, catch component interaction bugs) → few E2E tests (slow, expensive, catch user journey bugs). Ratio guideline: 70% unit, 20% integration, 10% E2E. The pyramid ensures: fast feedback (most tests are fast unit tests) with confidence (the few E2E tests verify that the system works end-to-end). Never invert the pyramid: 10% unit + 90% E2E = slow test suite, expensive to maintain, flaky."

What each layer catches uniquely: unit tests catch: calculation errors, edge case handling, and type mismatches (fast, run on every save). Integration tests catch: API contract violations, database query errors, and component rendering issues (moderate, run on commit). E2E tests catch: broken user flows, missing UI elements, deployment misconfigurations, and cross-system integration failures (slow, run in CI before merge). Each layer: catches bugs the others cannot. Removing any layer: leaves a category of bugs untested.

The AI generation strategy: "When generating tests for a new feature: (1) Unit test every new utility function and validator (AI does this well). (2) Integration test the API route and database query (AI does this with real DB rules). (3) E2E test the primary user journey if it is a critical path (AI does this with Playwright rules and explicit prompting). The AI should: suggest E2E tests for new features that affect critical paths, not just generate unit tests."

  • Pyramid: 70% unit (fast, many) + 20% integration (moderate, fewer) + 10% E2E (slow, few)
  • Unit catches: logic bugs. Integration catches: API/DB bugs. E2E catches: workflow bugs
  • Never invert: 90% E2E = slow, expensive, flaky. 90% unit = no workflow confidence
  • Each layer unique: removing any layer leaves a bug category entirely untested
  • AI strategy: unit for utilities, integration for API/DB, E2E for critical user journeys

Identifying Critical E2E Paths

Not every feature needs an E2E test. Critical paths for E2E: authentication (if login breaks, nothing else works — always E2E test), payment/checkout (if checkout breaks, revenue stops — always E2E test), user registration/onboarding (if new users cannot complete onboarding, growth stops — always E2E test), core CRUD for the primary resource (the main thing users do in the app — always E2E test), and data integrity operations (export/import, if data is lost, trust is lost — always E2E test).

Non-critical paths (unit/integration only): settings page (important but not critical — integration test the API, unit test the form validation), admin dashboard (low traffic, admin can work around issues), notification preferences (nice to have, not revenue-impacting), and cosmetic features (dark mode toggle, animation preferences). These paths: are tested by unit and integration tests. E2E for them: adds test maintenance without proportional value.

The AI rule for identifying critical paths: "Generate E2E tests for: any path involving money (checkout, billing, subscription), any path involving authentication (login, register, password reset, 2FA), the primary user action (the one thing most users do most often), and any path involving data integrity (export, import, delete with consequences). Suggest E2E tests to the developer when implementing these paths. Do not wait to be asked — proactively suggest: 'This checkout flow should have an E2E test.'"

  • Always E2E: auth (login breaks = nothing works), payments (checkout breaks = revenue stops)
  • Always E2E: onboarding (new users blocked = growth stops), primary CRUD (core usage)
  • Skip E2E: settings, admin dashboards, cosmetic features — unit/integration sufficient
  • Money + auth + primary action + data integrity = critical paths that need E2E
  • AI rule: proactively suggest E2E for critical paths, do not wait to be asked

ℹ️ Money + Auth + Primary Action = Always E2E

If the path involves: money (checkout, billing, subscription), authentication (login, register, 2FA), or the primary user action (the one thing most users do) — it needs an E2E test. Settings page, admin dashboard, cosmetic toggles: unit/integration sufficient. E2E for everything: expensive and flaky. E2E for critical paths only: high value, manageable cost.

E2E vs Unit Test Summary

Summary of E2E vs unit test AI rules.

  • Unit: logic in isolation (utilities, validators). Fast, many, run on save. Catches: logic bugs
  • E2E: user journeys (login, checkout, onboarding). Slow, few, run in CI. Catches: workflow bugs
  • Pyramid: 70% unit + 20% integration + 10% E2E. Never invert (90% E2E = slow, flaky, expensive)
  • AI default: generates only unit tests (easy). E2E rule: explicitly requests critical path testing
  • Critical paths: auth, payments, onboarding, primary CRUD, data integrity — always E2E
  • Non-critical: settings, admin, cosmetic — unit/integration sufficient, E2E is over-testing
  • AI advantage for unit: 8-12 test cases per function, comprehensive edge cases in 30 seconds
  • AI must be prompted for E2E: proactively suggest E2E for money, auth, and primary actions