AI Trusts Every Input It Receives
AI generates code that: interpolates user input into SQL queries (string concatenation, not parameterized), renders user input as raw HTML (no encoding, no escaping), passes user input to shell commands (child_process with unsanitized args), accepts any string length (no max length, enabling DoS via mega-payloads), and validates nothing (no type checking, no format validation, no allowlists). Each of these maps to a critical vulnerability class in the OWASP Top 10.
Modern input handling is: parameterized (queries use placeholders, never string concatenation), encoded (output is context-aware — HTML-encoded for HTML, URL-encoded for URLs), validated (allowlists for expected values, regex for formats, Zod schemas for structure), length-limited (max length per field, max payload size), and typed (string input parsed and validated as the expected type). AI generates none of these.
These rules cover: parameterized queries for SQL injection prevention, context-aware output encoding for XSS prevention, allowlist validation, input length limits, and command injection prevention.
Rule 1: Parameterized Queries — Never Concatenate
The rule: 'Use parameterized queries for all database operations. Never concatenate user input into query strings. Parameterized: db.query("SELECT * FROM users WHERE email = $1", [email]). The database engine treats $1 as data, never as SQL. String concatenation allows input like ' OR 1=1 -- to return all users. Parameterized queries make SQL injection structurally impossible.'
For ORMs and query builders: 'Drizzle, Prisma, TypeORM, and Sequelize all use parameterized queries internally — but only when you use their APIs correctly. Dangerous: raw query with string concatenation. Safe: db.select().from(users).where(eq(users.email, email)). The ORM method is safe; the raw query escape hatch is not. Avoid raw queries. When unavoidable, always use parameter placeholders ($1, ?, :param).'
AI generates direct interpolation of request parameters into SQL strings. One crafted parameter drops your database. Parameterized queries are the same number of lines, the same readability, and complete immunity to SQL injection. There is zero reason to concatenate.
- Always $1, $2 placeholders — never string concatenation into queries
- ORM query builders are safe — raw query methods are not (escape hatch danger)
- Parameterized queries: structurally impossible to inject SQL
- Same number of lines, same readability, complete injection immunity
- Validate input type before querying — numeric ID should be parsed as number first
db.query('SELECT * FROM users WHERE email = $1', [email]) is the same number of lines as string concatenation. Same readability. But complete immunity to SQL injection. The database treats $1 as data, never as executable SQL.
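A minimal sketch of the difference. concatQuery mirrors the dangerous pattern; paramQuery mirrors the { text, values } shape used by drivers like node-postgres. Both function names are hypothetical, for illustration only:

```javascript
// Dangerous: the payload is spliced into the SQL grammar itself.
function concatQuery(email) {
  return `SELECT * FROM users WHERE email = '${email}'`;
}

// Safe: the SQL text is fixed; values travel separately as inert data.
function paramQuery(email) {
  return { text: "SELECT * FROM users WHERE email = $1", values: [email] };
}

const payload = "' OR 1=1 --";

console.log(concatQuery(payload));
// → SELECT * FROM users WHERE email = '' OR 1=1 --'
// The WHERE clause now matches every row; the rest is commented out.

console.log(paramQuery(payload).text);
// → SELECT * FROM users WHERE email = $1
// The query shape never changes, no matter what the payload contains.
```

The same comparison holds for ? and :param placeholders in other drivers: the query text is compiled once, and user data can never alter it.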
Rule 2: Context-Aware Output Encoding
The rule: 'Encode output based on the context where it appears. HTML context: escape < > & " characters (prevents tag injection). HTML attribute context: escape quotes and encode special chars. JavaScript context: JSON-encode and escape (prevents script injection). URL context: encodeURIComponent (prevents parameter injection). CSS context: escape special chars (prevents expression injection). Each context has different dangerous characters — one encoding does not fit all.'
For framework defaults: 'React JSX escapes by default — {userInput} is safe in JSX. Assigning to innerHTML is raw, unescaped. Server-side templating (EJS, Handlebars): use the escaping syntax (<%= %> in EJS, double curly braces in Handlebars). The raw syntax bypasses encoding — use only for trusted, pre-sanitized content. Know your framework defaults and where they stop protecting you.'
AI generates raw HTML insertion of user input. A crafted img tag with an onerror handler runs JavaScript. React JSX would have escaped this automatically. When working outside React (server-side rendering, email templates, admin tools), explicit encoding is mandatory. The framework default is your first line of defense — know when it applies and when it does not.
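When a framework default does not apply, HTML-context encoding looks like this. escapeHtml is a hypothetical helper sketched with no dependencies, not a library API; in a real app, prefer your framework's built-in escaping:

```javascript
// HTML-context encoding: neutralize the five characters that can
// change HTML structure. Ampersand must be replaced first, or the
// later replacements would themselves be re-encoded.
function escapeHtml(input) {
  return String(input)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

const attack = '<img src=x onerror="alert(1)">';
console.log(escapeHtml(attack));
// → &lt;img src=x onerror=&quot;alert(1)&quot;&gt;
// The payload renders as visible text, never as a live element.
```

Note this covers the HTML body context only — attribute, JavaScript, URL, and CSS contexts each need their own encoder.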
Rule 3: Allowlist Validation, Not Blocklist
The rule: 'Validate input against an allowlist of expected values — never a blocklist of bad values. Allowlist for enums: const VALID_ROLES = ["viewer", "editor", "admin"]; if (!VALID_ROLES.includes(role)) reject. Allowlist for formats: email regex validation. Blocklists fail because: attackers find new bypass characters, encoding tricks defeat string matching, and the list of dangerous inputs is infinite. The list of valid inputs is finite.'
For Zod validation: 'Use Zod schemas as structural allowlists: const schema = z.object({ role: z.enum(["viewer", "editor", "admin"]), email: z.string().email(), age: z.number().int().min(0).max(150) }). Zod validates type, format, range, and enum membership in one declaration. schema.parse(input) returns typed data or throws. The parsed output is guaranteed to match the schema — downstream code can trust it without re-checking.'
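The same structural allowlist can be sketched without any dependency, to show what the Zod schema is doing under the hood. validateUser, VALID_ROLES, and EMAIL_RE are hypothetical names for illustration:

```javascript
// Dependency-free stand-in for the Zod schema above: validate type,
// format, range, and enum membership in one pass, or throw.
const VALID_ROLES = ["viewer", "editor", "admin"];
const EMAIL_RE = /^[^\s@]+@[^\s@]+\.[^\s@]+$/; // deliberately simple format check

function validateUser(input) {
  const { role, email, age } = input ?? {};
  if (!VALID_ROLES.includes(role)) throw new Error("invalid role");
  if (typeof email !== "string" || !EMAIL_RE.test(email)) throw new Error("invalid email");
  if (!Number.isInteger(age) || age < 0 || age > 150) throw new Error("invalid age");
  return { role, email, age }; // downstream code can trust this shape
}

console.log(validateUser({ role: "editor", email: "a@b.com", age: 30 }));
```

The value of the Zod version is that this boilerplate collapses to one declaration and the return type is inferred — but the allowlist logic is identical.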
AI generates blocklist checks like looking for script tags in input — missing uppercase variants, event handlers, encoded variants, and hundreds of other vectors. An allowlist says: this field must be one of [viewer, editor, admin] — three valid values, everything else rejected. Finite, complete, immune to bypass tricks.
- Allowlist (finite valid values) not blocklist (infinite bad values)
- Zod schemas for structural validation — type, format, range, enum
- Validate at system boundary — API entry point, form submission, file upload
- Reject early: invalid input never reaches business logic or database
- Parsed output is trusted: downstream code works with validated types
Blocking script tags misses uppercase variants, event handlers, encoded variants, and hundreds of other injection vectors. An allowlist says: this field must be one of [viewer, editor, admin]. Three valid values, everything else rejected. Finite, complete, bypass-proof.
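The asymmetry is easy to demonstrate. blocklistCheck below is the pattern AI tends to generate; allowlistCheck is the finite alternative (both names are hypothetical):

```javascript
// Blocklist: reject inputs containing a known-bad substring.
function blocklistCheck(input) {
  return !input.includes("<script>");
}

// Allowlist: accept only the three valid values.
function allowlistCheck(role) {
  return ["viewer", "editor", "admin"].includes(role);
}

// All of these sail straight past the blocklist:
const bypasses = ["<SCRIPT>alert(1)</SCRIPT>", "<img onerror=alert(1) src=x>", "%3Cscript%3E"];
console.log(bypasses.every(blocklistCheck)); // → true — every attack passes

// None of them pass the allowlist:
console.log(bypasses.some(allowlistCheck)); // → false — every attack rejected
```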
Rule 4: Input Length and Size Limits
The rule: 'Set maximum lengths for every input field: name (100 chars), email (254 chars — RFC 5321), comment (5000 chars), bio (500 chars). Set maximum request body size at the middleware level (1MB default, higher for file uploads). Without limits: a 10MB name field fills your database, a 100MB JSON payload exhausts server memory, and unbounded input fed to a vulnerable regex triggers catastrophic backtracking (ReDoS) — length caps bound the blowup.'
For defense in depth: 'Length limits at three layers: (1) Client-side: maxLength attribute on inputs — UX hint, not security. (2) API middleware: express.json({ limit: "1mb" }) — rejects oversized payloads before parsing. (3) Validation schema: z.string().max(100) — rejects oversized fields after parsing. All three are needed — client-side is bypassable, middleware catches payloads, schema catches individual fields.'
AI generates: text input fields with no maxLength, API endpoints with no body size limit, and database columns with no length constraint. A malicious user sends 1GB of data in a name field. The server tries to parse it, store it, and render it — crashing at some point. Three constraints (middleware, schema, database) prevent this entirely.
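The schema-layer check (layer 3) can be sketched as a plain function. FIELD_LIMITS and checkLengths are hypothetical names for illustration; with Zod this collapses to z.string().max(100) per field:

```javascript
// Per-field length limits, checked after the middleware body-size
// limit has already rejected oversized payloads.
const FIELD_LIMITS = { name: 100, email: 254, bio: 500, comment: 5000 };

function checkLengths(input) {
  const errors = [];
  for (const [field, max] of Object.entries(FIELD_LIMITS)) {
    const value = input[field];
    if (typeof value === "string" && value.length > max) {
      errors.push(`${field} exceeds ${max} chars`);
    }
  }
  return errors;
}

console.log(checkLengths({ name: "Ada", bio: "x".repeat(501) }));
// → [ 'bio exceeds 500 chars' ]
```

This layer catches what the body-size limit cannot: a 900KB payload that is under the 1MB cap but crams the excess into a single field.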
Rule 5: Command Injection Prevention
The rule: 'Never pass user input to shell commands. If you must run a system command with user-provided arguments, use: (1) execFile (from child_process) instead of the shell-invoking alternative — execFile does not invoke a shell, arguments are passed as an array, (2) allowlist validation for input characters, (3) no shell expansion or template literals in command strings. With the shell-invoking method, a filename containing "; rm -rf /" deletes the filesystem. With execFile, the entire filename is treated as one argument — no shell interpretation.'
For common scenarios: 'File processing (image resize, PDF generation): use libraries (sharp, puppeteer) instead of shell commands. If a shell command is unavoidable: use execFile with an argument array, validate the input against an allowlist of permitted characters (alphanumeric + limited punctuation), and run in a sandboxed environment (Docker container with limited filesystem access). The safest shell command is the one you do not run.'
AI generates shell commands with user-provided filenames via string concatenation. One semicolon in the filename and the attacker runs arbitrary commands on your server. execFile with an argument array treats the filename as one argument — shell metacharacters are not interpreted. Same result, zero injection risk.
sharp for image processing, puppeteer for PDF generation, node-ffmpeg for video. sharp binds libvips natively; puppeteer and ffmpeg wrappers spawn their binaries directly, without a shell — either way, arguments are never shell-interpreted. The safest shell command is the one you replace with a library call.
Complete Input Sanitization Rules Template
Consolidated rules for input sanitization.
- Parameterized queries: $1 placeholders — never string concatenation into SQL
- Context-aware output encoding: HTML, attribute, JS, URL, CSS — each context different
- Allowlist validation: finite valid values — never blocklist of bad values
- Zod schemas at API boundary: type + format + range + enum in one declaration
- Length limits at three layers: client maxLength, middleware body limit, schema max
- execFile over shell-invoking alternatives: argument array, no shell interpretation
- Libraries over shell commands: sharp not imagemagick CLI, puppeteer not wkhtmltopdf
- Validate early, reject early: invalid input never reaches business logic