AI Rules for File Upload Handling

7 min read · February 28, 2025

AI saves uploads to the local filesystem with the client filename — four vulnerabilities in one line

Object storage, safe filenames, MIME validation, size limits, signed URLs, and virus scanning

How AI Handles File Uploads (Every Pattern Is Insecure)

AI generates file upload code with four consistent vulnerabilities: storing files on the local filesystem (lost on server restart, not shared across instances), trusting the client filename (path traversal — ../../../etc/passwd), no MIME type validation (upload a .exe renamed to .jpg), and no file size limit (upload a 10GB file, crash the server). Each vulnerability is well-known and preventable with a single rule.

The correct pattern: upload to object storage (S3, R2, GCS) not the filesystem, generate a safe filename server-side (UUID — never use the client filename), validate MIME type by reading file headers (not by checking the extension), set a size limit in the upload middleware, and scan for viruses before making the file available.

These rules apply to any file upload scenario: user avatars, document uploads, image galleries, CSV imports, and file attachments. The patterns are the same regardless of the file type.

Rule 1: Object Storage, Not Filesystem

The rule: 'Upload files to object storage (AWS S3, Cloudflare R2, Google Cloud Storage, Vercel Blob) — never the local filesystem. Object storage provides persistence across deploys, shared access across server instances, CDN distribution, and virtually unlimited capacity. The local filesystem fails when the server restarts (files are lost), when you scale to multiple instances (each file exists on only one server), and when the disk fills up (the server crashes).'

For the upload flow: 'Client → server validates + generates key → server uploads to S3 → server stores the key in the database → client accesses via CDN URL or signed URL. Never let the client upload directly to your server filesystem. For large files, use presigned upload URLs: the server generates a signed S3 URL, the client uploads directly to S3, bypassing your server entirely.'

AI generates fs.writeFile(path.join(__dirname, 'uploads', file.name), file.data) — files saved to the server disk under the client-provided filename. That single line is a path traversal vulnerability, a disk-space bomb, data loss on the next restart, and a blocker for horizontal scaling. Object storage eliminates all four.

  • S3/R2/GCS/Vercel Blob — never local filesystem
  • Presigned URLs for large files — client uploads directly to S3, skips server
  • Store the object key in the database — not the file itself
  • CDN URL for public files — signed URL for private files with expiration
  • Local filesystem: lost on restart, not shared, fills disk — never in production
⚠️ Filesystem = Data Loss

Files on the local filesystem are: lost on restart/redeploy, not shared across instances, and fill the disk until it crashes. Object storage (S3/R2) is persistent, shared, CDN-distributed, and virtually unlimited.

Rule 2: Safe Filenames and MIME Validation

The rule: 'Never use the client-provided filename for storage. Generate a safe filename server-side: const key = `uploads/${crypto.randomUUID()}.${extension}`. Validate the MIME type by reading file headers (magic bytes) — not by checking the file extension. An .exe can be renamed to .jpg — the extension lies, the magic bytes do not. Use the file-type library (Node.js) or python-magic (Python) for header-based detection.'

For allowed types: 'Define an allowlist of accepted MIME types: const ALLOWED = ["image/jpeg", "image/png", "image/webp", "application/pdf"]. Reject any file whose detected MIME type is not in the allowlist. Never use a denylist (block .exe, .bat) — attackers find extensions you did not think of. An allowlist only permits what you explicitly support.'

For the original filename: 'Store the original filename as metadata in the database — for display to the user. Use the generated UUID key for storage and retrieval. This decouples the display name from the storage path: the user sees "quarterly-report.pdf", the storage sees "uploads/a1b2c3d4.pdf". Path traversal is impossible because the storage key is a UUID.'
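Magic-byte detection plus an allowlist fits in a few lines. The sketch below hand-rolls signatures for three formats to show the principle; in production, the file-type package the rule recommends covers far more formats and edge cases.

```javascript
// Detect MIME type from magic bytes (file headers), never the extension.
const SIGNATURES = [
  { mime: "image/jpeg", bytes: [0xff, 0xd8, 0xff] },
  { mime: "image/png", bytes: [0x89, 0x50, 0x4e, 0x47] },
  { mime: "application/pdf", bytes: [0x25, 0x50, 0x44, 0x46] }, // "%PDF"
];

// Allowlist: only what you explicitly support, never a denylist.
const ALLOWED = new Set(["image/jpeg", "image/png", "application/pdf"]);

function detectMime(buffer) {
  const sig = SIGNATURES.find(({ bytes }) =>
    bytes.every((b, i) => buffer[i] === b));
  return sig ? sig.mime : null;
}

function validateUpload(buffer) {
  const mime = detectMime(buffer); // what the bytes say, not the filename
  if (!mime || !ALLOWED.has(mime)) {
    throw new Error("file type not allowed");
  }
  return mime;
}
```

A renamed executable fails here regardless of its extension: its bytes start with the PE header, not a JPEG or PDF signature.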

💡 UUID, Not Client Filename

Client filename '../../../etc/passwd' is a path traversal attack. UUID filename 'a1b2c3d4.pdf' is safe by construction. Store the original name as display metadata in the database — never use it for storage.

Rule 3: File Size Limits

The rule: 'Set file size limits at every layer: web server (nginx: client_max_body_size 10m), application middleware (multer: limits: { fileSize: 10 * 1024 * 1024 }), and object storage (S3 bucket policy). Reject oversized files before reading the entire body — stream and abort when the limit is exceeded. Never read the entire upload into memory before checking size — a 10GB upload will OOM your server.'

For streaming: 'Process uploads as streams, not buffers. Multer (Node.js) streams to disk or S3 — never store the entire file in memory. For S3: use multipart upload for files >5MB — S3 handles chunking. For presigned URLs: set Content-Length-Range in the presigned URL policy to enforce size on the client-to-S3 upload.'

For limits by type: 'Avatar images: 5MB max. Document uploads: 25MB max. Video: 500MB max (use presigned URLs — never through your server). CSV imports: 50MB max. Set realistic limits per endpoint — a single global limit is either too generous (allows abuse) or too restrictive (blocks legitimate large uploads).'

  • Limits at every layer: nginx, middleware, S3 — defense in depth
  • Stream uploads — never buffer entire file in memory
  • Multipart upload for >5MB — S3 handles chunking
  • Presigned URLs with Content-Length-Range for client-direct uploads
  • Per-endpoint limits: avatar 5MB, document 25MB, video 500MB

Rule 4: Signed URLs for Private Files

The rule: 'Use signed URLs for private file access. With AWS SDK v3: const url = await getSignedUrl(s3, new GetObjectCommand({ Bucket, Key }), { expiresIn: 3600 }) — getSignedUrl comes from @aws-sdk/s3-request-presigner. The URL is valid for 1 hour — after that, it returns 403. Never make private files publicly accessible through a guessable URL. Signed URLs provide: time-limited access, no auth header needed (the signature is in the URL), and auditable access (the server generates the URL, so you know who requested it).'

For public files: 'Make the S3 bucket (or a specific prefix) public-read only for genuinely public assets: avatars are public (anyone can view), documents are private (signed URL required). Use CloudFront or a CDN for public files — cache at the edge for fast delivery. Use signed URLs for private files — each access goes through your auth check before the URL is generated.'

AI generates public URLs for all uploads — including private documents, financial records, and user data. One missing auth check = every uploaded file is publicly accessible to anyone who guesses the URL. Signed URLs enforce auth at the access point, not just the upload point.

Rule 5: Virus Scanning and Post-Upload Processing

The rule: 'Scan uploaded files for malware before making them available to other users. Use ClamAV (open source) or a cloud scanning service (AWS GuardDuty, Cloudflare). Upload flow: receive file → store in quarantine bucket → scan → if clean, move to public bucket → if infected, delete and notify. Never serve unscanned user uploads directly — one infected PDF affects every user who downloads it.'

For image processing: 'Process images after upload: resize to standard dimensions, strip EXIF metadata (contains GPS coordinates, camera info), convert to WebP for web delivery, and generate thumbnails. Use sharp (Node.js), Pillow (Python), or a CDN with image transformation (Cloudflare Images, ImageKit). Never serve user-uploaded images at original size — they can be 20MB+.'

For the quarantine pattern: 'Uploads go to a quarantine bucket/prefix first. A background job (Lambda, queue worker) scans and processes. If clean, move to the serving bucket. If infected, delete and log. The user sees a processing state until the file is cleared. Never skip quarantine for any file type — even images can carry malware.'

  • Scan before serving: ClamAV or cloud scanner — quarantine → scan → serve
  • Strip EXIF from images — GPS coordinates, camera info are PII
  • Resize images: standard dimensions, WebP format, thumbnails
  • Quarantine bucket → process → clean → serving bucket — never direct serving
  • Background processing: scan + resize + convert in a queue worker, not in the request
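The quarantine flow reduces to a small state machine. The sketch below injects the scanner and the two buckets so the flow itself is testable; in production the buckets would be S3 prefixes and scanFile would call ClamAV or a cloud scanner. All names here are illustrative.

```javascript
// Quarantine flow: scan, then either promote to the serving bucket
// or delete. Runs in a background worker, never in the request handler.
async function processQuarantinedUpload(key, { quarantine, serving, scanFile, log }) {
  const file = quarantine.get(key);
  const verdict = await scanFile(file); // "clean" | "infected"
  if (verdict === "clean") {
    serving.set(key, file);             // promote to serving bucket
    quarantine.delete(key);
    return "available";
  }
  quarantine.delete(key);               // infected: delete, never serve
  log(`infected upload deleted: ${key}`);
  return "rejected";
}
```

Until the worker returns "available", the UI shows the file as processing; an infected file never reaches the serving bucket at all.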
ℹ️ Quarantine First

Upload → quarantine bucket → virus scan → if clean, move to serving bucket. Never serve user uploads directly — one infected PDF affects every downloader. Scanning takes seconds in a background job. Worth every millisecond.

Complete File Upload Rules Template

Consolidated rules for file upload handling.

  • Object storage (S3/R2/GCS) — never local filesystem — presigned URLs for large files
  • UUID filenames server-side — never client filename — store original as display metadata
  • MIME validation by magic bytes — allowlist of accepted types — never trust extension
  • Size limits at every layer: nginx, middleware, S3 — stream, never buffer in memory
  • Signed URLs for private files — public bucket only for public assets (avatars)
  • Virus scanning: quarantine → scan → serve — never serve unscanned uploads
  • Image processing: resize, strip EXIF, WebP convert — never serve at original size
  • Background processing in queue workers — not in the upload request handler