AI Exports Synchronously and Imports Blindly
AI generates data export with: a synchronous response (SELECT all rows, serialize to JSON, send in one response — times out at 50,000 rows), in-memory generation (load the entire dataset into memory — crashes at 500MB), no progress tracking (the user clicks Export, waits, sees nothing, clicks again, waits again), and no format options (JSON only — no CSV, no Excel). Import is worse: the entire file is accepted and parsed at once, the first invalid row aborts everything with no indication of which row failed, and there is no way to resume.
Modern data export/import is: async (background job generates the file, notifies when ready), streaming (rows written to file as they are read from the database — constant memory usage), progress-tracked (job progress available via API or WebSocket), format-flexible (CSV, JSON, Excel — user chooses), and error-resilient (imports validate per row, report all errors, allow partial success with an error report). AI generates none of these.
These rules cover: async export job pattern, streaming file generation, chunked import with per-row validation, progress tracking, error reporting, and GDPR data portability compliance.
Rule 1: Async Background Export Jobs
The rule: 'Export is a background job, not a synchronous API response. Flow: (1) user clicks Export, (2) API creates a job record and returns 202 Accepted with a job ID, (3) background worker streams data from database to file (S3/R2), (4) job status updated to completed with the download URL, (5) user is notified (email, push, or polling). The user does not wait — they can continue using the app while the export generates.'
For the job record: 'Table: export_jobs { id, userId, status (pending/processing/completed/failed), format (csv/json/xlsx), filters (JSON — what data to export), fileUrl (S3 URL when completed), rowCount, createdAt, completedAt, expiresAt }. The download URL expires after 24 hours (signed S3 URL). The job record is the contract between the user request and the background worker. Polling: GET /api/exports/:jobId returns status and progress.'
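The job-record contract above can be sketched as follows — a minimal in-memory store stands in for the export_jobs table, and the function names (`createExportJob`, `getExportJob`) are illustrative, not a prescribed API. In a real service, `createExportJob` backs a POST handler that responds 202 Accepted with the job ID, and `getExportJob` backs the polling endpoint:

```typescript
import { randomUUID } from "crypto";

// Mirrors the export_jobs table described above.
type ExportJob = {
  id: string;
  userId: string;
  status: "pending" | "processing" | "completed" | "failed";
  format: "csv" | "json" | "xlsx";
  filters: Record<string, unknown>;
  fileUrl: string | null;   // signed S3 URL once completed
  rowCount: number | null;
  createdAt: Date;
  completedAt: Date | null;
  expiresAt: Date | null;   // download link expiry (24h after completion)
};

// In-memory stand-in for the database table.
const jobs = new Map<string, ExportJob>();

// POST /api/exports — create the job record and enqueue it for a worker.
// The HTTP layer responds `202 Accepted` with { jobId }, never with data.
function createExportJob(
  userId: string,
  format: ExportJob["format"],
  filters: Record<string, unknown> = {}
): ExportJob {
  const job: ExportJob = {
    id: randomUUID(),
    userId,
    status: "pending",
    format,
    filters,
    fileUrl: null,
    rowCount: null,
    createdAt: new Date(),
    completedAt: null,
    expiresAt: null,
  };
  jobs.set(job.id, job);
  return job;
}

// GET /api/exports/:jobId — the polling endpoint reads the same record.
function getExportJob(jobId: string): ExportJob | undefined {
  return jobs.get(jobId);
}
```

The record is the only coordination point: the API writes it, the worker updates it, the client polls it.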
AI generates: app.get('/api/export', async (req, res) => { const data = await db.select().from(orders); res.json(data); }) — loads all orders into memory, serializes, sends. At 100,000 orders: 500MB in memory, 30-second response time, gateway timeout. Async job: constant memory (streaming), no timeout (background worker has no HTTP timeout), and the user gets a download link instead of waiting.
- 202 Accepted with job ID — not synchronous response with data
- Background worker streams to S3/R2 — constant memory, no timeout
- Job record: status, format, filters, fileUrl, rowCount, expiresAt
- Signed download URL: expires after 24 hours for security
- Notify on completion: email, push, or client polls job status
Rule 2: Streaming File Generation
The rule: 'Generate export files with streaming — never load the full dataset into memory. Pattern: open a write stream to S3 (using multipart upload), open a database cursor (read rows in batches of 1000), for each batch write to the stream, close the stream when done. Memory usage: constant (one batch in memory at a time), regardless of dataset size. A 10GB export uses the same memory as a 10KB export.'
For CSV streaming: 'Stream CSV rows as they are read: const csvStream = new Transform({ objectMode: true, transform(row, enc, cb) { cb(null, formatCSVRow(row)); } }); databaseCursor.pipe(csvStream).pipe(s3UploadStream) — objectMode is required because the cursor emits row objects, not buffers. For JSON: use a JSON streaming library (json-stream-stringify) to write array elements one at a time instead of JSON.stringify(entireArray). For Excel: use a streaming XLSX library (exceljs with the streaming workbook writer) that writes rows without buffering the entire sheet.'
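A runnable sketch of the CSV pipeline, with an async generator standing in for the database cursor and a caller-supplied sink standing in for the S3 multipart upload stream (both are assumptions — a real implementation plugs in pg-cursor or similar, and the AWS SDK's upload stream):

```typescript
import { Readable, Transform, Writable } from "stream";
import { pipeline } from "stream/promises";

// Transform one row object into a CSV line. Only the rows currently in
// flight are in memory — the pattern is identical for 10 rows or 10 million.
const toCsvRow = new Transform({
  objectMode: true,
  transform(row: Record<string, unknown>, _enc, cb) {
    cb(null, Object.values(row).map(String).join(",") + "\n");
  },
});

// Stand-in for a database cursor yielding rows in batches.
async function* rowCursor() {
  yield { id: 1, total: 19.99 };
  yield { id: 2, total: 5.0 };
}

// `sink` would be the S3 multipart upload stream in production.
async function exportCsv(sink: Writable) {
  await pipeline(Readable.from(rowCursor()), toCsvRow, sink);
}
```

`pipeline` handles backpressure automatically: if S3 uploads slower than the database reads, the cursor pauses instead of buffering.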
AI generates: const data = await db.select().from(orders); const csv = data.map(row => formatCSV(row)).join('\n'); — entire dataset in memory twice (once as objects, once as string). At 1 million rows: 2GB in memory, Node.js heap overflow. Streaming: 10MB memory usage regardless of row count. Same output file, 200x less memory.
Rule 3: Chunked Import with Per-Row Validation
The rule: 'Import files in chunks, validate each row independently, and report all errors — do not fail on the first error. Flow: (1) upload file to S3 (presigned URL for direct upload), (2) create import job, (3) background worker reads the file in chunks (1000 rows), (4) validate each row against a Zod schema, (5) insert valid rows, (6) collect errors with row numbers, (7) generate an error report. Result: 9,500 rows imported successfully, 500 rows failed with specific errors per row.'
For the error report: 'Generate a downloadable error report: { totalRows: 10000, imported: 9500, failed: 500, errors: [{ row: 42, field: "email", value: "not-an-email", error: "Invalid email format" }, { row: 156, field: "amount", value: "abc", error: "Must be a number" }] }. The user downloads the error report, fixes the 500 rows in their spreadsheet, and re-imports only the failed rows. Partial success is always better than complete failure.'
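The per-row validation and error report can be sketched like this. Hand-rolled field validators stand in for the Zod schema (in practice you would call `schema.safeParse(row)` per row), and the field names are illustrative:

```typescript
type Row = Record<string, string>;
type RowError = { row: number; field: string; value: string; error: string };

// Stand-ins for a Zod schema: each validator returns an error message or null.
const validators: Record<string, (v: string) => string | null> = {
  email: (v) => (/^\S+@\S+\.\S+$/.test(v) ? null : "Invalid email format"),
  amount: (v) =>
    v !== "" && Number.isFinite(Number(v)) ? null : "Must be a number",
};

// Validate every row, insert the valid ones, collect ALL errors —
// never abort on the first failure.
function importRows(rows: Row[], insert: (row: Row) => void) {
  const errors: RowError[] = [];
  rows.forEach((row, i) => {
    const rowErrors: RowError[] = [];
    for (const [field, validate] of Object.entries(validators)) {
      const value = row[field] ?? "";
      const error = validate(value);
      if (error) rowErrors.push({ row: i + 1, field, value, error });
    }
    if (rowErrors.length === 0) insert(row);
    else errors.push(...rowErrors);
  });
  const failed = new Set(errors.map((e) => e.row)).size;
  return { totalRows: rows.length, imported: rows.length - failed, failed, errors };
}
```

The returned object is exactly the error report the user downloads: total, imported, failed, and one entry per failing field with row number, value, and message.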
AI generates: parse the entire CSV at once, INSERT all rows in one transaction — one invalid row rolls back the entire import. 9,999 valid rows lost because row 42 has a bad email. With chunked import: 9,999 rows imported, row 42 reported as failed with the specific error. The user fixes one row instead of re-importing 10,000.
- Upload to S3 first (presigned URL) — do not stream upload through your API server
- Validate each row against Zod schema — collect all errors, not fail-fast
- Partial success: import valid rows, report invalid rows with specific errors
- Error report: row number + field + value + error message — downloadable
- Re-import only failed rows — do not force re-import of the entire file
Rule 4: Progress Tracking and Cancellation
The rule: 'Track and expose job progress: { status: "processing", progress: { current: 45000, total: 100000, percentage: 45 }, estimatedTimeRemaining: "2 minutes" }. Update progress every 1000 rows (not every row — too many updates). Expose via: polling endpoint (GET /api/exports/:jobId), Server-Sent Events (real-time updates without polling), or WebSocket (bidirectional, allows cancellation). Support cancellation: user can abort a long-running export/import, the worker checks a cancelled flag between chunks.'
For estimated time: 'Track rows per second during processing. Estimated time = (total - current) / rowsPerSecond. Display as: "Processing: 45,000 of 100,000 rows (45%) — about 2 minutes remaining." The estimate improves over time as the average stabilizes. For the first 10%: show "Calculating..." instead of a wildly inaccurate estimate. Progress bars without estimates feel slower than progress bars with estimates — even if the estimate is approximate.'
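A minimal sketch of the progress payload the polling endpoint returns, assuming the worker maintains a running rows-per-second average (the function name and "about N minutes" phrasing are illustrative):

```typescript
// Build the progress object exposed via GET /api/exports/:jobId.
// Below the 10% mark the estimate is withheld — it is still unreliable.
function progressPayload(current: number, total: number, rowsPerSecond: number) {
  const percentage = Math.floor((current / total) * 100);
  const secondsLeft = (total - current) / rowsPerSecond;
  return {
    status: "processing",
    progress: { current, total, percentage },
    estimatedTimeRemaining:
      percentage < 10
        ? "Calculating..."
        : `about ${Math.ceil(secondsLeft / 60)} minutes`,
  };
}
```

The worker would call this every 1000 rows and also check the job's cancelled flag at the same cadence, so both progress updates and cancellation have chunk-level granularity.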
AI generates: the user clicks Export, sees a spinner, waits 3 minutes with no feedback. Are they at 10% or 90%? Is it stuck? Should they click again? They click again: now two export jobs are running. With progress tracking: the user sees real-time progress, knows the estimate, and can cancel if they started the wrong export. Confidence replaces anxiety.
Rule 5: GDPR Data Portability (Article 20)
The rule: 'GDPR Article 20 gives users the right to receive their personal data in a structured, commonly used, machine-readable format. Implementation: a "Download My Data" button in account settings that triggers an async export of all user data: profile, orders, preferences, activity history, uploaded files. Format: JSON (machine-readable) with a ZIP containing the JSON plus any uploaded files. Generate within 30 days (GDPR deadline), but aim for under 24 hours.'
For the data package: 'Structure the export as: user-data-export.zip containing: profile.json (name, email, address, preferences), orders.json (all orders with items and payments), activity.json (login history, actions taken), uploads/ (any files the user uploaded). Each JSON file includes a schema description header explaining the fields. The export is self-contained — a user can understand their data without documentation. Include a README.txt explaining what each file contains.'
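The package structure can be sketched as a function that assembles the file contents; actual zipping (e.g. with archiver) and copying of uploads/ are out of scope here, and all field lists and data shapes are illustrative:

```typescript
// Build the files that go into user-data-export.zip: each JSON file
// carries a schema description header so the export is self-contained.
function buildExportPackage(user: {
  profile: Record<string, unknown>;
  orders: unknown[];
  activity: unknown[];
}): Record<string, string> {
  const describe = (fields: string, data: unknown) =>
    JSON.stringify({ _schema: fields, data }, null, 2);
  return {
    "profile.json": describe("name, email, address, preferences", user.profile),
    "orders.json": describe("all orders with items and payments", user.orders),
    "activity.json": describe("login history, actions taken", user.activity),
    "README.txt":
      "Your personal data export: profile.json, orders.json, activity.json, " +
      "plus uploads/ containing files you uploaded.",
  };
}
```

Each entry maps a path inside the ZIP to its contents, so the worker can stream the archive entry by entry without materializing it in memory.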
AI generates: no data portability feature. A GDPR request arrives: a developer manually queries the database, exports to CSV, emails it to the user. Two developer-hours per request. At 100 requests per month: a full-time job. Automated export: user clicks a button, receives a ZIP in 24 hours, zero developer involvement. Compliance at scale without operational burden.
Complete Data Export and Import Rules Template
Consolidated rules for data export and import.
- Async export: 202 Accepted + background job + signed download URL — not synchronous response
- Streaming generation: constant memory, database cursor, multipart S3 upload
- Chunked import: 1000-row batches, per-row Zod validation, partial success with error report
- Error report: row number + field + value + error — user fixes and re-imports only failed rows
- Progress tracking: current/total/percentage + estimated time + cancellation support
- GDPR Article 20: Download My Data button, JSON + ZIP, all user data, under 24 hours
- Signed URLs expire after 24 hours — export files are not permanently accessible
- Support CSV, JSON, and XLSX formats — user chooses on export