AI Generates PDFs That Look Like 1998
AI generates PDFs with: raw HTML string concatenation (template literals with no escaping, no styling, no structure), no CSS (the PDF looks like an unstyled HTML page printed from a browser), no page break control (tables split mid-row, headings appear at the bottom of a page with no content following), no headers or footers (no page numbers, no document title, no company logo), and synchronous generation blocking the request (user waits 10 seconds while a 50-page invoice renders). The resulting PDF is unprofessional and unusable for business documents.
Modern PDF generation is: template-driven (Handlebars, React, or JSX templates with proper styling), headless-browser-rendered (Puppeteer or Playwright converts styled HTML to pixel-perfect PDF), page-aware (CSS page break rules control where pages split), header/footer-equipped (repeating headers with logo, footers with page numbers), and async-generated (background job for large documents, signed URL for download). AI generates none of these.
These rules cover: headless browser rendering, React-to-PDF pipelines, template engines for business documents, CSS page break control, header/footer management, and async generation for large documents.
Rule 1: Headless Browser Rendering with Puppeteer
The rule: 'Use Puppeteer or Playwright to render HTML to PDF: const browser = await puppeteer.launch(); const page = await browser.newPage(); await page.setContent(htmlString); const pdf = await page.pdf({ format: "A4", printBackground: true, margin: { top: "20mm", bottom: "20mm", left: "15mm", right: "15mm" } }). Headless browser rendering: supports full CSS (flexbox, grid, custom fonts), matches what you see in the browser, and produces pixel-perfect output.'
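The call sequence in this rule can be wrapped in one helper that always closes the page it opened. The following is a minimal sketch, not a definitive implementation: the interfaces `PdfOptions`, `PdfPage`, and `PdfBrowser` are hypothetical stand-ins that cover only the methods used here, on the assumption that the objects returned by `puppeteer.launch()` are structurally compatible with them.

```typescript
// Minimal structural types for the parts of a headless browser this sketch uses.
// Assumption: Puppeteer's Browser and Page objects are compatible with these shapes.
interface PdfOptions {
  format?: "A4" | "Letter";
  printBackground?: boolean;
  landscape?: boolean;
  margin?: { top?: string; bottom?: string; left?: string; right?: string };
}

interface PdfPage {
  setContent(html: string): Promise<unknown>;
  pdf(options?: PdfOptions): Promise<Uint8Array>;
  close(): Promise<unknown>;
}

interface PdfBrowser {
  newPage(): Promise<PdfPage>;
}

// Defaults taken from the rule: A4, backgrounds on, 20mm/15mm margins.
const DEFAULT_PDF_OPTIONS: PdfOptions = {
  format: "A4",
  printBackground: true,
  margin: { top: "20mm", bottom: "20mm", left: "15mm", right: "15mm" },
};

// Renders an HTML string to PDF bytes on a fresh page and always closes that page.
// Overrides are merged shallowly, so passing `margin` replaces the whole margin object.
async function renderHtmlToPdf(
  browser: PdfBrowser,
  html: string,
  overrides: PdfOptions = {},
): Promise<Uint8Array> {
  const page = await browser.newPage();
  try {
    await page.setContent(html);
    return await page.pdf({ ...DEFAULT_PDF_OPTIONS, ...overrides });
  } finally {
    await page.close();
  }
}

// Hypothetical usage with Puppeteer:
//   const browser = await puppeteer.launch();
//   const pdf = await renderHtmlToPdf(browser, "<h1>Invoice</h1>", { landscape: true });
```

Closing the page in `finally` keeps a failed render from leaking tabs in a long-lived browser process.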
For resource management: 'Launch a small set of browser instances and reuse them across requests (browser pool). Do not launch a new browser per PDF: browser launch takes 1-3 seconds, while the actual PDF rendering takes around 200ms. Pool pattern (generic-pool): const pool = genericPool.createPool({ create: () => puppeteer.launch(), destroy: (browser) => browser.close() }, { min: 2, max: 10 }). Each request borrows a browser with pool.acquire(), generates the PDF, and returns the browser with pool.release(browser). Pool size limits memory usage.'
AI generates: a library like pdfkit or jspdf with manual coordinate-based layout: doc.text('Invoice', 50, 50); doc.text('Item 1', 50, 100); doc.text('$29.99', 400, 100); with explicit x/y coordinates for every element, no CSS, no flowing layout, and changes that require recalculating every coordinate. Headless browser: write HTML + CSS (which you already know), render to PDF. Same skill set as building web pages, pixel-perfect output.
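To make the borrow-and-return cycle concrete, here is a deliberately small hand-rolled pool. It is not the generic-pool library and skips features such as minimum size, idle eviction, and validation; `SimplePool` and its method names are hypothetical. It only illustrates the idea that at most `max` browsers exist and that extra requests wait for a release.

```typescript
// A small illustrative pool: at most `max` resources exist, waiters are served FIFO.
class SimplePool<T> {
  private idle: T[] = [];
  private waiters: Array<(resource: T) => void> = [];
  private created = 0;
  private create: () => Promise<T>;
  private max: number;

  constructor(create: () => Promise<T>, max: number) {
    this.create = create;
    this.max = max;
  }

  async acquire(): Promise<T> {
    const free = this.idle.pop();
    if (free !== undefined) return free;
    if (this.created < this.max) {
      // Reserve the slot before awaiting so concurrent callers cannot exceed max.
      this.created++;
      try {
        return await this.create();
      } catch (err) {
        this.created--;
        throw err;
      }
    }
    // Pool exhausted: wait until another caller releases a resource.
    return new Promise<T>((resolve) => this.waiters.push(resolve));
  }

  release(resource: T): void {
    const next = this.waiters.shift();
    if (next) next(resource); // hand the resource straight to the oldest waiter
    else this.idle.push(resource);
  }

  // Borrow a resource, run the task, and always give the resource back.
  async use<R>(task: (resource: T) => Promise<R>): Promise<R> {
    const resource = await this.acquire();
    try {
      return await task(resource);
    } finally {
      this.release(resource);
    }
  }
}

// Hypothetical usage with Puppeteer:
//   const pool = new SimplePool(() => puppeteer.launch(), 4);
//   const pdf = await pool.use(async (browser) => {
//     const page = await browser.newPage();
//     try { await page.setContent(html); return await page.pdf({ format: "A4" }); }
//     finally { await page.close(); }
//   });
```

For production use, a maintained pool library is likely the better choice, since it also handles crashed browsers and shutdown.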
- Puppeteer/Playwright: HTML + CSS to PDF — full CSS support including flexbox and grid
- Browser pool: reuse instances, do not launch per request (1-3s launch vs 200ms render)
- page.pdf() options: format (A4/Letter), margins, printBackground, landscape
- Custom fonts: load via @font-face in the HTML — same as web font loading
- Same skills as web development — no coordinate-based layout, no PDF-specific API
pdfkit: doc.text('Invoice', 50, 50), with manual x/y coordinates for every element. Headless browser: write HTML + CSS (which you already know), call page.pdf(). Same skills as web development, pixel-perfect output, full flexbox/grid support. No coordinate math.
Rule 2: React-to-PDF Pipeline
The rule: 'For complex documents, use React components as PDF templates. Pipeline: (1) render the React component to an HTML string with renderToStaticMarkup, (2) inject the HTML into the Puppeteer page, (3) render to PDF. The React component is: type-safe (props define the document data), composable (InvoiceHeader, InvoiceLineItems, InvoiceFooter), and reusable (same component renders in the browser preview and the PDF). Design the PDF in the browser, generate it server-side.'
For the component pattern: 'function InvoicePDF({ invoice, company }: InvoiceProps) { return (<div className="invoice"><InvoiceHeader company={company} invoiceNumber={invoice.number} date={invoice.date} /><InvoiceLineItems items={invoice.items} /><InvoiceTotals subtotal={invoice.subtotal} tax={invoice.tax} total={invoice.total} /><InvoiceFooter company={company} /></div>); }. The same component: renders in the browser (preview), renders to PDF (download), and renders in email (inline). Three outputs from one component.'
AI generates: const html = '<html><body><h1>Invoice #' + invoice.number + '</h1><table>' + items.map(i => '<tr><td>' + i.name + '</td></tr>').join('') + '</table></body></html>'; a string concatenation with no escaping (any markup in the data is injected into the page the headless browser renders, the PDF equivalent of XSS), no type safety, and no reuse. React components: type-safe data binding, values escaped by default, composable structure, browser preview, and server-side PDF generation from the same code.
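The escaping problem can be shown without React. The sketch below uses a tagged template that escapes every interpolated value unless it has been marked as trusted markup. The names `escapeHtml`, `SafeHtml`, `html`, and `renderItems` are hypothetical helpers written for this illustration and do not come from any library.

```typescript
// Escapes the five characters that can change HTML structure.
function escapeHtml(value: unknown): string {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Marks a string as already-safe markup so it is not escaped a second time.
class SafeHtml {
  readonly value: string;
  constructor(value: string) {
    this.value = value;
  }
  toString(): string {
    return this.value;
  }
}

// Tagged template: literal parts pass through, interpolated values are escaped.
// Arrays are flattened so that mapped rows can be interpolated directly.
function html(strings: TemplateStringsArray, ...values: unknown[]): SafeHtml {
  let out = strings[0];
  values.forEach((v, i) => {
    const parts: unknown[] = Array.isArray(v) ? v : [v];
    out += parts.map((p) => (p instanceof SafeHtml ? p.value : escapeHtml(p))).join("");
    out += strings[i + 1];
  });
  return new SafeHtml(out);
}

interface LineItem {
  name: string;
  price: string;
}

// Builds the line-item table; item fields are escaped automatically.
function renderItems(items: LineItem[]): SafeHtml {
  const rows = items.map((i) => html`<tr><td>${i.name}</td><td>${i.price}</td></tr>`);
  return html`<table>${rows}</table>`;
}
```

An item named `<script>alert(1)</script>` then reaches the page as inert text instead of executable markup.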
Rule 3: CSS Page Break Control
The rule: 'Use CSS break rules to control page splits: break-before: page (force a page break before this element), break-after: page (force after), break-inside: avoid (prevent splitting this element across pages). Apply to: section headings (break-before: page — each section starts on a new page), tables (break-inside: avoid on table rows — rows do not split across pages), and signature blocks (break-inside: avoid — the signature stays together).'
For table pagination: 'Long tables that span multiple pages: <thead> repeats on each page automatically (the browser handles this). Ensure break-inside: avoid on <tr> so rows do not split (the entire row moves to the next page). Use <tfoot> for footer content, but note that Chromium may repeat <tfoot> on every page just as it repeats <thead>; if totals must appear only once at the end, put them in a final row or in a block after the table. For very long tables (100+ rows): consider splitting into multiple smaller tables with sub-totals, each with break-after: page.'
AI generates: a 50-row table that splits a row between pages (half the data on page 3, half on page 4), a heading at the bottom of page 5 with no content following it (the content is on page 6), and a signature block split across two pages. Three CSS declarations fix all three: break-inside: avoid on rows, break-after: avoid on headings, break-inside: avoid on signatures. The PDF looks professional instead of accidental.
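As a sketch of how these rules might be attached to a document before rendering, the stylesheet can live in a constant that is injected into the page head. The class names `.section` and `.signature-block` and the helper `withPrintStyles` are assumptions made for this example.

```typescript
// Print rules for the three failure cases above. The class names
// `.section` and `.signature-block` are hypothetical; use your own.
const PRINT_CSS = `
  .section + .section { break-before: page; }  /* a section directly after another starts a new page */
  tr { break-inside: avoid; }                  /* a table row moves to the next page whole */
  h1, h2, h3 { break-after: avoid; }           /* a heading stays with the content after it */
  .signature-block { break-inside: avoid; }    /* the signature block is never split */
`;

// Wraps already-escaped body markup in a document that carries the print rules.
function withPrintStyles(bodyHtml: string): string {
  return [
    "<!DOCTYPE html>",
    '<html><head><meta charset="utf-8">',
    `<style>${PRINT_CSS}</style>`,
    "</head>",
    `<body>${bodyHtml}</body></html>`,
  ].join("\n");
}
```

The resulting string is what would be passed to `page.setContent()` before calling `page.pdf()`.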
- break-before: page — force new page before sections
- break-inside: avoid — prevent splitting tables, signatures, and key blocks
- break-after: avoid on headings — heading stays with its following content
- <thead> repeats on each page for multi-page tables — browser handles automatically
- Three CSS declarations turn an accidental layout into a professional document
Table rows split across pages, headings orphaned at page bottom, signatures torn in half. break-inside: avoid on rows, break-after: avoid on headings, break-inside: avoid on signatures. Three declarations transform accidental layout into professional output.
Rule 5: Async Generation for Large Documents
The rule: 'For documents over 10 pages or taking more than 3 seconds to generate: use the same async job pattern as data export. (1) API returns 202 Accepted with job ID, (2) background worker generates the PDF, (3) uploads to S3 with signed URL, (4) notifies the user. For real-time preview: generate a low-resolution preview synchronously (first page only), then generate the full document asynchronously. The user sees a preview immediately and downloads the full PDF when ready.'
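The four steps can be sketched with an in-memory job table. This is an illustration of the control flow only: a real deployment would typically use a durable queue and a separate worker process, and `PdfJobService`, `PdfJobDeps`, and the `pdfs/…` key layout are hypothetical names chosen for the example.

```typescript
type JobStatus =
  | { state: "pending" }
  | { state: "done"; downloadUrl: string }
  | { state: "failed"; error: string };

// Hypothetical collaborators, injected so the flow does not depend on any queue or storage SDK.
interface PdfJobDeps {
  render(documentId: string): Promise<Uint8Array>; // e.g. headless-browser rendering
  upload(key: string, pdf: Uint8Array): Promise<string>; // returns a signed download URL
  notify(documentId: string, downloadUrl: string): Promise<void>;
}

class PdfJobService {
  private jobs = new Map<string, JobStatus>();
  private nextId = 1;
  private deps: PdfJobDeps;

  constructor(deps: PdfJobDeps) {
    this.deps = deps;
  }

  // Step 1: record the job, start the work, and respond immediately with 202.
  enqueue(documentId: string): { status: 202; jobId: string } {
    const jobId = `job-${this.nextId++}`;
    this.jobs.set(jobId, { state: "pending" });
    void this.run(jobId, documentId); // intentionally not awaited
    return { status: 202, jobId };
  }

  // Lets the client poll for progress using the job ID.
  status(jobId: string): JobStatus | undefined {
    return this.jobs.get(jobId);
  }

  // Steps 2-4: render, upload, then notify.
  private async run(jobId: string, documentId: string): Promise<void> {
    let downloadUrl: string;
    try {
      const pdf = await this.deps.render(documentId);
      downloadUrl = await this.deps.upload(`pdfs/${documentId}.pdf`, pdf);
      this.jobs.set(jobId, { state: "done", downloadUrl });
    } catch (err) {
      const error = err instanceof Error ? err.message : String(err);
      this.jobs.set(jobId, { state: "failed", error });
      return;
    }
    // A failed notification does not fail the job; the client can still poll.
    await this.deps.notify(documentId, downloadUrl).catch(() => undefined);
  }
}
```

Keeping the job state pollable means the download still works if the notification is lost.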
For caching generated PDFs: 'If the source data has not changed, serve the cached PDF. Cache key: document type + data hash. An invoice PDF: cache key = invoice-{invoiceId}-{dataHash}. If the invoice data changes (line item added), the hash changes, the cache misses, and a new PDF is generated. If the same invoice is downloaded 10 times without changes: 1 generation, 9 cache hits. Store cached PDFs in S3 with a TTL matching your data change frequency.'
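One way to derive such a key is to hash a canonical serialization of the document data. The sketch below assumes Node's built-in `node:crypto` module; `stableStringify`, `pdfCacheKey`, and `getOrRenderPdf` are hypothetical helpers, and the `Map` merely stands in for S3 or another object store.

```typescript
import { createHash } from "node:crypto";

// Serializes with object keys sorted so that key order does not change the hash.
function stableStringify(value: unknown): string {
  if (value === null || typeof value !== "object") {
    return JSON.stringify(value) ?? "null"; // undefined serializes as null
  }
  if (Array.isArray(value)) {
    return `[${value.map(stableStringify).join(",")}]`;
  }
  const obj = value as Record<string, unknown>;
  const body = Object.keys(obj)
    .sort()
    .map((k) => `${JSON.stringify(k)}:${stableStringify(obj[k])}`);
  return `{${body.join(",")}}`;
}

// Cache key = document type + id + truncated SHA-256 of the data.
function pdfCacheKey(docType: string, id: string, data: unknown): string {
  const hash = createHash("sha256").update(stableStringify(data)).digest("hex").slice(0, 16);
  return `${docType}-${id}-${hash}`;
}

// Returns the cached PDF when the key exists; otherwise renders once and stores it.
async function getOrRenderPdf(
  cache: Map<string, Uint8Array>,
  key: string,
  render: () => Promise<Uint8Array>,
): Promise<Uint8Array> {
  const hit = cache.get(key);
  if (hit) return hit;
  const pdf = await render();
  cache.set(key, pdf);
  return pdf;
}
```

Because the hash is part of the key, changed data simply produces a new key, and stale entries can be left to expire by TTL.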
AI generates: synchronous PDF generation in the API handler. A 50-page report takes 15 seconds to render. The user waits, the request may time out, and other requests queue behind it. Async generation: the user gets an immediate response, the PDF generates in the background, and a download link arrives via notification. For frequently accessed documents: the cached PDF serves in milliseconds.
Same invoice downloaded 10 times: without caching, 10 Puppeteer renders (up to 15 seconds each for a large document). With cache by data hash: 1 render + 9 near-instant S3 downloads. Cache key = invoice-{id}-{dataHash}. Data changes invalidate the cache automatically.
Complete PDF Generation Rules Template
Consolidated rules for PDF generation.
- Headless browser (Puppeteer): HTML + CSS to PDF — full CSS support, pixel-perfect output
- Browser pool: reuse instances, do not launch per request (1-3s launch vs 200ms render)
- React-to-PDF: renderToStaticMarkup → Puppeteer — same component for preview and PDF
- CSS page breaks: break-inside: avoid on rows/blocks, break-before: page on sections
- Header/footer templates: logo + title + page X of Y + date on every page
- Async for large docs: 202 Accepted + background job + signed S3 download URL
- Cache by data hash: same data = same PDF, skip regeneration on repeated downloads
- No string concatenation HTML — use templates with escaping and type-safe data binding
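For the header/footer item in the list above, Puppeteer's `page.pdf()` accepts `displayHeaderFooter`, `headerTemplate`, and `footerTemplate`, and fills in elements with the classes `title`, `date`, `pageNumber`, and `totalPages`. The helper below is a sketch under assumptions: `buildHeaderFooter` is a hypothetical name, the logo is assumed to be passed as a base64 data URI (external images generally do not load inside these templates), and the explicit font size is there because template text tends to be invisible or tiny without one.

```typescript
interface HeaderFooterOptions {
  displayHeaderFooter: boolean;
  headerTemplate: string;
  footerTemplate: string;
  margin: { top: string; bottom: string; left: string; right: string };
}

// Builds page.pdf() options for a repeating header (logo + title)
// and footer (date + "Page X of Y").
function buildHeaderFooter(logoDataUri: string): HeaderFooterOptions {
  // Accept only base64 image data URIs so nothing unexpected lands in the attribute.
  if (!/^data:image\/(png|jpeg|svg\+xml);base64,[A-Za-z0-9+/=]+$/.test(logoDataUri)) {
    throw new Error("logo must be a base64 data:image URI");
  }
  const row =
    "font-size:9px; width:100%; padding:0 15mm; display:flex; justify-content:space-between;";
  return {
    displayHeaderFooter: true,
    headerTemplate:
      `<div style="${row}"><img src="${logoDataUri}" style="height:8mm">` +
      `<span class="title"></span></div>`,
    footerTemplate:
      `<div style="${row}"><span class="date"></span>` +
      `<span>Page <span class="pageNumber"></span> of <span class="totalPages"></span></span></div>`,
    // Margins must leave room for the header and footer, or they overlap the content.
    margin: { top: "25mm", bottom: "20mm", left: "15mm", right: "15mm" },
  };
}

// Hypothetical usage:
//   await page.pdf({ format: "A4", printBackground: true, ...buildHeaderFooter(logo) });
```

The `title` class is filled from the document's `<title>`, so the page HTML needs one for the header to show it.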