AI Rules for PDF Generation

AI generates PDFs with string concatenation of raw HTML and no page breaks. Rules for headless browser rendering, React-to-PDF pipelines, template engines, page break control, and async generation for large documents.

7 min read·March 10, 2025

Tables split mid-row, headings orphaned at page bottom, no page numbers, 15-second synchronous render

Puppeteer rendering, React-to-PDF, CSS page breaks, headers/footers, async generation with caching

AI Generates PDFs That Look Like 1998

AI generates PDFs with: raw HTML string concatenation (template literals with no escaping, no styling, no structure), no CSS (the PDF looks like an unstyled HTML page printed from a browser), no page break control (tables split mid-row, headings appear at the bottom of a page with no content following), no headers or footers (no page numbers, no document title, no company logo), and synchronous generation that blocks the request (the user waits 15 seconds while a 50-page invoice renders). The resulting PDF looks unprofessional and is unusable for business documents.

Modern PDF generation is: template-driven (Handlebars or React/JSX templates with proper styling), headless-browser-rendered (Puppeteer or Playwright converts styled HTML to pixel-perfect PDF), page-aware (CSS page break rules control where pages split), header/footer-equipped (repeating headers with logo, footers with page numbers), and async-generated (background job for large documents, signed URL for download). AI generates none of these.

These rules cover: headless browser rendering, React-to-PDF pipelines, template engines for business documents, CSS page break control, header/footer management, and async generation for large documents.

Rule 1: Headless Browser Rendering with Puppeteer

The rule: 'Use Puppeteer or Playwright to render HTML to PDF: const browser = await puppeteer.launch(); const page = await browser.newPage(); await page.setContent(htmlString); const pdf = await page.pdf({ format: "A4", printBackground: true, margin: { top: "20mm", bottom: "20mm", left: "15mm", right: "15mm" } }). Headless browser rendering: supports full CSS (flexbox, grid, custom fonts), matches what you see in the browser, and produces pixel-perfect output.'
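The rule above can be sketched as a small function. This is a minimal sketch, not a definitive implementation: the `PageLike` and `BrowserLike` interfaces and the `renderHtmlToPdf` name are assumptions introduced here so the example does not depend on an installed package. They cover only the calls the rule itself uses (`newPage`, `setContent`, `pdf`) plus `close`.

```typescript
// Structural types covering only the calls used below. They are declared here
// (an assumption of this sketch) so the example has no install-time dependency.
interface PdfOptions {
  format?: string;
  printBackground?: boolean;
  margin?: { top?: string; bottom?: string; left?: string; right?: string };
}

interface PageLike {
  setContent(html: string, options?: { waitUntil?: string }): Promise<unknown>;
  pdf(options?: PdfOptions): Promise<Uint8Array>;
  close(): Promise<unknown>;
}

interface BrowserLike {
  newPage(): Promise<PageLike>;
}

// The same options the rule lists: A4, backgrounds on, explicit margins.
const A4_OPTIONS: PdfOptions = {
  format: "A4",
  printBackground: true,
  margin: { top: "20mm", bottom: "20mm", left: "15mm", right: "15mm" },
};

async function renderHtmlToPdf(browser: BrowserLike, html: string): Promise<Uint8Array> {
  const page = await browser.newPage();
  try {
    // Wait until network activity settles so fonts and images are loaded before printing.
    await page.setContent(html, { waitUntil: "networkidle0" });
    return await page.pdf(A4_OPTIONS);
  } finally {
    // Always close the page, even if rendering throws, so pages do not leak.
    await page.close();
  }
}
```

With Puppeteer installed, the intended call would be `renderHtmlToPdf(await puppeteer.launch(), html)`. Whether Puppeteer's own types line up with these interfaces exactly should be checked against the installed version.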

For resource management: 'Launch browsers once and reuse them across requests (browser pool). Do not launch a new browser per PDF: a browser launch typically takes 1-3 seconds, while the actual PDF rendering takes around 200ms. Pool pattern with the generic-pool package: const pool = genericPool.createPool({ create: () => puppeteer.launch(), destroy: (browser) => browser.close() }, { min: 2, max: 10 }). Each request borrows a browser with pool.acquire(), generates the PDF, and returns the browser with pool.release(browser). The max pool size caps memory usage.'
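To make the borrow/return flow concrete, here is a hand-rolled pool sketch. It is a stand-in written for illustration, not the API of any pooling library: `ResourcePool`, `acquire`, `release`, and `use` are names chosen here. It creates resources lazily up to `max` and makes further callers wait.

```typescript
// A tiny fixed-size pool. Illustrative only: a production setup would more
// likely use an existing pooling library with timeouts and health checks.
class ResourcePool<T> {
  private idle: T[] = [];
  private waiters: Array<(resource: T) => void> = [];
  private created = 0;
  private create: () => Promise<T>;
  private max: number;

  constructor(create: () => Promise<T>, max: number) {
    this.create = create;
    this.max = max;
  }

  async acquire(): Promise<T> {
    const free = this.idle.pop();
    if (free !== undefined) return free;
    if (this.created < this.max) {
      // Reserve the slot before awaiting so concurrent callers cannot over-create.
      this.created += 1;
      try {
        return await this.create();
      } catch (err) {
        this.created -= 1;
        throw err;
      }
    }
    // Pool exhausted: wait until another caller releases a resource.
    return new Promise<T>((resolve) => this.waiters.push(resolve));
  }

  release(resource: T): void {
    const next = this.waiters.shift();
    if (next) next(resource); // hand the resource straight to a waiting caller
    else this.idle.push(resource);
  }

  // Borrow a resource, run the task, and always hand the resource back.
  async use<R>(task: (resource: T) => Promise<R>): Promise<R> {
    const resource = await this.acquire();
    try {
      return await task(resource);
    } finally {
      this.release(resource);
    }
  }
}
```

A pool of browsers would then be built as `new ResourcePool(() => puppeteer.launch(), 10)`, with each request wrapping its page work in `pool.use(async (browser) => { ... })`. The `use` wrapper is the design choice worth keeping: a forgotten release is the usual way pools run dry.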

AI generates: a library like pdfkit or jspdf with manual coordinate-based layout: doc.text('Invoice', 50, 50); doc.text('Item 1', 50, 100); doc.text('$29.99', 400, 100); with explicit coordinates for every element, no CSS, no automatic layout, and changes that require recalculating every coordinate. Headless browser: write HTML + CSS (which you already know), render to PDF. Same skill set as building web pages, pixel-perfect output.

  • Puppeteer/Playwright: HTML + CSS to PDF — full CSS support including flexbox and grid
  • Browser pool: reuse instances, do not launch per request (1-3s launch vs 200ms render)
  • page.pdf() options: format (A4/Letter), margins, printBackground, landscape
  • Custom fonts: load via @font-face in the HTML — same as web font loading
  • Same skills as web development — no coordinate-based layout, no PDF-specific API
💡 Write HTML, Get PDF

pdfkit: doc.text('Invoice', 50, 50), with manual coordinates for every element. Headless browser: write HTML + CSS (which you already know), call page.pdf(). Same skills as web development, pixel-perfect output, full flexbox/grid support. No coordinate math.

Rule 2: React-to-PDF Pipeline

The rule: 'For complex documents, use React components as PDF templates. Pipeline: (1) render the React component to an HTML string with renderToStaticMarkup, (2) inject the HTML into the Puppeteer page, (3) render to PDF. The React component is: type-safe (props define the document data), composable (InvoiceHeader, InvoiceLineItems, InvoiceFooter), and reusable (same component renders in the browser preview and the PDF). Design the PDF in the browser, generate it server-side.'

For the component pattern: 'function InvoicePDF({ invoice, company }: InvoiceProps) { return (<div className="invoice"><InvoiceHeader company={company} invoiceNumber={invoice.number} date={invoice.date} /><InvoiceLineItems items={invoice.items} /><InvoiceTotals subtotal={invoice.subtotal} tax={invoice.tax} total={invoice.total} /><InvoiceFooter company={company} /></div>); }. The same component: renders in the browser (preview), renders to PDF (download), and renders in email (inline). Three outputs from one component.'

AI generates: const html = '<html><body><h1>Invoice #' + invoice.number + '</h1><table>' + items.map(i => '<tr><td>' + i.name + '</td></tr>').join('') + '</table></body></html>'; this is string concatenation with no escaping (HTML injection: markup or script in the invoice data is executed by the headless browser during rendering), no type safety, and no reuse. React components: type-safe data binding, interpolated values escaped by default, composable structure, browser preview, and server-side PDF generation from the same code.
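Where a full React pipeline is not available, the escaping problem can still be addressed in plain template literals. The following is a minimal sketch of an escaping tagged template; `escapeHtml`, `SafeHtml`, `html`, and `invoiceHtml` are hypothetical names introduced for this example, not part of any library.

```typescript
const HTML_ESCAPES: Record<string, string> = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&#39;",
};

function escapeHtml(value: unknown): string {
  return String(value).replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch] ?? ch);
}

// Marks a string as already-escaped markup so nested fragments are not escaped twice.
class SafeHtml {
  readonly value: string;
  constructor(value: string) {
    this.value = value;
  }
  toString(): string {
    return this.value;
  }
}

function renderValue(value: unknown): string {
  if (value instanceof SafeHtml) return value.value;
  if (Array.isArray(value)) return value.map(renderValue).join("");
  return escapeHtml(value); // everything else is treated as untrusted text
}

// Tagged template: literal parts pass through, interpolated values are escaped.
function html(strings: TemplateStringsArray, ...values: unknown[]): SafeHtml {
  const parts: string[] = [];
  strings.forEach((chunk, i) => {
    parts.push(chunk);
    if (i < values.length) parts.push(renderValue(values[i]));
  });
  return new SafeHtml(parts.join(""));
}

interface LineItem {
  name: string;
  price: string;
}

function invoiceHtml(invoiceNumber: string, items: LineItem[]): string {
  const rows = items.map((i) => html`<tr><td>${i.name}</td><td>${i.price}</td></tr>`);
  return html`<h1>Invoice #${invoiceNumber}</h1><table>${rows}</table>`.toString();
}
```

An item named `<script>alert(1)</script>` comes out as inert text in the table cell instead of running inside the headless browser.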

Rule 3: CSS Page Break Control

The rule: 'Use CSS break rules to control page splits: break-before: page (force a page break before this element), break-after: page (force one after), break-after: avoid (keep this element on the same page as what follows), break-inside: avoid (prevent splitting this element across pages). Apply to: section headings (break-before: page, so each section starts on a new page), other headings (break-after: avoid, so a heading stays with the content that follows), tables (break-inside: avoid on table rows, so rows do not split across pages), and signature blocks (break-inside: avoid, so the signature stays together).'

For table pagination: 'Long tables that span multiple pages: <thead> repeats on each page automatically (the browser handles this). Ensure break-inside: avoid on <tr>, so a row never splits and the entire row moves to the next page. Note that Chromium may repeat <tfoot> on every page as well, just like <thead>; for totals that must appear only once at the end, put them in the final <tbody> row or in a block after the table. For very long tables (100+ rows): consider splitting into multiple smaller tables with sub-totals, each with break-after: page.'

AI generates: a 50-row table that splits a row between pages (half the data on page 3, half on page 4), a heading at the bottom of page 5 with no content following it (the content is on page 6), and a signature block split across two pages. Three CSS declarations fix all three: break-inside: avoid on rows, break-after: avoid on headings, break-inside: avoid on signatures. The PDF looks professional instead of accidental.
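Those break rules might be packaged as a print stylesheet that is inlined into the HTML before rendering. A sketch, assuming hypothetical class names (`.section`, `.signature-block`) and a hypothetical `withPrintStyles` helper:

```typescript
// Print stylesheet kept as a string so it can be inlined into the rendered HTML.
// The class names are assumptions of this sketch; adapt them to the real templates.
const PRINT_CSS = `
  /* Every section after the first starts on a new page. */
  .section + .section { break-before: page; }

  /* A heading stays on the same page as the content that follows it. */
  h1, h2, h3 { break-after: avoid; }

  /* Rows and signature blocks are never split across pages. */
  tr { break-inside: avoid; }
  .signature-block { break-inside: avoid; }

  /* Table header rows repeat on each page (the default for thead, stated explicitly). */
  thead { display: table-header-group; }
`;

// Wrap a body fragment in a complete document with the print rules inlined.
function withPrintStyles(bodyHtml: string): string {
  return (
    "<!doctype html><html><head><meta charset=\"utf-8\">" +
    `<style>${PRINT_CSS}</style></head><body>${bodyHtml}</body></html>`
  );
}
```

Inlining the stylesheet keeps the document self-contained, so the renderer does not need network access to fetch CSS.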

  • break-before: page — force new page before sections
  • break-inside: avoid — prevent splitting tables, signatures, and key blocks
  • break-after: avoid on headings — heading stays with its following content
  • <thead> repeats on each page for multi-page tables — browser handles automatically
  • Three CSS declarations turn an accidental layout into a professional document
⚠️ Three CSS Declarations, Professional Document

Table rows split across pages, headings orphaned at page bottom, signatures torn in half. break-inside: avoid on rows, break-after: avoid on headings, break-inside: avoid on signatures. Three declarations transform accidental layout into professional output.

Rule 4: Repeating Headers and Footers

The rule: 'Puppeteer page.pdf() supports headerTemplate and footerTemplate: HTML strings that render on every page. Header: company logo + document title. Footer: page number + total pages + date. Template variables: <span class="pageNumber"></span>, <span class="totalPages"></span>, <span class="date"></span>. Set displayHeaderFooter: true and configure margins to accommodate header/footer height.'

For styling headers and footers: 'Header and footer templates run in a separate context with limited CSS. Inline all styles: <div style="font-size: 10px; width: 100%; text-align: center;"><span class="pageNumber"></span> of <span class="totalPages"></span></div>. External stylesheets and the page's own CSS classes do not apply. Font size must be explicitly set (the default is very small). Width: 100% is required for full-page-width headers. Test the header/footer rendering separately: they are the most common source of PDF layout bugs.'
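A sketch of how these options could be assembled in one place. The `headerFooterOptions` helper and its parameters are hypothetical; the option names (`displayHeaderFooter`, `headerTemplate`, `footerTemplate`, `margin`) and the `pageNumber`, `totalPages`, and `date` classes are the ones named in the rule. Passing the logo as a data URI is an assumption made here, on the expectation that templates cannot reliably load remote images.

```typescript
interface HeaderFooterOptions {
  displayHeaderFooter: boolean;
  headerTemplate: string;
  footerTemplate: string;
  margin: { top: string; bottom: string; left: string; right: string };
}

// Escape text that is interpolated into the template markup.
function escapeText(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

function headerFooterOptions(title: string, logoDataUri: string): HeaderFooterOptions {
  // All styles are inline: the page's stylesheet does not reach these templates.
  const base = "font-size: 10px; width: 100%; text-align: center;";
  return {
    displayHeaderFooter: true,
    headerTemplate:
      `<div style="${base}">` +
      `<img src="${escapeText(logoDataUri)}" style="height: 20px; vertical-align: middle;" /> ` +
      `<span>${escapeText(title)}</span></div>`,
    footerTemplate:
      `<div style="${base}">Page <span class="pageNumber"></span> of ` +
      `<span class="totalPages"></span> | <span class="date"></span></div>`,
    // Margins leave room for the header and footer so they do not overlap the body.
    margin: { top: "25mm", bottom: "20mm", left: "15mm", right: "15mm" },
  };
}
```

The result is meant to be spread into the PDF call, for example `page.pdf({ format: "A4", printBackground: true, ...headerFooterOptions(title, logo) })`.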

AI generates: PDFs with no page numbers (the reader does not know where they are in a 30-page document), no company branding (the PDF looks like a random printout), and no date (when was this generated?). Header template with logo + title, footer template with page X of Y + generation date: the PDF is a branded, navigable, datable document. Two templates transform an amateur PDF into a professional one.

Rule 5: Async Generation for Large Documents

The rule: 'For documents over 10 pages or taking more than 3 seconds to generate: use the same async job pattern as data export. (1) The API returns 202 Accepted with a job ID, (2) a background worker generates the PDF, (3) the worker uploads it to S3 and creates a signed URL, (4) the user is notified. For real-time preview: generate a low-resolution preview synchronously (first page only), then generate the full document asynchronously. The user sees a preview immediately and downloads the full PDF when ready.'
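The four steps can be sketched with an in-memory job store. Everything named here (`PdfJobQueue`, `PdfJobDeps`, the `pdfs/` key prefix) is hypothetical, and the rendering, upload, and notification steps are injected as functions so no particular queue or storage SDK is implied. A real deployment would presumably use a durable queue instead of a `Map`.

```typescript
type JobStatus =
  | { state: "pending" }
  | { state: "done"; downloadUrl: string }
  | { state: "failed"; error: string };

// Injected dependencies: how to render, where to upload, how to notify.
interface PdfJobDeps {
  renderPdf(documentId: string): Promise<Uint8Array>;
  upload(key: string, pdf: Uint8Array): Promise<string>; // resolves to a signed download URL
  notify(jobId: string, downloadUrl: string): Promise<void>;
}

class PdfJobQueue {
  private jobs = new Map<string, JobStatus>();
  private nextId = 1;
  private deps: PdfJobDeps;

  constructor(deps: PdfJobDeps) {
    this.deps = deps;
  }

  // Step 1: answer immediately with 202 and a job ID; the work continues in the background.
  // `done` is exposed so callers and tests can await completion if they want to.
  enqueue(documentId: string): { status: 202; jobId: string; done: Promise<void> } {
    const jobId = `pdf-job-${this.nextId++}`;
    this.jobs.set(jobId, { state: "pending" });
    const done = this.run(jobId, documentId);
    return { status: 202, jobId, done };
  }

  // Polling endpoint: lets the client check progress without waiting for a notification.
  status(jobId: string): JobStatus | undefined {
    return this.jobs.get(jobId);
  }

  private async run(jobId: string, documentId: string): Promise<void> {
    let downloadUrl: string;
    try {
      const pdf = await this.deps.renderPdf(documentId); // step 2: generate
      downloadUrl = await this.deps.upload(`pdfs/${documentId}.pdf`, pdf); // step 3: upload
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      this.jobs.set(jobId, { state: "failed", error: message });
      return;
    }
    this.jobs.set(jobId, { state: "done", downloadUrl });
    // Step 4: a failed notification does not undo a finished PDF; the client can still poll.
    await this.deps.notify(jobId, downloadUrl).catch(() => undefined);
  }
}
```

An HTTP handler would call `enqueue`, return the `status` and `jobId` to the client, and ignore `done`.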

For caching generated PDFs: 'If the source data has not changed, serve the cached PDF. Cache key: document type + data hash. An invoice PDF: cache key = invoice-{invoiceId}-{dataHash}. If the invoice data changes (line item added), the hash changes, the cache misses, and a new PDF is generated. If the same invoice is downloaded 10 times without changes: 1 generation, 9 cache hits. Store cached PDFs in S3 with a TTL matching your data change frequency.'
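A minimal sketch of the cache-key idea, assuming SHA-256 over a key-sorted JSON serialization of the source data. The helper names (`stableStringify`, `pdfCacheKey`, `getOrRenderPdf`) and the `PdfCache` interface are hypothetical; the interface stands in for whatever store is used, S3 or otherwise.

```typescript
import { createHash } from "node:crypto";

// Serialize with sorted object keys so equal data always produces the same string.
function stableStringify(value: unknown): string {
  if (Array.isArray(value)) {
    return `[${value.map((v) => stableStringify(v)).join(",")}]`;
  }
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const entries = Object.keys(obj)
      .sort()
      .map((k) => `${JSON.stringify(k)}:${stableStringify(obj[k])}`);
    return `{${entries.join(",")}}`;
  }
  return JSON.stringify(value) ?? "null";
}

// Cache key = document type + ID + hash of the data, e.g. invoice-{invoiceId}-{dataHash}.
function pdfCacheKey(docType: string, id: string, data: unknown): string {
  const hash = createHash("sha256").update(stableStringify(data)).digest("hex").slice(0, 16);
  return `${docType}-${id}-${hash}`;
}

interface PdfCache {
  get(key: string): Promise<Uint8Array | undefined>;
  put(key: string, pdf: Uint8Array): Promise<void>;
}

// Serve the cached PDF when the key exists; otherwise render once and store the result.
async function getOrRenderPdf(
  cache: PdfCache,
  key: string,
  render: () => Promise<Uint8Array>,
): Promise<Uint8Array> {
  const hit = await cache.get(key);
  if (hit) return hit;
  const pdf = await render();
  await cache.put(key, pdf);
  return pdf;
}
```

Because the hash covers only the data, a template or stylesheet change would not invalidate old entries. Adding a template version to the key is one way to handle that.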

AI generates: synchronous PDF generation in the API handler. A 50-page report takes 15 seconds to render. The user waits, the request may timeout, and other requests queue behind it. Async generation: the user gets an immediate response, the PDF generates in the background, and a download link arrives via notification. For frequently-accessed documents: the cached PDF serves in milliseconds.

ℹ️ 1 Generation, 9 Cache Hits

Same invoice downloaded 10 times: without caching, 10 Puppeteer renders (15 seconds each). With cache by data hash: 1 render + 9 instant S3 downloads. Cache key = invoice-{id}-{dataHash}. Data changes invalidate the cache automatically.

Complete PDF Generation Rules Template

Consolidated rules for PDF generation.

  • Headless browser (Puppeteer): HTML + CSS to PDF — full CSS support, pixel-perfect output
  • Browser pool: reuse instances, do not launch per request (1-3s launch vs 200ms render)
  • React-to-PDF: renderToStaticMarkup → Puppeteer — same component for preview and PDF
  • CSS page breaks: break-inside: avoid on rows/blocks, break-before: page on sections
  • Header/footer templates: logo + title + page X of Y + date on every page
  • Async for large docs: 202 Accepted + background job + signed S3 download URL
  • Cache by data hash: same data = same PDF, skip regeneration on repeated downloads
  • No string concatenation HTML — use templates with escaping and type-safe data binding
AI Rules for PDF Generation — RuleSync Blog