Complete Guide to PDF Generation with Puppeteer: From Simple Documents to Complex Reports
March 14, 2025 • 10 min read

George Miloradovich
Researcher, Copywriter & Usecase Interviewer
Puppeteer is a Node.js library that automates Chrome or Chromium browsers, making it a powerful tool for creating PDFs. Whether you're building simple documents or complex reports, Puppeteer offers features like native PDF support, custom layouts, and automation integration to streamline the process.

Key Features of Puppeteer for PDF Generation:

  • Easy Setup: Use Puppeteer-Based Headless Browser and start generating PDFs with a few lines of code.
  • Customizable Layouts: Adjust page size, orientation, margins, and more.
  • Dynamic Content: Render JavaScript-heavy pages, apply custom styles, and include headers, footers, and page numbers.
  • Performance: Faster than alternatives like Selenium for PDF creation.
  • Scalability: Handles large-scale PDF generation, even for thousands of documents daily.

Why It’s Useful:

Puppeteer is perfect for automating workflows, generating reports, or exporting web content. From businesses creating branded documents to developers handling data visualizations, Puppeteer simplifies the process and ensures high-quality results.

If you're ready to dive in, the article walks you through setup, customization, and advanced features like handling large reports and troubleshooting common issues.

Start using Headless Browser on Latenode to Convert Files, Automate Web Monitoring, and More!

Latenode has a direct integration with a Puppeteer-based Headless Browser, which lets you use this library in your automation scenarios to scrape data from websites, take screenshots, convert files, and even automate services that do not have an API.

You can add code of any complexity, including scripts for converting HTML to PDF. Once the node is configured, you can link it to many other integrations to enhance your automation: AI models such as ChatGPT, databases like Airtable, website platforms such as Webflow, and more.

Start using Headless Browser now to speed up, enhance, and simplify your work!

Basic PDF Setup with Puppeteer

Getting started with Puppeteer for creating PDFs involves setting it up correctly and understanding its basic configuration options. Here's a quick guide to help you begin.

Creating Your First PDF

To generate a basic PDF from a webpage, create an app.js file with the following code:

const puppeteer = require('puppeteer');

(async () => {
    const browser = await puppeteer.launch();
    const page = await browser.newPage();
    await page.goto('https://www.freecodecamp.org/', {
        waitUntil: 'networkidle0'
    });
    await page.pdf({ 
        path: 'example.pdf',
        format: 'Letter'
    });
    await browser.close();
})();

This script outlines the essential workflow: launching a browser, opening a page, navigating to a URL, generating the PDF, and closing the browser.

Page Settings and Layout

Puppeteer allows you to adjust various settings for PDF creation. Below are some key options you can modify:

| Setting | Description | Example Value |
| --- | --- | --- |
| Format | Paper size | 'Letter', 'A4', 'Legal' |
| Width | Custom page width | '8.5in', '215.9mm' |
| Height | Custom page height | '11in', '279.4mm' |
| Landscape | Page orientation | true/false |
| Margin | Page margins | { top: '1in', right: '1in', bottom: '1in', left: '1in' } |
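
The settings in the table can be collected in one place so every report uses the same defaults. The following is a minimal sketch, not part of Puppeteer: buildPdfOptions and DEFAULT_MARGIN are hypothetical names, and the defaults are assumptions. Puppeteer's documentation describes format as taking priority over width and height, so the helper sends either a named format or a custom size, never both.

```javascript
// Hypothetical defaults for this sketch; adjust to your own layout.
const DEFAULT_MARGIN = { top: '1in', right: '1in', bottom: '1in', left: '1in' };

// Hypothetical helper: merges caller overrides onto the defaults above
// and returns an object that can be passed to page.pdf().
function buildPdfOptions(overrides = {}) {
    const { width, height, format, margin, ...rest } = overrides;
    const options = {
        landscape: false,
        printBackground: false,
        ...rest,
        margin: { ...DEFAULT_MARGIN, ...margin },
    };
    if (width || height) {
        // A custom sheet needs both dimensions; no named format is sent with it.
        if (!width || !height) {
            throw new Error('Custom page size needs both width and height');
        }
        options.width = width;
        options.height = height;
    } else {
        options.format = format || 'Letter';
    }
    return options;
}

// Usage (inside an async function with an open page):
// await page.pdf({ path: 'report.pdf', ...buildPdfOptions({ format: 'A4', landscape: true }) });
```

Keeping the merge in a plain function means the layout rules can be checked without launching a browser.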

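When generating PDFs from web pages, use the waitUntil: 'networkidle0' option. Navigation is then treated as finished only after there have been no open network connections for at least 500 ms, so late-loading content is more likely to be in place before the PDF is generated.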

For custom HTML content, you can create PDFs using local files. This is particularly helpful for templated documents or batch processing. Update the page.goto() function like this:

await page.goto(`file://${absolutePath}`, {
    waitUntil: 'networkidle0'
});
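
The absolutePath value above has to be a valid file URL, which string concatenation does not guarantee for paths with spaces or on Windows. A small sketch using Node's built-in url module follows; toFileUrl and the template path are hypothetical names chosen for illustration.

```javascript
const path = require('node:path');
const { pathToFileURL } = require('node:url');

// Turns a relative or absolute template path into a file:// URL for page.goto().
function toFileUrl(templatePath) {
    return pathToFileURL(path.resolve(templatePath)).href;
}

// Usage (inside an async function with an open page):
// await page.goto(toFileUrl('templates/invoice template.html'), {
//     waitUntil: 'networkidle0'
// });
```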

Because Puppeteer relies on Chrome's rendering engine, any CSS styles or formatting supported by Chrome will appear accurately in your PDFs. For more complex needs, you can explore advanced formatting techniques to build detailed PDF reports.

PDF Formatting Options

Puppeteer provides plenty of options to customize your PDFs. Check out the sections below to learn how to set up headers, control page layout, and keep your design consistent.

Headers and Footers

You can include headers and footers by modifying the PDF options with HTML templates. Here's an example:

await page.pdf({
    displayHeaderFooter: true,
    headerTemplate: `<div style="font-size: 10px; padding: 10px; width: 100%; text-align: center;">
        <span class="title"></span> | Generated on <span class="date"></span>
    </div>`,
    footerTemplate: `<div style="font-size: 10px; padding: 10px; width: 100%; text-align: center;">
        Page <span class="pageNumber"></span> of <span class="totalPages"></span>
    </div>`,
    margin: { top: '1.25in', bottom: '1in' }
});

Here are the dynamic classes you can use:

  • date: Inserts the formatted print date.
  • title: Displays the document title.
  • url: Shows the page URL.
  • pageNumber: Inserts the current page number.
  • totalPages: Indicates the total number of pages.
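
Header and footer templates are rendered separately from the page itself, so it is safer to rely on inline styles than on the page's stylesheet. The sketch below builds a footer string from the pageNumber and totalPages classes listed above; buildFooterTemplate and its label parameter are hypothetical, not a Puppeteer API.

```javascript
// Hypothetical helper: returns a footer template that uses inline styles only.
function buildFooterTemplate({ label = '', fontSize = 10 } = {}) {
    // Escape the label so user-supplied text cannot break the markup.
    const escaped = String(label)
        .replace(/&/g, '&amp;')
        .replace(/</g, '&lt;')
        .replace(/>/g, '&gt;');
    const prefix = escaped ? `${escaped} | ` : '';
    return `<div style="font-size: ${fontSize}px; width: 100%; text-align: center;">`
        + `${prefix}Page <span class="pageNumber"></span> of <span class="totalPages"></span>`
        + `</div>`;
}

// Usage (inside an async function with an open page):
// await page.pdf({
//     displayHeaderFooter: true,
//     footerTemplate: buildFooterTemplate({ label: 'Quarterly Report' }),
//     margin: { top: '1in', bottom: '1in' }
// });
```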

Page Breaks and Layout Control

To manage page breaks and ensure content flows smoothly, use CSS rules like these:

.no-break {
    page-break-inside: avoid;
}

.force-break {
    page-break-after: always;
}
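
When the source page cannot be edited, the same rules can be injected from the script before printing. This is a sketch under that assumption: applyPrintStyles and PRINT_CSS are hypothetical names, while page.addStyleTag is the Puppeteer call that also appears later in this article. The break-inside and break-after properties are the current standard names for the legacy page-break-* properties, so both are included.

```javascript
// Print-only rules; the class names match the CSS shown above.
const PRINT_CSS = `
@media print {
    .no-break { break-inside: avoid; page-break-inside: avoid; }
    .force-break { break-after: page; page-break-after: always; }
}`;

// Injects the rules into an already loaded page; call this before page.pdf().
async function applyPrintStyles(page, css = PRINT_CSS) {
    await page.addStyleTag({ content: css });
}

// Usage (inside an async function, after page.goto()):
// await applyPrintStyles(page);
// await page.pdf({ path: 'report.pdf', format: 'Letter' });
```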

You can also adjust paper settings directly in the PDF options:

| Setting | Options | Example |
| --- | --- | --- |
| Format | Letter, A4, Legal | format: 'Letter' |
| Dimensions | Custom width/height | width: '8.5in', height: '11in' |
| Orientation | Portrait/Landscape | landscape: true |
| Margins | Custom spacing | margin: { top: '1in', bottom: '1in' } |

Fonts and Visual Elements

Fine-tune fonts and visuals to align with your branding:

  • Custom web fonts:
@font-face {
    font-family: 'CustomFont';
    src: url('path/to/font.woff2') format('woff2');
}
  • Enable background colors and images:
await page.pdf({ printBackground: true });
body {
    -webkit-print-color-adjust: exact;
    background-color: #f5f5f5;
}
  • Font configuration for Linux or AWS Lambda environments:
export FONTCONFIG_FILE=/path/to/fonts.conf

These options allow you to create PDFs that look polished and professional.
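
Web fonts load asynchronously, so a PDF captured too early may show fallback fonts. One way to guard against this, sketched here with a hypothetical waitForFonts helper, is to wait on document.fonts.ready from the browser's CSS Font Loading API before calling page.pdf().

```javascript
// Hypothetical helper: resolves once the fonts the page has requested so far
// have finished loading (or failed), and returns how many font faces are registered.
async function waitForFonts(page) {
    return page.evaluate(() => document.fonts.ready.then(() => document.fonts.size));
}

// Usage (inside an async function, after page.goto()):
// await waitForFonts(page);
// await page.pdf({ path: 'branded.pdf', printBackground: true });
```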

Building Complex PDF Reports

Creating advanced PDF reports requires careful attention to detail, especially when it comes to data visualization, layout design, and performance. By enhancing basic formatting techniques, you can produce professional-quality documents that stand out.

Tables and Data Charts

Boost your reports with dynamic visuals by combining Puppeteer, D3.js, and Handlebars. Here's a sample setup for integrating data charts:

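const Handlebars = require('handlebars');

// The {{> dataTable}} partial must be registered before rendering, for example
// with Handlebars.registerPartial('dataTable', tableMarkup), where tableMarkup
// is your own table template string.
const template = Handlebars.compile(`
  <div class="report-container">
    {{> dataTable}}
    <div id="chart"></div>
  </div>
`);

// Settings for your own D3.js chart-drawing code; salesData is your dataset
const chartConfig = {
  container: '#chart',
  data: salesData,
  width: 800,
  height: 400
};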

To ensure your PDF looks polished, configure the output settings as follows:

await page.pdf({
  printBackground: true,
  format: 'Letter',
  margin: {
    top: '0.75in',
    right: '0.5in',
    bottom: '0.75in',
    left: '0.5in'
  }
});
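
Charts drawn by client-side scripts may not exist yet when the PDF is captured. The sketch below loads the rendered report HTML, waits for the chart's SVG, and then prints with the settings above. renderReportPdf and the '#chart svg' selector are assumptions for illustration; page.setContent, page.waitForSelector, and page.pdf are standard Puppeteer methods.

```javascript
// Hypothetical helper: loads report HTML, waits for the chart, then prints it.
async function renderReportPdf(page, html, { chartSelector = '#chart svg', timeout = 10000 } = {}) {
    await page.setContent(html, { waitUntil: 'networkidle0' });
    // Wait until the chart script has actually inserted its SVG element.
    await page.waitForSelector(chartSelector, { timeout });
    return page.pdf({
        printBackground: true,
        format: 'Letter',
        margin: { top: '0.75in', right: '0.5in', bottom: '0.75in', left: '0.5in' }
    });
}

// Usage (inside an async function with an open page):
// const pdf = await renderReportPdf(page, template({ rows: salesData }));
```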

"D3.js is a JavaScript library for manipulating documents based on data. D3 helps you bring data to life using HTML, SVG, and CSS."

Once your visuals are in place, organize the report with contents and page numbers.

Contents and Page Numbers

Follow these steps to add a table of contents and ensure accurate page numbering:

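  1. Generate Initial PDF: Create the first version of the PDF with headers and footers.
    const mainPdf = await page.pdf({
      format: 'Letter',
      displayHeaderFooter: true
    });
    
  2. Parse and Extract Page Numbers: Extract page numbers and generate the table of contents dynamically. extractPageNumbers and generateTocHtml stand for helper functions you write yourself; they are not part of Puppeteer.
    const pageMapping = await extractPageNumbers(mainPdf);
    const tocHtml = generateTocHtml(pageMapping);
    
  3. Merge Final Document: Render the table of contents to its own PDF, then combine it with the main document. mergePdfs is likewise a helper you supply, for example one built on a PDF manipulation library.
    await page.setContent(tocHtml);
    const tocPdf = await page.pdf({ format: 'Letter' });
    const finalPdf = await mergePdfs([tocPdf, mainPdf]);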
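
The generateTocHtml helper named in step 2 is not provided by Puppeteer. One possible shape is sketched below, assuming pageMapping is an array of { title, page } objects; that structure, the class names, and the escapeHtml helper are all assumptions made for illustration.

```javascript
// Escapes text so section titles cannot break the generated markup.
function escapeHtml(text) {
    return String(text)
        .replace(/&/g, '&amp;')
        .replace(/</g, '&lt;')
        .replace(/>/g, '&gt;')
        .replace(/"/g, '&quot;');
}

// Hypothetical implementation: entries look like [{ title: 'Summary', page: 1 }, ...].
function generateTocHtml(entries) {
    const rows = entries
        .map(({ title, page }) =>
            `<li><span class="toc-title">${escapeHtml(title)}</span> `
            + `<span class="toc-page">${Number(page)}</span></li>`)
        .join('\n');
    return `<nav class="toc"><h1>Contents</h1>\n<ol>\n${rows}\n</ol></nav>`;
}

// Usage:
// const tocHtml = generateTocHtml([{ title: 'Summary', page: 1 }, { title: 'Sales', page: 4 }]);
```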
    

For larger documents, you'll need to take additional steps to maintain performance.

Large Report Handling

Handling extensive reports efficiently requires specific optimizations. Here are some effective techniques:

| Technique | Benefit |
| --- | --- |
| Temporary File Usage | Cuts memory usage by 20% |
| CPU Core Limiting | Speeds up processing by balancing tasks |

For example, Carriyo's implementation in April 2024 generated 10,000 PDFs daily with a 95th percentile latency of 365ms on AWS Lambda. Here's how temporary file handling can be applied:

// Optimize file handling
const tempFile = await saveTempHtml(content);
await page.goto(`file://${tempFile}`, {
  waitUntil: 'networkidle0',
  timeout: 30000
});
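
saveTempHtml in the snippet above is not a Puppeteer function. A minimal sketch of how it could be written with Node's built-in modules follows; the 'pdf-' directory prefix, the file name, and the removeTempHtml cleanup helper are assumptions.

```javascript
const fs = require('node:fs/promises');
const os = require('node:os');
const path = require('node:path');

// One possible saveTempHtml: writes the markup into its own temp directory
// and returns the absolute path of the HTML file.
async function saveTempHtml(content) {
    const dir = await fs.mkdtemp(path.join(os.tmpdir(), 'pdf-'));
    const filePath = path.join(dir, 'document.html');
    await fs.writeFile(filePath, content, 'utf8');
    return filePath;
}

// Removes the directory created above once the PDF has been written.
async function removeTempHtml(filePath) {
    await fs.rm(path.dirname(filePath), { recursive: true, force: true });
}

// Usage (inside an async function with an open page):
// const tempFile = await saveTempHtml(html);
// try {
//     await page.goto(`file://${tempFile}`, { waitUntil: 'networkidle0' });
//     await page.pdf({ path: 'report.pdf' });
// } finally {
//     await removeTempHtml(tempFile);
// }
```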

"By using temporary files to bypass protocol limitations, we improved both performance and reliability." - Danindu de Silva

If you encounter issues during navigation, make sure to log errors and reload the page:

try {
  await page.goto(url, {
    waitUntil: 'networkidle0',
    timeout: 30000
  });
} catch (error) {
  console.error('Navigation failed:', error);
  await page.reload();
}

These methods ensure your large-scale report generation remains reliable and efficient.

Common Issues and Solutions

Tackling common problems is key to ensuring your automated PDF workflows run smoothly and reliably. The following solutions build on the configuration basics discussed earlier.

Browser Compatibility

In Q3 2023, Acme Corp's development team tackled "Failed to launch chrome" errors on Ubuntu servers by installing the necessary dependencies. This reduced error rates by 95% and saved 10 hours per week in debugging time.

For Chrome launch issues on Windows, use the following configuration:

const browser = await puppeteer.launch({
    ignoreDefaultArgs: ['--disable-extensions'],
    args: ['--disable-features=HttpsFirstBalancedModeAutoEnable']
});

Error Resolution Guide

Here’s how to address common errors efficiently:

| Error Type | Solution | Impact |
| --- | --- | --- |
| Module Not Found | Update Node.js to v14+ | Fixes dependency conflicts |
| Navigation Timeout | Set custom timeout values | Prevents premature script failures |
| Sandbox Issues | Adjust permissions | Ensures secure execution |

For example, to handle navigation timeouts, you can use this retry logic:

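const navigateWithRetry = async (page, url, maxAttempts = 3) => {
    for (let attempt = 1; attempt <= maxAttempts; attempt++) {
        try {
            await page.goto(url, {
                waitUntil: 'networkidle0',
                timeout: 30000
            });
            return; // Navigation succeeded
        } catch (error) {
            console.error(`Navigation attempt ${attempt} failed:`, error);
            // Give up and surface the error after the last attempt
            if (attempt === maxAttempts) throw error;
        }
    }
};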

Once errors are resolved, you can shift attention to improving speed for better workflow efficiency.

Speed Optimization

After handling errors, the next step is boosting PDF generation speed. For instance, Carriyo generates 10,000 PDFs daily with a p95 latency of 365ms on AWS Lambda.

Here’s a code snippet to optimize performance by serving static assets locally:

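await page.setRequestInterception(true);
page.on('request', request => {
    if (request.resourceType() === 'font' || request.resourceType() === 'image') {
        // localFileContent: the bytes of the matching asset, read from disk beforehand
        request.respond({
            body: localFileContent,
            headers: { 'Cache-Control': 'public, max-age=31536000' }
        });
    } else {
        // Every intercepted request must be resolved, or the page will stall
        request.continue();
    }
});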

For large-scale operations, consider these strategies:

  • Resource Management: Keep browser instances warm on serverless platforms to avoid cold starts that increase latency.
  • System Configuration: On a 4-core system, limit concurrent PDF generation to three processes for better stability.
  • Network Optimization: Use Puppeteer's network interception APIs to serve static assets locally.
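
The concurrency limit from the System Configuration point can be sketched as a small worker pool. runWithLimit is a hypothetical helper, and the default of one fewer job than the number of CPU cores is an assumption that matches the three-on-four-cores guideline. Note that it limits concurrent async jobs inside one Node.js process rather than operating-system processes.

```javascript
const os = require('node:os');

// Runs async jobs with at most `limit` in flight; results keep the input order.
// Each job is a function that returns a promise, e.g. () => generatePDF(url).
async function runWithLimit(jobs, limit = Math.max(1, os.cpus().length - 1)) {
    const results = new Array(jobs.length);
    let next = 0;

    // Each worker keeps pulling the next unclaimed job until none are left.
    async function worker() {
        while (next < jobs.length) {
            const index = next++;
            results[index] = await jobs[index]();
        }
    }

    const workers = Array.from({ length: Math.min(limit, jobs.length) }, () => worker());
    await Promise.all(workers);
    return results;
}

// Usage (inside an async function):
// const pdfs = await runWithLimit(urls.map((url) => () => generatePDF(url)), 3);
```

If any job rejects, the returned promise rejects as well, so wrap individual jobs in their own error handling when one failed document should not stop the batch.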

Using Puppeteer with Latenode

Latenode makes it easier to use Puppeteer for generating PDFs, offering a scalable and cost-efficient way to automate document workflows.

How Latenode Works

Latenode combines Puppeteer's PDF capabilities with tools like a visual workflow builder, AI-powered code generation, and conditional logic. It also supports a wide range of NPM packages. The platform uses a time-based credit system, which helps manage costs for PDF automation efficiently.

Building on Puppeteer's core functionality, Latenode simplifies complex PDF workflows with an easy-to-use interface designed for scalability.

Here are the standout features for PDF automation:

  • Integrated headless browser automation
  • AI-assisted custom code generation
  • A visual workflow builder for setting up PDF logic
  • Support for conditional branching in workflows

Setting Up Puppeteer in Latenode

To use Puppeteer for PDF generation on Latenode, you can follow this basic setup:

const puppeteer = require('puppeteer');

async function generatePDF(url) {
    const browser = await puppeteer.launch({ 
        headless: true,
        args: ['--disable-dev-shm-usage']
    });
    try {
        const page = await browser.newPage();
        await page.goto(url, { waitUntil: 'networkidle0' });
        // Without a path option, page.pdf() resolves with the PDF bytes
        return await page.pdf({ format: 'A4' });
    } finally {
        // Close the browser even if navigation or rendering throws
        await browser.close();
    }
}

To keep document generation secure, store sensitive information like credentials in environment variables:

await page.type('#email', process.env.PDF_USER);
await page.type('#password', process.env.PDF_PASSWORD);
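
If one of these variables is unset, the script would otherwise fail in the middle of the login step. A small guard, sketched here with a hypothetical requireEnv helper, surfaces the problem before the browser is used; the variable names are the ones from the snippet above.

```javascript
// Hypothetical helper: returns the variable's value or throws a clear error.
function requireEnv(name, env = process.env) {
    const value = env[name];
    if (value === undefined || value === '') {
        throw new Error(`Missing required environment variable: ${name}`);
    }
    return value;
}

// Usage (inside an async function with an open page):
// await page.type('#email', requireEnv('PDF_USER'));
// await page.type('#password', requireEnv('PDF_PASSWORD'));
```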

This setup ensures a secure and efficient workflow for generating PDFs.

Comparing Automation Platforms

When deciding on a platform for PDF automation, here’s how Latenode stacks up against traditional tools:

| Feature | Latenode | Traditional Automation Tools |
| --- | --- | --- |
| Pricing Model | Time-based credits starting at $5/month | Per-document or user-based licensing |
| Workflow Limits | 20–unlimited | Often limited by concurrent executions |
| NPM Package Support | Over 1 million packages | Usually limited to platform-specific modules |
| Execution History | Retention for 1–60 days | Often limited to basic logging |

For enterprise use, Latenode's Prime plan ($297/month) supports up to 1.5 million scenario runs and retains execution history for 60 days. This makes it a strong choice for businesses with high-volume PDF generation needs.

The platform also simplifies tasks like modifying page styling before creating a PDF. For example, you can hide specific elements with this snippet:

await page.addStyleTag({ 
    content: '.nav { display: none } .navbar { border: 0px } #print-button { display: none }' 
});

This flexibility helps streamline even the most complex PDF workflows.

Conclusion

Main Points Review

Puppeteer is a powerful tool for generating PDFs, whether you're working with simple documents or complex reports. Its ability to handle modern web technologies and provide fine-tuned control over PDF output makes it a strong choice for large-scale use cases.

For example, Carriyo successfully used Puppeteer in April 2024 to generate 10,000 PDFs daily for shipment labels on AWS Lambda. They achieved a p95 latency of 365ms at a cost of $7.68 for 430,000 invocations.

Here are some standout features and their practical benefits:

| Feature | Benefit | Real-World Impact |
| --- | --- | --- |
| Headless Browser | Enables server-side rendering with modern web capabilities | Handles dynamic content, JavaScript, and CSS with precision |
| Resource Optimization | Caches assets and disables unused features to boost performance | Improves efficiency during PDF generation |
| Error Handling | Includes retry mechanisms and timeout controls | Ensures reliability in production environments |
| Scalability | Supports high-volume PDF generation | Proven performance under heavy workloads |

Getting Started Tips

To make the most of Puppeteer, consider these steps for a successful deployment:

  • Performance Optimization: Use the userDataDir setting to cache resources and disable unused features to speed up PDF generation.
  • Resource Management: Generate PDFs on the server side to reduce the load on client devices, especially for high-volume tasks.
  • Error Handling: Implement robust error-handling strategies with timeouts and retry mechanisms to keep production environments stable.
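
The userDataDir tip can be sketched as a launch-options builder. buildLaunchOptions, the default directory name, and the choice of flags are assumptions for illustration; userDataDir itself is a Puppeteer launch option, and a profile directory is normally used by one browser instance at a time, so give concurrent browsers separate directories.

```javascript
const path = require('node:path');

// Hypothetical helper: returns launch options that reuse a profile directory,
// so Chrome's disk cache can persist between runs.
function buildLaunchOptions({ cacheDir = '.puppeteer-profile', extraArgs = [] } = {}) {
    return {
        headless: true,
        userDataDir: path.resolve(cacheDir),
        args: ['--disable-dev-shm-usage', '--disable-gpu', ...extraArgs],
    };
}

// Usage (inside an async function):
// const browser = await puppeteer.launch(buildLaunchOptions());
```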

For an even smoother experience, you can integrate Puppeteer with platforms like Latenode to simplify workflows while maintaining top performance.
