Complete Guide to PDF Generation with Puppeteer: From Simple Documents to Complex Reports
Learn how to harness Puppeteer for generating PDFs, from basic setups to advanced report creation with automation and customization features.

Puppeteer is a Node.js library that automates Chrome or Chromium browsers, making it a powerful tool for creating PDFs. Whether you're building simple documents or complex reports, Puppeteer offers features like native PDF support, custom layouts, and automation integration to streamline the process.
Key Features of Puppeteer for PDF Generation:
- Easy Setup: Use Puppeteer-Based Headless Browser and start generating PDFs with a few lines of code.
- Customizable Layouts: Adjust page size, orientation, margins, and more.
- Dynamic Content: Render JavaScript-heavy pages, apply custom styles, and include headers, footers, and page numbers.
- Performance: Often faster than general-purpose tools like Selenium for PDF creation, because it drives Chrome directly over the DevTools Protocol.
- Scalability: Handles large-scale PDF generation, even for thousands of documents daily.
Why It’s Useful:
Puppeteer is perfect for automating workflows, generating reports, or exporting web content. From businesses creating branded documents to developers handling data visualizations, Puppeteer simplifies the process and ensures high-quality results.
If you're ready to dive in, the article walks you through setup, customization, and advanced features like handling large reports and troubleshooting common issues.
Start using Headless Browser on Latenode to Convert Files, Automate Web Monitoring, and More!
Latenode has a direct integration with a Puppeteer-based Headless Browser, which lets you use this library in your automation scenarios to scrape data from websites, take screenshots, convert files, and even automate services that do not have an API.
You can add code of any complexity, including scripts for converting HTML to PDF. Once the node is configured, you can link it to many other integrations to enhance your automation: AI models such as ChatGPT, databases like Airtable, website platforms such as Webflow, and more.
Start using Headless Browser now to speed up, enhance, and simplify your work!
Basic PDF Setup with Puppeteer
Getting started with Puppeteer for creating PDFs involves setting it up correctly and understanding its basic configuration options. Here's a quick guide to help you begin.
Creating Your First PDF
To generate a basic PDF from a webpage, create an app.js file with the following code:
const puppeteer = require('puppeteer');
(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://www.freecodecamp.org/', {
    waitUntil: 'networkidle0'
  });
  await page.pdf({
    path: 'example.pdf',
    format: 'Letter'
  });
  await browser.close();
})();
This script outlines the essential workflow: launching a browser, opening a page, navigating to a URL, generating the PDF, and closing the browser.
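One gap in that workflow is that a failed navigation skips the final close call and leaves a Chrome process running. A minimal sketch of the same steps wrapped in try/finally is shown here; the helper name `renderUrlToPdf` and its `launch` parameter are illustrative and not part of Puppeteer.

```javascript
// Hypothetical helper: `launch` is any function that resolves to a Puppeteer
// browser, for example () => puppeteer.launch().
async function renderUrlToPdf(launch, url, pdfOptions = {}) {
  const browser = await launch();
  try {
    const page = await browser.newPage();
    await page.goto(url, { waitUntil: 'networkidle0' });
    // Without a `path` option, page.pdf() still resolves with the PDF bytes.
    return await page.pdf({ format: 'Letter', ...pdfOptions });
  } finally {
    // Runs on success and on failure, so the browser is always shut down.
    await browser.close();
  }
}

// Usage (assumes the puppeteer package is installed):
// const puppeteer = require('puppeteer');
// await renderUrlToPdf(() => puppeteer.launch(), 'https://example.com', { path: 'example.pdf' });
```

Passing the launcher in as a function keeps the helper easy to exercise with a stub browser in tests.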
Page Settings and Layout
Puppeteer allows you to adjust various settings for PDF creation. Below are some key options you can modify:
| Setting | Description | Example Value |
|---|---|---|
| Format | Paper size | 'Letter', 'A4', 'Legal' |
| Width | Custom page width | '8.5in', '215.9mm' |
| Height | Custom page height | '11in', '279.4mm' |
| Landscape | Page orientation | true/false |
| Margin | Page margins | { top: '1in', right: '1in', bottom: '1in', left: '1in' } |
When generating PDFs from web pages, use the waitUntil: 'networkidle0' option. Navigation is then treated as finished only once there have been no network connections for at least 500 ms, so late-loading resources are included in the PDF [3].
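The settings in the table combine into a single options object passed to page.pdf(). The sketch below uses a hypothetical helper, `buildPdfOptions`, to show one way to assemble it. It relies on the documented Puppeteer behavior that `format`, when set, takes priority over `width` and `height`, so the helper leaves `format` out when a custom size is requested.

```javascript
// Hypothetical helper that assembles the options object for page.pdf().
function buildPdfOptions({ landscape = false, customSize = null } = {}) {
  const options = {
    landscape,
    margin: { top: '1in', right: '1in', bottom: '1in', left: '1in' },
  };
  if (customSize) {
    // Custom dimensions: omit `format`, which would otherwise win.
    options.width = customSize.width;
    options.height = customSize.height;
  } else {
    options.format = 'Letter';
  }
  return options;
}

// Usage:
// await page.pdf({ path: 'report.pdf', ...buildPdfOptions({ landscape: true }) });
```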
For custom HTML content, you can create PDFs using local files. This is particularly helpful for templated documents or batch processing. Update the page.goto() function like this:
await page.goto(`file://${absolutePath}`, {
  waitUntil: 'networkidle0'
});
Because Puppeteer relies on Chrome's rendering engine, any CSS styles or formatting supported by Chrome will appear accurately in your PDFs. For more complex needs, you can explore advanced formatting techniques to build detailed PDF reports.
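When loading local files, building the file:// URL by string concatenation can break on paths that contain spaces or on Windows drive letters. A small sketch using Node's built-in `pathToFileURL` avoids that; the helper name `toFileUrl` and the example path are illustrative.

```javascript
const path = require('node:path');
const { pathToFileURL } = require('node:url');

// Convert a relative or absolute file path into a file:// URL.
// pathToFileURL percent-encodes spaces and handles Windows drive letters.
function toFileUrl(filePath) {
  return pathToFileURL(path.resolve(filePath)).href;
}

// Usage:
// await page.goto(toFileUrl('templates/invoice.html'), { waitUntil: 'networkidle0' });
```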
PDF Formatting Options
Puppeteer provides plenty of options to customize your PDFs. Check out the sections below to learn how to set up headers, control page layout, and keep your design consistent.
Headers and Footers
You can include headers and footers by modifying the PDF options with HTML templates. Here's an example:
await page.pdf({
  displayHeaderFooter: true,
  headerTemplate: `<div style="font-size: 10px; padding: 10px; width: 100%; text-align: center;">
    <span class="title"></span> | Generated on <span class="date"></span>
  </div>`,
  footerTemplate: `<div style="font-size: 10px; padding: 10px; width: 100%; text-align: center;">
    Page <span class="pageNumber"></span> of <span class="totalPages"></span>
  </div>`,
  margin: { top: '1.25in', bottom: '1in' }
});
Here are the dynamic classes you can use:
- date: Adds the current timestamp.
- title: Displays the document title.
- url: Shows the page URL.
- pageNumber: Inserts the current page number.
- totalPages: Indicates the total number of pages.
Page Breaks and Layout Control
To manage page breaks and ensure content flows smoothly, use CSS rules like these:
.no-break {
  page-break-inside: avoid;
}
.force-break {
  page-break-after: always;
}
You can also adjust paper settings directly in the PDF options:
| Setting | Options | Example |
|---|---|---|
| Format | Letter, A4, Legal | format: 'Letter' |
| Dimensions | Custom width/height | width: '8.5in', height: '11in' |
| Orientation | Portrait/Landscape | landscape: true |
| Margins | Custom spacing | margin: { top: '1in', bottom: '1in' } |
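If you cannot edit the page's own stylesheet, the break rules can be injected at render time instead. The sketch below is one way to do that with `page.addStyleTag`; the `tr` rule and the helper name `applyPrintCss` are additions for illustration. Because page.pdf() renders with print media by default, rules inside `@media print` only affect the PDF.

```javascript
// CSS applied only when printing. The class names match the rules above;
// the table-row rule is an extra illustration. Legacy page-break-* properties
// are kept alongside the newer break-* ones.
const PRINT_CSS = `
  @media print {
    .no-break { page-break-inside: avoid; break-inside: avoid; }
    .force-break { page-break-after: always; break-after: page; }
    tr { break-inside: avoid; }
  }
`;

// `page` is a Puppeteer Page; addStyleTag injects a <style> element.
async function applyPrintCss(page) {
  await page.addStyleTag({ content: PRINT_CSS });
}

// Usage: call after page.goto() and before page.pdf().
// await applyPrintCss(page);
```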
Fonts and Visual Elements
Fine-tune fonts and visuals to align with your branding:
- Custom web fonts:
@font-face {
  font-family: 'CustomFont';
  src: url('path/to/font.woff2') format('woff2');
}
- Enable background colors and images:
await page.pdf({ printBackground: true });
body {
  -webkit-print-color-adjust: exact;
  background-color: #f5f5f5;
}
- Font configuration for Linux or AWS Lambda environments:
export FONTCONFIG_FILE=/path/to/fonts.conf
These options allow you to create PDFs that look polished and professional.
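One common reason a PDF shows fallback fonts is that printing starts before custom web fonts have finished loading. As a hedged sketch, you can wait on the browser's `document.fonts.ready` promise before calling page.pdf(); the helper name `waitForWebFonts` is illustrative.

```javascript
// `page` is a Puppeteer Page that has already navigated to the document.
async function waitForWebFonts(page) {
  // document.fonts.ready resolves once pending font loads have finished.
  // Resolving to `true` keeps the value returned to Node serializable.
  await page.evaluate(() => document.fonts.ready.then(() => true));
}

// Usage:
// await page.goto(url, { waitUntil: 'networkidle0' });
// await waitForWebFonts(page);
// await page.pdf({ path: 'branded.pdf', printBackground: true });
```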
Building Complex PDF Reports
Creating advanced PDF reports requires careful attention to detail, especially when it comes to data visualization, layout design, and performance. By enhancing basic formatting techniques, you can produce professional-quality documents that stand out.
Tables and Data Charts
Boost your reports with dynamic visuals by combining Puppeteer, D3.js, and Handlebars. Here's a sample setup for integrating data charts:
const template = Handlebars.compile(`
  <div class="report-container">
    {{> dataTable}}
    <div id="chart"></div>
  </div>
`);
// D3.js chart configuration
const chartConfig = {
  container: '#chart',
  data: salesData,
  width: 800,
  height: 400
};
To ensure your PDF looks polished, configure the output settings as follows:
await page.pdf({
  printBackground: true,
  format: 'Letter',
  margin: {
    top: '0.75in',
    right: '0.5in',
    bottom: '0.75in',
    left: '0.5in'
  }
});
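The snippets above do not show how the compiled HTML reaches the browser. A minimal sketch, assuming the template output is a complete HTML string and that the chart script appends an `<svg>` element inside `#chart`, could load it with `page.setContent` and wait for the chart before printing. The helper name `renderReport` and the 10-second timeout are illustrative choices.

```javascript
// `page` is a Puppeteer Page; `html` is the output of the compiled template.
// The '#chart svg' selector assumes the chart script draws an <svg> in #chart.
async function renderReport(page, html, pdfOptions = {}) {
  await page.setContent(html, { waitUntil: 'networkidle0' });
  // Wait until the chart has actually been drawn before printing.
  await page.waitForSelector('#chart svg', { timeout: 10000 });
  return page.pdf({ printBackground: true, format: 'Letter', ...pdfOptions });
}

// Usage:
// const pdf = await renderReport(page, template({ rows: salesData }));
```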
"D3.js is a JavaScript library for manipulating documents based on data. D3 helps you bring data to life using HTML, SVG, and CSS." [5]
Once your visuals are in place, organize the report with contents and page numbers.
Contents and Page Numbers
Follow these steps to add a table of contents and ensure accurate page numbering:
1. Generate the initial PDF. Create the first version of the PDF with headers and footers:
const mainPdf = await page.pdf({ format: 'Letter', displayHeaderFooter: true });
2. Parse and extract page numbers. Extract page numbers and generate the table of contents dynamically:
const pageMapping = await extractPageNumbers(mainPdf);
const tocHtml = generateTocHtml(pageMapping);
3. Merge the final document. Combine the table of contents with the main document:
const finalPdf = await mergePdfs([tocPdf, mainPdf]);
Note that extractPageNumbers, generateTocHtml, and mergePdfs are helpers you write yourself (or build on a PDF library), not Puppeteer APIs, and tocPdf is the PDF rendered from tocHtml.
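To make the table-of-contents step more concrete, here is one possible shape for `generateTocHtml`. It is a sketch under assumptions: the article does not define the format of `pageMapping`, so the example assumes an array of `{ title, page }` objects, and the `tocPages` parameter is an addition that accounts for the pages the table of contents itself adds in front of the report.

```javascript
// Escape text before placing it in HTML.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Assumed input shape: [{ title: 'Summary', page: 1 }, ...].
// `tocPages` is how many pages the table of contents occupies, since
// prepending it shifts every later page number.
function generateTocHtml(pageMapping, tocPages = 1) {
  const items = pageMapping
    .map(({ title, page }) =>
      `<li><span class="toc-title">${escapeHtml(title)}</span> ` +
      `<span class="toc-page">${page + tocPages}</span></li>`)
    .join('\n');
  return `<h1>Contents</h1>\n<ol class="toc">\n${items}\n</ol>`;
}
```

The resulting HTML can be rendered to its own PDF with page.setContent() and page.pdf() before merging.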
For larger documents, you'll need to take additional steps to maintain performance.
Large Report Handling
Handling extensive reports efficiently requires specific optimizations. Here are some effective techniques:
| Technique | Benefit |
|---|---|
| Temporary File Usage | Cuts memory usage by 20% |
| CPU Core Limiting | Speeds up processing by balancing tasks |
For example, Carriyo's implementation in April 2024 generated 10,000 PDFs daily with a 95th percentile latency of 365ms on AWS Lambda [7]. Here's how temporary file handling can be applied:
// Optimize file handling
const tempFile = await saveTempHtml(content);
await page.goto(`file://${tempFile}`, {
  waitUntil: 'networkidle0',
  timeout: 30000
});
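The `saveTempHtml` helper is not defined in the snippet above. One possible implementation, using only Node's built-in modules, is sketched here; the directory prefix, file name, and the companion `removeTempHtml` cleanup function are illustrative choices.

```javascript
const fs = require('node:fs/promises');
const os = require('node:os');
const path = require('node:path');

// Write the HTML into a fresh temporary directory and return the
// absolute path of the file.
async function saveTempHtml(content) {
  const dir = await fs.mkdtemp(path.join(os.tmpdir(), 'pdf-'));
  const file = path.join(dir, 'document.html');
  await fs.writeFile(file, content, 'utf8');
  return file;
}

// Remove the temporary directory once the PDF has been generated.
async function removeTempHtml(file) {
  await fs.rm(path.dirname(file), { recursive: true, force: true });
}
```

Calling `removeTempHtml` in a finally block after page.pdf() keeps temporary files from accumulating on long-running servers.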
"By using temporary files to bypass protocol limitations, we improved both performance and reliability." - Danindu de Silva [6]
If you encounter issues during navigation, make sure to log errors and reload the page:
try {
  await page.goto(url, {
    waitUntil: 'networkidle0',
    timeout: 30000
  });
} catch (error) {
  console.error('Navigation failed:', error);
  await page.reload();
}
These methods ensure your large-scale report generation remains reliable and efficient.
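The CPU core limiting technique from the table can be approximated in a single Node.js process by capping how many PDF jobs run at once. The sketch below is a generic concurrency limiter, not a Puppeteer feature; the name `mapWithConcurrency` is illustrative, and the right limit for your hardware is something to measure.

```javascript
// Run `worker` over `items` with at most `limit` tasks in flight,
// returning results in the same order as the input.
async function mapWithConcurrency(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0;

  // Each runner repeatedly claims the next unprocessed index.
  async function runner() {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  }

  const runnerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: runnerCount }, () => runner()));
  return results;
}

// Usage, where renderOne is any async function that produces one PDF:
// const pdfs = await mapWithConcurrency(urls, 3, (url) => renderOne(url));
```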
Common Issues and Solutions
Tackling common problems is key to ensuring your automated PDF workflows run smoothly and reliably. The following solutions build on the configuration basics discussed earlier.
Browser Compatibility
In Q3 2023, Acme Corp's development team tackled "Failed to launch chrome" errors on Ubuntu servers by installing the necessary dependencies. This reduced error rates by 95% and saved 10 hours per week in debugging time [8].
For Chrome launch issues on Windows, use the following configuration:
const browser = await puppeteer.launch({
  ignoreDefaultArgs: ['--disable-extensions'],
  args: ['--disable-features=HttpsFirstBalancedModeAutoEnable']
});
Error Resolution Guide
Here’s how to address common errors efficiently:
| Error Type | Solution | Impact |
|---|---|---|
| Module Not Found | Update Node.js to v14+ | Fixes dependency conflicts |
| Navigation Timeout | Set custom timeout values | Prevents premature script failures |
| Sandbox Issues | Adjust permissions | Ensures secure execution |
For example, to handle navigation timeouts, you can use this retry logic:
const navigateWithRetry = async (page, url, maxAttempts = 3) => {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      await page.goto(url, {
        waitUntil: 'networkidle0',
        timeout: 30000
      });
      return; // Navigation succeeded
    } catch (error) {
      console.error(`Navigation attempt ${attempt} failed:`, error);
      // Give up after the last attempt so the caller sees the failure
      if (attempt === maxAttempts) throw error;
    }
  }
};
Once errors are resolved, you can shift attention to improving speed for better workflow efficiency.
Speed Optimization
After handling errors, the next step is boosting PDF generation speed. For instance, Carriyo generates 10,000 PDFs daily with a p95 latency of 365ms on AWS Lambda [7].
Here’s a code snippet to optimize performance by serving static assets locally:
await page.setRequestInterception(true);
page.on('request', request => {
  if (request.resourceType() === 'font' || request.resourceType() === 'image') {
    // localFileContent must already hold the bytes of the matching local asset
    request.respond({
      body: localFileContent,
      headers: { 'Cache-Control': 'public, max-age=31536000' }
    });
  } else {
    // Let every other request through; unanswered requests stall the page
    request.continue();
  }
});
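In practice each intercepted URL needs its own local content and content type. A hedged sketch of a handler factory follows; the `assets` map shape and the name `createAssetHandler` are assumptions for illustration. It also shows that every intercepted request has to be answered, either with `respond` or `continue`, or the page will wait indefinitely.

```javascript
// `assets` maps full request URLs to { body, contentType } entries that
// were loaded from disk ahead of time (hypothetical shape).
function createAssetHandler(assets) {
  return (request) => {
    const local = assets.get(request.url());
    if (local) {
      request.respond({
        status: 200,
        contentType: local.contentType,
        body: local.body,
      });
    } else {
      // Anything not served locally goes to the network as usual.
      request.continue();
    }
  };
}

// Usage:
// await page.setRequestInterception(true);
// page.on('request', createAssetHandler(assets));
```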
For large-scale operations, consider these strategies:
- Resource Management: Keep browser instances warm on serverless platforms to avoid cold starts that increase latency [7].
- System Configuration: On a 4-core system, limit concurrent PDF generation to three processes for better stability [7].
- Network Optimization: Use Puppeteer's network interception APIs to serve static assets locally [7].
Using Puppeteer with Latenode
Latenode makes it easier to use Puppeteer for generating PDFs, offering a scalable and cost-efficient way to automate document workflows.
How Latenode Works
Latenode combines Puppeteer's PDF capabilities with tools like a visual workflow builder, AI-powered code generation, and conditional logic. It also supports a wide range of NPM packages. The platform uses a time-based credit system, which helps manage costs for PDF automation efficiently.
Building on Puppeteer's core functionality, Latenode simplifies complex PDF workflows with an easy-to-use interface designed for scalability.
Here are the standout features for PDF automation:
- Integrated headless browser automation
- AI-assisted custom code generation
- A visual workflow builder for setting up PDF logic
- Support for conditional branching in workflows
Setting Up Puppeteer in Latenode
To use Puppeteer for PDF generation on Latenode, you can follow this basic setup:
const puppeteer = require('puppeteer');
async function generatePDF(url) {
  const browser = await puppeteer.launch({
    headless: true,
    args: ['--disable-dev-shm-usage']
  });
  const page = await browser.newPage();
  await page.goto(url, { waitUntil: 'networkidle0' });
  const pdf = await page.pdf({ format: 'A4' });
  await browser.close();
  return pdf;
}
To keep document generation secure, store sensitive information like credentials in environment variables:
await page.type('#email', process.env.PDF_USER);
await page.type('#password', process.env.PDF_PASSWORD);
This setup ensures a secure and efficient workflow for generating PDFs.
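If one of those environment variables is missing, page.type() receives undefined and fails with an unhelpful error. A small guard, sketched here with the illustrative helper names `requireEnv` and `fillLogin`, fails fast with a clear message instead.

```javascript
// Read a required environment variable, failing fast with a clear message.
function requireEnv(name, env = process.env) {
  const value = env[name];
  if (!value) {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// `page` is a Puppeteer Page; the selectors match the snippet above.
async function fillLogin(page) {
  await page.type('#email', requireEnv('PDF_USER'));
  await page.type('#password', requireEnv('PDF_PASSWORD'));
}
```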
Comparing Automation Platforms
When deciding on a platform for PDF automation, here’s how Latenode stacks up against traditional tools:
| Feature | Latenode | Traditional Automation Tools |
|---|---|---|
| Pricing Model | Time-based credits starting at $5/month | Per-document or user-based licensing |
| Workflow Limits | 20–unlimited | Often limited by concurrent executions |
| NPM Package Support | Over 1 million packages | Usually limited to platform-specific modules |
| Execution History | Retention for 1–60 days | Often limited to basic logging |
For enterprise use, Latenode's Prime plan ($297/month) supports up to 1.5 million scenario runs and retains execution history for 60 days. This makes it a strong choice for businesses with high-volume PDF generation needs.
The platform also simplifies tasks like modifying page styling before creating a PDF. For example, you can hide specific elements with this snippet:
await page.addStyleTag({
  content: '.nav { display: none } .navbar { border: 0px } #print-button { display: none }'
});
This flexibility helps streamline even the most complex PDF workflows.
Conclusion
Main Points Review
Puppeteer is a powerful tool for generating PDFs, whether you're working with simple documents or complex reports. Its ability to handle modern web technologies and provide fine-tuned control over PDF output makes it a strong choice for large-scale use cases.
For example, Carriyo successfully used Puppeteer in April 2024 to generate 10,000 PDFs daily for shipment labels on AWS Lambda. They achieved a p95 latency of 365ms at a cost of $7.68 for 430,000 invocations [7].
Here are some standout features and their practical benefits:
| Feature | Benefit | Real-World Impact |
|---|---|---|
| Headless Browser | Enables server-side rendering with modern web capabilities | Handles dynamic content, JavaScript, and CSS with precision |
| Resource Optimization | Caches assets and disables unused features to boost performance | Improves efficiency during PDF generation |
| Error Handling | Includes retry mechanisms and timeout controls | Ensures reliability in production environments |
| Scalability | Supports high-volume PDF generation | Proven performance under heavy workloads |
Getting Started Tips
To make the most of Puppeteer, consider these steps for a successful deployment:
- Performance Optimization: Use the userDataDir setting to cache resources and disable unused features to speed up PDF generation [4].
- Resource Management: Generate PDFs on the server side to reduce the load on client devices, especially for high-volume tasks [1].
- Error Handling: Implement robust error-handling strategies with timeouts and retry mechanisms to keep production environments stable [7].
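As a rough illustration of the caching tip, the sketch below builds launch options with a persistent `userDataDir`, so Chrome's profile (including its HTTP cache) is reused between runs. The helper name `buildLaunchOptions` and the default directory are assumptions. It is likely safest to give each concurrently running browser its own directory, since Chrome locks a profile while it is in use.

```javascript
const os = require('node:os');
const path = require('node:path');

// Launch options that reuse one profile directory between runs.
// The default directory name is an arbitrary choice for this sketch.
function buildLaunchOptions(cacheDir = path.join(os.tmpdir(), 'puppeteer-profile')) {
  return {
    headless: true,
    userDataDir: cacheDir,
    args: ['--disable-dev-shm-usage'],
  };
}

// Usage (assumes the puppeteer package is installed):
// const browser = await puppeteer.launch(buildLaunchOptions());
```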
For an even smoother experience, you can integrate Puppeteer with platforms like Latenode to simplify workflows while maintaining top performance.