Puppeteer is a Node.js library that automates Chrome or Chromium browsers, making it a powerful tool for creating PDFs. Whether you're building simple documents or complex reports, Puppeteer offers features like native PDF support, custom layouts, and automation integration to streamline the process.
Puppeteer is perfect for automating workflows, generating reports, or exporting web content. From businesses creating branded documents to developers handling data visualizations, Puppeteer simplifies the process and ensures high-quality results.
If you're ready to dive in, the article walks you through setup, customization, and advanced features like handling large reports and troubleshooting common issues.
Latenode has a direct integration with a Puppeteer-based Headless Browser, which lets you use this library in your automation scenarios to scrape data from websites, take screenshots, convert files, and even automate services that do not have an API.
You can add code of any complexity, including scripts for converting HTML to PDF. Once the node is configured, you can link it to many other integrations to enhance your automation: AI models such as ChatGPT, databases like Airtable, website platforms such as Webflow, and many more.
Start using Headless Browser now to speed up, enhance, and simplify your work!
Getting started with Puppeteer for creating PDFs involves setting it up correctly and understanding its basic configuration options. Here's a quick guide to help you begin.
To generate a basic PDF from a webpage, create an app.js file with the following code:
const puppeteer = require('puppeteer');
(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://www.freecodecamp.org/', {
    waitUntil: 'networkidle0'
  });
  await page.pdf({
    path: 'example.pdf',
    format: 'Letter'
  });
  await browser.close();
})();
This script outlines the essential workflow: launching a browser, opening a page, navigating to a URL, generating the PDF, and closing the browser.
Puppeteer allows you to adjust various settings for PDF creation. Below are some key options you can modify:
Setting | Description | Example Value |
---|---|---|
Format | Paper size | 'Letter', 'A4', 'Legal' |
Width | Custom page width | '8.5in', '215.9mm' |
Height | Custom page height | '11in', '279.4mm' |
Landscape | Page orientation | true/false |
Margin | Page margins | { top: '1in', right: '1in', bottom: '1in', left: '1in' } |
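The settings in the table can be combined into a single options object. The sketch below is one possible way to do that; `buildPdfOptions` and `renderPdf` are hypothetical helper names (not part of Puppeteer), and the example assumes the `puppeteer` package is installed.

```javascript
// Hypothetical helper: builds a page.pdf() options object from the settings above.
// In Puppeteer, `format` takes priority over `width`/`height` when it is set,
// so it is left out whenever a custom page size is requested.
function buildPdfOptions({ format = 'Letter', width, height, landscape = false, margin = '1in' } = {}) {
  const options = {
    landscape,
    margin: { top: margin, right: margin, bottom: margin, left: margin },
  };
  if (width && height) {
    options.width = width;
    options.height = height;
  } else {
    options.format = format;
  }
  return options;
}

// Example usage. puppeteer is loaded lazily so the helper above can be
// used and tested without launching a browser.
async function renderPdf(url, outputPath, settings) {
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    await page.goto(url, { waitUntil: 'networkidle0' });
    await page.pdf({ path: outputPath, ...buildPdfOptions(settings) });
  } finally {
    await browser.close();
  }
}
```

For instance, `renderPdf('https://example.com', 'wide.pdf', { width: '11in', height: '8.5in', margin: '0.5in' })` would produce a custom-sized page with half-inch margins.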
When generating PDFs from web pages, use the waitUntil: 'networkidle0' option. It makes navigation wait until there have been no open network connections for at least 500 ms, so late-loading assets are in place before the PDF is generated.
For custom HTML content, you can create PDFs using local files. This is particularly helpful for templated documents or batch processing. Update the page.goto() call like this:
await page.goto(`file://${absolutePath}`, {
  waitUntil: 'networkidle0'
});
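Building the file URL by string concatenation can break on Windows paths or on file names that contain spaces. As a hedged alternative, the sketch below uses Node's built-in `url.pathToFileURL`; the helper name `toFileUrl` and the sample template path are assumptions for illustration.

```javascript
const path = require('path');
const { pathToFileURL } = require('url');

// Converts a relative or absolute file path into a file:// URL.
// pathToFileURL handles Windows drive letters and percent-encodes
// characters such as spaces.
function toFileUrl(filePath) {
  return pathToFileURL(path.resolve(filePath)).href;
}

// Usage inside a Puppeteer script (page is an open Puppeteer Page):
// await page.goto(toFileUrl('templates/invoice.html'), { waitUntil: 'networkidle0' });
```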
Because Puppeteer relies on Chrome's rendering engine, any CSS styles or formatting supported by Chrome will appear accurately in your PDFs. For more complex needs, you can explore advanced formatting techniques to build detailed PDF reports.
Puppeteer provides plenty of options to customize your PDFs. Check out the sections below to learn how to set up headers, control page layout, and keep your design consistent.
You can include headers and footers by modifying the PDF options with HTML templates. Here's an example:
await page.pdf({
  displayHeaderFooter: true,
  headerTemplate: `<div style="font-size: 10px; padding: 10px; width: 100%; text-align: center;">
    <span class="title"></span> | Generated on <span class="date"></span>
  </div>`,
  footerTemplate: `<div style="font-size: 10px; padding: 10px; width: 100%; text-align: center;">
    Page <span class="pageNumber"></span> of <span class="totalPages"></span>
  </div>`,
  margin: { top: '1.25in', bottom: '1in' }
});
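Puppeteer fills in elements that carry these dynamic classes: date (formatted print date), title (document title), url (document location), pageNumber (current page number), and totalPages (total number of pages).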
To manage page breaks and ensure content flows smoothly, use CSS rules like these:
.no-break {
  page-break-inside: avoid;
}
.force-break {
  page-break-after: always;
}
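If the page you are printing does not already include these rules, they can be injected just before the PDF is generated. The following is a minimal sketch: `PRINT_CSS` and `applyPrintCss` are hypothetical names, and the stylesheet pairs the `page-break-*` properties above with their newer `break-*` equivalents.

```javascript
// Print rules: the newer break-* properties plus the legacy page-break-* aliases.
const PRINT_CSS = `
  .no-break { break-inside: avoid; page-break-inside: avoid; }
  .force-break { break-after: page; page-break-after: always; }
`;

// Adds the rules to an already-loaded page; `page` is a Puppeteer Page.
async function applyPrintCss(page) {
  await page.addStyleTag({ content: PRINT_CSS });
}

// Usage:
// await page.goto(url, { waitUntil: 'networkidle0' });
// await applyPrintCss(page);
// await page.pdf({ path: 'report.pdf', format: 'Letter' });
```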
You can also adjust paper settings directly in the PDF options:
Setting | Options | Example |
---|---|---|
Format | Letter, A4, Legal | format: 'Letter' |
Dimensions | Custom width/height | width: '8.5in', height: '11in' |
Orientation | Portrait/Landscape | landscape: true |
Margins | Custom spacing | margin: { top: '1in', bottom: '1in' } |
Fine-tune fonts and visuals to align with your branding:
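Load a custom font with @font-face:
@font-face {
  font-family: 'CustomFont';
  src: url('path/to/font.woff2') format('woff2');
}
Enable background colors and images in the PDF options:
await page.pdf({ printBackground: true });
Keep colors exact in the print stylesheet:
body {
  -webkit-print-color-adjust: exact;
  background-color: #f5f5f5;
}
On Linux, point Fontconfig at a custom configuration file if fonts are not being picked up:
export FONTCONFIG_FILE=/path/to/fonts.conf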
These options allow you to create PDFs that look polished and professional.
Creating advanced PDF reports requires careful attention to detail, especially when it comes to data visualization, layout design, and performance. By enhancing basic formatting techniques, you can produce professional-quality documents that stand out.
Boost your reports with dynamic visuals by combining Puppeteer, D3.js, and Handlebars. Here's a sample setup for integrating data charts:
const template = Handlebars.compile(`
  <div class="report-container">
    {{> dataTable}}
    <div id="chart"></div>
  </div>
`);
// D3.js chart configuration
const chartConfig = {
  container: '#chart',
  data: salesData,
  width: 800,
  height: 400
};
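The snippet above does not show how the rendered markup and chart reach the page. The sketch below illustrates the general flow with a plain template literal and hand-written SVG as dependency-free stand-ins for Handlebars and D3.js; it is not how either library works internally. `buildReportHtml` and the `{ label, value }` shape of `salesData` are assumptions made for illustration.

```javascript
// Escapes text before it is placed into HTML.
const escapeHtml = (s) =>
  String(s)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');

// Builds a self-contained report: a data table plus an SVG bar chart.
function buildReportHtml(salesData, { width = 800, height = 400 } = {}) {
  const max = Math.max(...salesData.map((d) => d.value), 1);
  const barWidth = width / salesData.length;
  const bars = salesData
    .map((d, i) => {
      const barHeight = (d.value / max) * height;
      return `<rect x="${i * barWidth + 4}" y="${height - barHeight}" width="${barWidth - 8}" height="${barHeight}" fill="#4a90d9"></rect>`;
    })
    .join('');
  const rows = salesData
    .map((d) => `<tr><td>${escapeHtml(d.label)}</td><td>${escapeHtml(d.value)}</td></tr>`)
    .join('');
  return `<div class="report-container">
  <table>${rows}</table>
  <div id="chart"><svg width="${width}" height="${height}">${bars}</svg></div>
</div>`;
}

// Usage with Puppeteer (page is an open Page):
// await page.setContent(buildReportHtml(salesData), { waitUntil: 'networkidle0' });
// await page.pdf({ path: 'report.pdf', printBackground: true });
```

Because the chart is plain SVG in the markup, it is already present when `page.pdf()` runs. With a script-driven chart library you would instead need to wait for the chart to finish rendering before generating the PDF.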
To ensure your PDF looks polished, configure the output settings as follows:
await page.pdf({
  printBackground: true,
  format: 'Letter',
  margin: {
    top: '0.75in',
    right: '0.5in',
    bottom: '0.75in',
    left: '0.5in'
  }
});
"D3.js is a JavaScript library for manipulating documents based on data. D3 helps you bring data to life using HTML, SVG, and CSS."
Once your visuals are in place, organize the report with contents and page numbers.
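Page numbers are only known after rendering, so generate the main content first, then build the table of contents from it and merge the two files. The extractPageNumbers, generateTocHtml, and mergePdfs helpers are not part of Puppeteer; you need to supply them yourself, for example with a PDF library:
// 1. Render the main content
const mainPdf = await page.pdf({
  format: 'Letter',
  displayHeaderFooter: true
});
// 2. Map section headings to page numbers (user-supplied helper)
const pageMapping = await extractPageNumbers(mainPdf);
// 3. Build the table of contents and render it as its own PDF
const tocHtml = generateTocHtml(pageMapping);
await page.setContent(tocHtml);
const tocPdf = await page.pdf({ format: 'Letter' });
// 4. Merge the two documents (user-supplied helper)
const finalPdf = await mergePdfs([tocPdf, mainPdf]);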
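The article does not define the `generateTocHtml` helper used above. One possible implementation is sketched below. It assumes `pageMapping` is an array of `{ title, page }` entries, which is a guess at the data shape, and it adds a hypothetical `pageOffset` parameter to account for the pages that the table of contents itself adds at the front of the merged file.

```javascript
// Hypothetical implementation of generateTocHtml.
// pageMapping: array of { title, page } entries (assumed shape).
// pageOffset: number of pages the table of contents occupies in the final PDF.
function generateTocHtml(pageMapping, pageOffset = 0) {
  const escapeHtml = (s) =>
    String(s)
      .replace(/&/g, '&amp;')
      .replace(/</g, '&lt;')
      .replace(/>/g, '&gt;');
  const items = pageMapping
    .map(
      ({ title, page }) =>
        `<li><span class="toc-title">${escapeHtml(title)}</span> <span class="toc-page">${page + pageOffset}</span></li>`
    )
    .join('\n');
  return `<nav class="toc"><h1>Contents</h1><ol>\n${items}\n</ol></nav>`;
}
```

For example, `generateTocHtml([{ title: 'Summary', page: 1 }], 1)` lists "Summary" on page 2, because a one-page table of contents pushes the main content back by one page.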
For larger documents, you'll need to take additional steps to maintain performance.
Handling extensive reports efficiently requires specific optimizations. Here are some effective techniques:
Technique | Benefit |
---|---|
Temporary File Usage | Cuts memory usage by 20% |
CPU Core Limiting | Speeds up processing by balancing tasks |
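As a rough illustration of the CPU core limiting row, the sketch below caps how many PDF jobs run at once based on the number of available cores. `defaultConcurrency` and `runLimited` are hypothetical helpers, and leaving one core free is an assumption to tune for your own host.

```javascript
const os = require('os');

// Leaves one core free for the Node.js process itself (an assumption).
function defaultConcurrency(cpuCount = os.cpus().length) {
  return Math.max(1, cpuCount - 1);
}

// Runs async jobs with at most `limit` in flight; results keep input order.
// Each job is a function that returns a promise.
async function runLimited(jobs, limit = defaultConcurrency()) {
  const results = new Array(jobs.length);
  let next = 0;
  async function worker() {
    while (next < jobs.length) {
      const index = next++;
      results[index] = await jobs[index]();
    }
  }
  const workers = Array.from({ length: Math.min(limit, jobs.length) }, () => worker());
  await Promise.all(workers);
  return results;
}

// Usage (generatePdfFor is a placeholder for your own PDF function):
// const pdfs = await runLimited(urls.map((url) => () => generatePdfFor(url)));
```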
For example, Carriyo's implementation in April 2024 generated 10,000 PDFs daily with a 95th percentile latency of 365ms on AWS Lambda. Here's how temporary file handling can be applied:
// Optimize file handling
const tempFile = await saveTempHtml(content);
await page.goto(`file://${tempFile}`, {
  waitUntil: 'networkidle0',
  timeout: 30000
});
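The `saveTempHtml` helper is not defined in the snippet. A minimal sketch of what it might look like follows, using only Node's built-in modules. The file naming and the `removeTempHtml` cleanup helper are assumptions, not taken from the Carriyo implementation.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// One possible saveTempHtml: writes the HTML into a fresh temp directory
// and returns the file path. It is synchronous, so `await saveTempHtml(...)`
// still works as written in the snippet above.
function saveTempHtml(content) {
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'pdf-'));
  const file = path.join(dir, 'page.html');
  fs.writeFileSync(file, content, 'utf8');
  return file;
}

// Removes the temp directory once the PDF has been written.
function removeTempHtml(file) {
  fs.rmSync(path.dirname(file), { recursive: true, force: true });
}
```

Calling `removeTempHtml(tempFile)` after `page.pdf()` keeps temporary files from piling up during large batch runs.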
"By using temporary files to bypass protocol limitations, we improved both performance and reliability." - Danindu de Silva
If you encounter issues during navigation, make sure to log errors and reload the page:
try {
  await page.goto(url, {
    waitUntil: 'networkidle0',
    timeout: 30000
  });
} catch (error) {
  console.error('Navigation failed:', error);
  await page.reload();
}
These methods ensure your large-scale report generation remains reliable and efficient.
Tackling common problems is key to ensuring your automated PDF workflows run smoothly and reliably. The following solutions build on the configuration basics discussed earlier.
In Q3 2023, Acme Corp's development team tackled "Failed to launch chrome" errors on Ubuntu servers by installing the necessary dependencies. This reduced error rates by 95% and saved 10 hours per week in debugging time.
For Chrome launch issues on Windows, use the following configuration:
const browser = await puppeteer.launch({
  ignoreDefaultArgs: ['--disable-extensions'],
  args: ['--disable-features=HttpsFirstBalancedModeAutoEnable']
});
Here’s how to address common errors efficiently:
Error Type | Solution | Impact |
---|---|---|
Module Not Found | Update Node.js to v14+ | Fixes dependency conflicts |
Navigation Timeout | Set custom timeout values | Prevents premature script failures |
Sandbox Issues | Adjust permissions | Ensures secure execution |
For example, to handle navigation timeouts, you can use this retry logic:
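const navigateWithRetry = async (page, url, maxAttempts = 3) => {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      await page.goto(url, {
        waitUntil: 'networkidle0',
        timeout: 30000
      });
      return; // navigation succeeded
    } catch (error) {
      console.error(`Navigation error (attempt ${attempt} of ${maxAttempts}):`, error);
      // Give up and surface the error after the last attempt
      if (attempt === maxAttempts) throw error;
    }
  }
};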
Once errors are resolved, you can shift attention to improving speed for better workflow efficiency.
After handling errors, the next step is boosting PDF generation speed. For instance, Carriyo generates 10,000 PDFs daily with a p95 latency of 365ms on AWS Lambda.
Here’s a code snippet to optimize performance by serving static assets locally:
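await page.setRequestInterception(true);
page.on('request', request => {
  const type = request.resourceType();
  if (type === 'font' || type === 'image') {
    // Serve the asset from a local copy (localFileContent must be loaded beforehand)
    request.respond({
      body: localFileContent,
      headers: { 'Cache-Control': 'public, max-age=31536000' }
    });
  } else {
    // Every intercepted request must be handled, or the page load stalls
    request.continue();
  }
});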
For large-scale operations, the temporary-file approach shown earlier applies here as well.
Latenode makes it easier to use Puppeteer for generating PDFs, offering a scalable and cost-efficient way to automate document workflows.
Latenode combines Puppeteer's PDF capabilities with tools like a visual workflow builder, AI-powered code generation, and conditional logic. It also supports a wide range of NPM packages. The platform uses a time-based credit system, which helps manage costs for PDF automation efficiently.
Building on Puppeteer's core functionality, Latenode simplifies complex PDF workflows with an easy-to-use interface designed for scalability.
Here are the standout features for PDF automation:
To use Puppeteer for PDF generation on Latenode, you can follow this basic setup:
const puppeteer = require('puppeteer');
async function generatePDF(url) {
  const browser = await puppeteer.launch({
    headless: true,
    args: ['--disable-dev-shm-usage']
  });
  try {
    const page = await browser.newPage();
    await page.goto(url, { waitUntil: 'networkidle0' });
    return await page.pdf({ format: 'A4' });
  } finally {
    // Close the browser even if navigation or rendering fails
    await browser.close();
  }
}
To keep document generation secure, store sensitive information like credentials in environment variables:
await page.type('#email', process.env.PDF_USER);
await page.type('#password', process.env.PDF_PASSWORD);
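If one of these variables is unset, `page.type` receives `undefined` and the script fails at a confusing point. A small guard, sketched below, can fail early with a clearer message. `requireEnv` is a hypothetical helper, not part of Puppeteer or Latenode.

```javascript
// Returns the requested variables, or throws an error listing every missing one.
// `env` defaults to process.env; it is a parameter so the check is easy to test.
function requireEnv(names, env = process.env) {
  const missing = names.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(', ')}`);
  }
  return Object.fromEntries(names.map((name) => [name, env[name]]));
}

// Usage before logging in (page is an open Puppeteer Page):
// const { PDF_USER, PDF_PASSWORD } = requireEnv(['PDF_USER', 'PDF_PASSWORD']);
// await page.type('#email', PDF_USER);
// await page.type('#password', PDF_PASSWORD);
```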
This setup ensures a secure and efficient workflow for generating PDFs.
When deciding on a platform for PDF automation, here’s how Latenode stacks up against traditional tools:
Feature | Latenode | Traditional Automation Tools |
---|---|---|
Pricing Model | Time-based credits starting at $5/month | Per-document or user-based licensing |
Workflow Limits | 20–unlimited | Often limited by concurrent executions |
NPM Package Support | Over 1 million packages | Usually limited to platform-specific modules |
Execution History | Retention for 1–60 days | Often limited to basic logging |
For enterprise use, Latenode's Prime plan ($297/month) supports up to 1.5 million scenario runs and retains execution history for 60 days. This makes it a strong choice for businesses with high-volume PDF generation needs.
The platform also simplifies tasks like modifying page styling before creating a PDF. For example, you can hide specific elements with this snippet:
await page.addStyleTag({
  content: '.nav { display: none } .navbar { border: 0px } #print-button { display: none }'
});
This flexibility helps streamline even the most complex PDF workflows.
Puppeteer is a powerful tool for generating PDFs, whether you're working with simple documents or complex reports. Its ability to handle modern web technologies and provide fine-tuned control over PDF output makes it a strong choice for large-scale use cases.
For example, Carriyo successfully used Puppeteer in April 2024 to generate 10,000 PDFs daily for shipment labels on AWS Lambda. They achieved a p95 latency of 365ms at a cost of $7.68 for 430,000 invocations.
Here are some standout features and their practical benefits:
Feature | Benefit | Real-World Impact |
---|---|---|
Headless Browser | Enables server-side rendering with modern web capabilities | Handles dynamic content, JavaScript, and CSS with precision |
Resource Optimization | Caches assets and disables unused features to boost performance | Improves efficiency during PDF generation |
Error Handling | Includes retry mechanisms and timeout controls | Ensures reliability in production environments |
Scalability | Supports high-volume PDF generation | Proven performance under heavy workloads |
To make the most of Puppeteer, consider these steps for a successful deployment:
Use the userDataDir setting to cache resources, and disable unused features to speed up PDF generation.
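As a hedged illustration of that step, the launch options might look something like the sketch below. `buildLaunchOptions` is a hypothetical helper, the default profile location is an assumption, and which Chrome flags count as "unused features" depends on your workload.

```javascript
const os = require('os');
const path = require('path');

// Builds puppeteer.launch() options that reuse a browser profile directory,
// so Chrome's HTTP cache persists between launches.
function buildLaunchOptions(cacheDir = path.join(os.tmpdir(), 'puppeteer-profile')) {
  return {
    headless: true,
    userDataDir: cacheDir,
    args: [
      '--disable-extensions',    // no browser extensions needed for PDF output
      '--disable-gpu',           // GPU acceleration is rarely useful on servers
      '--disable-dev-shm-usage', // avoid /dev/shm limits in containers
    ],
  };
}

// Usage (assumes puppeteer is installed):
// const puppeteer = require('puppeteer');
// const browser = await puppeteer.launch(buildLaunchOptions('/var/cache/pdf-profile'));
```

A profile directory should be used by only one browser instance at a time, so parallel workers each need their own `cacheDir`.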
For an even smoother experience, you can integrate Puppeteer with platforms like Latenode to simplify workflows while maintaining top performance.