Proxies are essential for Puppeteer automation. They hide your IP, bypass geo-restrictions, and help avoid bans. Here's what you'll learn about using proxies in Puppeteer:
Proxy Type | Speed | Anonymity | Cost | Best For |
---|---|---|---|---|
Datacenter | Fast | Low | Affordable | High-volume scraping |
Residential | Moderate | High | Expensive | Sensitive or geo-specific tasks |
Mobile | Slow | Very High | Very Expensive | Ad verification, CAPTCHAs |
Setting up a proxy is a key step to optimize your Puppeteer automation. Here's how you can get started.
You can configure a proxy server during the browser launch in Puppeteer like this:
const browser = await puppeteer.launch({
  args: ['--proxy-server=http://157.230.255.230:8118']
});
const page = await browser.newPage();
It's also important to handle errors when navigating with a proxy:
try {
  await page.goto('https://books.toscrape.com/', {
    timeout: 70000,
    waitUntil: 'networkidle2'
  });
} catch (err) {
  console.log(err);
}
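A single try/catch only logs the failure. Proxies fail intermittently, so it often helps to retry the navigation a few times before giving up. The following is a minimal sketch, assuming a hypothetical helper named withRetries (it is not part of Puppeteer) that re-runs any async action with a growing pause between attempts:

```javascript
// Hypothetical helper (not a Puppeteer API): retry an async action a fixed
// number of times, waiting a little longer after each failed attempt.
async function withRetries(action, { attempts = 3, baseDelayMs = 1000 } = {}) {
  let lastError;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await action(attempt);
    } catch (err) {
      lastError = err;
      if (attempt < attempts) {
        // Linear backoff: 1x, 2x, 3x ... the base delay
        await new Promise((resolve) => setTimeout(resolve, baseDelayMs * attempt));
      }
    }
  }
  // Every attempt failed: surface the last error to the caller
  throw lastError;
}
```

It could wrap the navigation shown earlier, for example: await withRetries(() => page.goto('https://books.toscrape.com/', { timeout: 70000, waitUntil: 'networkidle2' })).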
Next, you'll want to secure your connection by setting up proxy authentication.
If your proxy requires authentication, you can use the following method:
await page.authenticate({
  username: 'your_proxy_username',
  password: 'your_proxy_password'
});
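Providers often hand out proxies as a single URL with the credentials embedded, while the examples here need the server address and the credentials separately. A small sketch of that split, assuming a hypothetical helper named splitProxyUrl built on Node's standard URL class:

```javascript
// Hypothetical helper: split a proxy URL with embedded credentials into the
// value for --proxy-server and the object that page.authenticate() expects.
function splitProxyUrl(proxyUrl) {
  const url = new URL(proxyUrl);
  return {
    server: `${url.protocol}//${url.host}`,
    credentials: url.username
      ? {
          // URL keeps these percent-encoded, so decode them before use
          username: decodeURIComponent(url.username),
          password: decodeURIComponent(url.password),
        }
      : null,
  };
}
```

The server value would go into the --proxy-server launch argument, and credentials (when not null) into page.authenticate().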
Here's a quick comparison of different authentication methods:
Authentication Method | Best Use Case | Implementation Complexity |
---|---|---|
page.authenticate() | Standard proxy auth | Low |
proxy-chain package | Advanced routing | Medium |
Proxy-Authorization header | HTTP-only sites | High |
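For the last row of the table, the header value follows the standard HTTP Basic scheme: the word Basic followed by the Base64 encoding of username:password. A minimal sketch, with basicProxyAuth as a hypothetical helper name:

```javascript
// Hypothetical helper: build a Proxy-Authorization header value (HTTP Basic).
function basicProxyAuth(username, password) {
  const token = Buffer.from(`${username}:${password}`, 'utf8').toString('base64');
  return `Basic ${token}`;
}
```

The value could then be passed to page.setExtraHTTPHeaders({ 'Proxy-Authorization': basicProxyAuth('user', 'pass') }). The proxy only sees this header on plain HTTP requests. HTTPS traffic travels inside a tunnel, which is why the table lists this method for HTTP-only sites.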
Once you've set up your proxy and authentication, it's time to test it.
Before relying on the proxy, confirm that it actually works.
Here's an example script to test your proxy:
async function testProxy() {
  const browser = await puppeteer.launch({
    args: ['--proxy-server=http://157.230.255.230:8118']
  });
  try {
    const page = await browser.newPage();
    await page.goto('https://books.toscrape.com/', {
      timeout: 70000,
      waitUntil: 'networkidle2'
    });
    console.log('Proxy connection successful');
  } catch (err) {
    console.log('Proxy connection failed:', err);
  } finally {
    // Close the browser whether or not the navigation succeeded
    await browser.close();
  }
}
Testing ensures your proxy configuration is ready for use in your automation tasks.
Managing proxy rotation is essential to prevent IP bans and avoid rate limiting while scraping or automating tasks online.
To start, create a list of proxies that can be rotated efficiently. Here's an example of how to set up and use a proxy pool:
const proxyList = [
  {
    host: '157.230.255.230',
    port: '8118',
    username: 'user1',
    password: 'pass1'
  },
  // Add more proxy configurations
];

function getRandomProxy() {
  return proxyList[Math.floor(Math.random() * proxyList.length)];
}
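Random selection can pick the same proxy several times in a row. If even usage matters more than unpredictability, a round-robin picker is one alternative. This is a sketch only, and createRoundRobin is a hypothetical name:

```javascript
// Hypothetical alternative to getRandomProxy(): cycle through the pool in order
// so every proxy is used equally often.
function createRoundRobin(proxies) {
  if (proxies.length === 0) throw new Error('Proxy pool is empty');
  let index = 0;
  return function next() {
    const proxy = proxies[index];
    index = (index + 1) % proxies.length; // wrap around at the end of the list
    return proxy;
  };
}
```

Calling const nextProxy = createRoundRobin(proxyList) once and then nextProxy() per request walks through the list and starts over at the end.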
When setting up your proxy list, it's important to understand the different types of proxies and their trade-offs:
Proxy Type | Response Time | Success Rate | Cost Range |
---|---|---|---|
Datacenter | 100-300ms | 85-95% | $0.50-$2/IP |
Residential | 500-1000ms | 90-98% | $2-$8/IP |
Mobile | 800-1500ms | 95-99% | $5-$15/IP |
Each type has its strengths and weaknesses, so choose based on your specific needs, such as speed, reliability, or budget.
To rotate proxies automatically, you can use the following setup:
const ProxyChain = require('proxy-chain');

const server = new ProxyChain.Server({
  port: 8080,
  // Called for each incoming connection; picks the upstream proxy to use
  prepareRequestFunction: ({ request }) => {
    const proxy = getRandomProxy();
    return {
      upstreamProxyUrl: `http://${proxy.username}:${proxy.password}@${proxy.host}:${proxy.port}`
    };
  }
});

// The local proxy must be listening before the browser connects to it
await server.listen();

const browser = await puppeteer.launch({
  args: ['--proxy-server=http://localhost:8080']
});
With this setup, each new connection is routed through a randomly chosen upstream proxy without restarting the browser, which makes rotation more efficient.
Here are some key practices to improve your proxy rotation strategy:
1. Timing Strategy
Introduce random delays between requests to simulate natural browsing behavior:
function getRotationDelay() {
  // Random delay between 2-5 seconds
  return Math.floor(Math.random() * 3000) + 2000;
}

async function rotateProxy() {
  await new Promise(resolve => setTimeout(resolve, getRotationDelay()));
  const proxy = getRandomProxy();
  // Apply new proxy configuration
}
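The delay range above is hard-coded. A slightly more general sketch, assuming hypothetical helpers randomDelay and sleep, takes the bounds as parameters and accepts an injectable random source so the timing logic can be tested deterministically:

```javascript
// Hypothetical helper: pick an integer delay in [minMs, maxMs], inclusive.
// `random` defaults to Math.random and can be replaced in tests.
function randomDelay(minMs, maxMs, random = Math.random) {
  return minMs + Math.floor(random() * (maxMs - minMs + 1));
}

// Hypothetical helper: pause for the given number of milliseconds.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
```

A request loop could then call await sleep(randomDelay(2000, 5000)) between page loads.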
2. Error Handling
If a proxy fails, handle the error gracefully by switching to a new one and retrying the request:
async function handleProxyError(error, page) {
  console.log(`Proxy error: ${error.message}`);
  await rotateProxy();
  return page.reload({ waitUntil: 'networkidle2' });
}
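Switching to a new proxy helps only if the failed one is not picked again right away. One way to handle that is to keep failed proxies out of rotation for a cooldown period. The class below is a hypothetical sketch of that idea (the name CooldownPool and the 60-second default are assumptions, not something the original setup prescribes):

```javascript
// Hypothetical helper: a pool that skips proxies which failed recently.
class CooldownPool {
  constructor(proxies, cooldownMs = 60000, now = Date.now) {
    this.proxies = proxies;
    this.cooldownMs = cooldownMs;
    this.now = now; // injectable clock, useful for tests
    this.failedAt = new Map(); // proxy -> timestamp of its last failure
  }

  markFailed(proxy) {
    this.failedAt.set(proxy, this.now());
  }

  // Proxies that never failed, or whose cooldown has expired
  available() {
    return this.proxies.filter((proxy) => {
      const failed = this.failedAt.get(proxy);
      return failed === undefined || this.now() - failed >= this.cooldownMs;
    });
  }

  // Random pick among available proxies, or null if all are cooling down
  pick(random = Math.random) {
    const candidates = this.available();
    if (candidates.length === 0) return null;
    return candidates[Math.floor(random() * candidates.length)];
  }
}
```

An error handler would call pool.markFailed(currentProxy) before asking pool.pick() for a replacement.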
3. Session Management
For tasks requiring session consistency, use the same proxy for related requests:
const sessionMap = new Map();

function getProxyForSession(sessionId) {
  if (!sessionMap.has(sessionId)) {
    sessionMap.set(sessionId, getRandomProxy());
  }
  return sessionMap.get(sessionId);
}
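The in-memory Map loses its assignments when the process restarts. If sessions need to survive restarts, one alternative is to derive the proxy from the session ID itself. This is a different technique from the Map above, sketched here with a hypothetical helper named proxyIndexForSession:

```javascript
const { createHash } = require('node:crypto');

// Hypothetical helper: derive a stable pool index from the session ID, so the
// same session maps to the same proxy as long as the pool size is unchanged.
function proxyIndexForSession(sessionId, poolSize) {
  if (poolSize <= 0) throw new Error('Proxy pool is empty');
  const digest = createHash('sha256').update(String(sessionId)).digest();
  // Use the first 4 bytes of the hash as an unsigned integer
  return digest.readUInt32BE(0) % poolSize;
}
```

With the pool from earlier, proxyList[proxyIndexForSession(sessionId, proxyList.length)] returns the same entry on every call. The trade-off is that adding or removing proxies reshuffles the assignments.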
With these strategies, you can efficiently manage proxies and prepare for more advanced challenges like handling blocks, CAPTCHAs, and other obstacles.
Selecting the right proxy is key when working with Puppeteer automation. Here’s a comparison to help you decide:
Feature | Residential Proxies | Datacenter Proxies |
---|---|---|
Source | Provided by ISPs to residential properties | Sourced from data centers |
Speed | Slower due to ISP routing | Faster with dedicated infrastructure |
Detection Risk | Lower - mimics real user traffic | Higher - easier to detect |
Cost | More expensive | Cost-effective |
Geographic Coverage | Broad, across multiple locations | Often limited |
Best Use Cases | High-anonymity tasks, sensitive scraping, ad verification | High-volume data mining, batch processing |
Residential proxies are a solid choice for tasks that demand high anonymity, as they closely resemble real user activity.
To tackle website blocks and CAPTCHA challenges, use proxy rotation and retry mechanisms. Here’s an example of how to manage these issues:
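const proxyManager = {
  rotationDelay: 2000,
  async handleBlock(page) {
    // page.waitForTimeout() is not available in newer Puppeteer releases
    await new Promise((resolve) => setTimeout(resolve, this.rotationDelay));
    // getNextProxy() is not defined here; it must return { username, password }.
    // page.authenticate() only swaps credentials. The proxy server itself is
    // fixed by --proxy-server at launch.
    const newProxy = await this.getNextProxy();
    await page.authenticate(newProxy);
    return page.reload({ waitUntil: 'networkidle0' });
  },
  async detectBlock(response) {
    return response.status() === 403 || response.status() === 429;
  }
};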
When a block is detected, retrying with a new proxy can keep your automation running smoothly:
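page.on('response', async (response) => {
  // Only react to navigation responses; otherwise a blocked image or script
  // would trigger a reload
  if (!response.request().isNavigationRequest()) return;
  if (await proxyManager.detectBlock(response)) {
    await proxyManager.handleBlock(page);
  }
});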
These strategies help maintain workflow continuity, even when encountering challenges.
Handling proxy errors effectively requires timeout adjustments, validation checks, and backup proxies. For example:
// getBackupProxy() and generateRandomUserAgent() are helpers you supply
async function handleProxyError(error, page) {
  if (error.message.includes('net::ERR_PROXY_CONNECTION_FAILED')) {
    const newProxy = await getBackupProxy();
    await page.authenticate(newProxy);
    await page.setUserAgent(generateRandomUserAgent());
    return page.reload({ waitUntil: 'networkidle2' });
  }
  // Let errors this function does not handle propagate
  throw error;
}
Follow these guidelines for better error management:
Configuring your browser correctly also plays a role in ensuring seamless performance:
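const browser = await puppeteer.launch({
  args: ['--proxy-server=http://proxy-server:port'],
  timeout: 60000 // maximum time to wait for the browser to start
});
const page = await browser.newPage();
// waitUntil is a navigation option, not a launch option; pass it to page.goto()
page.setDefaultNavigationTimeout(60000);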
Lastly, always validate your proxies to ensure they’re functional:
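const { fetch, ProxyAgent } = require('undici');

async function validateProxy(proxyConfig) {
  try {
    // fetch() has no `proxy` option; undici routes through a proxy via a dispatcher
    const response = await fetch('https://api.ipify.org?format=json', {
      dispatcher: new ProxyAgent(`http://${proxyConfig.host}:${proxyConfig.port}`)
    });
    return response.ok;
  } catch (error) {
    console.error(`Proxy validation failed: ${error.message}`);
    return false;
  }
}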
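Validating one proxy at a time is slow for a large pool. A sketch of checking the whole list concurrently, assuming a hypothetical helper named filterWorkingProxies that accepts any async validator returning a boolean (such as validateProxy):

```javascript
// Hypothetical helper: run a validator over the whole pool concurrently and
// keep only the proxies it accepts. A validator that throws counts as a failure.
async function filterWorkingProxies(proxies, validate) {
  const results = await Promise.allSettled(proxies.map((proxy) => validate(proxy)));
  return proxies.filter(
    (_, i) => results[i].status === 'fulfilled' && results[i].value === true
  );
}
```

For example, const healthy = await filterWorkingProxies(proxyList, validateProxy) would return the subset that responded successfully.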
Want to get the most out of Puppeteer? These tips can help fine-tune your proxy setup and improve overall performance.
Cut down on loading times by blocking unnecessary requests like images, fonts, and stylesheets:
await page.setRequestInterception(true);
page.on('request', (req) => {
  if (req.resourceType() === 'stylesheet' || req.resourceType() === 'font' || req.resourceType() === 'image') {
    req.abort();
  } else {
    req.continue();
  }
});
Pair page.goto with page.waitForSelector for quicker navigation:
await page.goto(url, {
  timeout: 30000,
  waitUntil: 'domcontentloaded'
});
await page.waitForSelector('.main-content');
"When optimizing Puppeteer, remember that there are only so many ways to speed up the startup/shutdown performance of Puppeteer itself. Most likely, the biggest gains will come from getting your target pages to render faster."
– Jon Yongfook, Founder, Bannerbear
Distribute traffic across multiple proxies using Puppeteer Cluster:
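const { Cluster } = require('puppeteer-cluster');

const proxyCluster = {
  async distributeLoad(urls, proxyList) {
    const cluster = await Cluster.launch({
      concurrency: Cluster.CONCURRENCY_CONTEXT,
      maxConcurrency: 5
    });
    let next = 0;
    await cluster.task(async ({ page, data: url }) => {
      // Arrays have no getNext(); cycle through the list by index
      const proxy = proxyList[next++ % proxyList.length];
      await page.authenticate({ username: proxy.username, password: proxy.password });
      await page.goto(url);
    });
    // Queue every URL, then wait for the work to finish
    for (const url of urls) cluster.queue(url);
    await cluster.idle();
    await cluster.close();
  }
};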
This approach balances traffic across proxies, complementing earlier rotation methods to avoid overloading any single proxy.
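The pairing of URLs with proxies can also be decided up front, which makes the distribution easy to inspect before any browser work starts. A minimal sketch, with assignProxies as a hypothetical helper name:

```javascript
// Hypothetical helper: pair each URL with a proxy, cycling through the pool
// so the load is spread as evenly as the counts allow.
function assignProxies(urls, proxies) {
  if (proxies.length === 0) throw new Error('Proxy pool is empty');
  return urls.map((url, i) => ({ url, proxy: proxies[i % proxies.length] }));
}
```

Each resulting { url, proxy } object could be handed to cluster.queue(), with the task reading both fields from data.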
Here are some common challenges and how to handle them effectively:
// Timeouts: stop the page load and hand off to handleTimeout() (defined elsewhere)
const performRequest = async (page, url) => {
  try {
    await page.goto(url, {
      timeout: 30000,
      waitUntil: 'networkidle2'
    });
  } catch (error) {
    if (error.name === 'TimeoutError') {
      await page.evaluate(() => window.stop());
      return handleTimeout();
    }
    throw error;
  }
};

// Rate limits: handleRateLimit() (defined elsewhere) receives the Retry-After value
const checkRateLimit = async (response) => {
  if (response.status() === 429 || response.headers()['retry-after']) {
    await handleRateLimit(response.headers()['retry-after']);
  }
};

// Authentication errors: 407 means the proxy rejected the credentials;
// retryWithNewCredentials() is defined elsewhere
const handleAuthError = async (response, page, proxy) => {
  if (response.status() === 407) {
    await retryWithNewCredentials(page, proxy);
    return true;
  }
  return false;
};
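The rate-limit check passes the raw Retry-After value along. That header holds either a number of seconds or an HTTP date, so it has to be converted before it can be used as a wait time. A sketch of that conversion, assuming a hypothetical helper named retryAfterToMs and an arbitrary 5-second fallback:

```javascript
// Hypothetical helper: convert a Retry-After header value into milliseconds.
// The header is either a number of seconds or an HTTP date.
function retryAfterToMs(value, now = Date.now(), fallbackMs = 5000) {
  if (value === undefined || value === null || value === '') return fallbackMs;
  const trimmed = String(value).trim();
  // Form 1: delay in seconds, e.g. "120"
  if (/^\d+$/.test(trimmed)) return Number(trimmed) * 1000;
  // Form 2: HTTP date, e.g. "Wed, 21 Oct 2015 07:28:00 GMT"
  const date = Date.parse(trimmed);
  if (Number.isNaN(date)) return fallbackMs;
  // A date in the past means no wait is needed
  return Math.max(0, date - now);
}
```

A handler could then wait with await new Promise((resolve) => setTimeout(resolve, retryAfterToMs(header))) before retrying.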
"By optimizing your Puppeteer script, you can ensure smooth and efficient operation with accurate and consistent results."
– ScrapeOps
Keep an eye on response times and success rates to ensure your proxies remain efficient and undetected.
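One way to track those numbers is a small in-memory tracker that records each request's duration and outcome per proxy. The class below is a hypothetical sketch (ProxyStats is not part of Puppeteer or any library mentioned here):

```javascript
// Hypothetical helper: per-proxy request counts, success rate, and average latency.
class ProxyStats {
  constructor() {
    this.stats = new Map(); // proxyId -> { requests, successes, totalMs }
  }

  record(proxyId, durationMs, ok) {
    const entry = this.stats.get(proxyId) || { requests: 0, successes: 0, totalMs: 0 };
    entry.requests += 1;
    if (ok) entry.successes += 1;
    entry.totalMs += durationMs;
    this.stats.set(proxyId, entry);
  }

  // Returns null for a proxy that has no recorded requests yet
  summary(proxyId) {
    const entry = this.stats.get(proxyId);
    if (!entry) return null;
    return {
      requests: entry.requests,
      successRate: entry.successes / entry.requests,
      avgMs: entry.totalMs / entry.requests,
    };
  }
}
```

Recording Date.now() before and after each page.goto() call gives the duration, and whether the call threw gives the outcome.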
Setting up Puppeteer with proxies involves a few important steps. For private proxies, use the authenticate function to provide credentials:
await page.authenticate({
  username: 'proxyUser',
  password: 'proxyPass'
});
To ensure the page fully loads, use the waitUntil option (e.g., 'networkidle2') and set a timeout (e.g., 30 seconds):
await page.goto(url, {
  waitUntil: 'networkidle2',
  timeout: 30000
});
These configurations form the foundation of a reliable proxy setup.
To maintain secure and efficient proxy operations, follow these strategies:
Connection Management
Security Optimization
Here’s a quick summary of important considerations for proxy implementation:
Aspect | Implementation | Benefit |
---|---|---|
Authentication | Use authenticate function | Secures access to private proxies |
Load Management | Enable request interception | Reduces bandwidth and speeds up loading |
Error Handling | Implement try-catch blocks | Manages connection issues gracefully |
Identity Protection | Rotate proxies and user agents | Improves success rates and avoids blocks |