CAPTCHAs are designed to block bots, making automation with tools like Puppeteer challenging. This article explains how to bypass CAPTCHA issues, from stealth techniques to solving methods. The table below summarizes the main CAPTCHA types you are likely to encounter:
CAPTCHA Type | Description | Challenges |
---|---|---|
Text-based | Distorted text for recognition | Hard to read complex text |
Image-based | Identify objects/patterns | Requires visual processing |
reCAPTCHA | Google’s risk analysis system | Detects bot-like behavior |
hCAPTCHA | Object identification tasks | Similar to reCAPTCHA |
Audio | Sound-based tasks | Complex speech recognition |
Learn how these methods can help you streamline automation while avoiding detection and solving CAPTCHAs efficiently.
To bypass CAPTCHA challenges effectively, Puppeteer scripts need to behave in ways that mimic real human users. This includes using stealth techniques and natural behavior patterns.
Using puppeteer-extra with its stealth plugin can help avoid bot detection. Here's how to set it up:
const puppeteer = require('puppeteer-extra')
const StealthPlugin = require('puppeteer-extra-plugin-stealth')
puppeteer.use(StealthPlugin())
You can also enable specific evasion techniques:
// Only the evasions listed here are enabled; the rest are switched off
puppeteer.use(StealthPlugin({
  enabledEvasions: new Set([
    "chrome.app",
    "chrome.csi",
    "defaultArgs",
    "navigator.plugins"
  ])
}))
The stealth plugin tackles common detection methods, for example by removing the navigator.webdriver property.
Browser fingerprinting is a key factor in bot detection. To create a convincing browser profile, focus on these areas:
Configuration Area | Implementation Details | Purpose |
---|---|---|
User Agent | Rotate strings dynamically | Hides automation markers |
WebGL Support | Enable hardware acceleration | Mimics a standard browser setup |
Viewport Settings | Use random, realistic dimensions | Matches common user setups |
Language Headers | Align with user agent locale | Ensures consistency in the browser profile |
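As a rough illustration of the table above, the sketch below picks a user agent, a matching language header, and a realistic viewport, then applies them to a page. The profile pool, the viewport list, and the helper names (pick, buildProfile, applyProfile) are assumptions made for this example; only page.setUserAgent, page.setViewport, and page.setExtraHTTPHeaders are Puppeteer calls.

```javascript
// Hypothetical profile pool: each entry pairs a user agent with a consistent locale
const PROFILES = [
  {
    userAgent:
      'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36',
    acceptLanguage: 'en-US,en;q=0.9',
  },
  {
    userAgent:
      'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36',
    acceptLanguage: 'en-GB,en;q=0.9',
  },
];

// A few common desktop resolutions (illustrative, not exhaustive)
const VIEWPORTS = [
  { width: 1920, height: 1080 },
  { width: 1536, height: 864 },
  { width: 1366, height: 768 },
];

// Pick one entry; `random` is injectable so the choice can be made deterministic
function pick(list, random = Math.random) {
  return list[Math.floor(random() * list.length)];
}

// Combine a user agent/locale pair with a viewport
function buildProfile(random = Math.random) {
  return { ...pick(PROFILES, random), viewport: pick(VIEWPORTS, random) };
}

// Apply the profile to a Puppeteer page before navigating
async function applyProfile(page, profile) {
  await page.setUserAgent(profile.userAgent);
  await page.setViewport(profile.viewport);
  await page.setExtraHTTPHeaders({ 'Accept-Language': profile.acceptLanguage });
}
```

A typical call would be applyProfile(page, buildProfile()) right after browser.newPage(), so every new page starts with a consistent identity.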
In addition to static configurations, incorporating dynamic, human-like behaviors is critical.
Simulating human behavior helps reduce detection risks. Here are some effective techniques:
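Randomized pauses, stepped mouse movement, and uneven typing speed are typical examples. The sketch below shows one way these could look; the helper names (randomBetween, humanPause, humanClick, humanType) and the timing ranges are illustrative assumptions, not tuned values.

```javascript
// Random integer in [min, max]; `random` is injectable for deterministic runs
function randomBetween(min, max, random = Math.random) {
  return Math.floor(min + random() * (max - min + 1));
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Pause for a random interval instead of a fixed delay
async function humanPause(minMs = 300, maxMs = 1200) {
  await sleep(randomBetween(minMs, maxMs));
}

// Move the mouse in several steps to a random point inside the element, then click
async function humanClick(page, selector) {
  const element = await page.waitForSelector(selector);
  const box = await element.boundingBox();
  if (!box) throw new Error(`Element ${selector} has no visible bounding box`);
  // Aim somewhere in the central half of the element rather than the exact center
  const x = box.x + box.width * (0.25 + Math.random() * 0.5);
  const y = box.y + box.height * (0.25 + Math.random() * 0.5);
  await page.mouse.move(x, y, { steps: randomBetween(10, 25) });
  await humanPause(50, 200);
  await page.mouse.click(x, y);
}

// Type one character at a time, each with a different delay
async function humanType(page, selector, text) {
  await page.click(selector);
  for (const char of text) {
    await page.keyboard.type(char, { delay: randomBetween(40, 160) });
  }
}
```

In a script, humanClick(page, '#login') and humanType(page, '#email', 'user@example.com') would replace direct page.click and page.type calls; both selectors here are placeholders.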
Finally, launch the browser with arguments that reduce bot detection:
const browser = await puppeteer.launch({
  args: [
    '--disable-blink-features=AutomationControlled',
    '--window-size=1920,1080'
  ],
  headless: false
})
Once stealth measures are in place, handling reCAPTCHA efficiently becomes essential for reliable automation. This builds on the stealth and behavior simulation techniques discussed earlier.
One way to handle reCAPTCHA programmatically is by integrating CAPTCHA-solving services. When your script encounters a reCAPTCHA, it sends the required parameters to a solver service. The service processes the CAPTCHA and returns the solution, usually within 10–30 seconds.
2Captcha is a commonly used service for solving reCAPTCHAs. Here's how you can integrate it into your Puppeteer setup:
const puppeteer = require('puppeteer-extra')
const StealthPlugin = require('puppeteer-extra-plugin-stealth')
const path = require('path')
puppeteer.use(StealthPlugin())
// Configure solver extension
const extensionPath = path.join(__dirname, './2captcha-solver')
// Not passed to Puppeteer below; the extension has to be configured with this key separately
const apiKey = 'YOUR_2CAPTCHA_API_KEY'
// Launch browser with the solver extension
const browser = await puppeteer.launch({
  args: [
    `--disable-extensions-except=${extensionPath}`,
    `--load-extension=${extensionPath}`
  ],
  headless: false
})
const page = await browser.newPage()
Once the browser is set up, you can check the CAPTCHA solver's status:
// Wait for the solver button and read its state
const solverButton = await page.waitForSelector('.captcha-solver')
// Puppeteer element handles have no getAttribute(); read the attribute inside the page
const state = await solverButton.evaluate(el => el.getAttribute('data-state'))
// Proceed when solved
if (state === 'solved') {
  await page.click('#submit-form')
}
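As an alternative to the browser extension, the submit-and-poll flow described earlier can be sketched directly against 2Captcha's HTTP API (the in.php and res.php endpoints). Treat this as a minimal sketch: the function names, the polling interval, and the injected fetchImpl parameter are assumptions made for the example, and the token injection at the end assumes a standard reCAPTCHA v2 widget with a g-recaptcha-response field.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Submit a reCAPTCHA v2 task and poll until a token is returned
async function solveRecaptchaV2({
  apiKey,
  siteKey,
  pageUrl,
  fetchImpl = globalThis.fetch, // injectable so the flow can be exercised without network access
  pollIntervalMs = 5000,
  maxPolls = 24,
}) {
  const submitParams = new URLSearchParams({
    key: apiKey,
    method: 'userrecaptcha',
    googlekey: siteKey,
    pageurl: pageUrl,
    json: '1',
  });
  const submitted = await (await fetchImpl(`https://2captcha.com/in.php?${submitParams}`)).json();
  if (submitted.status !== 1) {
    throw new Error(`CAPTCHA submit failed: ${submitted.request}`);
  }

  const resultParams = new URLSearchParams({
    key: apiKey,
    action: 'get',
    id: submitted.request, // task id returned by the submit call
    json: '1',
  });
  for (let i = 0; i < maxPolls; i++) {
    await sleep(pollIntervalMs);
    const result = await (await fetchImpl(`https://2captcha.com/res.php?${resultParams}`)).json();
    if (result.status === 1) return result.request; // the solved token
    if (result.request !== 'CAPCHA_NOT_READY') {
      throw new Error(`CAPTCHA solve failed: ${result.request}`);
    }
  }
  throw new Error('Timed out waiting for the CAPTCHA solution');
}

// Write the token into the hidden response field (assumes a standard v2 widget)
async function injectToken(page, token) {
  await page.evaluate((value) => {
    const field = document.querySelector('#g-recaptcha-response');
    if (field) field.value = value;
  }, token);
}
```

The site key is usually available from the data-sitekey attribute of the reCAPTCHA element. Some pages also require a JavaScript callback to be triggered after the token is set, which this sketch does not cover.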
To improve the chances of solving reCAPTCHAs effectively, follow these practices:
Here’s how you can integrate error handling into your CAPTCHA-solving process:
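// page.waitForTimeout() has been removed from recent Puppeteer releases, so use a plain timer
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms))

const solveCaptcha = async (page, maxRetries = 3) => {
  let attempts = 0
  while (attempts < maxRetries) {
    try {
      // Attempt CAPTCHA solution
      await page.click('.captcha-solver')
      await page.waitForSelector('[data-state="solved"]')
      return true
    } catch (error) {
      attempts++
      // Back off a little longer after each failed attempt
      await sleep(2000 * attempts)
    }
  }
  return false
}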
A case study from DataScrape Solutions highlights the effectiveness of these methods. In March 2024, their use of 2Captcha with Puppeteer achieved a 95% decrease in manual CAPTCHA-solving time and boosted data extraction rates by 60% when processing over 1 million CAPTCHAs monthly.
Image CAPTCHAs are designed to challenge automated systems. However, with the right tools, OCR and image processing techniques can effectively solve these puzzles.
Now, let’s dive into OCR methods specifically designed for text-based CAPTCHAs.
Tesseract OCR is a powerful tool for recognizing text in images. Below is an example of how to integrate Tesseract OCR with Puppeteer to solve text-based CAPTCHAs:
const tesseract = require('node-tesseract-ocr')
const sharp = require('sharp')

async function solveCaptcha(imageBuffer) {
  // Preprocess the image to improve OCR performance
  const processedImage = await sharp(imageBuffer)
    .grayscale()
    .threshold(150) // pixels at or above 150 become white, the rest black
    .toBuffer()
  const config = {
    lang: "eng",
    oem: 1, // LSTM neural network engine
    psm: 7, // treat the image as a single line of text
  }
  return await tesseract.recognize(processedImage, config)
}
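To connect this OCR step to a live page, the CAPTCHA element can be captured as an image and passed to the recognizer. The sketch below assumes the CAPTCHA is purely alphanumeric and that the recognizer is supplied by the caller (for example, the solveCaptcha function above); readCaptchaFromPage and normalizeOcrText are hypothetical helper names.

```javascript
// Strip whitespace and stray symbols that OCR tends to add (assumes an alphanumeric CAPTCHA)
function normalizeOcrText(raw) {
  return raw.replace(/[^A-Za-z0-9]/g, '');
}

// Screenshot only the CAPTCHA element and run it through the supplied recognizer
async function readCaptchaFromPage(page, selector, recognize) {
  const element = await page.waitForSelector(selector);
  const imageBuffer = await element.screenshot(); // image bytes of just that element
  const rawText = await recognize(imageBuffer);
  return normalizeOcrText(rawText);
}
```

A call such as readCaptchaFromPage(page, '.captcha-image', solveCaptcha) would return the cleaned text, which can then be typed into the form's answer field.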
Fine-tuning image properties during preprocessing plays a crucial role in boosting recognition accuracy.
Enhancing contrast and brightness can significantly improve OCR results. Here’s an example of adjusting these settings dynamically:
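async function enhanceCaptchaRecognition(page) {
  return await page.evaluate(() => {
    const img = document.querySelector('.captcha-image')
    const canvas = document.createElement('canvas')
    // Match the canvas to the image; the default 300x150 canvas would crop or pad it
    canvas.width = img.naturalWidth
    canvas.height = img.naturalHeight
    const ctx = canvas.getContext('2d')
    ctx.filter = 'contrast(150%) brightness(120%)'
    ctx.drawImage(img, 0, 0)
    // Returns the enhanced image as a base64 data URL
    return canvas.toDataURL()
  })
}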
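Because canvas.toDataURL() returns a base64 data URL rather than raw bytes, the result usually needs decoding before it can be handed to an OCR library. A minimal sketch of that conversion follows; dataUrlToBuffer is a hypothetical helper name.

```javascript
// Decode a base64 data URL (as returned by canvas.toDataURL()) into raw bytes
function dataUrlToBuffer(dataUrl) {
  const match = /^data:([^;,]+);base64,(.+)$/.exec(dataUrl);
  if (!match) throw new Error('Expected a base64-encoded data URL');
  return { mimeType: match[1], buffer: Buffer.from(match[2], 'base64') };
}
```

The resulting buffer can then be passed to an image-based recognizer in place of a screenshot.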
For instance, a project targeting the Taiwan railway booking website achieved a 98.84% accuracy rate for single digits and an overall accuracy of 91.13%. Similarly, deep learning methods have proven effective for image-based CAPTCHAs. One TensorFlow-based model, leveraging a convolutional neural network, reached a 90% success rate. Experimenting with preprocessing techniques - like tweaking contrast, brightness, and thresholds - can further improve results based on the specific traits of each CAPTCHA type.
Creating reliable CAPTCHA-solving scripts requires strong error handling, IP rotation, and performance tweaks. Once you've set up CAPTCHA-solving techniques, focusing on script efficiency is the next step.
Good error handling is key to keeping your script stable. Here's an example that retries on failure:
const { TimeoutError } = require('puppeteer');

// page.waitForTimeout() has been removed from recent Puppeteer releases, so use a plain timer
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function handleCaptchaSolution(page) {
  const MAX_RETRIES = 3;
  page.setDefaultNavigationTimeout(30000); // synchronous setter, applied once
  for (let attempt = 1; attempt <= MAX_RETRIES; attempt++) {
    try {
      return await solveCaptcha(page);
    } catch (error) {
      if (error instanceof TimeoutError) {
        console.error(`Attempt ${attempt}: CAPTCHA timeout`);
      } else {
        // Network failures and any other errors end up here
        console.error(`Attempt ${attempt}: ${error.message}`);
      }
      await sleep(2000 * attempt); // wait longer after each failure
    }
  }
  throw new Error('Maximum retry attempts exceeded');
}
This approach handles timeouts and network issues with incremental retries, ensuring your script doesn't crash unexpectedly.
Rotating IPs and browser fingerprints helps avoid detection. Here's how you can use puppeteer-extra plugins for this purpose:
const puppeteer = require('puppeteer-extra');
const StealthPlugin = require('puppeteer-extra-plugin-stealth');
const AnonymizeUAPlugin = require('puppeteer-extra-plugin-anonymize-ua');

puppeteer.use(StealthPlugin());
puppeteer.use(AnonymizeUAPlugin());

async function rotateIdentity() {
  const proxy = await getNextProxy(); // Your proxy rotation logic
  const browser = await puppeteer.launch({
    args: [`--proxy-server=${proxy.host}:${proxy.port}`]
  });
  return browser;
}
By rotating IPs and HTTP headers, your script mimics natural browsing behavior, reducing the chances of being flagged.
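The proxy rotation logic itself is left open above. One simple possibility is a round-robin rotator over a fixed list, sketched here; createProxyRotator, toProxyArg, and the example hosts are assumptions made for illustration.

```javascript
// Build a getNextProxy() function that cycles through the list in order
function createProxyRotator(proxies) {
  if (!proxies.length) throw new Error('At least one proxy is required');
  let index = 0;
  return function getNextProxy() {
    const proxy = proxies[index];
    index = (index + 1) % proxies.length; // wrap around at the end of the list
    return proxy;
  };
}

// Format a proxy entry as a Chromium launch argument
function toProxyArg(proxy) {
  return `--proxy-server=${proxy.host}:${proxy.port}`;
}

// Example with placeholder hosts
const getNextProxy = createProxyRotator([
  { host: 'proxy-a.example.com', port: 8080 },
  { host: 'proxy-b.example.com', port: 8080 },
]);
```

Since getNextProxy returns a plain value, it also works where the result is awaited. For proxies that require credentials, Puppeteer's page.authenticate({ username, password }) can be called before the first navigation.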
Boost your script's efficiency and success rate with the following techniques:
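// Block images, stylesheets, and fonts to cut load time and bandwidth.
// Skip this on pages where the CAPTCHA image itself has to load.
await page.setRequestInterception(true);
page.on('request', (request) => {
  if (['image', 'stylesheet', 'font'].includes(request.resourceType())) {
    request.abort();
  } else {
    request.continue();
  }
});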
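// Process several pages in parallel with puppeteer-cluster
const { Cluster } = require('puppeteer-cluster');
const cluster = await Cluster.launch({
  concurrency: Cluster.CONCURRENCY_CONTEXT,
  maxConcurrency: 4,
  monitor: true
});
await cluster.task(async ({ page, data: url }) => {
  await page.goto(url); // load the queued URL before handling its CAPTCHA
  await handleCaptchaSolution(page);
});
// URLs are then added with cluster.queue(url)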
// Cache responses for one hour to avoid repeating identical requests
const cache = new Map();
async function getCachedResponse(url) {
  if (cache.has(url)) {
    const { timestamp, data } = cache.get(url);
    if (Date.now() - timestamp < 3600000) { // 1-hour cache
      return data;
    }
  }
  const response = await fetchResponse(url); // your own fetch logic (not defined here)
  cache.set(url, { timestamp: Date.now(), data: response });
  return response;
}
These methods work together to reduce resource usage, improve speed, and handle multiple tasks efficiently.
Handling CAPTCHAs effectively involves a layered strategy focused on prevention. By using tools like stealth techniques, optimized headers, and rotating IPs, you can reduce the chances of CAPTCHAs being triggered in the first place. Prevention is always better than solving them reactively.
Latenode makes CAPTCHA management easier with built-in features like stealth mode, proxy rotation, and cookie handling.
Here's an example of how you can set it up:
const workflow = new LatenodeWorkflow({
  browserOptions: {
    stealth: true,
    proxyRotation: true,
    cookieManagement: true
  }
});
await workflow.initBrowser({
  captchaHandling: {
    prevention: true,
    autoRetry: true,
    maxAttempts: 3
  }
});
To enhance your automation workflow, consider these steps: