Overcoming CAPTCHA in Puppeteer Scripts: From reCAPTCHA to Recognition Services
March 26, 2025 • 8 min read

George Miloradovich
Researcher, Copywriter & Usecase Interviewer

CAPTCHAs are designed to block bots, making automation with tools like Puppeteer challenging. This article explains how to bypass CAPTCHA issues, from stealth techniques to solving methods. Here's what you'll learn:

  • Types of CAPTCHAs: Text-based, image-based, reCAPTCHA, hCaptcha, and audio CAPTCHAs.
  • Avoiding Detection: Use the puppeteer-extra stealth plugin, manage browser fingerprints, and simulate human behavior (typing, mouse movement, scrolling).
  • Solving CAPTCHAs: Integrate services like 2Captcha or use OCR tools like Tesseract for image CAPTCHAs.
  • Improving Success Rates: Rotate IPs, handle errors with retries, and optimize resource usage.

Quick Comparison of CAPTCHA Types

| CAPTCHA Type | Description | Challenges |
| --- | --- | --- |
| Text-based | Distorted text for recognition | Hard to read complex text |
| Image-based | Identify objects/patterns | Requires visual processing |
| reCAPTCHA | Google’s risk analysis system | Detects bot-like behavior |
| hCaptcha | Object identification tasks | Similar to reCAPTCHA |
| Audio | Sound-based tasks | Complex speech recognition |

Learn how these methods can help you streamline automation while avoiding detection and solving CAPTCHAs efficiently.

Bot Detection Prevention Methods

To bypass CAPTCHA challenges effectively, Puppeteer scripts need to behave in ways that mimic real human users. This includes using stealth techniques and natural behavior patterns.

Setting Up Puppeteer-extra Stealth

Using puppeteer-extra with its stealth plugin can help avoid bot detection. Here's how to set it up:

const puppeteer = require('puppeteer-extra')
const StealthPlugin = require('puppeteer-extra-plugin-stealth')
puppeteer.use(StealthPlugin())

To enable only specific evasions instead of the full default set, pass an enabledEvasions set when creating the plugin:

puppeteer.use(StealthPlugin({
  enabledEvasions: new Set([
    "chrome.app",
    "chrome.csi",
    "defaultArgs",
    "navigator.plugins"
  ])
}))

The stealth plugin tackles common detection methods by:

  • Removing the navigator.webdriver property
  • Hiding indicators of headless Chrome
  • Adding Chrome App and CSI objects
  • Adjusting browser fingerprints
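
One way to check whether these evasions are taking effect is to read the same properties a detection script might inspect. The sketch below is illustrative only: `collectSignals` and `suspiciousSignals` are hypothetical helper names, and the four signals are a small sample rather than a complete detection test.

```javascript
// Read a few properties that detection scripts commonly inspect.
// Runs inside the page via Puppeteer's page.evaluate().
async function collectSignals(page) {
  return page.evaluate(() => ({
    webdriver: navigator.webdriver === true,
    headlessUserAgent: navigator.userAgent.includes('HeadlessChrome'),
    pluginCount: navigator.plugins.length,
    hasChromeObject: typeof window.chrome !== 'undefined'
  }))
}

// Return the names of signals that look like automation; an empty list is the goal.
function suspiciousSignals(signals) {
  const flagged = []
  if (signals.webdriver) flagged.push('webdriver')
  if (signals.headlessUserAgent) flagged.push('headlessUserAgent')
  if (signals.pluginCount === 0) flagged.push('pluginCount')
  if (!signals.hasChromeObject) flagged.push('hasChromeObject')
  return flagged
}
```

Calling `suspiciousSignals(await collectSignals(page))` with and without the stealth plugin gives a rough before/after comparison.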

Browser Fingerprint Management

Browser fingerprinting is a key factor in bot detection. To create a convincing browser profile, focus on these areas:

| Configuration Area | Implementation Details | Purpose |
| --- | --- | --- |
| User Agent | Rotate strings dynamically | Hides automation markers |
| WebGL Support | Enable hardware acceleration | Mimics a standard browser setup |
| Viewport Settings | Use random, realistic dimensions | Matches common user setups |
| Language Headers | Align with user agent locale | Ensures consistency in the browser profile |
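
A minimal sketch of how the user agent, viewport, and language settings might be applied together. The profile values and the helper names `pickProfile` and `applyProfile` are assumptions made for illustration; `page.setUserAgent`, `page.setViewport`, and `page.setExtraHTTPHeaders` are standard Puppeteer page methods.

```javascript
// Hypothetical profiles: each keeps user agent, locale, and viewport consistent.
const PROFILES = [
  {
    userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36',
    acceptLanguage: 'en-US,en;q=0.9',
    viewports: [{ width: 1920, height: 1080 }, { width: 1536, height: 864 }]
  },
  {
    userAgent: 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36',
    acceptLanguage: 'en-GB,en;q=0.9',
    viewports: [{ width: 1440, height: 900 }, { width: 1680, height: 1050 }]
  }
]

// Pick one profile and one of its viewports; `random` is injectable for testing.
function pickProfile(random = Math.random) {
  const profile = PROFILES[Math.floor(random() * PROFILES.length)]
  const viewport = profile.viewports[Math.floor(random() * profile.viewports.length)]
  return { userAgent: profile.userAgent, acceptLanguage: profile.acceptLanguage, viewport }
}

// Apply the chosen profile to a Puppeteer page before navigating.
async function applyProfile(page, profile) {
  await page.setUserAgent(profile.userAgent)
  await page.setViewport(profile.viewport)
  await page.setExtraHTTPHeaders({ 'Accept-Language': profile.acceptLanguage })
}
```

Choosing the viewport and language from the same profile as the user agent keeps the fingerprint internally consistent, which is the point of the last row in the table.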

In addition to static configurations, incorporating dynamic, human-like behaviors is critical.

Human Behavior Simulation

Simulating human behavior helps reduce detection risks. Here are some effective techniques:

  • Typing Patterns
    Introduce random delays between keystrokes (e.g., 50ms to 200ms) to mimic natural typing speeds and avoid automated input patterns.
  • Mouse Movement
    Use non-linear mouse paths with varied speeds. Small, random deviations can replicate human imperfections in cursor control.
  • Page Interaction
    Simulate realistic scrolling with variable speeds and pauses. Random viewport adjustments can emulate reading or scanning behavior.
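
The typing and mouse techniques above can be sketched roughly as follows. The helper names (`randomBetween`, `curvedPath`, `humanType`, `humanMove`) and the specific timing and curvature values are assumptions chosen for illustration; `page.click`, `page.keyboard.type`, and `page.mouse.move` are standard Puppeteer calls.

```javascript
// Random integer in [min, max], used for keystroke and pause timing.
function randomBetween(min, max, random = Math.random) {
  return Math.floor(random() * (max - min + 1)) + min
}

// Build a curved path between two points using a quadratic Bezier curve
// whose control point is offset randomly from the straight line.
function curvedPath(from, to, steps = 20, random = Math.random) {
  const control = {
    x: (from.x + to.x) / 2 + (random() - 0.5) * 200,
    y: (from.y + to.y) / 2 + (random() - 0.5) * 200
  }
  const points = []
  for (let i = 1; i <= steps; i++) {
    const t = i / steps
    points.push({
      x: (1 - t) ** 2 * from.x + 2 * (1 - t) * t * control.x + t ** 2 * to.x,
      y: (1 - t) ** 2 * from.y + 2 * (1 - t) * t * control.y + t ** 2 * to.y
    })
  }
  return points
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms))

// Type one character at a time with a 50-200 ms pause between keystrokes.
async function humanType(page, selector, text) {
  await page.click(selector)
  for (const char of text) {
    await page.keyboard.type(char)
    await sleep(randomBetween(50, 200))
  }
}

// Move the mouse along the curved path with small pauses of varying length.
async function humanMove(page, from, to) {
  for (const point of curvedPath(from, to)) {
    await page.mouse.move(point.x, point.y)
    await sleep(randomBetween(5, 25))
  }
}
```

The Bezier curve is one simple way to get a non-linear path; any curve with some randomness in its shape and timing would serve the same purpose.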

Finally, launch the browser with arguments that reduce bot detection:

const browser = await puppeteer.launch({
  args: [
    '--disable-blink-features=AutomationControlled',
    '--window-size=1920,1080'
  ],
  headless: false
})

Solving reCAPTCHA with Puppeteer

Once stealth measures are in place, handling reCAPTCHA efficiently becomes essential for reliable automation. This builds on the stealth and behavior simulation techniques discussed earlier.

Using CAPTCHA Solving Services

One way to handle reCAPTCHA programmatically is by integrating CAPTCHA-solving services. When your script encounters a reCAPTCHA, it sends the required parameters to a solver service. The service processes the CAPTCHA and returns the solution, usually within 10–30 seconds.
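
The submit-then-wait flow described above could be sketched like this. It is provider-agnostic on purpose: `submitTask` and `fetchResult` are hypothetical callbacks that would wrap whichever solver API you use, and the polling interval and timeout are assumed values. The `#g-recaptcha-response` field is the hidden textarea that reCAPTCHA v2 normally adds to the page; verify it exists on your target before relying on it.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms))

// Submit a CAPTCHA task, then poll until the service returns a token or the deadline passes.
// `submitTask` and `fetchResult` are hypothetical callbacks wrapping your provider's HTTP API;
// `fetchResult` is assumed to resolve to null while the task is still pending.
async function requestSolution({ siteKey, pageUrl, submitTask, fetchResult,
                                 pollIntervalMs = 5000, timeoutMs = 120000 }) {
  const taskId = await submitTask({ siteKey, pageUrl })
  const deadline = Date.now() + timeoutMs

  while (Date.now() < deadline) {
    await sleep(pollIntervalMs)
    const token = await fetchResult(taskId)
    if (token) return token
  }
  throw new Error(`CAPTCHA task ${taskId} was not solved within ${timeoutMs} ms`)
}

// Write the token into the hidden response field that reCAPTCHA v2 adds to the form.
async function injectToken(page, token) {
  await page.evaluate((value) => {
    const field = document.querySelector('#g-recaptcha-response')
    if (field) field.value = value
  }, token)
}
```

Passing the provider calls in as callbacks keeps the polling logic testable without network access and makes it easy to swap services.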

Setting Up 2Captcha API with Puppeteer

2Captcha is a commonly used service for solving reCAPTCHAs. Here's how you can integrate it into your Puppeteer setup:

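const puppeteer = require('puppeteer-extra')
const StealthPlugin = require('puppeteer-extra-plugin-stealth')
const path = require('path')

// Register the stealth plugin before launching
puppeteer.use(StealthPlugin())

// Configure solver extension (path to the unpacked extension folder)
const extensionPath = path.join(__dirname, './2captcha-solver')
// Note: the API key is not passed at launch; it is typically set in the
// extension's own configuration (check the extension's documentation)
const apiKey = 'YOUR_2CAPTCHA_API_KEY'

// Launch browser with the solver extension (extensions need a headed browser here)
const browser = await puppeteer.launch({
    args: [
        `--disable-extensions-except=${extensionPath}`,
        `--load-extension=${extensionPath}`
    ],
    headless: false
})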

Once the browser is set up, you can check the CAPTCHA solver's status:

// Wait for solver button and check status
await page.waitForSelector('.captcha-solver')
const solverButton = await page.$('.captcha-solver')
// Puppeteer's ElementHandle has no getAttribute(); read the attribute inside the page
const state = await solverButton.evaluate((el) => el.getAttribute('data-state'))

// Proceed when solved
if (state === 'solved') {
    await page.click('#submit-form')
}

Tips for Improving reCAPTCHA Success Rates

To improve the chances of solving reCAPTCHAs effectively, follow these practices:

  • Use a pool of residential proxies to rotate IP addresses.
  • Add short delays between solving attempts to simulate natural user behavior.
  • Include error handling with exponential backoff retries.
  • Maintain browser context across attempts to avoid unnecessary reinitializations.

Here’s how you can integrate error handling into your CAPTCHA-solving process:

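const solveCaptcha = async (page, maxRetries = 3) => {
    let attempts = 0
    while (attempts < maxRetries) {
        try {
            // Attempt CAPTCHA solution
            await page.click('.captcha-solver')
            await page.waitForSelector('[data-state="solved"]')
            return true
        } catch (error) {
            attempts++
            // Exponential backoff: 2s, 4s, 8s, ...
            // (page.waitForTimeout() was removed in newer Puppeteer releases, so use a plain timer)
            const delay = 2000 * 2 ** (attempts - 1)
            await new Promise((resolve) => setTimeout(resolve, delay))
        }
    }
    return false
}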

A case study from DataScrape Solutions highlights the effectiveness of these methods. In March 2024, their use of 2Captcha with Puppeteer achieved a 95% decrease in manual CAPTCHA-solving time and boosted data extraction rates by 60% when processing over 1 million CAPTCHAs monthly.

Image CAPTCHA Recognition Methods

Image CAPTCHAs are designed to challenge automated systems. However, with the right tools, OCR and image processing techniques can effectively solve these puzzles.

Types of Image CAPTCHAs

  • Text-based Images: These include distorted characters with varying fonts and complex backgrounds.
  • Object Recognition: Involves identifying specific objects from a set of options.
  • Pattern Matching: Requires users to match or identify visual patterns.

Now, let’s dive into OCR methods specifically designed for text-based CAPTCHAs.

Using OCR for CAPTCHA Text

Tesseract OCR is a powerful tool for recognizing text in images. Below is an example of how to integrate Tesseract OCR with Puppeteer to solve text-based CAPTCHAs:

const tesseract = require('node-tesseract-ocr')
const sharp = require('sharp')

async function solveCaptcha(imageBuffer) {
    // Preprocess the image to improve OCR performance
    const processedImage = await sharp(imageBuffer)
        .grayscale()
        .threshold(150)
        .toBuffer()

    const config = {
        lang: "eng",
        oem: 1,
        psm: 7,
    }

    // Tesseract output usually ends with whitespace/newlines, so trim it
    const text = await tesseract.recognize(processedImage, config)
    return text.trim()
}
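
The function above expects an image buffer, so the CAPTCHA image has to be captured from the page first. A minimal sketch under assumptions: `captureCaptchaImage` and `cleanOcrText` are hypothetical helpers, the `.captcha-image` selector depends on the target site, and the cleanup assumes the CAPTCHA contains only letters and digits.

```javascript
// Screenshot only the CAPTCHA element; `selector` is an assumption about the target page.
async function captureCaptchaImage(page, selector = '.captcha-image') {
  const element = await page.waitForSelector(selector)
  return element.screenshot({ type: 'png' })
}

// OCR output often carries whitespace and stray punctuation; keep letters and digits only.
function cleanOcrText(rawText) {
  return rawText.replace(/[^A-Za-z0-9]/g, '')
}
```

A typical flow would pass the result of `captureCaptchaImage(page)` to the OCR function and run `cleanOcrText` on the recognized string before typing it into the form.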

Fine-tuning image properties during preprocessing plays a crucial role in boosting recognition accuracy.

Improving Image Recognition Accuracy

Enhancing contrast and brightness can significantly improve OCR results. Here’s an example of adjusting these settings dynamically:

async function enhanceCaptchaRecognition(page) {
    return await page.evaluate(() => {
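        const img = document.querySelector('.captcha-image')
        const canvas = document.createElement('canvas')
        // Size the canvas to the image; the default 300x150 canvas would crop or pad it
        canvas.width = img.naturalWidth
        canvas.height = img.naturalHeight
        const ctx = canvas.getContext('2d')

        // Set the filter before drawing so it applies to the image
        ctx.filter = 'contrast(150%) brightness(120%)'
        // Note: a cross-origin image without CORS headers taints the canvas and breaks toDataURL()
        ctx.drawImage(img, 0, 0)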

        return canvas.toDataURL()
    })
}
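
The function above returns a base64 data URL, while the OCR step works on an image buffer. A small conversion helper bridges the two; `dataUrlToBuffer` is a hypothetical name used only for this sketch.

```javascript
// Convert a base64 data URL (as returned by canvas.toDataURL()) into a Buffer.
function dataUrlToBuffer(dataUrl) {
  const match = /^data:([^;,]+);base64,(.+)$/.exec(dataUrl)
  if (!match) throw new Error('Expected a base64-encoded data URL')
  return { mimeType: match[1], buffer: Buffer.from(match[2], 'base64') }
}
```

The resulting `buffer` can then be handed to the image preprocessing and OCR function shown earlier.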

For instance, a project targeting the Taiwan railway booking website achieved a 98.84% accuracy rate for single digits and an overall accuracy of 91.13%. Similarly, deep learning methods have proven effective for image-based CAPTCHAs. One TensorFlow-based model, leveraging a convolutional neural network, reached a 90% success rate. Experimenting with preprocessing techniques - like tweaking contrast, brightness, and thresholds - can further improve results based on the specific traits of each CAPTCHA type.

CAPTCHA Script Performance

Creating reliable CAPTCHA-solving scripts requires strong error handling, IP rotation, and performance tweaks. Once you've set up CAPTCHA-solving techniques, focusing on script efficiency is the next step.

Error Recovery Systems

Good error handling is key to keeping your script stable. Here's an example that retries on failure:

async function handleCaptchaSolution(page) {
    const MAX_RETRIES = 3;
    let attempts = 0;

    page.setDefaultNavigationTimeout(30000);

    while (attempts < MAX_RETRIES) {
        try {
            return await solveCaptcha(page);
        } catch (error) {
            // Puppeteer's timeout errors are named 'TimeoutError'; there is no built-in NetworkError class
            if (error.name === 'TimeoutError') {
                console.error(`Attempt ${attempts + 1}: CAPTCHA timeout`);
            } else {
                console.error(`Attempt ${attempts + 1}: Network or other failure: ${error.message}`);
            }
            attempts++;
            // page.waitForTimeout() was removed in newer Puppeteer releases; use a plain timer
            await new Promise((resolve) => setTimeout(resolve, 2000 * attempts));
        }
    }
    throw new Error('Maximum retry attempts exceeded');
}

This approach handles timeouts and network issues with incremental retries, ensuring your script doesn't crash unexpectedly.
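
If you prefer exponential backoff over the fixed increments used here, the delay calculation can be pulled into a small helper. This is a sketch under assumptions: `backoffDelay` and `withRetries` are hypothetical names, and the base delay, cap, and jitter fraction are arbitrary defaults to tune for your workload.

```javascript
// Delay for a given attempt (0-based): base * 2^attempt, capped, with optional random jitter.
function backoffDelay(attempt, { baseMs = 1000, maxMs = 30000, jitter = 0.2, random = Math.random } = {}) {
  const exponential = Math.min(maxMs, baseMs * 2 ** attempt)
  const spread = exponential * jitter
  return Math.round(exponential - spread + random() * 2 * spread)
}

// Run `task` until it succeeds or `maxRetries` attempts have failed.
async function withRetries(task, { maxRetries = 3, ...backoffOptions } = {}) {
  let lastError
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      return await task(attempt)
    } catch (error) {
      lastError = error
      // Skip the wait after the final failed attempt
      if (attempt < maxRetries - 1) {
        await new Promise((resolve) => setTimeout(resolve, backoffDelay(attempt, backoffOptions)))
      }
    }
  }
  throw lastError
}
```

Jitter spreads retries out so that parallel workers do not all retry at the same instant, for example `await withRetries(() => solveCaptcha(page), { maxRetries: 4 })`.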

IP and Browser Rotation

Rotating IPs and browser fingerprints helps avoid detection. Here's how you can use puppeteer-extra plugins for this purpose:

const puppeteer = require('puppeteer-extra');
const StealthPlugin = require('puppeteer-extra-plugin-stealth');
const AnonymizeUAPlugin = require('puppeteer-extra-plugin-anonymize-ua');

puppeteer.use(StealthPlugin());
puppeteer.use(AnonymizeUAPlugin());

async function rotateIdentity() {
    const proxy = await getNextProxy(); // Your proxy rotation logic
    const browser = await puppeteer.launch({
        args: [`--proxy-server=${proxy.host}:${proxy.port}`]
    });
    return browser;
}

By rotating IPs and HTTP headers, your script mimics natural browsing behavior, reducing the chances of being flagged.
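
The `getNextProxy()` call above is left to the reader. One simple possibility is a round-robin pool, sketched below; `createProxyPool` and `proxyArg` are hypothetical helpers, and the proxy entries in the usage note are placeholders rather than real endpoints.

```javascript
// Round-robin proxy pool; returns proxies in order and wraps around at the end.
function createProxyPool(proxies) {
  if (proxies.length === 0) throw new Error('Proxy pool is empty')
  let index = 0
  return {
    next() {
      const proxy = proxies[index]
      index = (index + 1) % proxies.length
      return proxy
    }
  }
}

// Build the Chromium launch argument for a proxy.
function proxyArg(proxy) {
  return `--proxy-server=${proxy.host}:${proxy.port}`
}
```

For example, `createProxyPool([{ host: '203.0.113.10', port: 8080 }, { host: '203.0.113.11', port: 8080 }])` yields the two placeholder proxies alternately. If a proxy requires credentials, Puppeteer's `page.authenticate({ username, password })` can supply them after the page is created.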

Performance Optimization

Boost your script's efficiency and success rate with the following techniques:

  • Resource Management
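    Stop unnecessary resource downloads like images, stylesheets, or fonts. Skip this on pages where the CAPTCHA itself must render, since blocking images and stylesheets can prevent the challenge from loading: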
await page.setRequestInterception(true);
page.on('request', (request) => {
    if (['image', 'stylesheet', 'font'].includes(request.resourceType())) {
        request.abort();
    } else {
        request.continue();
    }
});
  • Parallel Processing
    Use puppeteer-cluster to solve multiple CAPTCHAs at the same time:
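const { Cluster } = require('puppeteer-cluster');

const cluster = await Cluster.launch({
    concurrency: Cluster.CONCURRENCY_CONTEXT,
    maxConcurrency: 4,
    monitor: true
});

// Each queued URL gets its own page; navigate before solving
await cluster.task(async ({ page, data: url }) => {
    await page.goto(url);
    await handleCaptchaSolution(page);
});

// Queue work, wait for it to finish, then shut down
cluster.queue('https://example.com/form'); // placeholder URL
await cluster.idle();
await cluster.close();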
  • Caching Strategy
    Cache responses to avoid redundant requests and save processing time:
const cache = new Map();

async function getCachedResponse(url) {
    if (cache.has(url)) {
        const { timestamp, data } = cache.get(url);
        if (Date.now() - timestamp < 3600000) { // 1-hour cache
            return data;
        }
    }
    const response = await fetchResponse(url);
    cache.set(url, { timestamp: Date.now(), data: response });
    return response;
}

These methods work together to reduce resource usage, improve speed, and handle multiple tasks efficiently.

Conclusion and Implementation Guide

CAPTCHA Solution Overview

Handling CAPTCHAs effectively involves a layered strategy focused on prevention. By using tools like stealth techniques, optimized headers, and rotating IPs, you can reduce the chances of CAPTCHAs being triggered in the first place. Prevention is always better than solving them reactively.

Latenode Browser Automation

Latenode makes CAPTCHA management easier with built-in features like stealth mode, proxy rotation, and cookie handling.

Here's an example of how you can set it up:

const workflow = new LatenodeWorkflow({
    browserOptions: {
        stealth: true,
        proxyRotation: true,
        cookieManagement: true
    }
});

await workflow.initBrowser({
    captchaHandling: {
        prevention: true,
        autoRetry: true,
        maxAttempts: 3
    }
});

Next Steps for Implementation

To enhance your automation workflow, consider these steps:

  • Enable Stealth Mode
    Use the puppeteer-extra stealth plugin to lower the chances of triggering CAPTCHAs.
  • Set Up Error Recovery
    Add error recovery mechanisms to handle different CAPTCHA types. Use automatic retries with strategies like exponential backoff for smoother operation.
  • Improve Resource Efficiency
    Reduce script execution time by selectively loading resources and using caching, ensuring better performance without sacrificing success rates.
