Overcoming CAPTCHA in Puppeteer Scripts: From reCAPTCHA to Recognition Services
Learn effective strategies to bypass CAPTCHAs in Puppeteer scripts, including stealth techniques and solving methods for better automation.

CAPTCHAs are designed to block bots, making automation with tools like Puppeteer challenging. This article explains how to bypass CAPTCHA issues, from stealth techniques to solving methods. Here's what you'll learn:
- Types of CAPTCHAs: Text-based, image-based, reCAPTCHA, hCaptcha, and audio CAPTCHAs.
- Avoiding Detection: Use Puppeteer-extra stealth plugins, manage browser fingerprints, and simulate human behavior (typing, mouse movement, scrolling).
- Solving CAPTCHAs: Integrate services like 2Captcha or use OCR tools like Tesseract for image CAPTCHAs.
- Improving Success Rates: Rotate IPs, handle errors with retries, and optimize resource usage.
Quick Comparison of CAPTCHA Types
| CAPTCHA Type | Description | Challenges |
|---|---|---|
| Text-based | Distorted text for recognition | Hard to read complex text |
| Image-based | Identify objects/patterns | Requires visual processing |
| reCAPTCHA | Google’s risk analysis system | Detects bot-like behavior |
| hCaptcha | Object identification tasks | Similar to reCAPTCHA |
| Audio | Sound-based tasks | Complex speech recognition |
Learn how these methods can help you streamline automation while avoiding detection and solving CAPTCHAs efficiently.
Bot Detection Prevention Methods
To bypass CAPTCHA challenges effectively, Puppeteer scripts need to behave in ways that mimic real human users. This includes using stealth techniques and natural behavior patterns.
Setting Up Puppeteer-extra Stealth
Using puppeteer-extra with its stealth plugin can help avoid bot detection. Here's how to set it up:
```javascript
const puppeteer = require('puppeteer-extra')
const StealthPlugin = require('puppeteer-extra-plugin-stealth')

// Register the stealth plugin before launching the browser
puppeteer.use(StealthPlugin())
```
You can also enable only specific evasion techniques. Passing `enabledEvasions` replaces the plugin's default set, so any evasion you leave out is turned off:
```javascript
// Use this instead of the plain StealthPlugin() call, not in addition to it
puppeteer.use(StealthPlugin({
  enabledEvasions: new Set([
    'chrome.app',
    'chrome.csi',
    'defaultArgs',
    'navigator.plugins'
  ])
}))
```
The stealth plugin tackles common detection methods by:
- Removing the `navigator.webdriver` property
- Hiding indicators of headless Chrome
- Adding Chrome App and CSI objects
- Adjusting browser fingerprints
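To confirm the plugin is doing its job, you can read the same signals back from a page. The sketch below assumes a Puppeteer `page` launched through puppeteer-extra with the stealth plugin; `auditStealth` is a hypothetical helper name, and its four checks simply mirror the list above rather than forming a complete detection suite.

```javascript
// Hypothetical helper: reads back a few signals the stealth plugin is meant to mask.
// page.evaluate() runs the callback inside the browser and returns plain data.
async function auditStealth(page) {
  const signals = await page.evaluate(() => ({
    webdriver: navigator.webdriver,
    pluginCount: navigator.plugins.length,
    hasChromeApp: typeof window.chrome === 'object' && window.chrome !== null && 'app' in window.chrome,
    userAgent: navigator.userAgent
  }))

  // Collect readable warnings for anything that still looks automated
  const problems = []
  if (signals.webdriver) problems.push('navigator.webdriver is true')
  if (signals.pluginCount === 0) problems.push('navigator.plugins is empty')
  if (!signals.hasChromeApp) problems.push('window.chrome.app is missing')
  if (signals.userAgent.includes('HeadlessChrome')) problems.push('user agent reveals headless mode')
  return { signals, problems }
}

// Example: const { problems } = await auditStealth(page); console.log(problems)
```

An empty `problems` array only means these particular checks passed; detection services look at many more signals.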
Browser Fingerprint Management
Browser fingerprinting is a key factor in bot detection. To create a convincing browser profile, focus on these areas:
| Configuration Area | Implementation Details | Purpose |
|---|---|---|
| User Agent | Rotate strings dynamically | Hides automation markers |
| WebGL Support | Enable hardware acceleration | Mimics a standard browser setup |
| Viewport Settings | Use random, realistic dimensions | Matches common user setups |
| Language Headers | Align with user agent locale | Ensures consistency in the browser profile |
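One way to keep these settings consistent is to derive them all from a single profile object. This is a sketch under stated assumptions: the user-agent strings are placeholders to be replaced with current browser versions, and `buildProfile` and `applyProfile` are hypothetical names. The Puppeteer calls it makes (`setUserAgent`, `setViewport`, `setExtraHTTPHeaders`) are standard `Page` methods.

```javascript
// Placeholder profiles: replace these user agents with current, real versions
const PROFILES = [
  { userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36', locale: 'en-US' },
  { userAgent: 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36', locale: 'en-GB' }
]

// Common desktop resolutions
const VIEWPORTS = [
  { width: 1920, height: 1080 },
  { width: 1536, height: 864 },
  { width: 1366, height: 768 }
]

// Pick one profile and derive the related settings from it, so the user agent,
// viewport and Accept-Language header never contradict each other
function buildProfile(random = Math.random) {
  const base = PROFILES[Math.floor(random() * PROFILES.length)]
  const viewport = VIEWPORTS[Math.floor(random() * VIEWPORTS.length)]
  const language = base.locale.split('-')[0]
  return {
    userAgent: base.userAgent,
    viewport,
    acceptLanguage: `${base.locale},${language};q=0.9`
  }
}

// Apply the profile to a Puppeteer page before navigating
async function applyProfile(page, profile) {
  await page.setUserAgent(profile.userAgent)
  await page.setViewport(profile.viewport)
  await page.setExtraHTTPHeaders({ 'Accept-Language': profile.acceptLanguage })
}
```

If the stealth plugin's `user-agent-override` evasion is also active, it may set its own user agent and language values, so check that the two don't overwrite each other.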
In addition to static configurations, incorporating dynamic, human-like behaviors is critical.
Human Behavior Simulation
Simulating human behavior helps reduce detection risks. Here are some effective techniques:
- Typing Patterns: Introduce random delays between keystrokes (e.g., 50ms to 200ms) to mimic natural typing speeds and avoid automated input patterns.
- Mouse Movement: Use non-linear mouse paths with varied speeds. Small, random deviations can replicate human imperfections in cursor control.
- Page Interaction: Simulate realistic scrolling with variable speeds and pauses. Random viewport adjustments can emulate reading or scanning behavior.
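The three behaviors above can be sketched as small helpers. This is a minimal illustration, not a proven evasion recipe: the 50 to 200 ms typing range comes from the list, the other ranges are assumptions, and `humanType`, `mousePath`, `humanMove` and `humanScroll` are hypothetical names. Puppeteer's own `mouse.move(x, y, { steps })` interpolates along a straight line, which is why the curved path is computed separately.

```javascript
// Random integer in [min, max], inclusive
function randomInt(min, max, random = Math.random) {
  return Math.floor(random() * (max - min + 1)) + min
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms))

// Type one key at a time with a fresh 50-200 ms pause after each keystroke;
// keyboard.type() alone only supports a single fixed delay
async function humanType(page, selector, text) {
  await page.focus(selector)
  for (const char of text) {
    await page.keyboard.type(char)
    await sleep(randomInt(50, 200))
  }
}

// Points on a quadratic Bezier curve from start to end. The control point is
// offset from the midpoint so the path bends, and interior points get up to
// +/-1 px of jitter to imitate an unsteady hand
function mousePath(start, end, steps = 25, random = Math.random) {
  const control = {
    x: (start.x + end.x) / 2 + (random() - 0.5) * 200,
    y: (start.y + end.y) / 2 + (random() - 0.5) * 200
  }
  const points = []
  for (let i = 0; i <= steps; i++) {
    const t = i / steps
    const u = 1 - t
    const edge = i === 0 || i === steps
    const jx = edge ? 0 : (random() - 0.5) * 2
    const jy = edge ? 0 : (random() - 0.5) * 2
    points.push({
      x: u * u * start.x + 2 * u * t * control.x + t * t * end.x + jx,
      y: u * u * start.y + 2 * u * t * control.y + t * t * end.y + jy
    })
  }
  return points
}

// Follow the curve with uneven pauses, so the speed varies along the way
async function humanMove(page, start, end) {
  for (const point of mousePath(start, end)) {
    await page.mouse.move(point.x, point.y)
    await sleep(randomInt(5, 25))
  }
}

// Scroll in uneven chunks with reading pauses in between
async function humanScroll(page, totalPixels) {
  let scrolled = 0
  while (scrolled < totalPixels) {
    const step = Math.min(randomInt(80, 240), totalPixels - scrolled)
    await page.mouse.wheel({ deltaY: step })
    scrolled += step
    await sleep(randomInt(200, 900))
  }
}
```

For example, `await humanMove(page, { x: 100, y: 100 }, { x: 640, y: 410 })` before a click, or `await humanScroll(page, 1200)` after the page loads.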
Finally, launch the browser with arguments that reduce bot detection:
```javascript
// Run inside an async function (top-level await requires an ES module)
const browser = await puppeteer.launch({
  args: [
    // Stops Blink from flagging the session as automated (navigator.webdriver)
    '--disable-blink-features=AutomationControlled',
    '--window-size=1920,1080'
  ],
  headless: false,
  // Use the real window size instead of Puppeteer's default 800x600 viewport
  defaultViewport: null
})
```
Solving reCAPTCHA with Puppeteer
Once stealth measures are in place, handling reCAPTCHA efficiently becomes essential for reliable automation. This builds on the stealth and behavior simulation techniques discussed earlier.
Using CAPTCHA Solving Services
One way to handle reCAPTCHA programmatically is by integrating CAPTCHA-solving services. When your script encounters a reCAPTCHA, it sends the required parameters to a solver service. The service processes the CAPTCHA and returns the solution, usually within 10–30 seconds.
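Whatever the provider, the flow is the same: submit the CAPTCHA parameters, poll until a token comes back, then hand the token to the page. The sketch below assumes a hypothetical `client` object wrapping your provider's HTTP API; its `submit` and `getResult` methods are placeholders, not 2Captcha's actual endpoints, so take the real request format from the provider's documentation. For reCAPTCHA v2, the parameters are typically the widget's `data-sitekey` value and the page URL, and the token goes into the hidden `g-recaptcha-response` field. Some sites also expect a JavaScript callback to fire, which this sketch does not handle.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms))

// Read the parameters a solver needs from a reCAPTCHA v2 widget
async function getRecaptchaParams(page) {
  const sitekey = await page.$eval('[data-sitekey]', (el) => el.getAttribute('data-sitekey'))
  return { sitekey, pageUrl: page.url() }
}

// Submit-then-poll loop. `client` is hypothetical:
//   client.submit(params)  -> resolves to a task id
//   client.getResult(id)   -> resolves to the token, or null while still solving
async function solveWithService(client, params, { pollMs = 5000, timeoutMs = 120000 } = {}) {
  const taskId = await client.submit(params)
  const deadline = Date.now() + timeoutMs
  while (Date.now() < deadline) {
    await sleep(pollMs)
    const token = await client.getResult(taskId)
    if (token) return token
  }
  throw new Error(`Task ${taskId} was not solved within ${timeoutMs} ms`)
}

// Write the token into the hidden response field used by the reCAPTCHA v2 widget
async function injectToken(page, token) {
  await page.evaluate((value) => {
    const field = document.querySelector('#g-recaptcha-response')
    if (field) field.value = value
  }, token)
}
```

The default timeout of two minutes leaves headroom over the typical 10 to 30 second solve time quoted above.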
Setting Up 2Captcha API with Puppeteer
2Captcha is a commonly used service for solving reCAPTCHAs. Here's how you can integrate it into your Puppeteer setup:
```javascript
const puppeteer = require('puppeteer-extra')
const StealthPlugin = require('puppeteer-extra-plugin-stealth')
const path = require('path')

puppeteer.use(StealthPlugin())

// Configure solver extension: path to the unpacked 2Captcha solver extension
const extensionPath = path.join(__dirname, './2captcha-solver')
// The extension does not read this variable; set the key in the extension's
// own configuration as described in its documentation
const apiKey = 'YOUR_2CAPTCHA_API_KEY'

// Launch browser with the solver extension (run inside an async function)
const browser = await puppeteer.launch({
  args: [
    `--disable-extensions-except=${extensionPath}`,
    `--load-extension=${extensionPath}`
  ],
  headless: false
})
const page = await browser.newPage()
```
Once the browser is set up, you can check the CAPTCHA solver's status:
```javascript
// Wait for the solver button the extension adds next to the CAPTCHA
await page.waitForSelector('.captcha-solver')
// ElementHandle has no getAttribute(), so read the attribute inside the page
const state = await page.$eval('.captcha-solver', (el) => el.getAttribute('data-state'))
// Proceed when solved ('#submit-form' stands in for the target page's submit button)
if (state === 'solved') {
  await page.click('#submit-form')
}
```
Tips for Improving reCAPTCHA Success Rates
To improve the chances of solving reCAPTCHAs effectively, follow these practices:
- Use a pool of residential proxies to rotate IP addresses.
- Add short delays between solving attempts to simulate natural user behavior.
- Include error handling with exponential backoff retries.
- Maintain browser context across attempts to avoid unnecessary reinitializations.
Here’s how you can integrate error handling into your CAPTCHA-solving process:
```javascript
// Promise-based pause; page.waitForTimeout() was removed in Puppeteer v22
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms))

const solveCaptcha = async (page, maxRetries = 3) => {
  let attempts = 0
  while (attempts < maxRetries) {
    try {
      // Attempt CAPTCHA solution via the extension's solver button
      await page.click('.captcha-solver')
      // Solving can outlast the 30 s default timeout, so allow up to 2 minutes
      await page.waitForSelector('.captcha-solver[data-state="solved"]', { timeout: 120000 })
      return true
    } catch (error) {
      attempts++
      // Exponential backoff: wait 2 s, then 4 s, ... before the next attempt
      if (attempts < maxRetries) await sleep(2000 * 2 ** (attempts - 1))
    }
  }
  return false
}
```
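The delay in the retry loop above follows a fixed schedule. A common refinement is "full jitter": wait a random time between zero and an exponentially growing cap, so retries don't fall into a regular rhythm. This is a minimal sketch; `backoffDelay` and `retryWithBackoff` are hypothetical helpers, and the 2 s base and 30 s ceiling are assumptions.

```javascript
// Full-jitter exponential backoff: a random delay in [0, min(maxMs, baseMs * 2^attempt))
function backoffDelay(attempt, { baseMs = 2000, maxMs = 30000, random = Math.random } = {}) {
  const cap = Math.min(maxMs, baseMs * 2 ** attempt)
  return Math.floor(random() * cap)
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms))

// Retry an async task (for example () => solveCaptcha(page)) until it
// resolves to a truthy value or the attempts run out
async function retryWithBackoff(task, { maxRetries = 3, ...delayOptions } = {}) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      const result = await task()
      if (result) return result
    } catch (error) {
      console.error(`Attempt ${attempt + 1} failed: ${error.message}`)
    }
    // No point waiting after the final attempt
    if (attempt < maxRetries - 1) await sleep(backoffDelay(attempt, delayOptions))
  }
  throw new Error(`Task failed after ${maxRetries} attempts`)
}
```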
A case study from DataScrape Solutions highlights the effectiveness of these methods. In March 2024, their use of 2Captcha with Puppeteer achieved a 95% decrease in manual CAPTCHA-solving time and boosted data extraction rates by 60% when processing over 1 million CAPTCHAs monthly [2].
Image CAPTCHA Recognition Methods
Image CAPTCHAs are designed to challenge automated systems. However, with the right tools, OCR and image processing techniques can effectively solve these puzzles.
Types of Image CAPTCHAs
- Text-based Images: These include distorted characters with varying fonts and complex backgrounds.
- Object Recognition: Involves identifying specific objects from a set of options.
- Pattern Matching: Requires users to match or identify visual patterns.
Now, let’s dive into OCR methods specifically designed for text-based CAPTCHAs.
Using OCR for CAPTCHA Text
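Tesseract OCR is a widely used open-source engine for recognizing text in images. The example below preprocesses a text-based CAPTCHA image with sharp and passes it to Tesseract through node-tesseract-ocr, a wrapper around the Tesseract command-line tool (which must be installed separately):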
```javascript
// node-tesseract-ocr shells out to the tesseract CLI, which must be installed
const tesseract = require('node-tesseract-ocr')
const sharp = require('sharp')

async function solveCaptcha(imageBuffer) {
  // Preprocess the image to improve OCR performance:
  // convert to grayscale, then turn every pixel black or white at level 150
  const processedImage = await sharp(imageBuffer)
    .grayscale()
    .threshold(150)
    .toBuffer()

  const config = {
    lang: 'eng',
    oem: 1, // LSTM engine only
    psm: 7  // treat the image as a single line of text
  }
  return await tesseract.recognize(processedImage, config)
}
```
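Tesseract's raw output usually needs tidying, and the image has to be captured from the page first. A minimal sketch, assuming the CAPTCHA is an element you can target with a selector; `readCaptcha` and `cleanOcrText` are hypothetical names, the look-alike substitutions only make sense for digits-only CAPTCHAs, and `recognize` is meant to be the `solveCaptcha(imageBuffer)` function above. `ElementHandle.screenshot()` is a standard Puppeteer method; newer versions return a `Uint8Array`, so the result is wrapped in `Buffer.from`.

```javascript
// Normalize raw OCR text. With digitsOnly, common letter/digit confusions are
// mapped to digits first; the mapping is an assumption, tune it per CAPTCHA
function cleanOcrText(raw, { digitsOnly = false, expectedLength } = {}) {
  let text = raw.replace(/\s+/g, '')
  if (digitsOnly) {
    text = text
      .replace(/[OoDQ]/g, '0')
      .replace(/[Il|]/g, '1')
      .replace(/S/g, '5')
      .replace(/[^0-9]/g, '')
  } else {
    text = text.replace(/[^A-Za-z0-9]/g, '')
  }
  // A wrong length usually means a misread; return null so the caller can retry
  if (expectedLength !== undefined && text.length !== expectedLength) return null
  return text
}

// Screenshot only the CAPTCHA element and run it through the OCR function
async function readCaptcha(page, selector, recognize, options) {
  const element = await page.$(selector)
  if (!element) throw new Error(`No element matches ${selector}`)
  const imageBuffer = Buffer.from(await element.screenshot())
  const raw = await recognize(imageBuffer)
  return cleanOcrText(raw, options)
}

// Example: const text = await readCaptcha(page, '.captcha-image', solveCaptcha, { expectedLength: 5 })
```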
Fine-tuning image properties during preprocessing plays a crucial role in boosting recognition accuracy.
Improving Image Recognition Accuracy
Enhancing contrast and brightness can significantly improve OCR results. Here’s an example of adjusting these settings dynamically:
```javascript
async function enhanceCaptchaRecognition(page) {
  const dataUrl = await page.evaluate(() => {
    const img = document.querySelector('.captcha-image')
    const canvas = document.createElement('canvas')
    // Size the canvas to the image; the 300x150 default would crop it
    canvas.width = img.naturalWidth
    canvas.height = img.naturalHeight
    const ctx = canvas.getContext('2d')
    // The filter applies to everything drawn after it is set
    ctx.filter = 'contrast(150%) brightness(120%)'
    ctx.drawImage(img, 0, 0)
    // Throws if the image is cross-origin without CORS headers (tainted canvas)
    return canvas.toDataURL()
  })
  // Decode the PNG data URL into a Buffer that sharp and Tesseract can read
  return Buffer.from(dataUrl.split(',')[1], 'base64')
}
```
For instance, a project targeting the Taiwan railway booking website achieved a 98.84% accuracy rate for single digits and an overall accuracy of 91.13% [1]. Similarly, deep learning methods have proven effective for image-based CAPTCHAs. One TensorFlow-based model, leveraging a convolutional neural network, reached a 90% success rate [1]. Experimenting with preprocessing techniques, such as tweaking contrast, brightness, and thresholds, can further improve results based on the specific traits of each CAPTCHA type.
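One way to run that experiment systematically is to OCR the same image at several thresholds and let the readings vote. In the sketch below, `recognizeAt(threshold)` is a hypothetical callback you supply (for example, the sharp and Tesseract pipeline above with the threshold made a parameter), `isValid` encodes what a correct answer looks like, and the default thresholds are arbitrary starting points.

```javascript
// OCR the same CAPTCHA at several binarization thresholds and return the
// valid reading produced most often (ties go to the earliest threshold).
// Example recognizeAt, assuming sharp, tesseract and config from above:
//   const recognizeAt = async (t) =>
//     tesseract.recognize(await sharp(imageBuffer).grayscale().threshold(t).toBuffer(), config)
async function sweepThresholds(recognizeAt, isValid, thresholds = [110, 130, 150, 170, 190]) {
  const votes = new Map() // reading -> { count, firstThreshold }
  for (const threshold of thresholds) {
    const text = (await recognizeAt(threshold)).trim()
    if (!isValid(text)) continue
    const entry = votes.get(text) || { count: 0, firstThreshold: threshold }
    entry.count++
    votes.set(text, entry)
  }
  let best = null
  for (const [text, { count, firstThreshold }] of votes) {
    if (best === null || count > best.count) best = { text, count, firstThreshold }
  }
  return best // null when no threshold produced a valid reading
}
```

Voting costs several OCR passes per image, so it suits CAPTCHAs where a wrong answer is more expensive than the extra processing time.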
CAPTCHA Script Performance
Creating reliable CAPTCHA-solving scripts requires strong error handling, IP rotation, and performance tweaks. Once you've set up CAPTCHA-solving techniques, focusing on script efficiency is the next step.
Error Recovery Systems
Good error handling is key to keeping your script stable. Here's an example that retries on failure:
```javascript
// Promise-based pause; page.waitForTimeout() was removed in Puppeteer v22
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function handleCaptchaSolution(page) {
  const MAX_RETRIES = 3;
  let attempts = 0;
  // Synchronous setter, so it only needs to run once
  page.setDefaultNavigationTimeout(30000);
  while (attempts < MAX_RETRIES) {
    try {
      // solveCaptcha(page) is the extension-based solver from the reCAPTCHA section
      const result = await solveCaptcha(page);
      return result;
    } catch (error) {
      if (error.name === 'TimeoutError') {
        // Puppeteer's timeout errors carry the name 'TimeoutError'
        console.error(`Attempt ${attempts + 1}: CAPTCHA timeout`);
      } else if (/net::ERR_/.test(error.message)) {
        // Puppeteer has no NetworkError class; Chrome reports failures as net::ERR_* messages
        console.error(`Attempt ${attempts + 1}: Network failure`);
      } else {
        console.error(`Attempt ${attempts + 1}: ${error.message}`);
      }
      attempts++;
      await sleep(2000 * attempts);
    }
  }
  throw new Error('Maximum retry attempts exceeded');
}
```
This approach handles timeouts and network issues with incremental retries, ensuring your script doesn't crash unexpectedly.
IP and Browser Rotation
Rotating IPs and browser fingerprints helps avoid detection. Here's how you can use puppeteer-extra plugins for this purpose:
```javascript
const puppeteer = require('puppeteer-extra');
const StealthPlugin = require('puppeteer-extra-plugin-stealth');
const AnonymizeUAPlugin = require('puppeteer-extra-plugin-anonymize-ua');

puppeteer.use(StealthPlugin());
// Removes headless markers such as "HeadlessChrome" from the user agent
puppeteer.use(AnonymizeUAPlugin());

async function rotateIdentity() {
  const proxy = await getNextProxy(); // Your proxy rotation logic (placeholder)
  // Each call launches a fresh browser routed through the next proxy
  const browser = await puppeteer.launch({
    args: [`--proxy-server=${proxy.host}:${proxy.port}`]
  });
  return browser;
}
```
By rotating IPs and stripping headless markers from the user agent, your script looks more like ordinary browsing traffic, reducing the chances of being flagged.
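The `getNextProxy()` call above is left to you. One minimal way to fill it in, assuming a static list of proxies, is to rotate round-robin and bench any proxy that fails for a cooldown period. `ProxyPool` is a hypothetical class, the five-minute cooldown is an arbitrary default, and the addresses come from the 203.0.113.0/24 documentation range.

```javascript
// Round-robin proxy pool that skips proxies still cooling down after a failure
class ProxyPool {
  constructor(proxies, { cooldownMs = 5 * 60 * 1000, now = Date.now } = {}) {
    this.proxies = proxies
    this.cooldownMs = cooldownMs
    this.now = now
    this.index = 0
    this.failedAt = new Map() // "host:port" -> time of the last failure
  }

  key(proxy) {
    return `${proxy.host}:${proxy.port}`
  }

  // Next usable proxy, or null if every proxy is cooling down
  next() {
    for (let i = 0; i < this.proxies.length; i++) {
      const proxy = this.proxies[this.index]
      this.index = (this.index + 1) % this.proxies.length
      const failed = this.failedAt.get(this.key(proxy))
      if (failed === undefined || this.now() - failed >= this.cooldownMs) return proxy
    }
    return null
  }

  // Call when a proxy times out or keeps running into CAPTCHAs
  markFailed(proxy) {
    this.failedAt.set(this.key(proxy), this.now())
  }
}

const pool = new ProxyPool([
  { host: '203.0.113.10', port: 8080 },
  { host: '203.0.113.11', port: 8080 },
  { host: '203.0.113.12', port: 8080 }
])
// Resolves to null when all proxies are cooling down; check before launching
const getNextProxy = async () => pool.next()
```

Chrome's `--proxy-server` flag does not take credentials; if your proxies require them, Puppeteer's `page.authenticate({ username, password })` is the usual way to supply them.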
Performance Optimization
Boost your script's efficiency and success rate with the following techniques:
- Resource Management
Stop unnecessary resource downloads like images, stylesheets, or fonts:
```javascript
await page.setRequestInterception(true);
page.on('request', (request) => {
  // Caution: this also blocks CAPTCHA images and widget styles, so exempt
  // them on pages where a CAPTCHA has to be displayed and solved
  if (['image', 'stylesheet', 'font'].includes(request.resourceType())) {
    request.abort();
  } else {
    request.continue();
  }
});
```
- Parallel Processing
Use puppeteer-cluster to solve multiple CAPTCHAs at the same time:
```javascript
const { Cluster } = require('puppeteer-cluster');

// Run inside an async function
const cluster = await Cluster.launch({
  concurrency: Cluster.CONCURRENCY_CONTEXT, // each job gets its own incognito context
  maxConcurrency: 4,
  monitor: true
});

await cluster.task(async ({ page, data: url }) => {
  await page.goto(url);
  await handleCaptchaSolution(page);
});

// Queue target pages, then wait for every task to finish
cluster.queue('https://example.com/form'); // placeholder URL
await cluster.idle();
await cluster.close();
```
- Caching Strategy
Cache responses to avoid redundant requests and save processing time:
```javascript
const cache = new Map();

// Return a cached response if it is under an hour old; otherwise fetch a fresh one.
// fetchResponse(url) is a placeholder for your own request logic.
async function getCachedResponse(url) {
  if (cache.has(url)) {
    const { timestamp, data } = cache.get(url);
    if (Date.now() - timestamp < 3600000) { // 1-hour cache
      return data;
    }
  }
  const response = await fetchResponse(url);
  cache.set(url, { timestamp: Date.now(), data: response });
  return response;
}
```
These methods work together to reduce resource usage, improve speed, and handle multiple tasks efficiently.
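Blocking every image has a catch on CAPTCHA pages: the challenge images and the widget's own assets are images and stylesheets too. The sketch below keeps the resource-management rule but exempts hosts that serve CAPTCHA assets. The host list is an assumption (domains reCAPTCHA and hCaptcha commonly load from, not an authoritative list); for a site-hosted image CAPTCHA like the `.captcha-image` example earlier, you would add that site's host. `shouldBlock` and `enableBlocking` are hypothetical helper names.

```javascript
// Hosts whose images, styles and fonts must load for the CAPTCHA to render
const CAPTCHA_HOSTS = ['www.google.com', 'www.gstatic.com', 'www.recaptcha.net', 'hcaptcha.com']
const BLOCKED_TYPES = new Set(['image', 'stylesheet', 'font'])

function shouldBlock(resourceType, url) {
  if (!BLOCKED_TYPES.has(resourceType)) return false
  let hostname
  try {
    hostname = new URL(url).hostname
  } catch {
    return false // leave anything we cannot parse alone
  }
  // Match the host itself or any subdomain of it (e.g. imgs.hcaptcha.com)
  const isCaptchaAsset = CAPTCHA_HOSTS.some((host) => hostname === host || hostname.endsWith(`.${host}`))
  return !isCaptchaAsset
}

// Same interception wiring as the resource-management snippet, using the rule above
async function enableBlocking(page) {
  await page.setRequestInterception(true)
  page.on('request', (request) => {
    if (shouldBlock(request.resourceType(), request.url())) request.abort()
    else request.continue()
  })
}
```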
Conclusion and Implementation Guide
CAPTCHA Solution Overview
Handling CAPTCHAs effectively involves a layered strategy focused on prevention. By using tools like stealth techniques, optimized headers, and rotating IPs, you can reduce the chances of CAPTCHAs being triggered in the first place. Preventing a CAPTCHA is generally cheaper and more reliable than solving it after it appears.
Latenode Browser Automation
Latenode makes CAPTCHA management easier with built-in features like stealth mode, proxy rotation, and cookie handling.
Here's an example of how you can set it up:
```javascript
// Illustrative configuration; check Latenode's documentation for the exact API
const workflow = new LatenodeWorkflow({
  browserOptions: { stealth: true, proxyRotation: true, cookieManagement: true }
});
await workflow.initBrowser({
  captchaHandling: { prevention: true, autoRetry: true, maxAttempts: 3 }
});
```
Next Steps for Implementation
To enhance your automation workflow, consider these steps:
- Enable Stealth Mode: Use Puppeteer-extra stealth plugins to lower the chances of triggering CAPTCHAs.
- Set Up Error Recovery: Add error recovery mechanisms to handle different CAPTCHA types. Use automatic retries with strategies like exponential backoff for smoother operation.
- Improve Resource Efficiency: Reduce script execution time by selectively loading resources and using caching, ensuring better performance without sacrificing success rates.