Latenode

Strategies for Bypassing Cloudflare Protection with Puppeteer

Learn effective techniques to ethically bypass Cloudflare protections using Puppeteer, including proxy management and human behavior simulation.

Raian

Cloudflare's security measures make it tough for bots to access websites. With Puppeteer, you can get past many of these defenses, provided you do so ethically. Here's a quick guide to get started:

  • Core Techniques:

    • Use residential proxies and rotate them to avoid detection.
    • Mimic human behavior with random delays and mouse movements.
    • Handle JavaScript challenges and CAPTCHAs using plugins like puppeteer-extra-plugin-recaptcha.
    • Disguise automation by tweaking browser fingerprints and user agents.
  • Puppeteer Setup:

    • Install Puppeteer and plugins:

      npm install puppeteer puppeteer-extra puppeteer-extra-plugin-stealth
      
    • Add stealth settings to avoid detection:

      const puppeteer = require('puppeteer-extra');
      const StealthPlugin = require('puppeteer-extra-plugin-stealth');
      puppeteer.use(StealthPlugin());
      
  • Legal Reminder: Always respect website terms of service, avoid overloading servers, and use automation responsibly.

These steps can improve your success rate against Cloudflare's protections. Dive into the article for detailed code snippets and advanced techniques.

Puppeteer Setup Guide

Follow these steps to configure Puppeteer with custom settings and plugins to navigate around Cloudflare protections.

Basic Puppeteer Installation Steps

First, make sure you have Node.js v18 or newer installed. Then, run the following command to install Puppeteer and its related plugins:

npm install puppeteer puppeteer-extra puppeteer-extra-plugin-stealth

Next, create a new JavaScript file and import the necessary modules:

const puppeteer = require('puppeteer-extra');
const StealthPlugin = require('puppeteer-extra-plugin-stealth');
puppeteer.use(StealthPlugin());

After this, you'll need to tweak the browser launch settings to avoid detection.

Required Puppeteer Settings

Set up your browser instance with configurations that lower the chance of being flagged:

// Run inside an async function (or an ES module with top-level await)
const browser = await puppeteer.launch({
    headless: false,
    args: [
        '--no-sandbox',                 // needed in some containers; weakens isolation
        '--disable-setuid-sandbox',
        '--disable-infobars',
        '--window-position=0,0',
        '--ignore-certificate-errors',  // skips TLS validation; avoid outside testing
        '--ignore-certificate-errors-spki-list'
    ]
});

If you want to use a proxy to mask your IP, include these additional settings:

const browser = await puppeteer.launch({
    args: [
        '--proxy-server=http://proxy-address:port'
    ]
});

// Credentials are supplied per page, before the first navigation
const page = await browser.newPage();
await page.authenticate({
    username: 'proxy-username',
    password: 'proxy-password'
});
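
Chromium ignores credentials embedded in the `--proxy-server` value, which is why the snippet above passes them through `page.authenticate`. If your provider hands you a single URL with the credentials inline, a small helper can split it. This is only a minimal sketch: `splitProxyUrl` and the example URL are illustrative names, not part of Puppeteer.

```javascript
// Hypothetical helper: split "http://user:pass@host:port" into the pieces
// Puppeteer needs. Uses only the built-in WHATWG URL class.
function splitProxyUrl(proxyUrl) {
    const url = new URL(proxyUrl);
    return {
        // Value for the --proxy-server flag, without credentials
        server: `${url.protocol}//${url.host}`,
        // Object in the shape page.authenticate() expects
        credentials: {
            username: decodeURIComponent(url.username),
            password: decodeURIComponent(url.password)
        }
    };
}

// Placeholder URL for illustration only
const proxy = splitProxyUrl('http://proxy-username:proxy-password@proxy.example.com:8080');
console.log(proxy.server, proxy.credentials.username);
```

`proxy.server` would then go into the `args` array as `--proxy-server=${proxy.server}`, and `proxy.credentials` would be passed to `page.authenticate`.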

Useful Puppeteer Add-ons

Boost Puppeteer's functionality with these plugins:

Plugin Name | Purpose | Key Features
puppeteer-extra-plugin-stealth | Anti-detection | Patches common headless and automation fingerprint signals
puppeteer-extra-plugin-recaptcha | CAPTCHA handling | Sends reCAPTCHA and hCaptcha challenges to a solving service
puppeteer-extra-plugin-adblocker | Resource management | Blocks ads and trackers to cut page weight and third-party requests

To integrate these add-ons, use the following code:

const puppeteer = require('puppeteer-extra');
const StealthPlugin = require('puppeteer-extra-plugin-stealth');
const AdblockerPlugin = require('puppeteer-extra-plugin-adblocker');

puppeteer.use(StealthPlugin());
puppeteer.use(AdblockerPlugin({ blockTrackers: true }));

Finally, to make your actions look more natural, introduce random delays between them:

const delay = ms => new Promise(resolve => setTimeout(resolve, ms));
await delay(Math.random() * 1000 + 1000); // Random delay between 1-2 seconds

Methods to Handle Cloudflare Security

Enhance your Puppeteer setup with these techniques to better navigate Cloudflare's defenses.

Browser Identity Management

Cloudflare's anti-bot system monitors browser fingerprints and automation signals. To disguise Puppeteer's activity, tweak browser identifiers and properties as shown below:

const browser = await puppeteer.launch({
    args: [
        '--window-size=1920,1080',
        // Keep this in line with the Chrome version Puppeteer actually launches
        '--user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36'
    ],
    ignoreDefaultArgs: ['--enable-automation']
});

const page = await browser.newPage();
await page.evaluateOnNewDocument(() => {
    // webdriver is defined on the prototype, so deleting it on the
    // navigator instance itself has no effect
    delete Object.getPrototypeOf(navigator).webdriver;
    Object.defineProperty(navigator, 'plugins', {
        get: () => [1, 2, 3, 4, 5]
    });
});

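This configuration removes the most obvious automation indicators and presents a more typical browser fingerprint, which helps with Cloudflare's checks. The stealth plugin already patches navigator.webdriver and navigator.plugins on its own, so treat these manual overrides as optional.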

JavaScript Challenge Solutions

Cloudflare's JavaScript challenges run in the page itself and usually clear on their own in a real browser, so the main job is to wait for them to finish before interacting. For instance, you can wait for the challenge form to disappear and introduce random delays between actions:

// Wait until the Cloudflare challenge form is gone
await page.waitForFunction(() => {
    return document.querySelector('#challenge-form') === null;
}, { timeout: 30000 });

// Add random delays to simulate human interaction
const randomDelay = (min, max) => {
    return Math.floor(Math.random() * (max - min + 1) + min);
};
// page.waitForTimeout() is gone from recent Puppeteer releases, so use a plain timer
await new Promise(resolve => setTimeout(resolve, randomDelay(1000, 3000)));

You can also create custom handlers to better mimic user behavior as needed. Next, you'll need a strategy for handling CAPTCHAs.

CAPTCHA Management Options

When faced with CAPTCHAs, a solving plugin can simplify the process. puppeteer-extra-plugin-recaptcha forwards reCAPTCHA and hCaptcha challenges to a paid provider such as 2captcha:

const RecaptchaPlugin = require('puppeteer-extra-plugin-recaptcha');
puppeteer.use(
    RecaptchaPlugin({
        provider: {
            id: '2captcha',
            token: 'your-api-key'
        }
    })
);

If the plugin fails to solve the CAPTCHA, you can switch to proxy rotation as a fallback:

try {
    // By default solveRecaptchas() reports failures in `error` instead of throwing
    const { error } = await page.solveRecaptchas();
    if (error) throw error;
} catch (e) {
    // Rotate to a new proxy if CAPTCHA solving fails. useNextProxy() is a
    // placeholder for your own helper; because --proxy-server is a launch flag,
    // it has to relaunch the browser with the next proxy.
    await useNextProxy();
}

These methods help you navigate CAPTCHA challenges and maintain access, even when automation tools encounter roadblocks.

Reliability Tips and Guidelines

Implementing reliable techniques is key to ensuring success in automation.

Proxy Setup and Usage

Well-managed proxies can reduce detection risk significantly. Residential proxies, for example, are typically flagged less often than datacenter IP ranges. Here's a setup example:

// Placeholders: substitute your provider's endpoint and credentials
const proxyAddress = 'http://proxy.example.com:8080';
const proxyUsername = 'proxy-username';
const proxyPassword = 'proxy-password';

const browser = await puppeteer.launch({
    args: [
        `--proxy-server=${proxyAddress}`,
        '--no-sandbox',
        '--disable-setuid-sandbox'
    ]
});

// Handle proxy authentication. page.authenticate() answers the proxy's
// 407 challenge; a Proxy-Authorization header added through request
// interception does not reach the proxy for HTTPS traffic.
const page = await browser.newPage();
await page.authenticate({
    username: proxyUsername,
    password: proxyPassword
});

Rotating proxies regularly also helps maintain uninterrupted access:

const proxyList = [
    'proxy1.example.com:8080',
    'proxy2.example.com:8080',
    'proxy3.example.com:8080'
];

function getNextProxy() {
    const proxy = proxyList.shift();
    proxyList.push(proxy);
    return proxy;
}
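
Plain round-robin keeps handing out a proxy even after it has started failing. One possible refinement, sketched here under the assumption that you track failures yourself, is to skip a proxy for a cooldown period after it fails. `ProxyPool`, `markFailed`, and the 60-second default are illustrative choices, not part of Puppeteer or the original setup.

```javascript
// Hypothetical proxy pool: round-robin rotation that skips proxies
// recently marked as failing. Class and method names are illustrative.
class ProxyPool {
    constructor(proxies, cooldownMs = 60_000) {
        this.proxies = [...proxies];
        this.cooldownMs = cooldownMs;
        this.index = 0;
        this.failedAt = new Map(); // proxy -> timestamp of last failure
    }

    // Record a failure so the proxy is skipped until the cooldown passes
    markFailed(proxy, now = Date.now()) {
        this.failedAt.set(proxy, now);
    }

    // Return the next usable proxy, or null if all are cooling down
    next(now = Date.now()) {
        for (let i = 0; i < this.proxies.length; i++) {
            const proxy = this.proxies[this.index];
            this.index = (this.index + 1) % this.proxies.length;
            const failed = this.failedAt.get(proxy);
            if (failed === undefined || now - failed >= this.cooldownMs) {
                return proxy;
            }
        }
        return null;
    }
}

const pool = new ProxyPool([
    'proxy1.example.com:8080',
    'proxy2.example.com:8080',
    'proxy3.example.com:8080'
]);
pool.markFailed('proxy2.example.com:8080');
console.log(pool.next(), pool.next(), pool.next());
```

A `null` return is a signal to pause rather than retry immediately, since every proxy in the list has failed recently.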

Request Timing Control

Simulating human-like behavior can minimize detection risks. Here's how you can manage request timing effectively:

const simulateHumanBehavior = async (page) => {
    // Move the mouse in small steps rather than jumping to the target
    await page.mouse.move(100, 200);
    await page.mouse.move(150, 250, { steps: 10 });

    // Randomize typing speed (one random per-key delay of 50-150 ms)
    await page.keyboard.type('Hello World', {
        delay: Math.floor(Math.random() * (150 - 50) + 50)
    });

    // Add pauses (a plain timer; page.waitForTimeout() is gone from recent Puppeteer releases)
    await new Promise(resolve => setTimeout(
        resolve,
        Math.floor(Math.random() * (3000 - 1000) + 1000)
    ));
};
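
The mouse coordinates above are fixed, so every run traces the same line. As a rough sketch of how the path could vary between runs, the helper below interpolates between two points and nudges each intermediate point by a small random offset. `jitteredPath` and its default values are assumptions made for illustration.

```javascript
// Hypothetical helper: build a list of intermediate points between two
// coordinates, each nudged by a small random offset.
function jitteredPath(from, to, steps = 10, jitter = 3) {
    const points = [];
    for (let i = 1; i <= steps; i++) {
        const t = i / steps;
        const isLast = i === steps;
        // Linear interpolation plus jitter; the final point lands exactly on target
        const dx = isLast ? 0 : (Math.random() * 2 - 1) * jitter;
        const dy = isLast ? 0 : (Math.random() * 2 - 1) * jitter;
        points.push({
            x: from.x + (to.x - from.x) * t + dx,
            y: from.y + (to.y - from.y) * t + dy
        });
    }
    return points;
}

const path = jitteredPath({ x: 100, y: 200 }, { x: 150, y: 250 });
console.log(path.length, path[path.length - 1]);
```

Each point can then be passed to `page.mouse.move(point.x, point.y)` in a loop, with a short pause between moves.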

These techniques, combined with proper session handling, make automation efforts more reliable.

Cookie and Session Management

Managing cookies effectively is crucial for maintaining session continuity. Here's how you can store and restore sessions:

const fs = require('fs/promises');

// Store successful session cookies
const storeCookies = async (page) => {
    const cookies = await page.cookies();
    await fs.writeFile(
        'cookies.json',
        JSON.stringify(cookies, null, 2)
    );
};

// Restore previous session
const loadCookies = async (page) => {
    try {
        const cookiesString = await fs.readFile('cookies.json', 'utf8');
        const cookies = JSON.parse(cookiesString);
        await page.setCookie(...cookies);
    } catch (error) {
        console.log('No stored cookies found');
    }
};
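
Stored cookies can expire between runs, and restoring stale ones serves no purpose. As a minimal sketch, the helper below filters a saved cookie array before it is passed to `page.setCookie`. It assumes the cookie shape returned by `page.cookies()`, where `expires` is a Unix timestamp in seconds and `-1` marks a session cookie; `dropExpiredCookies` is an illustrative name.

```javascript
// Hypothetical helper: keep session cookies and cookies that have not expired.
// Assumes `expires` is in seconds since the Unix epoch, with -1 for session cookies.
function dropExpiredCookies(cookies, nowSeconds = Date.now() / 1000) {
    return cookies.filter(cookie =>
        cookie.expires === undefined ||
        cookie.expires === -1 ||
        cookie.expires > nowSeconds
    );
}

// Sample data for illustration only
const saved = [
    { name: 'session', value: 'abc', domain: 'example.com', expires: -1 },
    { name: 'old', value: 'x', domain: 'example.com', expires: 1 },
    { name: 'fresh', value: 'y', domain: 'example.com', expires: 4102444800 }
];
console.log(dropExpiredCookies(saved).map(c => c.name));
```

Inside `loadCookies`, the parsed array could be run through this filter just before the `page.setCookie(...cookies)` call.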

If cookies fail validation, you can refresh them automatically:

// targetUrl, clearCookies() and initializeNewSession() are placeholders
// for your own URL and helpers
const validateCookies = async (page) => {
    const response = await page.goto(targetUrl);
    // goto() can resolve to null, so guard before reading the status
    if (response && response.status() === 403) {
        await clearCookies(page);
        await initializeNewSession(page);
    }
};
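
Refreshing a session in a tight loop tends to make things worse if the site keeps answering with 403. One common approach is to space retries out with exponential backoff. The sketch below only computes the wait time; `backoffDelay` and its defaults are assumptions, not values from the source.

```javascript
// Hypothetical helper: delay before retry number `attempt` (0-based),
// doubling each time up to a cap, with random jitter.
function backoffDelay(attempt, baseMs = 1000, maxMs = 30000, jitter = Math.random()) {
    const exponential = Math.min(maxMs, baseMs * 2 ** attempt);
    // "Equal jitter": keep half the delay fixed, randomize the other half
    return Math.floor(exponential / 2 + (exponential / 2) * jitter);
}

for (let attempt = 0; attempt < 5; attempt++) {
    console.log(`retry ${attempt}: wait ${backoffDelay(attempt)} ms`);
}
```

The `jitter` parameter is exposed mainly so the function can be tested deterministically; in normal use the default random value applies.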

Conclusion

Let's wrap up by highlighting the main techniques and steps we've covered.

Key Techniques Recap

Getting past Cloudflare protection requires a mix of methods working together. The most important strategies include:

  • Using residential proxies with proper rotation
  • Managing browser fingerprints and user agents
  • Handling cookies and sessions effectively
  • Mimicking human behavior with random delays
  • Setting accurate request headers
  • Managing authentication correctly

According to the data cited in [1], combining these techniques can lead to success rates as high as 98.7% under optimized conditions. Treat that figure as a best case rather than a typical result.

By focusing on these core methods, you can build a reliable and compliant automation process.

Steps for Implementation

Here are some final tips to refine your Puppeteer setup and Cloudflare bypass efforts:

  • Begin with basic Puppeteer configurations
  • Rotate proxies thoughtfully
  • Set up strong error-handling mechanisms
  • Check the target site's terms of service before starting
  • Keep request rates moderate to avoid triggering defenses
  • Document your automation workflows
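
Keeping request rates moderate is easier with an explicit minimum gap between page loads. The following is a minimal sketch of that idea; `RequestPacer`, `pacedVisit`, and the `visit` callback are hypothetical names introduced for illustration.

```javascript
// Hypothetical pacer: enforces a minimum gap between requests.
class RequestPacer {
    constructor(minIntervalMs) {
        this.minIntervalMs = minIntervalMs;
        this.nextAllowedAt = 0;
    }

    // Returns how long to wait (ms) before sending, and reserves that slot
    reserve(now = Date.now()) {
        const startAt = Math.max(now, this.nextAllowedAt);
        this.nextAllowedAt = startAt + this.minIntervalMs;
        return startAt - now;
    }
}

const sleep = ms => new Promise(resolve => setTimeout(resolve, ms));

// Visit each URL in turn, never faster than the pacer allows.
// `visit` is whatever you do per URL, e.g. url => page.goto(url).
async function pacedVisit(pacer, urls, visit) {
    for (const url of urls) {
        await sleep(pacer.reserve());
        await visit(url);
    }
}
```

For example, `pacedVisit(new RequestPacer(2000), urls, url => page.goto(url))` would leave at least two seconds between the start of consecutive navigations.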

Puppeteer requires careful setup and regular tweaking to stay effective against evolving Cloudflare defenses. As security measures grow more advanced, success will depend on ongoing updates to your strategy and strict adherence to ethical practices. While initial configurations might work well, maintaining long-term performance means staying flexible and following clear guidelines.

Raian

Researcher, Nocode Expert
