March 3, 2025 • 8 min read

How Headless Browser Detection Works and How to Bypass It

George Miloradovich
Researcher, Copywriter & Use Case Interviewer

Headless browsers are powerful tools for automation, testing, and web scraping. However, websites have advanced methods to detect and block them. Here's a quick overview of how detection works and ways to bypass it:

How Websites Detect Headless Browsers

  1. Browser-Side Techniques:
    • User Agent Analysis: Detects unusual or inconsistent browser identifiers.
    • JavaScript Execution: Flags missing or modified JavaScript features.
    • Canvas Fingerprinting: Identifies unique graphics rendering signatures.
    • Permission States: Checks for anomalies in browser permissions.
    • Plugin Detection: Looks for missing standard plugins.
  2. Server-Side Techniques:
    • Request Pattern Analysis: Tracks timing and frequency of requests.
    • Header Examination: Examines HTTP headers for inconsistencies.
    • IP Behavior Tracking: Flags suspicious IP activity or proxy usage.
    • Browser Fingerprinting: Combines multiple signals to create unique identifiers.

How to Bypass Detection

  • Modify Browser Settings:
    • Use common User Agents.
    • Adjust window size and viewport to match standard devices.
    • Disable automation flags (e.g., --disable-blink-features=AutomationControlled).
  • Use Anti-Detection Tools:
    • Tools like Puppeteer Stealth and ZenRows can mimic real user behavior.
    • Features include fingerprint modifications, proxy rotation, and interaction simulation.
  • Optimize IP and User Agent Rotation:
    • Rotate IPs and User Agents based on time, location, and device type.
    • Use residential proxies for better authenticity.
  • Handle CAPTCHAs:
    • Use CAPTCHA-solving tools like 2Captcha or Anti-Captcha.
    • Add delays and session management to reduce CAPTCHA triggers.

Quick Comparison Table

| Detection Method | What It Checks | Bypass Strategy |
| --- | --- | --- |
| User Agent Analysis | Browser identifiers | Use common User Agent strings |
| JavaScript Execution | JavaScript environment | Ensure full JavaScript support |
| Canvas Fingerprinting | Graphics rendering signatures | Use anti-fingerprinting tools |
| Request Pattern Analysis | Timing/frequency of requests | Add random delays and spread requests |
| IP Behavior Tracking | Proxy or VPN usage | Rotate residential IPs |

Web scraping and automation require careful configuration to avoid detection. By understanding how detection works and using ethical bypass methods, you can minimize risks while staying compliant with website policies.


Detection Methods Used by Websites

Modern websites use both browser-side and server-side techniques to identify and block headless browsers. Here's a closer look at how these methods work.

Browser-Side Detection

This approach spots inconsistencies in browser properties and behaviors, the telltale differences between headless setups and standard browsers.

| Detection Method | What It Checks | Why It Works |
| --- | --- | --- |
| User Agent Analysis | Browser identifiers | Headless browsers often use unusual or inconsistent user agents |
| JavaScript Execution | JavaScript environment | Headless setups may lack or modify standard JavaScript features |
| Canvas Fingerprinting | Graphics rendering | Headless browsers can produce distinct rendering signatures |
| Permission States | Browser permissions | Headless browsers often mishandle Notification.permission states |
| Plugin Detection | Available plugins | Headless browsers usually don't include standard browser plugins |
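
To make these checks concrete, here is a minimal sketch of the kinds of client-side probes a detection script might run. Real systems combine dozens of signals and score them together; these are just well-known examples, and no single one proves automation:

// Illustrative client-side probes; each signal alone proves nothing.
const signals = {
  webdriver: navigator.webdriver === true,                 // set under automation
  headlessUA: /HeadlessChrome/.test(navigator.userAgent),  // old headless UA marker
  noPlugins: navigator.plugins.length === 0,               // real desktops expose plugins
};

// Classic permission-state inconsistency: headless Chrome has reported
// Notification.permission as 'denied' while the Permissions API says 'prompt'.
navigator.permissions.query({ name: 'notifications' }).then((status) => {
  signals.permissionMismatch =
    Notification.permission === 'denied' && status.state === 'prompt';
});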

Companies like Fingerprint Pro use over 70 browser signals to generate unique identifiers. Their method combines various fingerprinting techniques to identify users effectively:

"Browser fingerprinting is the foundation in which device intelligence is built, enabling businesses to uniquely identify website visitors on websites around the world." – Fingerprint Pro

Server-Side Detection

Server-side detection looks at request patterns and network behaviors to identify suspicious activity. Here are some common strategies:

  1. Request Pattern Analysis: Servers track the timing and frequency of requests, as human users typically show natural variation (a sketch follows this list).
  2. Header Examination: HTTP headers are analyzed for inconsistencies that might indicate a headless browser.
  3. IP Behavior Tracking: Systems flag unusual activity, such as multiple requests from a single IP, use of proxies or VPNs, or geographic mismatches.
  4. Browser Fingerprinting: Browser signals are compiled on the server side to create unique identifiers for visitors.
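
As a rough illustration of request-pattern analysis, the Node.js sketch below flags clients whose request intervals are suspiciously regular. The data structure, window size, and threshold are invented for the example:

// Hypothetical request-pattern check: near-constant gaps between
// requests from one IP suggest a script rather than a person.
const history = new Map(); // ip -> recent request timestamps

function looksAutomated(ip) {
  const times = history.get(ip) ?? [];
  times.push(Date.now());
  if (times.length > 20) times.shift(); // keep a sliding window
  history.set(ip, times);
  if (times.length < 5) return false;

  const gaps = times.slice(1).map((t, i) => t - times[i]);
  const mean = gaps.reduce((a, b) => a + b, 0) / gaps.length;
  const stdDev = Math.sqrt(
    gaps.reduce((a, b) => a + (b - mean) ** 2, 0) / gaps.length
  );
  return stdDev < 50; // ms – arbitrary cutoff; human timing varies far more
}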

These techniques, when combined, help websites detect and block non-human traffic effectively.

Safe Ways to Reduce Detection

Once you understand detection methods, you can take specific steps to minimize detection risks. These strategies align your technical setup with typical user behavior, making it harder for systems to flag automation.

Browser Settings Changes

Adjusting your browser settings can help it behave more like a regular user's browser.

| Setting Type | Recommended Change | Impact |
| --- | --- | --- |
| User Agent | Use a common browser string | Masks automation signatures |
| Window Size | Set standard resolutions (e.g., 1920x1080) | Imitates real desktop displays |
| WebDriver | Disable automation flags | Reduces detectable signals |
| Viewport | Enable mobile emulation when needed | Matches device-specific behavior |

For instance, Chrome's --disable-blink-features=AutomationControlled flag stops the browser from exposing navigator.webdriver = true, one of the most commonly checked automation signals, while leaving normal page functionality intact.
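
The same settings can be applied through Puppeteer. A minimal sketch, where the User Agent string is just one example of a common desktop Chrome identifier:

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch({
    headless: true,
    args: [
      '--disable-blink-features=AutomationControlled', // hides navigator.webdriver
      '--window-size=1920,1080',                       // common desktop resolution
    ],
  });
  const page = await browser.newPage();
  await page.setViewport({ width: 1920, height: 1080 });
  await page.setUserAgent(
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 ' +
      '(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36'
  );
  // ...scrape as usual...
  await browser.close();
})();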

Anti-Detection Tools

Tools like Puppeteer Stealth, equipped with 17 evasion modules, provide advanced methods for ethical automation. Similarly, ZenRows achieves a 98.7% success rate in bypassing anti-bot measures while adhering to website policies.

Some key features of these tools include:

  • Modifying browser fingerprints
  • Adjusting request headers
  • Rotating proxies
  • Simulating mouse movements (sketched below)
  • Mimicking keyboard input patterns

"The ZenRows Scraping Browser fortifies your Puppeteer browser instance with advanced evasions to mimic an actual user and bypass anti-bot checks."

IP and User Agent Changes

After optimizing your browser and tools, focus on rotating IP addresses and User Agents to replicate natural browsing patterns. Here are some effective techniques:

  • Time-based rotation: Change User Agents based on typical daily usage patterns, increasing frequency during peak hours and spacing out requests to appear more organic.
  • Geographic alignment: Use IP addresses and User Agents that match the region you're targeting. For example, when accessing U.S.-based services, select User Agents resembling popular American browsers.
  • Device-specific selection: Match User Agents to the type of content you're accessing. For mobile-optimized pages, use mobile browser signatures to maintain consistency.

For example, an online retailer implemented these strategies and saw a 40% reduction in costs along with a 25% improvement in data accuracy.
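
A minimal sketch of device-aware rotation, assuming a small hand-maintained pool (the strings are truncated placeholders; use full, current ones in practice):

// Hypothetical UA pools keyed by device type; keep these up to date.
const uaPools = {
  desktop: [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) ... Chrome/120.0.0.0 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) ... Safari/605.1.15',
  ],
  mobile: [
    'Mozilla/5.0 (iPhone; CPU iPhone OS 17_0 like Mac OS X) ... Mobile/15E148',
  ],
};

function pickUserAgent(deviceType = 'desktop') {
  const pool = uaPools[deviceType] ?? uaPools.desktop;
  return pool[Math.floor(Math.random() * pool.length)];
}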


Setting Up Detection Bypasses

To reduce detection risks, configure your browser and tools to imitate regular user behavior effectively.

Adjusting Chrome Settings

Tweak Chrome settings to lower the chances of detection. Here are key parameters to configure:

| Setting | Command Flag | Purpose |
| --- | --- | --- |
| Automation Control | --disable-blink-features=AutomationControlled | Masks automation signals |
| Window Size | --window-size=1920,1080 | Aligns with standard desktop resolutions |
| User Agent | --user-agent="Mozilla/5.0 ..." | Mimics a standard browser's identity |

To launch Chrome with these settings, use the following command:

chrome --headless --disable-blink-features=AutomationControlled --window-size=1920,1080

Once Chrome is properly configured, enhance concealment further using specialized tools.

Leveraging Puppeteer Stealth


Puppeteer Stealth is a tool that modifies browser properties to obscure automation signals, bundling multiple evasion modules. Here's how to set it up:

const puppeteer = require('puppeteer-extra');
const StealthPlugin = require('puppeteer-extra-plugin-stealth');

// Register the stealth plugin before creating any browser instances
puppeteer.use(StealthPlugin());
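
From there, launch and browse as usual; a short usage sketch with a placeholder URL:

(async () => {
  // The plugin patches navigator.webdriver, plugins, languages, and more.
  const browser = await puppeteer.launch({ headless: true });
  const page = await browser.newPage();
  await page.goto('https://example.com');
  await browser.close();
})();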

As noted in the Puppeteer Stealth documentation:

"It's probably impossible to prevent all ways to detect headless chromium, but it should be possible to make it so difficult that it becomes cost-prohibitive or triggers too many false-positives to be feasible." - Puppeteer Stealth documentation

Strategies for Handling CAPTCHAs

Beyond browser setup, CAPTCHAs often require dedicated solutions. Modern CAPTCHA-solving services provide varying levels of efficiency and pricing:

| Service | Cost per 1,000 CAPTCHAs | Features |
| --- | --- | --- |
| 2Captcha | $0.77 | Basic CAPTCHA solving |
| DeathByCaptcha | $1.39 | AI + human solvers |
| Anti-Captcha | $1.00 | Supports automation tools |

For example, Adrian Rosebrock demonstrated an AI-based CAPTCHA bypass for the E-ZPass New York website by training a model on hundreds of CAPTCHA images.

Here’s how to approach CAPTCHAs:

  • Start by optimizing browser configurations to avoid them when possible.
  • Use session management to maintain a consistent user identity.
  • Add random delays between requests to imitate human browsing patterns (see the sketch after this list).
  • Employ residential proxies to spread requests naturally across different locations.
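
The delay and session points translate directly into code. A hedged sketch, assuming an existing Puppeteer page object; the 5–10 second range matches the pacing guidance later in this article:

// Pause for a random interval to imitate human pacing.
const randomDelay = (minMs, maxMs) =>
  new Promise((resolve) =>
    setTimeout(resolve, minMs + Math.random() * (maxMs - minMs))
  );

// Reusing one page keeps cookies and storage consistent, which
// tends to trigger fewer CAPTCHA challenges than fresh sessions.
async function visitWithPacing(page, urls) {
  for (const url of urls) {
    await page.goto(url);
    await randomDelay(5000, 10000); // 5–10 s between requests
  }
}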

Guidelines and Rules

Before starting any web scraping activity, it's crucial to ensure compliance with legal standards. Here's a quick breakdown:

| Requirement | Description | Impact |
| --- | --- | --- |
| Terms of Service | Rules set by the website regarding automation | May restrict or forbid automated access |
| Data Protection | Laws like GDPR or other privacy regulations | Influences how data can be collected and stored |
| Access Rates | Limits in robots.txt or specified terms | Defines how frequently requests can be made |

Meeting Website Rules

Stick to these practices to stay within the boundaries of acceptable use:

  • Request Rate Management: Space out your requests by 5–10 seconds to simulate human browsing and avoid detection.
  • Robots.txt Compliance: Always check and adhere to the instructions in a website's robots.txt file (a programmatic check is sketched after this list).
  • Data Usage Guidelines: Only collect data in accordance with the website's acceptable use policies.
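
To make the robots.txt check concrete, here is a small sketch using the robots-parser npm package (one option among several) and Node 18's built-in fetch:

const robotsParser = require('robots-parser');

async function isCrawlAllowed(targetUrl, userAgent) {
  const robotsUrl = new URL('/robots.txt', targetUrl).href;
  const res = await fetch(robotsUrl);
  if (!res.ok) return true; // no robots.txt found – a policy decision, not a rule
  const robots = robotsParser(robotsUrl, await res.text());
  return robots.isAllowed(targetUrl, userAgent) ?? true;
}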

Other Automation Options

If you're encountering challenges with detection or access, consider these alternatives to traditional headless browsers:

| Alternative | Benefits | Best Use Case |
| --- | --- | --- |
| Official APIs | Provides structured, documented data access | When the website offers API functionality |
| RSS Feeds | Lightweight and authorized updates | Ideal for content monitoring or aggregation |
| Data Partnerships | Offers authorized, reliable access | Suitable for large-scale data needs |

To enhance security and ensure compliance, isolate your headless environments and enforce strict access controls. When automation is unavoidable, use rotating IP addresses and introduce delays between requests to maintain responsible access patterns. These adjustments help balance efficient scraping with ethical practices.

Summary

This section highlights the technical methods and ethical strategies discussed earlier.

Detection Methods Review

Websites today rely on advanced techniques to identify headless browsers. Fingerprinting has become a primary method, surpassing traditional cookie-based tracking. It's worth noting that automated bots account for about 25% of all website traffic.

| Detection Layer | Key Techniques | Common Indicators |
| --- | --- | --- |
| Browser-side | Fingerprinting, JavaScript checks | Signs of automation |
| Server-side | Traffic analysis, IP examination | Request timing, proxy usage |
| Behavioral | Interaction tracking, navigation analysis | Click patterns, scrolling behavior |

These insights lay the groundwork for implementing safer bypass techniques.

Safe Bypass Methods

Consider these practical strategies to avoid detection:

| Strategy | Implementation | Effectiveness |
| --- | --- | --- |
| Stealth Tools | Tools like Undetected Chromedriver or Puppeteer-Stealth | Effective for evading basic detection |
| Request Timing | Introducing 5–10 second delays | Mimics human browsing patterns |
| Proxy Rotation | Using residential IPs with location alignment | Reduces chances of being blocked |

Combining these techniques can help your automation efforts remain under the radar.

Next Steps

  1. Choose Tools: Opt for stealth tools such as Undetected Chromedriver or Puppeteer-Stealth.
  2. Set Up Configuration: Use browser.createIncognitoBrowserContext() for session isolation (sketched below), enable WebRTC leak protection, and align timezone and language settings with your proxy's location.
  3. Optimize Resources: Apply throttling, cache data to reduce redundant requests, and spread tasks across multiple IPs to evenly distribute the load.
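
For step 2, a minimal sketch of session isolation; note that recent Puppeteer releases rename this API to browser.createBrowserContext():

const puppeteer = require('puppeteer-extra');

(async () => {
  const browser = await puppeteer.launch({ headless: true });
  // Each incognito context gets its own cookies and storage.
  const context = await browser.createIncognitoBrowserContext();
  const page = await context.newPage();
  await page.goto('https://example.com');
  await context.close(); // discards the session's state
  await browser.close();
})();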
