Latenode

Browserless Chrome: A Powerful Tool for Browser Automation

Streamline browser automation with a cloud-based headless Chrome service that simplifies web scraping, PDF generation, and testing.

Raian

Browserless Chrome is a cloud-based service that simplifies browser automation by running headless Chrome for tasks like web scraping, PDF generation, and testing. It eliminates the need for local browser setups and manages browser crashes, session isolation, and resource optimization automatically. Key benefits include:

  • No Local Setup: Use Docker or cloud deployment to get started quickly.
  • Fast Performance: Screenshots in ~1 second, PDFs in ~2 seconds.
  • Bot Detection Bypass: Overcomes blockers like Cloudflare for reliable automation.
  • Resource Efficiency: Reduces proxy and virtual machine usage by up to 90%.

Quick Setup: Integrate with popular libraries like Puppeteer, Playwright, or Selenium using simple connection methods. Browserless also offers APIs for web scraping, JavaScript rendering, and custom automation workflows.

Quick Comparison of Features

Feature | Benefit | Example
Session Cleanup | Removes inactive sessions automatically | Keeps resources optimized
Bot Detection Prevention | Bypasses blockers like Cloudflare | Reliable for scraping tasks
Multi-task Processing | Handles concurrent requests effectively | 2M+ sessions processed weekly
Ready-to-use APIs | Simplifies automation tasks | JSON data extraction, PDFs

Browserless Chrome is ideal for developers and businesses looking to streamline automation without managing complex infrastructure.

Setup Guide

How to Install

To get started with Browserless Chrome, you can choose between two installation options: a local setup using Docker or a cloud deployment. For a local Docker setup, use the following command:

docker run -p 3000:3000 ghcr.io/browserless/chromium

This command pulls the latest image, starts a container, and exposes the service on port 3000 [3]. It works on Windows, macOS, and Linux, wherever Docker is available.

Initial Configuration

Browserless Chrome includes several built-in features to simplify the setup process:

Feature | Default Setting | Purpose
Session Cleanup | 30 seconds | Removes inactive sessions automatically
Health Checks | Enabled | Ensures system stability
Request Queue | Configurable | Manages multiple concurrent connections
Resource Limits | Adjustable | Controls memory and CPU usage

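You can customize the environment by setting these variables (names vary between Browserless versions, so confirm them against the documentation for the image you run):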

MAX_CONCURRENT_SESSIONS=10
CONNECTION_TIMEOUT=30000
MAX_QUEUE_LENGTH=100

Once configured, you can connect to Browserless using your preferred integration method.
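
To make the interplay between the concurrency and queue settings concrete, here is a minimal sketch of an admission gate. It is not how Browserless is implemented: the SessionGate class is hypothetical, and the choice to refuse requests once the queue is full is an assumption made for illustration.

```javascript
// Illustrative model only: SessionGate is a hypothetical class, not part of Browserless.
class SessionGate {
  constructor({ maxConcurrent, maxQueue }) {
    this.maxConcurrent = maxConcurrent; // plays the role of MAX_CONCURRENT_SESSIONS
    this.maxQueue = maxQueue;           // plays the role of MAX_QUEUE_LENGTH
    this.running = new Set();
    this.queue = [];
  }

  // Returns 'running', 'queued', or 'rejected' for a new session id.
  request(id) {
    if (this.running.size < this.maxConcurrent) {
      this.running.add(id);
      return 'running';
    }
    if (this.queue.length < this.maxQueue) {
      this.queue.push(id);
      return 'queued';
    }
    return 'rejected'; // assumption: overflow is refused rather than buffered
  }

  // Frees a slot and promotes the oldest queued session, if there is one.
  release(id) {
    if (!this.running.delete(id)) return null;
    const next = this.queue.shift();
    if (next !== undefined) this.running.add(next);
    return next ?? null;
  }
}

const gate = new SessionGate({ maxConcurrent: 2, maxQueue: 1 });
const outcomes = ['a', 'b', 'c', 'd'].map((id) => gate.request(id));
console.log(outcomes); // [ 'running', 'running', 'queued', 'rejected' ]

const promoted = gate.release('a');
console.log(promoted); // c
```

With two slots and a queue of one, the third request waits and the fourth is turned away; releasing a slot lets the waiting request start.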

Making Your First Connection

Depending on the library you use, here’s how you can establish your first connection:

  • Puppeteer Connection
// ES module syntax; top-level await requires an ES module
import puppeteer from 'puppeteer';

const browser = await puppeteer.connect({
  browserWSEndpoint: 'wss://chrome.browserless.io?token=YOUR-API-TOKEN'
});
  • Playwright Integration
import playwright from 'playwright';

const TOKEN = 'YOUR-API-TOKEN';
const browser = await playwright.firefox.connect(
  `wss://production-sfo.browserless.io/firefox/playwright?token=${TOKEN}&proxy=residential`
);
  • REST API Access
curl -X POST https://chrome.browserless.io/content \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Basic YOUR-BASE64-TOKEN' \
  -d '{ "url": "https://example.com/" }'

Browserless V2 improves reliability with two regional endpoints: the US West Coast (production-sfo.browserless.io) and Europe (production-lon.browserless.io). These endpoints handle session isolation, manage concurrent requests, and recover from crashes automatically. They also clean up inactive sessions after 30 seconds and launch a fresh browser instance for every new session [4].
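
Choosing between the two regional hosts can be kept in one place with a small helper. The sketch below uses the hostnames given above; the REGIONS map, its key names, and the buildEndpoint function are hypothetical and exist only for illustration.

```javascript
// Hostnames come from the article; the REGIONS map and buildEndpoint helper are hypothetical.
const REGIONS = {
  'us-west': 'production-sfo.browserless.io',
  'eu': 'production-lon.browserless.io',
};

function buildEndpoint({ region, token, path = '/' }) {
  const host = REGIONS[region];
  if (!host) throw new Error(`Unknown region: ${region}`);
  const url = new URL(`wss://${host}${path}`);
  url.searchParams.set('token', token); // the value is percent-encoded if needed
  return url.toString();
}

console.log(buildEndpoint({ region: 'eu', token: 'YOUR-API-TOKEN' }));
// wss://production-lon.browserless.io/?token=YOUR-API-TOKEN

console.log(buildEndpoint({ region: 'us-west', token: 'YOUR-API-TOKEN', path: '/firefox/playwright' }));
// wss://production-sfo.browserless.io/firefox/playwright?token=YOUR-API-TOKEN
```

The returned string can be passed as browserWSEndpoint to puppeteer.connect() or as the argument to Playwright's connect().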

Main Features

Headless Browser Basics

Browserless Chrome runs in headless mode, that is, without a graphical interface. It automatically starts new browser instances for incoming requests, ensuring efficient resource use.

Here’s a quick overview of its key features:

Feature | Description | Benefit
Session Isolation | Independent browser sessions | Lowers infrastructure costs
Automatic Recovery | Restarts after crashes | Keeps operations running
Resource Optimization | Efficient use of memory and CPU | Boosts overall performance

Beyond these essentials, Browserless is designed to handle multiple tasks at the same time with ease.

Multi-task Processing

With over 2 million sessions handled, Browserless Chrome has generated millions of screenshots, PDFs, and test results [5]. Its smart queue management system ensures requests are processed without overloading resources, maintaining consistent performance. This has proven especially useful for companies like Samsara, which switched from an in-house testing service to Browserless for better scalability.

"Browserless boasts a range of features designed to simplify and accelerate web browser automation tasks. With its robust API and ability to handle parallel operations, Browserless stands out as a leader in the automation space." – Elest.io

Browserless doesn’t just excel at multitasking; it also simplifies automation workflows with ready-to-use APIs.

Ready-to-use API Functions

Browserless offers APIs tailored for common automation needs, enhancing its core functionality:

  • Web Scraping API: Extracts structured JSON data from webpage elements.
  • Unblock API: Fetches HTML content after running JavaScript.
  • Function API: Executes custom Puppeteer code with ESM module imports.
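
As a rough illustration of the Web Scraping API item, the sketch below builds a request body and flattens a response into text per selector. Both shapes are assumptions and should be checked against the API reference: the request is assumed to be { url, elements: [{ selector }] } and the response { data: [{ selector, results: [{ text }] }] }. The helper names are hypothetical, and the sample response is hand-written rather than captured from the service.

```javascript
// Assumed request shape: { url, elements: [{ selector }] }.
function buildScrapeRequest(pageUrl, selectors) {
  return {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      url: pageUrl,
      elements: selectors.map((selector) => ({ selector })),
    }),
  };
}

// Assumed response shape: { data: [{ selector, results: [{ text }] }] }.
function textBySelector(response) {
  const out = {};
  for (const { selector, results } of response.data) {
    out[selector] = results.map((r) => r.text);
  }
  return out;
}

const scrapeRequest = buildScrapeRequest('https://example.com/', ['h1', 'a']);
console.log(scrapeRequest.body);

// A hand-written sample standing in for a real response.
const sampleResponse = { data: [{ selector: 'h1', results: [{ text: 'Example Domain' }] }] };
console.log(textBySelector(sampleResponse)); // { h1: [ 'Example Domain' ] }
```

The object returned by buildScrapeRequest can be passed as the second argument to fetch() together with the scrape URL of your deployment.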

These APIs have delivered real-world results:

"We started using another scraping company's headless browsers to run Puppeteer scripts. But, it required a Vercel upgrade due to slow fetch times, and the proxies weren't running correctly. I found Browserless and had our Puppeteer code running within an hour. The scrapes are now 5x faster and 1/3rd of the price, plus the support has been excellent." – Nicklas Smit, Full-Stack Developer, Takeoff Copenhagen [2]

"We built a scraping tool to train our chatbots on public website data, but it quickly got complicated due to edge cases and bot detection. I found Browserless and set aside a day for the integration, but it only took a couple of hours. I didn't need to become an expert in managing proxy servers or virtual computers, so now I can stay focused on core parts of the business." – Mike Heap, Founder, My AskAI [2]

Library Integration Guide

Browserless Chrome works seamlessly with major automation libraries, offering performance and reliability. Here's how you can integrate it with some of the most popular tools.

Puppeteer Integration

Switching to Browserless in Puppeteer is simple: replace puppeteer.launch() with puppeteer.connect() [6].

Setup Type | Code Structure | Characteristics
Traditional Puppeteer | Uses puppeteer.launch() | Consumes local resources
Browserless Puppeteer | Uses puppeteer.connect() | Optimized for the cloud
Enhanced Browserless | Custom launch arguments | Advanced configurations

You can also pass custom launch arguments via the WebSocket endpoint:

const launchArgs = JSON.stringify({
  args: ['--window-size=1920,1080'],
  stealth: true,
  timeout: 5000
});
const browser = await puppeteer.connect({
  browserWSEndpoint: `wss://production-sfo.browserless.io/?token=YOUR_API_TOKEN_HERE&launch=${launchArgs}`
});

This setup supports advanced configurations while maintaining simplicity.
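
The example above interpolates the JSON string into the URL as-is. If your launch options contain characters that are unsafe in a query string, one option is to let the URL API do the encoding. This is a sketch with a hypothetical withLaunchArgs helper; it assumes the server decodes the query string in the usual way before parsing the launch parameter.

```javascript
// withLaunchArgs is a hypothetical helper; the "launch" parameter name comes from the example above.
function withLaunchArgs(baseEndpoint, token, launchOptions) {
  const url = new URL(baseEndpoint);
  url.searchParams.set('token', token);
  url.searchParams.set('launch', JSON.stringify(launchOptions)); // percent-encoded on output
  return url.toString();
}

const encodedEndpoint = withLaunchArgs('wss://production-sfo.browserless.io/', 'YOUR_API_TOKEN_HERE', {
  args: ['--window-size=1920,1080'],
  stealth: true,
  timeout: 5000,
});

// Round-trip check: decoding the query string yields the original options.
const decodedLaunch = JSON.parse(new URL(encodedEndpoint).searchParams.get('launch'));
console.log(decodedLaunch.args[0]); // --window-size=1920,1080
```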

Playwright Integration

Browserless works equally well with Playwright. Here's an example of how to connect using Firefox:

// Firefox implementation with Playwright Protocol
const browser = await playwright.firefox.connect(
  'wss://production-sfo.browserless.io/firefox/playwright?token=YOUR_API_TOKEN_HERE'
);

For developers using Python, Browserless ensures a consistent experience:

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.firefox.connect('wss://production-sfo.browserless.io/firefox/playwright?token=YOUR_API_TOKEN_HERE')
    context = browser.new_context()

This cross-language compatibility makes it easy to integrate Browserless into various workflows.

Selenium Integration

For Selenium, use the following Ruby configuration to connect to Browserless:

require "selenium-webdriver"

caps = Selenium::WebDriver::Remote::Capabilities.chrome("goog:chromeOptions" => {
  "args" => [
    "--disable-dev-shm-usage",
    "--disable-extensions",
    "--headless",
    "--no-sandbox"
  ]
})

You can establish the WebDriver connection using a simple URL format:

driver = Selenium::WebDriver.for :remote,
  url: "https://[email protected]/webdriver",
  desired_capabilities: caps

This setup runs Chrome headless with flags suited to containers. Note that --no-sandbox turns off Chrome's own sandbox, so process isolation has to come from the Browserless container instead. Always close browser instances after use to avoid memory leaks and wasted resources.

Performance Tips

When working with Browserless Chrome, managing performance is key to maintaining efficiency. With the platform handling nearly 5 million headless sessions weekly [8], careful resource and security management is essential for smooth operations at this scale.

Resource Management

Efficiently managing resources starts with how browser instances are handled. Instead of creating a new instance for every task, reuse existing instances to cut down on the overhead of starting new sessions:

const browser = await puppeteer.connect({
  browserWSEndpoint: 'wss://chrome.browserless.io?token=YOUR-TOKEN'
});
// Reuse the instance by disconnecting instead of closing
await browser.disconnect();

Another effective tactic is blocking unnecessary assets to reduce resource use. Here's a breakdown:

Resource Type | Impact on Performance | Recommended Action
Images | Consumes high bandwidth | Block using page.setRequestInterception()
CSS Files | Uses extra memory | Disable unless critical for layout
Fonts | Slows loading | Block external font requests

For instance, Browserless.io reported in September 2024 that blocking these resources reduced execution time from 2,114 ms to 1,676 ms, a reduction of about 21% [10].
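
The blocking rules in the table above can be sketched with Puppeteer's request interception. The handler below aborts images, stylesheets, and fonts and lets everything else through. The enableBlocking and handleRequest names are hypothetical, and the list of blocked types is one reasonable reading of the table, not a prescribed configuration.

```javascript
// Resource types to drop; the names match Puppeteer's request.resourceType() values.
const BLOCKED_TYPES = new Set(['image', 'stylesheet', 'font']);

// Decides what to do with one intercepted request.
function handleRequest(request) {
  if (BLOCKED_TYPES.has(request.resourceType())) {
    return request.abort();
  }
  return request.continue();
}

// Wires the handler into a Puppeteer page (enableBlocking is a hypothetical helper name).
async function enableBlocking(page) {
  await page.setRequestInterception(true);
  page.on('request', handleRequest);
}
```

Call enableBlocking(page) before page.goto() so the first navigation is already filtered. Remove 'stylesheet' from the set when the task depends on layout, such as screenshots or PDFs.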

Handling High Traffic

Once resources are optimized, the next step is managing high traffic effectively. Horizontal scaling is more reliable than depending on a few large instances.

"Chrome is really really good at using full system resources, and loves to use equal parts CPU and memory for most things" [8]

To handle high-volume demands, consider these strategies:

  • Use Nginx for load balancing across multiple smaller Browserless instances.
  • Enable pre-request health checks with PRE_REQUEST_HEALTH_CHECK=true and limit concurrent sessions using MAX_CONCURRENT_SESSIONS=10.
  • Ensure proper process termination to avoid lingering "zombie" processes.
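
One way to make sure sessions are always terminated is to wrap each task in try/finally. The withBrowser helper below is hypothetical; it accepts any connect function that resolves to a Puppeteer-style browser object, so the same wrapper works whichever endpoint you use.

```javascript
// withBrowser is a hypothetical helper. `connect` is any function that resolves to a
// Puppeteer-style browser, for example () => puppeteer.connect({ browserWSEndpoint }).
async function withBrowser(connect, task) {
  const browser = await connect();
  try {
    return await task(browser);
  } finally {
    // Runs on success and on failure, so the remote session is never left behind.
    await browser.close();
  }
}

// Usage sketch (requires a real endpoint, so it is left commented out):
// const title = await withBrowser(
//   () => puppeteer.connect({ browserWSEndpoint }),
//   async (browser) => {
//     const page = await browser.newPage();
//     await page.goto('https://example.com/');
//     return page.title();
//   }
// );
```

If you intend to reuse a session as described under Resource Management, swap browser.close() for browser.disconnect() in the finally block.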

"Regardless of where or how you're running your headless sessions, it's important to kill Chrome with the fire of thousand suns" [8]

Security Setup

A secure setup not only protects your data but also ensures consistent performance under heavy loads. Here's how to secure your deployment:

  • Store API keys as hashed values in secure environments.
  • Use IP restrictions to control access.
  • Enable role-based access management.
  • Apply rate limiting to API endpoints.
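
As an illustration of the rate limiting item, here is a minimal fixed-window limiter keyed by API key or client IP. It is a sketch, not a Browserless feature: createRateLimiter is a hypothetical helper, and a production setup would more likely enforce limits at the proxy or gateway in front of the service.

```javascript
// Illustrative fixed-window limiter; createRateLimiter is a hypothetical helper.
function createRateLimiter({ limit, windowMs, now = Date.now }) {
  const windows = new Map(); // key -> { start, count }

  return function allow(key) {
    const t = now();
    const w = windows.get(key);
    if (!w || t - w.start >= windowMs) {
      windows.set(key, { start: t, count: 1 }); // first request of a new window
      return true;
    }
    if (w.count < limit) {
      w.count += 1;
      return true;
    }
    return false; // over the limit for the current window
  };
}

// Deterministic demo with a fake clock instead of real time.
let fakeClock = 0;
const allowRequest = createRateLimiter({ limit: 2, windowMs: 1000, now: () => fakeClock });
const decisions = [allowRequest('key-1'), allowRequest('key-1'), allowRequest('key-1')];
fakeClock = 1000;
decisions.push(allowRequest('key-1'));
console.log(decisions); // [ true, true, false, true ]
```

The injectable clock keeps the example deterministic; in real use, omit the now option so the limiter falls back to Date.now.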

For Docker deployments, set resource limits to avoid overloading:

docker run -e MAX_CONCURRENT_SESSIONS=10 \
    -e CONNECTION_TIMEOUT=30000 \
    --memory=2g \
    --cpu-shares=1024 \
    browserless/chrome

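For handling untrusted code, run it in an isolated environment. The vm2 module was a common choice for this, but its maintainers discontinued it in 2023 after sandbox-escape vulnerabilities, so prefer a maintained isolation mechanism. Isolation also helps contain CPU-intensive attacks. Separately, since March 5, 2018, Browserless.io has used dumb-init inside its Docker containers to manage process termination [9].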

Summary

Browserless Chrome simplifies automation by taking over the infrastructure work that used to take up a significant share of developers' time, around 60%. By isolating Chrome from core services, it improves load balancing, scalability, and error management. One notable example is Samsara, which revamped its Puppeteer-based testing by removing the need to maintain specialized infrastructure. This allowed its engineers to focus on building their core product instead of on backend operations [1].

Here’s a snapshot of what makes Browserless Chrome a game-changer:

Feature | Business Impact
Infrastructure Separation | Prevents Chrome-related issues from disrupting the entire service [11]
Built-in Load Balancing | Allows for effortless scaling without extra setup
Bot Detection Avoidance | Boosts success rates for web automation tasks [1]
REST API Integration | Makes tasks like PDF creation and screenshot generation much easier [1]

These features make switching to Browserless Chrome a practical and efficient choice for automation needs.

Getting Started Steps

Want to integrate Browserless Chrome into your workflow? Here’s how you can get started:

  1. Choose an Integration Method: Before diving in, test the functionality with the online debugger. Then, decide between Puppeteer, Playwright, or Selenium based on your current tools [7].
  2. Update Your Setup: Replace your local Puppeteer launch by connecting to Browserless. Simply update your code to use puppeteer.connect() with your Browserless endpoint.
  3. Track Performance: Use Browserless's built-in tools like health checks and queue metrics to keep an eye on performance [1].

Raian

Researcher, Nocode Expert
