March 2, 2025 • 10 min read

Python Headless Browser: Best Libraries for Automation

George Miloradovich
Researcher, Copywriter & Usecase Interviewer

Headless browsers let you automate web tasks without showing a visible browser window. Because they skip rendering a user interface, they typically run faster and use fewer resources than a regular browser, which makes them a good fit for web scraping, testing, and more. Python offers several libraries for headless browser automation, each with unique strengths:

  • Selenium (2004): Works with multiple browsers, mature ecosystem, great for legacy systems.
  • Playwright (2020): Modern, async support, fast, and ideal for modern web apps.
  • Pyppeteer (2017): Lightweight, Chromium-only, great for quick scripts.
  • Requests-HTML: Simple, fast, and best for static content scraping.

Quick Comparison

Feature | Selenium | Playwright | Pyppeteer | Requests-HTML
Browser Support | Chrome, Firefox, Edge, Safari, IE | Chromium, Firefox, WebKit | Chromium only | Chromium (for JS)
Async Support | No | Yes | Yes | No
Resource Usage | High | Medium | Medium | Low
Best For | Legacy systems | Modern web apps | Quick scripts | Static content

If you need broad browser support, go with Selenium. For modern apps and better performance, Playwright is a better choice. Pyppeteer is ideal for quick tasks, while Requests-HTML excels in lightweight static scraping. Pick the one that fits your project needs.


1. Selenium


Selenium, first introduced in 2004, is a well-established tool for browser automation, offering support across multiple browsers and advanced automation features.

Installation and Setup

To get started, install Selenium using pip:

pip install selenium

For setting up a headless Chrome browser:

from selenium import webdriver
from selenium.webdriver.common.by import By

options = webdriver.ChromeOptions()
options.add_argument("--headless=new")
driver = webdriver.Chrome(options=options)

Browser Support and Features

Selenium 4 works with both the WebDriver protocol and the Chrome DevTools Protocol (CDP), and since version 4.6 its built-in Selenium Manager downloads a matching browser driver automatically. Selenium supports three major browsers in headless mode, each with its strengths:

Browser | Highlights | Best Use Case
Chrome | Fast execution, developer tools | General automation, web scraping
Firefox | Strong privacy, reliable rendering | Security-focused tasks
Edge | Windows integration, Chromium base | Windows-specific automation

Performance Optimization

To improve Selenium's performance, consider these strategies:

  • Resource Management
    Disable unnecessary resources (like images), set page load timeouts, and use dynamic waits to reduce delays.
  • Efficient Element Location
    Use precise methods to locate elements for faster interaction:
    element = driver.find_element(By.ID, "search-input")
    
  • Browser Instance Management
    Manage browser instances carefully to avoid resource drain:
    driver.set_page_load_timeout(30)
    driver.quit()  # Clean up resources
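
The three strategies above can be combined in one setup routine. The following is a minimal sketch, not a definitive configuration: the helper names `chrome_prefs`, `make_driver`, and `wait_for` are hypothetical, and the image-blocking preference is a commonly used Chrome setting whose effect may vary between Chrome versions.

```python
def chrome_prefs(block_images=True):
    """Build a Chrome preferences dict (helper name is hypothetical)."""
    prefs = {}
    if block_images:
        # 2 means "block"; assumed Chrome content setting for images
        prefs["profile.managed_default_content_settings.images"] = 2
    return prefs


def make_driver(page_load_timeout=30, block_images=True):
    """Create a headless Chrome driver with the tweaks applied."""
    from selenium import webdriver  # lazy import keeps chrome_prefs dependency-free

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")
    options.add_experimental_option("prefs", chrome_prefs(block_images))
    driver = webdriver.Chrome(options=options)
    driver.set_page_load_timeout(page_load_timeout)
    return driver


def wait_for(driver, locator, timeout=10):
    """Dynamic wait: return the element as soon as it is present."""
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC

    # locator is a tuple such as (By.ID, "search-input")
    return WebDriverWait(driver, timeout).until(
        EC.presence_of_element_located(locator)
    )
```

A typical call sequence would be `driver = make_driver()`, then `wait_for(driver, (By.ID, "search-input"))` after loading a page, with `driver.quit()` in a `finally` block so the browser is always released.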
    

Advanced Features

Selenium offers several advanced capabilities:

  • Bypassing anti-bot detection using tools like Undetected ChromeDriver
  • Cross-browser testing
  • Network control for deeper automation
  • JavaScript execution for custom interactions

Although Selenium may require more setup compared to tools like Playwright, its extensive browser support and compatibility with older systems, including Internet Explorer, make it a solid choice for complex automation projects. Its mature ecosystem ensures reliability for a wide range of use cases.

2. Playwright


Playwright, developed by Microsoft, provides a fast and reliable way to automate headless browsers by talking to each browser over its native debugging protocol, such as the Chrome DevTools Protocol for Chromium.

Installation and Setup

To get started with Playwright, install it using pip and set up the required browser binaries:

pip install playwright
playwright install  # Installs browser binaries

Here’s an example of a basic script:

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    # Add your automation tasks here
    browser.close()

Once installed, you can explore Playwright's capabilities and performance.
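
The basic script uses Playwright's synchronous API. Since async support is one of the strengths this article attributes to Playwright, here is a minimal sketch of the same flow with `async_playwright`. The function name `fetch_title` and the example URL in the comment are illustrative assumptions.

```python
import asyncio


async def fetch_title(url):
    """Open `url` in headless Chromium and return the page title."""
    from playwright.async_api import async_playwright  # lazy import

    async with async_playwright() as p:
        browser = await p.chromium.launch()  # headless by default
        try:
            page = await browser.new_page()
            await page.goto(url)
            return await page.title()
        finally:
            # Close the browser even if navigation fails
            await browser.close()


# Usage (launches a real browser):
#   title = asyncio.run(fetch_title("https://example.com"))
```

The async variant pays off mainly when several pages are driven concurrently from one event loop; for a single page the synchronous API is usually simpler.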

Performance and Features

Playwright stands out by sending commands over a persistent WebSocket connection, whereas Selenium's classic WebDriver protocol issues a separate HTTP request for each command. In performance tests, Playwright completed 100 iterations in 290.37 ms, compared to Selenium's 536.34 ms.

Some key features include:

  • Auto-waiting: Automatically waits for elements to be ready, reducing the need for manual timeouts.
  • Video recording: Built-in support for recording debugging sessions.
  • Cross-browser support: Works with Chromium, Firefox, and WebKit.
  • Isolated browser contexts: Ensures test isolation by separating browser sessions.
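
Two of the features above, isolated contexts and video recording, can be combined through the `record_video_dir` option of `new_context`. This is a minimal sketch under stated assumptions: `browser` is a Playwright sync-API browser that has already been launched, and the function name `record_visit` and the default directory are hypothetical.

```python
def record_visit(browser, url, video_dir="videos/"):
    """Visit `url` in an isolated context and record a video of it.

    `browser` is a Playwright sync-API Browser. The function name and
    the default directory are illustrative assumptions.
    """
    # Each context gets its own cookies and storage
    context = browser.new_context(record_video_dir=video_dir)
    try:
        page = context.new_page()
        page.goto(url)  # waits for the page's load event by default
        return page.title()
    finally:
        # The video file is written when the context closes
        context.close()
```

Closing the context in a `finally` block matters here, because the recording is only saved once the context has been closed.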

Browser Support Comparison

Here’s a quick look at headless mode support across browsers in Playwright:

Browser | Headless Mode
Chromium | Enabled by default
Firefox | Supported
WebKit | Supported

Best Practices

To get the most out of Playwright, follow these tips:

  • Leverage Built-in Waiting
    Instead of hardcoding delays, let Playwright wait for the element. Actions such as click() wait automatically, and you can also wait for a selector explicitly:
    # Avoid time.sleep()
    page.wait_for_selector('#element')

  • Use Browser Contexts
    Browser contexts provide a clean slate for each test:
    context = browser.new_context()
    page = context.new_page()
    # Perform tasks within this context
    context.close()

Properly managing browser instances is especially important in environments with multiple threads.

Threading Considerations

Since Playwright’s API isn’t thread-safe, you’ll need a separate instance for each thread:

def thread_function():
    with sync_playwright() as p:
        browser = p.chromium.launch()
        # Perform thread-specific tasks
        browser.close()
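
To show how the per-thread function might be driven, the sketch below runs one Playwright instance per worker thread through a standard-library thread pool. The names `fetch_title` and `run_in_threads` and the `max_workers` default are assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_title(url):
    """Runs inside one worker thread with its own Playwright instance."""
    from playwright.sync_api import sync_playwright  # lazy import

    with sync_playwright() as p:
        browser = p.chromium.launch()
        try:
            page = browser.new_page()
            page.goto(url)
            return page.title()
        finally:
            browser.close()


def run_in_threads(urls, worker=fetch_title, max_workers=4):
    """Map `worker` over `urls`, returning results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Nothing Playwright-related is shared between threads
        return list(pool.map(worker, urls))
```

The `worker` parameter is injectable so the pooling logic can be tested without launching browsers. Each call still starts its own browser, so `max_workers` should stay modest on memory-constrained machines.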

Playwright is well-suited for modern web automation projects. Its debugging tools and code generator can save developers time compared to older frameworks. Though its community size (116K GitHub repositories) is smaller than Selenium's (283K repositories), its rapid growth and Microsoft's support indicate a promising future.


3. Pyppeteer


Pyppeteer is an unofficial Python port of Puppeteer, designed for automating Chromium-based browsers. Despite its small size, it offers powerful tools for web automation.

Installation and Basic Setup

To use Pyppeteer, you’ll need Python 3.6 or later. Install it via pip with the following commands:

pip install pyppeteer
pyppeteer-install  # Downloads Chromium (~150MB)

Here’s a simple script showcasing its asynchronous features:

import asyncio
from pyppeteer import launch

async def main():
    browser = await launch()
    page = await browser.newPage()
    await page.goto('https://example.com')
    await page.screenshot({'path': 'screenshot.png'})
    await browser.close()

asyncio.run(main())  # Creates the event loop, runs main(), then closes the loop

Performance Insights

Tests indicate that Pyppeteer runs about 30% faster than Playwright for shorter scripts. Its asynchronous design also makes it efficient when handling multiple tasks at the same time.
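
To make the point about handling multiple tasks concrete, the following sketch opens several tabs concurrently in one Pyppeteer browser and caps the number of simultaneous tabs with a semaphore. It assumes a browser obtained from `await launch()`; the function names `page_title` and `gather_titles` and the default limit of 3 are hypothetical.

```python
import asyncio


async def page_title(browser, url):
    """Open one tab, read the title, and always close the tab."""
    page = await browser.newPage()
    try:
        await page.goto(url)
        return await page.title()
    finally:
        await page.close()


async def gather_titles(browser, urls, limit=3):
    """Fetch titles concurrently, at most `limit` tabs at a time."""
    semaphore = asyncio.Semaphore(limit)

    async def bounded(url):
        async with semaphore:
            return await page_title(browser, url)

    # gather returns results in the same order as `urls`
    return await asyncio.gather(*(bounded(u) for u in urls))
```

The cap is a design choice rather than a requirement: without it, a long URL list would open every tab at once and could exhaust memory.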

Key Features and Limitations

Feature | Details
Browser Support | Chromium only
Async Support | Built-in
JavaScript Rendering | Fully supported
Memory Usage | Lower compared to Selenium
Installation Size | Compact (~150MB with Chromium)
Cross-browser Testing | Not supported

Performance Optimization Tips

To improve Pyppeteer's performance, reuse the same browser instance for multiple tasks instead of opening new instances:

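from pyppeteer import launch

async def run_tasks(tasks):
    browser = await launch()  # Start one browser and share it
    try:
        for task in tasks:
            page = await browser.newPage()
            # Perform operations for this task
            await page.close()
    finally:
        await browser.close()  # Close the browser even if a task fails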

This approach can help reduce overhead and speed up your scripts.

Error Handling

One common issue is the "Browser closed unexpectedly" error, which is often caused by missing Chromium dependencies. Running pyppeteer-install makes sure the Chromium binary itself is downloaded; on Linux, Chromium may also need system libraries installed through the package manager.

"Pyppeteer is a tool to automate a Chromium browser with code, allowing Python developers to gain JavaScript-rendering capabilities to interact with modern websites and simulate human behavior better." - ZenRows

Since it only supports Chromium, Pyppeteer is best suited for projects focused on Chrome-based web scraping and automation. It’s a great choice if cross-browser testing isn’t a priority.

4. Requests-HTML


Requests-HTML is a lightweight tool for web scraping that combines the simplicity of Requests with powerful HTML parsing capabilities. It's particularly fast and efficient when working with static content.

Installation and Setup

To use Requests-HTML, ensure you have Python 3.6 or later. Install it with:

pip install requests-html

The first time you use JavaScript rendering, the library automatically downloads Chromium (~150MB) to your home directory (~/.pyppeteer/).

Performance Benchmarks

Requests-HTML outperforms browser-based tools like Selenium when it comes to speed. Here’s a comparison from recent tests:

Operation Type | Requests-HTML | Selenium
API Requests | 0.11s ± 0.01s | 5.16s ± 0.04s
Text Extraction | 0.28s ± 0.01s | 5.32s ± 0.09s

This data highlights how Requests-HTML excels in tasks requiring quick responses.
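
The article does not say how these timings were collected. For readers who want to produce comparable "mean ± spread" figures on their own machine, here is a minimal timing harness built on the standard library; the function names `benchmark` and `format_result` and the default of 5 runs are assumptions, not the methodology behind the table.

```python
import statistics
import time


def benchmark(func, runs=5):
    """Call `func` `runs` times; return (mean, stdev) in seconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        func()
        samples.append(time.perf_counter() - start)
    # stdev needs at least two samples
    spread = statistics.stdev(samples) if len(samples) > 1 else 0.0
    return statistics.mean(samples), spread


def format_result(mean, spread):
    """Render a timing in the style used by the table."""
    return f"{mean:.2f}s ± {spread:.2f}s"
```

Passing a zero-argument callable such as `lambda: session.get(url)` to `benchmark` times one operation; the same callable shape works for a Selenium page load, which keeps the comparison like for like.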

Key Features and Capabilities

Here’s a quick example of how to use Requests-HTML:

from requests_html import HTMLSession

session = HTMLSession()
r = session.get('https://example.com')

r.html.links           # Extract all links
r.html.absolute_links  # Extract absolute URLs

# Enable JavaScript rendering
r.html.render()        

Some of its standout features include:

  • CSS Selectors (similar to jQuery)
  • XPath support
  • Automatic redirect handling
  • Connection pooling
  • Cookie persistence
  • Mocked user-agent strings for flexibility

Performance Optimization Tips

To get the best performance:

  • Limit JavaScript rendering to reduce Chromium overhead.
  • Reuse session objects for multiple requests.
  • Opt for CSS selectors over XPath for simpler and faster queries.
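
The first two tips can be sketched together: one session is reused for every request, and `render()` is called only for pages that are known to need JavaScript. The function name `fetch_links` and the `needs_js` callback are hypothetical helpers introduced for this example.

```python
def fetch_links(urls, session=None, needs_js=lambda url: False):
    """Return {url: set of absolute links}, reusing one session."""
    if session is None:
        from requests_html import HTMLSession  # lazy import
        session = HTMLSession()
    results = {}
    for url in urls:
        response = session.get(url)
        if needs_js(url):
            # render() starts Chromium, so call it only when required
            response.html.render()
        results[url] = set(response.html.absolute_links)
    return results
```

Accepting the session as a parameter lets callers share one connection pool across several calls, and it also makes the function easy to test with a stub session.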

Limitations and Use Cases

Aspect | Details
JavaScript Support | Available but must be explicitly enabled
Memory Usage | Low for static content; higher with JS rendering
Authentication | Requires manual setup
CAPTCHA Handling | Limited functionality

"Use requests if you need a fast, lightweight, and reliable way to fetch static web content or API data." - Joseph McGuire

Requests-HTML is ideal for tasks where speed and resource efficiency are key. For example, scraping static web pages takes just milliseconds, compared to several seconds with tools like Selenium.

Resource Optimization

Requests-HTML minimizes bandwidth usage by loading only the resources you request. This can significantly lower proxy costs for projects that rely on bandwidth-based pricing models. Its efficient design not only speeds up execution but also reduces resource consumption.

For projects focused on static content, Requests-HTML offers a lean and efficient solution compared to heavier browser automation tools. This makes it a strong choice in scenarios where speed and resource savings are priorities.

Library Comparison Chart

Here's a detailed comparison of Python headless browser libraries based on their features, performance, and resource efficiency.

Core Features and Capabilities

Feature | Selenium | Playwright | Pyppeteer | Requests-HTML
Browser Support | Chrome, Firefox, Edge, Safari, IE | Chromium, Firefox, WebKit | Chromium only | Chromium (for JS)
JavaScript Support | Full | Full | Full | Limited
Async Support | No | Yes | Yes | No
Installation Complexity | Medium (driver setup is automatic in Selenium 4.6+) | Medium | Medium | Low
Resource Usage | High | Medium | Medium | Low
Community Size | 283K+ repos | 116K+ repos | Moderate | Small

These features provide a snapshot of each library's strengths and limitations, setting the stage for further analysis.

Performance Benchmarks

Benchmark tests highlight key performance differences:

Operation | Playwright | Selenium | Pyppeteer
Execution Time (100 iterations) | 290.37ms | 536.34ms | ~203ms
Resource Intensity | Medium | High | Medium
Memory Usage | Moderate | High | Moderate

Playwright and Pyppeteer show faster execution times compared to Selenium, with Pyppeteer leading in short script performance.
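
The relationships between these figures can be checked with a few lines of arithmetic. The sketch below uses only the two measured values from the table and assumes that "about 30% faster" means 30% less run time, which appears to be how the ~203ms Pyppeteer estimate was derived.

```python
PLAYWRIGHT_MS = 290.37  # measured value from the table
SELENIUM_MS = 536.34    # measured value from the table

# How many times faster Playwright was than Selenium
speedup = SELENIUM_MS / PLAYWRIGHT_MS

# Share of Selenium's run time that Playwright saved
time_saved = 1 - PLAYWRIGHT_MS / SELENIUM_MS

# "About 30% faster than Playwright", read as 30% less run time
pyppeteer_estimate_ms = PLAYWRIGHT_MS * (1 - 0.30)

print(f"speedup: {speedup:.2f}x")
print(f"time saved: {time_saved:.1%}")
print(f"Pyppeteer estimate: {pyppeteer_estimate_ms:.1f} ms")
```

Running this prints a speedup of 1.85x, a time saving of 45.9%, and a Pyppeteer estimate of 203.3 ms, which matches the rounded ~203ms entry in the table.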

Development and Debugging Features

Debugging tools and development support vary greatly among these libraries:

Feature | Selenium | Playwright | Pyppeteer | Requests-HTML
Debugging Tools | Basic | Advanced | Basic | Limited
Auto-wait Features | Manual | Built-in | Basic | N/A
Cross-platform Support | Yes | Yes | Limited | Yes
Tech Support | Community | Documentation + Community | Limited | Basic

Playwright stands out with advanced debugging tools and built-in auto-wait features, making it ideal for complex projects.

Use Case Optimization

Different libraries excel in specific scenarios:

Use Case | Recommended Library | Why
Legacy Systems | Selenium | Broad browser compatibility
Modern Web Apps | Playwright | Async support and faster execution
Static Content | Requests-HTML | Lightweight and efficient
Quick Scripts | Pyppeteer | Fast execution and balanced features

Each library has its niche, depending on the project's requirements.

Resource Efficiency

Resource usage varies significantly among the libraries:

Library | CPU Usage | Memory Footprint | Bandwidth Efficiency
Selenium | High | High | Moderate
Playwright | Medium | Medium | High
Pyppeteer | Medium | Medium | High
Requests-HTML | Low | Low | Very High

For static content, Requests-HTML is the most efficient, while Playwright balances performance and resource usage for dynamic applications.

Pyppeteer outpaces Playwright in short script execution, running almost 30% faster. However, Playwright's broader browser compatibility and advanced debugging tools make it a better choice for more demanding, enterprise-level tasks.

Which Library Should You Choose?

Selecting the right headless browser library depends on your specific automation needs and technical setup. Based on the comparisons above, here’s how you can decide.

If you're working with modern web applications, Playwright is a strong choice. It outperformed Selenium in benchmarks, completing tasks in just 290.37 milliseconds compared to Selenium's 536.34 milliseconds. Its asynchronous support and advanced debugging tools make it well-suited for handling complex automation tasks.

For enterprise or legacy systems, Selenium is a reliable option. With over 283,000 GitHub repositories dedicated to it, Selenium offers a wealth of community resources, compatibility with older browsers like Internet Explorer, and real device automation through companion tools such as Appium.

For environments with limited resources, here’s a quick guide:

Environment Type | Recommended Library | Key Advantage
Static Content | Requests-HTML | Low resource usage
Dynamic Content | Pyppeteer | Lightweight with asynchronous operations

In continuous integration (CI) setups, Playwright shines. It integrates smoothly with platforms like GitHub Actions, supports parallel testing, and helps reduce flaky tests, making it a great fit for CI/CD pipelines.

Ultimately, your choice should focus on your automation goals. Playwright is excellent for modern web automation, while Selenium offers broader browser support and real device testing options.
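
The guidance in this section can be condensed into a small decision helper. This is only a sketch that restates the recommendations above: the function `recommend_library`, its parameter names, and the order in which the rules are checked are all hypothetical.

```python
def recommend_library(static_only=False, legacy_browsers=False,
                      cross_browser=False, ci_pipeline=False):
    """Map project traits to the library this article suggests.

    The function and its parameter names are hypothetical; the rules
    only restate the guidance given in the text.
    """
    if static_only:
        return "Requests-HTML"  # low resource usage for static pages
    if legacy_browsers:
        return "Selenium"       # e.g. Internet Explorer support
    if cross_browser or ci_pipeline:
        return "Playwright"     # modern apps, parallel CI runs
    return "Pyppeteer"          # lightweight, Chromium-only scripts
```

For example, `recommend_library(ci_pipeline=True)` returns "Playwright", while calling it with no arguments falls through to "Pyppeteer" as the lightweight default for dynamic content.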
