Latenode

Cache Management in Puppeteer: Disabling, Clearing, and Performance Optimization

Optimize Puppeteer performance by effectively managing cache: disable, clear, and implement smart caching techniques for faster automation.

Raian
Want faster Puppeteer automation? Managing browser cache is key. This guide covers how to disable, clear, and optimize cache for better performance.

Key Takeaways:

  • Disabling Cache: Use setCacheEnabled(false) or browser launch flags like --disable-cache to simulate fresh page loads.
  • Clearing Cache: Use Network.clearBrowserCache via Chrome DevTools Protocol (CDP) for clean test environments.
  • Smart Caching: Reduce data transfer by up to 92% with custom caching logic and in-memory storage.
  • Performance Boost: Block unnecessary resources like images or ads to speed up tests and save bandwidth.

Efficient cache management can dramatically reduce data usage, improve test accuracy, and speed up automation workflows. Dive in to learn how!

Turning Off Cache in Puppeteer

Disabling the cache in Puppeteer can be helpful for testing and automation tasks where fresh page loads are needed. Here's how you can do it and what to keep in mind.

Using the setCacheEnabled() Method

You can turn off caching in Puppeteer with the setCacheEnabled() method:

await page.setCacheEnabled(false);

Run this command before navigating to any page. By default, caching is on, so you need to disable it when your tests require a clean load of resources. For a more browser-wide solution, check out the next section.
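The ordering matters, so it can help to wrap it in a small helper. The following is a minimal sketch, assuming a `browser` object obtained from puppeteer.launch(); the helper name openWithoutCache and the 'networkidle2' wait condition are illustrative choices, not something this guide prescribes.

```javascript
// Hypothetical helper: opens a new tab with caching disabled before the
// first navigation, so every resource is requested from the network.
async function openWithoutCache(browser, url) {
    const page = await browser.newPage();
    await page.setCacheEnabled(false); // must run before page.goto()
    await page.goto(url, { waitUntil: 'networkidle2' }); // assumed wait condition
    return page;
}
```

A call such as `const page = await openWithoutCache(browser, 'https://example.com');` then returns a page whose first load bypassed the cache.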

Browser Launch Flags for Cache

To disable caching at the browser level, launch Chromium with specific flags:

const browser = await puppeteer.launch({
    args: ['--disable-cache']
});

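This method applies to the entire browser session and complements the per-page setCacheEnabled() approach. Chromium ignores command-line switches it does not recognize without reporting an error, and the supported switches change between releases, so confirm that the flag actually takes effect in your Chromium version, for example by checking response.fromCache() on a repeat load.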
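Whichever method you use, you can check whether caching is really off by loading a page twice and counting responses that came from the cache. This is a rough sketch that relies on Puppeteer's response.fromCache(); the helper name countCacheHits is hypothetical.

```javascript
// Hypothetical helper: loads a URL, reloads it, and counts how many
// responses during the reload were served from the browser cache.
async function countCacheHits(page, url) {
    let hits = 0;
    const onResponse = (response) => {
        if (response.fromCache()) hits += 1;
    };
    page.on('response', onResponse);
    try {
        await page.goto(url); // first load may populate the cache
        hits = 0;             // only count the repeat load
        await page.reload();
    } finally {
        page.off('response', onResponse); // always remove the listener
    }
    return hits;
}
```

A result of 0 suggests that caching is effectively disabled for that page; a positive number means at least some resources were still reused.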

What Happens When You Disable Cache?

When the cache is off, every resource is downloaded fresh, which can slow things down and increase data usage. For example, tests on CNN's website showed an 88% jump in data transfer when caching was disabled [1]. To strike a balance between accuracy and performance, consider these tips:

  • Use Chrome DevTools to check if the page content is cacheable.
  • Add in-memory caching for specific resources if applicable.
  • Only disable the cache when your test scenario demands it.
  • Keep an eye on network reliability when the cache is off.

Disabling the cache is great for simulating first-time user behavior, but weigh the trade-offs based on your testing goals.

Removing Cache Data in Puppeteer

Automated tests often need a cleared cache to maintain consistent results.

Clearing Cache with CDP Commands

You can clear cache data using Chrome DevTools Protocol (CDP) commands:

const client = await page.target().createCDPSession();
await client.send('Network.clearBrowserCache');
await page.setCacheEnabled(false);

This approach clears the browser cache and disables caching, ensuring a clean slate for your automation tasks.
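If you repeat this before every test, a helper that also releases the CDP session keeps things tidy. This is a sketch under the assumption that you want the session detached even when the command fails; the name resetCache is hypothetical.

```javascript
// Hypothetical helper: clears the browser cache through CDP, always detaches
// the temporary CDP session, then disables caching for the page.
async function resetCache(page) {
    const client = await page.target().createCDPSession();
    try {
        await client.send('Network.clearBrowserCache');
    } finally {
        await client.detach(); // release the session even if send() throws
    }
    await page.setCacheEnabled(false);
}
```

Calling `await resetCache(page);` in a test's setup step gives each test the same starting point.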

You can also clear both cache and cookies together:

const client = await page.target().createCDPSession();
await client.send('Network.clearBrowserCache');
await client.send('Network.clearBrowserCookies');

Handling Specific Storage Types

Sometimes, you might need to clear specific stored data instead of the entire cache. Here's how you can manage cookies:

// Clear all cookies
const cookies = await page.cookies();
await page.deleteCookie(...cookies);

// To delete a specific cookie, use:
// await page.deleteCookie({ name: 'cookie_name', url: 'https://example.com' });

// Alternatively, reset the expiry on the current cookies
// (in Puppeteer cookie objects, expires = -1 denotes a session cookie)
const currentCookies = await page.cookies();
for (const cookie of currentCookies) {
    cookie.expires = -1;
}
await page.setCookie(...currentCookies);

This allows precise control over cookie management during your tests.
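For selective cleanup, you can filter the cookie list before deleting. The sketch below assumes the page.cookies() and page.deleteCookie() methods used above; the helper name deleteCookiesWhere and the predicate shape are hypothetical.

```javascript
// Hypothetical helper: deletes only the cookies for which predicate(cookie)
// returns true, and reports the names of the cookies it removed.
async function deleteCookiesWhere(page, predicate) {
    const cookies = await page.cookies();
    const matching = cookies.filter(predicate);
    if (matching.length > 0) {
        await page.deleteCookie(...matching);
    }
    return matching.map((cookie) => cookie.name);
}
```

For example, `await deleteCookiesWhere(page, (c) => c.name.startsWith('_ga'));` would remove cookies whose names begin with `_ga` and leave the rest, such as a login session, untouched.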

Managing Cache in Multiple Tabs

When working with multiple tabs, it's a good idea to isolate cache data by using separate browser contexts. Here's how:

const browser = await puppeteer.launch();
// Puppeteer v22 and later rename this method to browser.createBrowserContext()
const context = await browser.createIncognitoBrowserContext();
const page = await context.newPage();

const client = await page.target().createCDPSession();
await client.send('Network.clearBrowserCache');

// Close the context after tasks are done
await context.close();

Using separate contexts prevents cache interference between tabs, making it ideal for running parallel tests.
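For parallel runs, a wrapper that creates a context, runs one task in it, and always closes it afterwards keeps each task isolated. This is a sketch, not a definitive pattern: withIsolatedContext is a hypothetical name, and the fallback between createBrowserContext (Puppeteer v22 and later) and createIncognitoBrowserContext (earlier versions) is an assumption about which versions you need to support.

```javascript
// Hypothetical helper: runs `task(page)` in its own browser context and
// closes that context afterwards, even if the task throws.
async function withIsolatedContext(browser, task) {
    // Newer Puppeteer exposes createBrowserContext; older versions use
    // createIncognitoBrowserContext.
    const create = browser.createBrowserContext ?? browser.createIncognitoBrowserContext;
    const context = await create.call(browser);
    try {
        const page = await context.newPage();
        return await task(page);
    } finally {
        await context.close();
    }
}
```

Several URLs could then be processed side by side with something like `await Promise.all(urls.map((url) => withIsolatedContext(browser, (page) => page.goto(url))));`.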

Cache Settings for Better Speed

Managing cache effectively in Puppeteer can cut data transfer by up to 92% [1], making automation much faster.

Smart Cache Usage

To balance speed and up-to-date data, you can intercept requests and responses to implement smarter caching. Here's an example:

const cache = new Map();

// One possible freshness check: an entry is fresh until its max-age has elapsed
function isFresh(entry) {
    return entry.expires > Date.now();
}

async function handleRequest(request) {
    const entry = cache.get(request.url());
    if (entry && isFresh(entry)) {
        return request.respond(entry.response);
    }

    // Continue the request if it's not cached or no longer fresh
    return request.continue();
}

async function handleResponse(response) {
    const headers = response.headers();
    const match = /max-age=(\d+)/.exec(headers['cache-control'] || '');
    if (match) {
        cache.set(response.url(), {
            expires: Date.now() + Number(match[1]) * 1000,
            response: {
                status: response.status(),
                headers: headers,
                body: await response.buffer()
            }
        });
    }
}

This setup minimizes unnecessary network requests while keeping essential data updated by validating the cache-control header.
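The handlers only take effect once they are registered on a page with request interception enabled. Here is a minimal sketch of that wiring. The helper name attachCacheHandlers is hypothetical, and logging (rather than rethrowing) errors from the response handler is an assumption made here, because response.buffer() can reject for responses without a readable body, such as redirects.

```javascript
// Hypothetical helper: enables interception and registers a request handler
// and a response handler on the given page.
async function attachCacheHandlers(page, onRequest, onResponse) {
    await page.setRequestInterception(true); // required before request.respond()
    page.on('request', onRequest);
    page.on('response', (response) => {
        // Wrap the call so that both thrown errors and rejected promises
        // are logged instead of surfacing as unhandled rejections.
        Promise.resolve()
            .then(() => onResponse(response))
            .catch((error) => {
                console.warn('Response not cached:', response.url(), error.message);
            });
    });
}
```

With the functions from the example above, the call would be `await attachCacheHandlers(page, handleRequest, handleResponse);` before the first page.goto().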

Building Custom Cache Rules

Tailor caching to your needs by creating specific rules. For instance:

const MAX_AGE_PATTERN = /max-age=(\d+)/;

const customCacheRules = {
    // Cache only responses that declare a positive max-age
    shouldCache: (response) => {
        const match = MAX_AGE_PATTERN.exec(response.headers()['cache-control'] || '');
        return match !== null && Number(match[1]) > 0;
    },

    // Expiration time in milliseconds since the epoch
    // (falls back to "already expired" when no max-age is present)
    getExpirationTime: (headers) => {
        const match = MAX_AGE_PATTERN.exec(headers['cache-control'] || '');
        return match ? Date.now() + Number(match[1]) * 1000 : Date.now();
    }
};

These rules help determine which responses to cache and how long to keep them.
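Real Cache-Control headers often combine several directives, so a rule based only on max-age can cache things it should not. The sketch below is one way to account for that. The function name cacheLifetimeSeconds is hypothetical, and treating no-cache the same as no-store is a deliberate simplification (strictly, no-cache allows storing but requires revalidation).

```javascript
// Hypothetical helper: returns how many seconds a response may be cached,
// or 0 if it should not be cached at all.
function cacheLifetimeSeconds(cacheControl) {
    if (!cacheControl) return 0;
    const directives = cacheControl
        .toLowerCase()
        .split(',')
        .map((directive) => directive.trim());
    // Simplification: skip anything marked no-store or no-cache
    if (directives.includes('no-store') || directives.includes('no-cache')) return 0;
    const maxAge = directives.find((directive) => directive.startsWith('max-age='));
    if (!maxAge) return 0;
    const seconds = Number.parseInt(maxAge.slice('max-age='.length), 10);
    return Number.isFinite(seconds) && seconds > 0 ? seconds : 0;
}
```

For instance, `cacheLifetimeSeconds('public, max-age=3600')` gives 3600, while a header such as `'no-store, max-age=60'` gives 0.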

Checking Cache Performance

Once your caching rules are in place, evaluate their impact using performance metrics:

const metrics = {
    totalRequests: 0,
    cachedResponses: 0,
    dataSaved: 0 // bytes, based on the content-length header
};

// Call this from a page.on('response', ...) handler, for example:
// page.on('response', (response) => trackCacheMetrics(response.request(), response));
async function trackCacheMetrics(request, response) {
    metrics.totalRequests++;
    if (response.fromCache()) {
        metrics.cachedResponses++;
        metrics.dataSaved += parseInt(response.headers()['content-length'] || 0, 10);
    }
}

Track key metrics like total requests, cached responses, and data saved. Here's a comparison based on testing [1]:

Metric Type | Without Cache | With Cache | Improvement
Data Transfer | 177 MB | 13.4 MB | 92% reduction

These results highlight how well-designed caching can drastically improve Puppeteer's performance.
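The improvement figure follows directly from the two measurements: (177 - 13.4) / 177 is roughly 0.924. A small sketch of that arithmetic, plus a hit ratio derived from a metrics object with totalRequests and cachedResponses fields; both function names are hypothetical.

```javascript
// Percentage by which `after` is smaller than `before`
function percentReduction(before, after) {
    if (before <= 0) throw new RangeError('before must be positive');
    return ((before - after) / before) * 100;
}

// Share of responses served from cache, between 0 and 1
function hitRatio(metrics) {
    if (metrics.totalRequests === 0) return 0;
    return metrics.cachedResponses / metrics.totalRequests;
}

console.log(percentReduction(177, 13.4).toFixed(1)); // prints "92.4"
```

Tracking the hit ratio alongside bytes saved helps show whether a low saving comes from few cache hits or from hits on small resources.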

Common Issues and Solutions

Fixing Cache Problems

When using Puppeteer, enabling request interception disables the browser's native caching. This can lead to higher data transfer and slower page load times [1]. To address this, you can implement custom caching with the following approach:

const browser = await puppeteer.launch();
const page = await browser.newPage();

// Initialize cache storage
const responseCache = new Map();

await page.setRequestInterception(true);
page.on('request', async request => {
    const url = request.url();
    if (responseCache.has(url)) {
        await request.respond(responseCache.get(url));
        return;
    }
    request.continue();
});

page.on('response', async response => {
    const url = response.url();
    const headers = response.headers();

    if (headers['cache-control'] && headers['cache-control'].includes('max-age')) {
        try {
            responseCache.set(url, {
                status: response.status(),
                headers: headers,
                body: await response.buffer()
            });
        } catch (error) {
            // Some responses (for example redirects) have no readable body; skip them
        }
    }
});

To avoid potential memory leaks, make sure to clean up resources effectively:

async function cleanupResources(page) {
    await page.removeAllListeners();
    const client = await page.target().createCDPSession();
    await client.send('Network.clearBrowserCache');
    await client.detach();
    await page.close();
}

By combining these techniques, you can reduce overhead and improve Puppeteer's performance.
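An in-memory Map used as a response cache grows for as long as the script runs, which is itself a source of memory pressure. One way to cap it is a small least-recently-used wrapper. This is a sketch only: the class name BoundedCache and the default limit of 200 entries are assumptions, not values from the testing cited here.

```javascript
// Hypothetical LRU-style cache with the same has/get/set surface as Map.
class BoundedCache {
    constructor(maxEntries = 200) {
        this.maxEntries = maxEntries;
        this.entries = new Map(); // Map iterates in insertion order
    }

    has(key) {
        return this.entries.has(key);
    }

    get(key) {
        if (!this.entries.has(key)) return undefined;
        const value = this.entries.get(key);
        // Re-insert so this key becomes the most recently used
        this.entries.delete(key);
        this.entries.set(key, value);
        return value;
    }

    set(key, value) {
        this.entries.delete(key);
        this.entries.set(key, value);
        if (this.entries.size > this.maxEntries) {
            // The first key in iteration order is the least recently used
            const oldestKey = this.entries.keys().next().value;
            this.entries.delete(oldestKey);
        }
    }

    clear() {
        this.entries.clear();
    }

    get size() {
        return this.entries.size;
    }
}
```

Because it exposes has(), get(), and set(), it can stand in for `new Map()` in a response cache, and clear() can be called from your cleanup routine.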

Cache Management Tips

Here are some practical tips for managing cache more effectively, based on testing and analysis:

Issue | Solution | Impact
High Data Transfer | Use in-memory caching | Reduces traffic by up to 92% [1]
Resource Leaks | Apply cleanup procedures | Helps prevent memory exhaustion [3]
Slow Page Loads | Block unnecessary resources | Improves rendering speed significantly [2]

For better performance, you can block certain resources like images or stylesheets to speed up page loading:

const browserOptions = {
    userDataDir: './cache-directory', // persist the browser profile and its cache between runs
    args: [
        '--disable-background-timer-throttling',
        '--disable-extensions'
    ]
};

const browser = await puppeteer.launch(browserOptions);
const page = await browser.newPage();

await page.setRequestInterception(true);
page.on('request', request => {
    const type = request.resourceType();
    if (type === 'image' || type === 'stylesheet') {
        request.abort();
    } else {
        request.continue();
    }
});

Using these strategies can streamline your Puppeteer workflows while keeping resource usage under control.
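As the list of things to block grows, it can be easier to keep the decision in one testable function. The following is a sketch: the blocked resource types and the two host suffixes are illustrative placeholders rather than a recommended blocklist, and shouldBlock and blockingHandler are hypothetical names.

```javascript
// Illustrative values only; adjust to what your pages actually need
const BLOCKED_TYPES = new Set(['image', 'stylesheet', 'font', 'media']);
const BLOCKED_HOST_SUFFIXES = ['doubleclick.net', 'google-analytics.com'];

// Decides whether a request should be aborted, based on its resource type
// and on whether its hostname matches one of the blocked suffixes.
function shouldBlock(resourceType, url) {
    if (BLOCKED_TYPES.has(resourceType)) return true;
    let hostname;
    try {
        hostname = new URL(url).hostname;
    } catch {
        return false; // unparseable URL: do not block
    }
    return BLOCKED_HOST_SUFFIXES.some(
        (suffix) => hostname === suffix || hostname.endsWith('.' + suffix)
    );
}

// Request handler for use with page.on('request', blockingHandler)
function blockingHandler(request) {
    if (shouldBlock(request.resourceType(), request.url())) {
        return request.abort();
    }
    return request.continue();
}
```

Keep in mind that blocking stylesheets changes how a page renders, so this kind of filter suits data extraction better than visual checks.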

Conclusion

Efficient cache management in Puppeteer can dramatically improve performance while reducing resource usage. This guide has covered how to disable, clear, and adjust cache settings to achieve better results. Below is a concise summary of the main strategies and their effects.

Summary Points

In the testing cited throughout this guide, in-memory caching cut data transfer from 177 MB to 13.4 MB, a 92% reduction [1].

Here’s a quick look at some key strategies and their outcomes:

Strategy | Implementation | Performance Impact
In-Memory Caching | Cache responses with max-age > 0 | 92% reduction in data transfer [1]
Resource Blocking | Disable ads and tracking scripts | Noticeable page load improvement [2]
Smart Screenshot Timing | Use waitForSelector() | Faster rendering completion [2]
Cross-Session Caching | Configure userDataDir | Retains CSS/JS/image assets [2]

Key Implementation Tips

  • Asset Optimization: Compress assets and optimize images to minimize HTTP payloads [2].
  • Precise Timing: Take screenshots exactly when content is ready, avoiding unnecessary delays [2].
  • Memory Efficiency: Use Buffer operations instead of file system writes to speed up processing [2].
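The timing and memory tips can be combined in one helper. This is a minimal sketch, assuming you know a selector that appears once the content is ready; the helper name screenshotWhenReady, the 10-second default timeout, and the PNG format are illustrative choices.

```javascript
// Hypothetical helper: waits until `selector` is visible, then captures the
// screenshot in memory instead of writing it to a file.
async function screenshotWhenReady(page, selector, timeoutMs = 10000) {
    await page.waitForSelector(selector, { visible: true, timeout: timeoutMs });
    // Without a `path` option, page.screenshot() returns the image bytes
    return page.screenshot({ type: 'png' });
}
```

The returned bytes can be passed straight to an upload or image-processing step, so no temporary file is needed.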

"When optimizing Puppeteer, remember that there are only so many ways to speed up the startup/shutdown performance of Puppeteer itself. Most likely, the biggest speed gains will come from getting your target pages to render faster." - Jon Yongfook, Founder, Bannerbear [2]

Raian

Researcher, Nocode Expert