Headless Chromium at scale: four fixes for a fleet that kept eating RAM

The first time a worker died with an OOM kill in the middle of a render, I assumed it was a bad page — some site with an infinite-scroll loop or a 200MB hero video. The second time it happened, on a different worker rendering a different URL, I started paying attention. The third time, a Tuesday morning, every worker in the fleet went down inside a five-minute window.

Headless Chromium leaks memory. Not in a "oh that's a bug, file an issue" way — in a "this is the operating reality of a 30-million-line C++ browser, and you have to plan around it" way. If you run Playwright or Puppeteer in production for more than a few minutes at a time, you will eventually meet this reality. This post covers the four things I changed in Rendershot — a screenshot and PDF API I run — that took us from "workers crashing twice a day" to "workers running for weeks without intervention."

None of these are clever. They're the boring discipline of treating a browser like a long-lived process, not a function call.

Setup, in one paragraph

Each Rendershot worker is a Docker container running an ARQ (Redis-backed) job queue. Jobs come off the queue, get rendered with Playwright, and the resulting bytes are uploaded and the file path written back to Postgres. Concurrency is bounded; the worker fleet scales horizontally — no shared state between workers, just one Chromium process each.

That last part was the first fix.

Fix 1 — One browser per worker, not per request

The naive way to run Playwright is the way the docs suggest:

from playwright.async_api import async_playwright

async with async_playwright() as p:
    browser = await p.chromium.launch()
    page = await browser.new_page()
    await page.goto(url)
    await page.screenshot(path="out.png")
    await browser.close()

This is fine for a script. It is catastrophic for a server. Launching Chromium takes 300–600ms on a modern Linux box, allocates ~150MB of resident memory before you've even pointed it at a URL, and forks a small army of helper processes (renderer, GPU, network, utility). Tearing it down repeats most of that work.

If your worker handles 10 renders per second, you are spending more time launching and killing browsers than you are rendering anything. And every leaked file descriptor, zombie subprocess, or partially-released shared memory segment compounds.

The fix is to launch the browser once per worker, on startup, and reuse it for every request:

async def startup(ctx):
    pool = BrowserPool()
    await pool.start()  # launches one Chromium, kept for the worker's lifetime
    ctx['pool'] = pool

async def shutdown(ctx):
    await ctx['pool'].stop()

class WorkerSettings:
    on_startup = startup
    on_shutdown = shutdown
    max_jobs = config.settings.browser_max_pages

Each render now creates a page (cheap, ~5ms), uses it, and closes it. The browser stays alive for the lifetime of the worker. Crash isolation is per-container — if a worker's browser dies, we lose that worker, not the fleet.
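For concreteness, a job that consumes the pool from ctx looks roughly like this — a sketch, where upload_bytes and mark_rendered are hypothetical stand-ins for the upload and Postgres steps from the setup paragraph, and render_screenshot is the pool method shown in Fix 2:

async def render_screenshot_job(ctx, params: dict) -> str:
    # ctx is ARQ's per-worker dict; 'pool' was stored there by on_startup.
    pool = ctx['pool']
    image_bytes = await pool.render_screenshot(params)

    # Hypothetical helpers for the upload + Postgres write from the setup paragraph.
    file_path = await upload_bytes(image_bytes, params['render_id'])
    await mark_rendered(params['render_id'], file_path)
    return file_path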

Fix 2 — Cap concurrent pages with a semaphore (and match it to your job queue)

A persistent browser will happily let you open 50 tabs. It will also happily eat 8GB of RAM doing it.

You need a hard cap on how many pages render concurrently inside one browser. We use an asyncio.Semaphore:

import asyncio
import dataclasses

from playwright.async_api import async_playwright


@dataclasses.dataclass
class BrowserPool:
    max_pages: int = 4
    _semaphore: asyncio.Semaphore | None = None

    async def start(self):
        self._semaphore = asyncio.Semaphore(self.max_pages)
        self._playwright = await async_playwright().start()
        self._browser = await self._playwright.chromium.launch(args=_CHROMIUM_ARGS)

    async def render_screenshot(self, params):
        # At most max_pages renders run inside the browser at once; the rest wait here.
        async with self._semaphore:
            context, page = await self._new_page(params)
            try:
                await self._navigate(page, params)
                return await page.screenshot(...)
            finally:
                await page.close()
                await context.close()

The non-obvious part: the semaphore alone isn't enough. Your job queue needs to match it. ARQ has a max_jobs setting that controls how many tasks the worker pulls off Redis simultaneously. If max_jobs > max_pages, jobs get pulled, hit the semaphore, and wait — eating queue slots that another worker could be servicing.

class WorkerSettings:
    max_jobs = config.settings.browser_max_pages  # match the semaphore

Both numbers tied to the same setting. No oversubscription. The "right" number for both is a function of how much RAM your container has and how heavy your renders are; we tune ours per environment.
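The post doesn't show the config object, but the idea is a single environment-driven value that both the semaphore and max_jobs read. A minimal sketch, assuming something like pydantic-settings (the actual config layer is an assumption):

from pydantic_settings import BaseSettings

class Settings(BaseSettings):
    # Read from the BROWSER_MAX_PAGES env var; one knob, tuned per environment.
    browser_max_pages: int = 4

settings = Settings()

# Both sides consume the same value:
#   BrowserPool(max_pages=settings.browser_max_pages)
#   WorkerSettings.max_jobs = settings.browser_max_pages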

Fix 3 — Restart the browser on a schedule, not on failure

This is the one that took us longest to accept.

Chromium's memory growth is not linear. Most pages cause a small bump that gets mostly reclaimed when the page closes. Some pages — a video, a leaky JavaScript framework, a page with a couple thousand DOM nodes — cause a bump that never gets reclaimed. Over hours and tens of thousands of renders, the resident set creeps. By hour 8 you're at 1.5GB. By hour 24 you're getting OOM-killed.

You can chase the leaks. Profile, diff snapshots, file Chromium bugs. Some of these are real bugs that get fixed. Others are by design — V8's garbage collector is not optimised for long-running, multi-tenant browser fleets.

Or you can preempt: every hour, kill the browser and start a fresh one.

async def maybe_restart(self):
    elapsed = time.monotonic() - self._last_restart
    if elapsed < self.restart_interval:
        return
    async with self._lock:
        # Double-check inside the lock: another coroutine may have just restarted.
        if time.monotonic() - self._last_restart < self.restart_interval:
            return
        if self._browser:
            await self._browser.close()
        await self._launch_browser()
        self._last_restart = time.monotonic()

We call this from an hourly ARQ cron. The lock prevents two coroutines racing into a restart; the double-check inside the lock handles the case where one already won. A restart costs us about 800ms of latency on whichever request is unlucky enough to land during the swap — we accept it as the price of not paging an engineer.
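Wiring that into ARQ looks roughly like this — a sketch using ARQ's built-in cron helper; the task name is made up, and the ctx['pool'] lookup mirrors the startup code from Fix 1:

from arq import cron

async def restart_browser(ctx):
    # Delegates the interval and lock logic to BrowserPool.maybe_restart.
    await ctx['pool'].maybe_restart()

class WorkerSettings:
    # Runs at minute 0 of every hour; maybe_restart's own interval check
    # makes it a no-op if a restart already happened recently.
    cron_jobs = [cron(restart_browser, minute=0)]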

If you can stomach a slightly more aggressive cadence (every 30 min, every 1000 renders), you can probably get away with a smaller container. We tuned to one hour because it's the sweet spot for our workload.

Fix 4 — A fresh BrowserContext per render, and close everything in finally

You are not just running renders. You are running other people's renders. Different tenants. Different cookies, different basic auth, different custom headers.

A BrowserContext is Playwright's isolation unit — its own cookies, storage, cache. If two tenants share a context, tenant A's session cookie can leak into tenant B's render. This is bad. You make a fresh context per render and you close it after:

async def _new_page(self, params):
    context_kwargs = {
        'viewport': params.get('viewport') or {'width': 1280, 'height': 720},
    }
    if params.get('headers'):
        context_kwargs['extra_http_headers'] = params['headers']
    if params.get('basic_auth'):
        context_kwargs['http_credentials'] = params['basic_auth']

    context = await self._browser.new_context(**context_kwargs)

    if params.get('cookies'):
        await context.add_cookies(params['cookies'])

    page = await context.new_page()
    return context, page

And on the consumer side — always in a finally block:

context, page = await self._new_page(params)
try:
    await self._navigate(page, params)
    return await asyncio.wait_for(
        page.screenshot(...),
        timeout=self.timeout_seconds,
    )
finally:
    await page.close()
    await context.close()

The asyncio.wait_for puts a hard cap on render time. Without it, a page waiting on networkidle can hang indefinitely and tie up a semaphore slot; with it, the timeout fires, the finally block closes the page and context, and the slot is released. A single slow page costs one failed request instead of a fleet outage.
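The post never shows _navigate, but for completeness, a plausible sketch — the wait_until default and the per-navigation timeout here are assumptions, not the production values:

async def _navigate(self, page, params):
    # Playwright's goto timeout is in milliseconds; 'networkidle' waits until
    # the page has gone 500ms without network connections.
    await page.goto(
        params['url'],
        wait_until=params.get('wait_until', 'networkidle'),
        timeout=self.timeout_seconds * 1000,
    )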

Bonus: Chromium launch flags that actually matter

Most "performance flag" lists you'll find online are cargo-culted. Here's the short list that's been load-bearing for us:

_CHROMIUM_ARGS = [
    '--no-sandbox',
    '--disable-setuid-sandbox',
    '--disable-dev-shm-usage',  # use /tmp instead of /dev/shm
    '--disable-gpu',
    '--disable-extensions',
    '--disable-background-networking',
    '--mute-audio',
    '--hide-scrollbars',
]

The most important one is --disable-dev-shm-usage. By default Chromium uses /dev/shm for shared memory between processes; in a container, /dev/shm is typically tiny (64MB by default in Docker), and a busy renderer will OOM the moment it tries to allocate a large pixmap. This flag routes the shared-memory files to /tmp, which is usually plain disk-backed storage rather than a size-capped tmpfs, trading a little latency for not crashing. (The other option is to enlarge /dev/shm itself, e.g. with Docker's --shm-size flag.)

--no-sandbox and --disable-setuid-sandbox are required if you're running as a non-root user in Docker without the right capabilities. They're a downgrade in defense-in-depth — if you're rendering URLs supplied by your own tenants you should weigh whether to instead grant the container the right caps. For our threat model (tenants render their own URLs, not ours), the tradeoff is acceptable.

What I'd do differently

If I were starting again:

  • Cap viewport size aggressively at the schema layer, not in the renderer. We started lenient ("let people render at 4K!") and walked it back when one tenant's 8K full-page screenshot used 2GB of RSS for one render (see the sketch after this list).
  • Track per-render memory, not just per-worker. A page that allocates 800MB before crashing should be killed and the tenant should see a clear error, not a generic 504. We added this later; should have been from day one.
  • Treat browser restarts as a SLO, not a coincidence. Once we started measuring "% of requests that landed during a restart," we could tune the cadence with data instead of hunches.
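On the first point, capping at the schema layer means the request model rejects oversized viewports before they ever reach the renderer. A minimal sketch with pydantic — the 1920×1080 ceiling is an illustrative number, not Rendershot's actual limit:

from pydantic import BaseModel, Field

class Viewport(BaseModel):
    width: int = Field(default=1280, ge=320, le=1920)
    height: int = Field(default=720, ge=240, le=1080)

class ScreenshotRequest(BaseModel):
    url: str
    viewport: Viewport = Viewport()
    full_page: bool = False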

Closing

There's nothing magical here. One browser per worker, semaphore-capped concurrency, scheduled restarts, fresh contexts. The discipline is in actually doing all four; skipping any one of them eventually crashes a worker.

If you're running a screenshot API, a PDF generator, an HTML-to-image pipeline, or any other long-running headless-browser workload, the same pattern applies. If you'd rather not run any of this yourself, Rendershot is the API that comes out of the patterns above — free tier of 200 renders/month, no card required.

Source

This article was originally published by DEV Community and written by Ohad Badihi.
