Technology Apr 30, 2026 · 6 min read

DEV Community
by GasPriceCheck
Why your `[^<]+` regex is silently breaking on React SSR output

Picture this. You've shipped a programmatic SEO site, a few thousand pages of templated content. Google flags 14 URLs as soft 404s. You write a quick diagnostic: hit each URL, fetch the SSR HTML, check for a few content markers (a price string, a state average, a section header). Confirm what's really rendering, fix what's missing, move on.

That was the plan. Forty minutes in, my script told me 0 of 14 pages had any EIA price data rendered. None. I was about to dig into the data fetching layer when something nagged at me. I curled one of the URLs by hand. The price was right there in the HTML. Plain text, easy to spot.

The script was lying. The regex was lying.

It took me longer than I want to admit to figure out what was happening, so here's the writeup so you don't burn the same hour.

The regex that "worked"

The diagnostic was straightforward. For each ZIP code page, I wanted to detect whether the EIA state average price was rendered in the SSR HTML. The component looked roughly like this:

```jsx
<p>
  Texas average: ${eiaAverage} per gallon as of {eiaDate}.
</p>
```

So I grepped the SSR output for the marker pattern:

```javascript
const html = await fetch(url).then(r => r.text());
const match = html.match(/Texas average: \$([^<]+) per gallon/);
```

The capture group uses `[^<]+` to pick up "everything until the next tag." Standard pattern. I've used variations of this in dozens of throwaway scrapers.

Across all 14 URLs, this regex returned no match. Not "match with empty capture." No match at all.

Meanwhile, when I curled the same URL and read the response, the rendered text was right there:

```text
Texas average: $2.97 per gallon as of 4/22/2026.
```

How does a regex miss text that's literally in the string?

What React is actually putting in your HTML

Here's the thing nobody told me about server-rendered React. When you have multiple adjacent text expressions inside a single element, React inserts an HTML comment as a hydration boundary marker. The actual SSR output for that paragraph looked like:

```html
<p>Texas average: $<!-- -->2.97<!-- --> per gallon as of <!-- -->4/22/2026<!-- -->.</p>
```

Each `{expression}` interpolation gets bracketed by `<!-- -->`. This is intentional. React uses these markers during hydration to know where one text node ends and the next one begins. Without them, React can't reconcile the SSR text with what its virtual DOM says should be there, because adjacent text nodes get merged in the browser.

So when my regex hits `Texas average: $`, the next character is `<` (the start of the comment). The `[^<]+` capture group requires at least one non-`<` character. It fails immediately. The regex moves on, finds no other "Texas average:" anchor in the page, and reports no match.

The price ($2.97) is in the HTML. It's just not adjacent to the anchor text. There's a comment node between them.
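You can reproduce the failure in a few lines of Node, using the paragraph exactly as it appears in the SSR output above:

```javascript
// The paragraph as React server-rendered it, hydration comments included.
const ssrHtml =
  '<p>Texas average: $<!-- -->2.97<!-- --> per gallon as of <!-- -->4/22/2026<!-- -->.</p>';

const pattern = /Texas average: \$([^<]+) per gallon/;

// Right after "$" the next character is "<" (the comment opener), so the
// [^<]+ capture can't consume a single character and the whole match dies.
console.log(pattern.test(ssrHtml)); // false

// Strip the comments and the very same pattern finds the price.
const stripped = ssrHtml.replace(/<!--[\s\S]*?-->/g, '');
console.log(stripped.match(pattern)[1]); // "2.97"
```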

The fix

Strip the comments before any content matching. One added line does it:

```javascript
const html = await fetch(url).then(r => r.text());
const stripped = html.replace(/<!--[\s\S]*?-->/g, '');
const match = stripped.match(/Texas average: \$([^<]+) per gallon/);
```

Use `[\s\S]` rather than `.` because comments can span multiple lines, and `.` in JavaScript doesn't match newlines unless you remember the `s` (dotAll) flag. Defaulting to `[\s\S]` means I never have to think about the flag at all.
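To see why the `.` version silently does nothing, here's a toy multi-line comment (a made-up input, not from the real pages):

```javascript
const html = 'before<!--\n  a comment that\n  spans lines\n-->after';

// "." stops at newlines, so the comment is never matched and survives intact.
console.log(html.replace(/<!--.*?-->/g, '') === html); // true

// [\s\S] matches any character, newlines included.
console.log(html.replace(/<!--[\s\S]*?-->/g, '')); // "beforeafter"
```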

Once I added that one line, the regex worked on all 14 URLs. The EIA data was rendering correctly the whole time. My "soft 404 root cause" was a bug in my diagnostic, not a real content gap.

Why this gotcha matters more than you think

The reason I'm writing this up: the failure mode is silent and high-impact.

If your regex returned an obvious garbage value, you'd debug it in five minutes. But mine returned no match at all. That's the same return value as "this content is genuinely missing." Which is the exact thing I was trying to detect. The diagnostic was indistinguishable from the bug it was looking for.

I trusted the output. I started forming hypotheses based on the output. "The EIA fetch must be failing on the server. Let me check my fallback chain. Let me check Redis." I burned an hour on those wrong paths because the symptom was confirmed by my (broken) detector.

Two takeaways that generalize beyond this exact case.

First, when you grep server-rendered React HTML, always strip comments first. Any regex with `[^<]+`, `\w+`, or word-boundary anchors will trip on the hydration markers. The comment nodes are easy to overlook in browser dev tools, which is part of why this gotcha is invisible. View the raw response with `curl -s URL | head -200` and look for the `<!-- -->` pattern. You'll see them everywhere, especially on text-heavy pages with lots of variable interpolation.
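If you want to count the markers for yourself, something like this works. The response is faked here with a heredoc and a made-up path; in practice you'd save `curl -s URL` output instead:

```shell
# Fake a saved SSR response; in practice: curl -s "$URL" > /tmp/ssr-page.html
cat > /tmp/ssr-page.html <<'EOF'
<p>Texas average: $<!-- -->2.97<!-- --> per gallon as of <!-- -->4/22/2026<!-- -->.</p>
EOF

# Count the hydration comment markers in the response.
grep -o '<!-- -->' /tmp/ssr-page.html | wc -l
```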

Second, validate your diagnostic before you trust its output. Run it against a known-good page. If your "missing content" detector reports content as missing on a page where you can visually confirm the content exists, your detector is broken, not the content. I should have done this sanity check before chasing fix hypotheses. I didn't, because the detector was "obviously simple."
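As a sketch, a sanity check like this would have caught my broken detector immediately (`detectPrice` and the fixture string are illustrative names, not my actual script):

```javascript
const stripComments = (html) => html.replace(/<!--[\s\S]*?-->/g, '');

// Hypothetical detector: returns the price string, or null for "missing".
const detectPrice = (html) => {
  const m = stripComments(html).match(/Texas average: \$([^<]+) per gallon/);
  return m ? m[1] : null;
};

// Fixture copied from a page where the price is visibly present.
const knownGood =
  '<p>Texas average: $<!-- -->2.97<!-- --> per gallon as of <!-- -->4/22/2026<!-- -->.</p>';

// If the detector reports "missing" here, the detector is the bug.
if (detectPrice(knownGood) === null) {
  throw new Error('detector broken: reports known-good content as missing');
}
console.log(detectPrice(knownGood)); // "2.97"
```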

This bit me on a Next.js 15 / React 18 project. I checked: the same hydration-comment behavior is documented in the Next.js docs as part of how React handles hydration. It's not going away. If you're parsing SSR HTML programmatically, assume comments are everywhere.

One last thing worth knowing

There's a sister gotcha. If you're using cheerio or another DOM parser instead of regex, you don't have this problem. The DOM parser walks the tree, and adjacent text nodes are joined when you call `.text()`. So `$('p').text()` returns "Texas average: $2.97 per gallon as of 4/22/2026." with no comment artifacts.

But if you're using regex (faster, simpler for one-off scripts), strip first, match second. Or write a one-line helper:

```javascript
const stripComments = (html) => html.replace(/<!--[\s\S]*?-->/g, '');
```

Drop it at the top of every SSR-parsing script you write. Future-you will thank you.

That's the whole post. One bug, one fix, one hour I'm not getting back. Hope I saved you the same hour.

If you've hit a similar silent-failure-mode debugging story, drop it in the comments. I collect these.
