If you use GitHub's merge queue and had a rough week around April 23rd, 2026, you were not imagining things. Your code actually disappeared. Not because of a bad commit, not because of a rogue team member, but because GitHub itself quietly deleted it.
This is the story of what happened, why it was way worse than the official numbers suggest, and what it means for the way we all trust the tools we build on.
The Day GitHub Stopped Being Git
At 16:05 UTC on April 23rd, 2026, a regression crept into GitHub's merge queue. For the next three and a half hours, engineers around the world were reviewing pull requests, clicking "merge," and watching everything look completely fine. Green checks. Clean diffs. No warnings.
What was actually happening behind the scenes was quietly horrifying.
A PR with a perfectly reasonable +29 / -34 diff would get approved and queued. What landed on main was a commit worth +245 / -1,137. Thousands of lines of code that other engineers had already shipped, reviewed, and moved on from, just gone. And every merge that came after went in on top of that broken history.
The UI showed zero problems. The status page showed no outage. The platform was lying to everyone's faces.
What Actually Went Wrong Under the Hood
GitHub's merge queue works by creating a temporary branch for each PR in the queue. Normally, that temp branch starts from the tip of main plus the PR's diff. CI runs against it, it passes, it lands.
On April 23rd, the queue started building those temp branches from the wrong starting point. Instead of branching from the current tip of main, it was branching from wherever the feature branch had originally diverged from main, potentially dozens or hundreds of commits back.
Then it pushed the entire contents of that temp branch to main.
So if your feature branch was 50 commits behind main when it hit the queue, the "merge" silently removed those 50 commits of other people's work as a side effect of landing yours. CI passed because the temp branch on its own was internally consistent. main blew up because the temp branch had nothing to do with the current state of main.
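To make the two starting points concrete, here is a minimal sketch in Python that shells out to git. It assumes a local clone with an origin/main ref and uses a hypothetical feature branch name; it illustrates the difference between the correct base and the one the regression effectively used, and it is not GitHub's queue code.

```python
import subprocess

def git(*args: str) -> str:
    """Run a git command in the current clone and return its output."""
    return subprocess.run(["git", *args], capture_output=True,
                          text=True, check=True).stdout.strip()

def correct_base() -> str:
    # What the queue is supposed to build on: the current tip of main,
    # so everything already merged stays merged.
    return git("rev-parse", "origin/main")

def regressed_base(feature_branch: str) -> str:
    # What the regression effectively built on: the point where the
    # feature branch originally diverged from main.
    return git("merge-base", "origin/main", feature_branch)

if __name__ == "__main__":
    branch = "my-feature"  # hypothetical branch name
    tip, fork_point = correct_base(), regressed_base(branch)
    if tip != fork_point:
        behind = git("rev-list", "--count", f"{fork_point}..{tip}")
        print(f"Starting from the fork point instead of the tip would drop "
              f"the work contained in {behind} commits already on main.")
```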
The root cause? A new code path that adjusted merge base computation was meant to be gated behind a feature flag for an unreleased feature. The gating was incomplete. The new behavior leaked into production and applied to all squash merge groups.
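GitHub has not published the code, so the details are guesswork, but the shape of the failure is a familiar one: a new code path is flag-checked at one call site and reachable, unguarded, from another. A hypothetical sketch, with every name invented:

```python
# Hypothetical sketch of incomplete feature flag gating. Every name here
# is invented for illustration; none of this is GitHub's actual code.

FLAGS = {"new_merge_base_logic": False}  # unreleased feature, off in production

def old_base_computation() -> str:
    return "tip-of-main"        # the well-understood existing behavior

def new_base_computation() -> str:
    return "somewhere-else"     # the behavior meant to stay behind the flag

def compute_base() -> str:
    # Gated call site: safe as long as the flag stays off.
    if FLAGS["new_merge_base_logic"]:
        return new_base_computation()
    return old_base_computation()

def compute_base_for_squash_group() -> str:
    # Ungated call site: the new logic is reachable in production even
    # with the flag off. This is what "incomplete gating" looks like.
    return new_base_computation()

assert compute_base() == "tip-of-main"                       # behaves as expected
assert compute_base_for_squash_group() == "somewhere-else"   # the leak
```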
Three things made this bug particularly nasty:
1. The PR UI lied. You reviewed +29/-34. The commit that landed was +245/-1,137. The thing engineers approved was not the thing that merged. That breaks the most fundamental contract of a code review system.
2. It was completely silent. No merge conflict. No failed check. No banner on the PR. Teams only found out when someone noticed code on main that should have been there simply was not.
3. It scaled with repo activity. The faster a repo was merging, the further feature branches had drifted from main, and the more damage each bad merge did. The teams that relied most on merge queue got hit the hardest.
The Human Cost
This was not a theoretical problem. Engineering teams spent entire afternoons in incident mode: combing through commit graphs, reconstructing deleted code by hand, coordinating recovery across multiple repos, and filing support tickets that would take days to hear back on.
One organization reported that every one of its teams running on GitHub's merge queue got hit, with dozens of bad commits per team and hundreds of existing commits clobbered before anyone noticed. A single company claimed more than 200 ruined PRs on its own.
GitHub later said 2,092 pull requests across 230 repositories were affected during the impact window of April 22 to 23. Earlier messaging from GitHub's COO on X had put the number at 2,804 PRs, and some community members pushed back hard on both figures given what individual companies were experiencing.
The incident was not detected by GitHub's automated monitoring because it affected merge commit correctness rather than availability. GitHub only became aware of the regression at 19:38 UTC, following an increase in customer support inquiries. The fix, a revert and force-deploy, was complete by 20:43 UTC. Three hours and thirty-three minutes of silent corruption.
Why the Status Page Was Useless
Here is the part that stings. If you checked GitHub's status page on April 23rd, you probably saw nothing alarming. There was no major outage reported. No partial outage.
That is because GitHub's status page calculus specifically excludes "Degraded Performance" from downtime numbers. The platform itself never went down. Developers could still push code, open PRs, and click merge. The fact that clicking merge was silently destroying their codebase did not register as an incident on the dashboard.
This is a telling gap. Uptime and correctness are not the same thing. A bank that processes your transactions but records them incorrectly is not "up." GitHub processed the merges. It just produced wrong results. The status page was not built to catch that kind of failure.
This Was Not an Isolated Bad Day
It would be easier to move on from this if it were a one-off. But April 2026 was a genuinely rough stretch for GitHub.
Four days after the merge queue incident, on April 27th, GitHub's Elasticsearch cluster became overloaded, likely from a botnet attack, and search-backed UI surfaces stopped returning results. Pull request lists went blank. Issues disappeared from view. Projects and Actions workflow pages showed nothing. The underlying data was still there, but developers could not see it.
And then, on April 28th, the same morning GitHub's CTO published an apology post about reliability, a separate security disclosure dropped: researchers at Wiz had found a high-severity remote code execution vulnerability in GitHub's git push pipeline (CVE-2026-3854, CVSS 8.7). A single crafted git push with injected options could reach unsandboxed code execution on GitHub's servers. It was patched in 75 minutes on github.com, but the timing was brutal.
Three significant failures in five days. Merge queue correctness. Search collapse. An RCE in the core git push path.
GitHub's CTO, Vlad Fedorov, acknowledged in the April 28th post that none of this is acceptable. He also revealed the scale of what GitHub is dealing with: the company had planned to scale capacity by 10x in October 2025. By February 2026, projections driven by agentic development workflows (AI coding tools like Copilot, Cursor, and Codex flooding the platform with automated PRs) forced a rethink to a 30x redesign. GitHub is now hitting peaks of 90 million merged PRs and 1.4 billion commits.
The Deeper Architectural Problem
There is a reason this specific failure mode existed. GitHub's merge queue constructs merge commits through a code path that is separate from how a regular PR merge works. Two code paths, two places where behavior can quietly diverge.
This is the danger that comes with delegation. A merge queue is supposed to automate exactly what a human would do when clicking "Merge pull request." The moment it does something a human would not do, because it has its own logic for building the merge commit, it can silently produce commits nobody wrote and nobody approved.
This is not just a GitHub problem. It is a pattern that shows up every time we give automated systems write access to things that matter. Queues, bots, AI agents. As long as those systems are doing something equivalent to what a human would do, the failure modes are familiar. When they start doing things a human would not do, the failures become invisible until the damage is already done.
The lesson is not to avoid merge queues. It is to make sure that whatever writes to main stays as close as possible to boring, well-understood git operations, with no novel logic in the merge commit path that reviewers cannot audit.
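As a sketch of what that looks like in practice, here is a minimal Python outline of a queue step built from nothing but ordinary git operations: start from the current tip of main, merge the PR branch, run checks, then advance main with a plain push. The branch names and the missing CI hook are placeholders, and this is an illustration of the principle rather than a drop-in merge queue.

```python
import subprocess

def git(*args: str) -> None:
    """Run a git command, failing loudly on any error."""
    subprocess.run(["git", *args], check=True)

def land(pr_branch: str) -> None:
    """Land one queued PR using only ordinary, auditable git operations."""
    git("fetch", "origin", "main", pr_branch)
    # Start from the *current* tip of main, exactly as a human would.
    git("checkout", "-B", "queue-temp", "origin/main")
    # Replay the PR on top of it; a conflict fails this queue run loudly
    # instead of silently rewriting anything.
    git("merge", "--no-ff", "--no-edit", f"origin/{pr_branch}")
    # ... run CI against queue-temp here ...
    # A plain (non-force) push only succeeds as a fast-forward, so main
    # can never lose commits it already had.
    git("push", "origin", "queue-temp:main")
```

The specifics matter less than the property: every step is something a reviewer could run by hand and reason about, and nothing in the path invents a merge commit a human would not have produced.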
Will Anyone Actually Leave?
After something like this, the obvious question is whether developers will migrate off GitHub. And the honest answer is: probably not in any significant numbers.
GitHub is deeply embedded. CI pipelines, webhook integrations, RBAC policies, Actions workflows, third-party app permissions, team structures, pull request history. Migration is not just switching a remote URL. It is months of work and coordination.
That stickiness is real and it is not purely irrational. GitHub is still where most open source lives. It is still where most integrations point. It is still the default. Its grip on the development ecosystem is less like that of a premium SaaS product and more like that of a utility. You do not switch utilities because of a bad week.
But what this incident should change is the baseline of trust. GitHub is infrastructure. And infrastructure that silently corrupts your data, even for a few hours, with no visible error, is infrastructure you need to have a recovery plan for.
The minimum response is not migration. It is verification. Audit squash merges in merge queue groups of two or more PRs from the April 22 to 23 window. Write down which parts of your build and deploy pipeline silently assume git history is correct. Then make that assumption visible somewhere it can be challenged.
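One rough way to do that audit is to compare each merged PR's reviewed additions and deletions against the stats of the commit that actually landed, using the GitHub REST API. In the sketch below, the org, repo, and PR numbers are placeholders; how you enumerate PRs from the window and what mismatch you treat as suspicious is up to you.

```python
"""Rough audit sketch: flag merged PRs whose reviewed diff does not match
the commit that actually landed on main. Assumes a GITHUB_TOKEN environment
variable; the org, repo, and PR numbers below are placeholders."""

import os
import requests

OWNER, REPO = "your-org", "your-repo"
API = "https://api.github.com"
HEADERS = {
    "Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}",
    "Accept": "application/vnd.github+json",
}

def audit(pr_number: int) -> None:
    pr = requests.get(f"{API}/repos/{OWNER}/{REPO}/pulls/{pr_number}",
                      headers=HEADERS, timeout=30).json()
    sha = pr.get("merge_commit_sha")
    if not sha:
        return  # never merged, nothing to compare
    commit = requests.get(f"{API}/repos/{OWNER}/{REPO}/commits/{sha}",
                          headers=HEADERS, timeout=30).json()
    reviewed = (pr["additions"], pr["deletions"])
    landed = (commit["stats"]["additions"], commit["stats"]["deletions"])
    if reviewed != landed:
        print(f"PR #{pr_number}: reviewed +{reviewed[0]}/-{reviewed[1]}, "
              f"landed +{landed[0]}/-{landed[1]}, inspect commit {sha}")

for number in (101, 102, 103):  # hypothetical PR numbers from the window
    audit(number)
```

A clean result is not proof you were unaffected, since squash commits can legitimately differ a little from the PR diff after conflict resolution, but a large mismatch is exactly the signature described above.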
What GitHub Says It Is Doing About It
GitHub's post-incident response included a few concrete commitments:
- Expanding test coverage for merge correctness validation
- Adding regression checks that validate resulting git contents across supported merge configurations before reaching production
- Migrating performance-sensitive code from its older Ruby codebase to Go
- Moving systems to public cloud infrastructure to handle the 30x scale requirement
The April 23rd bug specifically was caused by incomplete feature flagging on a new code path. The fix was a revert. The longer-term fix is better test coverage for multi-PR merge queue groups, which were apparently underrepresented in existing test suites.
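What such a check might look like, reduced to its smallest form: build a throwaway repo where a feature branch has drifted behind main, land the branch the way a correct squash merge would (onto the current tip), and assert that nothing already on main disappears. This is a sketch against plain git, not GitHub's queue, and every name in it is invented.

```python
"""Minimal sketch of a merge-content regression check: a feature branch
that has drifted behind main is landed onto main's current tip, and the
test asserts that nothing already on main disappears."""

import pathlib
import subprocess
import tempfile

def git(repo: pathlib.Path, *args: str) -> None:
    subprocess.run(["git", "-C", str(repo), *args],
                   check=True, capture_output=True)

def commit_all(repo: pathlib.Path, message: str) -> None:
    git(repo, "add", ".")
    git(repo, "commit", "-m", message)

def test_merge_keeps_existing_work() -> None:
    with tempfile.TemporaryDirectory() as tmp:
        repo = pathlib.Path(tmp)
        git(repo, "init", "-b", "main")
        git(repo, "config", "user.email", "ci@example.com")
        git(repo, "config", "user.name", "ci")

        (repo / "base.txt").write_text("base\n")
        commit_all(repo, "base")

        # The feature branch forks here, then main moves on without it.
        git(repo, "branch", "feature")
        (repo / "other_teams_work.txt").write_text("shipped\n")
        commit_all(repo, "other team's work")

        # The feature adds its own file on the stale fork point.
        git(repo, "checkout", "feature")
        (repo / "feature.txt").write_text("new\n")
        commit_all(repo, "feature")

        # Correct behavior: squash the feature onto the *current* main.
        git(repo, "checkout", "main")
        git(repo, "merge", "--squash", "feature")
        git(repo, "commit", "-m", "feature (squashed)")

        # The property that would have caught April 23rd: prior work survives.
        assert (repo / "other_teams_work.txt").exists()
        assert (repo / "feature.txt").exists()

if __name__ == "__main__":
    test_merge_keeps_existing_work()
    print("merge content check passed")
```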
The Takeaway
GitHub's merge queue, for a few hours on April 23rd, 2026, broke the most fundamental contract of version control: that what you approve is what merges. It did it silently, with clean green UI, no errors, and no status page entry.
The code was still there in Git object storage. But the branch history was wrong, and no automated system could safely repair it across every affected repository. Engineers had to do it by hand.
That is the thing that lingers. Git is supposed to be the boring, reliable layer that everything else is built on. When the boring layer gets interesting, it gets interesting in the worst possible way.
If you found this useful, drop a comment below or follow for more deep dives into the tools we trust (sometimes too much).
This article was originally published by DEV Community and written by Varshith V Hegde.