What Is Going On With GitHub?
The hosts discuss GitHub's recent wave of outages and operational failures, including a notable merge queue incident that corrupted code repositories. They explore the root causes — primarily the explosion of AI-driven code commits overwhelming GitHub's legacy infrastructure — and debate how much sympathy large tech companies deserve when facing unprecedented scaling challenges.
Summary
The episode opens with hosts Matt and Mike framing GitHub's recent instability as a significant concern given how deeply embedded the platform has become in modern software development pipelines. Mike presents data from an unofficial 'Missing GitHub Status Page' showing 86.75% uptime over 90 days, a shockingly low figure for a platform of GitHub's scale, especially against the 'five nines' (99.999% uptime) target the hosts cite as the industry standard. The hosts note that GitHub's official status page shows higher numbers because it only tracks core services, not the full chain of interconnected services developers rely on.
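To put the two uptime figures side by side, the percentages can be converted into implied downtime over the same 90-day window. This is a minimal arithmetic sketch, not the methodology of either status page: the helper name `downtime_seconds` is an assumption made for illustration, and how the unofficial page aggregates incidents across services is not described in the episode.

```python
# Convert an uptime percentage into the downtime it implies over a window.
# The function name and structure are illustrative assumptions; only the
# 86.75%, 99.999%, and 90-day figures come from the episode summary.
SECONDS_PER_DAY = 24 * 60 * 60


def downtime_seconds(uptime_percent: float, window_days: int) -> float:
    """Seconds of downtime implied by an uptime percentage over a window."""
    unavailable_fraction = 1.0 - uptime_percent / 100.0
    return unavailable_fraction * window_days * SECONDS_PER_DAY


reported = downtime_seconds(86.75, 90)     # figure from the unofficial page
five_nines = downtime_seconds(99.999, 90)  # the 'five nines' target

print(f"86.75% over 90 days: {reported / SECONDS_PER_DAY:.1f} days of downtime")
print(f"99.999% over 90 days: {five_nines:.0f} seconds of downtime")
```

Read this way, the gap is roughly twelve days of degraded service versus a little over a minute, which is why the hosts treat the unofficial figure as alarming.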
The most alarming incident discussed is the April 23rd merge queue failure, which affected 658 repositories and 2,092 pull requests. Unlike typical outages, this incident actually corrupted Git history by merging pull requests out of order, and GitHub could not automatically fix it — they had to instruct affected clients to manually untangle their repositories. The hosts argue this represents a qualitatively different and more serious type of failure than simple downtime.
A central theme is the explosive growth in GitHub usage driven by AI coding tools. Mike references a GitHub blog post showing that the company had planned for 10x capacity growth starting in October 2025, only to realize it actually needs 30x capacity. The hosts attribute this to AI agents and vibe coders committing code far more frequently than traditional developers, and to new non-developer users entering the space through AI-assisted coding platforms.
The hosts engage in a nuanced debate about corporate sympathy. Matt argues that while the failures are unacceptable, GitHub is navigating an unprecedented technological shift that any platform would struggle with. Mike counters that Microsoft, owning Azure and having vast engineering resources, has fewer excuses than a small company would. Both agree the real red line is code integrity failures, not mere downtime.
The conversation broadens into a discussion about technology's dependency problem. The hosts describe how modern software stacks — from simple marketing websites to enterprise applications — are built on layers of third-party dependencies, each representing a potential point of failure or security vulnerability. They discuss supply chain attacks on NPM packages as a growing threat, and speculate that 'zero dependency or low dependency development' may become a necessary philosophy for critical systems, though they acknowledge the counterargument that widely-used open-source packages benefit from community vetting.
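The point about layered dependencies can be made concrete with a small calculation. The sketch below assumes a workflow that only succeeds when every service in the chain is up, and that the services fail independently, so the combined availability is the product of the individual ones. The service names and percentages are hypothetical placeholders, not measured figures for GitHub or anyone else.

```python
from math import prod


def chain_availability(availabilities) -> float:
    """Availability of a workflow that needs every listed service to be up.

    Assumes failures are independent, which real systems only approximate.
    """
    return prod(availabilities)


# Hypothetical per-service availabilities, chosen only for illustration.
services = {
    "git operations": 0.999,
    "ci runners": 0.995,
    "package registry": 0.998,
    "hosting": 0.999,
}

combined = chain_availability(services.values())
print(f"weakest single service: {min(services.values()):.4f}")
print(f"whole chain:            {combined:.4f}")
```

Under these assumptions the chain is always less available than its weakest link, and each added dependency pushes the combined figure lower. That is one way to read both the gap between a core-services status page and the full developer workflow, and the hosts' interest in low-dependency development for critical systems.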
The hosts briefly consider a hypothetical scenario where Microsoft sunsets GitHub, concluding it would cause significant industry chaos given how many platforms (like Vercel) are deeply integrated with GitHub's services. They also consider the slower alternative scenario where GitHub gradually becomes antiquated as AI coding platforms develop their own internal version control systems, drawing parallels to the decline of BlackBerry and MySpace. The episode closes with the hosts agreeing on a 'wait and see' stance while acknowledging GitHub's published remediation plans.
Key Insights
- The unofficial Missing GitHub Status Page shows 86.75% uptime over 90 days, while GitHub's official status page shows higher numbers because it only tracks select core services rather than the full interconnected service chain.
- GitHub planned to increase capacity by 10x starting in October 2025 but has since determined they actually need 30x capacity, driven by parabolic growth in usage correlated with the release of highly capable LLMs like Opus 4.5.
- The April 23rd merge queue incident is framed as categorically more serious than typical outages because it corrupted Git history and required manual untangling by affected clients — something GitHub could not automate a fix for.
- Mike argues that AI-assisted projects generate astronomically more commits than traditionally coded projects because the effort to commit drops to near-zero, significantly inflating GitHub's request load.
- The hosts argue GitHub has effectively become a developer operating system, handling CI/CD pipelines, secrets management, hosting via Pages, wikis, issue tracking, and CMS-like functions, meaning any single service failure can cascade across an entire development workflow.
- Matt draws a parallel between GitHub's scaling crisis and small businesses suddenly going viral — the difference being that Microsoft owns Azure and has the engineering resources to have anticipated and prepared for this growth.
- The hosts identify a broader technology dependency problem where modern software stacks — even simple marketing websites — involve dozens of third-party services, any one of which can cause cascading failures.
- Mike speculates that supply chain attacks on package managers like NPM are becoming a more dangerous threat than the vulnerabilities you might introduce by writing the code yourself, because a single compromised popular dependency can affect hundreds of millions of downstream systems.
- The hosts suggest that GitLab, while the most feature-complete GitHub competitor, is already slow at its current scale and would likely struggle even more if a significant portion of GitHub's traffic migrated to it.
- Matt argues that vibe coders and non-developer AI users may not care about Git or GitHub at all, instead relying on internal version control systems within closed coding platforms, potentially fragmenting the centralized role GitHub currently plays.
- Mike says that while he currently has patience for GitHub's outages, repeated code integrity incidents, as opposed to simple downtime, would be his personal threshold for migrating away from the platform.
- The hosts contend that GitHub's current architecture was not designed from the ground up to serve as a platform of its current scale and complexity, and that truly fixing the problems may require a ground-up redesign rather than continued patching of legacy systems.