GitHub recorded 10 service degradation incidents in April 2026. The worst: a full code search outage on April 1 that lasted 8 hours and 43 minutes, with 100% of search queries failing for 2 hours and 20 minutes after a botched messaging infrastructure upgrade cascaded into a complete index wipe. On April 9, a rate limiting bug with incorrect global scoping hit the Copilot coding agent across four outage waves, delaying or failing roughly 22,700 workflow creations, with queue wait times peaking at 54 minutes against a normal baseline of 15 to 40 seconds.
The incident reports are worth reading in full for the mechanism details, not just the outcomes. The April 1 search outage escalated specifically because an unintended service deployment cleared internal routing state mid-recovery, turning a staleness problem into a total outage. The April 9 Copilot incidents were compounded by a caching bug that persisted a rate-limited state beyond its actual window, causing the outage to recur. On April 13, an automated DNS tool deleted a live record after its upstream data source returned a blank, treating the absence as staleness. GitHub Pages then served 17.5 million HTTP 500 errors across 39 minutes before the record was manually recreated.
GitHub is responding with architectural fixes across all four incidents: gradual upgrade rollouts with health checks for search, per-installation API credential scoping for Copilot, availability-zone-tolerant failover routing for Pages, and credential rotation hardening for audit logs. A separate blog post covers the April 23 and April 27 incidents not included in this summary. The full report is the primary source for root cause chains and remediation timelines.
[READ ORIGINAL →]