GitHub Copilot code review has grown 10X since its April 2024 launch and now handles more than one in five pull request reviews on GitHub. The system moved from a static model to an agentic architecture that retrieves repository context, reasons across changes, and flags issues mid-read rather than only at the end. That architectural shift alone drove an 8.1% increase in positive developer feedback.

The team optimized around three measurable axes: accuracy, signal, and speed. In 71% of reviews, the agent surfaces actionable feedback; in the remaining 29%, it says nothing rather than padding the review. When it does comment, it averages 5.1 comments per review. Switching to a more advanced reasoning model improved positive feedback rates by 6% even as latency increased 16%. GitHub kept the trade-off, prioritizing finding real issues over returning fast noise.

The full article explains how thumbs-up and thumbs-down reactions on individual comments feed directly into the evaluation loop, how production signals track whether flagged issues are resolved before merging, and how the team defines the difference between a high-signal comment and review churn. Those details matter if you are building or benchmarking any AI review system.
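To make the feedback loop concrete, here is a minimal sketch of how per-comment reactions and merge outcomes could roll up into evaluation metrics. The `ReviewComment` fields and both functions are hypothetical illustrations, not GitHub's actual schema or pipeline.

```python
from dataclasses import dataclass

@dataclass
class ReviewComment:
    # Hypothetical fields; GitHub's internal data model is not public.
    thumbs_up: int = 0
    thumbs_down: int = 0
    resolved_before_merge: bool = False

def positive_feedback_rate(comments):
    """Share of reacted-to comments that received a thumbs up."""
    reacted = [c for c in comments if c.thumbs_up or c.thumbs_down]
    if not reacted:
        return None  # no signal either way
    return sum(c.thumbs_up > 0 for c in reacted) / len(reacted)

def resolution_rate(comments):
    """Share of comments whose flagged issue was fixed before merge,
    a production signal that the comment was high-signal, not churn."""
    if not comments:
        return None
    return sum(c.resolved_before_merge for c in comments) / len(comments)
```

A comment that developers both upvote and resolve before merging counts twice in this framing: once as explicit feedback, once as a behavioral signal.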

[READ ORIGINAL →]