IBM's Carbon design system has 76 open pull requests, some sitting since April, each waiting on a single reviewer. That is not a Carbon problem. That is the design system industry's structural reality: one person, one queue, one point of failure. Sparkbox's survey found that while 61% of teams had a contribution process, only 16% tracked even one metric showing whether it was working. Zeroheight's more recent data shows stakeholder buy-in dropped from 42% to 32% year over year. The gatekeeper model is collapsing under its own weight, and AI is arriving precisely as the people holding it together are burning out.

The current wave of writing about AI and design systems is almost entirely about making systems machine-readable: tokenized components, LLM-parseable metadata, agentic doc structures. That work exists and matters. It is not the bottleneck. The bottleneck is review, and review is two distinct activities that get treated as one. Enforcement asks whether contrast ratios pass, whether tokens are bound, whether snapshots changed. Judgment asks whether the component should exist, whether five variants should collapse into one, whether alt text is meaningful or just technically present. Automated accessibility tools illustrate the boundary precisely: axe-core catches 57% of WCAG issues by its own documentation. The remaining 43% requires a human. That 57 to 43 split is not a tooling failure. It is the line itself.
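To make the enforcement side concrete, here is a minimal sketch of the kind of check a machine can own outright: the WCAG 2.x contrast ratio between two colors. It is an illustration and not taken from the original piece; the function names are hypothetical, and the 4.5:1 and 3:1 thresholds are the WCAG AA values for normal and large text.

```python
def _linearize(channel: int) -> float:
    """Convert an 8-bit sRGB channel to linear light (WCAG 2.x formula)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    """Relative luminance of a color given as '#rrggbb'."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linearize(r) + 0.7152 * _linearize(g) + 0.0722 * _linearize(b)


def contrast_ratio(foreground: str, background: str) -> float:
    """WCAG contrast ratio; the lighter color always goes in the numerator."""
    lighter, darker = sorted(
        (relative_luminance(foreground), relative_luminance(background)),
        reverse=True,
    )
    return (lighter + 0.05) / (darker + 0.05)


def passes_aa(foreground: str, background: str, large_text: bool = False) -> bool:
    """Enforcement check: does the pair meet the WCAG AA contrast threshold?"""
    threshold = 3.0 if large_text else 4.5
    return contrast_ratio(foreground, background) >= threshold


if __name__ == "__main__":
    print(passes_aa("#767676", "#ffffff"))  # gray text on a white background
```

The check returns a yes or no with nothing to debate, which is what makes it enforcement. No comparable function exists for whether alt text is meaningful or whether a component should exist at all, and that is the judgment side of the line.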

The piece is worth reading in full because it does not stop at the diagnosis. It works through what actually happens when you try to hand enforcement to machines without accidentally handing them judgment too, using accessibility as the stress test because the stakes are concrete and the measurement exists. The argument that conflating enforcement with judgment is the single most expensive mistake in design system governance runs through the whole piece, and it has direct operational consequences for anyone currently sitting in the reviewer's chair.

[READ ORIGINAL →]