Codex can now click, type, and navigate local Mac applications autonomously, without hijacking your active session. OpenAI's Ari Weinstein explains the core engineering decision: combining screenshots with accessibility tree data gives the agent a richer, more reliable read of UI state than pixels alone.
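
To make the screenshot-plus-accessibility idea concrete, here is a minimal Python sketch of how the two signals could be paired in one observation. It is an illustration under stated assumptions, not Codex's actual implementation: `AXNode`, `Observation`, `ACTIONABLE_ROLES`, and `click_point` are hypothetical names, and the role strings are simplified stand-ins for real accessibility roles. The point is that the tree supplies labels and exact frames, so a target can be resolved by name rather than guessed from pixels.

```python
from dataclasses import dataclass, field
from typing import Iterator, Optional


@dataclass
class AXNode:
    """Hypothetical, simplified accessibility-tree node."""
    role: str
    label: str
    frame: tuple[int, int, int, int]  # x, y, width, height in screen points
    children: list["AXNode"] = field(default_factory=list)


@dataclass
class Observation:
    """One agent observation: raw pixels plus the structured tree."""
    screenshot_png: bytes
    root: AXNode


# Assumed set of roles the agent may act on (illustrative only).
ACTIONABLE_ROLES = {"button", "text_field", "menu_item", "checkbox"}


def actionable_elements(node: AXNode) -> Iterator[AXNode]:
    """Depth-first walk yielding every node with an actionable role."""
    if node.role in ACTIONABLE_ROLES:
        yield node
    for child in node.children:
        yield from actionable_elements(child)


def click_point(obs: Observation, label: str) -> Optional[tuple[int, int]]:
    """Return the center of the first actionable element with this label."""
    for element in actionable_elements(obs.root):
        if element.label == label:
            x, y, width, height = element.frame
            return (x + width // 2, y + height // 2)
    return None  # Not in the tree; a real agent might fall back to the pixels.


window = AXNode("window", "Untitled", (0, 0, 800, 600), [
    AXNode("text_field", "Filename", (100, 150, 300, 24)),
    AXNode("button", "Save", (100, 200, 80, 24)),
])
obs = Observation(screenshot_png=b"", root=window)
print(click_point(obs, "Save"))
```

In this sketch the screenshot is carried along but unused; it stands in for the visual context a model would still need for anything the tree does not describe, such as canvas content or custom-drawn controls.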

The architecture handles multiple apps simultaneously and runs in the background, which raises the practical ceiling for what coding agents can automate. The permissions model is granular and scoped app by app, a detail most coverage will skip but the one that matters most for anyone deploying this in a real workflow.
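
What app-by-app scoping means in practice can be sketched in a few lines of Python. This is a hypothetical model, not the actual Codex permission system: `AppPermissions`, `dispatch`, and the action names are invented for illustration, and the bundle identifiers are only examples. The sketch assumes a deny-by-default policy, where an action runs only if it was explicitly granted for that specific app.

```python
from dataclasses import dataclass, field


@dataclass
class AppPermissions:
    """Hypothetical per-app grant table keyed by bundle identifier."""
    grants: dict[str, set[str]] = field(default_factory=dict)

    def grant(self, bundle_id: str, *actions: str) -> None:
        """Allow the given actions for one app only."""
        self.grants.setdefault(bundle_id, set()).update(actions)

    def allows(self, bundle_id: str, action: str) -> bool:
        """Deny by default: unknown apps and ungranted actions are refused."""
        return action in self.grants.get(bundle_id, set())


def dispatch(perms: AppPermissions, bundle_id: str, action: str) -> str:
    """Check the grant table before any action reaches the target app."""
    if not perms.allows(bundle_id, action):
        raise PermissionError(f"{action!r} is not granted for {bundle_id}")
    return f"{action} -> {bundle_id}"  # Placeholder for the real action.


perms = AppPermissions()
perms.grant("com.apple.TextEdit", "click", "type")

print(dispatch(perms, "com.apple.TextEdit", "type"))
try:
    dispatch(perms, "com.apple.finder", "click")
except PermissionError as err:
    print("blocked:", err)
```

The design choice worth noting is that a grant for one app says nothing about any other, so widening the agent's reach is always an explicit step.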

The full video is worth watching for Weinstein's explanation of why accessibility data plus screenshots outperforms either signal alone, and how the background execution model was designed to avoid the core annoyance of prior computer-use demos: the agent stealing your mouse.
