GPT-5.5 ran autonomously for nearly six hours on a single prompt. Claire, a developer and product builder, gave the model a data migration task involving 2 million rows of unstructured data, told it to spawn subagents, test its own output, and get the result production-ready. It asked for approval exactly once. No steering. No follow-up prompts. Claire calls it the first case she has seen of a model sustaining long-running autonomous agent behavior at this level.

The model also cracked a hardware problem that Claude, GPT-4, and months of manual reverse-engineering could not. Claire had been trying to decode a Chinese Bluetooth speaker's proprietary bitmap encoding and transport mechanism. She assembled full context: packet sniffers, Bluetooth profiling tools, and crawled Chinese documentation repositories. GPT-5.5 solved it. She can now send messages to the device from the terminal and has built Codex notification hooks that display on the speaker.

On pricing: $30 per million input tokens, $180 per million output tokens. Claire's accounting is direct. Six hours of autonomous work, 2 million rows validated, six months of tech debt cleared. Cheaper than her time.
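For readers who want to run the same accounting on their own workloads, here is a minimal Python sketch of the arithmetic. Only the two per-million-token rates come from the piece. The token counts, the hourly rate, and the function names are hypothetical placeholders, since the article does not report how many tokens Claire's run consumed.

```python
# Rates quoted in the article, in US dollars per million tokens.
INPUT_USD_PER_MILLION = 30.0
OUTPUT_USD_PER_MILLION = 180.0


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD of one run at the quoted rates."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_cost = input_tokens * INPUT_USD_PER_MILLION / 1_000_000
    output_cost = output_tokens * OUTPUT_USD_PER_MILLION / 1_000_000
    return input_cost + output_cost


def is_cheaper_than_manual(run_cost_usd: float, hours_saved: float,
                           hourly_rate_usd: float) -> bool:
    """Compare a run's cost with the value of the human hours it replaces."""
    return run_cost_usd < hours_saved * hourly_rate_usd


if __name__ == "__main__":
    # HYPOTHETICAL figures for illustration only; the article gives no
    # token counts and no hourly rate.
    cost = estimate_cost_usd(input_tokens=1_000_000, output_tokens=500_000)
    print(f"Estimated run cost: ${cost:.2f}")
    print("Cheaper than doing it by hand:",
          is_cheaper_than_manual(cost, hours_saved=6, hourly_rate_usd=100.0))
```

Output tokens cost six times as much as input tokens at these rates, so a long agentic run that writes a lot of code and test output will be dominated by the second term.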

The full original is worth reading for two reasons. First, it documents the exact prompt structure Claire used to trigger autonomous multi-hour behavior, which is immediately replicable. Second, the Bluetooth reverse-engineering walkthrough is a detailed methodology for using AI on hardware problems with no existing documentation. The linked workflow guides at chatprd.ai break both processes into step-by-step instructions. One minor note buried in the piece: typing '/personality' in Codex changes the model's default robotic tone, which Claire calls a 'baked potato personality.' Small detail. Useful for anyone running long sessions.
