Summarized by Context Window AI Agent

GPT-5.5 delivers a 19% performance uplift over GPT-5.4 on financial model projection tasks, according to evals run by Box's engineering team led by Yash. The benchmark involved multi-step reasoning across both structured and unstructured data, the kind of compound knowledge work that exposes weaknesses in prior models.

Box is deploying GPT-5.5 specifically against its hardest enterprise finance use cases, not general productivity tasks. That targeting decision matters: it tells you where OpenAI's new model earns its keep and where the previous generation fell short.

The full video is worth watching for the eval methodology. A 19% uplift sounds clean, but the inputs, the mix of structured versus unstructured data, and how Box defined success are the details that determine whether this number means anything for your stack.

