GPT-5.5 delivers a 19% performance uplift over GPT-5.4 on financial model projection tasks, according to evals run by Box's engineering team, led by Yash. The benchmark involved multi-step reasoning across both structured and unstructured data, the kind of compound knowledge work that exposes weaknesses in prior models.

Box is deploying GPT-5.5 specifically against its hardest enterprise finance use cases, not general productivity tasks. That targeting decision matters: it tells you where OpenAI's new model earns its keep and where the previous generation fell short.

The full video is worth watching for the eval methodology. A 19% uplift sounds clean, but the inputs, the mix of structured versus unstructured data, and how Box defined success are the details that determine whether this number means anything for your stack.
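
One reason the definition matters: "19% uplift" can be read as a relative gain or as a gain in percentage points, and the two readings diverge sharply. The sketch below illustrates the arithmetic only. The baseline score of 0.50 is a hypothetical placeholder, not a figure from Box's evals, and the function names are illustrative.

```python
def relative_uplift(baseline: float, candidate: float) -> float:
    """Relative improvement of candidate over baseline, as a fraction."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (candidate - baseline) / baseline


def absolute_uplift(baseline: float, candidate: float) -> float:
    """Improvement in raw score (percentage points when scores are in [0, 1])."""
    return candidate - baseline


# Hypothetical baseline accuracy for the older model (NOT from the source).
baseline_score = 0.50

# Reading 1: "19% uplift" is relative, so the new score is baseline * 1.19.
score_if_relative = baseline_score * 1.19

# Reading 2: "19% uplift" means 19 percentage points added to the baseline.
score_if_absolute = baseline_score + 0.19

print(f"relative reading: {score_if_relative:.3f}")
print(f"absolute reading: {score_if_absolute:.3f}")
```

With this placeholder baseline, the two readings land almost ten points apart, which is why the success metric and the baseline are worth checking before applying the headline number to your own workload.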
