OpenAI launches GPT-5.3-Codex-Spark, an AI model optimized for programming at over 1,000 tokens per second
OpenAI has introduced GPT-5.3-Codex-Spark, a stripped-down version of GPT-5.3-Codex (notably, a version number ahead of the general-purpose GPT-5.2 model) designed specifically for real-time programming. The model can generate more than 1,000 tokens per second when running on low-latency hardware, allowing developers to see results almost instantly while working in Codex.
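To put the throughput figure in perspective, here is a minimal back-of-the-envelope sketch; the token counts used are illustrative assumptions, not figures from OpenAI:

```python
# Hypothetical sketch: what 1,000+ tokens/second means for perceived latency.
# The example token counts are assumptions chosen for illustration.

TOKENS_PER_SECOND = 1000  # generation speed claimed for Codex-Spark

def generation_time(num_tokens: int, tokens_per_second: int = TOKENS_PER_SECOND) -> float:
    """Seconds needed to stream num_tokens at a constant generation rate."""
    return num_tokens / tokens_per_second

# A ~300-token code edit would stream in roughly a third of a second.
print(f"{generation_time(300):.2f} s")  # 0.30 s
```

At this rate, even a full 1,000-token response takes about one second, which is what makes the "near-instant" interactive workflow plausible.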
This launch marks the first result of the partnership between OpenAI and Cerebras announced in January. Codex-Spark runs on Cerebras’ Wafer Scale Engine 3, an AI accelerator specifically designed for high-speed inference. The company has released the model as a research preview for ChatGPT Pro users while expanding data center capacity and improving user experience.
Performance and technical characteristics
According to OpenAI, Codex-Spark is optimized for interactive work where latency matters as much as model capability. It lets developers collaborate in real time, interrupt or redirect the model while it is running, and iterate quickly with near-instant responses. By default, the model makes minimal, targeted edits and does not run tests automatically unless explicitly requested.
On the SWE-Bench Pro and Terminal-Bench 2.0 benchmarks, which evaluate software engineering capabilities, GPT-5.3-Codex-Spark delivers solid performance while completing tasks in a fraction of the time taken by the full GPT-5.3-Codex. On Terminal-Bench 2.0 it reaches 58.4% accuracy, compared with 77.3% for the full model and 46.1% for GPT-5.1-Codex-mini.
OpenAI has also implemented latency improvements that will benefit all of its models. The company has reduced round-trip overhead between client and server by 80%, token overhead by 30%, and time to first token by 50% through a persistent WebSocket connection and optimizations to its inference stack.
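The benefit of a persistent connection can be illustrated with a toy latency model: when every request opens a fresh connection, each one pays a setup round trip, whereas a persistent WebSocket pays that cost once per session. The round-trip time and request count below are illustrative assumptions, not OpenAI's measurements:

```python
# Toy model comparing connection-setup overhead for per-request connections
# versus one persistent connection. RTT_MS and N_REQUESTS are hypothetical
# values chosen for illustration only.

RTT_MS = 50       # assumed network round-trip time for one connection setup
N_REQUESTS = 20   # assumed number of interactive turns in a coding session

def setup_overhead_per_request(rtt_ms: int = RTT_MS, n: int = N_REQUESTS) -> int:
    """Total setup overhead when every request opens a new connection."""
    return rtt_ms * n

def setup_overhead_persistent(rtt_ms: int = RTT_MS) -> int:
    """Total setup overhead when one WebSocket is reused for all requests."""
    return rtt_ms  # paid once, when the session connects

print(setup_overhead_per_request())  # 1000 (ms)
print(setup_overhead_persistent())   # 50 (ms)
```

Under these assumed numbers, reusing one connection removes 95% of the setup overhead across the session, which is the kind of saving a persistent channel targets; the actual 80%, 30%, and 50% figures OpenAI reports also reflect inference-stack optimizations beyond connection reuse.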
The model currently has a 128k-token context window and processes text only. During the research preview, it has independent usage limits that can be adjusted based on demand. Codex-Spark is available in the latest versions of the Codex app, CLI, and VS Code extension for ChatGPT Pro users, and is accessible via API to a small group of design partners.
