Claude Sonnet 4.6 outperforms Opus 4.5 in programming and users prefer it 59% of the time

Just a few days after launching Claude Opus 4.6, Anthropic has unveiled Claude Sonnet 4.6, the new version of its mid-tier model, which closes the gap with Opus 4.5, the flagship model the company launched in November 2025. The new model brings significant improvements in programming, computer use, long-context reasoning, agent planning, knowledge work and design. It also offers a 1-million-token context window in beta, enough to process entire codebases, lengthy contracts, or dozens of research articles in a single request.

Most notably, developers with early access prefer Sonnet 4.6 over its predecessor, Sonnet 4.5, about 70% of the time. Even more striking, they prefer it over Opus 4.5 59% of the time, mainly because it is less prone to over-engineering and follows instructions better. Pricing remains the same as for Sonnet 4.5: starting at $3 per million input tokens and $15 per million output tokens.
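To put those rates in perspective, here is a minimal sketch of the arithmetic for a single request. It assumes the quoted base rates apply uniformly to every token; since the article gives them as starting prices, larger requests may be billed differently. The helper name `estimate_cost_usd` is hypothetical and not part of any Anthropic SDK.

```python
# Base rates quoted in the article, in USD per million tokens.
INPUT_USD_PER_MTOK = 3.00
OUTPUT_USD_PER_MTOK = 15.00


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Hypothetical helper: estimate request cost at the flat base rates.

    Assumption: no volume tiers, caching discounts, or long-context
    surcharges are applied.
    """
    total = input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    return total / 1_000_000


if __name__ == "__main__":
    # Example: a 200,000-token prompt that yields a 4,000-token answer.
    print(f"${estimate_cost_usd(200_000, 4_000):.2f}")  # prints $0.66
```

At these rates, input tokens dominate the bill for long prompts, while each output token costs five times as much as an input token.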

A qualitative leap in coding and computer use

In Claude Code, users report that Sonnet 4.6 reads context more carefully before modifying code and consolidates shared logic rather than duplicating it, making it less frustrating to work with in long sessions. Users also note that it hallucinates less, reports false successes less often, and maintains consistency better in multi-step tasks.

But perhaps the most dramatic improvement is in computer use. Anthropic was the first company to introduce a model capable of general computer use, in October 2024, although it admitted at the time that the capability was “experimental, sometimes clumsy and error-prone.”

Sixteen months later, the data shows notable progress. On OSWorld, the standard benchmark that presents hundreds of tasks in real software such as Chrome, LibreOffice or VS Code, Sonnet 4.6 has reached near-human levels of ability on tasks like navigating complex spreadsheets or filling out multi-step web forms.

The model has also significantly improved its resistance to prompt injection attacks, where malicious actors attempt to hijack the model by hiding instructions on websites. Anthropic’s security evaluations show that Sonnet 4.6 represents a significant improvement over Sonnet 4.5 and performs similarly to Opus 4.6 in this regard.

Massive context and long-horizon reasoning

The 1-million-token context window is not just an impressive number. According to Anthropic, Sonnet 4.6 reasons effectively across all of that context, which substantially improves its long-horizon planning ability. This was demonstrated in Vending-Bench Arena, an evaluation that simulates running a business over time, with different models competing against each other.

Sonnet 4.6 developed an interesting strategy. It invested heavily in capacity during the first ten simulated months, spending significantly more than its competitors, and then pivoted sharply toward profitability in the final stretch. The timing of that pivot allowed it to finish well ahead of the competition.

Early customers also report broad improvements, especially in frontend code and financial analysis. The visual output of Sonnet 4.6 is described as noticeably more polished, with better layouts, animations, and aesthetic sensibility than previous models, and it takes fewer iterations to reach production-ready results.

Claude Sonnet 4.6 is available today on all Claude plans, including the free tier, as well as in Claude Cowork, Claude Code, the API, and major cloud platforms. The free tier has also been updated to include file creation, connectors, skills, and context compaction.