Anthropic Unveils Claude Sonnet 4.5: Faster, Smarter, but Still Second in the Coding Race

The new LLM shows real progress in long, complex tasks and coding support, yet still struggles to match GPT-5 Codex on the toughest problems.

SAN FRANCISCO — Anthropic rolled out its latest AI model, Claude Sonnet 4.5, on Monday with bold claims. The company called it “the best coding model in the world.” But a closer look tells a different story. Yes, the model is faster and more resilient than its predecessors. However, independent tests show it still falls short of OpenAI’s GPT-5 Codex in key areas that matter most to professional developers.

The launch came just four months after Sonnet 4, a reminder of how quickly AI companies are racing to outdo one another. Anthropic and OpenAI now release major updates almost every quarter. Observers have noticed that Anthropic often schedules its announcements to shadow OpenAI’s. For example, Anthropic’s Opus 4.1 dropped right before GPT-5 launched in August.

Built for Endurance, Not Just Speed

Anthropic is betting big on stamina. According to the company’s tests, Sonnet 4.5 can power through complex coding projects for more than 30 straight hours without losing focus. That’s a leap over older models, which tended to drift off-task during long sessions.

The numbers back it up. On SWE-bench Verified, a benchmark that measures real-world software engineering performance, Sonnet 4.5 scored higher than any previous Anthropic model. On OSWorld, which tests how well AI can operate a full computer environment, Anthropic’s score jumped from 42.2 percent in June to 61.4 percent today, a gain of 19.2 percentage points.

In practice, this means the model can now do more than just write code. It can navigate web browsers, fill out spreadsheets, and even complete lengthy online forms using Anthropic’s Chrome extension. Developers also get new tools in Claude Code: checkpoints, which let them save progress without Git; a slicker terminal; and built-in Visual Studio Code integration.

The Reality Check

Engineers on our CTOL.digital team praised Sonnet 4.5’s speed and reliability for everyday work, such as reviewing pull requests, debugging, and handling multi-file projects. The checkpoint feature in particular got a lot of love.

But the honeymoon ended when they asked it to tackle tougher challenges. Complex front-end work tripped it up. In some cases, it ignored a project’s existing structure or authentication setup, which can break apps in ways no developer wants.

“For day-to-day coding, it’s excellent,” one engineer on our team explained. “But when we’re facing deep logic puzzles or thorny production bugs, GPT-5 Codex is still our first choice.”

The takeaway? Many team members find themselves running a two-model system: using Sonnet 4.5 for routine tasks and handing the hard stuff to GPT-5. That approach could balance costs and productivity until Anthropic narrows the gap.
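
For readers curious what such a two-model setup might look like in practice, here is a minimal sketch. It assumes a simple rule-based router; the model labels, the task categories, and the pick_model helper are hypothetical names chosen for illustration, not real API identifiers or our team’s actual tooling.

```python
# Hypothetical model labels for illustration only; not real API identifiers.
ROUTINE_MODEL = "sonnet-4.5"
HARD_MODEL = "gpt-5-codex"

# Hypothetical task categories, loosely based on the kinds of work
# the engineers described as difficult for Sonnet 4.5.
HARD_TASKS = {"deep_logic", "production_bug", "complex_frontend"}


def pick_model(task_kind: str) -> str:
    """Send hard tasks to the stronger model and everything else to the routine one."""
    if task_kind in HARD_TASKS:
        return HARD_MODEL
    # Assumption: unknown or routine task kinds default to the everyday model.
    return ROUTINE_MODEL


print(pick_model("pr_review"))       # sonnet-4.5
print(pick_model("production_bug"))  # gpt-5-codex
```

Defaulting unrecognized tasks to the everyday model is a design choice in this sketch. A team that worries more about correctness than cost might flip that default.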

Building for the Agent Future

Beyond the model itself, Anthropic is quietly laying groundwork for something bigger. The company just launched the Claude Agent SDK, the same toolkit behind Claude Code. With it, developers can build autonomous agents that handle long-running jobs, juggle permissions, and coordinate across multiple sub-agents.

Anthropic is also running a five-day “Imagine with Claude” demo for premium users. In it, Sonnet 4.5 builds real, working software from scratch, live and unscripted. While positioned as an experiment, it hints at the company’s ambition to move beyond coding assistants and toward full-blown AI collaborators.

Pricing stays the same—$3 per million input tokens and $15 per million output tokens—keeping Claude firmly in the premium tier while competitors slash rates.
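
To put those rates in concrete terms, the sketch below estimates the bill for a single job using only the two list prices quoted above. The function name and the sample token counts are illustrative assumptions, and the calculation ignores any discounts or other billing details.

```python
# List prices quoted in this article, in US dollars per million tokens.
INPUT_USD_PER_MILLION = 3
OUTPUT_USD_PER_MILLION = 15


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one job from its input and output token counts."""
    total = (input_tokens * INPUT_USD_PER_MILLION
             + output_tokens * OUTPUT_USD_PER_MILLION)
    return total / 1_000_000


# Hypothetical job: 200,000 input tokens and 50,000 output tokens.
print(estimate_cost_usd(200_000, 50_000))  # 1.35
```

Because output tokens cost five times as much as input tokens, the 50,000 output tokens in this example account for more of the bill ($0.75) than the 200,000 input tokens ($0.60).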

Safety Still Front and Center

Anthropic hasn’t forgotten alignment. Sonnet 4.5 is billed as its safest model yet, showing fewer signs of flattery, deception, or other risky behaviors. It also resists prompt injection attacks better than before, which is crucial when agents run inside real systems.

The model ships with AI Safety Level 3 protections, including filters that catch dangerous inputs related to weapons development. Those filters sometimes block harmless material, but Anthropic says false alarms have fallen by a factor of ten since earlier versions.

Pressure from All Sides

Anthropic’s survival looks less precarious after this release, but the threat remains. It has already lost its crown-jewel status as the maker of the best coding LLM: our toughest problems are now solvable only with GPT-5 High/Pro. At this point, Anthropic can compete only on price and everyday use cases. But if Gemini 3 outperforms Sonnet 4.5 on coding while also being cheaper, it would sit on the Pareto frontier of cost and capability and push Sonnet 4.5 off it. Anthropic could then be in serious trouble, since its models’ strongest advantage has so far been in everyday coding tasks.

Investors Take Note

For investors, the message is clear: the market for large language models is maturing fast. Gains are now incremental, and the real differentiation may soon come from integration, ecosystem lock-in, or industry-specific fine-tuning—not raw power.

Developers, meanwhile, are unlikely to stick with just one vendor. The smarter move is mixing and matching models depending on the job. That could squeeze profits for model makers but create opportunities for companies building orchestration tools on top.

The risk is sharpest for firms that only sell foundation models. As features converge and customers switch easily, pricing power may collapse long before operating costs do. Hyperscalers, with their deep pockets and cloud bundles, could accelerate that trend.

Disclaimer: This article reflects current conditions and market patterns. Past results don’t guarantee future performance. Readers should seek independent financial advice before making investment decisions.
