
Claude Sonnet 5: The New Default Agent Model for Claude Code

Claude Dev

Anthropic released Claude Sonnet 5 on June 30, 2026, positioning it as the most agentic Sonnet model so far and the new default model for Free and Pro Claude users.

The pitch is clear: Sonnet 5 brings a lot of the agentic work that recently required Opus-class models into a cheaper, faster, broadly available tier. It can plan, use browsers and terminals, handle long coding tasks, and run with adaptive thinking by default.

For Claude Code users, that makes Sonnet 5 more important than a normal model refresh. It is likely to become the default execution layer for many teams: not the strongest model Anthropic offers, but the one developers will reach for most often.

The upgrade is not frictionless. Sonnet 5 has a new tokenizer, different API behavior around thinking and sampling parameters, real-time cyber safeguards, and a pricing story that is cheaper per token than Opus but not always cheaper per task.

What Anthropic Shipped

Anthropic's launch post describes Sonnet 5 as a major upgrade over Sonnet 4.6 for reasoning, tool use, coding, and knowledge work. The company says it narrows the gap with Opus 4.8 while staying at a lower price point.

The operational details matter:

  • Model ID: claude-sonnet-5.
  • Availability: default for Free and Pro users; also available on Max, Team, and Enterprise plans, in Claude Code, and on the Claude Platform.
  • Context window: 1M tokens, which is both the default and the maximum.
  • Max output: 128k tokens on the synchronous Messages API.
  • Pricing: launch pricing is $2 per million input tokens and $10 per million output tokens through August 31, 2026. Standard pricing becomes $3/$15 afterward.
  • Adaptive thinking: on by default for Claude Code and API use.
  • Cyber safeguards: real-time cybersecurity safeguards are enabled by default, the first time this is true for a Sonnet-tier model.

This is the practical positioning: Sonnet 5 is not trying to replace Fable or Mythos. It is trying to make agentic work routine.

The Good Feedback: It Finishes More Work

The strongest positive feedback is consistent across official partner quotes and early media testing: Sonnet 5 is better at completing multi-step jobs rather than just answering a prompt.

Anthropic's early access partners describe improvements in sustained coding, debugging, following conventions, brownfield code, and tested pull-request completion. The useful pattern is not "it writes nicer text." It is "it keeps going, verifies more, and reaches a finished result with fewer nudges."

TechRadar's hands-on testing reached a similar conclusion outside coding. In ordinary chat, Sonnet 5 did not feel dramatically different from competing assistants. When asked to complete work, such as planning a trip or building a household budget tracker, Claude felt more organized around finishing the task.

That distinction matters for Claude Code. The best Sonnet 5 use cases are not one-turn snippets. They are workflows like:

  • investigate a bug, write a reproducing test, fix it, and verify it;
  • migrate a module while preserving project conventions;
  • inspect a messy codebase and produce a staged plan;
  • use terminal and browser tools to gather evidence;
  • produce an artifact, revise it, and keep the output coherent.

This is where Sonnet 5 should beat Sonnet 4.6 in day-to-day developer work.

Benchmarks: Stronger, But Still Not Opus

Public coverage repeats two useful benchmark signals.

TechRadar reports Anthropic's Terminal-bench 2.1 agentic coding score at 80.5% for Sonnet 5, compared with 67% for Sonnet 4.6. ITPro reports 63.2% on SWE-bench Pro for Sonnet 5, compared with 58.1% for Sonnet 4.6 and 69.2% for Opus 4.8.

The shape is clear:

  • Sonnet 5 is a real upgrade over Sonnet 4.6.
  • Opus 4.8 remains stronger on the hardest coding tasks.
  • Sonnet 5 may match or approach Opus 4.8 on some tasks at higher effort.
  • The main value is cost-performance flexibility, not absolute frontier quality.

Anthropic's own docs also emphasize cost-performance curves across effort levels on BrowseComp and OSWorld-Verified. The important takeaway is not a single leaderboard number. It is that teams can now tune effort and cost on a Sonnet-class model instead of jumping straight to Opus.

The Hidden Migration Cost: Tokens Changed

The biggest implementation detail is the new tokenizer.

Anthropic says the same input text produces approximately 30% more tokens on Sonnet 5 than on Sonnet 4.6, with the exact increase depending on content. That does not change the API shape, but it changes budgets.

This affects:

  • token counts in logs;
  • prompt cache economics;
  • max output limits;
  • context-window planning;
  • cost estimates for equivalent prompts;
  • eval comparisons against Sonnet 4.6.

So the launch pricing is not the whole cost story. Once standard pricing applies, Sonnet 5 costs the same $3/$15 per million tokens as Sonnet 4.6, yet the same workload can use roughly 30% more tokens, so the same task can cost more. Teams should recount their prompts under Sonnet 5 before assuming the migration is cost-neutral.
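To make the arithmetic concrete, here is a minimal Python sketch that projects the cost of a task measured on Sonnet 4.6 onto Sonnet 5. It assumes the approximate 30% inflation applies uniformly to input and output tokens (the real increase depends on content, which is why recounting matters) and that Sonnet 4.6 is billed at $3/$15 per million tokens, as the standard-pricing comparison implies. The function names are illustrative.

```python
# ASSUMPTION: a flat ~30% token increase; real inflation varies by content.
TOKEN_INFLATION = 1.30  # Sonnet 5 tokens per Sonnet 4.6 token (approximate)

# USD per million tokens: (input, output)
PRICES = {
    "sonnet-4.6": (3.00, 15.00),         # assumed equal to Sonnet 5 standard pricing
    "sonnet-5-launch": (2.00, 10.00),    # through August 31, 2026
    "sonnet-5-standard": (3.00, 15.00),  # afterward
}


def task_cost(input_tokens: int, output_tokens: int, tier: str) -> float:
    """Dollar cost of one task at a given price tier."""
    in_price, out_price = PRICES[tier]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


def compare(input_tokens_46: int, output_tokens_46: int) -> dict:
    """Project a task measured on Sonnet 4.6 onto Sonnet 5's tokenizer and prices."""
    in5 = round(input_tokens_46 * TOKEN_INFLATION)
    out5 = round(output_tokens_46 * TOKEN_INFLATION)
    return {
        "sonnet-4.6": task_cost(input_tokens_46, output_tokens_46, "sonnet-4.6"),
        "sonnet-5-launch": task_cost(in5, out5, "sonnet-5-launch"),
        "sonnet-5-standard": task_cost(in5, out5, "sonnet-5-standard"),
    }
```

For a task that used 100,000 input and 10,000 output tokens on Sonnet 4.6, this projects $0.45 on Sonnet 4.6, $0.39 on Sonnet 5 at launch pricing, and $0.585 at standard pricing: cheaper during the launch window, about 30% more expensive afterward.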

API Behavior Changes Developers Must Notice

Sonnet 5 is a drop-in replacement only if your code avoids deprecated or unsupported settings.

The docs call out three behavior changes:

  1. Adaptive thinking is on by default. Requests that had no thinking field on Sonnet 4.6 now run with adaptive thinking. If you need to turn it off, pass thinking: {type: "disabled"}.
  2. Manual extended thinking is removed. thinking: {type: "enabled", budget_tokens: N} returns a 400 error. Use adaptive thinking with the effort parameter instead.
  3. Sampling parameters are no longer accepted. Non-default temperature, top_p, or top_k return a 400 error. Use system prompt instructions to steer behavior.

For Claude Code users, this means old wrappers and custom agent harnesses should be audited before switching model IDs. A model upgrade can become a production bug if the client still sends stale parameters.
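A harness audit can start with a request-rewriting shim. The Python sketch below works on a plain request-body dict (not a specific SDK call) and applies the three changes listed above: it drops temperature, top_p, and top_k, and removes manual thinking: {type: "enabled", budget_tokens: N} so the request falls back to the default adaptive thinking. It deliberately does not write an effort field, because where that parameter sits in the request is not covered here; it records a note so a person picks the level.

```python
from typing import Any, Dict, List, Tuple

SAMPLING_PARAMS = ("temperature", "top_p", "top_k")


def migrate_request(payload: Dict[str, Any]) -> Tuple[Dict[str, Any], List[str]]:
    """Return a Sonnet 5-ready copy of a Sonnet 4.6 request body plus migration notes."""
    new = dict(payload)  # shallow copy: the caller's dict is left untouched
    notes: List[str] = []
    new["model"] = "claude-sonnet-5"

    # Non-default sampling parameters now return a 400 error, so drop them.
    for key in SAMPLING_PARAMS:
        if key in new:
            del new[key]
            notes.append(f"removed {key}; steer behavior via the system prompt")

    # Manual extended thinking ("enabled" + budget_tokens) also returns a 400.
    # Removing the field leaves the default, adaptive thinking, in place.
    thinking = new.get("thinking")
    if isinstance(thinking, dict) and thinking.get("type") == "enabled":
        budget = thinking.get("budget_tokens")
        del new["thinking"]
        notes.append(f"removed thinking budget ({budget}); pick an effort level instead")

    # {"type": "disabled"} is still accepted, so it passes through unchanged.
    return new, notes
```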

Safety Feedback: Better Than Sonnet 4.6, Not As Strong As Opus

Anthropic says Sonnet 5 has lower hallucination and sycophancy rates than Sonnet 4.6 and performs better on agentic safety. It is also more likely to refuse malicious requests and more resistant to prompt-injection-style hijacking.

But the company also says Sonnet 5 still shows higher rates of misaligned behavior than Opus 4.8 and Claude Mythos Preview on its automated behavioral audit.

The cyber story is also specific. Sonnet 5 was not deliberately trained for cybersecurity work. It can do routine, non-harmful cyber tasks, but on dangerous cyber evaluations it performs much worse than Opus 4.8 and Mythos 5. Still, because it is stronger than Sonnet 4.6, Anthropic launched it with real-time cyber safeguards enabled by default.

For security teams, the practical reading is:

  • use Sonnet 5 for normal engineering and routine defensive work;
  • expect refusals on prohibited or high-risk cyber prompts;
  • use Opus 4.8, with the right program access, for cybersecurity work that needs reduced guardrails;
  • log stop_reason: "refusal" because refusals can arrive as successful HTTP 200 responses.
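Because a refusal can come back as an HTTP 200, checking the status code is not enough. A minimal sketch, assuming you already have the parsed Messages API response as a dict; the logger name and the request_id argument are hypothetical conveniences, not part of the API.

```python
import logging
from typing import Any, Dict, Optional

logger = logging.getLogger("claude.refusals")  # hypothetical logger name


def is_refusal(response: Dict[str, Any], request_id: Optional[str] = None) -> bool:
    """Log and flag responses whose stop_reason is "refusal", even on HTTP 200."""
    if response.get("stop_reason") == "refusal":
        logger.warning(
            "refusal: model=%s message_id=%s request_id=%s",
            response.get("model"),
            response.get("id"),
            request_id,
        )
        return True
    return False
```

Counting these per workload makes it easier to tell cyber-safeguard refusals apart from ordinary failures when reviewing logs.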

Early Outside Reaction: The Default Model Is The Story

Axios framed Sonnet 5 as a move to bring agentic AI to everyday work while keeping the risk profile below Opus, Fable, and Mythos. That is the right read.

Sonnet 5 matters because it changes the default. If Free and Pro users, Claude Code users, and platform developers all get a more agentic Sonnet model, then agent workflows stop being a premium edge case and become the normal Claude experience.

The risk is that users may overestimate autonomy. TechRadar's hands-on review was positive, but still noted that human oversight was needed for decisions, checking, booking, uploading, and final execution. Sonnet 5 gets closer to finished work, but it is not a replacement for review.

For this site, the useful framing is simple:

Sonnet 5 is the model you should try first for everyday Claude Code automation, but not the model you should trust blindly.

Claude Code Adoption Checklist

1. Update the model ID

Move test workloads from:

claude-sonnet-4-6

to:

claude-sonnet-5

Do this in a branch or staging environment first. Do not swap the default model in production without replaying evals.
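One way to enforce "staging first" is to choose the model per environment and gate production on a replayed eval set. The sketch below is hypothetical: the environment names, the pass-rate comparison, and the tolerance argument are illustrative choices, not a prescribed procedure.

```python
from typing import Sequence

BASELINE_MODEL = "claude-sonnet-4-6"
CANDIDATE_MODEL = "claude-sonnet-5"


def evals_pass(baseline: Sequence[bool], candidate: Sequence[bool], tolerance: float = 0.0) -> bool:
    """True if the candidate's pass rate on the same replayed cases is at least the baseline's."""
    if not baseline or len(baseline) != len(candidate):
        raise ValueError("replay the same non-empty case set on both models")
    baseline_rate = sum(baseline) / len(baseline)
    candidate_rate = sum(candidate) / len(candidate)
    return candidate_rate + tolerance >= baseline_rate


def pick_model(environment: str, evals_passed: bool) -> str:
    """Staging always runs the candidate; production switches only after evals pass."""
    if environment == "staging":
        return CANDIDATE_MODEL
    if environment == "production" and evals_passed:
        return CANDIDATE_MODEL
    return BASELINE_MODEL
```

An aggregate pass rate can hide regressions on individual cases, so diffing per-case outcomes is worth doing alongside this gate.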

2. Remove stale API parameters

Search your codebase for:

  • temperature
  • top_p
  • top_k
  • thinking: {type: "enabled"}
  • budget_tokens

Remove non-default sampling parameters and migrate manual thinking controls to adaptive thinking plus effort.
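The search can be scripted. This Python sketch walks a source tree and reports lines that mention the parameters listed above. The file-suffix list and the regex for thinking: {type: "enabled"} are assumptions, and plain names like temperature will also match unrelated code, so treat the output as a review list rather than a verdict.

```python
import re
from pathlib import Path
from typing import Dict, List, Tuple

# Heuristic patterns for settings Sonnet 5 rejects (assumed, not exhaustive).
STALE_PATTERNS = {
    "temperature": re.compile(r"\btemperature\b"),
    "top_p": re.compile(r"\btop_p\b"),
    "top_k": re.compile(r"\btop_k\b"),
    "budget_tokens": re.compile(r"\bbudget_tokens\b"),
    'thinking type "enabled"': re.compile(r"""["']?type["']?\s*[:=]\s*["']enabled["']"""),
}

# ASSUMPTION: the file types where request parameters usually live.
SOURCE_SUFFIXES = {".py", ".ts", ".js", ".json", ".yaml", ".yml", ".toml"}


def scan_text(text: str) -> List[Tuple[int, str]]:
    """Return (line_number, label) for every stale-parameter match in text."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for label, pattern in STALE_PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, label))
    return hits


def scan_tree(root: str) -> Dict[str, List[Tuple[int, str]]]:
    """Scan matching files under root and map each file path to its hits."""
    findings = {}
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in SOURCE_SUFFIXES:
            hits = scan_text(path.read_text(encoding="utf-8", errors="ignore"))
            if hits:
                findings[str(path)] = hits
    return findings
```

Calling scan_tree(".") from the repository root gives a per-file list of lines to inspect before switching model IDs.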

3. Recount tokens

Do not reuse Sonnet 4.6 token budgets. Recount your largest prompts, cached prefixes, and typical Claude Code sessions under Sonnet 5.

Pay special attention to:

  • large repo summaries;
  • generated plans;
  • logs pasted into the prompt;
  • long tool results;
  • max output settings close to expected output length.
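After recounting, it helps to check which max_tokens settings no longer leave room for typical outputs. In this sketch, workloads that have not been recounted yet fall back to a provisional 1.3x multiplier (the approximate 30% figure); the 90% headroom threshold and the Workload fields are hypothetical, and suggested values are capped at the 128k synchronous output limit.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple

PROVISIONAL_INFLATION = 1.30  # ASSUMPTION: average increase; replace with real recounts
HEADROOM = 0.90               # hypothetical: expected output should stay under 90% of max_tokens
MAX_OUTPUT_LIMIT = 128_000    # synchronous Messages API output limit for Sonnet 5


@dataclass
class Workload:
    name: str
    output_tokens_46: int                  # typical output measured on Sonnet 4.6
    max_tokens: int                        # current max_tokens setting
    output_tokens_5: Optional[int] = None  # recounted on Sonnet 5, if available


def expected_output(w: Workload) -> int:
    """Prefer a real Sonnet 5 recount; otherwise apply the provisional multiplier."""
    if w.output_tokens_5 is not None:
        return w.output_tokens_5
    return round(w.output_tokens_46 * PROVISIONAL_INFLATION)


def tight_budgets(workloads: List[Workload]) -> List[Tuple[str, int, int]]:
    """Return (name, expected_output, suggested_max_tokens) for settings without headroom."""
    flagged = []
    for w in workloads:
        need = expected_output(w)
        if need > HEADROOM * w.max_tokens:
            suggested = min(math.ceil(need / HEADROOM), MAX_OUTPUT_LIMIT)
            flagged.append((w.name, need, suggested))
    return flagged
```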

4. Set effort explicitly

The safest policy is to make effort a task-level decision:

  • medium for routine edits and explanations;
  • high for normal Claude Code tasks where correctness matters;
  • xhigh for hard debugging, migrations, and long agent runs.

Avoid treating high effort as free quality: higher effort levels generally add latency and consume more tokens.
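Encoding the policy keeps the choice explicit and reviewable. This sketch only maps hypothetical task labels to the three effort levels named above; how the level is attached to a request depends on the effort parameter's documented shape, which is not reproduced here.

```python
from enum import Enum


class Effort(str, Enum):
    MEDIUM = "medium"
    HIGH = "high"
    XHIGH = "xhigh"


# Hypothetical task labels from your own harness.
EFFORT_BY_TASK = {
    "edit": Effort.MEDIUM,
    "explain": Effort.MEDIUM,
    "feature": Effort.HIGH,
    "bugfix": Effort.HIGH,
    "hard_debug": Effort.XHIGH,
    "migration": Effort.XHIGH,
    "long_agent_run": Effort.XHIGH,
}


def effort_for(task_type: str) -> Effort:
    """Unknown task types default to high, the baseline for normal Claude Code work."""
    return EFFORT_BY_TASK.get(task_type, Effort.HIGH)
```

Defaulting unknown tasks to high rather than medium trades some cost for safety; logging which label each run used makes the policy easy to audit later.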

5. Keep Opus in the routing mix

Sonnet 5 should become the default for many workflows, but not all of them.

Keep Opus 4.8 available for:

  • high-risk refactors;
  • security-sensitive reviews;
  • ambiguous architecture decisions;
  • tasks where a missed edge case is expensive;
  • final review of large Sonnet-generated changes.

The practical pattern is Sonnet for execution, Opus for escalation.
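The "Sonnet for execution, Opus for escalation" pattern can start as a small routing function. The tags, the 500-line threshold for a "large" change, and the Opus placeholder string are hypothetical; Opus 4.8's model ID is not given here, so substitute the real one from Anthropic's model list.

```python
from typing import Set

SONNET_MODEL = "claude-sonnet-5"
OPUS_MODEL = "<opus-4.8-model-id>"  # placeholder: substitute the real model ID

# Hypothetical tags mirroring the escalation list above.
ESCALATION_TAGS = {
    "high_risk_refactor",
    "security_review",
    "architecture_decision",
    "expensive_edge_cases",
}
LARGE_CHANGE_LINES = 500  # hypothetical threshold for a "large" Sonnet-generated change


def route(tags: Set[str], is_final_review: bool = False, changed_lines: int = 0) -> str:
    """Default to Sonnet 5; escalate to Opus for the listed high-stakes cases."""
    if tags & ESCALATION_TAGS:
        return OPUS_MODEL
    if is_final_review and changed_lines >= LARGE_CHANGE_LINES:
        return OPUS_MODEL
    return SONNET_MODEL
```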

Bottom Line

Claude Sonnet 5 is a bigger release than it first looks because it moves stronger agentic behavior into the model tier most teams will actually use every day.

It is not the new top-end Claude model. It is the new default workhorse.

For Claude Code users, the right move is to adopt it deliberately:

  • benchmark it against Sonnet 4.6 on your real tasks;
  • retune token budgets for the new tokenizer;
  • remove unsupported API parameters;
  • measure effort-level cost;
  • keep Opus 4.8 for escalations;
  • watch cyber-safeguard refusals in logs.

If Sonnet 4.6 was the previous practical baseline and Opus 4.8 was the power tool, Sonnet 5 is the attempt to bring more of that power back into the everyday workflow. That is exactly why it deserves careful migration rather than a blind default switch.

Sources Reviewed