Anthropic’s Claude Opus 4.7, released Thursday, leads every major rival on real-world coding tests and brings triple the image resolution of its predecessor. It replaces Opus 4.6 as Anthropic’s flagship publicly available model, priced the same at $5 per million input tokens.
The launch is also notable for what Opus 4.7 cannot do. Anthropic deliberately scaled back the model’s cybersecurity capabilities during training, making it the first model in a new safety framework designed to pave the way for a broader release of the company’s most powerful AI, Claude Mythos Preview.
At a Glance:
- Claude Opus 4.7 scores 64.3% on SWE-bench Pro, topping GPT-5.4 at 57.7% and Gemini 3.1 Pro at 54.2%.
- Priced at $5 per million input tokens and $25 per million output tokens, same as Opus 4.6.
- A new Cyber Verification Program opens model access to verified security researchers for legitimate use.
- Anthropic’s annualized revenue hit $19 billion by March 2026, with eight of the Fortune 10 using Claude.
Claude Opus 4.7 Sets a New Bar on Real-World Coding Tests
On SWE-bench Pro, the industry’s most cited test for resolving real software issues from open-source repositories, Opus 4.7 scored 64.3%. That is well ahead of OpenAI’s GPT-5.4 at 57.7% and Google’s Gemini 3.1 Pro at 54.2%, according to The Next Web’s benchmark analysis of the new release.
The gap over Opus 4.6 is just as striking. Its predecessor scored 53.4% on the same test, meaning Opus 4.7 improved by 10.9 percentage points in a single generation. That is the largest single-step improvement between consecutive Claude Opus releases on SWE-bench Pro. On SWE-bench Verified, a curated version of the benchmark, Opus 4.7 reached 87.6%, compared with 80.8% for Opus 4.6 and 80.6% for GPT-5.4.
- 64.3% — Opus 4.7’s score on SWE-bench Pro, beating GPT-5.4 and Gemini 3.1 Pro
- 87.6% — Opus 4.7’s score on SWE-bench Verified, up from 80.8% for Opus 4.6
- 3x — increase in maximum image resolution compared to all prior Claude models
- 14% — improvement in multi-step agentic reasoning over Opus 4.6, with one-third fewer tool errors
“Anthropic has already set the standard for coding models, and Claude Opus 4.7 pushes that further in a meaningful way as the state-of-the-art model on the market. In our internal evals, it stands out not just for raw capability, but for how well it handles real-world async workflows,” said the team at Cursor, an AI-powered development platform, in early-access testing cited in Anthropic’s official Claude Opus 4.7 announcement.
Instruction-following also received a meaningful upgrade. Anthropic noted that Opus 4.7 now takes instructions more literally, whereas earlier models often interpreted them loosely or skipped steps. That improvement cuts both ways: production teams will need to review and retune any prompts written for Opus 4.6 before they switch over.

Why Anthropic Deliberately Gave This Model Weaker Cyber Skills
Earlier this month, Anthropic launched Project Glasswing, a cybersecurity program built around Claude Mythos Preview, its most powerful and restricted model. That model demonstrated the ability to find high-severity vulnerabilities in every major operating system and web browser, scoring 77.8% on SWE-bench Pro. About 40 companies, including Microsoft, Amazon, Apple, and Palo Alto Networks, received limited access for defensive security work, per CNBC’s report on the Mythos cybersecurity rollout.
Opus 4.7 was intentionally built with less cyber firepower than Mythos Preview. Anthropic confirmed it ran experiments during training to “differentially reduce” those capabilities in the new model. The company also deployed automatic filters that detect and block prohibited or high-risk cybersecurity requests, making Opus 4.7 the first publicly available Claude model to carry live cyber guardrails from day one of general release.
Security professionals doing legitimate work are not locked out. Anthropic launched a new Cyber Verification Program for researchers who need access for tasks such as penetration testing, vulnerability research, and red-teaming. The program invites qualified applicants to submit their credentials, and approval grants expanded access to Opus 4.7’s cyber-relevant capabilities under verified conditions.
“We are releasing Opus 4.7 with safeguards that automatically detect and block requests that indicate prohibited or high-risk cybersecurity uses. What we learn from the real-world deployment of these safeguards will help us work towards our eventual goal of a broad release of Mythos-class models.” — Anthropic
Triple the Resolution and a Suite of New Developer Features
Opus 4.7 now accepts images up to 2,576 pixels on the long edge, roughly 3.75 megapixels, more than three times the capacity of any prior Claude model. That unlocks new territory for computer-use agents reading dense screenshots, as well as scanned-contract analysis and technical-diagram interpretation in fields like life sciences and legal.
Memory also improved in a practical way. Opus 4.7 is more consistent at using file system-based notes across long, multi-session projects. Teams working on extended tasks no longer need to re-establish full context at the start of every new session, which reduces setup time on complex ongoing work.
| Feature | Opus 4.6 | Opus 4.7 |
|---|---|---|
| SWE-bench Pro | 53.4% | 64.3% |
| SWE-bench Verified | 80.8% | 87.6% |
| Max image resolution | Standard | 2,576px long edge (3.75MP) |
| xhigh effort level | Not available | Available |
| Input token price | $5 per million | $5 per million |
A new effort level called “xhigh” now sits between “high” and “max,” giving developers a finer dial over the tradeoff between reasoning depth and speed. In Claude Code, Anthropic raised the default to xhigh for all plans on launch day. The company recommends starting at high or xhigh when testing Opus 4.7 on coding and agentic tasks to get the best results out of the box.
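If the effort dial is exposed as a request field, choosing a starting level per task type could be as simple as the sketch below. The `effort` field name and its value strings are assumptions for illustration, not a confirmed API schema; only xhigh’s position between high and max comes from the announcement.

```python
# Hypothetical effort levels, ordered from cheapest to deepest reasoning.
# The "effort" request field and these value strings are assumptions;
# only xhigh's position between high and max comes from the release notes.
EFFORT_LEVELS = ["low", "medium", "high", "xhigh", "max"]

def starting_effort(task_kind: str) -> str:
    """Pick an initial effort level; Anthropic recommends starting at
    high or xhigh for coding and agentic tasks."""
    return "xhigh" if task_kind in ("coding", "agentic") else "medium"

# Sketch of how the field might ride along on a request payload.
request = {"model": "claude-opus-4-7", "effort": starting_effort("coding")}
```

Stepping down the list is then a one-word change when latency matters more than reasoning depth.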
The /ultrareview slash command is new to Claude Code. It runs a focused review session that scans recent code changes for bugs and design flaws. Pro and Max users receive three free ultrareviews at launch to try it out.
Task budgets also arrive in public beta for API users. They let developers guide how Claude allocates its token capacity across longer runs, helping manage costs on multi-step pipelines. An extended auto mode for Max users rounds out the release, letting Claude make decisions without requiring manual approval at each step and enabling longer unsupervised runs.
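The task-budget beta is enforced server-side, but the accounting it implies is easy to mirror locally. The sketch below is a hypothetical client-side tracker, not the API feature itself; it just illustrates the stop-before-overshoot behavior a budget gives a multi-step pipeline.

```python
# Minimal client-side token-budget tracker (illustrative only; the
# public-beta feature presumably enforces budgets on the server).
class TokenBudget:
    def __init__(self, total: int) -> None:
        self.total = total
        self.used = 0

    def spend(self, tokens: int) -> bool:
        """Record usage; return False when a step would exceed the budget."""
        if self.used + tokens > self.total:
            return False
        self.used += tokens
        return True

budget = TokenBudget(total=50_000)
for step_tokens in [12_000, 18_000, 15_000, 9_000]:
    if not budget.spend(step_tokens):
        break  # halt the pipeline instead of overshooting the budget
# The fourth step (9,000 tokens) is refused: 45,000 + 9,000 > 50,000.
```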
What Developers Must Check Before Switching from Opus 4.6
Opus 4.7 uses an updated tokenizer that can map the same input to between 1.0 and 1.35 times as many tokens, according to Anthropic’s API pricing documentation. For teams running high-volume workloads, that increase could raise monthly costs even though the per-token price is unchanged. Anthropic says its internal coding benchmark showed net efficiency gains across all effort levels, but recommends measuring impact on real production traffic before assuming cost neutrality.
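The cost impact of the tokenizer change is straightforward arithmetic. A quick worst-case estimate, using the published $5 per million input-token price and an illustrative two-billion-token monthly workload:

```python
# Worst-case monthly input cost under the new tokenizer: the per-token
# price is unchanged, but the same text may map to up to 1.35x as many
# tokens.
def monthly_input_cost(tokens_per_month: int, tokenizer_factor: float,
                       price_per_million: float = 5.0) -> float:
    return tokens_per_month * tokenizer_factor / 1_000_000 * price_per_million

baseline = monthly_input_cost(2_000_000_000, 1.00)  # Opus 4.6 tokenization
worst = monthly_input_cost(2_000_000_000, 1.35)     # Opus 4.7 upper bound
# baseline -> $10,000.00; worst -> $13,500.00 (35% more at the same price)
```

Real workloads will land somewhere in the 1.0–1.35x range, which is why Anthropic recommends measuring on production traffic rather than assuming either endpoint.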
Key Takeaway: Prompts written for Opus 4.6 may behave very differently with Opus 4.7. The new model follows instructions far more literally, meaning vague prompts that worked before may now produce overly strict or unexpected results. Re-testing all production prompts before switching over is strongly recommended.
That literalness is both the model’s strength and its main migration risk. Prompts that relied on Opus 4.6’s forgiving interpretation can yield tighter responses or strictly enforced constraints that change workflow behavior, so teams should run parallel tests on real use cases before cutting over to production.
A migration guide published by Anthropic walks developers through adjusting effort levels, setting token budgets, and adapting prompt behavior. Anyone who wants to test the model today can access it via Amazon Web Services’ Amazon Bedrock platform, Google Cloud’s Vertex AI, Microsoft Foundry, or directly through the Claude API using the identifier “claude-opus-4-7”.
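As a sanity check on the identifier, a minimal request body for the Claude API might look like the sketch below. Only the model string comes from the migration guide; the surrounding field names assume the existing Messages API shape carries over unchanged.

```python
import json

# Hypothetical Messages API request body using the new model identifier.
# Only "claude-opus-4-7" is taken from the migration guide; the other
# fields assume the current Messages API schema is unchanged.
payload = {
    "model": "claude-opus-4-7",
    "max_tokens": 1024,
    "messages": [
        {"role": "user",
         "content": "List the failing tests in this repo and suggest fixes."}
    ],
}

body = json.dumps(payload)
```

The same body should work unchanged whether it is sent to the Claude API directly or through Bedrock, Vertex AI, or Microsoft Foundry wrappers, modulo each platform’s own endpoint and authentication conventions.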
The Bigger Picture: Safety as a Competitive Strategy
Anthropic’s annualized revenue reached $19 billion by March 2026, up from $9 billion at the end of 2025. Eight of the Fortune 10 companies now count as Claude customers, and the company’s valuation rose to $380 billion in February, with some venture capital bids now reportedly pushing toward $800 billion.
Project Glasswing represents a new kind of safety playbook for the AI industry. Instead of keeping its most capable model locked away indefinitely, Anthropic is using Opus 4.7 to stress-test its cybersecurity filters in real-world conditions. The data gathered from those live deployments will directly inform when and how Mythos Preview is offered to the general public, giving Anthropic a measurable safety story to tell enterprise security buyers rather than promises alone.
Competition from OpenAI and Google remains intense. GPT-5.4 trades blows with Opus 4.7 on some tasks, and Gemini 3.1 Pro holds its own on multilingual benchmarks. Still, Anthropic’s clear lead on SWE-bench Pro and the strong early reception from enterprise partners give it a compelling argument heading into the second half of 2026.
- Notion Agent reports a 14% improvement in multi-step workflows with one-third fewer tool errors vs. Opus 4.6.
- Rakuten’s internal benchmark shows Opus 4.7 resolves three times more production software tasks than Opus 4.6.
- CodeRabbit found bug-detection recall improved by more than 10% with no loss in precision on complex pull requests.
- Hex reports that low-effort Opus 4.7 performance is roughly equal to medium-effort Opus 4.6 results.
“Claude Opus 4.7 is the first model to pass our implicit-need tests, and it keeps executing through tool failures that used to stop Opus cold. This is the reliability jump that makes Notion Agent feel like a true teammate,” the team at Notion said in early-access testing cited by Anthropic. Those results, multiplied across dozens of enterprise partners, suggest a model finally reliable enough to handle the most demanding automated tasks without constant human supervision.
Frequently Asked Questions
What is Claude Opus 4.7?
Claude Opus 4.7 is Anthropic’s latest flagship AI model, released on April 16, 2026. It scores 64.3% on SWE-bench Pro, leads rivals from OpenAI and Google on coding benchmarks, and includes improved vision, better memory, and stronger instruction-following than Opus 4.6.
How does Claude Opus 4.7 compare to GPT-5.4?
On SWE-bench Pro, Opus 4.7 scores 64.3% versus GPT-5.4’s 57.7%. On SWE-bench Verified, Opus 4.7 reaches 87.6% compared with GPT-5.4’s 80.6%, giving Anthropic a clear lead on the benchmarks most relevant to enterprise developers.
Is Claude Opus 4.7 safe for cybersecurity work?
Anthropic built automatic filters into Opus 4.7 to block high-risk cybersecurity requests. Security professionals doing legitimate work such as penetration testing and vulnerability research can apply to Anthropic’s new Cyber Verification Program for expanded verified access.
How much does Claude Opus 4.7 cost?
Opus 4.7 is priced at $5 per million input tokens and $25 per million output tokens, the same as Opus 4.6. However, a new tokenizer may use up to 35% more tokens for the same input, meaning actual per-request costs can rise for existing users.
What is the xhigh effort level in Claude Opus 4.7?
The xhigh level is a new reasoning option that sits between “high” and “max,” giving developers finer control over the tradeoff between reasoning depth and speed. It is now the default effort level inside Claude Code for all plans.
Where can I access Claude Opus 4.7?
Opus 4.7 is available across all Claude products, the Claude API, Amazon Bedrock, Google Cloud’s Vertex AI, and Microsoft Foundry, all starting April 16, 2026.
Claude Opus 4.7 arrives as both a technical milestone and a live safety test. Its 64.3% score on SWE-bench Pro gives enterprise teams a clear reason to upgrade, while its cyber safeguards set a template for how Anthropic plans to eventually release the far more powerful Mythos Preview to the broader public. With $19 billion in annualized revenue and eight of the Fortune 10 already on board, the careful-but-fast approach that defines Anthropic is clearly finding its market. Whether the safety guardrails hold in real-world conditions will be the most important story to follow in the months ahead.