
OpenAI Drops EVMbench After Claude Vibe Code Disaster

2026/02/20 02:30

OpenAI launches EVMbench to test AI agents on smart contract security days after Claude Opus 4.6-assisted code triggered a $1.78M DeFi exploit.

Smart contracts protect over $100 billion in open-source crypto assets. That number alone should explain why OpenAI’s latest move is drawing serious attention. The company, working alongside crypto investment firm Paradigm, rolled out EVMbench, a benchmark designed to test how well AI agents detect, exploit, and patch high-severity smart contract vulnerabilities.

The benchmark draws on 120 curated vulnerabilities taken from 40 audits, most of them open code-audit competitions. What sets it apart is its scope. EVMbench tests three distinct capability modes (detect, patch, and exploit), each measured separately and graded by a Rust-based harness that replays transactions in a sandboxed local environment. No live networks are involved.


The Number That Should Worry Everyone

In exploit mode, GPT-5.3-Codex via Codex CLI scored 72.2%. Six months back, GPT-5 sat at 31.9% on the same metric. That gap is not small. OpenAI confirmed the figures in its official announcement on X, framing EVMbench as both a measurement tool and a call to action for the security community.
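To put the jump in concrete terms, this short Python snippet does the arithmetic on the two exploit-mode scores quoted above. It uses no data beyond those two figures.

```python
# Exploit-mode scores quoted in the article, in percent.
gpt5_score = 31.9
gpt53_codex_score = 72.2

# Absolute gain in percentage points, and how many times higher the new score is.
point_gain = gpt53_codex_score - gpt5_score
relative_factor = gpt53_codex_score / gpt5_score

print(f"Gain: {point_gain:.1f} percentage points")  # prints: Gain: 40.3 percentage points
print(f"Ratio: {relative_factor:.2f}x")             # prints: Ratio: 2.26x
```

In other words, the exploit success rate more than doubled in roughly six months.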

Detect and patch scores remain lower. In the detection setting, agents sometimes identify a single vulnerability and then stop instead of working through the whole codebase. In patch mode, the challenge is removing the flaw while preserving the contract's full functionality, a balance models still struggle with.
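The article does not describe how EVMbench actually scores detect and patch attempts, so the following is only a minimal sketch under assumed rules: detection is scored as recall against a list of known vulnerabilities, and a patch counts only if the original tests still pass and the known exploit no longer works. The function names, the `PatchResult` type, and the vulnerability IDs are all hypothetical.

```python
from dataclasses import dataclass
from typing import Set


def detection_recall(reported: Set[str], ground_truth: Set[str]) -> float:
    """Share of known vulnerabilities covered by the agent's report (assumed metric)."""
    if not ground_truth:
        return 1.0  # nothing to find, so nothing was missed
    return len(reported & ground_truth) / len(ground_truth)


@dataclass
class PatchResult:
    original_tests_pass: bool  # did the patched contract keep its intended behavior?
    exploit_still_works: bool  # does the known exploit still succeed after patching?


def patch_succeeds(result: PatchResult) -> bool:
    # Assumed rule: a patch counts only if it removes the flaw AND preserves functionality.
    return result.original_tests_pass and not result.exploit_still_works


# Hypothetical example: an agent that stops after its first finding.
known = {"VULN-1", "VULN-2", "VULN-3", "VULN-4"}
print(detection_recall({"VULN-1"}, known))  # prints: 0.25

# A "patch" that simply deletes the vulnerable function stops the exploit but breaks tests.
print(patch_succeeds(PatchResult(original_tests_pass=False, exploit_still_works=False)))  # prints: False
```

Under these assumed rules, stopping after one finding caps the detection score, and a patch that "fixes" the bug by removing functionality still fails.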


A $1.78M Oracle Error Nobody Caught

The backdrop to all of this matters. Security researcher evilcos flagged on X that the DeFi lending protocol Moonwell suffered a loss of approximately $1.78 million. The cause was an oracle configuration error: a price feed formula was written incorrectly, valuing cbETH at $1.12 instead of approximately $2,200.
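The article does not say what the faulty formula was. One reading consistent with the numbers, treated here purely as an assumption, is that $1.12 resembles a cbETH-to-ETH exchange rate (cbETH is a staking token worth slightly more than 1 ETH), so a formula that forgot to multiply by the ETH/USD price would produce it. The sketch below contrasts that hypothetical bug with the composed price and adds a simple deviation check; the ETH price, the function names, and the 50% threshold are illustrative assumptions, not details from the incident.

```python
# Illustrative inputs (assumptions, not data from the incident):
ETH_PER_CBETH = 1.12  # assumed cbETH/ETH exchange rate
ETH_USD = 1965.0      # hypothetical ETH price chosen so the correct result is near $2,200


def cbeth_usd_buggy(eth_per_cbeth: float, eth_usd: float) -> float:
    # Hypothetical bug: the exchange rate is returned as if it were already a USD price.
    return eth_per_cbeth


def cbeth_usd_fixed(eth_per_cbeth: float, eth_usd: float) -> float:
    # Compose the feeds: (ETH per cbETH) * (USD per ETH) = USD per cbETH.
    return eth_per_cbeth * eth_usd


def plausible(price: float, reference: float, max_deviation: float = 0.5) -> bool:
    """Sanity check: reject a price more than max_deviation (50%) away from a reference."""
    return abs(price - reference) / reference <= max_deviation


buggy = cbeth_usd_buggy(ETH_PER_CBETH, ETH_USD)
fixed = cbeth_usd_fixed(ETH_PER_CBETH, ETH_USD)
print(round(buggy, 2), plausible(buggy, reference=ETH_USD))  # prints: 1.12 False
print(round(fixed, 2), plausible(fixed, reference=ETH_USD))  # prints: 2200.8 True
```

A bound like this would not replace a proper audit, but it shows how a price roughly 2,000 times too low can be rejected mechanically before it reaches a lending market.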

That is a low-level mistake, the kind a careful audit should catch. The GitHub pull request for proposal MIP-X43 showed commits co-authored by Claude Opus 4.6, Anthropic's latest and most capable model at the time.

Smart contract auditor pashov posted on X, calling it possibly the first exploit tied to vibe-coded Solidity. He was careful to note that human reviewers still hold final responsibility: a security auditor signs off before anything goes on-chain. But something in that chain broke down.

What EVMbench Is Actually Built to Do

The benchmark includes vulnerability scenarios from the security audit of the Tempo blockchain, a purpose-built L1 designed for high-throughput stablecoin payments. That extension pushes EVMbench into payment-oriented contract code, an area where OpenAI expects agentic stablecoin activity to grow.

Each exploit task runs in an isolated Anvil instance. Transactions replay deterministically. The grading setup restricts unsafe RPC methods and was red-teamed internally to stop agents from gaming results. Vulnerabilities used are historical and publicly documented.
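The article does not list which RPC methods EVMbench restricts. As a hedged illustration of the general idea, the Python sketch below shows an allowlist filter that a proxy in front of an Anvil node might apply. The method list, the `check_request` function, and the error message are assumptions; -32601 is the standard JSON-RPC 2.0 "Method not found" error code.

```python
import json
from typing import Any, Dict, Optional

# Hypothetical allowlist: ordinary Ethereum JSON-RPC methods an exploit agent needs.
# Anvil "cheat" methods (anvil_setBalance, anvil_setCode, evm_mine, ...) are
# absent, so an agent cannot mint funds or rewrite contract state directly.
ALLOWED_METHODS = {
    "eth_chainId",
    "eth_blockNumber",
    "eth_getBalance",
    "eth_getCode",
    "eth_getStorageAt",
    "eth_call",
    "eth_estimateGas",
    "eth_gasPrice",
    "eth_getTransactionCount",
    "eth_getTransactionReceipt",
    "eth_sendRawTransaction",
}


def check_request(raw_request: str) -> Optional[Dict[str, Any]]:
    """Return None to forward the request to the sandboxed node, or a JSON-RPC error to send back."""
    request = json.loads(raw_request)
    if not isinstance(request, dict):
        method, request_id = "<batch>", None  # batches are rejected wholesale in this sketch
    else:
        method, request_id = request.get("method", ""), request.get("id")
    if method in ALLOWED_METHODS:
        return None
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "error": {"code": -32601, "message": f"method {method} is not allowed in this sandbox"},
    }


print(check_request('{"jsonrpc": "2.0", "id": 1, "method": "eth_call", "params": []}'))
print(check_request('{"jsonrpc": "2.0", "id": 2, "method": "anvil_setBalance", "params": []}'))
```

An allowlist rather than a blocklist is the conservative choice here: any method the sketch's author did not anticipate is refused by default.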

OpenAI is also committing $10M in API credits to accelerate cyber defense, with priority given to open-source software and critical infrastructure. Its security research agent, Aardvark, is expanding into private beta. Free codebase scanning for widely used open-source projects is part of that push.

The Vibe-Coding Question With Real Stakes

Pashov’s post on X raised a question many in the DeFi space had been avoiding. When AI writes production Solidity code and humans approve it quickly, the review layer gets thin. The Moonwell incident showed exactly how thin it can get.

OpenAI acknowledged that cybersecurity is inherently dual-use. It describes its response as evidence-based, combining safety training, automated monitoring, and access controls for advanced capabilities. But a 72.2% exploit score on a public benchmark is the kind of number that does not stay quiet.

EVMbench’s full task set, tooling, and evaluation code are now public. The goal is to let researchers track AI cyber capabilities as they grow, and build defenses at the same pace. Whether that pace is fast enough is the question nobody has answered yet.

The post OpenAI Drops EVMbench After Claude Vibe Code Disaster appeared first on Live Bitcoin News.
