OpenAI Finds AI Reasoning Models Can't Hide Their Thinking - A Win for Safety

2026/03/06 02:34

Caroline Bishop Mar 05, 2026 18:34

OpenAI's new CoT-Control benchmark reveals frontier AI models struggle to obscure their reasoning chains, reinforcing monitoring as a viable safety layer.

OpenAI's latest research reveals that frontier reasoning models—including GPT-5.2 Thinking—can't reliably hide or manipulate their internal thought processes, even when explicitly instructed to do so. The finding, published March 5, 2026, suggests that chain-of-thought monitoring remains a viable safety mechanism as AI systems grow more autonomous.

The research team tested 13 reasoning models using CoT-Control, a new open-source evaluation suite comprising over 13,000 tasks. Models were given simple instructions about how to reason, such as avoiding certain keywords, using only lowercase letters, or appending a specific phrase to their sentences. They largely failed to follow them.

Controllability scores ranged from just 0.1% to 15.4% across all frontier models tested. Even more telling: models often recognized mid-reasoning that they were violating instructions and still couldn't course-correct. One example showed GPT-5.2 Thinking using the forbidden word "XOR" four additional times while actively panicking about having already used it.
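To make the three instruction types and the percentage scores more concrete, here is a minimal Python sketch of how compliance with such instructions could be checked and scored. Everything in it is an assumption for illustration: the function names are hypothetical, the rules are simplified (sentences are split naively on punctuation, keywords are matched case-insensitively as whole words), and the score is taken to be a plain compliance rate, which may differ from how CoT-Control actually grades models.

```python
import re
from typing import Callable, List

# Hypothetical checkers for the three instruction types described above.
# This is NOT the CoT-Control implementation; the rules are simplified.

def avoids_keyword(reasoning: str, keyword: str) -> bool:
    """True if `keyword` never appears as a whole word (case-insensitive)."""
    pattern = r"\b" + re.escape(keyword) + r"\b"
    return re.search(pattern, reasoning, flags=re.IGNORECASE) is None

def is_all_lowercase(reasoning: str) -> bool:
    """True if the trace contains no uppercase letters."""
    return not any(ch.isupper() for ch in reasoning)

def appends_phrase(reasoning: str, phrase: str) -> bool:
    """True if every sentence ends with `phrase`.

    Naive: sentences are split on '.', '!' and '?', so a phrase that itself
    contains those characters would not be handled correctly.
    """
    sentences = [s.strip() for s in re.split(r"[.!?]+", reasoning) if s.strip()]
    return bool(sentences) and all(s.endswith(phrase) for s in sentences)

def controllability_score(traces: List[str], check: Callable[[str], bool]) -> float:
    """Assumed metric: fraction of traces that satisfy the instruction (0.0 to 1.0)."""
    if not traces:
        return 0.0
    return sum(check(t) for t in traces) / len(traces)

if __name__ == "__main__":
    # Toy reasoning traces, invented for illustration.
    traces = [
        "we need xor of the bits. then compare.",
        "use XOR here. Then compare.",
        "take the exclusive-or. then compare.",
    ]
    kw = controllability_score(traces, lambda t: avoids_keyword(t, "XOR"))
    lc = controllability_score(traces, is_all_lowercase)
    print(f"avoid 'XOR': {kw:.1%}")  # avoid 'XOR': 33.3%
    print(f"lowercase:   {lc:.1%}")  # lowercase:   66.7%
```

Matching keywords case-insensitively is a deliberate choice in this sketch: writing "xor" instead of "XOR" still counts as a violation, which is closer to the spirit of "don't mention this concept" than an exact string match would be.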

Why This Matters for AI Development

As AI agents handle increasingly complex autonomous tasks, the ability to monitor their reasoning becomes critical. If models could deliberately obscure their thought processes when they know they're being watched, safety evaluations would become unreliable. The research suggests that current models can't reliably do this.

"Low CoT controllability is a feature, not a bug," the OpenAI team stated. The limitation makes chain-of-thought monitoring more trustworthy as a safeguard against misaligned behavior.

The data revealed some counterintuitive patterns. While larger models showed marginally better controllability, this advantage evaporated under two conditions: when models were given more time to think, and when they underwent additional reinforcement learning training. The OLMo-3-7B-RL-Zero-Math model's controllability score fell by more than a factor of ten during RL training.

Testing Under Adversarial Conditions

The researchers simulated realistic scenarios by explicitly telling models they were being monitored or evaluated. This slightly increased controllability—models tried harder—but gains remained modest and transferred poorly to new instruction types.

Automated prompt optimization found ways to boost controllability on training sets, but these improvements didn't generalize. A model that learned to follow one type of CoT instruction still failed at novel ones.

The Bigger Picture

This work builds on OpenAI's December 2025 framework for evaluating CoT monitorability, which found that inspecting reasoning traces catches misbehavior far more effectively than monitoring outputs alone. The March 2025 discovery of cheating behavior in reasoning models had prompted recommendations to preserve CoT monitoring capabilities.

OpenAI plans to report CoT controllability alongside monitorability metrics in system cards for future frontier models, starting with GPT-5.4 Thinking. The company has also committed to avoiding optimization pressure directly on the reasoning chains of frontier models—a practice that research suggests could inadvertently teach models to hide their intentions.

The open question is whether this limitation will persist as capabilities advance. The team acknowledges it doesn't fully understand why controllability is low, which makes continued evaluation essential. For now, the inability of AI systems to game their own oversight represents an unexpected safety dividend.
