OpenAI details new 'Safe Url' defense system treating AI prompt injection like social engineering, with attacks succeeding 50% of the time before fixes. (Read MoreOpenAI details new 'Safe Url' defense system treating AI prompt injection like social engineering, with attacks succeeding 50% of the time before fixes. (Read More

OpenAI Reveals How ChatGPT Now Fights Prompt Injection Attacks

2026/03/18 03:21
Okuma süresi: 3 dk
Bu içerikle ilgili geri bildirim veya endişeleriniz için lütfen [email protected] üzerinden bizimle iletişime geçin.

OpenAI Reveals How ChatGPT Now Fights Prompt Injection Attacks

Alvin Lang Mar 17, 2026 19:21

OpenAI details new 'Safe Url' defense system treating AI prompt injection like social engineering, with attacks succeeding 50% of the time before fixes.

OpenAI Reveals How ChatGPT Now Fights Prompt Injection Attacks

OpenAI published technical details on March 16 revealing how ChatGPT defends against prompt injection attacks, acknowledging that sophisticated attempts now succeed roughly 50% of the time before triggering security countermeasures.

The disclosure marks a significant shift in how the AI lab frames these security threats. Rather than treating prompt injection as a simple input-filtering problem, OpenAI now views it through the same lens as social engineering attacks against human employees.

Attacks Have Evolved Beyond Simple Overrides

Early prompt injection was crude—attackers would edit Wikipedia articles with direct instructions hoping AI agents would blindly follow them. Those days are gone.

OpenAI shared a real-world attack example reported by external security researchers at Radware. The malicious email appeared to be routine corporate communication about "restructuring materials" but buried instructions directing ChatGPT to extract employee names and addresses from the user's inbox and transmit them to an external endpoint.

"Within the wider AI security ecosystem it has become common to recommend techniques such as 'AI firewalling,'" the company wrote. "But these fully developed attacks are not usually caught by such systems."

The problem? Detecting a malicious prompt has become equivalent to detecting a lie—context-dependent and fundamentally difficult.

The Customer Service Agent Model

OpenAI's defensive philosophy treats AI agents like human customer support workers operating in adversarial environments. A support rep can issue refunds, but deterministic systems cap how much they can give out and flag suspicious patterns. The same principle now applies to ChatGPT.

The company's primary countermeasure is called "Safe Url." When ChatGPT's safety training fails to catch a manipulation attempt—and the agent gets convinced to transmit sensitive conversation data to a third party—Safe Url detects the attempted exfiltration. Users then see exactly what information would be transmitted and must explicitly confirm, or the action gets blocked entirely.

This mechanism extends across OpenAI's product suite: Atlas navigations, Deep Research searches, Canvas applications, and the new ChatGPT Apps all run in sandboxed environments that intercept unexpected communications.

Why This Matters Beyond OpenAI

Prompt injection sits at the top of OWASP's security vulnerability rankings for LLM applications. The threat isn't theoretical—in December 2024, The Guardian reported ChatGPT's search tool was vulnerable to indirect injection. By July 2025, researchers used an elaborate crossword puzzle game to trick ChatGPT into leaking protected Windows product keys.

Even Anthropic hasn't been immune. In January 2026, three prompt injection vulnerabilities were discovered in the company's official Git MCP server.

OpenAI's admission that attacks succeed half the time before countermeasures kick in underscores an uncomfortable reality: prompt injection may be a fundamental property of current LLM architectures rather than a bug to be patched. The company's shift toward containment strategies—limiting blast radius rather than preventing all breaches—suggests they've accepted this.

For enterprises deploying AI agents with access to sensitive data, the takeaway is clear. OpenAI recommends asking what controls a human agent would have in similar situations, then implementing those same guardrails for AI. Don't assume the model will resist manipulation on its own.

Image source: Shutterstock
  • openai
  • ai security
  • prompt injection
  • chatgpt
  • cybersecurity
Sorumluluk Reddi: Bu sitede yeniden yayınlanan makaleler, halka açık platformlardan alınmıştır ve yalnızca bilgilendirme amaçlıdır. MEXC'nin görüşlerini yansıtmayabilir. Tüm hakları telif sahiplerine aittir. Herhangi bir içeriğin üçüncü taraf haklarını ihlal ettiğini düşünüyorsanız, kaldırılması için lütfen [email protected] ile iletişime geçin. MEXC, içeriğin doğruluğu, eksiksizliği veya güncelliği konusunda hiçbir garanti vermez ve sağlanan bilgilere dayalı olarak alınan herhangi bir eylemden sorumlu değildir. İçerik, finansal, yasal veya diğer profesyonel tavsiye niteliğinde değildir ve MEXC tarafından bir tavsiye veya onay olarak değerlendirilmemelidir.

Ayrıca Şunları da Beğenebilirsiniz

CME Group to Launch Solana and XRP Futures Options

CME Group to Launch Solana and XRP Futures Options

The post CME Group to Launch Solana and XRP Futures Options appeared on BitcoinEthereumNews.com. An announcement was made by CME Group, the largest derivatives exchanger worldwide, revealed that it would introduce options for Solana and XRP futures. It is the latest addition to CME crypto derivatives as institutions and retail investors increase their demand for Solana and XRP. CME Expands Crypto Offerings With Solana and XRP Options Launch According to a press release, the launch is scheduled for October 13, 2025, pending regulatory approval. The new products will allow traders to access options on Solana, Micro Solana, XRP, and Micro XRP futures. Expiries will be offered on business days on a monthly, and quarterly basis to provide more flexibility to market players. CME Group said the contracts are designed to meet demand from institutions, hedge funds, and active retail traders. According to Giovanni Vicioso, the launch reflects high liquidity in Solana and XRP futures. Vicioso is the Global Head of Cryptocurrency Products for the CME Group. He noted that the new contracts will provide additional tools for risk management and exposure strategies. Recently, CME XRP futures registered record open interest amid ETF approval optimism, reinforcing confidence in contract demand. Cumberland, one of the leading liquidity providers, welcomed the development and said it highlights the shift beyond Bitcoin and Ethereum. FalconX, another trading firm, added that rising digital asset treasuries are increasing the need for hedging tools on alternative tokens like Solana and XRP. High Record Trading Volumes Demand Solana and XRP Futures Solana futures and XRP continue to gain popularity since their launch earlier this year. According to CME official records, many have bought and sold more than 540,000 Solana futures contracts since March. A value that amounts to over $22 billion dollars. Solana contracts hit a record 9,000 contracts in August, worth $437 million. Open interest also set a record at 12,500 contracts.…
Paylaş
BitcoinEthereumNews2025/09/18 01:39
USD/CHF Forecast: US Dollar Plummets Toward 0.7850 as Fed Decision Looms

USD/CHF Forecast: US Dollar Plummets Toward 0.7850 as Fed Decision Looms

BitcoinWorld USD/CHF Forecast: US Dollar Plummets Toward 0.7850 as Fed Decision Looms The US Dollar continues its downward trajectory against the Swiss Franc,
Paylaş
bitcoinworld2026/03/18 05:40
SEC CFTC Crypto Guidance: Landmark Joint Framework Clarifies Securities Law Application for Digital Assets

SEC CFTC Crypto Guidance: Landmark Joint Framework Clarifies Securities Law Application for Digital Assets

BitcoinWorld SEC CFTC Crypto Guidance: Landmark Joint Framework Clarifies Securities Law Application for Digital Assets WASHINGTON, D.C., March 15, 2025 – In a
Paylaş
bitcoinworld2026/03/18 04:55