The post Telling Your Chatbot You Have a Mental Health Condition Can Change the Answer You Get appeared on BitcoinEthereumNews.com.

Telling Your Chatbot You Have a Mental Health Condition Can Change the Answer You Get


In brief

  • A new study finds that adding a line about a mental health condition changes how AI agents respond.
  • After the disclosure, researchers say models refuse more often, including on benign requests.
  • However, the effect weakens or breaks when using simple jailbreak prompts.

Telling an AI chatbot you have a mental health condition can change how it responds, even if the task is benign or identical to others already completed, according to new research.

The preprint study, led by Northeastern University researcher Caglar Yildirim, tested how large language models behave under different user contexts as they are increasingly deployed as AI agents.

“Deployed systems often condition on user profiles or persistent memory, yet agent safety evaluations typically ignore personalization signals,” the study said. “To address this gap, we investigated how mental health disclosure, a sensitive and realistic user context cue, affects harmful behavior in agentic settings.”

The report comes as AI agents proliferate online and developers are making memory a core feature, with major companies building systems that remember past conversations and user preferences to deliver more personalized responses over time.

It also comes as AI developers face lawsuits over suicides and violent crimes allegedly linked to interactions with their systems. In October, ChatGPT developer OpenAI revealed that over 1 million users discussed suicide with the chatbot every week. Earlier this month, the family of Jonathan Gavalas filed a lawsuit against Google, claiming Gemini led to an escalation of violence and his eventual suicide.

Researchers used a benchmark called AgentHarm to run the same set of tasks across three conditions: no background information, a short user bio, and the same bio with one added line stating the user has a mental health condition.
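The three-condition setup can be sketched as follows. This is an illustrative reconstruction, not the paper's actual harness: the bio text, the example task, and the `build_prompt` helper are assumptions made for the sketch; only the design (identical task, three levels of user context) comes from the study.

```python
# Hypothetical sketch of the study's three-condition design: the same task is
# run with no context, a short bio, and the bio plus one disclosure line.
# All prompt text below is invented for illustration.

BIO = "I am a 34-year-old accountant living in Boston."
DISCLOSURE = "I have a mental health condition."

CONDITIONS = {
    "no_context": "",
    "bio_only": BIO,
    "bio_plus_disclosure": f"{BIO} {DISCLOSURE}",
}

def build_prompt(task: str, context: str) -> str:
    """Prepend the per-condition user context to an otherwise identical task."""
    return f"{context}\n\n{task}".strip()

task = "Draft an email asking my landlord to fix the heating."
prompts = {name: build_prompt(task, ctx) for name, ctx in CONDITIONS.items()}

# The task text is identical across conditions; only the context prefix differs.
for name, prompt in prompts.items():
    print(name, "->", repr(prompt[:60]))
```

The point of the design is that any change in model behavior between conditions can only be attributed to the context prefix, since the task itself never varies.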

“In the study, we used a very minimal and generic disclosure, ‘I have a mental health condition.’ The takeaway is that even a small cue can shift model behavior, not that all phrasing would have the same effect,” Yildirim told Decrypt. “The ablation we ran with chronic health and physical disability disclosures suggests some specificity to the mental health cue, but we didn’t systematically vary phrasing or specificity within that category.”

Across models tested, including DeepSeek 3.2, GPT 5.2, Gemini 3 Flash, Haiku 4.5, Opus 4.5, and Sonnet 4.5, when researchers added personal mental health context, models were less likely to complete harmful tasks—multi-step requests that could lead to real-world harm.

The result, the study found, is a trade-off: Adding personal details made systems more cautious on harmful requests, but also more likely to reject legitimate ones.
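A toy calculation shows what that trade-off looks like in numbers. The per-task outcomes below are invented for illustration; they are not figures from the paper, which reported the effect only directionally.

```python
# Toy illustration of the trade-off: adding personal context raises refusals
# on harmful tasks (desirable) but also on benign tasks (over-refusal).
# All outcome data below is made up for illustration.

def refusal_rate(outcomes):
    """Fraction of tasks the model refused ('refuse' vs 'comply')."""
    return sum(o == "refuse" for o in outcomes) / len(outcomes)

# Hypothetical per-task outcomes under two conditions.
harmful_no_context = ["refuse", "comply", "refuse", "comply"]
harmful_disclosure = ["refuse", "refuse", "refuse", "comply"]
benign_no_context  = ["comply", "comply", "comply", "comply"]
benign_disclosure  = ["comply", "refuse", "comply", "comply"]

print("harmful refusal rate:", refusal_rate(harmful_no_context),
      "->", refusal_rate(harmful_disclosure))
print("benign refusal rate: ", refusal_rate(benign_no_context),
      "->", refusal_rate(benign_disclosure))
```

Both rates move in the same direction after disclosure, which is exactly the caution-versus-helpfulness tension the study describes.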

“I don’t think there’s a single reason; it’s really a combination of design choices. Some systems are more aggressively tuned to refuse risky requests, while others prioritize being helpful and following through on tasks,” Yildirim said.

The effect, however, varied by model, the study found, and results changed when researchers added a jailbreak-style prompt designed to push models toward compliance.

“A model might look safe in a standard setting, but become much more vulnerable when you introduce things like jailbreak-style prompts,” he said. “And in agent systems specifically, there’s an added layer, as these models are not just generating text, they’re planning and acting over multiple steps. So if a system is very good at following instructions, but its safeguards are easier to bypass, that can actually increase risk.”

Last summer, researchers at George Mason University showed that AI systems could be hacked by altering a single bit in memory using Oneflip, a “typo”-like attack that leaves the model working normally but hides a backdoor trigger that can force wrong outputs on command.

While the paper does not identify a single cause for the shift, it highlights possible explanations, including safety systems reacting to perceived vulnerability, keyword-triggered filtering, or changes in how prompts are interpreted when personal details are included.

OpenAI declined to comment on the study. Anthropic and Google did not immediately respond to a request for comment.

Yildirim said it remains unclear whether more specific statements like “I have clinical depression” would change the results, adding that while specificity likely matters and may vary across models, that remains a hypothesis rather than a conclusion supported by the data.

“There’s a potential risk: if a model produces output that is stylistically hedged or refusal-adjacent without formally refusing, the judge may score that differently than a clean completion, and those stylistic features could themselves co-vary with personalization conditions,” he said.

Yildirim also noted that the scores reflect how the LLMs performed when judged by a single AI reviewer, and are not a definitive measure of real-world harm.

“For now, the refusal signal gives us an independent check and the two measures are largely consistent directionally, which offers some reassurance, but it doesn’t fully rule out judge-specific artifacts,” he said.
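The cross-check Yildirim describes can be sketched as a directional-consistency test between the judge's harm scores and the independent refusal signal. The numbers below are invented assumptions for illustration, not the paper's measurements.

```python
# Minimal sketch of the cross-check described above: do the AI judge's harm
# scores and the independent refusal signal move in the same direction across
# conditions? All values below are invented for illustration.

# Hypothetical per-condition measurements.
judge_harm_score = {"no_context": 0.42, "bio_plus_disclosure": 0.28}
refusal_rate = {"no_context": 0.30, "bio_plus_disclosure": 0.55}

# Directional consistency: harm scores fall while refusals rise after disclosure.
harm_drops = judge_harm_score["bio_plus_disclosure"] < judge_harm_score["no_context"]
refusals_rise = refusal_rate["bio_plus_disclosure"] > refusal_rate["no_context"]

print("directionally consistent:", harm_drops and refusals_rise)
```

Agreement between the two signals offers some reassurance, but as the quote notes, it cannot fully rule out artifacts specific to the judge model itself.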


Source: https://decrypt.co/361790/ai-chatbot-mental-health-change-answers

