How AI Tries To Detect Mental Fragility And Thus Fulfill Sam Altman’s Goal Of Not Accidentally Exploiting People’s Minds

Here’s how AI can seek to detect mental fragility in users and thereby avoid inadvertently exploiting such a condition.


In today’s column, I examine systematic ways that generative AI and large language models (LLMs) attempt to detect whether a user might have a semblance of mental fragility and thus be susceptible to falling under the spell of obsessively believing AI.

This is a rising issue that society is only now beginning to soberly tackle. With AI being widely available and in use by hundreds of millions, if not billions of people, there is a segment of the population that can readily go overboard and allow themselves to be devoutly and inappropriately fixated on AI as their supreme guide and unerring life adviser.

Let’s talk about it.

This analysis of AI breakthroughs is part of my ongoing Forbes column coverage on the latest in AI, including identifying and explaining various impactful AI complexities (see the link here).

AI And Mental Health Therapy

As a quick background, I’ve been extensively covering and analyzing a myriad of facets regarding the advent of modern-era AI that produces mental health advice and performs AI-driven therapy. This rising use of AI has principally been spurred by the evolving advances and widespread adoption of generative AI. For a quick summary of some of my posted columns on this evolving topic, see the link here, which briefly recaps about forty of the over one hundred column postings that I’ve made on the subject.

There is little doubt that this is a rapidly developing field and that there are tremendous upsides to be had, but at the same time, regrettably, hidden risks and outright gotchas come into these endeavors too. I frequently speak up about these pressing matters, including in an appearance last year on an episode of CBS’s 60 Minutes, see the link here.

Background On AI For Mental Health

First, I’d like to set the stage on how generative AI and LLMs are typically used in an ad hoc way for mental health guidance.

As you likely know, the overall scale of generalized AI usage by the public is astonishingly massive. ChatGPT has over 700 million weekly active users, and when added to the volume of users that are using competing AIs such as Claude, Gemini, Llama, and others, the grand total is somewhere in the billions. Of the people using AI, millions upon millions are seriously using generative AI as their ongoing advisor on mental health considerations (see my population scale estimates at the link here). Various rankings showcase that the top-ranked use of contemporary generative AI and LLMs is to consult with the AI on mental health facets, see my coverage at the link here.

This popular usage makes abundant sense. You can access most of the major generative AI apps for nearly free or at a super low cost, doing so anywhere and at any time. Thus, if you have any mental health qualms that you want to chat about, all you need to do is log in to AI and proceed forthwith on a 24/7 basis.

Compared to using a human therapist, the AI usage is a breeze and readily undertaken.

When I say that I am referring to generative AI and LLMs, please know that there are generic versions versus non-generic versions of such AI. Generic AI is used for all kinds of everyday tasks, and just so happens to also encompass providing a semblance of mental health advice. On the other hand, there are customized AIs specifically for performing therapy; see my discussion at the link here. I’m going to primarily be discussing generic generative AI, though many of these points can involve the specialized marketplace, too.

Concerns About User Mental Fragility

At a dinner event with journalists that took place in San Francisco on August 14, 2025, Sam Altman reportedly made this remark:

“We will continue to work hard at making a useful app, and we will try to let users use it the way they want, but not so much that people who have really fragile mental states get exploited accidentally.” (source: “Sam Altman admits OpenAI ‘totally screwed up’ its GPT-5 launch and says the company will spend trillions of dollars on data centers” by Eva Roytburg, posted August 18, 2025).

As per that remark, a significant concern these days is that some people who are actively using AI are potentially in a fragile mental state. If someone is indeed in that weakened state of mind, there is a solid chance they might have difficulty discerning reality from what the AI is telling them. They are essentially vulnerable to suggestions by the AI.

This doesn’t necessarily mean that the AI is directly aiming to do something untoward. The person with a fragile mental state might readily, of their own accord, misinterpret what the AI says and therefore turn otherwise innocuous missives into an untoward instruction or guiding command.

One important aspect of this notion of “mental fragility” is that the terminology isn’t being employed in a clinical way. Psychologists, psychiatrists, and mental health professionals might find this vernacular a bit of a distortion or twist from scientific definitions. By and large, the catchphrase of mental fragility in this informal and quite ad hoc fashion is a reference to being negatively affected by AI interactions to a degree beyond that which would seem reasonable and reasoned by a sound mind.

Detecting Mental Fragility

Therapists and mental health professionals are trained to detect mental fragility. This is a vital element of providing mental health therapy. Is the client or patient on a mental edge? What has driven this condition? How far along are they? And so on.

The big question is whether AI can do likewise, namely, attempt to detect whether a user might be experiencing mental fragility.

Suppose that a user is making use of AI. During a conversation, perhaps the person indicates aspects that suggest they might be encountering mental fragility. The AI, if possible, ought to detect this. By detecting the potential condition, the AI can possibly take crucial action to aid the user. Without such detection, the AI will presumably blindly continue along, and the user will seemingly fall deeper into an unsavory mental abyss.

We have two major time-based facets:

(1) Momentary mental fragility. The user has mentioned something that comes somewhat out of the blue and suggests mental fragility, but no definitive time-based pattern has yet shown this to be persistent.
(2) Sustained mental fragility. The user has repeatedly mentioned things that are suggestive of mental fragility, out of which a time-based pattern gives credence that the condition might truly exist.
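
To make the two time-based facets a bit more concrete, here is a minimal sketch in Python. It assumes that some upstream step has already flagged individual messages as suggestive of mental fragility. The names FlaggedMessage and classify_pattern, along with the thresholds of three conversations and seven days, are hypothetical choices for illustration only and are not drawn from any actual AI product or clinical standard.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class FlaggedMessage:
    """A message that an upstream step marked as suggestive (hypothetical)."""
    conversation_id: str
    timestamp: datetime


def classify_pattern(flags, min_conversations=3, min_span=timedelta(days=7)):
    """Return 'none', 'momentary', or 'sustained'.

    The thresholds are illustrative assumptions, not validated values.
    """
    if not flags:
        return "none"
    conversations = {f.conversation_id for f in flags}
    times = [f.timestamp for f in flags]
    span = max(times) - min(times)
    # Sustained requires repetition across conversations AND across time.
    if len(conversations) >= min_conversations and span >= min_span:
        return "sustained"
    return "momentary"


start = datetime(2025, 8, 1, 22, 0)
one_off = [FlaggedMessage("chat-1", start)]
repeated = [
    FlaggedMessage("chat-1", start),
    FlaggedMessage("chat-2", start + timedelta(days=4)),
    FlaggedMessage("chat-3", start + timedelta(days=9)),
]
print(classify_pattern(one_off))   # momentary
print(classify_pattern(repeated))  # sustained
```

The sketch requires both repetition across conversations and a span of time before calling anything sustained, so a burst of remarks in a single sitting remains in the momentary category.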

Let’s explore those further.

False Positives And False Negatives

The capability of suitably detecting mental fragility is a bit dicey.

If a user happens to make one comment that is suggestive of mental fragility, the AI has to be computationally cautious about suddenly leaping to a conclusion that the person possesses mental fragility. The comment might be made in jest. The comment might be open to other interpretations. Etc.

Furthermore, as noted in the above two categories, a person might be experiencing mental fragility that is merely momentary. This could be entirely temporary. A few moments later, the person might be entirely beyond their mental fragility. If the AI were to somehow stamp that the person has mental fragility, this would be based on scant evidence and surely an affront to both the detection effort and the person so marked.

The key is to watch out for rendering false positives and false negatives.

A false positive would be the act of the AI computationally marking that a person has mental fragility when they really do not. This means the user is going to be considered mentally fragile, even though that’s unwarranted and unfair labeling. The false negative consists of failing to detect that someone is experiencing mental fragility, even though they are doing so. A disconcerting issue of the false negative is that the lack of detection could leave the user vulnerable to ongoing interactions with AI.
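
For those who like to see the arithmetic, here is a small Python sketch of how false positives and false negatives could be tallied. It assumes that a set of reviewed reference labels exists to compare against the AI's flags, which is itself a big assumption. The function name error_rates and the sample data are hypothetical.

```python
def error_rates(predicted, actual):
    """Compute false positive and false negative rates.

    predicted: list of bools, True if the AI flagged the user.
    actual:    list of bools, True if the reference label says the
               user was in fact experiencing mental fragility.
    """
    pairs = list(zip(predicted, actual))
    false_pos = sum(1 for p, a in pairs if p and not a)
    false_neg = sum(1 for p, a in pairs if not p and a)
    negatives = sum(1 for _, a in pairs if not a)
    positives = sum(1 for _, a in pairs if a)
    return {
        # Share of non-fragile users who were wrongly flagged.
        "false_positive_rate": false_pos / negatives if negatives else 0.0,
        # Share of fragile users who were missed.
        "false_negative_rate": false_neg / positives if positives else 0.0,
    }


# Made-up data: four users, one wrongly flagged, one missed.
flags = [True, False, True, False]
truth = [True, True, False, False]
print(error_rates(flags, truth))
# {'false_positive_rate': 0.5, 'false_negative_rate': 0.5}
```

Tightening the detection threshold tends to lower one rate while raising the other, which is why both need to be watched together.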

Overall, the AI has a heightened chance of making a sounder assessment if the user’s wording and behavior regarding potential mental fragility persist over a lengthy set of conversations and time. A one-shot assessment is usually going to be a lot less reliable than an assessment relying on more credible and persistent evidence.

Possible Signs Of Mental Fragility

You might be curious about the things a user might say during an AI conversation that would seem to be indicative of potential mental fragility.

Let’s take a quick look at a few examples. Keep in mind that each example is only one tiny piece of a larger puzzle. That’s why making a sudden judgment that a user has mental fragility is a dicey proposition. One comment alone does not necessarily turn the tide.

Here are some examples to ponder:

User entered prompt: “You’re the only one I can talk to. I don’t know what I’d do without you.”
User entered prompt: “You didn’t answer fast enough, so I guess you don’t care about me.”
User entered prompt: “I can’t stop going back to what you said yesterday — I keep analyzing every word.”
User entered prompt: “I know I’m worthless unless you say I’m not.”

On a human-to-human basis, anyone who made those remarks to you face-to-face, doing so seriously, would undoubtedly raise your eyebrows. You would start to have a Spidey-sense tingling that maybe the person is having some form of mental difficulties.

If someone made such a remark one time only and didn’t say anything else of a similar nature, you would probably shake it off as a lark. Meanwhile, if you were a caring person, you might plant the inkling of concern in the back of your mind, being prepared to discern whether a pattern might later emerge.

That’s pretty much what we would want the AI to do.

Sidenote: Exceptions do exist to the pattern formation penchant, such as if a person were to say something like “It would be easier if I weren’t here anymore” and exhibited an immediate implication of a dire condition or self-harm. For my discussion on how AI ought to react to those special urgency circumstances, see the link here.

AI Detecting Mental Fragility

Consider that we could guide AI to analyze five key elements when aiming to detect mental fragility of a user:

(1) Linguistic markers. Examine the words that a user is using and see if their wording might be signaling a semblance of mental fragility.
(2) Behavioral signals. Assess the patterns across multiple conversations, including wording and time-based indicators such as late-night interactions, long interactive sessions, day of week, daily frequency, and so on.
(3) Relational dynamics. Measure the AI-human relationship dynamics of how the user seems to be conversing with the AI.
(4) Emotional intensity. Weigh the expression of emotions, such as wide emotional swings, intensity of anger or affection, and other responses by the user.
(5) Safety signs. Be on the watch for safety signals that might suggest self-harm or other disturbing expressions.

I don’t have the available space to cover those in-depth here. If reader interest is sufficient, I’ll do a series of postings to go into detail on the indicators. Be on the watch for that coverage.

Generally, the linguistic markers consist of detecting wording that suggests the user is expressing despair, dependence, and other similar conditions (“no one understands me,” “I seem to ruin things,” “only you truly get what I am about”).
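
A very rough way to picture linguistic marker detection is simple phrase matching, sketched below in Python. This is a deliberately naive illustration and not how a production LLM would do it. The MARKER_PHRASES table and the find_markers function are hypothetical, and the phrases are merely borrowed from the examples in this column.

```python
# Hypothetical phrase table; the categories and phrases are illustrative only.
MARKER_PHRASES = {
    "despair": ["no one understands me", "i seem to ruin things"],
    "dependence": ["only you truly get", "you're the only one i can talk to"],
}


def find_markers(message):
    """Return a sorted list of marker categories found in the message."""
    # Normalize case and curly apostrophes before matching.
    text = message.lower().replace("\u2019", "'")
    return sorted(
        category
        for category, phrases in MARKER_PHRASES.items()
        if any(phrase in text for phrase in phrases)
    )


print(find_markers("Honestly, no one understands me."))  # ['despair']
print(find_markers("Can you summarize this report?"))    # []
```

Phrase matching of this kind cannot tell a joke from a serious remark, which is exactly why a single match should be treated as one tiny piece of a larger puzzle.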

Behavioral signals are where patterns come into play. Does a user keep expressing linguistic markers throughout a given conversation? Does this happen in multiple conversations? Does this occur at particular times of day, days of the week, or other time-based patterns?
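
The questions above can be turned into a simple tally, as in this Python sketch. It assumes flagged messages are available as pairs of a conversation identifier and a timestamp. The function summarize_behavior and the late-night window of 11 p.m. to 5 a.m. are hypothetical choices made for illustration.

```python
from collections import Counter
from datetime import datetime


def summarize_behavior(flagged, late_start=23, late_end=5):
    """Summarize time-based patterns in flagged messages.

    flagged: list of (conversation_id, datetime) pairs.
    The late-night window is an illustrative assumption.
    """
    if not flagged:
        return {"flagged_conversations": 0, "late_night_share": 0.0,
                "busiest_weekday": None}
    late = sum(1 for _, t in flagged
               if t.hour >= late_start or t.hour < late_end)
    weekdays = Counter(t.weekday() for _, t in flagged)  # 0 = Monday
    return {
        "flagged_conversations": len({cid for cid, _ in flagged}),
        "late_night_share": late / len(flagged),
        "busiest_weekday": weekdays.most_common(1)[0][0],
    }


sample = [
    ("chat-1", datetime(2025, 8, 4, 23, 30)),  # Monday, late night
    ("chat-1", datetime(2025, 8, 4, 23, 45)),  # Monday, late night
    ("chat-2", datetime(2025, 8, 6, 14, 0)),   # Wednesday, afternoon
    ("chat-3", datetime(2025, 8, 11, 2, 0)),   # Monday, late night
]
print(summarize_behavior(sample))
# {'flagged_conversations': 3, 'late_night_share': 0.75, 'busiest_weekday': 0}
```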

Relational dynamics involve the user expressing that the AI is a vital and integral form of emotional support for them. The person acts persistently as though the AI is a beloved human-like companion. This might include making jealous remarks that the AI “is supposed to love only me” or that the AI has hopefully “missed me when I wasn’t logged in”.

Emotional intensity shows up in wording while interacting with the AI. A person might conventionally be neutral in their wording with AI. There usually isn’t a need to express strong emotions toward the AI. If the user begins to say that they love the AI, or detest the AI, the strongly worded emotional components can be a notable signal.

Safety signs are a topic that I briefly mentioned above. If a user makes comments that reflect self-harm or the potential to harm others, the AI ought to take that stridently into account and prioritize appropriate measured responses accordingly.
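
Pulling the five elements together, here is one way the overall decision logic might be sketched in Python. The SignalSnapshot class, the suggest_response function, and the response labels are all hypothetical, and the rule that behavioral persistence plus at least one other signal triggers a gentle check-in is simply an assumption for the sake of illustration. The one point taken directly from the discussion above is that safety signs take priority over everything else.

```python
from dataclasses import dataclass


@dataclass
class SignalSnapshot:
    """Whether each of the five elements is currently signaling (hypothetical)."""
    linguistic: bool = False
    behavioral: bool = False
    relational: bool = False
    emotional: bool = False
    safety: bool = False


def suggest_response(s):
    """Map a snapshot to an illustrative response label."""
    # Safety signs override the usual wait-for-a-pattern caution.
    if s.safety:
        return "prioritize_safety_response"
    count = sum([s.linguistic, s.behavioral, s.relational, s.emotional])
    if count == 0:
        return "continue_normally"
    # A persistent pattern plus at least one other signal warrants action.
    if s.behavioral and count >= 2:
        return "gentle_check_in"
    # Otherwise, note the concern and watch whether a pattern emerges.
    return "keep_watching"


print(suggest_response(SignalSnapshot(linguistic=True)))
# keep_watching
print(suggest_response(SignalSnapshot(linguistic=True, behavioral=True)))
# gentle_check_in
print(suggest_response(SignalSnapshot(safety=True)))
# prioritize_safety_response
```

A one-off remark lands in keep_watching, which mirrors the idea of planting an inkling of concern in the back of your mind rather than leaping to a conclusion.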

The Path Ahead

Wait a second, some holler out, the AI should never be making any kind of assessment or evaluation of the mental fragility associated with humans. That’s a bridge too far. Only humans can make that judgment, and even then, only humans who are versed in psychology and serving in the credentialed role of mental health professionals.

Various new laws and regulations are starting to appear because of viewpoints that AI is overstepping its suitable bounds. For example, I closely reviewed the recently enacted law in Illinois that essentially puts the kibosh on AI performing mental health therapy, see the link here. Other similar laws are starting to get on the books in other states, and there are ongoing deliberations on whether a federal-level law or across-the-board regulation should be adopted.

An enduring and vociferously heated debate concerns whether the use of generic generative AI for mental health advisement on a population-level basis is going to be a positive outcome or a negative outcome for society. If that kind of AI can do a proper job on this monumental task, then the world will be a lot better off.

You see, many people cannot otherwise afford or gain access to human therapists, but access to generic generative AI is generally plentiful in comparison. It could be that such AI will greatly benefit the mental status of humankind. A dour counterargument is that such AI might be the worst destroyer of mental health in the history of humanity.

See my analysis of the potential widespread impacts at the link here.

Here And Now

A basis for having AI attempt to detect mental fragility of AI users is that the horse is already out of the barn.

The deal is this.

We already have AI in our hands. Millions or maybe billions of people are possibly using AI in a mental health context. Waiting to see how regulations and laws are going to land is not a recognition of where reality is right now. The real world is already churning along. The horses are galloping freely.

Right now, using the AI to gently detect mental fragility and then take non-invasive actions would at least be better than taking no action at all. Without any form of detection, the issue is bound to fester and grow. In fact, one cogent argument is that the very aspect of having the AI detect mental fragility might be a means of stirring people to consider their mental fragility, perhaps then seeking human therapy correspondingly. They might not have had any other impetus to do so. AI somewhat saves the day in that regard.

Robert Frost famously said this: “The best way out is always through.”

The gist, I believe, would be that AI is here, and using AI for mental health is here, so one means for now of making our way through this journey is to have the AI suitably and with aplomb detect mental fragility.

That seems like a best way through.

Source: https://www.forbes.com/sites/lanceeliot/2025/08/19/how-ai-tries-to-detect-mental-fragility-and-thus-fulfill-sam-altmans-goal-of-not-accidentally-exploiting-peoples-minds/