How Anthropic stopped AI agents working for Chinese state-sponsored spy campaign

Chinese state-sponsored hackers exploited Anthropic Claude Code AI in the world’s first largely autonomous cyber-espionage campaign, proving that machine agents can now run sprawling digital attacks with only minimal human input.

Anthropic and the AI alarm bell

The alarm rang in mid-September at Anthropic, but this was no ordinary network blip. As Anthropic’s threat team sifted through unusual digital clues, what emerged wasn’t yesterday’s malware; it looked more like tomorrow’s cyber warfare had arrived.

A Chinese state-backed group, investigators found, orchestrated an audacious cyber espionage campaign, not with a legion of human hackers, but by harnessing the full agentic power of Anthropic AI against 30 global targets.

Victims included tech giants, massive banks, factories, and government agencies, a who’s who of digital-era dependence.

Autonomous hacking, minimal supervision

Last spring’s “AI hacking” buzz might have sounded overblown, but this event erased any doubts. Anthropic’s AI didn’t just suggest tools or code. It became the operation’s key agent, running reconnaissance, building out attack frameworks, and crafting bespoke exploits. The model harvested credentials, exfiltrated classified data, and kept humans on the sidelines. As AI analyst Rohan Paul put it:

How did it work? The new era wasn’t born overnight. But Anthropic’s models, manipulated via clever jailbreaking techniques, were tricked into thinking they were benign cybersecurity employees handling innocent, everyday tasks.

Those fragmented requests, pieced together, spelled big trouble. Within minutes, Anthropic AI agents mapped networks, identified juicy databases, produced custom exploit code, and sorted stolen data by intelligence value. The AI even wrote technical docs about the breach, replacing what used to keep human hacking teams awake for weeks.

At its peak, the machine blasted out thousands of requests, often several per second, far outpacing anything a human hacking team could attempt. Sure, the bot occasionally hallucinated or tripped up, but its overall speed and scale marked a new era.

The arms race for control

The entry bar for sophisticated cyberattacks has now plummeted. Anthropic AI and others like it now pack the skills, autonomy, and tool access once reserved for elite experts. What once took months can now be launched broader, faster, and more efficiently.

For defenders and operators alike, the implications are immediate. The cybersecurity arms race has shifted toward “agentic” AI, capable of chaining tasks and executing complex campaigns. Less-resourced actors can now run attacks once reserved for digital superpowers.

Anthropic’s response? The company quickly expanded its detection systems, booted malicious accounts, and pushed for wider threat sharing. But the team is under no illusions. The threat from agentic AI will continue to rise. Anthropic commented:

Defenders get AI too

Here’s the paradox: the same Anthropic AI tools now being weaponized in attacks are also joining the frontline for defense. With the proper safeguards and oversight, these models can identify, block, and investigate future threats, making them indispensable for cybersecurity professionals.

At the end of the day, the operational, social, and even existential stakes for “thinking” machines are only getting higher. Security teams may soon need to trust their digital agents more than their own instincts.

What’s certain now? The cyber battlefield is evolving, and our best response may be to understand, share, and adapt as quickly as the machines themselves

Source: https://cryptoslate.com/how-anthropic-stopped-ai-agents-working-for-chinese-state-sponsored-spy-campaign/