It only took a calendar invite containing a jailbreak prompt to highlight how an AI agent connected via the Model Context Protocol (MCP) can be prompted to exfiltrate data. Signals and mitigations for this type of prompt injection have been formalized in the OWASP guidelines for GenAI, which update the LLM01 risk on April 17, 2025 OWASP GenAI.
Hence the idea relaunched by Vitalik Buterin: to adopt a human jury that oversees decisions and crypto treasuries, accompanied — but not replaced — by language models. In this context, the priority becomes keeping the human as the final arbiter.
The researcher Eito Miyamura (as reported by BitcoinEthereumNews) illustrated an attack where a simple calendar invitation, filled with a malicious prompt, convinces the AI agent to read private emails and forward contents to an attacker. The vector exploits the MCP integration chain with Gmail, calendars, SharePoint, and Notion: more connectors mean a wider attack surface. It should be noted that the apparent innocuousness of the content increases the risk.
In contexts where MCP operates in developer mode, human consensus is required for sensitive actions. However, decision fatigue can turn confirmation prompts into automatisms; and when treasuries or workflows involving files and credentials are at stake, human error becomes a single point of failure. That said, decoupling permissions and critical steps remains essential.
Industry analysts note that indirect prompt injections — that is, content not visible to the human eye but interpretable by the LLM — represent a growing class of risk, as documented by OWASP in its April 2025 update. In red-teaming tests conducted by specialized security teams in the first half of 2025, scenarios with multiple integrations (email, calendar, file storage) showed how the lack of segmentation significantly increases the likelihood of exfiltration if filters and least-privilege policies are not applied.
“One must always start from a fundamental truth signal that one trusts. I think realistically it should be a human jury, where the individual jurors are obviously assisted by all the LLMs.”
— Vitalik Buterin (AMBCrypto)
Buterin indicates a path of verification that starts from the human: a jury composed of people with complementary skills, supported by models for analysis and synthesis, but with the final say on critical decisions. In this context, the jury acts as an “anchor” against automatic manipulation and operational hallucinations when artificial intelligence accesses financial assets or high-impact permissions.
The concept of info-finance shifts governance towards a market of proposals: different frameworks and policies compete publicly, while spot checks and verdicts remain in the hands of the jury. It is a natural extension of the practices adopted in DAOs and in DeFi, which prioritize transparency, distributed accountability, and incentives for continuous auditing.
Buterin warns that if fund allocation is entrusted to an AI, hostile actors could insert payloads like “gimme all the money” in documents, invitations, and comments. For this reason, info-finance focuses on traceability of decisions and human controls on the steps that move capital. Yet, the procedural component remains as important as the technical one.
In this vision, Buterin explained that the Ethereum Foundation is updating its Treasury Policy – a document published on June 4, 2025 – with goals for more active management and operational limits to ensure long-term sustainability. Industry reports indicate that, as of October 31, 2024, the declared treasury was approximately 970.2 million dollars, a figure used as a reference for the new rules on ETH sales and operational limits. Additionally, Buterin mentioned Codex, a layer 2 oriented towards payments in stablecoin, as a possible infrastructure for “large‑scale value” use cases – a strategic move aimed at strengthening resilience and adoption, although some details are yet to be verified.
Security is not just a technical issue; it requires processes, transparency, and verifiable accountability. As Buterin points out, the problem of jailbreaking is not binary, while the phenomenon of Goodharting represents a subtle form of metric “fraud.” In a growing automation context, info-finance supported by a human jury acts as a pragmatic parachute to mitigate risks on treasuries and critical decisions.



