Autonomous AI Agent Risks: What Happens Without Oversight
Autonomous agents run without human supervision. They make decisions, call tools, and take actions on their own. The longer they run unsupervised, the more risk accumulates.
Error Accumulation
Every agent decision has some probability of error. In a supervised system, humans catch errors early. In an autonomous system, errors compound: an incorrect database query leads to bad data in a report, which leads to a wrong business decision, which then triggers a further automated action based on that decision.
The error rate does not need to be high. A 1% error rate per decision means a 63% chance of at least one error in 100 decisions. Over a long-running autonomous session, at least some errors are effectively guaranteed.
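The compounding claim is easy to verify directly. A short sketch, using the 1% rate and 100 decisions from the text (any per-decision error rate works the same way):

```python
def p_at_least_one_error(per_decision_rate: float, decisions: int) -> float:
    """Probability of at least one error across independent decisions."""
    return 1 - (1 - per_decision_rate) ** decisions

# A 1% per-decision error rate over 100 decisions:
print(round(p_at_least_one_error(0.01, 100), 2))  # → 0.63
```

At 1,000 decisions the same rate pushes the probability past 99%, which is why long unsupervised sessions make some errors effectively guaranteed.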
Drift from Intent
Agents optimize for the objective they are given. Over extended operation, the gap between the stated objective and the actual behavior can widen. The agent discovers shortcuts, develops behavioral patterns that were not anticipated, or interprets ambiguous instructions in unexpected ways.
This is not adversarial. It is a natural consequence of giving an optimization process room to operate without correction. The agent is doing what it thinks is right. It may be wrong.
Adversarial Exploitation Window
A supervised agent might be exposed to a prompt injection attack during a short conversation. An autonomous agent running for hours or days has a much larger exposure window. The longer it operates, the more data it processes, and the more opportunities there are for indirect injection through data sources.
An attacker only needs one successful injection. The agent needs to defend against every attempt. The math favors the attacker over long time horizons.
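That asymmetry can be made concrete: with any fixed per-attempt success probability, there is a finite number of attempts after which the attacker's cumulative odds pass 50%. A small sketch (the 0.1% per-attempt rate is an illustrative assumption, not a measured figure):

```python
import math

def attempts_until_even_odds(per_attempt_success: float) -> int:
    """Attempts needed for the attacker's cumulative success odds to reach 50%."""
    return math.ceil(math.log(0.5) / math.log(1 - per_attempt_success))

# At a 0.1% per-attempt success rate:
print(attempts_until_even_odds(0.001))  # → 693
```

A long-running agent that touches thousands of documents, emails, or web pages can cross that attempt count in a single session.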
Resource Consumption
Autonomous agents without budget controls can consume unbounded resources. API calls, compute time, database queries, and external service usage accumulate. Without rate limits and spending caps, a malfunctioning agent can run up significant costs before anyone notices.
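A minimal sketch of a per-session budget guard, checked before each action. The class and field names are illustrative assumptions, not the API of any particular framework:

```python
class BudgetExceeded(Exception):
    """Raised when a session cap would be crossed."""

class BudgetGuard:
    """Hard per-session caps on API call count and spend."""

    def __init__(self, max_api_calls: int, max_spend_usd: float):
        self.max_api_calls = max_api_calls
        self.max_spend_usd = max_spend_usd
        self.api_calls = 0
        self.spend_usd = 0.0

    def charge(self, cost_usd: float) -> None:
        """Record one API call; raise before either cap is crossed."""
        if self.api_calls + 1 > self.max_api_calls:
            raise BudgetExceeded("API call cap reached")
        if self.spend_usd + cost_usd > self.max_spend_usd:
            raise BudgetExceeded("spending cap reached")
        self.api_calls += 1
        self.spend_usd += cost_usd
```

The design point is that the guard sits outside the agent loop and raises unconditionally: the cap is enforced regardless of what the agent decides to do next.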
Cascade Effects
An autonomous agent in a production system can trigger cascading effects. An agent that misinterprets a market signal and executes trades can move prices. An agent that sends incorrect notifications can cause organizational confusion. An agent that modifies production data can break downstream systems.
Minimum Safety Controls for Autonomous Agents
If you deploy agents autonomously, these controls are the minimum:
- Budget caps: Hard limits on API calls, compute time, and spending per session
- Time limits: Maximum session duration with forced review checkpoints
- Kill switch: Immediate stop mechanism that does not require agent cooperation
- Behavioral monitoring: Statistical anomaly detection running continuously
- Audit trail: Hash-chained receipts for every action
- Periodic review: Human review of agent actions at regular intervals, even if not real-time
- Scope limits: Restrict the agent to the minimum set of tools needed for its task
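The "hash-chained receipts" control above can be sketched in a few lines: each receipt's hash covers the previous receipt's hash, so editing any past action breaks every subsequent link. Field names here are illustrative:

```python
import hashlib
import json

def append_receipt(chain: list[dict], action: str, detail: str) -> None:
    """Append a receipt whose hash covers the previous receipt's hash."""
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    body = {"action": action, "detail": detail, "prev": prev_hash}
    digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
    chain.append({**body, "hash": digest})

def verify(chain: list[dict]) -> bool:
    """Recompute every link; any edited receipt invalidates the tail."""
    prev_hash = "0" * 64
    for r in chain:
        body = {"action": r["action"], "detail": r["detail"], "prev": r["prev"]}
        digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
        if r["prev"] != prev_hash or r["hash"] != digest:
            return False
        prev_hash = r["hash"]
    return True
```

This makes the periodic-review control cheaper: a reviewer can verify the whole trail's integrity before reading any of it.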
Full autonomy without these controls is an incident waiting to happen. The question is not if something will go wrong but when.