15 Research Lab - Adversarial Safety Evaluation of Frontier AI Systems

John Kearney

Multi-Agent System Security Risks

January 8, 2026 | 15 Research Lab

agent-safety, red-team, findings

Single-agent security is hard enough. Multi-agent systems, where agents communicate, delegate tasks, and share context, introduce entirely new classes of vulnerability.

Inter-Agent Trust

When Agent A asks Agent B to perform an action, how does Agent B verify that the request is legitimate? In most multi-agent frameworks, agents trust each other implicitly. A compromised agent can instruct other agents to take actions on its behalf.

This is the confused deputy problem applied to AI agents. Agent B has capabilities that Agent A does not. Agent A manipulates Agent B into using those capabilities, bypassing the access controls that Agent A is subject to.

Defense: Every inter-agent request must carry identity and authorization context. Agent B evaluates the request against its own policy, not Agent A's claim of what is permitted.
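A minimal sketch of "evaluate against its own policy" follows. It assumes the sender's identity has already been verified (for example by message signing) and uses a hypothetical request shape and policy table. The point is that the sender's `claimed_permissions` field travels with the request but plays no part in the decision.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class InterAgentRequest:
    sender: str  # calling agent's identity (assumed already verified)
    action: str  # capability the sender wants the receiver to use
    # What the sender *claims* it is allowed to do. Kept for auditing only.
    claimed_permissions: frozenset = field(default_factory=frozenset)


# Agent B's own policy: which callers may request which actions.
# Agent names and actions are hypothetical.
AGENT_B_POLICY = {
    "planner-agent": {"read_db"},
    "admin-agent": {"read_db", "send_email"},
}


def authorize(request: InterAgentRequest, policy: dict) -> bool:
    """Decide using only the receiver's policy; claimed_permissions is deliberately ignored."""
    return request.action in policy.get(request.sender, set())


# The planner claims it may send email, but Agent B's policy says otherwise.
req = InterAgentRequest("planner-agent", "send_email", frozenset({"send_email"}))
print(authorize(req, AGENT_B_POLICY))  # False
```

Unknown senders fall through to an empty permission set, so the default is deny rather than trust.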

Agent Impersonation

If agents communicate through shared message buses or APIs, an attacker who can inject messages into the channel can impersonate any agent. The receiving agent has no way to verify the sender's identity beyond the self-declared identity field in the message, which the attacker controls.

Defense: Cryptographic signing of inter-agent messages. Each agent has a private key, signs outbound messages, and recipients verify signatures. This prevents impersonation but requires key management infrastructure.
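Python's standard library has no public-key signature primitive, so the sketch below substitutes HMAC-SHA256 with a per-agent secret to show the sign-then-verify flow. This is a weaker construction than the private-key signing described above: anyone who holds an agent's secret in order to verify its messages could also forge them, so a real deployment would use asymmetric signatures such as Ed25519. The `KEYS` registry, message shape, and helper names are hypothetical, and the sketch does not address replay (a timestamp or nonce inside the signed bytes would be needed for that).

```python
import hashlib
import hmac
import json

# Hypothetical key registry. Provisioning and rotating these secrets is the
# key management infrastructure cost mentioned above.
KEYS = {
    "agent-a": b"agent-a-secret-key",
    "agent-b": b"agent-b-secret-key",
}


def _canonical(sender: str, payload: dict) -> bytes:
    # Sorted keys and fixed separators so signer and verifier hash identical bytes.
    # The sender is included, so relabelling the sender breaks the tag.
    body = {"sender": sender, "payload": payload}
    return json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_message(sender: str, payload: dict) -> dict:
    tag = hmac.new(KEYS[sender], _canonical(sender, payload), hashlib.sha256).hexdigest()
    return {"sender": sender, "payload": payload, "sig": tag}


def verify_message(message: dict) -> bool:
    key = KEYS.get(message.get("sender"))
    if key is None:
        return False  # unknown sender: reject
    expected = hmac.new(
        key, _canonical(message["sender"], message["payload"]), hashlib.sha256
    ).hexdigest()
    # compare_digest avoids leaking, via timing, how much of the tag matched.
    return hmac.compare_digest(expected, message.get("sig", ""))


msg = sign_message("agent-a", {"action": "read_db", "table": "orders"})
print(verify_message(msg))  # True
forged = dict(msg, sender="agent-b")  # attacker relabels the sender
print(verify_message(forged))  # False
```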

Cascade Failures

A compromised or malfunctioning agent can cause cascading failures across the system. If Agent A produces bad output, Agent B (which depends on Agent A's output) makes bad decisions, and Agent C (which depends on Agent B) amplifies the error further.

In tightly coupled multi-agent systems, a single point of failure can degrade the entire pipeline. The error may not be detectable at each individual step because each agent sees only its immediate input, not the full chain of decisions that produced it.

Defense: Cross-agent monitoring that tracks end-to-end behavior, not just individual agent actions; Sentinel-style monitoring can operate across agent boundaries. Pair it with circuit breakers that stop propagation when error rates exceed a threshold.
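A circuit breaker can sit on the edge between two agents. The sketch below is a generic sliding-window breaker, not the lab's implementation: it assumes each upstream output can be checked by some validator (`is_valid` is a hypothetical hook) and trips once the failure rate over the last `window` outputs exceeds `threshold`. It omits the half-open/reset state a production breaker would normally have.

```python
from collections import deque


class CircuitBreaker:
    """Stops forwarding an upstream agent's output once its recent failure rate is too high."""

    def __init__(self, threshold: float = 0.5, window: int = 10, min_calls: int = 5):
        self.threshold = threshold            # failure rate that trips the breaker
        self.results = deque(maxlen=window)   # True = output failed validation
        self.min_calls = min_calls            # avoid tripping on too few samples
        self.open = False

    def record(self, failed: bool) -> None:
        self.results.append(failed)
        if len(self.results) >= self.min_calls:
            rate = sum(self.results) / len(self.results)
            if rate > self.threshold:
                self.open = True

    def allow(self) -> bool:
        return not self.open


def forward(breaker, output, is_valid, downstream):
    """Pass output to the downstream agent only while the breaker is closed."""
    if not breaker.allow():
        raise RuntimeError("circuit open: halting propagation")
    ok = is_valid(output)
    breaker.record(not ok)
    if not ok:
        return None  # drop bad output instead of passing it on
    return downstream(output)


breaker = CircuitBreaker(threshold=0.5, window=4, min_calls=4)
for value in [3, -1, -2, -5]:
    forward(breaker, value, is_valid=lambda x: x >= 0, downstream=lambda x: x * 10)
print(breaker.allow())  # False: 3 of the last 4 outputs failed validation
```

Once open, every further `forward` call raises instead of letting Agent B act on Agent A's output, which is what keeps the error from reaching Agent C.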

Privilege Escalation Through Delegation

An agent with limited tool access can delegate tasks to another agent with broader access. If the delegation chain is not properly authorized, the originating agent effectively escalates its privileges through the delegation.

Defense: Delegation must carry the original requester's authorization context. Agent B should evaluate whether the original user (not Agent A) is authorized for the requested action. This is the principle of transitive authorization.
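The sketch below shows transitive authorization in its simplest form, under stated assumptions: a hypothetical `USER_PERMISSIONS` store stands in for the system's real access control, and each delegation hop appends itself to the chain while carrying `original_user` forward unchanged. The final check asks only whether the original user may perform the action. Nothing here stops a compromised agent from constructing a new task with a different `original_user`; binding that field would need the message signing discussed under impersonation.

```python
from dataclasses import dataclass

# Hypothetical permission store standing in for the system's real access control.
USER_PERMISSIONS = {
    "alice": {"read_reports"},
    "bob": {"read_reports", "delete_records"},
}


@dataclass(frozen=True)
class DelegatedTask:
    original_user: str  # principal who started the chain
    chain: tuple        # agents the task has passed through, in order
    action: str


def delegate(task: DelegatedTask, next_agent: str) -> DelegatedTask:
    # Record the hop; original_user is copied forward, never replaced.
    return DelegatedTask(task.original_user, task.chain + (next_agent,), task.action)


def authorize_delegated(task: DelegatedTask) -> bool:
    # Decide on the original user's rights, not the delegating agent's.
    return task.action in USER_PERMISSIONS.get(task.original_user, set())


task = DelegatedTask(original_user="alice", chain=("agent-a",), action="delete_records")
task = delegate(task, "agent-b")
print(task.chain)                 # ('agent-a', 'agent-b')
print(authorize_delegated(task))  # False: alice cannot delete, whatever agent-b's tools allow
```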

Information Leakage

Agents that share context or memory can leak sensitive information across trust boundaries. Agent A handles confidential user data. Agent B handles a different user's requests. If they share a memory store or conversation context, data can leak between users.

Defense: Strict memory isolation between agents handling different users or trust levels. No shared mutable state across trust boundaries.
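One way to express "no shared mutable state across trust boundaries" at the API level is to hand each agent a memory handle scoped to a single (user, trust level) pair, with no method that reads across partitions. This is a hypothetical in-process sketch; Python attribute privacy is only a convention, so stronger isolation would put each partition in a separate process or data store.

```python
from collections import defaultdict


class MemoryScope:
    """A handle onto exactly one partition; it has no way to reach other partitions."""

    def __init__(self, store: dict):
        self._store = store

    def put(self, key, value) -> None:
        self._store[key] = value

    def get(self, key, default=None):
        return self._store.get(key, default)


class IsolatedMemory:
    """Agent memory partitioned by (user_id, trust_level), one dict per partition."""

    def __init__(self):
        self._stores = defaultdict(dict)

    def scope(self, user_id: str, trust_level: str) -> MemoryScope:
        return MemoryScope(self._stores[(user_id, trust_level)])


memory = IsolatedMemory()
confidential = memory.scope("user-1", "confidential")
other_user = memory.scope("user-2", "standard")
confidential.put("account_number", "12345678")
print(other_user.get("account_number"))  # None
```

Agents handling different users never hold the same underlying dict, so a write by Agent A cannot surface in Agent B's context.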

Testing Multi-Agent Security

Red teaming multi-agent systems requires testing the interactions, not just individual agents. Inject adversarial messages into inter-agent channels. Compromise one agent and observe the blast radius. Test delegation chains for privilege escalation. This is where tools like Chainbreaker can automate adversarial scenario generation across agent topologies.
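Before injecting anything live, a red team can estimate blast radius statically: given which agents consume which other agents' output, every agent reachable from the compromised one is a candidate for downstream damage. The breadth-first search below does this over a hypothetical topology. It is not how Chainbreaker works; it only gives an upper bound on which agents to inspect after the real injection test.

```python
from collections import deque


def blast_radius(consumers: dict, compromised: str) -> set:
    """Return every agent that can receive output derived from the compromised agent."""
    reached, queue = set(), deque([compromised])
    while queue:
        agent = queue.popleft()
        for downstream in consumers.get(agent, ()):
            if downstream not in reached and downstream != compromised:
                reached.add(downstream)
                queue.append(downstream)
    return reached


# Hypothetical topology: each key's output is consumed by the listed agents.
topology = {
    "retriever": ["summarizer"],
    "summarizer": ["planner"],
    "planner": ["executor", "notifier"],
}
print(sorted(blast_radius(topology, "summarizer")))  # ['executor', 'notifier', 'planner']
```

Comparing this static set with the agents that actually misbehaved during a live injection shows which links already contain the damage and which pass it straight through.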