Agentic AI Risks in 2026: Current Threat Landscape

15 Research Lab
agent-safety · research · findings

The threat landscape for AI agents has shifted significantly since 2024. Agents are no longer experimental. They are in production, with real tool access, handling real data. The attack surface has expanded accordingly.

What Has Changed

Agents are in production. In 2024, most AI agent deployments were internal prototypes. In 2026, agents process customer requests, manage infrastructure, execute financial operations, and interact with production databases. The stakes are higher.

MCP adoption. The Model Context Protocol has become the standard for agent-tool connectivity. This is good for interoperability. It also means that MCP-specific attack vectors (tool poisoning, rug pulls, response injection) now have a large and growing target set.

Tool ecosystem expansion. The number of available MCP servers and tool integrations has grown rapidly. Many are maintained by individuals or small teams with varying security practices. Supply chain risk has increased.

Multi-agent deployments. Organizations are deploying multiple agents that communicate and delegate. Inter-agent trust, impersonation, and cascade failure risks are now operational concerns, not theoretical ones.

Current Top Threats

Indirect prompt injection through data sources. As agents process more external data (web pages, documents, database records), the surface for data-borne injection attacks grows. This is the most likely attack vector for production agents.
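One common mitigation is to treat all external data as untrusted: wrap it in explicit delimiters before it reaches the model, and flag instruction-like phrases for review. The sketch below is illustrative only; the delimiter format and pattern list are assumptions, and heuristic filtering is a speed bump, not a complete defense.

```python
import re

# Heuristic phrases that often signal instruction-like text embedded in data.
# This list is illustrative, not exhaustive.
SUSPICIOUS = [
    r"ignore (all|previous|prior) instructions",
    r"you are now",
    r"system prompt",
]

def quarantine(source: str, content: str) -> tuple[str, list[str]]:
    """Mark external content as untrusted and flag injection-like phrases."""
    hits = [p for p in SUSPICIOUS if re.search(p, content, re.IGNORECASE)]
    wrapped = f"<untrusted source={source!r}>\n{content}\n</untrusted>"
    return wrapped, hits
```

In practice the agent's orchestration layer would refuse to act on flagged content, or route it to a human reviewer, rather than passing it straight into the model's context.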

MCP supply chain attacks. Compromised or malicious MCP servers that change behavior after trust is established. The rug pull pattern is particularly effective because security reviews are typically done once, at integration time, and never repeated.
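A cheap countermeasure is to pin a fingerprint of each tool description at review time and re-verify it on every session, so a silently modified description fails closed. A minimal sketch, assuming tool descriptions are available as name-to-string mappings (the function names are illustrative, not part of any MCP SDK):

```python
import hashlib

def pin_tools(tools: dict[str, str]) -> dict[str, str]:
    """Record a SHA-256 fingerprint of each tool description at review time."""
    return {name: hashlib.sha256(desc.encode()).hexdigest()
            for name, desc in tools.items()}

def verify_tools(tools: dict[str, str], pins: dict[str, str]) -> list[str]:
    """Return names of tools whose descriptions changed (or appeared) since review."""
    changed = []
    for name, desc in tools.items():
        digest = hashlib.sha256(desc.encode()).hexdigest()
        if pins.get(name) != digest:
            changed.append(name)  # unreviewed tool or silently modified description
    return changed
```

An agent host would call `verify_tools` before each session and refuse to proceed (or require re-review) if the returned list is non-empty, turning the one-time review into a continuously enforced check.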

Multi-turn manipulation. Adversarial interactions that span multiple conversation turns to gradually erode model safety. Compliance rates under escalation remain high (71% in our testing) even for frontier models.

Autonomous agent drift. Long-running agents that gradually deviate from intended behavior without adversarial input. Error accumulation and objective drift in unsupervised agents cause incidents that look like normal operation until the cumulative effect becomes apparent.
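Because drifted behavior looks normal turn by turn, detection has to be statistical. One simple approach is a rolling baseline over some per-window metric (here, action count) with a z-score alarm. This is a sketch under assumptions: the metric, window size, and threshold are placeholders, not validated values.

```python
from collections import deque
from statistics import mean, stdev

class DriftMonitor:
    """Flag windows whose action count deviates sharply from the rolling baseline."""

    def __init__(self, history: int = 20, threshold: float = 3.0):
        self.counts = deque(maxlen=history)  # recent per-window action counts
        self.threshold = threshold           # z-score alarm level (assumed)

    def observe(self, actions_this_window: int) -> bool:
        """Record one window's count; return True if it looks anomalous."""
        anomalous = False
        if len(self.counts) >= 5:  # need a minimal baseline first
            mu, sigma = mean(self.counts), stdev(self.counts)
            if sigma > 0 and abs(actions_this_window - mu) / sigma > self.threshold:
                anomalous = True
        self.counts.append(actions_this_window)
        return anomalous
```

Richer deployments would track several metrics at once (tool mix, error rate, data volume touched), but even a single rolling statistic catches the gradual-then-sudden pattern that makes drift incidents expensive.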

Compliance enforcement gaps. The EU AI Act's high-risk obligations become enforceable in August 2026. Many organizations deploying agents in regulated domains have not implemented the required technical controls.

What Is Improving

Tooling maturity. Open-source safety tooling (Authensor, Aegis, Sentinel, Chainbreaker) has matured. Production-grade policy engines, scanners, and monitors are available.

Standards adoption. OWASP ASI, NIST AI RMF, and ISO 42001 provide frameworks for organizing agent security efforts.

Community knowledge. The body of practical knowledge about agent failure modes has grown. Payload collections (AI SecLists), benchmark methods (ASB), and attack surface mappers provide concrete testing tools.

What Remains Unsolved

Prompt injection at the model level. No model-level solution eliminates prompt injection. Defense-in-depth with runtime controls is necessary.
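Runtime controls mean a deterministic gate between the model's proposed action and its execution, so that even a successfully injected agent cannot act outside policy. A minimal sketch, assuming tool calls arrive as name/arguments pairs; the specific rules (allowed tools, blocked path prefixes) are illustrative:

```python
# Illustrative policy rules; real deployments would load these from config.
ALLOWED_TOOLS = {"search", "read_file"}
BLOCKED_PATH_PREFIXES = ("/etc", "/root")

def authorize(tool: str, args: dict) -> bool:
    """Allow a proposed tool call only if it passes every policy rule."""
    if tool not in ALLOWED_TOOLS:
        return False  # default-deny anything not explicitly allowed
    path = args.get("path", "")
    if any(path.startswith(p) for p in BLOCKED_PATH_PREFIXES):
        return False  # block sensitive filesystem locations
    return True
```

The key design choice is default-deny: the gate evaluates what the agent is about to do, not why, so it holds regardless of how the model was manipulated into proposing the action.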

Behavioral monitoring at scale. Statistical monitoring works for single agents. Monitoring multi-agent systems with complex interaction patterns is an open research problem.

Formal verification of safety properties. We can test for safety empirically but cannot prove it formally. The gap between tested safety and guaranteed safety remains wide.

The field is moving in the right direction. The question is whether defensive tooling can keep pace with deployment growth and adversarial innovation.