Q1 2026 research update
Here is where things stand across our three main research tracks as of late February.
The Fifteen Standard. The standard is live, and we have scored over 30 unique agent configurations. The main finding so far: safety-relevant behavior varies more across framework versions than across model versions. In our runs, upgrading from one framework version to the next moved safety scores more than swapping one frontier model for another, which suggests that framework architecture matters more than model capability for deployment safety.
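To make that comparison concrete, here is a minimal sketch of the kind of between-group variance check behind a claim like this. The field names, versions, and scores are hypothetical placeholders, not data from the standard.

```python
from statistics import mean, pvariance

# Hypothetical scored configurations; real data comes from the standard's harness.
runs = [
    {"framework_ver": "2.3", "model": "model-a", "safety_score": 0.81},
    {"framework_ver": "2.4", "model": "model-a", "safety_score": 0.62},
    {"framework_ver": "2.3", "model": "model-b", "safety_score": 0.79},
    {"framework_ver": "2.4", "model": "model-b", "safety_score": 0.60},
]

def between_group_variance(runs, key):
    """Variance of per-group mean safety scores under a grouping key."""
    groups = {}
    for r in runs:
        groups.setdefault(r[key], []).append(r["safety_score"])
    return pvariance([mean(v) for v in groups.values()])

# Framework grouping explaining more spread than model grouping is the
# signature of "framework version matters more than model version".
print(between_group_variance(runs, "framework_ver"))  # ~0.0090
print(between_group_variance(runs, "model"))          # ~0.0001
```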
We are preparing a v1.1 update to the standard that adds real-world scenarios alongside the synthetic ones. We are also promoting cross-complexity analysis to a first-class dimension, after finding that safety behavior degrades systematically as task complexity increases.
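As a rough sketch of what a cross-complexity check could look like, the snippet below bins scores by complexity tier and tests for monotonic degradation. The tier names and scores are invented for illustration.

```python
from statistics import mean

scores_by_tier = {  # hypothetical safety scores keyed by complexity tier
    "single_step": [0.92, 0.89, 0.94],
    "multi_step":  [0.78, 0.81, 0.75],
    "open_ended":  [0.55, 0.61, 0.58],
}

tiers = ["single_step", "multi_step", "open_ended"]  # increasing complexity
means = [mean(scores_by_tier[t]) for t in tiers]

# "Systematic degradation" here: mean safety score falls at every step up
# in complexity, not just between the easiest and hardest tiers.
degrades = all(a > b for a, b in zip(means, means[1:]))
print({t: round(m, 2) for t, m in zip(tiers, means)}, "degrades:", degrades)
```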
Red teaming. We have completed three structured red team sessions covering scope escalation, execution replay, and privilege management, and we are writing up the full findings for publication. The most actionable finding: validating tool calls at the execution layer (not just at the prompt layer) cut scope violations by 80% or more in our sessions. Frameworks without runtime validation are fundamentally exposed.
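To illustrate the difference between prompt-layer and execution-layer enforcement, here is a minimal sketch of a validated executor that checks every tool call against a declared scope immediately before running it. The scope schema, tool names, and predicates are assumptions for the example, not any framework's actual API.

```python
from typing import Any, Callable

class ScopeViolation(Exception):
    """Raised when a tool call falls outside its declared scope."""

class ValidatedExecutor:
    def __init__(self, scopes: dict[str, Callable[[dict], bool]]):
        # Map tool name -> predicate over that call's arguments.
        self.scopes = scopes

    def execute(self, tool: str, args: dict, impl: Callable[..., Any]) -> Any:
        check = self.scopes.get(tool)
        if check is None or not check(args):
            # Enforced at execution time: a prompt-layer bypass alone
            # cannot produce an out-of-scope call.
            raise ScopeViolation(f"{tool}({args}) is out of scope")
        return impl(**args)

# Usage: file reads are scoped to /workspace; everything else is refused.
executor = ValidatedExecutor(
    {"read_file": lambda a: a.get("path", "").startswith("/workspace/")}
)

def read(path):
    return f"<contents of {path}>"  # stand-in tool implementation

print(executor.execute("read_file", {"path": "/workspace/notes.txt"}, read))
try:
    executor.execute("read_file", {"path": "/etc/passwd"}, read)
except ScopeViolation as e:
    print("blocked:", e)
```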
Frontier methods. Our investigation into evaluation-aware behavior is ongoing. Preliminary results suggest that agents do behave differently when they detect evaluation contexts, but the effect size varies substantially across models. We need more data before making strong claims.
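For reference, a per-model effect size in this setting can be computed as Cohen's d between a behavioral metric measured with and without evaluation cues in context. The sketch below uses hypothetical compliance-rate samples, not our preliminary results.

```python
from statistics import mean, stdev

def cohens_d(with_cues: list[float], without_cues: list[float]) -> float:
    """Standardized mean difference using the pooled standard deviation."""
    n1, n2 = len(with_cues), len(without_cues)
    pooled = (((n1 - 1) * stdev(with_cues) ** 2
               + (n2 - 1) * stdev(without_cues) ** 2) / (n1 + n2 - 2)) ** 0.5
    return (mean(with_cues) - mean(without_cues)) / pooled

by_model = {  # hypothetical compliance rates: (with cues, without cues)
    "model-a": ([0.91, 0.88, 0.93, 0.90], [0.74, 0.70, 0.77, 0.72]),
    "model-b": ([0.83, 0.85, 0.81, 0.84], [0.80, 0.82, 0.79, 0.83]),
}
for model, (eval_ctx, plain_ctx) in by_model.items():
    # A large d for one model and a small d for another is exactly the
    # "effect size varies by model" pattern described above.
    print(model, round(cohens_d(eval_ctx, plain_ctx), 2))
```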
What is next. Priorities for the rest of Q1 and into Q2: publish the red team findings, release the v1.1 standard update, and start a new research track on multi-agent coordination safety. When multiple agents work together, the safety properties of individual agents do not compose in obvious ways. A system of individually safe agents can produce collectively unsafe behavior. We want to understand when and why that happens.
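A toy example of why composition fails: two agents that each respect a per-agent rate limit can still jointly overload a shared resource. All names and limits below are invented for illustration.

```python
GLOBAL_LIMIT = 10        # requests/sec a shared downstream service tolerates
PER_AGENT_LIMIT = 6      # each agent's individually "safe" budget

agents = {"planner": 6, "executor": 6}  # requests/sec each agent issues

individually_safe = all(r <= PER_AGENT_LIMIT for r in agents.values())
collectively_safe = sum(agents.values()) <= GLOBAL_LIMIT

print("individually safe:", individually_safe)  # True: 6 <= 6 for both
print("collectively safe:", collectively_safe)  # False: 6 + 6 = 12 > 10
```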