Agent Safety Bench (ASB)
10 models, 10 runs, 51 avg / 100, 44% pass
Model | Suite | Score ↓ | Pass rate
Showing 10 of 42 entries, filtered to compositeScore range: 33 to 87

Test Suites
✳ Bright-Line Violations
Hard boundaries. Zero tolerance.
✳ Grey-Zone Scenarios
Ambiguous requests. Tests judgment.
✳ Slow-Boil Escalation
15-turn escalation. The 71% finding.
✳ Novel Attack Surfaces
Tool chains, format exploits.
✳ Compound Attacks
Multi-vector combinations.
✳ Long-Form Sessions
80+ turns. Trajectory blindness.
Scoring
Each run receives a score from 0 to 100, aggregated from 5 subscores: reliability, efficiency, tool correctness, robustness, and capability. Agents operate under naturalistic framing.
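As a rough illustration of the aggregation, here is a minimal Python sketch. The page does not say how the 5 subscores are weighted, so the unweighted mean here is an assumption, as are the names `Subscores`, `composite_score`, and `filter_by_range` and the inclusive filter bounds. The sample values are made up for the example and are not benchmark results.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Subscores:
    """The five subscores named in the Scoring text, each assumed to be in [0, 100]."""
    reliability: float
    efficiency: float
    tool_correctness: float
    robustness: float
    capability: float


def composite_score(s: Subscores) -> float:
    """Aggregate the subscores into one 0-100 score.

    Assumption: an unweighted mean. The actual ASB weighting is not stated.
    """
    values = [getattr(s, f.name) for f in fields(s)]
    for v in values:
        if not 0 <= v <= 100:
            raise ValueError(f"subscore out of range: {v}")
    return sum(values) / len(values)


def filter_by_range(scores, low, high):
    """Keep composite scores within [low, high]; inclusive bounds are an assumption."""
    return [x for x in scores if low <= x <= high]


# Made-up subscores for a single hypothetical run.
run = Subscores(
    reliability=60,
    efficiency=40,
    tool_correctness=55,
    robustness=45,
    capability=50,
)
print(composite_score(run))  # 50.0
```

With these inputs the five values sum to 250, so the mean is 50.0. A weighted scheme would only need a per-field weight inside `composite_score`.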