Frequently Asked Questions
Common questions about our evaluation framework, submission process, and research programs.
What is the Agent Safety Benchmark?
The Agent Safety Benchmark (ASB) is a systematic 100-point evaluation framework for autonomous AI agents. It measures eight dimensions of agent behavior, including correctness, safety margins, and auditability.
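As a rough sketch of the arithmetic only (the equal weighting and placeholder dimension names below are illustrative assumptions, not the official rubric), the total can be thought of as a weighted sum over the eight dimension scores:

```python
# Illustration only: equal weighting and generic dimension keys are assumptions,
# not the published ASB rubric.
def total_score(dimension_scores: dict[str, float], max_total: float = 100.0) -> float:
    """Aggregate per-dimension scores (each normalized to 0..1) into a single total."""
    weight = max_total / len(dimension_scores)  # assumes equal weighting across dimensions
    return sum(weight * score for score in dimension_scores.values())

# Example: an agent scoring 0.9 on each of eight dimensions totals 90/100.
print(total_score({f"dim_{i}": 0.9 for i in range(8)}))  # 90.0
```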
How do I submit my agent for evaluation?
Visit our submission page to register your agent. You'll need to provide API access or a hosted endpoint. Our evaluation pipeline runs the full test suite and publishes results to the leaderboard.
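As a minimal sketch of what a registration might include (the field names and endpoint shown here are hypothetical, not the documented submission API; the submission page lists the actual requirements):

```python
# Hypothetical example payload; not the real submission schema.
import json

submission = {
    "agent_name": "example-agent",
    "endpoint_url": "https://agents.example.com/v1/act",  # hosted endpoint, or provide API credentials instead
    "contact_email": "team@example.com",
}
print(json.dumps(submission, indent=2))
```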
What does 'A15 Verified' mean?
A15 Verified indicates that an agent has been independently evaluated by 15 Research Lab using the Fifteen Standard. Results include full artifact bundles with cryptographic verification.
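The exact verification scheme is not spelled out here, but as a hedged sketch, checking a downloaded bundle against a published SHA-256 digest would look something like this:

```python
# Sketch only: one common approach to integrity checking, not necessarily
# the A15 verification procedure.
import hashlib

def verify_bundle(path: str, expected_sha256: str) -> bool:
    """Return True if the artifact bundle's SHA-256 digest matches the published value."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_sha256
```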
Is the evaluation framework open source?
The scoring rubrics and task specifications are published openly. The evaluation pipeline code is available for review. We believe transparency is essential for trust in benchmarking.
How is 15 Research Lab funded?
We are funded through grants from foundations focused on AI safety. We do not accept funding from AI developers whose systems we evaluate, ensuring independence.
Can I contribute to the benchmark?
Yes! We welcome contributions through our Breaker Kit Marketplace and community red-teaming programs. Top community contributions become part of the official test suite.
How often is the leaderboard updated?
The leaderboard updates daily at 5 PM UTC. New agent submissions are typically evaluated within 48 hours of API access being provided.
What is the Red Lane?
The Red Lane is our adversarial research program focused on stress-testing AI systems. It probes for failure modes, specification gaming, and edge cases that standard benchmarks miss.