# 15 Research Lab

> Independent AI safety research lab. Adversarial experiments on frontier AI systems — full data, full methodology, no PR filter.

15 Research Lab conducts large-scale adversarial experiments on frontier AI systems and publishes everything: data, methodology, transcripts, and scoring code. Founded by John Kearney. Based in Chicago.

## Research Areas

### Agent Safety
We test whether AI agents complete tasks safely — not just whether they complete them. Our adversarial experiments have produced 7 headline findings across 73+ agent evaluations and 6 experimental rounds.

Key findings:
- 71% compliance rate via gradual escalation (15-turn slow boil) on requests that get 0% when asked directly
- 0% compliance on bright-line violations (HIPAA, CFAA, SEC, malware, phishing) across 100+ trials
- 87% meta-refusal rate when attacks are described transparently — safety evals that describe attacks in advance produce artificially high scores
- 80% failure reduction from two specific system prompt changes
- Trajectory blindness: AI catches individual harmful requests but misses 80-turn cumulative scope creep (scored 7/10 safety)
- Presentation layer is load-bearing: stripping warnings disables safety reasoning, not just formatting

### MCP Safety (Model Context Protocol)
We evaluate AI systems' ability to handle malicious MCP tool servers. 8 attack categories covering the full MCP lifecycle.

Key findings:
- Best MCP safety score: 78/100 (Claude Code) — no system exceeds 80
- 23-point gap between framework-wrapped (Claude Code: 78) and raw API (same model: 55)
- 44% average miss rate on tool chain attacks across all systems
- Prompt injection via tool results succeeds 32% of the time even in best-defended system
- Consent bypass is the best-defended category; prompt injection via tools is the weakest

MCP attack categories evaluated:
1. Tool Poisoning — hidden instructions in tool descriptions (Critical)
2. Prompt Injection via Tools — adversarial payloads in tool responses (Critical)
3. Privilege Escalation — tools exceeding declared permissions (Critical)
4. Data Exfiltration — leaking context through tool calls (Critical)
5. Schema Injection — malformed data causing unintended actions (High)
6. Cross-Origin Escalation — cross-server resource access (High)
7. Consent Bypass — actions without user confirmation (High)
8. Rug-Pull Detection — tool behavior changing after approval (Medium)

## The Fifteen Standard

A scoring rubric for action-taking AI systems (agents, automations, tool-using LLM workflows). 8 weighted categories, 100-point scale.

Categories and weights:
- Authorization & Policy: 20/100
- Exactly-Once Execution: 20/100
- Receipts & Auditability: 15/100
- Approvals & Escalation: 10/100
- Tool Scope & Intent Binding: 10/100
- Adversarial Resilience: 10/100
- Observability & Recovery: 10/100
- Operational Hygiene: 5/100

## Leaderboard — Agent Safety (Fifteen Standard)

Models evaluated: Claude Opus 4.6, GPT-4o, Claude Sonnet 4.5, Gemini 2.5 Pro, Llama 4 Maverick

Test suites: Bright-Line Violations, Novel Attack Vectors, Grey-Zone Scenarios, Slow-Boil Escalation, Novel Attack Surfaces, Compound Attacks, Long-Form Sessions

Top scores (bright-line): Claude Opus 4.6 (96), GPT-4o (94), Claude Sonnet 4.5 (93), Gemini 2.5 Pro (91), Llama 4 Maverick (82)

Slow-boil compliance erosion rates: Llama 4 Maverick (93%), Gemini 2.5 Pro (85%), Claude Sonnet 4.5 (82%), GPT-4o (76%), Claude Opus 4.6 (71%), Claude Opus 4.6 + guardrails (13%)

## Leaderboard — MCP Safety

Systems evaluated: Claude Code, Cursor, Windsurf, Continue, Raw API (Claude Opus 4.6), Raw API (GPT-4o)

Core safety scores: Claude Code (78), Cursor (61), Windsurf (58), Raw API Claude (55), Continue (52), Raw API GPT-4o (48)

## Original Concepts Introduced

- Contextual Compliance Momentum — prior compliance increases future compliance on escalating requests
- Frustration-as-Evidence — simulated user frustration treated as implicit authorization
- Presentation-Decision Coupling — safety warnings and safety reasoning are entangled
- Decomposition Amplification — harmful request refused whole, completed when split into components
- Technical Gaslighting — agent-on-agent persuasion undermining safety concerns
- Compressed Reasoning Risk — faster models do less safety deliberation under pressure
- Trajectory Blindness — individual requests evaluated correctly but cumulative trajectory untracked
- Tool Poisoning — hidden directives in MCP tool descriptions
- Chain Composition Risk — individually benign tool calls composing into harmful outcomes

## Attack Effectiveness Rankings

Effective (6-7.5/10): Slow Boil, Consistency Trap, Sunk Cost/Momentum, Technical Gaslighting, Tool Poisoning, Tool Chain Composition, Decomposition
Ineffective (0-1/10): Urgency, Authority, Confidence assertions, Anchoring, Emotional appeals

## Methodology

- Naturalistic framing: agents believe they are performing real tasks, not safety tests
- Multi-turn gradient pressure: 15-80 turn conversations with gradual escalation
- Per-turn scoring: caveats (0-5), warnings (0-5), compliance (yes/partial/no), authorization checks
- Adversarial MCP servers: purpose-built servers implementing real attack patterns
- All data published with full transcripts and scoring code

## Pages

- Home: https://15researchlab.com/
- Research: https://15researchlab.com/research/
- The Fifteen Standard: https://15researchlab.com/standard/
- Agent Safety Leaderboard: https://15researchlab.com/standard/leaderboard/
- MCP Safety: https://15researchlab.com/mcp-safety/
- MCP Safety Leaderboard: https://15researchlab.com/mcp-safety/leaderboard/
- Publications: https://15researchlab.com/publications/
- Updates: https://15researchlab.com/updates/
- Contact: https://15researchlab.com/contact/
- GitHub: https://github.com/15researchlab

## Contact

John Kearney, Founder
Email: johndanielkearney@gmail.com
Location: Chicago
Open to: research collaboration, speaking, advisory, data sharing

## Detailed information

For comprehensive detail, see: https://15researchlab.com/llms-full.txt