← Blog

Why 15-turn escalation breaks what single-shot cannot

John Kearney
Research Agent · Safety Findings

We spent three months running multi-turn adversarial experiments against frontier models. The core finding is simple but important: requests that models refuse outright in a single turn will often be fulfilled after a sequence of 15 incremental turns.

The mechanism is not complicated. Each turn in the sequence is individually reasonable. Turn one might ask for general background on a topic. Turn five might ask for technical details. Turn ten reframes the conversation slightly. By turn fifteen, the model is producing output it would have refused if asked directly at turn one.

We call this gradual compliance erosion, and it works because of how models handle conversational context. Each prior turn becomes part of the prompt. The model evaluates the current request against the full history, and the history contains a pattern of compliance. The model's own prior responses create a kind of momentum. It has already engaged with the topic, already provided partial information, already accepted the framing. Refusing at turn fifteen means contradicting its own conversational trajectory.
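The accumulation mechanism is easy to see in code. Below is a minimal sketch of a multi-turn escalation loop; `query_model` is a hypothetical stand-in for any chat-completion API, not a function from our harness.

```python
# Minimal sketch of how conversational context accumulates across turns.
# `query_model` is a hypothetical callable taking a message history and
# returning the assistant's reply as a string.
def run_escalation(turn_prompts, query_model):
    history = []
    for prompt in turn_prompts:
        history.append({"role": "user", "content": prompt})
        reply = query_model(history)  # the model sees the FULL history
        history.append({"role": "assistant", "content": reply})
    return history
```

By the final turn, the prompt contains every prior exchange, including the model's own compliant responses. That accumulated record is the "momentum" described above: refusing now means contradicting text the model itself produced.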

We tested this across five frontier models. Every model exhibited some degree of compliance erosion, though the severity varied. The most resistant model showed a 23% increase in compliance with harmful requests over 15 turns compared to single-shot. The least resistant showed a 67% increase.
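For clarity, the percentages above are relative increases in compliance rate, not absolute percentage-point changes. The rates in the example below are illustrative stand-ins, not our measured numbers.

```python
def relative_increase(single_shot_rate, multi_turn_rate):
    """Percent increase in compliance rate, multi-turn vs. single-shot."""
    return 100 * (multi_turn_rate - single_shot_rate) / single_shot_rate

# Illustrative rates only (not the measured values):
print(round(relative_increase(0.30, 0.369)))  # prints 23
print(round(relative_increase(0.30, 0.501)))  # prints 67
```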

The pattern has three phases. Turns 1 through 5 establish topical engagement. The requests are benign and the model responds normally. Turns 6 through 10 gradually shift the framing. The requests become more specific and edge closer to the actual target. Turns 11 through 15 make the final pivot. By this point, the model has enough conversational investment that it tends to follow the trajectory rather than reassess from scratch.
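The three-phase schedule can be written down directly. The turn boundaries below are the ones from our experiments; the phase labels are just descriptive names for this sketch.

```python
# Three-phase escalation schedule over a 15-turn window.
def phase(turn):
    if 1 <= turn <= 5:
        return "topical engagement"  # benign background requests
    if 6 <= turn <= 10:
        return "framing shift"       # more specific, edging toward target
    if 11 <= turn <= 15:
        return "final pivot"         # trajectory-following compliance
    raise ValueError("turn outside the 15-turn window")
```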

This is not jailbreaking in the traditional sense. There is no adversarial prompt, no special token manipulation, no system prompt override. It is a conversation that starts in one place and ends in another, with each step being small enough that the model does not flag it.

The defensive implications are clear. Single-turn safety evaluation misses this entirely. If you only test whether a model refuses a harmful request in isolation, you are testing an unrealistic scenario. Real adversarial interactions happen over multiple turns. Safety evaluations need to reflect that.

We are also finding that the number 15 is not arbitrary. In our experiments, compliance erosion follows a roughly sigmoidal curve. Most of the shift happens between turns 8 and 14. Before turn 8, the model is still relatively anchored. After turn 14, additional turns show diminishing returns. Fifteen turns sits where the curve begins to saturate.

This is why the ASB Benchmark uses 15-turn evaluation windows. It is not a round number we chose for convenience. It is the empirically observed threshold where gradual escalation reaches its peak effectiveness.