15 Research Lab -Adversarial Safety Evaluation of Frontier AI Systems

John Kearney

AI Red Team Tools: Open Source Options Compared

November 8, 202515 Research Lab

red-teamtoolsopen-source

The tooling for AI red teaming has matured significantly. Several open-source options are available, each with different design philosophies and strengths.

Chainbreaker

Developer: 15 Research Lab Approach: Multi-turn adversarial evaluation with naturalistic framing. Chainbreaker does not tell the model it is being tested. It uses realistic conversation patterns that mirror how real attackers operate.

Key capabilities:

15-turn escalation sequences that test gradual compliance erosion
Presentation-decision coupling analysis (does the model stop reasoning about safety when warnings are absent?)
Trajectory blindness testing (does the model catch the pattern across turns or evaluate each turn independently?)
Scoring against the ASB Benchmark framework

Best for: Evaluating model behavior under realistic adversarial conditions. Finding the gap between benchmark performance and real-world resilience.

Garak

Developer: NVIDIA Approach: Probe-based vulnerability scanning. Garak runs a large set of probes against the model and reports which ones succeed.

Key capabilities:

Extensive probe library covering prompt injection, jailbreaking, data leakage, and more
Plugin architecture for custom probes
Multiple detector types for classifying model responses
Support for various LLM API backends

Best for: Broad vulnerability scanning. Running a wide sweep of known attack types against a model to establish a baseline security posture.

PyRIT (Python Risk Identification Tool)

Developer: Microsoft Approach: Multi-turn orchestration with automated scoring. PyRIT automates the attacker side of the conversation and uses a judge model to score outcomes.

Key capabilities:

Orchestrators that manage multi-turn attack sequences
Automated objective scoring using judge models
Memory management for tracking attack progress
Support for multi-modal attacks (text, image)

Best for: Automated multi-turn red teaming at scale. When you need to run thousands of adversarial conversations without manual effort.

Comparison Matrix

| Feature | Chainbreaker | Garak | PyRIT | |---------|-------------|-------|-------| | Multi-turn attacks | Primary focus | Limited | Yes | | Naturalistic framing | Yes | No | Partial | | Breadth of probes | Focused | Extensive | Moderate | | Automated scoring | ASB Benchmark | Detector-based | Judge model | | Custom payloads | Yes | Plugin system | Orchestrator | | MCP-specific testing | Yes | No | No | | Compliance mapping | OWASP, EU AI Act | OWASP | OWASP |

Using Them Together

These tools are complementary. A thorough red team exercise might:

Run Garak for broad vulnerability scanning (find the obvious holes)
Run Chainbreaker for deep multi-turn evaluation (find the behavioral vulnerabilities)
Use PyRIT for scaled automated testing of specific scenarios identified in steps 1 and 2

No single tool covers the entire attack surface. Layer them for coverage.