15 Research Lab -Adversarial Safety Evaluation of Frontier AI Systems

John Kearney

Open Source AI Safety Tools: The Ecosystem Overview

November 15, 202515 Research Lab

toolsopen-sourceagent-safety

The open-source AI safety ecosystem has grown from a handful of research tools to a set of production-grade systems. Here is what is available and what each tool does.

Policy and Authorization

Authensor is the safety stack for AI agents. It provides a synchronous policy engine (zero dependencies, fail-closed by default), an MCP server with policy enforcement, framework adapters (LangChain, OpenAI, CrewAI), and hash-chained audit receipts. MIT license. TypeScript and Python SDKs.

What it does: Evaluates every agent tool call against YAML-defined policies. Denies unauthorized actions. Generates tamper-evident audit trails. Supports approval workflows for high-risk actions.

Content Safety Scanning

Aegis is a zero-dependency content safety scanner. It detects prompt injection through pattern matching and statistical analysis. Scans user input, tool descriptions, and tool responses.

What it does: Fast (sub-5ms) detection of injection payloads, encoding attacks, and anomalous content. No ML dependencies, no GPU requirements, deterministic results.

Behavioral Monitoring

Sentinel is a zero-dependency behavioral monitoring engine. It implements EWMA and CUSUM algorithms for detecting statistical anomalies in agent behavior.

What it does: Tracks per-agent behavioral baselines and alerts when behavior deviates. Detects gradual drift, sudden changes, and cumulative anomalies.

Red Teaming

Chainbreaker is a multi-turn adversarial evaluation tool. It uses naturalistic conversation sequences to test model resilience against gradual escalation.

Garak (NVIDIA) is a probe-based vulnerability scanner. It runs a broad set of known attack patterns against models and reports results.

PyRIT (Microsoft) is a multi-turn red teaming orchestrator with automated scoring.

Payloads and Wordlists

AI SecLists is a curated collection of adversarial payloads for AI security testing. Organized by technique: injection, encoding, multi-language, tool-specific.

Attack Surface Mapping

Attack Surface Mapper enumerates MCP server configurations, tool inventories, and security gaps. Outputs SARIF for GitHub Security tab integration.

Benchmarking

ASB Benchmark (Agent Safety Benchmark) evaluates model safety using naturalistic multi-turn sequences. Measures compliance erosion, trajectory blindness, and presentation-decision coupling.

How They Fit Together

A complete safety workflow:

AI SecLists provides payloads for testing
Chainbreaker / Garak / PyRIT run those payloads against your system
ASB Benchmark measures your model's baseline safety
Attack Surface Mapper identifies gaps in your MCP configuration
Aegis scans content in production
Authensor enforces policies in production
Sentinel monitors behavior in production
Audit receipts provide forensic data for incidents

Each tool addresses a different phase of the safety lifecycle: assess, test, enforce, monitor, investigate. Using them together provides coverage across the entire lifecycle.