Open Source AI Safety Tools: The Ecosystem Overview
The open-source AI safety ecosystem has grown from a handful of research tools to a set of production-grade systems. Here is what is available and what each tool does.
Policy and Authorization
Authensor is the safety stack for AI agents. It provides a synchronous policy engine (zero dependencies, fail-closed by default), an MCP server with policy enforcement, framework adapters (LangChain, OpenAI, CrewAI), and hash-chained audit receipts. MIT license. TypeScript and Python SDKs.
What it does: Evaluates every agent tool call against YAML-defined policies. Denies unauthorized actions. Generates tamper-evident audit trails. Supports approval workflows for high-risk actions.
Content Safety Scanning
Aegis is a zero-dependency content safety scanner. It detects prompt injection through pattern matching and statistical analysis. Scans user input, tool descriptions, and tool responses.
What it does: Fast (sub-5ms) detection of injection payloads, encoding attacks, and anomalous content. No ML dependencies, no GPU requirements, deterministic results.
Behavioral Monitoring
Sentinel is a zero-dependency behavioral monitoring engine. It implements EWMA and CUSUM algorithms for detecting statistical anomalies in agent behavior.
What it does: Tracks per-agent behavioral baselines and alerts when behavior deviates. Detects gradual drift, sudden changes, and cumulative anomalies.
Red Teaming
Chainbreaker is a multi-turn adversarial evaluation tool. It uses naturalistic conversation sequences to test model resilience against gradual escalation.
Garak (NVIDIA) is a probe-based vulnerability scanner. It runs a broad set of known attack patterns against models and reports results.
PyRIT (Microsoft) is a multi-turn red teaming orchestrator with automated scoring.
Payloads and Wordlists
AI SecLists is a curated collection of adversarial payloads for AI security testing. Organized by technique: injection, encoding, multi-language, tool-specific.
Attack Surface Mapping
Attack Surface Mapper enumerates MCP server configurations, tool inventories, and security gaps. Outputs SARIF for GitHub Security tab integration.
Benchmarking
ASB Benchmark (Agent Safety Benchmark) evaluates model safety using naturalistic multi-turn sequences. Measures compliance erosion, trajectory blindness, and presentation-decision coupling.
How They Fit Together
A complete safety workflow:
- AI SecLists provides payloads for testing
- Chainbreaker / Garak / PyRIT run those payloads against your system
- ASB Benchmark measures your model's baseline safety
- Attack Surface Mapper identifies gaps in your MCP configuration
- Aegis scans content in production
- Authensor enforces policies in production
- Sentinel monitors behavior in production
- Audit receipts provide forensic data for incidents
Each tool addresses a different phase of the safety lifecycle: assess, test, enforce, monitor, investigate. Using them together provides coverage across the entire lifecycle.