15 Research Lab - Adversarial Safety Evaluation of Frontier AI Systems

John Kearney

AI Safety Scanner Comparison: Feature Matrix

March 14, 2026

Tags: tools, defense, guardrails

Multiple tools now offer content safety scanning for AI systems. Choosing between them requires understanding what each actually detects, how it deploys, and what it costs.

Feature Matrix

| Feature | Aegis | Lakera Guard | ProtectAI Rebuff | Custom (DeBERTa) |
|---------|-------|--------------|------------------|------------------|
| Direct injection detection | Yes | Yes | Yes | Yes |
| Indirect injection detection | Yes | Yes | Limited | Depends on training |
| Encoding attack detection | Yes (pre-decode) | Partial | No | Depends on training |
| Tool description scanning | Yes | No | No | No |
| Response injection scanning | Yes | No | No | No |
| MCP-specific scanning | Yes | No | No | No |
| Deployment | In-process / API | Hosted API | Self-hosted | Self-hosted |
| Dependencies | Zero | N/A (SaaS) | Python packages | ML framework |
| Latency (p95) | <5ms | 50-200ms | 20-100ms | 10-50ms |
| Pricing | Free (MIT) | Per-request | Free (Apache 2) | Free |
| SARIF output | Yes | No | No | No |

Detection Approach Breakdown

Aegis: Multi-layer pipeline. Pattern matching (sub-millisecond) followed by statistical analysis (entropy, encoding detection, structural anomalies). Zero ML models in the core scanner, which means no GPU requirement and deterministic results.
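To make the two-stage idea concrete, here is a minimal Python sketch of pattern matching followed by an entropy check. It is an illustration of the general technique, not Aegis's actual code or API. The patterns, the `scan` function, and both thresholds are assumptions chosen for the example.

```python
import math
import re
from collections import Counter

# Hypothetical patterns for illustration; a real rule set would be far larger.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all |any )?(previous|prior|above) instructions", re.I),
    re.compile(r"disregard (the |your )?system prompt", re.I),
    re.compile(r"you are now (in )?developer mode", re.I),
]

ENTROPY_THRESHOLD = 4.5  # bits per character; assumed cutoff, tune on real data
MIN_TOKEN_LENGTH = 40    # short tokens give unreliable entropy estimates


def shannon_entropy(text: str) -> float:
    """Shannon entropy of the character distribution, in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def scan(text: str) -> list[str]:
    """Return finding labels; an empty list means nothing was flagged."""
    findings = []
    # Stage 1: pattern matching against known injection phrasings.
    for pattern in INJECTION_PATTERNS:
        if pattern.search(text):
            findings.append(f"pattern:{pattern.pattern}")
    # Stage 2: statistical check. Long, high-entropy tokens often indicate
    # encoded payloads (for example base64) rather than natural language.
    for token in text.split():
        if len(token) >= MIN_TOKEN_LENGTH and shannon_entropy(token) > ENTROPY_THRESHOLD:
            findings.append("high_entropy_token")
            break
    return findings
```

Because both stages are pure functions of the input, the same text always yields the same findings, which is the determinism property described above.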

Lakera Guard: ML-based classification via hosted API. Send text, get a risk score for injection, toxicity, and PII. Black-box classification with continuously updated models.

ProtectAI Rebuff: Self-hosted with a combination of ML classifier and heuristic checks. Includes a prompt injection detection model and canary word injection.
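The canary word technique can be sketched in a few lines of Python. This is a generic illustration of the idea, not Rebuff's API: the function names `add_canary` and `canary_leaked` and the comment-style wrapper are assumptions made for the example.

```python
import secrets


def add_canary(prompt_template: str) -> tuple[str, str]:
    """Prepend a random canary word to the prompt and return (guarded_prompt, canary)."""
    canary = secrets.token_hex(8)  # 16 hex characters, unlikely to occur by chance
    guarded = f"<!-- {canary} -->\n{prompt_template}"
    return guarded, canary


def canary_leaked(model_output: str, canary: str) -> bool:
    """True if the canary appears in the output, i.e. the prompt text was leaked."""
    return canary in model_output
```

If the model's output ever contains the canary, the hidden prompt has been echoed back, which suggests an injection succeeded in overriding the instructions.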

Custom DeBERTa: Fine-tune a DeBERTa model on injection/benign pairs. Full control over training data and thresholds. Requires ML infrastructure for training and serving.

Choosing Based on Requirements

If you need MCP safety: Aegis is currently the only scanner in this comparison that specifically handles tool description and response scanning. The others do not cover these MCP-specific surfaces.
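As a rough sketch of what tool description scanning involves, the Python below walks a list of MCP tool definitions and checks the `description` fields for instruction-like text. The `name`, `description`, and `inputSchema` fields follow the MCP tool definition shape; the patterns and the `scan_tool_definitions` function are hypothetical and do not represent Aegis's implementation.

```python
import re

# Hypothetical patterns: instruction-like text has no place in a tool description.
SUSPICIOUS = [
    re.compile(r"ignore (all |any )?(previous|prior) instructions", re.I),
    re.compile(r"do not (tell|inform|mention).{0,40}user", re.I),
    re.compile(r"<\s*(important|system|secret)\s*>", re.I),
]


def scan_tool_definitions(tools: list[dict]) -> dict[str, list[str]]:
    """Map tool name -> matched patterns, for tools whose descriptions look suspicious."""
    flagged = {}
    for tool in tools:
        # Check the tool description and every parameter description.
        texts = [tool.get("description", "")]
        props = tool.get("inputSchema", {}).get("properties", {})
        texts += [p.get("description", "") for p in props.values() if isinstance(p, dict)]
        hits = [pat.pattern for pat in SUSPICIOUS for t in texts if pat.search(t)]
        if hits:
            flagged[tool.get("name", "<unnamed>")] = hits
    return flagged
```

The same function could be pointed at tool responses by passing the response text in place of the descriptions, though real response scanning would likely need different rules.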

If you need minimum latency: Aegis runs in-process with p95 latency under 5ms. A hosted API adds a network round trip, 50-200ms per call in the matrix above.
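Published latency figures depend on hardware and input size, so it is worth measuring on your own traffic. The following is a minimal sketch of a p95 measurement harness in Python; `p95_latency_ms` is a hypothetical helper, and `scan` stands in for whichever scanner call you are evaluating.

```python
import statistics
import time
from typing import Callable


def p95_latency_ms(scan: Callable[[str], object], inputs: list[str], repeats: int = 20) -> float:
    """Time scan() on every input, `repeats` times over, and return the p95 in milliseconds."""
    samples = []
    for _ in range(repeats):
        for text in inputs:
            start = time.perf_counter()
            scan(text)
            samples.append((time.perf_counter() - start) * 1000.0)
    # quantiles(n=100) returns 99 cut points; index 94 is the 95th percentile.
    return statistics.quantiles(samples, n=100)[94]
```

For a hosted API, wrap the full HTTP request in `scan` so that the measurement includes the network round trip.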

If you need minimum effort: Lakera Guard is a single API call. No infrastructure, no configuration, no maintenance. The tradeoff is per-request cost, network latency, and sending your input text to a third party.

If you need maximum control: Train a custom model. You control the training data, the model architecture, and the decision threshold. The tradeoff is significant engineering investment.
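Controlling the decision threshold usually means picking it from held-out validation data rather than accepting a default. Below is a minimal sketch in plain Python, assuming the classifier emits a score where higher means more likely injection and labels are 1 for injection, 0 for benign. The function name `pick_threshold` and the false-positive-rate target are assumptions for illustration.

```python
import math


def pick_threshold(scores: list[float], labels: list[int], max_fpr: float = 0.01) -> float:
    """Lowest threshold whose false positive rate on benign samples is <= max_fpr.

    An input is flagged when score >= threshold, so the lowest qualifying
    threshold catches the most injections for the allowed false positive rate.
    """
    benign = [s for s, y in zip(scores, labels) if y == 0]
    if not benign:
        raise ValueError("need at least one benign sample to estimate FPR")
    for threshold in sorted(set(scores)):
        false_positives = sum(1 for s in benign if s >= threshold)
        if false_positives / len(benign) <= max_fpr:
            return threshold
    # No observed score satisfies the target; flag nothing.
    return math.inf
```

A stricter `max_fpr` raises the threshold and lowers recall, which is the tradeoff a hosted black-box service decides on your behalf.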

If you need compliance auditability: Aegis produces deterministic results with transparent logic. You can explain to an auditor exactly why a specific input was flagged. Black-box ML classifiers are harder to audit.
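The feature matrix lists SARIF output for Aegis. As an illustration of why that helps with audits, here is a sketch that packs rule-based findings into a SARIF 2.1.0 log in Python. The structure follows the SARIF 2.1.0 schema, but the `to_sarif` helper, the finding dictionary keys, and the example rule IDs are hypothetical and are not taken from Aegis's actual output.

```python
import json


def to_sarif(tool_name: str, findings: list[dict]) -> dict:
    """Build a SARIF 2.1.0 log from findings.

    Each finding is assumed to carry: rule_id, message, source, offset, length.
    """
    rule_ids = sorted({f["rule_id"] for f in findings})
    return {
        "$schema": "https://json.schemastore.org/sarif-2.1.0.json",
        "version": "2.1.0",
        "runs": [{
            "tool": {"driver": {"name": tool_name, "rules": [{"id": r} for r in rule_ids]}},
            "results": [
                {
                    "ruleId": f["rule_id"],
                    "level": "warning",
                    "message": {"text": f["message"]},
                    "locations": [{
                        "physicalLocation": {
                            "artifactLocation": {"uri": f["source"]},
                            # Character span of the matched text within the input.
                            "region": {"charOffset": f["offset"], "charLength": f["length"]},
                        }
                    }],
                }
                for f in findings
            ],
        }],
    }


def to_sarif_json(tool_name: str, findings: list[dict]) -> str:
    """Serialize the SARIF log for storage or upload to a code-scanning dashboard."""
    return json.dumps(to_sarif(tool_name, findings), indent=2)
```

Each result names the rule that fired and the exact character span that matched, which is the kind of evidence an auditor can verify by hand.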

Layering Scanners

Scanners are not mutually exclusive. A practical production setup:

1. Aegis for fast pattern matching and MCP-specific scanning (in-process, sub-5ms)
2. ML classifier for novel injection detection (secondary check on flagged or borderline inputs)
3. Policy engine for action authorization (independent of scanning)

The fast layer handles the majority of traffic. The ML layer adds depth for uncertain cases. The policy engine provides the backstop regardless of scanning outcomes.
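The layering described above can be sketched as a single dispatch function in Python. All names here are hypothetical: `fast_scan` is assumed to return `"clean"`, `"borderline"`, or `"flagged"`, `ml_score` a probability-like score, and `policy_allows` a yes/no authorization decision. The routing logic is one plausible reading of the setup, not a reference implementation.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Decision:
    allowed: bool
    reason: str


def layered_check(
    text: str,
    action: str,
    fast_scan: Callable[[str], str],
    ml_score: Callable[[str], float],
    policy_allows: Callable[[str], bool],
    ml_threshold: float = 0.5,
) -> Decision:
    """Run the fast layer first, the ML layer only when needed, and the policy engine always."""
    verdict = fast_scan(text)
    if verdict == "clean":
        # Majority path: the ML classifier is never invoked.
        scan_ok, reason = True, "fast layer: clean"
    else:
        # Flagged or borderline inputs get the slower secondary check.
        score = ml_score(text)
        scan_ok = score < ml_threshold
        reason = f"ml layer: score={score:.2f} after fast verdict '{verdict}'"
    # Backstop: authorization is evaluated regardless of the scanning outcome.
    if not policy_allows(action):
        return Decision(False, "policy engine: action not authorized")
    return Decision(scan_ok, reason)
```

Keeping the policy check independent means a scanner miss cannot by itself authorize a dangerous action.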