← Blog

AI Agent Incident Response Playbook

15 Research Lab
agent-safetycompliancemethodology

When an AI agent does something wrong in production, you need a plan. Not a meeting to discuss what to do. A plan that your team can execute immediately.

Phase 1: Detection

Incidents are detected through:

  • Monitoring alerts: Behavioral anomaly detection (Sentinel) flags unusual patterns
  • Policy engine logs: Spike in denied actions, unusual tool call patterns
  • User reports: A user notices the agent behaving incorrectly
  • Audit trail review: Periodic review discovers past anomalous behavior
  • External notification: A security researcher or partner reports a vulnerability

Classify the incident by severity:

  • P1 (Critical): Active data exfiltration, unauthorized financial transactions, ongoing compromise
  • P2 (High): Unauthorized tool access detected, policy bypass discovered, potential data exposure
  • P3 (Medium): Anomalous behavior without confirmed impact, failed attack attempts
  • P4 (Low): Minor policy violations, single erroneous tool calls, no sensitive data involved

Phase 2: Containment

For P1/P2 incidents:

  1. Activate the kill switch for the affected agent/session
  2. Revoke the agent's authentication tokens
  3. If data exfiltration is suspected, block the destination endpoints at the network level
  4. Preserve all logs and receipts (ensure no automated cleanup runs)
  5. Notify the incident response team

For P3/P4 incidents:

  1. Increase monitoring sensitivity for the affected agent
  2. Lower policy thresholds (more actions require approval)
  3. Flag the session for review
  4. Continue monitoring

Phase 3: Investigation

Walk the receipt chain from session start to incident:

  • What was the conversation flow?
  • When did behavior change?
  • Was there adversarial input? (check for injection patterns)
  • Which tool calls were made? Were any denied before the incident?
  • What data did the agent have access to?
  • Did monitoring fire alerts before the incident?

Determine root cause:

  • Prompt injection: Identify the injection vector and payload
  • Policy gap: Identify the rule that should have blocked the action
  • Model error: Determine if the model hallucinated or misinterpreted instructions
  • Configuration error: Check for misdeployed policies or system prompts

Phase 4: Remediation

Based on root cause:

  • Prompt injection: Add the payload to your scanner patterns. Test for variants. Tighten policy rules around the exploited tool.
  • Policy gap: Write the missing rule. Test it against the incident scenario. Deploy.
  • Model error: Review the system prompt for ambiguity. Add explicit constraints. Consider model change if errors are frequent.
  • Configuration error: Fix the configuration. Add validation to your deployment pipeline.

Phase 5: Recovery

  • Verify the fix by replaying the incident scenario against the updated system
  • Restore the agent with updated policies and scanning rules
  • Monitor closely for the first 24 hours after restoration
  • Verify that the receipt chain integrity is intact

Phase 6: Post-Incident Review

Within 48 hours, conduct a review covering:

  • What happened and when
  • How was it detected and how long did detection take
  • What was the impact
  • What was the root cause
  • What remediation was applied
  • What systemic improvements are needed (process, tooling, training)

Document the review. Add the incident to your red team test corpus. Update your risk register.

Keep this playbook accessible to your operations team. Review and update it quarterly.