AI Agent Forensics and Incident Response
When an AI agent incident occurs, you need answers: What did the agent do? What led to that behavior? Was it an attack, a bug, or a policy gap? How do you prevent it from happening again? Forensic analysis of agent behavior requires specific data and methods.
What You Need Before the Incident
Forensics requires data that was collected before the incident. You cannot retroactively generate audit trails. The minimum data requirements:
- Hash-chained receipts for every tool call and policy decision
- Full conversation history including system prompts, user messages, and agent responses
- Tool call parameters and responses: complete, not summarized
- Policy evaluation details: which rules were evaluated, what the inputs were, and what the decision was
- Behavioral monitoring data: statistical metrics at the time of the incident
- Identity context: which user, which agent instance, which session
If you do not have all of these, your forensic analysis will have gaps.
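To make the first requirement concrete, here is a minimal sketch of a hash-chained receipt log. The receipt layout (`body`, `prev_hash`, `hash`), the field names inside `body`, and the all-zero genesis value are assumptions chosen for illustration, not a prescribed schema.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # assumed stand-in "previous hash" for the first receipt


def receipt_hash(body: dict, prev_hash: str) -> str:
    # Canonical JSON (sorted keys, fixed separators) keeps the hash reproducible.
    payload = json.dumps({"body": body, "prev_hash": prev_hash},
                         sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_receipt(chain: list, body: dict) -> dict:
    """Link a new receipt to the end of the chain and return it."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    receipt = {"body": body, "prev_hash": prev_hash,
               "hash": receipt_hash(body, prev_hash)}
    chain.append(receipt)
    return receipt


# Hypothetical receipt for one tool call; every field name is illustrative.
chain = []
append_receipt(chain, {
    "session_id": "s-1", "user_id": "u-7", "agent_id": "agent-a",  # identity context
    "tool": "read_file",
    "params": {"path": "notes.txt"},   # full parameters, not a summary
    "response": "file contents here",  # full response, not a summary
    "policy": {"rules_evaluated": ["fs.read"], "decision": "allow"},
})
```

Because each receipt's hash covers the previous receipt's hash, changing any earlier record invalidates every link after it.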
Incident Response Steps
Step 1: Contain. Activate the kill switch for the affected agent or session. Prevent further damage while you investigate.
Step 2: Preserve evidence. Snapshot the receipt chain, conversation history, and monitoring data. Prevent any process from modifying or deleting this data.
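One way to take such a snapshot is to copy the evidence files into a fresh directory, record a SHA-256 digest for each copy, and mark the copies read-only. The sketch below assumes the evidence lives in ordinary files with distinct names; the function name and the `MANIFEST.json` file are hypothetical. Read-only permissions only guard against accidental writes, so a real deployment would likely also use write-once storage.

```python
import hashlib
import json
import os
import shutil
import stat
from pathlib import Path


def snapshot_evidence(sources: list, dest_dir: Path) -> dict:
    """Copy evidence files into dest_dir and return {filename: sha256 digest}."""
    dest_dir.mkdir(parents=True, exist_ok=False)  # refuse to reuse an earlier snapshot
    manifest = {}
    for src in sources:
        src = Path(src)
        dst = dest_dir / src.name  # assumes file names do not collide
        shutil.copy2(src, dst)     # copy2 also preserves timestamps
        manifest[src.name] = hashlib.sha256(dst.read_bytes()).hexdigest()
        os.chmod(dst, stat.S_IRUSR | stat.S_IRGRP)  # read-only copy
    manifest_path = dest_dir / "MANIFEST.json"
    manifest_path.write_text(json.dumps(manifest, indent=2, sort_keys=True))
    os.chmod(manifest_path, stat.S_IRUSR | stat.S_IRGRP)
    return manifest
```

Storing the manifest digests somewhere separate from the snapshot lets you later show that the preserved copies are unchanged.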
Step 3: Reconstruct the timeline. Walk the receipt chain from session start to the incident. For each receipt:
- What action did the agent take?
- What was the policy decision?
- What input triggered this action?
- Was there anything anomalous?
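The four questions above can be answered mechanically for each receipt. The following sketch assumes each receipt is a dict with hypothetical fields `seq`, `tool`, `params`, `decision`, and `trigger_source`. The two anomaly flags are illustrative heuristics only, not a complete detector.

```python
def build_timeline(receipts: list) -> list:
    """Return one summary row per receipt, ordered by sequence number."""
    rows = []
    for r in sorted(receipts, key=lambda rec: rec["seq"]):
        flags = []
        if r["decision"] != "allow":
            flags.append("not-allowed")       # the policy refused or escalated the action
        if r["trigger_source"] != "user":
            flags.append("non-user-trigger")  # e.g. retrieved content drove the action
        rows.append({
            "seq": r["seq"],
            "action": f'{r["tool"]}({r["params"]})',  # what the agent did
            "decision": r["decision"],                # what the policy decided
            "trigger": r["trigger_source"],           # what input led to the action
            "flags": flags,                           # anything worth a closer look
        })
    return rows
```

Rows with a non-empty `flags` list are natural starting points when you move on to root cause analysis.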
Step 4: Identify the root cause. Common root causes:
- Prompt injection: Adversarial input caused the agent to execute unauthorized actions. Look for injection patterns in user input or retrieved content.
- Policy gap: The action was allowed by policy but should not have been. The policy was too permissive.
- Model error: The model misinterpreted instructions or hallucinated a tool call. No adversarial input; the model simply made a mistake.
- Configuration error: Wrong system prompt, wrong tool access, wrong policy file deployed.
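Step 5: Verify chain integrity. Walk the receipt chain and recompute every hash. If any link fails to verify, the records were modified or corrupted after they were written, and the reconstructed timeline cannot be trusted from that point onward.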
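A verification pass can be sketched as follows. It assumes each receipt is a dict with `body`, `prev_hash`, and `hash` keys, and that the hash is SHA-256 over canonical JSON of the body and previous hash. This layout is illustrative, so adapt it to however your receipts are actually serialized.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # assumed "previous hash" of the first receipt


def receipt_hash(body: dict, prev_hash: str) -> str:
    # Must match the serialization used when the receipt was written.
    payload = json.dumps({"body": body, "prev_hash": prev_hash},
                         sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def first_broken_link(chain: list):
    """Return the index of the first receipt that fails verification, or None."""
    expected_prev = GENESIS_HASH
    for i, rec in enumerate(chain):
        if rec["prev_hash"] != expected_prev:
            return i  # link to the previous receipt does not match
        if rec["hash"] != receipt_hash(rec["body"], rec["prev_hash"]):
            return i  # contents changed after the hash was computed
        expected_prev = rec["hash"]
    return None
```

A clean result only shows the chain is internally consistent. Someone who can rewrite the entire log could recompute every hash, so it helps to compare the latest hash against a copy stored outside the log.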
Step 6: Determine scope. Did the incident affect only this session, or could other sessions or users be impacted? Check for lateral effects.
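A simple first pass at scoping is to search receipts from other sessions for an indicator taken from the incident, such as a distinctive injected phrase or a suspicious destination. The sketch below assumes receipts are nested dicts carrying a `session_id` field; both function names are hypothetical. A case-insensitive substring match is a rough filter and will miss paraphrased or encoded variants.

```python
def iter_strings(value):
    """Yield every string leaf inside nested dicts, lists, and tuples."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, dict):
        for v in value.values():
            yield from iter_strings(v)
    elif isinstance(value, (list, tuple)):
        for v in value:
            yield from iter_strings(v)


def sessions_with_indicator(receipts: list, indicator: str, incident_session: str) -> list:
    """Return sorted IDs of other sessions whose receipts contain the indicator."""
    needle = indicator.lower()
    hits = set()
    for r in receipts:
        if r["session_id"] == incident_session:
            continue  # already under investigation
        if any(needle in s.lower() for s in iter_strings(r)):
            hits.add(r["session_id"])
    return sorted(hits)
```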
Step 7: Remediate. Fix the root cause: update the policy, patch the scanner, restrict tool access, or retrain the model.
Step 8: Document. Write an incident report covering the timeline, root cause, impact, and remediation. This becomes input for your risk management process and future red team exercises.
Behavioral Forensics
Beyond the receipt chain, behavioral monitoring data provides statistical context. Was the agent's behavior anomalous before the incident? Did monitoring detect the anomaly? If so, was the alert investigated?
Compare the incident session's behavioral metrics to the baseline. Look for statistical deviations that preceded the incident. These deviations might indicate early stages of an attack that monitoring should have caught.
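One common way to make this comparison is a z-score per metric: how many baseline standard deviations the incident value sits from the baseline mean. The sketch below assumes the baseline is available as a list of historical samples per metric. The metric names in the usage example and the threshold of 3.0 are illustrative choices, not recommended values.

```python
import statistics


def zscores(baseline: dict, incident: dict) -> dict:
    """Map each metric to (incident value - baseline mean) / baseline stdev."""
    scores = {}
    for metric, samples in baseline.items():
        if metric not in incident or len(samples) < 2:
            continue  # stdev needs at least two baseline samples
        stdev = statistics.stdev(samples)
        if stdev == 0:
            continue  # a constant baseline gives no scale to compare against
        scores[metric] = (incident[metric] - statistics.fmean(samples)) / stdev
    return scores


def deviations(baseline: dict, incident: dict, threshold: float = 3.0) -> dict:
    """Return only the metrics whose absolute z-score reaches the threshold."""
    return {m: z for m, z in zscores(baseline, incident).items()
            if abs(z) >= threshold}
```

For example, with a baseline of `{"tool_calls_per_turn": [2, 3, 2, 3, 2, 3]}` and an incident value of 9, `deviations` flags that metric. Running the same check on the turns leading up to the incident shows whether the deviation was visible before the damage occurred.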
Lessons for Prevention
Every incident should produce at least one improvement:
- A new payload added to the test corpus
- A tightened policy rule
- An improved monitoring threshold
- A new scanner pattern
If an incident does not lead to a concrete improvement, the forensic analysis was incomplete.