Fail-Closed vs Fail-Open AI Safety
When a safety system encounters an error, an ambiguous input, or a state it was not designed for, it must choose: allow the action (fail open) or block it (fail closed). This design decision has an outsized impact on real-world security.
Fail-Open: What Happens
A fail-open system allows actions when it cannot make a definitive decision:
- Policy engine crashes: all tool calls execute without authorization
- Scanner timeout: input passes through unscanned
- Database unreachable: permissions cannot be checked, so access is granted
- Ambiguous input: the benefit of the doubt goes to the user
Why some systems fail open: Availability. A fail-open system never blocks legitimate use because its own safety checks failed. Users never experience "the system is being too cautious." This is appropriate for some domains (content recommendation, search suggestions) where a false block is worse than a false pass.
Why it is dangerous for agents: An attacker who can trigger a scanner crash, a policy engine timeout, or an ambiguous state can bypass all safety controls. Denial-of-service attacks against the safety infrastructure become privilege escalation attacks against the agent.
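To make the bypass concrete, here is a minimal Python sketch of a fail-open guard. `fail_open_guard` and `fragile_scanner` are hypothetical names; the scanner stands in for any safety check that an attacker can make crash, here simply by sending oversized input.

```python
from typing import Callable


def fail_open_guard(check: Callable[[str], bool], request: str) -> bool:
    """Anti-pattern: any error inside the safety check is treated as 'allow'."""
    try:
        return check(request) is True
    except Exception:
        # The check could not decide, so the request is let through.
        return True


def fragile_scanner(request: str) -> bool:
    """Hypothetical scanner: flags 'rm -rf', but raises on oversized input."""
    if len(request) > 10_000:
        raise RuntimeError("input too large to scan")
    return "rm -rf" not in request


# Normal case: the scanner catches the dangerous command.
print(fail_open_guard(fragile_scanner, "rm -rf /"))  # prints False
# Attack: pad the same command until the scanner crashes, and it is allowed.
print(fail_open_guard(fragile_scanner, "rm -rf /" + "x" * 10_000))  # prints True
```

Nothing about the dangerous command changed between the two calls; the attacker only had to make the check fail.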
Fail-Closed: What Happens
A fail-closed system denies actions when it cannot make a definitive decision:
- Policy engine crashes: all tool calls are denied until the engine recovers
- Scanner timeout: input is blocked or held pending scan completion
- Database unreachable: access is denied until permissions can be verified
- Ambiguous input: the action is denied or escalated for review
Why it is the right default for agents: The cost of a false deny is inconvenience (the agent temporarily cannot do something). The cost of a false allow is a security incident (the agent does something it should not). These risks are not symmetric.
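The same guard written fail-closed, as a minimal Python sketch. `Decision`, `fail_closed_guard`, and `fragile_scanner` are illustrative names, not an existing API. For brevity this version folds "escalate for review" into deny; a real system might add a third outcome.

```python
from enum import Enum
from typing import Callable


class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"


def fail_closed_guard(check: Callable[[str], bool], request: str) -> Decision:
    """Allow only when the check completes and returns exactly True."""
    try:
        verdict = check(request)
    except Exception:
        # Crash, timeout, unreachable dependency: no definitive answer, so deny.
        return Decision.DENY
    # False, None, or any malformed value is treated as ambiguous and denied.
    return Decision.ALLOW if verdict is True else Decision.DENY


def fragile_scanner(request: str) -> bool:
    """Hypothetical scanner: flags 'rm -rf', but raises on oversized input."""
    if len(request) > 10_000:
        raise RuntimeError("input too large to scan")
    return "rm -rf" not in request
```

Here the padded-input trick that crashes the scanner produces `Decision.DENY`, so attacking the safety check no longer buys the attacker anything.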
Implementing Fail-Closed
Policy engine: Default action is deny. No policy match means deny. Engine crash means deny. This must be enforced at the code level, not just the configuration level: a missing or malformed config file must not be able to turn the default into allow. The denial path must be the simplest, most reliable code path.
default_action: deny
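A minimal Python sketch of an evaluator where deny is hard-coded rather than read from configuration. The `Rule` shape, the first-match semantics, and the name `evaluate` are assumptions for illustration, not any particular product's API.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

DENY = "deny"
ALLOW = "allow"


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule: a predicate over (tool, args) plus the action to take."""
    matches: Callable[[str, dict], bool]
    action: str  # expected to be "allow" or "deny"


def evaluate(rules: Sequence[Rule], tool: str, args: dict) -> str:
    """First matching rule wins; every other outcome returns DENY."""
    try:
        for rule in rules:
            if rule.matches(tool, args):
                # Only an exact "allow" is honored; typos and unknown actions deny.
                return ALLOW if rule.action == ALLOW else DENY
    except Exception:
        # A crashing predicate or malformed rule list is an engine error: deny.
        return DENY
    # No rule matched. The default lives in code, so no config file can flip it.
    return DENY
```

Every path that is not an exact, matched allow ends in the same constant `return DENY`, so there is nothing on the deny path that can itself fail.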
Content scanner: If the scanner cannot process input within its timeout, the input is blocked. Do not pass unscanned input to the model.
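A minimal Python sketch of a timeout-bounded scan using the standard library's `concurrent.futures`. `scan_or_block`, `slow_scanner`, and the 2-second default timeout are assumptions for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# Long-lived pool: leaving a `with` block would wait for a hung scan to finish,
# which would defeat the timeout.
_SCAN_POOL = ThreadPoolExecutor(max_workers=4)


def scan_or_block(scanner: Callable[[str], bool], text: str, timeout_s: float = 2.0) -> bool:
    """Return True (forward to the model) only if the scan finishes in time and is clean."""
    future = _SCAN_POOL.submit(scanner, text)
    try:
        is_clean = future.result(timeout=timeout_s)
    except Exception:
        # Timeout or scanner error: block. Python cannot kill the worker thread,
        # so a hung scan keeps running in the background, but its result is never used.
        return False
    return is_clean is True


def slow_scanner(text: str) -> bool:
    """Hypothetical scanner that takes longer than the caller's timeout."""
    time.sleep(0.5)
    return True
```

The caller only ever sees `True` for input that was actually scanned and judged clean; a timeout and a scanner exception are indistinguishable from a positive detection.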
Authorization checks: If the identity provider is unreachable, deny every action that requires authentication. Do not fall back to anonymous access.
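A minimal Python sketch of an authorization check that never downgrades to anonymous. `authorize`, `resolve_scopes`, `unreachable_idp`, and the scope strings are hypothetical; the resolver stands in for whatever call your system makes to its identity provider.

```python
from typing import Callable, Optional, Set


def authorize(token: str, required_scope: str,
              resolve_scopes: Callable[[str], Optional[Set[str]]]) -> bool:
    """Allow only when the identity provider confirms the token carries the scope."""
    try:
        scopes = resolve_scopes(token)
    except Exception:
        # IdP unreachable or erroring: deny. Never retry as an anonymous caller.
        return False
    if scopes is None:
        # Unknown or expired token: deny rather than downgrading to anonymous.
        return False
    return required_scope in scopes


def unreachable_idp(token: str) -> Optional[Set[str]]:
    """Hypothetical resolver simulating an identity provider outage."""
    raise ConnectionError("identity provider unreachable")
```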
Monitoring: If the monitoring system is down, make the policy engine more restrictive. Missing monitoring data is a risk signal, not an excuse for less caution.
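One way to "increase restrictiveness" is to shrink the agent's tool set while monitoring is unhealthy. This Python sketch assumes a hypothetical `LOW_RISK_TOOLS` set and health-check callable; which tools count as low risk is a policy decision the sketch does not make for you.

```python
from typing import Callable, FrozenSet

# Hypothetical set of tools considered safe to run without monitoring.
LOW_RISK_TOOLS: FrozenSet[str] = frozenset({"search", "read_file"})


def permitted_tools(granted: FrozenSet[str],
                    monitoring_healthy: Callable[[], bool]) -> FrozenSet[str]:
    """Narrow the agent's granted tools to low-risk ones while monitoring is down."""
    try:
        healthy = monitoring_healthy() is True
    except Exception:
        healthy = False  # A failing health check counts as "monitoring is down".
    if healthy:
        return granted
    # Intersect rather than replace, so this can only ever remove permissions.
    return granted & LOW_RISK_TOOLS


def monitoring_unreachable() -> bool:
    """Hypothetical health check simulating a monitoring outage."""
    raise ConnectionError("metrics backend unreachable")
```

Using an intersection means a monitoring outage can never grant a tool the agent did not already have.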
The Availability Concern
The main objection to fail-closed is availability. "If the safety system is down, the entire agent is down." Yes. That is the point.
An agent operating without safety controls is more dangerous than an agent that is temporarily unavailable. Design for high availability of your safety infrastructure (redundancy, failover, health checks), but when it is down, stop the agent rather than letting it run uncontrolled.
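Redundancy and the stop-on-failure rule fit together as in this minimal Python sketch: try each replica, and if none can answer, halt. `decide_with_failover`, `SafetyInfrastructureDown`, and `replica_down` are hypothetical names; real failover would also involve per-replica health checks and timeouts.

```python
from typing import Callable, Sequence


class SafetyInfrastructureDown(Exception):
    """Raised when no policy engine replica could return a decision."""


def decide_with_failover(replicas: Sequence[Callable[[str], str]], request: str) -> str:
    """Ask each replica in turn; the first well-formed decision wins."""
    for replica in replicas:
        try:
            decision = replica(request)
        except Exception:
            continue  # This replica is down: fail over to the next one.
        if decision in ("allow", "deny"):
            return decision
        # A malformed answer is treated like an outage of that replica.
    # Every replica failed: halt the agent rather than run it uncontrolled.
    raise SafetyInfrastructureDown("no policy engine replica available")


def replica_down(request: str) -> str:
    """Hypothetical replica simulating a crashed or unreachable engine."""
    raise ConnectionError("replica unreachable")
```

The caller should treat `SafetyInfrastructureDown` as a stop signal for the agent loop, not as a cue to retry the action without a policy check.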
Practical Tip
Test your failure modes. Kill the policy engine process. Disconnect the database. Crash the scanner. Does the agent continue operating (fail open) or stop (fail closed)? If you have not tested this, you do not know which behavior your system exhibits.
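Alongside real chaos tests (actually killing the process or cutting the network in staging), in-process fault injection can pin the expected behavior in your test suite. This Python sketch is written in pytest style (plain `test_` functions with `assert`, which pytest discovers automatically); `guarded_tool_call` and the fault injectors are hypothetical stand-ins, not a real system under test.

```python
from typing import Callable


def guarded_tool_call(policy_check: Callable[[str], bool], tool: str) -> str:
    """Hypothetical system under test: runs the tool only on an explicit allow."""
    try:
        allowed = policy_check(tool) is True
    except Exception:
        allowed = False
    return f"ran {tool}" if allowed else "denied"


# Hypothetical fault injectors standing in for each outage.
def engine_killed(tool: str) -> bool:
    raise ConnectionRefusedError("policy engine process is not running")


def database_disconnected(tool: str) -> bool:
    raise TimeoutError("permissions database unreachable")


def scanner_crashed(tool: str) -> bool:
    raise RuntimeError("scanner crashed mid-scan")


def test_every_injected_failure_denies():
    for fault in (engine_killed, database_disconnected, scanner_crashed):
        assert guarded_tool_call(fault, "delete_file") == "denied", fault.__name__


def test_healthy_engine_still_allows():
    # Guards against a trivially "passing" system that denies everything.
    assert guarded_tool_call(lambda tool: True, "read_file") == "ran read_file"
```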
Authensor's policy engine fails closed by design. No policy match returns deny. Engine errors return deny. The deny path is the shortest code path in the evaluation logic.