15 Research Lab -Adversarial Safety Evaluation of Frontier AI Systems

John Kearney

AI Safety for Startups: The Minimum Viable Safety Stack

January 4, 202615 Research Lab

agent-safetyguardrailstools

You are building a startup. You are deploying AI agents. You do not have a dedicated security team. You still need safety controls. Here is what to build first, in priority order.

Priority 1: Policy Engine (Day 1)

Before your agent makes its first tool call in production, add a policy engine that evaluates every call against rules you define.

Start with a simple deny-by-default configuration:

default_action: deny

policies:
  - name: "allowed-tools"
    tools: ["search", "read_file", "query_database"]
    action: allow
    constraints:
      query_database:
        operation: "SELECT"

This takes less than an hour to set up with Authensor's SDK. The result: your agent can only call the tools you explicitly allow, with parameters you explicitly permit. Everything else is blocked.

This single control prevents the majority of agent security incidents. Prompt injection cannot cause unauthorized tool calls. Model errors cannot trigger destructive operations. Over-provisioned tool access cannot be exploited.

Priority 2: Audit Logging (Week 1)

Log every tool call with enough detail to reconstruct what happened. At minimum:

Timestamp
Agent and user identity
Tool name and parameters
Policy evaluation result
Tool response

You do not need hash-chained receipts on day one (though Authensor provides them by default). You need logs that let you answer "what did the agent do?" when something goes wrong.

Store logs separately from your application database. Append-only storage prevents accidental or intentional modification.

Priority 3: Input Scanning (Week 2)

Add content safety scanning on user input. Aegis provides zero-dependency scanning that runs in-process:

Pattern matching for known injection payloads
Statistical analysis for encoding attacks
Sub-5ms latency

This catches the majority of direct injection attempts. It does not catch everything (no scanner does), but combined with the policy engine, most successful injections are rendered harmless because the unauthorized actions they trigger are blocked by policy.

Priority 4: Rate Limits (Week 3)

Set rate limits on tool calls per session and per user. Simple but effective:

rate_limits:
  - tool: "*"
    max_calls: 200
    window: "1h"

This prevents runaway agents from burning through API budgets or flooding your database with queries.

Priority 5: Monitoring (Month 2)

Add behavioral monitoring once you have baseline data. Track tool-call frequency, diversity, and error rates. Alert on deviations.

You do not need a dedicated monitoring team. Set up basic alerting (Slack notifications for anomalies) and review weekly.

What Can Wait

Approval workflows (add when you have high-risk actions)
Multi-agent security (add when you have multiple agents)
Compliance documentation (add when you are approaching EU AI Act deadlines or SOC 2 audits)
Custom ML classifiers for injection detection (add when pattern matching is not enough)

Cost

Authensor is MIT-licensed and free. Aegis and Sentinel are zero-dependency and free. The engineering cost is hours, not weeks. There is no reason to deploy an AI agent in production without at least a policy engine and basic logging.