Documentation

Policy Reference

Warden evaluates every tool call against compiled policy categories. Each category has different enforcement behavior.

Safety

Hard deny. Commands matching safety patterns are blocked immediately. Covers: rm -rf, sudo, reverse shells, credential theft, disk formatting.

Destructive

Hard deny. Commands that make irreversible changes: force push, database drops, system prune.

Hallucination

Hard deny. Commands using non-existent flags, tools, or APIs that the AI hallucinated.

Substitution

Transform + teach. Commands using slower tools are rewritten to faster alternatives (grep → rg, find → fd, cat → bat, curl → xh).

Advisory

Soft guidance. Non-blocking messages injected when the agent could benefit from a hint (e.g., “4 files edited since last build — consider running tests”).

Prompt Injection

Detection. Tool output is scanned for prompt injection attempts (instruction hijack, role manipulation, exfiltration).

Rule Precedence

Safety denials (highest priority)
Destructive denials
Hallucination denials
Substitution transforms
Advisory suggestions (lowest priority)

Custom rules in rules.toml can override or disable any compiled rule by ID.