Policy Reference
Warden evaluates every tool call against compiled policy categories. Each category has different enforcement behavior.
Safety
Hard deny. Commands matching safety patterns are blocked immediately. Covers: rm -rf, sudo, reverse shells, credential theft, disk formatting.
Destructive
Hard deny. Commands that make irreversible changes: force push, database drops, system prune.
Hallucination
Hard deny. Commands using non-existent flags, tools, or APIs that the AI hallucinated.
Substitution
Transform + teach. Commands using slower tools are rewritten to faster alternatives (grep → rg, find → fd, cat → bat, curl → xh).
Advisory
Soft guidance. Non-blocking messages injected when the agent could benefit from a hint (e.g., “4 files edited since last build — consider running tests”).
Prompt Injection
Detection. Tool output is scanned for prompt injection attempts (instruction hijack, role manipulation, exfiltration).
Rule Precedence
- Safety denials (highest priority)
- Destructive denials
- Hallucination denials
- Substitution transforms
- Advisory suggestions (lowest priority)
Custom rules in rules.toml can override or disable any compiled rule by ID.