Warden by Bitmill
Documentation

Output Compression

AI agents dump raw command output into the context window — build logs, test suites, stack traces, progress bars. This wastes 60-95% of the context budget on noise.

Warden’s truncation filter runs on every tool output before it reaches the model:

  • Preserves: errors, warnings, the final result, working-set file paths
  • Removes: progress bars, passing tests, redundant log lines, npm install output
  • Compresses: stack traces to relevant frames, build logs to error summary

A 40,000-token build log becomes a 2,000-token error summary. The agent gets exactly what it needs for the next decision.

The Problem in Detail

Consider what happens without compression. The agent runs cargo build on a Rust project with a type error. The raw output might be 800 lines: compiler metadata, dependency compilation progress, warnings in unrelated files, and then the one error the agent needs. All 800 lines go into the context window. Multiply this by 20-30 build attempts in a debugging session, and you’ve consumed half your context budget on build noise.

The same applies to:

  • npm install — 200+ lines of dependency resolution, tree printout, audit warnings. The agent needs to know if it succeeded and whether there were errors. That’s 2-3 lines.
  • Test suites — A 500-test suite produces 500 lines of passing test names. The agent only needs the 3 that failed.
  • git log — Fifty commits of history when the agent asked for the last change.
  • ls -la on node_modules — Thousands of entries when the agent wanted to check if a package exists.

Before and After Examples

Build log compression (cargo build):

Before (raw output, ~200 lines):

   Compiling proc-macro2 v1.0.78
   Compiling unicode-ident v1.0.12
   Compiling quote v1.0.35
   ... (180 more dependency lines)
   Compiling myapp v0.1.0
error[E0308]: mismatched types
  --> src/main.rs:42:5
   |
42 |     "hello"
   |     ^^^^^^^ expected `i32`, found `&str`

After (compressed, ~8 lines):

error[E0308]: mismatched types
  --> src/main.rs:42:5
   |
42 |     "hello"
   |     ^^^^^^^ expected `i32`, found `&str`

[warden: compressed 203 lines → 8 lines, kept errors + summary]
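The build-log example above can be approximated with a few lines of Python. This is a minimal sketch, not Warden's actual implementation: the `PROGRESS` pattern and the `compress_build_log` name are assumptions made for illustration, and the annotation format is copied from the example.

```python
import re

# Hypothetical pattern for cargo progress lines such as "   Compiling quote v1.0.35".
PROGRESS = re.compile(r"^\s*(Compiling|Downloading|Downloaded)\s")

def compress_build_log(raw: str) -> str:
    """Drop dependency progress lines and append a compression note."""
    lines = raw.splitlines()
    kept = [ln for ln in lines if not PROGRESS.match(ln)]
    note = (f"[warden: compressed {len(lines)} lines → {len(kept)} lines, "
            "kept errors + summary]")
    return "\n".join(kept + ["", note])
```

Everything that is not a progress line passes through untouched, so the error, its location, and the source snippet arrive exactly as the compiler printed them.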

Test suite compression (jest):

Before (raw output, ~300 lines):

 PASS src/utils/format.test.ts (0.8s)
 PASS src/utils/parse.test.ts (0.4s)
 ... (95 more passing suites)
 FAIL src/handlers/auth.test.ts (1.2s)
  ● login() › should reject expired tokens
    Expected: 401
    Received: 200

After (compressed, ~10 lines):

 FAIL src/handlers/auth.test.ts (1.2s)
  ● login() › should reject expired tokens
    Expected: 401
    Received: 200

Test Suites: 1 failed, 97 passed, 98 total
[warden: compressed 312 lines → 10 lines, stripped 97 passing suites]
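Test output needs slightly more than a per-line pattern, because a failure spans several lines. The sketch below is one way this could be done, assuming jest's `PASS`/`FAIL` suite prefixes; the function name and the shortened annotation are hypothetical, not Warden's own code.

```python
def compress_jest(raw: str) -> str:
    """Keep FAIL blocks and summary lines; drop PASS lines."""
    kept = []
    in_failure = False
    stripped = 0
    for line in raw.splitlines():
        head = line.strip()
        if head.startswith("PASS "):
            # A passing suite ends any failure block and is dropped.
            in_failure = False
            stripped += 1
        elif head.startswith("FAIL "):
            in_failure = True
            kept.append(line)
        elif in_failure or head.startswith(("Test Suites:", "Tests:")):
            # Failure details, plus jest's final summary lines.
            kept.append(line)
    kept.append(f"[warden: stripped {stripped} passing suites]")
    return "\n".join(kept)
```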

What’s Preserved vs Removed

Preserved                           | Removed
Error messages and stack traces     | Progress bars and spinners
Warning lines                       | Passing test names
Final summary/status lines          | Dependency compilation output
File paths in the working set       | Redundant blank lines
Exit codes and failure indicators   | npm audit informational output
Build error locations (file:line)   | Download progress percentages

Command Filters

Warden uses command-specific filters to apply the right compression strategy. When it sees cargo build, it uses the build filter. When it sees jest or vitest, it uses the test filter. When it sees npm install, it uses the install filter.
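As a rough illustration of this routing, the sketch below maps a command line to a filter name. The `FILTERS` table, the prefix-matching rule, and the fallback to `passthrough` for unknown commands are all assumptions; the text above only states which filter the three named commands use.

```python
import shlex

# Hypothetical routing table: command prefixes mapped to filter names.
FILTERS = {
    ("cargo", "build"): "build",
    ("jest",): "test",
    ("vitest",): "test",
    ("npm", "install"): "install",
}

def pick_filter(command: str) -> str:
    """Return the filter name for a command line, longest prefix first."""
    argv = tuple(shlex.split(command))
    for n in (2, 1):
        name = FILTERS.get(argv[:n])
        if name:
            return name
    # Assumption: unrecognized commands are left uncompressed.
    return "passthrough"
```

For example, `pick_filter("cargo build --release")` returns `"build"` and `pick_filter("jest --ci")` returns `"test"`.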

Each filter has a strategy:

Strategy        | Behavior
strip_matching  | Remove lines that match a pattern (e.g., strip progress bars)
keep_matching   | Keep only lines that match a pattern (e.g., keep only errors)
dedup           | Remove duplicate consecutive lines
head_tail       | Keep the first N and last N lines, drop the middle
passthrough     | No compression (for commands where full output matters)
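To make the five behaviors concrete, here is a minimal Python sketch of each strategy operating on a list of lines. The function signatures are illustrative only, and the "lines omitted" marker in `head_tail` is an assumption rather than documented Warden output.

```python
import re
from itertools import groupby

def strip_matching(lines, patterns):
    """Remove lines matching any of the regex patterns."""
    rx = [re.compile(p) for p in patterns]
    return [ln for ln in lines if not any(r.search(ln) for r in rx)]

def keep_matching(lines, patterns):
    """Keep only lines matching at least one regex pattern."""
    rx = [re.compile(p) for p in patterns]
    return [ln for ln in lines if any(r.search(ln) for r in rx)]

def dedup(lines):
    """Collapse runs of identical consecutive lines into one."""
    return [key for key, _ in groupby(lines)]

def head_tail(lines, n):
    """Keep the first n and last n lines, dropping the middle."""
    if len(lines) <= 2 * n:
        return list(lines)
    dropped = len(lines) - 2 * n
    # The marker line is an assumption added for readability.
    return lines[:n] + [f"... ({dropped} lines omitted)"] + lines[len(lines) - n:]

def passthrough(lines):
    """Return the output unchanged."""
    return list(lines)
```

Note that `dedup` only collapses adjacent repeats, matching the "duplicate consecutive lines" wording: a line that recurs later in the output is kept.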

Built-in Filter Compression Ratios

The compiled filters achieve aggressive compression on common command output:

Command                                    | Strategy                                          | Typical Compression
cargo test                                 | Strip passing tests, keep failures + summary      | ~99%
cargo build / cargo check / cargo clippy   | Strip “Compiling” lines, keep errors + warnings   | ~90%
npm install / pnpm install                 | Strip progress bars, keep warnings + errors       | ~90%
pytest / vitest / jest                     | Strip passing test names, keep failures + summary | ~95%

These ratios come from real-world sessions. A 40,000-token cargo test output with 2 failures compresses to ~400 tokens — the failures, the summary line, and the Warden compression annotation.
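The percentages are the share of tokens removed. A short check of the arithmetic, using only figures quoted on this page (the helper name is ours):

```python
def compression_ratio(before_tokens: int, after_tokens: int) -> float:
    """Fraction of tokens removed by compression."""
    return 1 - after_tokens / before_tokens

# 40,000-token cargo test output reduced to ~400 tokens.
print(f"{compression_ratio(40_000, 400):.0%}")    # 99%
# 40,000-token build log reduced to a 2,000-token summary.
print(f"{compression_ratio(40_000, 2_000):.0%}")  # 95%
```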

Filters are TOML-extensible: you can define custom filters in ~/.warden/rules.toml or .warden/rules.toml using any of the five strategies above. Custom filters merge with the compiled defaults, so you only need to define filters for your project-specific tools.

Progressive Compression

Compression gets more aggressive as the session progresses. Early turns get generous output limits; late turns get tighter ones:

Session phase | Max output lines
Turn 1-15     | 80 lines
Turn 16-30    | 60 lines
Turn 31+      | 40 lines

This is because late-session context is more expensive — you’re closer to the context limit, and every token matters more. The agent is also more likely to be running repetitive commands (rebuild, re-test) where full output adds no new information.
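The schedule amounts to a lookup on the turn number. In the sketch below the thresholds come from the table, while the way an over-long output is cut down (an even head/tail split) is an assumption; this page does not say which lines survive the cap.

```python
def max_output_lines(turn: int) -> int:
    """Line limit for a given turn, per the session-phase table."""
    if turn <= 15:
        return 80
    if turn <= 30:
        return 60
    return 40

def cap_output(lines, turn):
    """Trim output to the turn's limit (hypothetical head/tail split)."""
    limit = max_output_lines(turn)
    if len(lines) <= limit:
        return list(lines)
    half = limit // 2
    return lines[:half] + lines[len(lines) - (limit - half):]
```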

Custom Filters

You can define custom command filters in ~/.warden/rules.toml or .warden/rules.toml:

[[command_filters]]
match = "my-custom-build-tool"
strategy = "keep_matching"
keep = ["error", "warning", "FAIL"]
strip = ["^\\s*ok\\s"]

Custom filters are merged with the compiled defaults. They’re useful for project-specific build tools, test runners, or deployment scripts that produce verbose output your agent doesn’t need.
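One plausible reading of "merged with the compiled defaults" is sketched below, with filters represented as dictionaries shaped like the TOML entry above. The rule that a custom entry replaces a default with the same `match` value is an assumption, as is the sample default; the precedence between the two rules.toml locations is not specified here and is not modeled.

```python
def merge_filters(defaults, custom):
    """Combine built-in and custom filters, keyed by their match string."""
    merged = {f["match"]: f for f in defaults}
    for f in custom:
        # Assumption: a custom filter overrides a default with the same match.
        merged[f["match"]] = f
    return list(merged.values())

# Hypothetical built-in entry, for illustration only.
defaults = [{"match": "cargo build", "strategy": "strip_matching"}]
custom = [{
    "match": "my-custom-build-tool",
    "strategy": "keep_matching",
    "keep": ["error", "warning", "FAIL"],
}]
filters = merge_filters(defaults, custom)
```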