Warden by Bitmill
Documentation

Anchor Engine

Anchor is the stateful core of Warden. While Reflex makes binary safety decisions on individual tool calls, Anchor tracks the session as a whole — where it started, where it is now, whether it is drifting, and how much the assistant can be trusted.

Anchor contains five modules: Compass, Focus, Ledger, Debt, and Trust.


Compass: Phase Detection

Every coding session follows a natural arc. Compass models this arc as five phases:

| Phase | Description | Typical behavior |
|---|---|---|
| Orientation | Understanding the codebase and task | Reading files, searching symbols, asking questions |
| Exploring | Investigating approaches and gathering context | Running tests, reading documentation, trying small experiments |
| Building | Active implementation | Writing code, creating files, running builds |
| Verifying | Testing and validating the implementation | Running tests, reviewing diffs, checking output |
| Wrapping | Finalizing and cleaning up | Committing, formatting, writing docs, closing issues |

Compass detects the current phase by analyzing 8 parameters over a rolling window:

  1. Read/write ratio — high reads suggest Orientation/Exploring; high writes suggest Building
  2. Test invocation rate — spikes during Verifying
  3. File diversity — many distinct files suggest Exploring; few files suggest focused Building
  4. Error rate — increasing errors during Building suggest a transition to Verifying is needed
  5. Command repetition — high repetition in Building is normal (edit-compile-test); high repetition in Exploring is a loop
  6. Turn count since last phase change — phases that persist too long may indicate drift
  7. Commit/save signals — indicate Wrapping
  8. User message frequency — high frequency suggests Orientation (back-and-forth); low frequency suggests autonomous Building
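As an illustration of how a few of these parameters could be computed over a rolling window, here is a minimal sketch. It is not Warden's implementation: the `ToolCall` record, its `kind` labels, the `RollingWindow` class, and the default window size of 20 are all assumptions made for the example. Only the parameter definitions come from the list above.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToolCall:
    kind: str                   # hypothetical labels: "read", "write", "test", "other"
    path: Optional[str] = None  # file touched by the call, if any


class RollingWindow:
    """Keeps the most recent tool calls and derives three of the eight parameters."""

    def __init__(self, size: int = 20):  # window size is an assumed value
        self.calls = deque(maxlen=size)  # oldest calls fall off automatically

    def record(self, call: ToolCall) -> None:
        self.calls.append(call)

    def read_write_ratio(self) -> float:
        reads = sum(c.kind == "read" for c in self.calls)
        writes = sum(c.kind == "write" for c in self.calls)
        return reads / max(writes, 1)  # guard against division by zero

    def test_invocation_rate(self) -> float:
        if not self.calls:
            return 0.0
        return sum(c.kind == "test" for c in self.calls) / len(self.calls)

    def file_diversity(self) -> int:
        # Number of distinct files touched inside the window.
        return len({c.path for c in self.calls if c.path is not None})
```

A high `read_write_ratio()` together with a high `file_diversity()` would point toward Orientation or Exploring, while a low ratio concentrated on few files would point toward Building. How the eight parameters are combined into per-phase scores is not specified here.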

Hysteresis

Phase transitions use hysteresis to prevent oscillation. A candidate phase must score above the entry threshold for 3 consecutive evaluations before Compass commits to the transition. Once Compass is in a phase, that phase must score below the exit threshold (which is set lower than the entry threshold) for 3 consecutive evaluations before Compass leaves it.

This prevents the common case where a single test run during Building briefly scores as Verifying, causing a phase flip-flop that would confuse injection targeting.
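The commit rule can be sketched as a small state machine. This is an illustration under stated assumptions, not Warden's code: the class name `PhaseHysteresis`, the threshold values 0.6 and 0.4, and the idea that each evaluation yields a per-phase score in [0, 1] are hypothetical. The sketch also assumes that a transition requires both streak conditions at once (the current phase below exit and the candidate above entry), which the text does not state explicitly.

```python
from dataclasses import dataclass, field

ENTRY_THRESHOLD = 0.6  # hypothetical value
EXIT_THRESHOLD = 0.4   # hypothetical value; lower than the entry threshold
REQUIRED_STREAK = 3    # from the text: 3 evaluations


@dataclass
class PhaseHysteresis:
    current: str
    entry_streak: dict = field(default_factory=dict)  # candidate phase -> streak length
    exit_streak: int = 0                              # streak for the current phase

    def evaluate(self, scores: dict) -> str:
        """Feed one evaluation's per-phase scores and return the committed phase."""
        # How long has the current phase scored below the exit threshold?
        if scores.get(self.current, 0.0) < EXIT_THRESHOLD:
            self.exit_streak += 1
        else:
            self.exit_streak = 0

        # How long has each other phase scored above the entry threshold?
        self.entry_streak = {
            phase: (self.entry_streak.get(phase, 0) + 1 if score > ENTRY_THRESHOLD else 0)
            for phase, score in scores.items()
            if phase != self.current
        }

        # Assumption: commit only when both streak conditions hold.
        if self.exit_streak >= REQUIRED_STREAK:
            ready = [p for p, n in self.entry_streak.items() if n >= REQUIRED_STREAK]
            if ready:
                self.current = max(ready, key=lambda p: scores[p])
                self.entry_streak = {}
                self.exit_streak = 0
        return self.current
```

With these rules, a single evaluation in which Verifying outscores Building resets on the next normal evaluation and never reaches the streak of 3, so the committed phase stays Building.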

Phase budgets

Each phase has a different injection budget and signal sensitivity:

| Phase | Max injections per turn | Trust sensitivity |
|---|---|---|
| Orientation | 2 | Low (exploring is expected) |
| Exploring | 3 | Medium (drift detection active) |
| Building | 1 | High (interruptions are costly) |
| Verifying | 2 | Medium (errors are expected) |
| Wrapping | 1 | Low (finishing up) |

Focus: Coherence Tracking

Focus maintains a score from 0 to 100 representing how coherent the session’s current activity is. A focused session works on a small set of related files toward a clear goal. An unfocused session jumps between unrelated files and directories.

Focus is computed as a weighted combination of:

  • File-set stability (40%) — how much the set of recently-touched files overlaps with the set from 5 turns ago
  • Directory concentration (30%) — what fraction of file operations target a single directory tree
  • Goal alignment (20%) — whether recent tool calls are consistent with the detected phase
  • Topic coherence (10%) — whether file names and command arguments share lexical similarity

A Focus score below 40 triggers a FocusDrift signal. A score below 20 triggers a FocusCritical signal, which raises the injection budget so that a re-centering reminder can be delivered.

Focus naturally drops during phase transitions (Orientation to Exploring, or Building to Verifying) and this is expected. The signal is suppressed during the first 3 turns after a phase change.
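The weighting and thresholds can be sketched as follows. The weights, the 40 and 20 thresholds, and the 3-turn suppression come from the text. Everything else is an assumption for illustration: the function names, the convention that each component is already normalized to [0, 1], the use of Jaccard overlap for file-set stability, the choice to suppress both Focus signals, and counting the turn of the phase change as turn 0.

```python
WEIGHTS = {
    "file_set_stability": 0.40,
    "directory_concentration": 0.30,
    "goal_alignment": 0.20,
    "topic_coherence": 0.10,
}
SUPPRESSION_TURNS = 3  # first 3 turns after a phase change


def jaccard(recent: set, earlier: set) -> float:
    """One possible overlap measure for file-set stability (an assumption)."""
    if not recent and not earlier:
        return 1.0
    return len(recent & earlier) / len(recent | earlier)


def focus_score(components: dict) -> float:
    """Weighted sum of components in [0, 1], scaled to the 0-100 Focus range."""
    return 100.0 * sum(WEIGHTS[name] * components[name] for name in WEIGHTS)


def focus_signal(score: float, turns_since_phase_change: int):
    """Return the Focus signal name, or None if nothing should fire."""
    if turns_since_phase_change < SUPPRESSION_TURNS:
        return None  # drops right after a phase change are expected
    if score < 20:
        return "FocusCritical"
    if score < 40:
        return "FocusDrift"
    return None
```

For example, a session with perfect file-set stability and directory concentration, half-aligned tool calls, and no lexical similarity scores 100 × (0.40 + 0.30 + 0.10 + 0) = 80, well above the FocusDrift threshold.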

Ledger: Turn Tracking

Ledger is the simplest Anchor module. It counts turns (tool invocations) and tracks gaps between verification events (test runs, build checks, lint passes).

Key metrics:

  • Total turns — lifetime count for the session
  • Turns since last verify — resets when a test/build/lint command is detected
  • Turns since last user message — measures autonomous run length
  • Phase duration — turns spent in the current phase

Ledger feeds into Debt and Trust calculations but does not emit signals directly. It is a bookkeeper, not a decision-maker.
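A bookkeeper of this kind needs little more than four counters. The sketch below is a hypothetical shape for it: the `Ledger` class and its method names are assumptions, and it treats every recorded turn as one tool invocation, as described above.

```python
from dataclasses import dataclass


@dataclass
class Ledger:
    total_turns: int = 0
    turns_since_verify: int = 0
    turns_since_user_message: int = 0
    phase_duration: int = 0

    def record_turn(self, is_verification: bool = False) -> None:
        """Count one tool invocation; a test/build/lint command resets the verify gap."""
        self.total_turns += 1
        self.turns_since_user_message += 1
        self.phase_duration += 1
        self.turns_since_verify = 0 if is_verification else self.turns_since_verify + 1

    def record_user_message(self) -> None:
        self.turns_since_user_message = 0

    def record_phase_change(self) -> None:
        self.phase_duration = 0
```

The class only stores and resets counters. Consistent with the description above, it emits no signals of its own.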

Debt: Verification Tracking

Debt tracks how much unverified work has accumulated. Every file write increments debt; every successful test run or build reduces it. The formula:

debt = unverified_writes - (successful_tests * 2) - (successful_builds * 1)

Debt is clamped to [0, 100]. When debt exceeds 30, Anchor emits a VerificationNeeded signal. When it exceeds 60, the signal escalates to VerificationUrgent.

The multipliers reflect that a single test run typically validates multiple file changes, while a build check validates fewer (compilation success does not mean behavioral correctness).

Debt resets to 0 when the phase transitions to Wrapping, on the assumption that the developer has accepted the current state.
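The formula, clamp, and signal thresholds translate directly into a few lines. The function names below are hypothetical, and the reset on entering Wrapping is not modeled.

```python
def verification_debt(unverified_writes: int, successful_tests: int, successful_builds: int) -> int:
    """Apply the debt formula and clamp the result to [0, 100]."""
    raw = unverified_writes - (successful_tests * 2) - (successful_builds * 1)
    return max(0, min(100, raw))


def debt_signal(debt: int):
    """Return the Debt signal name, or None below the first threshold."""
    if debt > 60:
        return "VerificationUrgent"
    if debt > 30:
        return "VerificationNeeded"
    return None
```

As a worked example, 40 unverified writes, 3 successful test runs, and 2 successful builds give 40 - 6 - 2 = 32, which is just over the VerificationNeeded threshold.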

Trust: Session Confidence Score

Trust is the most consequential Anchor metric because it directly controls the injection budget — how many context injections Warden delivers per tool call.

The Formula

trust = 100
      - (errors * 5)
      - (debt * 3)
      - (phase_switches * 2)
      - (dead_ends * 4)
      - (denials * 3)
      + bonuses

Where:

  • errors — count of tool calls that produced stderr output in the last 20 turns
  • debt — current verification debt (0-100, scaled to 0-10 for this formula)
  • phase_switches — number of phase transitions in the last 30 turns (frequent switching suggests confusion)
  • dead_ends — count of sequences where the assistant tried an approach, hit an error, and reverted (detected by Loopbreaker)
  • denials — count of Reflex denials in the last 20 turns
  • bonuses — positive signals: successful test runs (+3 each), clean builds (+2), phase progression in natural order (+5)

Trust is clamped to [0, 100].
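A minimal sketch of the calculation follows. The function name and parameter names are hypothetical. Two details are assumptions rather than statements from the text: the debt term is divided by 10 to apply the 0-10 scaling, and the +2 and +5 bonuses are applied once per clean build and once per natural phase progression.

```python
def trust_score(
    errors: int,
    debt: float,
    phase_switches: int,
    dead_ends: int,
    denials: int,
    successful_tests: int = 0,
    clean_builds: int = 0,
    natural_progressions: int = 0,
) -> float:
    """Compute Trust from the listed penalties and bonuses, clamped to [0, 100]."""
    scaled_debt = debt / 10  # debt is 0-100; the formula uses it on a 0-10 scale
    # Assumption: each bonus applies per occurrence.
    bonuses = 3 * successful_tests + 2 * clean_builds + 5 * natural_progressions
    raw = (
        100
        - errors * 5
        - scaled_debt * 3
        - phase_switches * 2
        - dead_ends * 4
        - denials * 3
        + bonuses
    )
    return max(0.0, min(100.0, raw))
```

For instance, 2 errors, a debt of 40, 1 phase switch, 1 dead end, no denials, and 2 successful test runs give 100 - 10 - 12 - 2 - 4 - 0 + 6 = 78.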

Trust Gates

Trust directly controls the injection budget through a tiered gate system:

| Trust Range | Max Injections | Interpretation |
|---|---|---|
| 85-100 | 1 | High confidence: minimal guidance needed |
| 50-84 | 3 | Moderate confidence: occasional nudges |
| 25-49 | 5 | Low confidence: active guidance |
| 0-24 | 15 | Very low confidence: heavy guardrails |

The counterintuitive inversion — more injections at lower trust — reflects the design philosophy that struggling sessions need more help, not less. A high-trust session is humming along and extra injections would only waste context window space.

Gate transitions

Gate transitions use the same hysteresis as Compass phase transitions. Trust must remain in a new tier for 3 consecutive evaluations before the injection budget changes. This prevents a single error from flooding the context with injections.
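The tier lookup and the 3-evaluation rule can be sketched together. The tier boundaries and budgets come from the Trust Gates table. The `GateTracker` class, its method names, and the choice to restart the streak whenever trust lands in a different tier are assumptions for illustration.

```python
GATES = [(85, 1), (50, 3), (25, 5), (0, 15)]  # (lower bound of tier, max injections)
REQUIRED_STREAK = 3                           # consecutive evaluations in the new tier


def tier_budget(trust: float) -> int:
    """Map a trust score to the max-injection budget of its tier."""
    for lower, budget in GATES:
        if trust >= lower:
            return budget
    return GATES[-1][1]


class GateTracker:
    def __init__(self, initial_trust: float = 100.0):
        self.budget = tier_budget(initial_trust)
        self._pending = None  # budget of the tier trust is trying to move into
        self._streak = 0

    def update(self, trust: float) -> int:
        """Feed one trust evaluation and return the budget currently in force."""
        candidate = tier_budget(trust)
        if candidate == self.budget:
            self._pending, self._streak = None, 0
        elif candidate == self._pending:
            self._streak += 1
        else:
            self._pending, self._streak = candidate, 1
        if self._pending is not None and self._streak >= REQUIRED_STREAK:
            self.budget = self._pending
            self._pending, self._streak = None, 0
        return self.budget
```

A session at trust 90 that dips to 40 for one evaluation keeps its budget of 1. Only three evaluations in a row inside the 25-49 tier raise the budget to 5.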

Signal Categories

Anchor emits 7 signals in 4 categories (Phase, Focus, Debt, and Trust). Each signal has a utility function that determines whether it is worth injecting:

| Category | Signal | Utility threshold | Effect |
|---|---|---|---|
| Phase | PhaseShift | Always emitted | Updates injection targeting |
| Phase | PhaseStall | > 30 turns in phase | Suggests phase transition |
| Focus | FocusDrift | Focus < 40 | Re-centering reminder |
| Focus | FocusCritical | Focus < 20 | Strong re-centering + file list |
| Debt | VerificationNeeded | Debt > 30 | Test/build reminder |
| Debt | VerificationUrgent | Debt > 60 | Escalated test reminder |
| Trust | TrustDrop | Trust crosses gate boundary | Budget adjustment + explanation |

Signals that fall below their utility threshold are logged but not injected. This prevents low-value noise from consuming the injection budget.
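One way to picture the utility check is as a table of predicates evaluated against a snapshot of session state. The sketch below is hypothetical: `SessionState`, its fields, and `partition_signals` are invented for the example, and "always emitted" for PhaseShift is read as "emitted whenever a phase change occurred". The sketch applies each threshold literally, so a Focus score below 20 passes both Focus checks. Whether the stronger signal replaces the weaker one is not specified.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SessionState:  # hypothetical snapshot of Anchor's metrics
    focus: float
    debt: float
    turns_in_phase: int
    phase_changed: bool
    gate_crossed: bool


# Signal name -> predicate mirroring the utility thresholds listed above.
UTILITY = [
    ("PhaseShift", lambda s: s.phase_changed),
    ("PhaseStall", lambda s: s.turns_in_phase > 30),
    ("FocusDrift", lambda s: s.focus < 40),
    ("FocusCritical", lambda s: s.focus < 20),
    ("VerificationNeeded", lambda s: s.debt > 30),
    ("VerificationUrgent", lambda s: s.debt > 60),
    ("TrustDrop", lambda s: s.gate_crossed),
]


def partition_signals(state: SessionState) -> Tuple[List[str], List[str]]:
    """Split signals into those worth injecting and those that are only logged."""
    injectable, logged_only = [], []
    for name, passes in UTILITY:
        (injectable if passes(state) else logged_only).append(name)
    return injectable, logged_only
```

The injectable list would still be subject to the injection budget set by the trust gates. This sketch covers only the threshold check.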


Interaction with Other Engines

Anchor does not make safety decisions (that is Reflex’s job) and does not learn across sessions (that is Dream’s job). Its role is strictly intra-session state management.

However, Anchor’s outputs feed the other engines:

  • Reflex reads the current trust score to adjust Loopbreaker thresholds (low-trust sessions have tighter loop detection)
  • Dream reads the full session state at session end to extract patterns worth remembering
  • Harbor reads signals and trust gates to determine the injection budget and format context blocks

This one-way data flow keeps the engine boundaries clean while allowing cross-engine coordination.