Loop ticks should use evolution, judges, and memory - not bypass them #8

@electronicBlacksmith

Problem

phantom_loop ticks bypass the entire intelligence stack - no evolved config, no memory recall, no post-session evolution, no cross-model judges. Each tick is a naive LLM call with no benefit from anything Phantom has learned.

The loop is a context-window management strategy, not a reason to skip the intelligence layer. Interactive sessions get the full stack; loops - the primary mechanism for long-running autonomous work - get none of it.

Proposed fix (3 phases)

Phase 1: Inject evolved config + memory into tick prompts

Each tick gets persona, domain knowledge, error recovery strategies, and recalled memories. This adds no LLM calls; the only new cost is one local embedding query per tick for memory recall.

  • Extend RunnerDeps with optional memoryContextBuilder, evolvedConfig, roleTemplate
  • Wire them from src/index.ts (same instances already in the router scope)
  • Extend buildTickPrompt() to inject evolved config sections before the goal and memory context before the state file
  • Call contextBuilder.build(loop.goal) per tick for memory recall

Modified: src/loop/runner.ts, src/loop/prompt.ts, src/index.ts
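
A minimal sketch of the Phase 1 prompt ordering, assuming evolved config can be reduced to three optional text sections and that memory recall returns a plain string. The type names, field names, section titles, and `buildTickPromptSketch` itself are hypothetical and not taken from the existing `buildTickPrompt()` signature.

```typescript
// Hypothetical shapes for illustration; the real RunnerDeps and evolved
// config types in src/loop/runner.ts may look different.
interface EvolvedSections {
  persona?: string;
  domainKnowledge?: string;
  errorRecovery?: string;
}

interface TickPromptInput {
  goal: string;
  stateFile: string;
  evolved?: EvolvedSections; // optional, so callers without it keep working
  memoryContext?: string; // e.g. the result of contextBuilder.build(loop.goal)
}

// Returns a one-element array for a non-empty body, otherwise an empty array,
// so missing sections simply drop out of the prompt.
function section(title: string, body?: string): string[] {
  return body && body.trim() ? [`## ${title}\n${body.trim()}`] : [];
}

function buildTickPromptSketch(input: TickPromptInput): string {
  const parts = [
    // Evolved config sections go before the goal.
    ...section("Persona", input.evolved?.persona),
    ...section("Domain knowledge", input.evolved?.domainKnowledge),
    ...section("Error recovery", input.evolved?.errorRecovery),
    ...section("Goal", input.goal),
    // Memory context goes before the state file.
    ...section("Recalled memories", input.memoryContext),
    ...section("State file", input.stateFile),
  ];
  return parts.join("\n\n");
}
```

Keeping every new input optional means a tick with no evolved config and no memory produces the same goal-plus-state prompt shape as before.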

Phase 2: Post-loop evolution and memory consolidation

After a loop finishes, synthesize a SessionSummary from accumulated tick transcripts and feed it through afterSession() and consolidateSessionWithLLM(). Phantom learns from autonomous work, not just interactive conversations.

  • Accumulate tick prompt/response pairs in-memory during the run
  • On finalize, build a SessionSummary (loop status maps to outcome: done->success, stopped->abandoned, budget_exceeded/failed->failure)
  • Call evolution.afterSession(summary) - the pipeline is completely channel-agnostic, no changes needed
  • Call consolidateSessionWithLLM() to store the run as a vector-backed episode

Modified: src/loop/runner.ts, src/index.ts
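
The status-to-outcome mapping and transcript accumulation from Phase 2 could look roughly like this. `LoopSummarySketch` is a hypothetical stand-in, since the real `SessionSummary` fields are not listed in this issue; `TickRecord` and `summarizeLoop` are likewise illustrative names.

```typescript
type LoopStatus = "done" | "stopped" | "budget_exceeded" | "failed";
type Outcome = "success" | "abandoned" | "failure";

// One prompt/response pair accumulated in memory during the run.
interface TickRecord {
  tick: number;
  prompt: string;
  response: string;
}

// Hypothetical stand-in for SessionSummary; the real type has its own fields.
interface LoopSummarySketch {
  outcome: Outcome;
  tickCount: number;
  transcript: string;
}

// Mapping as stated above: done->success, stopped->abandoned,
// budget_exceeded/failed->failure.
function outcomeForStatus(status: LoopStatus): Outcome {
  switch (status) {
    case "done":
      return "success";
    case "stopped":
      return "abandoned";
    case "budget_exceeded":
    case "failed":
      return "failure";
  }
}

// Called once on finalize; flattens the accumulated ticks into one transcript.
function summarizeLoop(status: LoopStatus, ticks: TickRecord[]): LoopSummarySketch {
  const transcript = ticks
    .map((t) => `[tick ${t.tick}]\nPROMPT: ${t.prompt}\nRESPONSE: ${t.response}`)
    .join("\n\n");
  return { outcome: outcomeForStatus(status), tickCount: ticks.length, transcript };
}
```

The resulting object would then be adapted to whatever `evolution.afterSession()` and `consolidateSessionWithLLM()` actually expect.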

Phase 3: Mid-loop critique checkpoints

For long loops (10+ ticks), run a Sonnet 4.6 judge every N ticks, where N is checkpoint_interval, to detect drift before the budget burns out. This is the only phase that adds a new LLM call.

  • New src/loop/critique.ts module: reads state file + tick history, asks quality judge if the loop is making progress or stuck
  • Configurable checkpoint_interval (default 5 ticks, disabled for short loops)
  • Critique injected into next tick prompt as "Reviewer feedback" section

Modified: src/loop/runner.ts, src/loop/prompt.ts
New: src/loop/critique.ts
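
A sketch of the checkpoint gating and feedback injection for Phase 3. It assumes "long" is judged by the loop's planned tick budget (a hypothetical `maxTicks` value) and that the critique arrives as plain text; `CritiqueOptions`, `shouldRunCritique`, and `withReviewerFeedback` are illustrative names, not existing exports.

```typescript
interface CritiqueOptions {
  checkpointInterval: number; // mirrors checkpoint_interval (default 5)
  minLoopTicks: number; // loops planned shorter than this skip critique
}

const DEFAULT_CRITIQUE: CritiqueOptions = { checkpointInterval: 5, minLoopTicks: 10 };

// Decides whether to call the judge after `completedTicks` ticks.
// Assumption: loop length is measured by the planned tick budget.
function shouldRunCritique(
  completedTicks: number,
  maxTicks: number,
  opts: CritiqueOptions = DEFAULT_CRITIQUE,
): boolean {
  if (maxTicks < opts.minLoopTicks) return false; // disabled for short loops
  if (opts.checkpointInterval <= 0) return false; // interval of 0 turns it off
  return completedTicks > 0 && completedTicks % opts.checkpointInterval === 0;
}

// Appends the judge's critique to the next tick prompt, if there is one.
function withReviewerFeedback(prompt: string, critique?: string): string {
  return critique && critique.trim()
    ? `${prompt}\n\n## Reviewer feedback\n${critique.trim()}`
    : prompt;
}
```

With the defaults, a 20-tick loop would be critiqued after ticks 5, 10, 15, and 20, while an 8-tick loop would never trigger the judge.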

Key notes
