feat(loop): integrate evolution, memory, and mid-loop critique#9

Open
electronicBlacksmith wants to merge 5 commits into main from worktree-fix+loop-slack-feedback

Summary

  • Phase 1: Memory context injection - cached once at loop start, injected into every tick prompt. Cleared on finalize, rebuilt on resume.
  • Phase 2: Post-loop evolution and memory consolidation - bounded transcript accumulation, SessionData synthesis, fire-and-forget pipeline with cost-cap guards matching the interactive session path.
  • Phase 3: Mid-loop critique checkpoints - optional checkpoint_interval triggers Sonnet 4.6 review every N ticks. Guarded by judge availability and cost cap. Awaited before next tick to prevent race conditions.

New files: src/loop/critique.ts, src/loop/post-loop.ts, and 3 test files.

Test plan

  • 981 tests pass (4 new Phase 3 guard tests, prompt injection tests, post-loop synthesis tests)
  • Biome lint clean
  • TypeScript strict mode typecheck clean
  • Dual code review (Haiku 4.5 + Sonnet 4.6) - all critical issues addressed
  • Manual verification: start a loop from Slack with Qdrant + Ollama up, verify tick prompts contain recalled memories
  • Manual verification: run loop to completion, verify observations appear in evolution tables

Closes #8

Closes #5. The feedback pipeline in LoopRunner already existed but was
gated on loop.channelId, which was always null because the agent never
plumbed channel_id/conversation_id into the in-process MCP tool call;
that context only lived in the router.

- AsyncLocalStorage<SlackContext> captures the Slack channel/thread/
  trigger-message for the current turn so phantom_loop can auto-fill
  them when the agent omits them. Explicit tool args still win.
- Reaction ladder on the operator's original message: hourglass ->
  cycle -> terminal (check/stop/warning/x). Restart-safe via
  iteration === 1 check, no in-memory flag.
- Inline unicode progress bar in the edited status message.
- New trigger_message_ts column on loops, appended as migration ghostwright#11.
- Extracted LoopNotifier into src/loop/notifications.ts; runner.ts
  was already at the 300-line cap.
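
The AsyncLocalStorage fallback described above can be sketched as follows. This is a minimal illustration, not the code in this PR: the field names on SlackContext, the LoopToolArgs shape, and the helper names runWithSlackContext and resolveLoopArgs are all assumptions made for the example.

```typescript
import { AsyncLocalStorage } from "node:async_hooks";

// Hypothetical field names: the PR names SlackContext but does not show its shape.
interface SlackContext {
  channelId: string;
  threadTs: string | null;
  triggerMessageTs: string | null;
}

// Explicit tool arguments; the agent may omit any of the Slack fields.
interface LoopToolArgs {
  goal: string;
  channelId?: string;
  threadTs?: string;
  triggerMessageTs?: string;
}

interface ResolvedLoopArgs {
  goal: string;
  channelId: string | null;
  threadTs: string | null;
  triggerMessageTs: string | null;
}

const slackContext = new AsyncLocalStorage<SlackContext>();

// Router side: everything called inside `turn`, including awaited work,
// can read the context without it being passed as a parameter.
function runWithSlackContext<T>(ctx: SlackContext, turn: () => T): T {
  return slackContext.run(ctx, turn);
}

// Tool side: explicit arguments win; the ambient context only fills gaps.
function resolveLoopArgs(args: LoopToolArgs): ResolvedLoopArgs {
  const ctx = slackContext.getStore();
  return {
    goal: args.goal,
    channelId: args.channelId ?? ctx?.channelId ?? null,
    threadTs: args.threadTs ?? ctx?.threadTs ?? null,
    triggerMessageTs: args.triggerMessageTs ?? ctx?.triggerMessageTs ?? null,
  };
}
```

Outside any Slack turn, getStore() returns undefined, so every Slack field in this sketch resolves to null.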

34 new tests, 938 pass / 0 fail.
…tion

Two defects surfaced during the first Slack end-to-end test of the loop
feedback fix:

1. Stop button disappeared after the first tick. Slack's chat.update
   replaces the message wholesale and strips any blocks the caller does
   not include. postStartNotice attached the button but postTickUpdate
   called updateMessage without blocks, so the button was wiped on the
   first progress edit. Extract buildStatusBlocks() and re-send it on
   every tick edit. Final notice still omits blocks intentionally so the
   button disappears when the loop is no longer interruptible.

2. No end-of-loop summary. The agent curates the state.md body every
   tick (Goal, Progress, Next Action, Notes), but that content never
   reached the operator. Post it as a threaded reply when the loop
   finalizes. No extra agent cost: we surface content the agent already
   wrote. Frontmatter stripped, truncated at 3500 chars, silently
   skipped if the file is missing or empty.
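
A rough sketch of the summary extraction in item 2, assuming the frontmatter is a leading block delimited by `---` lines. The function name buildLoopSummary and the exact frontmatter pattern are assumptions for illustration; only the 3500-character limit and the skip-on-missing-or-empty behavior come from the description above.

```typescript
const SUMMARY_MAX_CHARS = 3500; // limit stated in the commit message

// Returns the text to post as a threaded reply, or null when there is
// nothing worth posting (file missing, empty, or frontmatter only).
function buildLoopSummary(stateFile: string | null): string | null {
  if (stateFile === null) return null;

  // Drop a leading frontmatter block: an opening "---" line, any content,
  // and a closing "---" line. Anything after it is the curated body.
  const body = stateFile
    .replace(/^---\r?\n[\s\S]*?\r?\n---[ \t]*(?:\r?\n|$)/, "")
    .trim();

  if (body.length === 0) return null;
  return body.slice(0, SUMMARY_MAX_CHARS);
}
```

Returning null rather than throwing keeps the "silently skipped" behavior in one place: the caller posts only when it gets a string back.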

+7 tests covering both regressions. 945 pass / 0 fail.
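
The Stop-button fix can be pictured as a single block builder shared by every tick edit. The sketch below builds payloads only and does not call Slack; the block contents, the `loop_stop` action id, and the helper names tickUpdatePayload and finalNoticePayload are assumptions, while buildStatusBlocks is the name used in the commit message.

```typescript
// Minimal subset of Slack Block Kit shapes used by this sketch.
type StatusBlock =
  | { type: "section"; text: { type: "mrkdwn"; text: string } }
  | {
      type: "actions";
      elements: Array<{
        type: "button";
        text: { type: "plain_text"; text: string };
        action_id: string;
        value: string;
        style: "danger";
      }>;
    };

interface UpdatePayload {
  channel: string;
  ts: string;
  text: string;
  blocks?: StatusBlock[];
}

// One builder for the status message, so the Stop button cannot be lost
// by forgetting to pass blocks on one code path.
function buildStatusBlocks(loopId: string, statusText: string): StatusBlock[] {
  return [
    { type: "section", text: { type: "mrkdwn", text: statusText } },
    {
      type: "actions",
      elements: [
        {
          type: "button",
          text: { type: "plain_text", text: "Stop" },
          action_id: "loop_stop", // hypothetical action id
          value: loopId,
          style: "danger",
        },
      ],
    },
  ];
}

// Tick edits always re-send the blocks.
function tickUpdatePayload(channel: string, ts: string, loopId: string, statusText: string): UpdatePayload {
  return { channel, ts, text: statusText, blocks: buildStatusBlocks(loopId, statusText) };
}

// The final notice deliberately omits blocks so the button goes away.
function finalNoticePayload(channel: string, ts: string, finalText: string): UpdatePayload {
  return { channel, ts, text: finalText };
}
```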
…l message

1. Tick update race: postTickUpdate was fire-and-forget, so a stop on
   tick N+1 could race with tick N's Slack write. If the tick update's
   HTTP response arrived after postFinalNotice, it overwrote the final
   message and re-sent the Stop button blocks. Awaiting postTickUpdate
   serializes Slack writes so finalize always runs after the last tick
   update completes.

2. Final message now includes the progress bar at its halted position,
   visually consistent with tick updates. A stopped loop at 3/10 shows
   the bar frozen at 3/10 with "stopped" instead of a terse one-liner.
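
As an illustration of item 2, a progress bar frozen at its halted position might be rendered like this. The glyphs, the bar width, the status labels, and the function names are assumptions; the PR does not show the actual rendering code.

```typescript
type LoopStatus = "running" | "done" | "stopped" | "failed";

// Renders a fixed-width unicode bar followed by "iteration/max".
function renderProgressBar(iteration: number, maxIterations: number, width = 10): string {
  const total = Math.max(1, maxIterations);
  const done = Math.min(Math.max(iteration, 0), total);
  // Integer arithmetic first, then divide, to avoid float rounding surprises.
  const filled = Math.round((done * width) / total);
  return `${"█".repeat(filled)}${"░".repeat(width - filled)} ${done}/${total}`;
}

// The final message reuses the same bar and appends the terminal status,
// so a stopped loop reads like its last tick update rather than a one-liner.
function renderFinalLine(iteration: number, maxIterations: number, status: LoopStatus): string {
  return `${renderProgressBar(iteration, maxIterations)} ${status}`;
}
```

With these assumptions, `renderFinalLine(3, 10, "stopped")` yields three filled cells, seven empty cells, then `3/10 stopped`.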
…oop ticks

Loop ticks now use Phantom's full intelligence stack instead of running blind:

Phase 1 - Memory context injection: cached once at loop start from the goal,
injected into every tick prompt via TickPromptOptions. Cleared on finalize,
rebuilt on resume.
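
A minimal sketch of the Phase 1 caching pattern, assuming the memory lookup is an async function of the goal. TickPromptOptions is named in the commit message, but its memoryContext field, the LoopMemoryCache class, and the prompt layout here are hypothetical.

```typescript
// Hypothetical recall function standing in for the real memory lookup.
type RecallMemories = (goal: string) => Promise<string>;

interface TickPromptOptions {
  memoryContext: string | null; // field name is an assumption
}

class LoopMemoryCache {
  private readonly entries = new Map<string, string>();
  private readonly recall: RecallMemories;

  constructor(recall: RecallMemories) {
    this.recall = recall;
  }

  // Called once at loop start, and again when a loop is resumed.
  async prime(loopId: string, goal: string): Promise<void> {
    this.entries.set(loopId, await this.recall(goal));
  }

  // Called on every tick; never triggers a new recall.
  get(loopId: string): string | null {
    return this.entries.get(loopId) ?? null;
  }

  // Called on finalize so finished loops do not pin memory.
  clear(loopId: string): void {
    this.entries.delete(loopId);
  }
}

function buildTickPrompt(goal: string, iteration: number, options: TickPromptOptions): string {
  const sections = [`Goal: ${goal}`, `Iteration: ${iteration}`];
  if (options.memoryContext) {
    sections.push(`Recalled memories:\n${options.memoryContext}`);
  }
  return sections.join("\n\n");
}
```

The point of the split is cost: recall runs once per loop (or once per resume), while every tick pays only for a map lookup.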

Phase 2 - Post-loop evolution and consolidation: bounded transcript
accumulation (first tick + rolling 10 summaries + last tick), SessionData
synthesis in finalize(), fire-and-forget evolution pipeline and LLM/heuristic
memory consolidation with cost-cap guards matching the interactive path.
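
The bounded accumulation in Phase 2 (first tick, rolling 10 summaries, last tick) could look roughly like the following. The LoopTranscript shape and the function names are assumptions for the example; only the "first + rolling 10 + last" policy comes from the description above.

```typescript
const MAX_ROLLING_SUMMARIES = 10; // window size stated in the commit message

interface LoopTranscript {
  firstTick: string | null;   // full text of the first tick, kept forever
  rollingSummaries: string[]; // short summaries of the most recent ticks
  lastTick: string | null;    // full text of the most recent tick
}

function emptyTranscript(): LoopTranscript {
  return { firstTick: null, rollingSummaries: [], lastTick: null };
}

// Memory use stays constant no matter how many ticks the loop runs:
// two full texts plus at most MAX_ROLLING_SUMMARIES short summaries.
function recordTick(transcript: LoopTranscript, fullText: string, summary: string): void {
  if (transcript.firstTick === null) {
    transcript.firstTick = fullText;
  }
  transcript.lastTick = fullText;
  transcript.rollingSummaries.push(summary);
  if (transcript.rollingSummaries.length > MAX_ROLLING_SUMMARIES) {
    transcript.rollingSummaries.shift(); // drop the oldest summary
  }
}
```

Keeping the first and last ticks in full preserves how the loop started and how it ended, which is what a post-loop synthesis step is most likely to need.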

Phase 3 - Mid-loop critique checkpoints: optional checkpoint_interval param
lets the agent request Sonnet 4.6 review every N ticks. Guard requires
evolution enabled, LLM judges active, and cost cap not exceeded. Critique
is awaited before next tick to avoid race conditions.
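
The Phase 3 guard and the await-before-next-tick ordering can be sketched as below. All names here (CritiqueGuardInput, shouldRunCritique, runLoop) are hypothetical, and the tick and critique callbacks stand in for the real agent and judge calls.

```typescript
interface CritiqueGuardInput {
  iteration: number;                 // 1-based tick number
  checkpointInterval: number | null; // null when the agent did not request critique
  evolutionEnabled: boolean;
  judgesActive: boolean;
  costCapExceeded: boolean;
}

// Critique runs only on every Nth tick, and only when all three guards pass.
function shouldRunCritique(input: CritiqueGuardInput): boolean {
  const n = input.checkpointInterval;
  if (n === null || !Number.isInteger(n) || n <= 0) return false;
  if (input.iteration <= 0 || input.iteration % n !== 0) return false;
  return input.evolutionEnabled && input.judgesActive && !input.costCapExceeded;
}

// Awaiting the critique inside the loop body guarantees it has finished
// before the next tick starts, so the two never run concurrently.
async function runLoop(
  maxTicks: number,
  guard: Omit<CritiqueGuardInput, "iteration">,
  tick: (iteration: number) => Promise<void>,
  critique: (iteration: number) => Promise<void>,
): Promise<void> {
  for (let iteration = 1; iteration <= maxTicks; iteration++) {
    await tick(iteration);
    if (shouldRunCritique({ ...guard, iteration })) {
      await critique(iteration);
    }
  }
}
```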

Closes #8
- Decouple postLoopDeps so evolution and memory run independently
  (evolution works when memory is down and vice versa)
- Skip mid-loop critique on terminal ticks to avoid wasted Sonnet calls
- Track judge cost on failure paths via JudgeParseError carrying usage data
- Extract recordTranscript/clamp from runner.ts to post-loop.ts (292 < 300 lines)
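
One way to picture the JudgeParseError change: the error carries the usage of the call that produced the unparseable output, so the failure path can still charge it against the cost cap. JudgeParseError is the name used above; the JudgeUsage fields, CostTracker, and runJudge are assumptions made for this sketch.

```typescript
// Hypothetical usage shape; the PR does not show the real fields.
interface JudgeUsage {
  inputTokens: number;
  outputTokens: number;
  costUsd: number;
}

class JudgeParseError extends Error {
  readonly usage: JudgeUsage;

  constructor(message: string, usage: JudgeUsage) {
    super(message);
    // Keeps instanceof working even when compiled to older targets.
    Object.setPrototypeOf(this, new.target.prototype);
    this.name = "JudgeParseError";
    this.usage = usage;
  }
}

class CostTracker {
  private totalUsd = 0;

  add(usd: number): void {
    this.totalUsd += usd;
  }

  get total(): number {
    return this.totalUsd;
  }
}

// The model call has already been paid for by the time parsing fails,
// so the parse step attaches the usage to the error it throws.
function parseVerdict(raw: string, usage: JudgeUsage): unknown {
  try {
    return JSON.parse(raw);
  } catch {
    throw new JudgeParseError("judge returned unparseable output", usage);
  }
}

type JudgeResult = { ok: true; verdict: unknown } | { ok: false };

// Both the success path and the parse-failure path record the cost.
function runJudge(raw: string, usage: JudgeUsage, tracker: CostTracker): JudgeResult {
  try {
    const verdict = parseVerdict(raw, usage);
    tracker.add(usage.costUsd);
    return { ok: true, verdict };
  } catch (err) {
    if (err instanceof JudgeParseError) {
      tracker.add(err.usage.costUsd);
      return { ok: false };
    }
    throw err;
  }
}
```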

Successfully merging this pull request may close these issues.

Loop ticks should use evolution, judges, and memory - not bypass them
