feat(loop): integrate evolution, memory, and mid-loop critique by electronicBlacksmith · Pull Request #9 · electronicBlacksmith/phantom

electronicBlacksmith · 2026-04-06T01:05:54Z

Summary

Phase 1: Memory context injection - cached once at loop start, injected into every tick prompt. Cleared on finalize, rebuilt on resume.
Phase 2: Post-loop evolution and memory consolidation - bounded transcript accumulation, SessionData synthesis, fire-and-forget pipeline with cost-cap guards matching the interactive session path.
Phase 3: Mid-loop critique checkpoints - optional checkpoint_interval triggers Sonnet 4.6 review every N ticks. Guarded by judge availability and cost cap. Awaited before next tick to prevent race conditions.
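
A minimal sketch of the Phase 3 guard described above, not the code in src/loop/critique.ts. `CritiqueGuardState` and `shouldRunCritique` are hypothetical names introduced for illustration; only the conditions (optional checkpoint_interval, judge availability, cost cap) come from this summary.

```typescript
// Hypothetical shape: the real guard inputs in the PR may be named differently.
interface CritiqueGuardState {
  checkpointInterval?: number; // optional checkpoint_interval tool parameter
  judgesAvailable: boolean;    // whether LLM judges can be called
  costCapExceeded: boolean;    // whether the cost cap has already been hit
}

// Returns true only on every Nth tick, and only when both guards pass.
function shouldRunCritique(tick: number, state: CritiqueGuardState): boolean {
  const interval = state.checkpointInterval;
  if (interval === undefined || interval <= 0) return false; // critique is opt-in
  if (!state.judgesAvailable || state.costCapExceeded) return false;
  return tick > 0 && tick % interval === 0;
}
```

In a runner, a true result would be followed by an awaited critique call so the next tick only starts once the review has finished.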

New files: src/loop/critique.ts, src/loop/post-loop.ts, and 3 test files.

Test plan

981 tests pass (4 new Phase 3 guard tests, prompt injection tests, post-loop synthesis tests)
Biome lint clean
TypeScript strict mode typecheck clean
Dual code review (Haiku 4.5 + Sonnet 4.6) - all critical issues addressed
Manual verification: start a loop from Slack with Qdrant + Ollama up, verify tick prompts contain recalled memories
Manual verification: run loop to completion, verify observations appear in evolution tables

Closes #8

Closes #5. The feedback pipeline in LoopRunner already existed but was gated on loop.channelId, which was always null because the agent never plumbed channel_id/conversation_id into the in-process MCP tool call; that context only lived in the router.

- AsyncLocalStorage<SlackContext> captures the Slack channel, thread, and trigger message for the current turn so phantom_loop can auto-fill them when the agent omits them. Explicit tool args still win.
- Reaction ladder on the operator's original message: hourglass -> cycle -> terminal (check/stop/warning/x). Restart-safe via an iteration === 1 check, with no in-memory flag.
- Inline unicode progress bar in the edited status message.
- New trigger_message_ts column on loops, appended as migration #11.
- Extracted LoopNotifier into src/loop/notifications.ts because runner.ts was already at the 300-line cap.

34 new tests, 938 pass / 0 fail.
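
The auto-fill behaviour can be sketched with Node's `AsyncLocalStorage`. This is an illustration under assumptions, not the PR's code: the `SlackContext` fields, the `LoopArgs` shape, and `fillSlackArgs` are hypothetical; only the idea that context is captured per turn and that explicit tool args win comes from the commit message.

```typescript
import { AsyncLocalStorage } from "node:async_hooks";

// Hypothetical field names for the per-turn Slack context.
interface SlackContext {
  channelId: string;
  threadTs?: string;
  triggerMessageTs?: string;
}

// Hypothetical subset of the phantom_loop tool arguments.
interface LoopArgs {
  goal: string;
  channel_id?: string;
  conversation_id?: string;
  trigger_message_ts?: string;
}

const slackContext = new AsyncLocalStorage<SlackContext>();

// Fills in missing Slack fields from the ambient context; explicit args win.
function fillSlackArgs(args: LoopArgs): LoopArgs {
  const ctx = slackContext.getStore(); // undefined outside a Slack turn
  return {
    ...args,
    channel_id: args.channel_id ?? ctx?.channelId,
    conversation_id: args.conversation_id ?? ctx?.threadTs,
    trigger_message_ts: args.trigger_message_ts ?? ctx?.triggerMessageTs,
  };
}
```

The router would wrap each turn in `slackContext.run(ctx, handler)`, so any tool call made during that turn sees the same context without it being threaded through every function signature.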

…tion

Two defects surfaced during the first Slack end-to-end test of the loop feedback fix:

1. Stop button disappeared after the first tick. Slack's chat.update replaces the message wholesale and strips any blocks the caller does not include. postStartNotice attached the button but postTickUpdate called updateMessage without blocks, so the button was wiped on the first progress edit. Extract buildStatusBlocks() and re-send it on every tick edit. Final notice still omits blocks intentionally so the button disappears when the loop is no longer interruptible.
2. No end-of-loop summary. The agent curates the state.md body every tick (Goal, Progress, Next Action, Notes), but that content never reached the operator. Post it as a threaded reply when the loop finalizes. No extra agent cost: we surface content the agent already wrote. Frontmatter stripped, truncated at 3500 chars, silently skipped if the file is missing or empty.

+7 tests covering both regressions. 945 pass / 0 fail.
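
The summary rules in item 2 (strip frontmatter, truncate at 3500 chars, skip when missing or empty) might look like the sketch below. `formatLoopSummary` is a hypothetical helper, and the assumption that the frontmatter is a leading block delimited by `---` lines is mine; the commit message does not specify the format.

```typescript
const MAX_SUMMARY_CHARS = 3500; // truncation limit stated in the commit message

// Returns the text to post as the threaded summary, or null to skip posting.
// `stateFile` is the raw state.md content, or null when the file is missing.
function formatLoopSummary(stateFile: string | null): string | null {
  if (!stateFile) return null; // missing or empty file: silently skip
  // Assumption: frontmatter is a leading block fenced by lines of three dashes.
  const body = stateFile.replace(/^---\r?\n[\s\S]*?\r?\n---\r?\n?/, "").trim();
  if (body.length === 0) return null; // nothing left after stripping
  return body.length > MAX_SUMMARY_CHARS ? body.slice(0, MAX_SUMMARY_CHARS) : body;
}
```

Returning null rather than an empty string keeps the "skip" decision explicit at the call site.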

…l message

1. Tick update race: postTickUpdate was fire-and-forget, so a stop on tick N+1 could race with tick N's Slack write. If the tick update's HTTP response arrived after postFinalNotice, it overwrote the final message and re-sent the Stop button blocks. Awaiting postTickUpdate serializes Slack writes so finalize always runs after the last tick update completes.
2. Final message now includes the progress bar at its halted position, visually consistent with tick updates. A stopped loop at 3/10 shows the bar frozen at 3/10 with "stopped" instead of a terse one-liner.
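
The race in item 1 can be reproduced with a small simulation. Everything here is a stand-in: `StatusMessage`, `slackUpdate`, and `tickThenFinalize` are hypothetical, and timers replace real Slack HTTP calls. The sketch only shows why awaiting the tick write before finalizing fixes the ordering.

```typescript
// Stand-in for a Slack message that each update overwrites wholesale.
interface StatusMessage {
  text: string;
  hasStopButton: boolean;
}

// Simulates a Slack write whose effect lands after `latencyMs`.
function slackUpdate(msg: StatusMessage, next: StatusMessage, latencyMs: number): Promise<void> {
  return new Promise((resolve) => {
    setTimeout(() => {
      msg.text = next.text;
      msg.hasStopButton = next.hasStopButton;
      resolve();
    }, latencyMs);
  });
}

// Sends one slow tick update (30 ms) followed by a fast final notice (5 ms).
async function tickThenFinalize(msg: StatusMessage, awaitTick: boolean): Promise<void> {
  const tick = slackUpdate(msg, { text: "tick 3/10", hasStopButton: true }, 30);
  if (awaitTick) await tick; // the fix: let the tick write settle before finalizing
  await slackUpdate(msg, { text: "stopped at 3/10", hasStopButton: false }, 5);
  await tick; // drain the late write so the racy outcome is observable
}
```

With `awaitTick` set to false the slow tick write lands last and restores the Stop button; with it set to true the final notice is always the last write.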

…oop ticks

Loop ticks now use Phantom's full intelligence stack instead of running blind:

Phase 1 - Memory context injection: cached once at loop start from the goal, injected into every tick prompt via TickPromptOptions. Cleared on finalize, rebuilt on resume.
Phase 2 - Post-loop evolution and consolidation: bounded transcript accumulation (first tick + rolling 10 summaries + last tick), SessionData synthesis in finalize(), fire-and-forget evolution pipeline and LLM/heuristic memory consolidation with cost-cap guards matching the interactive path.
Phase 3 - Mid-loop critique checkpoints: optional checkpoint_interval param lets the agent request Sonnet 4.6 review every N ticks. Guard requires evolution enabled, LLM judges active, and cost cap not exceeded. Critique is awaited before next tick to avoid race conditions.

Closes #8
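
The bounded transcript in Phase 2 (first tick + rolling 10 summaries + last tick) could be kept with a structure like the one below. This is a sketch, not the PR's recordTranscript: `LoopTranscript` and `recordTick` are hypothetical names, and the split between a full tick transcript and a short summary is an assumption.

```typescript
const ROLLING_LIMIT = 10; // "rolling 10 summaries" from the commit message

// Hypothetical bounded transcript: memory use stays constant however long the loop runs.
interface LoopTranscript {
  firstTick?: string;         // full text of the first tick, kept forever
  rollingSummaries: string[]; // summaries of the most recent ticks only
  lastTick?: string;          // full text of the most recent tick
}

// Returns a new transcript with this tick recorded and old summaries dropped.
function recordTick(t: LoopTranscript, full: string, summary: string): LoopTranscript {
  return {
    firstTick: t.firstTick ?? full, // set once, never overwritten
    rollingSummaries: [...t.rollingSummaries, summary].slice(-ROLLING_LIMIT),
    lastTick: full,
  };
}
```

Keeping the first and last ticks in full preserves the starting goal and the final state, while the rolling window bounds what is sent into the post-loop pipeline.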

- Decouple postLoopDeps so evolution and memory run independently (evolution works when memory is down and vice versa)
- Skip mid-loop critique on terminal ticks to avoid wasted Sonnet calls
- Track judge cost on failure paths via JudgeParseError carrying usage data
- Extract recordTranscript/clamp from runner.ts to post-loop.ts (292 < 300 lines)
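
The third item, tracking judge cost on failure paths, can be illustrated with an error type that carries usage data. Only the name JudgeParseError comes from the commit message; `JudgeUsage`, `parseVerdict`, `runJudge`, and the ledger shape are hypothetical.

```typescript
// Hypothetical usage record returned alongside every judge call.
interface JudgeUsage {
  inputTokens: number;
  outputTokens: number;
  costUsd: number;
}

// Error that keeps the usage of the call whose output could not be parsed.
class JudgeParseError extends Error {
  usage: JudgeUsage;
  constructor(message: string, usage: JudgeUsage) {
    super(message);
    Object.setPrototypeOf(this, JudgeParseError.prototype); // keeps instanceof working on ES5 targets
    this.name = "JudgeParseError";
    this.usage = usage;
  }
}

// Parses the judge's raw JSON output; on failure the spent usage travels with the error.
function parseVerdict(raw: string, usage: JudgeUsage): string {
  try {
    const parsed = JSON.parse(raw) as { verdict?: unknown };
    if (typeof parsed.verdict !== "string") throw new Error("missing verdict");
    return parsed.verdict;
  } catch (err) {
    throw new JudgeParseError(`unparseable judge output: ${String(err)}`, usage);
  }
}

// Adds the call's cost to the ledger on both the success and the failure path.
function runJudge(raw: string, usage: JudgeUsage, ledger: { totalUsd: number }): string | null {
  try {
    const verdict = parseVerdict(raw, usage);
    ledger.totalUsd += usage.costUsd;
    return verdict;
  } catch (err) {
    if (err instanceof JudgeParseError) {
      ledger.totalUsd += err.usage.costUsd; // the failed call still cost money
      return null;
    }
    throw err;
  }
}
```

Without the usage on the error, a run of malformed judge responses would spend budget that the cost cap never sees.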

PR #7 was squash-merged into main while PR #9's branch still had the original commits. Conflicts were all additive - kept PR #9's features (checkpoint_interval, memory context, critique, post-loop pipeline) while adopting main's improved error formatting and race condition comment in the tick update await.

PR #7 was squash-merged into main while this branch still had the original commits. Kept all PR #9 features (checkpoint_interval, memory context, critique, post-loop pipeline) while adopting main's improved error formatting and race condition comment.

Wire setTriggerDeps before startServer so the handler is ready on the first request. Use server.url.origin instead of manually building the URL from server.port which can race in CI. Add a health check fetch to confirm the server is accepting connections before tests run.

electronicBlacksmith · 2026-04-06T23:48:57Z

Superseded by #14 (consolidated clean branch)

* feat(loop): integrate evolution, memory, and mid-loop critique into loop ticks

  Loop ticks now use Phantom's full intelligence stack instead of running blind:

  Phase 1 - Memory context injection: cached once at loop start from the goal, injected into every tick prompt via TickPromptOptions. Cleared on finalize, rebuilt on resume.
  Phase 2 - Post-loop evolution and consolidation: bounded transcript accumulation (first tick + rolling 10 summaries + last tick), SessionData synthesis in finalize(), fire-and-forget evolution pipeline and LLM/heuristic memory consolidation with cost-cap guards matching the interactive path.
  Phase 3 - Mid-loop critique checkpoints: optional checkpoint_interval param lets the agent request Sonnet 4.6 review every N ticks. Guard requires evolution enabled, LLM judges active, and cost cap not exceeded. Critique is awaited before next tick to avoid race conditions.

  Closes #8

* fix(loop): address code review findings from PR #9

  - Decouple postLoopDeps so evolution and memory run independently (evolution works when memory is down and vice versa)
  - Skip mid-loop critique on terminal ticks to avoid wasted Sonnet calls
  - Track judge cost on failure paths via JudgeParseError carrying usage data
  - Extract recordTranscript/clamp from runner.ts to post-loop.ts (292 < 300 lines)

* fix(evolution): support OAuth tokens for LLM judge auth

  resolveJudgeMode() and judge client now check ANTHROPIC_AUTH_TOKEN and CLAUDE_CODE_OAUTH_TOKEN in addition to ANTHROPIC_API_KEY. Enables LLM judges on Max subscription deployments using OAuth bearer tokens.

* docs: add phantom_loop documentation for upstream PR

  Covers MCP tool parameters, state file contract, tick lifecycle, Slack integration, mid-loop critique, post-loop evolution pipeline, memory context injection, and tips for writing effective goals.

  Closes #12

* fix(test): stabilize trigger-auth and judge-activation tests for CI

  trigger-auth: use inline Bun.serve instead of startServer to avoid module-level globals and disk I/O that can race across test files.
  judge-activation: save/restore ANTHROPIC_AUTH_TOKEN and CLAUDE_CODE_OAUTH_TOKEN alongside ANTHROPIC_API_KEY so tests that expect "no credentials" actually clear all auth env vars.

---------

Co-authored-by: electronicBlacksmith <electronicBlacksmith@users.noreply.github.com>
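
The OAuth change and the matching test fix can be sketched together. The three environment variable names come from the commit messages; everything else is an assumption, including the `"llm"`/`"heuristic"` return values, the function name `resolveJudgeModeSketch` (deliberately not the real resolveJudgeMode), and the `withClearedAuth` test helper. The env object is passed in explicitly so the sketch does not touch the real process environment.

```typescript
type JudgeMode = "llm" | "heuristic"; // assumed mode names
type Env = Record<string, string | undefined>;

// Credentials that enable LLM judges, per the commit message.
const JUDGE_AUTH_VARS = [
  "ANTHROPIC_API_KEY",
  "ANTHROPIC_AUTH_TOKEN",
  "CLAUDE_CODE_OAUTH_TOKEN",
] as const;

// Any one non-empty credential is enough to use LLM judges.
function resolveJudgeModeSketch(env: Env): JudgeMode {
  const hasCredential = JUDGE_AUTH_VARS.some((name) => (env[name] ?? "").trim() !== "");
  return hasCredential ? "llm" : "heuristic";
}

// Test helper: clears every auth variable for the duration of fn, then restores them.
function withClearedAuth<T>(env: Env, fn: () => T): T {
  const saved = JUDGE_AUTH_VARS.map((name) => [name, env[name]] as const);
  for (const name of JUDGE_AUTH_VARS) delete env[name];
  try {
    return fn();
  } finally {
    for (const [name, value] of saved) {
      if (value === undefined) delete env[name];
      else env[name] = value;
    }
  }
}
```

Driving both the resolver and the test helper from one list of variable names avoids the situation the test fix addresses, where a "no credentials" test clears only one of the three variables.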

electronicBlacksmith added 5 commits April 5, 2026 22:07

electronicBlacksmith mentioned this pull request Apr 6, 2026

Loop ticks missing evolved config and feedback buttons #11

Open

electronicBlacksmith self-assigned this Apr 6, 2026

electronicBlacksmith mentioned this pull request Apr 6, 2026

docs: add phantom_loop documentation #13

Closed

9 tasks

electronicBlacksmith added 2 commits April 6, 2026 23:11

electronicBlacksmith mentioned this pull request Apr 6, 2026

feat(loop): evolution, critique, OAuth auth, and documentation #14

Merged

5 tasks

electronicBlacksmith closed this Apr 6, 2026

electronicBlacksmith deleted the worktree-fix+loop-slack-feedback branch April 6, 2026 23:49

electronicBlacksmith wants to merge 7 commits into main from worktree-fix+loop-slack-feedback
