diff --git a/.dev/roadmap/memory-consolidation/current-progress-and-next-steps.md b/.dev/roadmap/memory-consolidation/current-progress-and-next-steps.md
index 490917b..3745b42 100644
--- a/.dev/roadmap/memory-consolidation/current-progress-and-next-steps.md
+++ b/.dev/roadmap/memory-consolidation/current-progress-and-next-steps.md
@@ -1,20 +1,20 @@
 # Memory Consolidation Current Progress and Next Steps
 
 Status: AI-authored draft. Not yet human-approved.
-Last updated: 2026-05-13 00:07 KST
+Last updated: 2026-05-13 00:53 KST
 
-## v0.1.141 + G5d completed checkpoint and next five-step runway
+## v0.1.142 + G5e completed checkpoint and next five-step runway
 
-This document is the restartable checkpoint after the v0.1.141 release/runtime rollout, fresh G4 diagnostics, merged G5a/G5b/G5c reviewed-candidate/scoring runway, and completed G5d read-only repeated activation -> reinforcement refinement preview.
+This document is the restartable checkpoint after the v0.1.142 release/runtime rollout, fresh G4 diagnostics, merged G5a/G5b/G5c/G5d reviewed-candidate/scoring/reinforcement runway, and completed G5e read-only stale weak evidence -> decay/collapse candidate preview.
 
 Current verified release state:
 
-- Release: `v0.1.141`.
-- GitHub Release: `https://github.com/cafitac/agent-memory/releases/tag/v0.1.141`.
-- npm: `@cafitac/agent-memory@0.1.141`.
-- PyPI: `cafitac-agent-memory==0.1.141`.
-- Runtime: `/Users/reddit/.agent-memory/runtime/v0.1.141/.venv/bin/agent-memory`.
-- Runtime smoke report: `/Users/reddit/.agent-memory/runtime/v0.1.141/g5d-live-smoke.json`.
+- Release: `v0.1.142`.
+- GitHub Release: `https://github.com/cafitac/agent-memory/releases/tag/v0.1.142`.
+- npm: `@cafitac/agent-memory@0.1.142`.
+- PyPI: `cafitac-agent-memory==0.1.142`.
+- Runtime: `/Users/reddit/.agent-memory/runtime/v0.1.142/.venv/bin/agent-memory`.
+- Runtime smoke report: `/Users/reddit/.agent-memory/runtime/v0.1.142/g5e-live-smoke.json`.
 - Fresh G4 report directory retained: `/Users/reddit/.agent-memory/reports/g4-v0138-20260512-132253/`.
 
 Fresh diagnostics:
@@ -23,25 +23,25 @@ Fresh diagnostics:
 - `fresh-epoch-v0138.json`: `quality_gate.pass=true`, decision `fresh_epoch_ready_to_compare_against_historical`.
 - `g4-review-queue-preview-v0138-fresh.json`: `quality_gate.pass=true`, decision `review_queue_ready_for_manual_review`, `read_only=true`, `mutated=false`.
 - `scheduled-dry-run.json`: historical/full-window broad G4 still blocks on `trace_quality_needs_more_dogfooding`, `decay_risk_above_threshold`, and `background_quality_warnings_present`.
-- G5a/G5b/G5c/G5d: `dogfood trace-cluster-preview`, `dogfood trace-candidate-persist/list/update/apply`, read-only `review_score`/`review_recommendation`, and `dogfood reinforcement-refinement-preview` are merged/released through v0.1.141.
-- G5d reinforcement-refinement preview does not persist review state, increment reinforcement counts, promote memories, auto-approve ordinary conversation, or change retrieval defaults. Full release CI, publish, manual true-distribution smoke, and live Hermes runtime rollout passed.
+- G5a/G5b/G5c/G5d/G5e: `dogfood trace-cluster-preview`, `dogfood trace-candidate-persist/list/update/apply`, read-only `review_score`/`review_recommendation`, `dogfood reinforcement-refinement-preview`, and `dogfood decay-collapse-preview` are merged/released through v0.1.142.
+- G5e decay-collapse preview does not persist review state, delete/deprecate/collapse memories, promote memories, auto-approve ordinary conversation, or change retrieval defaults. Full release CI, publish, manual true-distribution smoke, and live Hermes runtime rollout passed.
 
 Progress estimate:
 
-- Overall north-star: 60-62%.
-- Substrate/evidence plumbing: about 74-76%.
-- Safe automatic mutation/promotion: about 42-45%.
-- Remaining work: about 38-40% overall.
+- Overall north-star: 62-64%.
+- Substrate/evidence plumbing: about 75-77%.
+- Safe automatic mutation/promotion: about 43-46%.
+- Remaining work: about 36-38% overall.
 
 Current interpretation:
 
-Fresh v0.1.141 evidence and merged G5a/G5b/G5c/G5d are healthy enough to continue the brain-like reviewed-candidate runway. Broad G4/background apply remains blocked. G5d added repeated activation -> reinforcement refinement candidates as a ref-safe preview only; it is not approval for persistence, promotion, auto-approval, reinforcement mutation, decay, supersession, or retrieval-default changes.
+Fresh v0.1.142 evidence and merged G5a/G5b/G5c/G5d/G5e are healthy enough to continue the brain-like reviewed-candidate runway. Broad G4/background apply remains blocked. G5e added stale weak evidence -> decay/collapse candidates as a ref-safe preview only; it is not approval for persistence, promotion, auto-approval, reinforcement mutation, decay/collapse mutation, supersession, or retrieval-default changes.
 
 Recommended sequence from here:
 
-1. Start G5e: stale weak evidence -> decay/collapse candidate preview, still preview/review-first and read-only.
-2. Preserve G5e safety shape: review scores and recommendations remain review-priority signals only; they must not delete, decay, collapse, persist review state, promote memories, auto-approve ordinary conversation, or change retrieval defaults.
-3. If a later G5d/G5e slice introduces mutation, keep it behind a separate explicit apply policy with backup, audit, approval phrase, actor, reason hash, and rollback.
+1. Start conflict -> supersession/replacement candidate preview, still preview/review-first and read-only.
+2. Preserve G5d/G5e safety shape: review scores/recommendations, reinforcement refinement, and decay/collapse candidates remain review-priority signals only; they must not delete, decay, collapse, persist review state, promote memories, auto-approve ordinary conversation, or change retrieval defaults.
+3. If a later conflict/supersession or decay/collapse slice introduces mutation, keep it behind a separate explicit apply policy with backup, audit, approval phrase, actor, reason hash, and rollback.
 4. G4 broad apply contract: preserve explicit policy/approval/actor/reason/backup/expected-queue/audit/rollback requirements and keep raw-content/default-retrieval/ordinary-auto-approval forbidden.
 5. Historical telemetry reconciliation: use only a reviewed telemetry-only `telemetry-reset-v1` corridor for historical rows older than a selected epoch; protected memory tables must not mutate.
 
diff --git a/.dev/status/current-handoff.md b/.dev/status/current-handoff.md
index 005d637..73278cf 100644
--- a/.dev/status/current-handoff.md
+++ b/.dev/status/current-handoff.md
@@ -1,43 +1,43 @@
 # agent-memory current handoff
 
 Status: AI-authored draft. Not yet human-approved.
-Last updated: 2026-05-13 00:07 KST
+Last updated: 2026-05-13 00:53 KST
 
-## v0.1.141 + G5d completed checkpoint
+## v0.1.142 + G5e completed checkpoint
 
 Use `.dev/status/next-agent-memory-action.md` as the shortest current source of truth.
 
 Current verified state:
 
-- Latest completed release/runtime rollout: `v0.1.141`.
-- Runtime: `/Users/reddit/.agent-memory/runtime/v0.1.141/.venv/bin/agent-memory`.
-- Runtime smoke report: `/Users/reddit/.agent-memory/runtime/v0.1.141/g5d-live-smoke.json`.
-- GitHub Release: `https://github.com/cafitac/agent-memory/releases/tag/v0.1.141`.
-- npm/PyPI latest verified as `0.1.141`.
-- Hermes configs updated from v0.1.140 to v0.1.141 and backed up as `/Users/reddit/.hermes/config.yaml.bak-v0141-20260513T000411` plus matching profile backups.
+- Latest completed release/runtime rollout: `v0.1.142`.
+- Runtime: `/Users/reddit/.agent-memory/runtime/v0.1.142/.venv/bin/agent-memory`.
+- Runtime smoke report: `/Users/reddit/.agent-memory/runtime/v0.1.142/g5e-live-smoke.json`.
+- GitHub Release: `https://github.com/cafitac/agent-memory/releases/tag/v0.1.142`.
+- npm/PyPI latest verified as `0.1.142`.
+- Hermes configs updated from v0.1.141 to v0.1.142 and backed up as `/Users/reddit/.hermes/config.yaml.bak-v0142-20260512T155012Z` plus matching profile backups.
 - Hermes hook doctor is green across default, `personal-oss`, `earlypay`, and `infra-admin` profiles after `--accept-hooks` smoke.
 - Fresh G4 report directory retained: `/Users/reddit/.agent-memory/reports/g4-v0138-20260512-132253/`.
 - Fresh linkage diagnosis retained from G4 diagnostics: `g4-linkage-gap-diagnose-v0138-fresh.json` passed with decision `fresh_trace_linkage_gap_not_detected`.
 - Fresh epoch readiness retained: `fresh-epoch-v0138.json` passed with decision `fresh_epoch_ready_to_compare_against_historical`.
 - Fresh review queue preview retained: `g4-review-queue-preview-v0138-fresh.json` passed with decision `review_queue_ready_for_manual_review`, `read_only=true`, and `mutated=false`.
-- G5a/G5b/G5c/G5d source checkpoint: `dogfood trace-cluster-preview`, `dogfood trace-candidate-persist/list/update/apply`, read-only trace-cluster scoring, and `dogfood reinforcement-refinement-preview` are merged/released through v0.1.141.
-- G5d is merged/released via PR #302 and v0.1.141: repeated activation -> reinforcement refinement preview emits read-only/ref-safe candidates with `review_score` and `review_recommendation`; it writes JSON reports only, keeps `mutated=false`, and does not increment reinforcement counts, persist review state, promote memories, auto-approve ordinary conversation, or change retrieval defaults.
+- G5a/G5b/G5c/G5d/G5e source checkpoint: `dogfood trace-cluster-preview`, `dogfood trace-candidate-persist/list/update/apply`, read-only trace-cluster scoring, `dogfood reinforcement-refinement-preview`, and `dogfood decay-collapse-preview` are merged/released through v0.1.142.
+- G5e is merged/released via PR #306 and v0.1.142: stale weak evidence -> decay/collapse candidate preview emits read-only/ref-safe candidates and guardrails; it writes JSON reports only, keeps `mutated=false`, and does not persist review state, delete/deprecate/collapse memories, auto-approve ordinary conversation, or change retrieval defaults.
 - Historical scheduled dry-run still blocks broad G4/background apply on `trace_quality_needs_more_dogfooding`, `decay_risk_above_threshold`, and `background_quality_warnings_present`.
 - Broad G4/background apply remains blocked until the contract, historical reconciliation, narrow reviewed apply, decay/collapse, conflict/supersession, ranking eval, and rollback runway are verified.
 
 Progress estimate:
 
-- Overall north-star: 60-62%.
-- Substrate/evidence plumbing: about 74-76%.
-- Safe automatic mutation/promotion: about 42-45%.
-- Remaining work: about 38-40% overall.
+- Overall north-star: 62-64%.
+- Substrate/evidence plumbing: about 75-77%.
+- Safe automatic mutation/promotion: about 43-46%.
+- Remaining work: about 36-38% overall.
 
 Current interpretation:
 
-- The fresh hook/runtime linkage blocker is resolved for v0.1.138-v0.1.141-era evidence.
-- G5d completes another review-first brain-like signal loop, but it is not approval for automatic memory creation or reinforcement mutation.
-- Broad G4/background apply remains blocked; fresh readiness, reviewed candidate apply support, G5c scoring, and G5d reinforcement-refinement preview do not authorize automatic memory creation.
-- The next safe sequence is stale weak evidence -> decay/collapse candidate preview as review/preview-first G5e work; keep G4 broad apply and historical reconciliation as separate guarded corridors.
+- The fresh hook/runtime linkage blocker is resolved for v0.1.138-v0.1.142-era evidence.
+- G5e completes another review-first brain-like lifecycle signal loop, but it is not approval for automatic memory creation, decay/collapse mutation, or reinforcement mutation.
+- Broad G4/background apply remains blocked; fresh readiness, reviewed candidate apply support, G5c scoring, G5d reinforcement-refinement preview, and G5e decay-collapse preview do not authorize automatic memory creation.
+- The next safe sequence is conflict -> supersession/replacement candidate preview as review/preview-first work; keep G4 broad apply and historical reconciliation as separate guarded corridors.
 - Existing broad-G4 baseline remains a docs/RED-test-only guardrail; do not advertise broad G4 consolidation apply mode as ready.
 
 Current safe mutation boundaries:
@@ -52,8 +52,8 @@ Brain-like next design axis:
 - `candidate -> reviewed fact/procedure/preference promotion` is available only through explicit G5b review/apply commands.
 - `trace cluster -> review-priority scoring` is released G5c and remains human-review-only.
 - `repeated activation -> reinforcement refinement preview` is released G5d and remains human-review-only; preview scores are not apply approval.
-- Next: stale weak evidence -> decay/collapse candidate preview, read-only/ref-safe.
-- Later: conflict -> supersession review.
+- `stale weak evidence -> decay/collapse candidate preview` is released G5e and remains human-review-only; candidates are not delete/deprecate/collapse approval.
+- Next: conflict -> supersession/replacement candidate preview, read-only/ref-safe.
 - Retrieval ranking changes only behind opt-in eval before any default change.
 
 ---
@@ -77,30 +77,30 @@ For prompts like "다음으로 뭐해야 해?" or "다음 할 거 추천해줘",
 
 - `.dev/status/next-agent-memory-action.md`
 
-Current recommendation: start the next G5d repeated activation -> reinforcement refinement slice as review/preview-first work. Broad G4/background apply remains blocked by historical scheduled-dry-run debt and must not be enabled from a generic continuation prompt.
+Current recommendation: start the next conflict -> supersession/replacement candidate preview slice as review/preview-first work. Broad G4/background apply remains blocked by historical scheduled-dry-run debt and must not be enabled from a generic continuation prompt.
 
 ## Ready-to-say answer
 
-agent-memory is currently released/runtime-verified through `v0.1.140`. The installed Hermes hooks point at `/Users/reddit/.agent-memory/runtime/v0.1.140/.venv/bin/agent-memory`; package smoke reports `agent_memory.__version__ == 0.1.140`. G5a/G5b/G5c are merged/released for ref-safe trace-cluster preview, explicit reviewed trace-candidate persist/list/update/apply, and read-only `review_score`/`review_recommendation` signals for ref-safe clusters.
+agent-memory is currently released/runtime-verified through `v0.1.142`. The installed Hermes hooks point at `/Users/reddit/.agent-memory/runtime/v0.1.142/.venv/bin/agent-memory`; package smoke reports `agent_memory.__version__ == 0.1.142`. G5a-G5e are merged/released for ref-safe trace-cluster preview, explicit reviewed trace-candidate persist/list/update/apply, read-only `review_score`/`review_recommendation` signals, repeated activation -> reinforcement refinement preview, and stale weak evidence -> decay/collapse candidate preview.
 
 Broad G4/background consolidation apply mode remains blocked. Fresh linkage diagnostics passed, but historical scheduled-dry-run still blocks on `trace_quality_needs_more_dogfooding`, `decay_risk_above_threshold`, and `background_quality_warnings_present`. The first mutating review-queue/candidate corridors remain deliberately narrow and require explicit policy, approval phrase, actor, reason hash, backup, audit row, and rollback hint.
 
-Historical G4 contract checkpoint remains docs/RED-test-only: PR #200, PR #202, PR #204, v0.1.99 runtime `/Users/reddit/.agent-memory/runtime/v0.1.99/.venv/bin/agent-memory`, and report `/Users/reddit/.agent-memory/reports/v0.1.99-runtime-qa-20260507T074118` are retained as the broad-G4-blocked baseline. Later releases hardened only narrow cleanup/restore/audit safety corridors, blocker diagnostics, and future trace/observation quality; they did not enable broad background consolidation mutation.
+Historical G4 contract checkpoint remains docs/RED-test-only: PR #200, PR #202, PR #204, v0.1.99 runtime `/Users/reddit/.agent-memory/runtime/v0.1.99/.venv/bin/agent-memory`, and report `/Users/reddit/.agent-memory/reports/v0.1.99-runtime-qa-20260507T074118` are retained as the broad-G4-blocked baseline. Later releases hardened only narrow cleanup/restore/audit safety corridors, blocker diagnostics, fresh linkage, reviewed candidates, reinforcement review signals, and decay/collapse review signals; they did not enable broad background consolidation mutation.
 
 ## Current next slice
 
-Completed release baseline: G4 fresh linkage/mutation safety landed by v0.1.136; G5a/G5b/G5c reviewed-candidate/scoring runway is released through v0.1.140; current active slice is the next G5d repeated activation -> reinforcement refinement, review/preview-first.
+Completed release baseline: G4 fresh linkage/mutation safety landed by v0.1.136; G5a-G5e reviewed-candidate/scoring/reinforcement/decay preview runway is released through v0.1.142; current active slice is the next conflict -> supersession/replacement candidate preview, review/preview-first.
 
-Current slice status: v0.1.140 is installed and live-smoked. G5c is complete and still read-only: it improves review prioritization for trace clusters without enabling broad background consolidation apply, telemetry reset apply, ordinary conversation auto-approval, raw transcript/query storage, or default retrieval ranking changes.
+Current slice status: v0.1.142 is installed and live-smoked. G5e is complete and still read-only: it improves review prioritization for stale weak evidence and decay/collapse candidates without enabling broad background consolidation apply, telemetry reset apply, decay/delete/collapse mutation, ordinary conversation auto-approval, raw transcript/query storage, or default retrieval ranking changes.
 
-Target shape for this slice:
+Target shape for the next slice:
 
-- `agent-memory dogfood fresh-epoch --epoch-start ` emits `dogfood_fresh_epoch_readiness` with `read_only=true`, `mutated=false`, and `automation_policy.apply_supported=false`.
-- The report excludes historical rows by timestamp and reports only aggregate counts for observation/trace/activation coverage, empty retrieval outcomes, trace distributions, candidate signals, and historical rows excluded.
+- `agent-memory dogfood supersession-preview ` or equivalent emits a read-only/ref-safe conflict/supersession candidate report with `read_only=true`, `mutated=false`, `default_retrieval_unchanged=true`, and `automation_policy.apply_supported=false`.
+- The report identifies same-claim-slot conflicts, replacement/supersedes chains, lifecycle status context, and copy-paste review commands using refs and aggregate counts only.
 - No raw prompt/query/transcript/trace summary/sample values are printed.
-- The first live source smoke against `/Users/reddit/.agent-memory/memory.db` with epoch `2026-05-09T21:57:33Z` wrote `/tmp/agent-memory-fresh-epoch-v0128-source.json`: 21 observations, 21 traces, 21 activations, coverage ratio 0.2381, 10 empty retrievals, blocked by `low_epoch_observation_trace_coverage` and `epoch_empty_retrieval_outcome_unknown`; no mutation.
+- The first live G5e smoke against `/Users/reddit/.agent-memory/memory.db` wrote `/Users/reddit/.agent-memory/runtime/v0.1.142/g5e-live-smoke.json`: `read_only=true`, `mutated=false`, default retrieval unchanged, candidate count `0`, blocked only by `no_decay_collapse_candidates_ready`; no mutation.
 
-Next safe slice: continue repeated activation -> reinforcement as preview/review-first G5d work. Do not live-apply queue/candidate mutations without an explicit operator decision and the exact guarded command shape. Broad G4 apply remains a separate, still-blocked slice.
+Next safe slice: continue conflict -> supersession/replacement as preview/review-first work. Do not live-apply queue/candidate mutations without an explicit operator decision and the exact guarded command shape. Broad G4 apply remains a separate, still-blocked slice.
 
 Recommended local backup commands before any future live mutation:
 
diff --git a/.dev/status/next-agent-memory-action.md b/.dev/status/next-agent-memory-action.md
index 35a6305..8981a33 100644
--- a/.dev/status/next-agent-memory-action.md
+++ b/.dev/status/next-agent-memory-action.md
@@ -1,7 +1,7 @@
 # agent-memory next action
 
 Status: AI-authored draft. Not yet human-approved.
-Last updated: 2026-05-13 00:07 KST
+Last updated: 2026-05-13 00:53 KST
 
 ## Use this first when the user asks
 
@@ -16,7 +16,7 @@ Then verify the repo/runtime state briefly and answer from the recommendation be
 
 ## One-sentence current state
 
-`agent-memory` is released and live-runtime-smoked through `v0.1.141`; the installed Hermes hooks are healthy on the v0.1.141 runtime across default, personal-oss, earlypay, and infra-admin profiles. Fresh linkage diagnostics no longer show a hook linkage bug, G5a/G5b/G5c/G5d are merged/released for ref-safe trace-cluster previews, reviewed trace-candidate persist/list/update/apply, read-only review scoring, and repeated activation -> reinforcement refinement preview. Broad G4/background apply remains blocked.
+`agent-memory` is released and live-runtime-smoked through `v0.1.142`; the installed Hermes hooks are healthy on the v0.1.142 runtime across default, personal-oss, earlypay, and infra-admin profiles. Fresh linkage diagnostics no longer show a hook linkage bug, and G5a-G5e are merged/released for ref-safe trace-cluster preview, reviewed trace-candidate persist/list/update/apply, read-only review scoring, repeated activation -> reinforcement refinement preview, and stale weak evidence -> decay/collapse candidate preview. Broad G4/background apply remains blocked.
 
 ## Current progress estimate toward the north-star
 
@@ -24,41 +24,42 @@ The north-star is a human-memory-like, mostly automatic, graph-based memory cons
 
 Approximate progress:
 
-- Overall north-star: 60-62%.
-- Substrate/evidence plumbing: about 74-76%.
-- Safe automatic mutation/promotion: about 42-45%.
-- Remaining work: about 38-40% overall, concentrated in guarded apply, conflict/supersession, decay/collapse, ranking evaluation, and rollback confidence.
+- Overall north-star: 62-64%.
+- Substrate/evidence plumbing: about 75-77%.
+- Safe automatic mutation/promotion: about 43-46%.
+- Remaining work: about 36-38% overall, concentrated in guarded apply, conflict/supersession, decay/collapse apply, ranking evaluation, and rollback confidence.
 
 Reasoning:
 
-- Done: trace substrate, retrieval observations, activation/reinforcement/decay evidence, graph/review primitives, background dry-runs, fresh-epoch comparison, persisted review queue, first narrow approved mutation (`apply_reinforcement_marker`), fresh linkage health, G5a ref-safe `trace cluster -> consolidation candidate` preview, G5b reviewed trace-candidate persist/list/update/apply for explicit fact/preference/procedure promotion, G5c read-only cluster scoring, and G5d read-only repeated activation -> reinforcement refinement preview.
+- Done: trace substrate, retrieval observations, activation/reinforcement/decay evidence, graph/review primitives, background dry-runs, fresh-epoch comparison, persisted review queue, first narrow approved mutation (`apply_reinforcement_marker`), fresh linkage health, G5a ref-safe `trace cluster -> consolidation candidate` preview, G5b reviewed trace-candidate persist/list/update/apply for explicit fact/preference/procedure promotion, G5c read-only cluster scoring, G5d read-only repeated activation -> reinforcement refinement preview, and G5e read-only stale weak evidence -> decay/collapse candidate preview.
 - Not done: broad background consolidation apply, automatic long-term memory promotion, conflict-aware automatic supersession, weak-trace decay/collapse apply, automatic graph-cluster-to-fact/procedure/preference generation, retrieval-ranking changes behind eval, and large-scope rollback confidence.
 
 ## Latest verified checkpoint
 
-- Release: `v0.1.141`
-- GitHub Release: `https://github.com/cafitac/agent-memory/releases/tag/v0.1.141`
-- npm: `@cafitac/agent-memory@0.1.141`
-- PyPI: `cafitac-agent-memory==0.1.141`
-- Runtime: `/Users/reddit/.agent-memory/runtime/v0.1.141/.venv/bin/agent-memory`
-- Runtime smoke report: `/Users/reddit/.agent-memory/runtime/v0.1.141/g5d-live-smoke.json`
-- Hermes config backups from v0.1.141 rollout: `/Users/reddit/.hermes/config.yaml.bak-v0141-20260513T000411` plus matching `personal-oss`, `earlypay`, and `infra-admin` profile backups.
+- Release: `v0.1.142`
+- GitHub Release: `https://github.com/cafitac/agent-memory/releases/tag/v0.1.142`
+- npm: `@cafitac/agent-memory@0.1.142`
+- PyPI: `cafitac-agent-memory==0.1.142`
+- Runtime: `/Users/reddit/.agent-memory/runtime/v0.1.142/.venv/bin/agent-memory`
+- Runtime smoke report: `/Users/reddit/.agent-memory/runtime/v0.1.142/g5e-live-smoke.json`
+- Hermes config backups from v0.1.142 rollout: `/Users/reddit/.hermes/config.yaml.bak-v0142-20260512T155012Z` plus matching `personal-oss`, `earlypay`, and `infra-admin` profile backups.
 - Fresh report directory retained from G4 diagnostics: `/Users/reddit/.agent-memory/reports/g4-v0138-20260512-132253/`
 - Fresh linkage diagnosis retained: `/Users/reddit/.agent-memory/reports/g4-v0138-20260512-132253/g4-linkage-gap-diagnose-v0138-fresh.json`
 - Fresh epoch readiness retained: `/Users/reddit/.agent-memory/reports/g4-v0138-20260512-132253/fresh-epoch-v0138.json`
 - Fresh review queue preview retained: `/Users/reddit/.agent-memory/reports/g4-v0138-20260512-132253/g4-review-queue-preview-v0138-fresh.json`
 - Historical scheduled dry-run retained: `/Users/reddit/.agent-memory/reports/g4-v0138-20260512-132253/scheduled-dry-run.json`
-- Source G5a/G5b/G5c/G5d checkpoint: `dogfood trace-cluster-preview`, `dogfood trace-candidate-persist/list/update/apply`, read-only `review_score`/`review_recommendation`, and `dogfood reinforcement-refinement-preview` are merged and released through v0.1.141.
-- Release/published-install smoke passed; runtime rollout is doctor-green across default, personal-oss, earlypay, and infra-admin Hermes profiles.
+- Source G5a/G5b/G5c/G5d/G5e checkpoint: `dogfood trace-cluster-preview`, `dogfood trace-candidate-persist/list/update/apply`, read-only `review_score`/`review_recommendation`, `dogfood reinforcement-refinement-preview`, and `dogfood decay-collapse-preview` are merged and released through v0.1.142.
+- Release/published-install smoke passed; manual true-distribution PyPI/npm smoke passed; runtime rollout is doctor-green across default, personal-oss, earlypay, and infra-admin Hermes profiles.
 
 ## Current blocker
 
-Fresh v0.1.141 runtime plus v0.1.138 fresh telemetry evidence are healthy enough for continued brain-like reviewed-candidate planning:
+Fresh v0.1.142 runtime plus v0.1.138 fresh telemetry evidence are healthy enough for continued brain-like reviewed-candidate planning:
 
 - `g4-linkage-gap-diagnose-v0138-fresh.json`: quality gate pass, decision `fresh_trace_linkage_gap_not_detected`, observation/trace linkage coverage `1.0`, unlinked observations `0`.
 - `fresh-epoch-v0138.json`: quality gate pass, decision `fresh_epoch_ready_to_compare_against_historical`.
 - `g4-review-queue-preview-v0138-fresh.json`: quality gate pass, decision `review_queue_ready_for_manual_review`, `read_only=true`, `mutated=false`.
 - `g5d-live-smoke.json`: quality gate decision `reinforcement_refinement_preview_ready_for_human_review`, `read_only=true`, `mutated=false`, candidate count `1`.
+- `g5e-live-smoke.json`: quality gate decision `continue_decay_collapse_dogfooding_before_review`, `read_only=true`, `mutated=false`, candidate count `0`, default retrieval unchanged.
 
 However, historical scheduled-dry-run still blocks broad G4/background apply on:
 
@@ -72,10 +73,10 @@ Interpretation: this is no longer a fresh hook linkage bug. It is historical tel
 
 Proceed in this sequence:
 
-1. Start G5e as review/preview-first work: stale weak evidence -> decay/collapse candidate preview. It should be read-only, ref-safe, and should not delete, decay, collapse, or rewrite any memory by default.
-2. Keep G5d/G5e semantics narrow: review scores and recommendations are review-priority signals only; they do not persist review state, increment/decrement reinforcement, promote memories, auto-approve ordinary conversation, or change retrieval defaults.
-3. Add explicit contract tests for future G5e safety: `read_only=true`, `mutated=false`, `apply_supported=false`, no raw prompt/query/transcript/sample output, and protected memory tables unchanged.
-4. If a later G5d/G5e apply slice needs mutation, add only a separate explicit narrow apply policy with backup/audit/rollback; generic continuation does not authorize it.
+1. Start the next review-first slice after G5e: conflict -> supersession/replacement candidate preview, read-only/ref-safe. This should identify same-claim-slot conflicts and replacement chains for human review without mutating facts, relations, status, retrieval ranking, or prompts.
+2. Keep G5d/G5e semantics narrow: review scores/recommendations, reinforcement refinement, and decay/collapse candidates are review-priority signals only; they do not persist review state, increment/decrement reinforcement, promote memories, auto-approve ordinary conversation, or change retrieval defaults.
+3. Add explicit contract tests for future conflict/supersession safety: `read_only=true`, `mutated=false`, `apply_supported=false`, no raw prompt/query/transcript/sample output, and protected memory tables unchanged.
+4. If a later conflict/supersession or decay/collapse apply slice needs mutation, add only a separate explicit narrow apply policy with backup/audit/rollback; generic continuation does not authorize it.
 5. G4 broad apply contract remains blocked/guardrail-only. Required future shape: explicit policy, approval phrase, actor, reason hash, backup path, expected queue ids/hash, audit row, rollback hint, raw-content exclusion, and ordinary-conversation auto-approval forbidden.
 6. Historical telemetry reconciliation remains a separate reviewed `telemetry-reset-v1` corridor for historical telemetry rows older than a chosen epoch; never delete facts/procedures/episodes/relations/source records/status history.
 
@@ -83,7 +84,7 @@ Proceed in this sequence:
 
 Do not start with live broad G4/background apply.
 
-Do not treat fresh linkage health, G5b reviewed candidate apply support, G5c review scores, or G5d reinforcement-refinement preview as approval for automatic memory creation. They only make the review runway safer and more inspectable.
+Do not treat fresh linkage health, G5b reviewed candidate apply support, G5c review scores, G5d reinforcement-refinement preview, or G5e decay-collapse preview as approval for automatic memory creation. They only make the review runway safer and more inspectable.
 
 Do not live-apply persisted queue/candidate mutations unless the operator intentionally uses the exact guarded command shape with backup, policy, approval phrase, actor, and reason. Generic continuation does not authorize broad apply, ordinary conversation auto-approval, raw transcript storage, decay/delete, promotion, supersession, retrieval-ranking changes, or treating review scores as apply approval.
 
@@ -93,21 +94,21 @@ Do not silently delete, reset, or rewrite telemetry. Historical reconciliation m
 
 If asked "다음으로 뭐해야 해?", answer:
 
-> 지금은 v0.1.141까지 릴리즈/설치/스모크가 끝났고 Hermes hook도 default/personal-oss/earlypay/infra-admin 전부 doctor-green입니다. G5a/G5b/G5c/G5d는 merged/released이고, G5d는 repeated activation -> reinforcement refinement를 read-only preview로 보여주는 단계입니다. 전체 목표 대비 대략 60-62% 정도 왔고, 남은 38-40%는 자동 apply/승격/decay/supersession/rollback 쪽입니다. 다음은 G5e stale weak evidence -> decay/collapse candidate preview를 read-only로 여는 게 맞습니다. broad G4/background apply는 historical scheduled-dry-run debt 때문에 아직 금지입니다.
+> 지금은 v0.1.142까지 릴리즈/설치/스모크가 끝났고 Hermes hook도 default/personal-oss/earlypay/infra-admin 전부 doctor-green입니다. G5a-G5e는 merged/released이고, G5e는 stale weak evidence -> decay/collapse candidate를 read-only preview로 보여주는 단계입니다. 전체 목표 대비 대략 62-64% 정도 왔고, 남은 36-38%는 자동 apply/승격/decay/supersession/rollback 쪽입니다. 다음은 conflict -> supersession/replacement candidate preview를 read-only로 여는 게 맞습니다. broad G4/background apply는 historical scheduled-dry-run debt 때문에 아직 금지입니다.
 
 ## Quick verification commands
 
 ```bash
 cd /Users/reddit/Project/agent-memory
 git status --short --branch
-/Users/reddit/.agent-memory/runtime/v0.1.141/.venv/bin/python - <<'PY'
+/Users/reddit/.agent-memory/runtime/v0.1.142/.venv/bin/python - <<'PY'
 import agent_memory
 print(agent_memory.__version__)
 PY
-/Users/reddit/.agent-memory/runtime/v0.1.141/.venv/bin/agent-memory dogfood reinforcement-refinement-preview \
+/Users/reddit/.agent-memory/runtime/v0.1.142/.venv/bin/agent-memory dogfood decay-collapse-preview \
   /Users/reddit/.agent-memory/memory.db \
-  --limit 20 --top 3 --frequent-threshold 3 \
-  --output /tmp/agent-memory-next-g5d-reinforcement-refinement-preview.json
+  --limit 200 --top 10 --min-decay-score 0.5 \
+  --output /tmp/agent-memory-next-g5e-decay-collapse-preview.json
 ```
 
-Expected: read-only/no-mutation. G5d should remain `reinforcement_refinement_preview_ready_for_human_review`; fresh linkage should remain `fresh_trace_linkage_gap_not_detected`; broad apply remains blocked until the reviewed contract/reconciliation/apply runway is explicitly completed.
+Expected: read-only/no-mutation. G5e should remain a decay/collapse review-priority preview (`read_only=true`, `mutated=false`, default retrieval unchanged); fresh linkage should remain `fresh_trace_linkage_gap_not_detected`; broad apply remains blocked until the reviewed contract/reconciliation/apply runway is explicitly completed.
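Review note: the read-only expectations in the status docs (`read_only=true`, `mutated=false`, `automation_policy.apply_supported=false`) could also be checked mechanically against the JSON report written by `--output`. The following is a minimal sketch, not part of this change. It assumes the report exposes those flags as top-level keys with a nested `automation_policy` object, which the docs suggest but this diff does not confirm; `preview_contract_violations` and `load_report` are hypothetical helper names.

```python
import json
from pathlib import Path

# Flag names are quoted from the status docs; the real report schema is an assumption.
EXPECTED_FLAGS = {"read_only": True, "mutated": False}


def preview_contract_violations(report: dict) -> list[str]:
    """Return human-readable violations of the read-only preview contract."""
    violations = []
    for key, expected in EXPECTED_FLAGS.items():
        actual = report.get(key)
        if actual is not expected:
            violations.append(f"{key} should be {expected}, got {actual!r}")
    # Assumed nesting: automation_policy.apply_supported must be explicitly False.
    policy = report.get("automation_policy")
    if not isinstance(policy, dict) or policy.get("apply_supported") is not False:
        violations.append("automation_policy.apply_supported should be False")
    return violations


def load_report(path: str) -> dict:
    """Load a JSON preview report from disk."""
    return json.loads(Path(path).read_text(encoding="utf-8"))
```

If the report shape matches these assumptions, `preview_contract_violations(load_report("/tmp/agent-memory-next-g5e-decay-collapse-preview.json"))` would return an empty list for a compliant preview; a missing or truthy `apply_supported` is treated as a violation so that the check fails closed.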
diff --git a/tests/test_roadmap_contract.py b/tests/test_roadmap_contract.py
index 5d93bcb..b536944 100644
--- a/tests/test_roadmap_contract.py
+++ b/tests/test_roadmap_contract.py
@@ -59,7 +59,7 @@ def test_current_handoff_does_not_advertise_broad_g4_apply_as_ready() -> None:
     assert "docs/RED-test-only" in handoff
 
 
-def test_v0141_status_docs_record_g5d_completion_and_next_brainlike_steps() -> None:
+def test_v0142_status_docs_record_g5e_completion_and_next_brainlike_steps() -> None:
     next_action = _read_doc(".dev/status/next-agent-memory-action.md")
     handoff = _read_doc(".dev/status/current-handoff.md")
     current_progress = _read_doc(".dev/roadmap/memory-consolidation/current-progress-and-next-steps.md")
@@ -67,11 +67,11 @@ def test_v0141_status_docs_record_g5d_completion_and_next_brainlike_steps() -> N
     stage_g = _read_doc(".dev/roadmap/memory-consolidation/stage-g-cautious-automation.md")
 
     for doc in (next_action, handoff, current_progress):
-        assert "v0.1.141" in doc
-        assert "/Users/reddit/.agent-memory/runtime/v0.1.141/.venv/bin/agent-memory" in doc
+        assert "v0.1.142" in doc
+        assert "/Users/reddit/.agent-memory/runtime/v0.1.142/.venv/bin/agent-memory" in doc
         assert "fresh_trace_linkage_gap_not_detected" in doc
         assert "g4-v0138-20260512-132253" in doc
-        assert "Overall north-star: 60-62%" in doc
+        assert "Overall north-star: 62-64%" in doc
         assert "broad g4/background apply remains blocked" in doc.lower()
 
     assert "dogfood trace-cluster-preview" in next_action
@@ -81,8 +81,10 @@ def test_v0141_status_docs_record_g5d_completion_and_next_brainlike_steps() -> N
     assert "G5e" in next_action
     assert "review_score" in next_action
     assert "dogfood reinforcement-refinement-preview" in next_action
+    assert "dogfood decay-collapse-preview" in next_action
     assert "repeated activation -> reinforcement" in next_action
     assert "stale weak evidence -> decay/collapse candidate preview" in next_action
+    assert "conflict -> supersession/replacement candidate preview" in next_action
     assert "G4 broad apply contract" in next_action
     assert "historical telemetry reconciliation" in next_action.lower()
     assert "trace cluster -> consolidation candidate" in stage_g