v2.4.2: POV Token Budget Restructure — dedupe + boilerplate refactor (Closes #140)#141
Merged
Merged
Conversation
added 13 commits
May 17, 2026 18:38
… C1) Single-purpose script for measuring Parzival session-cost token surfaces (activation / [ST] / [DA] / [CL]) using Anthropic SDK count_tokens (authoritative) + tiktoken cl100k_base (cross-reference). Implements: - DEC-089 tokenizer stack (SDK primary + tiktoken secondary) - DEC-090 tokenizer choice is user preference (graceful tiktoken fallback; API key never required, never coerced) - TASK-059 condition C1 (real-tokenizer measurements replace 4-char/token projections for AC pass/fail verdict) - Chain-walking surface file discovery via workflow.md firstStep + step nextStepFile traversal (rather than fragile hand-coded lists) - --profile module-only mode for AC measurement scoped to Parzival module files (per r2 brief Section 5 Track B carveout)
CREED-template.md: drop 'Critical dispatch constraints' sub-block (echoed GC-19/20/21 from constraints files) + strip 'Triggers GC-X.' sentences from Anti-Pattern bullets. 699w -> 613w (-86w). Brief estimate -120w was optimistic on per-sentence sizes; actual reduction is 12.3%. PERSONA-template.md: drop ## Principles section (duplicated CREED ## Core Values), replace with 1-line pointer. Add canonical-confidence-schema note at top of ## Communication Style per dedupe map. 510w -> 430w (-80w). Init-sanctum.py still produces valid 8-file sanctum (AC-10 verified at /tmp/sanctum-test).
NEW lazy-loaded reference at _ai-memory/pov/references/auto-memory-best-practices.md. Documents the per-prompt token cost of ~/.claude/projects/<project>/memory/MEMORY.md (loads on every user message), what belongs vs does-not-belong in MEMORY.md, maintenance cadence (~200w total cap), and anti-patterns. Zero activation cost (on-demand load only). Track B operator-side reference for Parzival's post-merge MEMORY.md hygiene work.
Add ## Standard Step Frame section to STEP-PREAMBLE.md enumerating the 5 boilerplate elements step-files reference via scaffold: frontmatter rather than inlining. STEP-SCAFFOLD.md already contains the templated frame content (Scope Block Format, Completion Note Format, Sequence Header Format, TERMINATION STEP PROTOCOLS) so E2 is verify-only — no content change to STEP-SCAFFOLD in this commit. Prepares for EDIT-F step-file refactor (62 step files drop inline boilerplate).
NEW lazy-loaded reference at _ai-memory/pov/references/workflow-map-details.md. Receives the content extracted from WORKFLOW-MAP.md by EDIT-C (Phase Summaries, Phase Transition Rules, project-status.md Schema, Workflow File Header Standard, End of Session Protocol). EDIT-K schema hardening: project-status.md Schema section includes explicit ≤80-word caps on last_session_summary and notes fields, DO/DO-NOT examples (anti-pattern quotes the testV2 active_task overflow case verbatim), and explicit pointer to session-logs/SESSION_HANDOFF_<DATE>.md as the proper home for long-form session narrative. Zero activation cost (load: on-demand frontmatter). Consulted only when authoring/auditing workflows or writing project-status.md.
…es (EDIT-C) Extract 5 reference-only sections from WORKFLOW-MAP.md to lazy-loaded references/workflow-map-details.md (created by EDIT-D): - Phase Workflows Summary (7 phase blocks, ~520w) - Phase Transition Rules table (10-row, ~190w) - Project Status File Schema (35 lines YAML, ~140w) - Workflow File Header Standard (13 lines code block, ~70w) - End of Session Protocol prose (2 code blocks, ~190w) Each extraction replaced by a 1-line pointer to the lazy reference. Routing logic (Master Decision Tree, Entry Points, Reusable Cycle Workflows, User-Invoked Commands, Verification Hierarchy, Routing Errors) stays eager. Verification Hierarchy table + decision rule preserved verbatim in eager surface (load-bearing per feedback memory). Word count: 2488w -> 1714w (-774w, -31%). Brief estimate -1340w was ~30% optimistic on prose density; pattern consistent with Batch 1 CREED+PERSONA edits. AC-04 GC count == 21 (regression-protect verified).
Add ## Load Policy section immediately below frontmatter clarifying that aim-parzival-bootstrap is invoked only on demand from [ST] step-01b, not auto-loaded at activation. parzival.md activation step 4 'Load core skills ... if present' means verify-presence, not read-into-context, per rule r6. Eliminates a documentation contradiction that could cause future regression. NO content change to the skill's executable logic. Coupled with EDIT-A A9 parzival.md wording change which lands separately via C2 diff-gate handshake.
…(EDIT-I)
git mv _ai-memory/pov/knowledge/parzival-master-plan.md ->
oversight/docs/parzival-master-plan-history.md
Rationale: per feedback_keep_meta_out_of_skills_workflows.md — skill/workflow
files hold operational content only; rationale, pilot status, and historical
context belong in oversight/, never in runtime files themselves. Master plan
is build-history rationale, not runtime knowledge.
Reference repoints (Parzival-direct C4 grep found 3 external refs, not the
brief's expected 2; pov-index.csv was an addition not in original brief):
- _ai-memory/pov/knowledge/pov-index.csv L7: registry entry REMOVED (file no
longer in runtime knowledge/ scope)
- _ai-memory/pov/workflows/WORKFLOW-MAP.md L6: Reference repointed
- _ai-memory/pov/workflows/WORKFLOW-MAP.md L262: Reference repointed (line
shifted from brief's L427 after Batch 2 EDIT-C slim)
Creates new top-level oversight/docs/ directory in repo for archival
historical artifacts. This tree is for content moved out of runtime
install scope; it is NOT for per-operator session-work oversight files
(which live in each operator's project tree outside the install).
EDIT-I3 audit per brief (5 sibling knowledge files): all 5 verified to have
external references (escalation 1, document-maintenance 1, complexity 2,
confidence-levels 4, bug-status-workflow 2). None are orphans. All stay
in knowledge/. No additional moves this batch.
Removes identity content duplicated in CREED.md + PERSONA.md (loaded at activation step 5). Operational rules only remain in <rules>; persona + critical constraints collapse to one-line sanctum pointers. Confidence-levels behavior body delegates to PERSONA.md ## Communication Style. Step 4 + step 5 line 31 rewording clarifies existing load behavior (no functional change at step 5 lines 32-40). Per DEC-086 C2 diff-gate handshake; rationale at oversight/dispatch-briefs/v242-edit-a-rationale.md (Session 52, 2026-05-15). Functional gates: <step n=> == 7 | <item cmd=> == 19 | <rule n=> == 7 | <constraint> == 0 (replaced by sanctum note) | NEVER guess removed | Verified -- source removed. Measured delivery: 1,803w -> 1,216w (-587w / -32.6%), 196L -> 136L (-60L). Per-edit savings 81% of brief's per-edit sum (587/725w), DEC-091 band PASS on per-edit-sum projection (108% of 75% floor). Pending post-apply SDK measurement to determine AC-01/AC-03 trajectory. Refs: r2 brief Section 3 A1-A9; DEC-086 C2; DEC-091 band-widening protocol.
Deletes Self-Check Schedule + Violation Severity Reference sections (lines 46-101 pre-edit). Both subsections re-stated information already present in the Constraint Summary table (lines 21-44, KEEP unchanged) and in each individual GC body file at constraints/global/GC-NN-*.md. Self-Check schedule (every 10 messages) is enforced by <behavior name="self-check"> in parzival.md; per-constraint violation severity lives in each GC body file. Replaces deletion with single 3-line pointer block. Per DEC-086 C2 diff-gate handshake; rationale at oversight/dispatch-briefs/v242-edit-b-rationale.md (Session 52, 2026-05-15). Functional gates: ^| GC- == 21 | Self-Check Schedule == 0 | Violation Severity Reference == 0 | Constraint Summary == 1 | Critical Rule == 1. Measured delivery (per Will DEC-094 measurement-first methodology): 1,171w -> 493w (-678w / -57.9%), 101L -> 48L (-53L). Empirical SDK delta -1,579 tokens via Anthropic SDK count_tokens (claude-opus-4-7); 153% of brief headline -1,030 projection — markdown tables tokenize denser (2.385 tok/w) than EDIT-A's XML prose (1.722 tok/w). Refs: r2 brief Section 3 B1-B3; DEC-094 measurement-first directive.
Removes inline preamble pointer line (F1) and collapses verbose sequence header to '## Sequence' (F2) across 9 of 10 sample files. step-01-load- context.md already lacked both anchors; included in inventory as no-op (drift D-F1, kept per Will Q2 directive). Per DEC-094 + DEC-095 measurement-first + STOP-GATE-before-commit; rationale at oversight/dispatch-briefs/v242-edit-f-sample-rationale.md. Will-approved 2026-05-15 via wb relay. Sample uniform delta: -2L / -20w per file. Sample SDK delta: -800 SDK content-only (-88.9/file avg) via empirical count_tokens on extracted blobs pre-authoring. Density 4.369 tok/w (step-file boilerplate is densest content type measured to date). 71% of brief per-file projection — passes Will's <50% stop-condition; within DEC-091 75%-band. Gate 2 bulk approval gated on post-apply measurement within ±15% of projections (-800 SDK content-only / -356 SDK at AC-03 surfaces). Refs: r2 brief Section 3 EDIT-F; DEC-094, DEC-095; kickoff memo 10-file sample list.
Per Will Q1 P3 (Partial bulk on AC-03 surfaces only) + Will Q2 BULK APPROVAL. Removes inline preamble pointer line (F1) and collapses verbose sequence header to '## Sequence' (F2) across 12 step files: - session/start/steps-c: step-02-compile-status, step-03-present-and-wait - session/close/steps-c: step-03-create-handoff, step-04-save-and-confirm - cycles/agent-dispatch/steps-c: step-02 through step-09 (8 files) Per DEC-094 measurement-first + DEC-095 STOP-GATE-before-commit + DEC-095 C3 sample-then-bulk. Sample (commit 852e051) measured empirical per-file surface delta -60 SDK/file (-32.5% chunking-boundary discount on -88.9 content-only avg). Bulk projection: 12 files x -60 SDK/file = -720 SDK at AC-03 surfaces. Gate 2 acceptance band (Will DEC-PM291-D12): -612 to -828 SDK; outside triggers STOP + re-analyze. Scope-discrepancy note: Will's relay specified 21 files; filesystem inventory at HEAD 852e051 showed 12 eligible (anchor-bearing step files in AC-03 measured surfaces). Surfaced honestly in Gate 2 brief; Will Q1 PROCEED WITH 12-FILE SCOPE — pre-apply filesystem audit wins over inherited brief estimates. Functional gates: 0 F1 anchors + 0 F2 anchors in AC-03 steps-c dirs. Uniform per-file delta: -2L / -20w. Refs: r2 brief Section 3 EDIT-F; DEC-094, DEC-095; sample commit 852e051; Gate 2 brief PARZIVAL-RELAY-EDIT-F-GATE2-2026-05-15.md.
…estructure Documents the cumulative scope of EDIT-A through EDIT-K + EDIT-F sample/bulk: - 3 acceptance criteria: AC-01 (act ≤9K) misses by 1,546, deferred to v2.4.4; AC-02 (act+ST ≤25K) PASSES +6,706; AC-03 (full ≤35K) PASSES +896 - Per-EDIT highlights with measured SDK surface deltas - Methodology lessons captured: content-type density variance + chunking- boundary discount with empirical citations across EDIT-B and EDIT-F - Tooling: scripts/measure_tokens.py new measurement tool - Out-of-scope deferrals documented (Track B operator hygiene, ~30 v2.4.4 cleanup files, AC-01 full close, TD-529 path-truncation bug) - Compatibility note: sanctum template changes don't auto-propagate to existing operator sanctums (First Breath only) Per Will Q3 directive (DEC-095): source-repo CHANGELOG.md in pov-work branch; AC-03 closure narrative + density methodology lesson + chunking- boundary discount rule + scope clarity + honest AC-01 P1 disposition. Professional tone; no AI/process meta tokens.
3331dea to
f49ecc1
Compare
…t edits Repoint the every-10-messages self-check behavior to the per-constraint checklist in self-check-constraints.md and remove the circular redirect left in constraints.md. Replace 5 unresolvable relative paths to workflow-map-details.md with project-root-absolute paths. Reconcile the STEP-PREAMBLE Standard Step Frame and the bootstrap SKILL Load Policy with the actual shipped step files. Add "maintenance" to the measure_tokens.py phase choices and pin DEFAULT_MODEL to claude-opus-4-7 so AC measurements reproduce from the default; replace brittle inline line-number references with symbolic anchors. Drop the stale parzival-master-plan manifest row, make the auto-memory-best-practices reference reachable via step-01b and pov-index.csv, and correct the CHANGELOG EDIT-F file count and deferred-scope list. Also drops the removed PERSONA Principles section and a non-existent CREED section from parzival.md pointers, and reconciles the step-5 BOND.md activation-vs-session-start wording. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…p files The Sequence Header Format section prescribed the strict-order header as "## Sequence of Instructions (Do not deviate, skip, or optimize)", which contradicted the STEP-PREAMBLE Standard Step Frame and all 21 step files using "## Sequence". Updated the strict-order prescription to match; the parallel-gather header form is unchanged. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Bump version.txt, pyproject.toml, and src/memory/__version__.py to 2.4.2. Promote the CHANGELOG [Unreleased] section to [2.4.2] - 2026-05-17 and add a fresh empty [Unreleased] section. Add Upgrade Instructions for v2.4.1 -> v2.4.2. Complete the compare-link block: add [2.4.2] and backfill the missing [2.4.0], [2.3.2], [2.3.1], and [2.3.0] links; repoint [Unreleased] (TD-541). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Reflect the POV token-budget restructure in user-facing docs: - README-POV.md: add the new _ai-memory/pov/references/ directory and its lazy-loaded reference files, and STEP-PREAMBLE.md, to the project structure tree and key-files table. - INSTALL-GUIDE-POV.md: add references/ to the project structure tree; generalize stale-file cleanup wording (was version-specific) so the update guidance is accurate for relocated files; clarify that re-running install.sh is the entire update process with no manual file surgery, data migration, or breaking changes. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The v2.4.2 Upgrade Instructions referenced the internal 'master-plan' file, which is meaningless to users. Reworded the post-step-2 paragraph in plain language: re-running the installer refreshes all POV files in place, adding new files and removing obsolete or superseded ones, while preserving user data and sanctum identity.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Description
Reduces Parzival activation surface and full-session token cost by deduplicating identity content between
parzival.md, sanctum templates (CREED + PERSONA), and constraint summaries; collapses inline boilerplate in step files to single-source references; extracts low-frequency reference material from eager-loaded workflow files into lazy-loaded reference docs.Cumulative reduction: full-session 37,160 → 34,104 SDK tokens (−8.2%); activation 12,642 → 10,546 SDK tokens (−16.6%).
Related Issue
Part of #140 — AC-01 (activation ≤9,000 tokens) is deferred to a v2.4.4 cleanup PR; issue #140 intentionally stays open to track it. This PR meets AC-02 and AC-03.
Type of Change
Changes Made
_ai-memory/pov/agents/parzival.md): dedupes 11 rules → 7 operational rules; collapses<persona>and<constraints critical="true">blocks to one-line sanctum pointers; replaces verbose confidence-levels behavior body with PERSONA.md reference. Activation −1,011 SDK tokens._ai-memory/pov/constraints/global/constraints.md): removes Self-Check Schedule + Violation Severity Reference sections (re-stated info already in Constraint Summary table + per-GC body files). Replaces with one 3-line pointer block. Activation −1,085 SDK tokens.WORKFLOW-MAP.md+ newworkflow-map-details.md): 5-section eager/lazy split. Routing logic + Verification Hierarchy + Master Decision Tree stay eager. Phase summaries + transition rules + project-status.md schema + workflow header standard + end-of-session protocol → new lazy reference doc.STEP-PREAMBLE.md): single-source Standard Step Frame.oversight/docs/):git mvparzival-master-plan.md→oversight/docs/parzival-master-plan-history.md. Out of runtime scope.auto-memory-best-practices.md): lazy-loaded reference for per-Claude-user MEMORY.md hygiene.scripts/measure_tokens.py): measurement script using Anthropic SDKcount_tokens(authoritative) + tiktoken (cross-reference); module-only profile for AC-binding measurements.## [Unreleased]entry with AC verdicts + methodology lessons + scope clarity + compatibility notes.Testing
Test Environment
/mnt/e/projects/ai-memory-testV2/pov-work/)Test Checklist
pytest tests/) — see Test Cases Covered below for context on 199 failures (pre-existing, unrelated to v2.4.2 markdown-only changes)pytest tests/integration/) — require live Qdrant/Postgres/etc. stack; deferred to CI environmentTest Cases Covered
Pre-merge tests T-01..T-09 (r2 brief Section 4.1):
Full results:
oversight/audits/v242-pre-merge-tests-T01-T09.md(in testV2 oversight tree).wc -w parzival.md ≤900)wc -w constraints.md ≤450)wc -w WORKFLOW-MAP.md ≤1,200)scripts/measure_tokens.py(lint-clean per 3-tool gate); failures cluster in integration tests requiring live Qdrant (e.g.,test_decay_integration.py, "Unable to close grpc_channel"). Re-run in CI environment with full stack.Post-install tests T-10..T-17 (r2 brief Section 4.2) and behavioral regression T-18..T-20 (Section 4.3): owner is testV2 per Will Q2 split; deferred to ready-for-review gate. Will inspects results, not runs tests (per Will Q2).
3-tool lint:
black --check,ruff check,isort --check-onlyall exit 0 onscripts/measure_tokens.py(the only Python file in PR scope).Documentation
scripts/measure_tokens.py)## [Unreleased]entry with AC verdicts + methodology + scope + compatibility)Breaking Changes
Impact:
No behavioral breaking changes. Two soft-compat notes:
Migration Guide:
No migration required for v2.4.2. Operators upgrade via normal install path; sanctum content is preserved.
Security Considerations
scripts/measure_tokens.pyargparse with type checkingPerformance Impact
Details:
This is a token-cost optimization PR. Activation surface reduced 16.6% (12,642 → 10,546 SDK tokens); full session reduced 8.2% (37,160 → 34,104 SDK tokens). Measured via Anthropic SDK
count_tokensagainstclaude-opus-4-7model. No runtime CPU/IO regression — changes are markdown content reductions and Python new-file additions.Deployment Notes
Methodology Notes (For Future Token-Budget Work)
Two empirical rules validated in this release:
Content-type density variance: token cost per word varies materially by content type. Measured: XML prose 1.722 tok/w; markdown tables 2.385 tok/w (+38%); inline-code-heavy markdown 3.545 tok/w (+106%); step-file boilerplate 4.369 tok/w (+154%). Word-count proxies are insufficient; direct
count_tokensmeasurement is reliable.Chunking-boundary discount: content-only blob measurement systematically overstates resulting file-surface delta by ~32%. EDIT-B observed 31.3% loss; EDIT-F sample observed 32.5%. Sample-then-bulk methodology using surface-measured per-file delta as extrapolation baseline avoided this entirely (EDIT-F bulk landed at 0.0% divergence vs projection).
Density-aware-projection feedback memory addendum will capture both rules post-merge.
Checklist
Additional Context
This PR is draft per DEC-086 pre-v2.4.1-tag-cut draft authorization. Ready-for-review flip pending T-10..T-20 execution (Will Q2 split: testV2 owns; Will inspects results).
Related parallel PRs (file-disjoint per Will's brief at
/mnt/e/projects/dev-ai-memory/oversight/tasks/v2.4.1-pov-hygiene-pr/brief.md):Branch state: 13 commits ahead of
main @ 93ad34b.Methodology validation: EDIT-F bulk projection landed at 0.0% divergence (measured −720 SDK = projected −720 SDK) on sample-grounded surface-level extrapolation — durable empirical-rule confirmation per Will's DEC-095 closure note.
For Maintainers: