Add provider-calibrated token accounting by Niko4417 · Pull Request #1802 · oscharko-dev/Keiko

Niko4417 · 2026-07-02T14:49:25Z

Summary

Adds governed per-model calibrated token accounting for context budgeting and prompt assembly, while preserving the existing conservative fallback estimator and reporting the active accounting source in diagnostics.

Resolves #1766

Scope

In scope: model capability metadata for calibrated token accounting, context profile derivation, allocator/prompt/compaction token counting, evidence redaction, and context-quality coverage.
Out of scope: exact tokenizer dependency integration, model-written structured summaries, retrieval/indexing changes, and conversation compaction semantics beyond replacing the token-count source.

Reuse And No-Duplication

Existing Keiko functionality was inspected before implementation.
This PR reuses, extends, generalizes, or consolidates existing functionality where practical.
Any new implementation is limited to a documented capability gap in the linked issue.
This PR does not introduce a parallel workspace, graph, relationship, policy, evidence, memory, connector, workflow, or UI subsystem where an existing subsystem can be extended.
Refactoring or consolidation was considered when existing functionality was close but not shaped for this change.

Delivery Board

Product Impact

UI or user workflow
CLI or developer workflow
Core generation engine
Evidence, audit, or compliance artifact
Security or supply chain
Packaging, release, or npm publication
Documentation or repository hygiene
No user-facing behavior change

Update Impact

Not release-impacting.
Release-impact catalog or issue metadata updated.
Release-note category: improvements.
Priority: high.
User-visible change: Prompt-budget behavior becomes more accurate for configured models while retaining safe fallback accounting.
Release-note bullet: Improves model-context budgeting with provider-calibrated token accounting and explicit fallback diagnostics.
Supported-from versions: next release after merge to dev.
Affected state stores: none.
User action required and remediation: none.
Internal-only rationale, if applicable: not internal-only.
Release-owner review evidence: issue Repo Context Audit: add provider-calibrated token accounting #1766 update-impact metadata.

Verification

Required:

Required GitHub checks pass before merge.
Local verification commands or rationale are listed below.
Reuse/extension/generalization evidence or gap rationale is listed below.

Local verification:

PATH="/Users/nikolaos.vasilopoulos/.cache/codex-runtimes/codex-primary-runtime/dependencies/node/bin:$PATH" .keiko-scripts/verify-receipt.sh 1766
- typecheck: pass
- check:version-consistency: pass
- lint: pass
- arch:check: pass
- arch:check:negative: pass
- check:qi-supply-chain: pass
- test: pass (913 files, 15,867 passed, 1 skipped)
- verify receipt written for 507f31fb

.keiko-scripts/audit-receipt.sh 1766 --findings 0 --user-facing false
- audit receipt written for 507f31fb

npm run check:context-quality
- calibrated-token-accounting=true

Reuse / gap rationale:

The implementation extends the existing ModelCapability, ContextProfile, context-budget allocator, chat prompt budgeting, conversation compaction, diagnostics, and evidence-redaction paths. It does not add a parallel tokenizer subsystem. Exact tokenizers remain out of scope until governed dependencies are approved; unsupported exact accounting is rejected rather than surfaced as evidence.

Select only what applies:

UI behavior manually verified or covered by tests.
CLI behavior verified with command output or tests.
Core logic covered by unit, integration, property, or fixture tests.
Security-sensitive change reviewed for trust boundaries, secrets, external calls, and generated artifacts.
Supply-chain or package-surface change verified with package, license, lockfile, SBOM, or npm dry-run checks.
Documentation or Markdown change verified by the repository link check or a targeted local equivalent.
Release-impacting change verified with npm run check:release-impact or an explicit rationale.
Not applicable items are explained below.

Not applicable rationale:

No UI, CLI command surface, Markdown documentation, dependency, lockfile, or persistent state migration is changed.
Release-impact metadata is already captured on issue Repo Context Audit: add provider-calibrated token accounting #1766; this PR implements that issue without adding a separate release catalog artifact.

Review And Closure

The PR implements only the linked issue scope.
Actionable review findings are fixed or explicitly dispositioned.
Unresolved review threads are resolved before merge.
Checks are repeated after the latest pushed fix.
Issue acceptance criteria and closure evidence are updated only where evidence exists.
The linked issue records reuse, extension, generalization, or new-gap rationale before closure.
Delivery board status is updated before requesting final maintainer review.
Use Resolves #1766 only when this PR should close the issue.

Risk Notes

Exact tokenizer support is intentionally not exposed because no governed tokenizer dependency was added; configured support is calibrated-only plus explicit fallback.
tokenAccounting.counterId is content-free metadata, is trimmed/validated at config boundaries, and is redacted from persisted evidence.
The local full receipt had to be run with bundled Node v24.14.0 because the machine default Node v25.9.0 is rejected by dependency-cruiser; no repo code was changed for this environment issue.

Niko4417 · 2026-07-02T22:46:36Z

Issue-audit findings — PR #1802 (Resolves #1766) · HEAD `507f31fb`

Read-first audit wave: pr-reviewer (correctness/regression) + security-auditor (config boundary, evidence redaction, supply chain). Findings verified against source before posting. CI is fully green at this commit (ci, ui, CodeQL, SBOM/smoke, cross-platform) and verify + audit receipts exist for 507f31fb.

Verdict: no merge blocker. One substantive diagnostics-attribution finding and a matching test-coverage gap worth addressing; the rest are minor. Feature is correct, deterministic, bounded, and well-tested against the acceptance criteria.

Acceptance-criteria check

✅ Configured models can use calibrated token accounting instead of the bytes/3.5 fallback (governed ModelCapability.tokenAccounting, allow-listed strict parsing).
✅ Conservative estimator remains and is explicitly reported (source: "fallback-estimated" default; diagnostics carry the source).
✅ Prompt assembly stays deterministic and bounded across chat/document/memory/tool-observation inputs; per-segment ceil+offset calibration preserves the ADR-0052 gate-3 single-currency identity end-to-end (allocator + gateway paths verified).
✅ No new dependency/tokenizer added — exact source is intentionally rejected until a governed tokenizer lands; supply-chain surface unchanged (no lockfile/manifest in the diff).

Note: the AC text mentions "exact or calibrated". Exact is explicitly deferred (documented in the PR + issue engineering notes). Calibrated + fallback satisfy the deliverable; flag it so the AC checkbox wording isn't read as "exact shipped".

Security — PASS (0 findings)

Every boundary this PR touches is closed: config parsing rejects unknown keys (own-enumerable Object.keys, __proto__ rejected), restricts source to calibrated, and enforces scaleMilli positive-int / offsetTokens non-negative-int / counterId non-empty-trimmed. counterId is content-free (traced to a label field only — never reaches readFile/import/fetch/new URL) and is redacted on both evidence paths (construction-time field redaction + whole-object deepRedactStrings). ConfigInvalidError scrubs echoed key names via the GatewayError redactor. No runtime dependency added.

Correctness / regression

Major — 1. Compacted-path history-lane token misattribution · packages/keiko-server/src/chat-prompt-budget-diagnostics.ts:114-127
estimateHistoryLaneTokens sums all history-message tokens and subtracts only the bare CONVERSATION_SYSTEM_PROMPT cost when messages[0].role === "system". On the compacted path messages[0].content is buildSystemScopedCompactionContent(systemMessage.content, summaryContent) (conversation-compaction.ts:141) — system prompt plus the compaction-summary block. So the summary block's tokens are attributed to the history-summary lane while system-contract stays pinned at the bare-prompt figure. The aggregate identity (Σ lanes === totalEstimatedTokens) is correct and budget enforcement is unaffected — this is a diagnostics-only per-lane attribution question. It is also a deliberate choice per the PR's own notes ("attribute system-scoped compaction summary tokens to the history-summary lane"), so it's a semantics debate rather than a defect: arguably the summary belongs to system-contract since it is physically embedded in the system message. Recommend deriving the system-lane cost from the actual embedded messages[0].content and history = total − thatActualSystemCost, which fixes attribution on both paths while keeping the aggregate identity. Not a blocker.

Major — 2. Test does not pin the compacted-path lane split · packages/keiko-server/src/chat-gateway-assembly.test.ts:263-293
The one compacted-path lane-diagnostics test asserts history > 0 and Σ lanes === total but never asserts the history-summary vs system-contract split — a single-line mutation swapping the attribution would pass. Add an assertion tying system-contract tokens to the real embedded messages[0].content. (Pairs with #1.)

Minor

chat-prompt-budget-diagnostics.ts is now 434 LOC, crossing the 400-LOC cap (was 387; this PR tips it over). Suggest extracting the token-summary helpers into a sibling file (mirrors the memory-validation.ts split precedent). context-engineering.ts (583) was already over-cap pre-PR and grew further.
context-engineering-validation.ts collectTokenAccountingNumbers accepts a source: "fallback-estimated" object that also carries scaleMilli/offsetTokens; countContextTokens silently ignores them. No current call site produces this, but a self-contradictory-yet-"valid" object is constructible — reject or document.
calibratedTokenCount (context-engineering.ts:487-491) Math.max(0, …) clamp is unreachable from validated input; add a one-line WHY (defends direct object-literal callers that bypass the validator).
redactTokenAccounting (new surface in context-assembly-redaction.ts) has no direct unit test (pre-existing zero-coverage file). A small test asserting counterId redacted / numeric fields pass through / undefined round-trips would close it.

Positives

Per-segment-then-sum calibration is the correct gate-3 design (avoids sum-then-scale divergence, verified end-to-end); the new evaluateCalibratedTokenAccounting context-quality gate independently recomputes per-lane totals and asserts calibrated ≠ fallback (mutation-robust); config helpers reuse existing primitives and the exactOptionalPropertyTypes conditional-spread pattern; assertWorkflowEligibleForKind de-dups two identical inline guards in scope.

Audit roles: pr-reviewer, security-auditor (a11y/design-system axis N/A — non-user-facing core-engine change). No confirmed merge blockers; findings above are follow-ups.

…counting (#1766) - Split chat-prompt-budget-diagnostics.ts under the 400-LOC cap by extracting the token-summary helpers into chat-prompt-budget-token-summary.ts. - Document the deliberate lane attribution: the compaction summary's tokens are charged to the history-summary lane (which owns the +1 summary item), while system-contract reports only the bare canonical system prompt. - Pin the compacted-path lane split with a mutation-robust test. - Reject stray scaleMilli/offsetTokens under source "fallback-estimated". - Document the defensive clamp in calibratedTokenCount; add redactTokenAccounting and validator-rejection test coverage. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Niko4417 · 2026-07-02T23:08:48Z

Audit findings fixed — pushed `507f31fb..67510043`

Fix wave from the audit above. Two disjoint implementor slices, then a clean full-verify at the new HEAD.

Correctness / #1 + #2 (history-lane attribution + test) — On tracing the full lane-construction path, the original attribution is correct by design, not a defect: the history-summary lane deliberately owns the compaction summary — it counts the summary as its +1 included item (buildHistorySummaryLane) and charges its tokens there, so each lane's item-count stays consistent with its token-count and Σ lanes == totalEstimatedTokens. Flipping the tokens to system-contract (my initial suggestion) would have broken that item/token consistency. So the fix is to lock the design in, not change it:

Added a WHY comment documenting the deliberate attribution (summary → history-summary; system-contract reports only the bare canonical system prompt).
Added a mutation-robust compacted-path test pinning the split: system-contract == countContextTokens(CONVERSATION_SYSTEM_PROMPT, accounting) (bare prompt only), history-summary tokens include the summary block with includedItems == retained + 1, and the lane sum equals the total.

Minor #3 (400-LOC cap) — Extracted the token-summary helpers into a new sibling chat-prompt-budget-token-summary.ts. chat-prompt-budget-diagnostics.ts is now 302 LOC (was 434), new module 151 LOC — both under cap. Mechanical, behavior-identical extraction.

Minor #4 (validator) — context-engineering-validation.ts now rejects a source: "fallback-estimated" object carrying scaleMilli/offsetTokens (only meaningful for calibrated), reasons tokenAccounting.scaleMilli unexpected / tokenAccounting.offsetTokens unexpected. Test coverage added.

Minor #5 (WHY comment) — Documented the defensive Math.max(0, …) clamp in calibratedTokenCount (guards callers that build ContextTokenAccounting literals without the validator).

Minor #6 (redaction test) — Added a redactTokenAccounting test (via the persistConnectedContextEvidence entry point): counterId redacted, source/scaleMilli/offsetTokens pass through, tokenAccounting: undefined round-trips.

Verification at HEAD `67510043`

.keiko-scripts/verify-receipt.sh 1766 → GREEN (913 files, 15,871 passed, 1 skipped) — verify receipt written.
.keiko-scripts/audit-receipt.sh 1766 --findings 0 --user-facing false → written (all confirmed findings resolved).
Security posture unchanged (validator only tightened; no new dependency). No behavior change to prompt assembly, budgeting, or determinism.

Files: chat-prompt-budget-diagnostics.ts, chat-prompt-budget-token-summary.ts (new), chat-gateway-assembly.test.ts, context-engineering-validation.ts, context-engineering.ts, context-engineering.test.ts, connected-context-evidence.test.ts.

Niko4417 requested a review from oscharko as a code owner July 2, 2026 14:49

Niko4417 mentioned this pull request Jul 2, 2026

Repo Context Audit: add provider-calibrated token accounting #1766

Open

53 tasks

Niko4417 force-pushed the issue/1766-provider-calibrated-token-accounting branch from 276b6e8 to bb83465 Compare July 2, 2026 14:55

feat(context): add calibrated token accounting

507f31f

Niko4417 force-pushed the issue/1766-provider-calibrated-token-accounting branch from bb83465 to 507f31f Compare July 2, 2026 15:01

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Add provider-calibrated token accounting#1802

Add provider-calibrated token accounting#1802
Niko4417 wants to merge 2 commits into
devfrom
issue/1766-provider-calibrated-token-accounting

Niko4417 commented Jul 2, 2026 •

edited

Loading

Uh oh!

Niko4417 commented Jul 2, 2026

Uh oh!

Niko4417 commented Jul 2, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

Niko4417 commented Jul 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Scope

Reuse And No-Duplication

Delivery Board

Product Impact

Update Impact

Verification

Review And Closure

Risk Notes

Uh oh!

Niko4417 commented Jul 2, 2026

Issue-audit findings — PR #1802 (Resolves #1766) · HEAD 507f31fb

Acceptance-criteria check

Security — PASS (0 findings)

Correctness / regression

Positives

Uh oh!

Niko4417 commented Jul 2, 2026

Audit findings fixed — pushed 507f31fb..67510043

Verification at HEAD 67510043

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Niko4417 commented Jul 2, 2026 •

edited

Loading

Issue-audit findings — PR #1802 (Resolves #1766) · HEAD `507f31fb`

Audit findings fixed — pushed `507f31fb..67510043`

Verification at HEAD `67510043`