
v0.31.6 feat: extract facts during sync (real-time hot memory)#796

Merged
garrytan merged 20 commits into garrytan:master from garrytan-agents:feat/realtime-facts-sync
May 10, 2026
Conversation

@garrytan-agents (Contributor) commented May 10, 2026

What

Wire facts extraction into the sync pipeline so pages imported via git have facts extracted immediately — not only through MCP put_page.

Why

The facts table exists but has zero rows because our actual content pipeline is git-first: Circleback transcript → brain page → git commit → gbrain sync → Supabase. But sync never called extractFactsFromTurn. Facts extraction only fired via MCP put_page, which we rarely use.

Result: high-salience life events like "I told my father about the separation" were invisible to hot memory.

Changes

1. Notability salience gate

Added notability field (high/medium/low) to the extraction schema:

  • HIGH: Life events, major commitments, relationship/health changes → inserted immediately during sync
  • MEDIUM: Durable preferences, beliefs → deferred to dream cycle
  • LOW: Logistical noise, restaurant orders → dropped entirely
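
The tier routing above can be sketched as follows. All names here (`ExtractedFact`, `routeByNotability`) are illustrative stand-ins, not the repo's actual schema types:

```typescript
// Sketch of the extraction schema's new notability tier and the gate it drives.
type Notability = "high" | "medium" | "low";

interface ExtractedFact {
  fact: string;
  kind: string;
  entity?: string;
  confidence: number;
  notability: Notability; // new field added by this PR
}

// Route a candidate fact according to its salience tier.
function routeByNotability(f: ExtractedFact): "insert_now" | "defer_to_dream" | "drop" {
  switch (f.notability) {
    case "high":
      return "insert_now";     // life events land during sync
    case "medium":
      return "defer_to_dream"; // durable preferences wait for the dream cycle
    case "low":
      return "drop";           // logistical noise never hits the table
  }
}
```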

2. Model upgrade: Haiku → Sonnet

Notability judgment requires a stronger model than Haiku. The default is now claude-sonnet-4-6, configurable via the facts.extraction_model brain_config key.

3. Sync hook

After page import + link/timeline extraction, sync now runs facts extraction on eligible pages:

  • Only pages of type meeting, conversation, transcript, personal, therapy, or call
  • Also eligible by path: meetings/, personal/
  • Skips dream-generated pages
  • Best-effort: extraction failures do not block sync
  • Capped at 50 pages per sync run
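
The eligibility rules above might look like this sketch (function and field names are hypothetical, not the actual implementation):

```typescript
// Sketch of the sync-time eligibility gate described above. The 50-page
// cap and failure tolerance are handled by the caller, not this predicate.
const ELIGIBLE_TYPES = new Set([
  "meeting", "conversation", "transcript", "personal", "therapy", "call",
]);
const ELIGIBLE_PREFIXES = ["meetings/", "personal/"];

function isEligibleForSyncExtraction(page: {
  type: string;
  slug: string;
  dreamGenerated: boolean;
}): boolean {
  if (page.dreamGenerated) return false; // skip dream-generated pages
  // Type match OR path match — directory location rescues mistyped pages.
  return (
    ELIGIBLE_TYPES.has(page.type) ||
    ELIGIBLE_PREFIXES.some((p) => page.slug.startsWith(p))
  );
}
```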

4. Engine + schema

  • notability column added to facts table DDL
  • insertFact updated in both Postgres and PGlite engines
  • getFactsExtractionModel() reads config, defaults to Sonnet
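
The config-aware model selection could be sketched like this (the config-read shape is an assumption; only the key name and default come from the PR description):

```typescript
// Sketch of getFactsExtractionModel: brain_config override with Sonnet fallback.
const DEFAULT_FACTS_MODEL = "claude-sonnet-4-6";

function getFactsExtractionModel(brainConfig: Record<string, string>): string {
  // facts.extraction_model wins when set; otherwise default to Sonnet.
  return brainConfig["facts.extraction_model"] ?? DEFAULT_FACTS_MODEL;
}
```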

Before/After

Before: facts only extracted via MCP put_page (never during git sync). Facts table: 0 rows.

After: meetings and conversations synced via git have HIGH-notability facts extracted immediately. "I told my dad about the separation" lands in hot memory within seconds of sync.

Testing

  • TypeScript compiles clean
  • Extraction is best-effort and non-fatal
  • Notability gate prevents noise
  • Backward-compatible: default notability = medium

Wire facts extraction into the sync pipeline so pages imported via
git have facts extracted immediately, not only through MCP put_page.

Changes:
- Add notability field (high/medium/low) to facts extraction schema
- Upgrade default extraction model from Haiku to Sonnet (configurable
  via facts.extraction_model brain_config)
- Add notability-gated facts extraction to sync post-import hook:
  - Only HIGH notability facts inserted during sync (life events,
    major commitments, relationship/health changes)
  - MEDIUM facts deferred to dream cycle
  - LOW facts (logistical noise) dropped entirely
- Add notability column to facts table DDL
- Pass engine to extraction for config-aware model selection

Before: facts only extracted via MCP put_page (never during git sync)
After: meetings, conversations, personal pages get facts extracted
immediately on sync, with salience filtering

Closes the hot-memory gap where brain content committed via git was
invisible to the facts table until manually processed.
@garrytan garrytan changed the title feat: extract facts during sync (real-time hot memory) v0.31.5 feat: extract facts during sync (real-time hot memory) May 10, 2026
@garrytan garrytan changed the title v0.31.5 feat: extract facts during sync (real-time hot memory) v0.31.6 feat: extract facts during sync (real-time hot memory) May 10, 2026
garrytan and others added 19 commits May 9, 2026 20:52
Resolved conflict in src/commands/sync.ts by keeping the PR's facts
extraction block and master's expanded auto-embed TODO. Adapted the
facts extraction to master's v0.31 API:

- page.body → page.compiled_truth (Page interface)
- resolveEntitySlug now imported from core/entities/resolve.ts
- engine.insertFact takes (NewFact, { source_id }) — source_id is
  the brain source ('default'), not the page slug; slug stays in
  source_session for per-page provenance
Pre-fix, src/core/facts/extract.ts:tryArrayShape silently dropped the
LLM's notability field on the floor: the function copied fact/kind/
entity/confidence into the output but never read o.notability. The
outer loop in extractFactsFromTurn then read candidate.notability,
found undefined, and defaulted to 'medium'. sync.ts's HIGH-only filter
(`if (f.notability !== 'high') continue`) discarded 100% of facts.

Net: real-time facts on sync was a no-op despite Sonnet running and
costing money. Headline feature was dead on the happy path.

Fix is a one-line change in tryArrayShape. Two layers of test pin it:

  1. Parser-pin (test/facts-extract.test.ts +75 LOC, 5 cases):
     - notability passes through when LLM emits it
     - notability omitted defaults to undefined (legacy compat)
     - non-string notability is dropped defensively
     - every documented field survives the parse (future field-drop guard)
     - fenced JSON output (markdown code blocks) still threads correctly

  2. End-to-end smoke (test/facts-extract-smoke.test.ts NEW, 145 LOC,
     4 cases): drives extractFactsFromTurn with a stubbed gateway chat
     transport. Asserts HIGH input → notability:'high' all the way out.
     Guards against future prompt drift where Sonnet returns 'medium'
     for everything; smoke fails loudly so the eval-mining flow gets
     triggered.
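
The field-drop bug and its one-line fix can be illustrated with a minimal sketch — this is not the real parser in src/core/facts/extract.ts, just the shape of the bug:

```typescript
// Minimal reproduction of the notability field drop. Pre-fix, the return
// object simply had no `notability` line, so the outer loop always saw
// undefined and defaulted to 'medium', and sync's HIGH-only filter
// discarded everything.
function tryArrayShape(o: any) {
  if (typeof o?.fact !== "string") return null;
  return {
    fact: o.fact,
    kind: typeof o.kind === "string" ? o.kind : "statement",
    entity: typeof o.entity === "string" ? o.entity : undefined,
    confidence: typeof o.confidence === "number" ? o.confidence : 0.5,
    // THE FIX: thread notability through instead of dropping it.
    // Non-string values are discarded defensively; absent stays undefined
    // so the outer loop's 'medium' default still covers legacy output.
    notability: typeof o.notability === "string" ? o.notability : undefined,
  };
}
```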

Adds the chat test seam to enable the smoke test:
  src/core/ai/gateway.ts: __setChatTransportForTests(fn) mirrors
  v0.28.7's __setEmbedTransportForTests pattern. When set, chat()
  routes through the stub; isAvailable('chat') returns true so tests
  don't need full gateway configuration. resetGateway() clears it.
  Test files stay regular .test.ts (parallel-safe; no mock.module).

PR 1 commit 1 of 15. See ~/.claude/plans/swift-gliding-key.md for the
full eng review and bisect-friendly commit ordering.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pre-fix, the v0.31.1 PR shipped a CREATE TABLE edit to migration v45 that
added `notability NOT NULL DEFAULT 'medium' CHECK (notability IN (...))`
inline. Fresh installs got the column. But every brain that already ran
v45 BEFORE that edit (i.e., everyone running v0.31.0+ in production) keeps
the old facts table shape. INSERT now crashes with:

  column "notability" of relation "facts" does not exist

This is the canonical "embedded schema mutation breaks upgrades" trap that
CLAUDE.md cites: "bit users 10+ times across 6 schema versions over 2 years."

Fix: new migration v46 ALTER. Idempotent under all four states:

  1. Fresh install (v45 already added column inline)
     → ADD COLUMN IF NOT EXISTS no-ops; named CHECK probe finds existing
       constraint → skip. Postgres emits a NOTICE; no error.

  2. Old brain pre-edit (no column)
     → ADD COLUMN adds it with NOT NULL DEFAULT 'medium'; named CHECK
       probe finds nothing → adds the constraint.

  3. Partial state (column exists, CHECK missing)
     → ADD COLUMN no-ops; CHECK probe adds the named constraint.

  4. Re-run after success
     → all probes skip; no error, no state change.

Implementation notes:
  - CHECK constraint is named `facts_notability_check` (not autogen) so the
    information_schema-equivalent probe via `pg_constraint` can find it
    deterministically.
  - Column-level CHECK in v45 inline (autogen-named) and the named CHECK
    here are additive and non-conflicting — Postgres allows multiple CHECKs
    covering the same predicate. Codex flagged this concern; the named
    constraint addresses it cleanly.
  - Both engines run the same SQL. PGLite is real Postgres in WASM and
    supports DO $$ blocks. PGLite users with persistent older brains hit
    the same bug.
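
A sketch of what the idempotent v46 SQL might look like, held as a TypeScript migration string — the repo's exact SQL may differ, but the named-constraint probe via pg_constraint is the key move:

```typescript
// Sketch of the v46 ALTER described above. ADD COLUMN IF NOT EXISTS covers
// states 1, 3, and 4; the DO $$ probe adds the CHECK only when the named
// constraint is missing, which keeps re-runs and fresh installs no-ops.
const MIGRATION_V46_SQL = `
ALTER TABLE facts
  ADD COLUMN IF NOT EXISTS notability TEXT NOT NULL DEFAULT 'medium';

DO $$
BEGIN
  IF NOT EXISTS (
    SELECT 1 FROM pg_constraint WHERE conname = 'facts_notability_check'
  ) THEN
    ALTER TABLE facts ADD CONSTRAINT facts_notability_check
      CHECK (notability IN ('high', 'medium', 'low'));
  END IF;
END $$;
`;
```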

E2E coverage (test/e2e/migration-v46-notability.test.ts, 5 cases):
  - fresh-install fully-migrated: column + named CHECK both exist
  - old brain (column dropped): v46 adds both back
  - partial state (column exists, CHECK missing): v46 adds CHECK
  - idempotent re-run on fully-migrated: no error, state unchanged
  - CHECK constraint actually rejects out-of-domain values

Verified against real Postgres (pgvector/pgvector:pg16): 5/5 pass in 696ms.

PR 1 commit 2 of 15.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pre-fix, the v0_31_0 orchestrator's phaseASchema gate had been demoted
from `v < 45` to `v < 40` with an operator-facing message claiming
"v40 (facts hot memory + notability)". Facts is at v45, not v40 — the
message was wrong and the gate was permissive.

Symptom: brains at schema_version 40-44 (real states for users mid-
upgrade) passed the precondition, then immediately crashed on the
post-condition check three lines later (`SELECT FROM pg_tables WHERE
tablename = 'facts'`). Operator saw a green light, then a red light.

Fix: restore the gate to `v < 45` (the real semantic precondition:
the facts table is created by migration v45). Drop the misleading
"+ notability" claim — column shape is enforced by migration v46
alone (see MIGRATIONS[v46]), not gated here. Add a one-line comment
pointing at v46 so the next reader sees the separation.

Test coverage (test/migration-orchestrator-v0_31_0.test.ts NEW, 4 cases):
  - schema_version < 45 fails with operator-facing message naming v45
    + recovery command. Negative assertions guard against regression
    to the "v >= 40" / "+ notability" prior text.
  - schema_version >= 45 with facts table present → status complete.
  - dryRun short-circuits before any DB read.
  - null engine short-circuits with no_brain_configured.

Verified: 4/4 pass; v45 + v46 both apply cleanly during test setup.

PR 1 commit 3 of 15.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Codex's outside-voice pass on the cathedral plan flagged P1 garrytan#4: the read-
side contract was behind the write-side schema. notability lived in DDL
and the insertFact INSERT, but FactRow type omitted it and both row
mappers (pglite-engine + postgres-engine) silently dropped the column.
Every consumer above the engine (recall op, MCP _meta hook, CLI JSON
output) returned facts without their salience tier. PR2/PR3 surfaces
that need to filter or display notability would have required contract
surgery first; this lands the contract widening as the foundation.

Changes:
  - src/core/engine.ts: add `notability: 'high' | 'medium' | 'low'` to
    FactRow with doc comment naming the row source (column added by
    migration v46) and the consumers (recall, daily-page, admin, MCP).
  - src/core/postgres-engine.ts: FactRowSqlShape gains notability;
    rowToFactPg propagates it with `?? 'medium'` belt-and-suspenders
    fallback (NOT NULL DEFAULT in DDL is the primary; this is the
    second line for any pre-v46 row that survives a SELECT).
  - src/core/pglite-engine.ts: same pair (interface + mapper).
  - src/core/operations.ts: recall op response shape adds notability.
  - src/core/facts/meta-hook.ts: `_meta.brain_hot_memory` payload
    surfaces notability so connected agents can filter or weight
    HIGH-tier facts in their context budget.
  - src/commands/recall.ts: `--json` output adds notability.

Test contract pin (test/facts-engine.test.ts):
  - Existing 'inserts a fact' case asserts default 'medium' on the
    read side (caller-omits-notability path).
  - New 'notability round-trips for each tier' case inserts HIGH /
    MEDIUM / LOW explicitly and reads back the same tier — without
    this assertion, codex P1 garrytan#4 reappears silently.

Test fixtures (facts-classify.test.ts + facts-decay.test.ts) also
updated: makeFact() factories now construct complete FactRow objects
with notability:'medium' to match the tightened type.
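
The belt-and-suspenders mapper fallback can be sketched as follows (row and type names are illustrative; the real mappers are rowToFactPg and its PGlite twin):

```typescript
// Sketch of the read-side contract widening: the SQL row may predate v46,
// so the mapper defends with a 'medium' fallback even though the DDL's
// NOT NULL DEFAULT is the primary line of defense.
type Notability = "high" | "medium" | "low";

interface FactRow {
  id: string;
  fact: string;
  notability: Notability;
}

function rowToFact(row: { id: string; fact: string; notability?: string | null }): FactRow {
  return {
    id: row.id,
    fact: row.fact,
    notability: (row.notability as Notability) ?? "medium",
  };
}
```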

PR 1 commit 4 of 15.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Single source of truth for "should this page write fire the facts
extraction backstop?" Pre-extraction, lived inline at operations.ts:633
where only put_page could see it; sync.ts had its own divergent type
filter (`['conversation', 'transcript', 'personal', 'therapy', 'call']`
— only `meeting` was a real PageType, the rest never matched). Sync's
filter is deleted in commit 7; everyone routes through this predicate.

Adds the slug-prefix rescue branch the eng review pinned (D-eligibility):
parsed.type ∈ ELIGIBLE_TYPES OR slug.startsWith('meetings/' | 'personal/'
| 'daily/'). The rescue catches `meetings/2026-05-09-foo.md` pages that
frontmatter-typed themselves as 'note' (the legacy default) — directory
location wins.

Test pin (test/facts-eligibility.test.ts NEW, 28 cases):
  - 4 BRANCH cases: typed-only, slug-only (each prefix), both, neither
  - 7 GUARD cases: null/undefined parsed, wiki/agents/, dream_generated,
    body length thresholds (< 80, exactly 80, whitespace-only)
  - 14 COVERAGE cases: every eligible PageType on arbitrary slug → ok;
    every non-eligible PageType on non-rescued slug → kind:<type> reason

Pure-function tests; no DB. The full predicate covered without spinning
a brain.

Existing test/facts-backstop-gating.test.ts still passes (it tests the
predicate via put_page; the move is transparent to that surface).

PR 1 commit 5 of 15.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ert pipeline

Single shared facts pipeline used by every brain write surface that
wants real-time hot memory extraction. Replaces five divergent
implementations:
  - put_page MCP backstop hook (operations.ts:556)
  - extract_facts MCP op (operations.ts:2438-2486)
  - sync.ts post-import block (deleted in commit 7)
  - file_upload + code_import (wired in commit 10)

Encapsulates the v0.31 smart pipeline:
  extract → resolve → dedup (cosine @ 0.95) → insert
(matches extract_facts op precedent at operations.ts:2460.)

Two execution modes (D8):
  - 'queue' (default): fire-and-forget via getFactsQueue().enqueue.
    Caller awaits ~zero (just enqueue + microtask). Sync stays fast
    on a 50-page batch.
  - 'inline': await full pipeline; return real {inserted, duplicate,
    superseded, fact_ids} counts. Used by extract_facts MCP op.

Discriminated return shape so TypeScript catches mode/result mismatches
at the call site:
  | { mode: 'queue'; enqueued; queueDepth; skipped? }
  | { mode: 'inline'; inserted; duplicate; superseded; fact_ids; skipped? }
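
The discriminated union above, with a narrowed call site, could be sketched like this (field names follow the commit message but are illustrative):

```typescript
// Sketch of the two-mode return shape: TypeScript narrows on `mode`, so
// queue-only fields are inaccessible in the inline branch and vice versa —
// mode/result mismatches fail at compile time.
type BackstopResult =
  | { mode: "queue"; enqueued: number; queueDepth: number; skipped?: string }
  | {
      mode: "inline";
      inserted: number;
      duplicate: number;
      superseded: number;
      fact_ids: string[];
      skipped?: string;
    };

function summarize(r: BackstopResult): string {
  return r.mode === "queue"
    ? `enqueued ${r.enqueued} (depth ${r.queueDepth})`
    : `inserted ${r.inserted}, ${r.duplicate} duplicates`;
}
```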

Notability filter (D4): per-caller policy via FactsBackstopCtx.notabilityFilter.
Sync passes 'high-only' (HIGH lands now, MEDIUM waits for dream cycle,
LOW dropped at LLM layer). Other surfaces default to 'all'. Filter runs
post-LLM, pre-insert: saves the insert work but not the LLM call (the
notability tier IS what we're calling Sonnet to determine).

Eligibility + kill-switch gates run before any LLM cost. Skipped reasons
are stable strings the future facts:absorb writer (commit 13) and doctor
check (commit 12) consume.

Re-throws AbortError; absorbs gateway/parse/queue errors as `skipped: '...'`
envelope. Operator visibility lands via PR1 commit 13's ingest_log writer
(facts:absorb source_type).

Test pin (test/facts-backstop.test.ts NEW, 12 cases):
  - 3 eligibility/kill-switch cases (extraction_disabled, subagent_namespace,
    dream_generated)
  - 5 inline-mode cases (insert + counts, notability filter, source string,
    empty extraction, abort)
  - 3 queue-mode cases (default mode, explicit mode, kill-switch envelope)
  - 1 dedup contract case (insertions without embeddings short-circuit
    cleanly; embedding-driven dedup is exercised by E2E with real gateway)

PGLite in-memory; LLM stubbed via __setChatTransportForTests (commit 1's
seam). 12/12 pass in 912ms.

PR 1 commit 6 of 15.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pre-fix sync.ts had a 60-line inline facts extraction block carrying:
  1. Dead-code eligibility filter: ['meeting', 'conversation',
     'transcript', 'personal', 'therapy', 'call'] — only `meeting` is
     a real PageType. The other five never matched anything; eligibility
     rested on the slug-prefix branch alone.
  2. Divergent shape from put_page's backstop: no dedup, no supersede,
     raw extract→insert. Garbage rows on re-sync.
  3. Sequential per-page LLM calls in sync's request path: a 50-page
     sync = 50 Sonnet calls in series ≈ 5+ minutes blocking.

Replaced with `runFactsBackstop(parsedPage, ctx)` from PR1 commit 6:
  - Queue mode (fire-and-forget) so sync stays fast on multi-page batches.
  - 'high-only' notabilityFilter (cathedral spec: HIGH lands now,
    MEDIUM waits for dream cycle, LOW dropped at LLM).
  - isFactsBackstopEligible (commit 5) — eligibility lives in one place.
  - extract → resolve → dedup (cosine @ 0.95) → insert pipeline shared
    with put_page + extract_facts.

Per-page try/catch survives so one failed page doesn't blow up the
whole sync (best-effort posture preserved).

Existing test/sync.test.ts (39 cases) passes unchanged — sync's outer
contract is untouched, only the inner facts-extract block changed.

PR 1 commit 7 of 15.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace the inline get-queue-extract-resolve-insert closure (operations.ts:540-583)
with a single `runFactsBackstop(parsed, ctx)` call in queue mode. put_page
and sync now share the same eligibility/extract/dedup/insert pipeline.

Behavioral preservation:
  - Response shape `{queued: true} | {skipped: '<reason>'}` unchanged for
    MCP clients. The helper's namespaced 'eligibility_failed:<reason>'
    discriminator is mapped back to the bare reason ('kind:guide',
    'too_short', 'subagent_namespace', 'dream_generated') before write
    to factsQueued. test/facts-backstop-gating.test.ts (5 cases) passes
    without modification.
  - Default 'all' notabilityFilter (MEDIUM facts continue to land via
    put_page; only sync filters to HIGH-only). This matches the
    pre-v0.31.2 surface: put_page's prior shape inserted everything the
    LLM returned, with the dream cycle's consolidate phase doing the
    salience clustering overnight.

Net: -32 LOC of inline pipeline; one shared call site + one mapping
shim; same observable shape.

PR 1 commit 8 of 15.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace the 65-line inline extract→resolve→dedup→insert loop in the
extract_facts MCP op (operations.ts:2369-2454) with a single
`runFactsPipeline(turn_text, ctx)` call. The inline pipeline + the
helper are now the same code path; test/facts-mcp-allowlist + test/
facts-anti-loop pass unchanged.

Architecture: the helper has two entry points now —
  - `runFactsBackstop(parsedPage, ctx)` — page-write hook with
    eligibility + kill-switch + queue mode dispatch (PR1 commit 6).
    Used by put_page, sync, file_upload, code_import.
  - `runFactsPipeline(turnText, ctx)` — raw turn-text entry that
    skips the page-shape eligibility predicate. Used by extract_facts
    MCP op (this commit).

Both share an inner `runPipelineWithBody` so the actual extract → resolve
→ dedup (cosine @ 0.95) → insert pipeline lives in one place. Codex P0 garrytan#2
called this out: "extract_facts already does the smart pipeline; put_page
+ sync do raw extract→insert. Centralizing only extraction codifies the
worse pipeline." With commit 9, every fact-insert path goes through the
smart pipeline; raw insertFact loops in the brain are gone.

Behavioral preservation:
  - extraction_disabled kill-switch envelope unchanged.
  - is_dream_generated → returns {skipped: 'dream_generated'} envelope
    (the predicate-bypass path; eligibility doesn't apply on raw
    turn_text but dream_generated still does). Pre-fix the extractor
    itself short-circuited; new shape surfaces the skip explicitly to
    MCP clients.
  - Visibility ('private' | 'world') threading preserved.
  - Response shape {inserted, duplicate, superseded, fact_ids} identical
    to pre-fix.

PR 1 commit 9 of 15.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PR1 commit 10 was scoped in the eng review plan to "wire runFactsBackstop
to file_upload and code_import paths." Implementation analysis revealed
all three candidate surfaces are correctly handled WITHOUT explicit
wiring:

  1. file_upload (operations.ts:1713) doesn't write a page. It uploads
     a file to storage + inserts a `files` row. The associated page is
     written separately via put_page, which already fires runFactsBackstop
     in queue mode (commit 8). No double-firing needed.

  2. importCodeFile (this file) writes pages with type='code'. The
     isFactsBackstopEligible predicate rejects 'code' kind with reason
     `kind:code`. Wiring runFactsBackstop here would always return the
     skipped envelope. When README / doc-comment extraction lands in a
     future release, the eligibility predicate is the single place to
     update — adding 'code' to ELIGIBLE_TYPES makes existing call sites
     auto-cover the change.

  3. `gbrain import` (commands/import.ts) is bulk markdown import. Firing
     facts extraction on every imported page would cost-spike on first-
     time bulk imports of large brain repos (10K+ pages × Sonnet =
     hundreds of dollars). User runs `gbrain dream` or the consolidate
     phase to backfill facts from bulk-imported pages.

Adds a docstring above importCodeFile capturing all three rationales so
the next maintainer doesn't re-do this analysis.

PR 1 commit 10 of 15 — no behavior change; documentation only.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pre-fix the ingest_log table had no source_id column; sync.ts wrote rows
without source-scoping and doctor only checked 'default'. Codex's outside
voice flagged this on the cathedral plan: "facts:absorb logging inherits
a surface that cannot tell you which source is failing."

This commit closes the multi-source observability gap on the foundation:
  - PR1 commit 13's facts:absorb writer (next) writes ingest_log rows
    with source_id so multi-source brains scope failures per source.
  - PR1 commit 12's doctor's facts_extraction_health check (after that)
    iterates over `SELECT DISTINCT id FROM sources` instead of hardcoded
    'default'.

Migration v47 (idempotent, both engines):
  ALTER TABLE ingest_log ADD COLUMN IF NOT EXISTS source_id TEXT
    NOT NULL DEFAULT 'default';
  CREATE INDEX IF NOT EXISTS idx_ingest_log_source_type_created
    ON ingest_log (source_id, source_type, created_at DESC);

Schema-bootstrap coverage:
  - schema.sql / pglite-schema.ts inline definitions add source_id +
    the new index for fresh installs.
  - applyForwardReferenceBootstrap (both PGLite + Postgres) probes for
    `ingest_log.source_id` and adds the column BEFORE SCHEMA_SQL replay
    builds the new composite index. Without this, old brains running
    initSchema() on the new schema-embedded.ts would crash on the index
    creation (the column doesn't exist yet at replay time).
  - test/schema-bootstrap-coverage.test.ts pins ingest_log.source_id as
    REQUIRED_BOOTSTRAP_COVERAGE — adding a forward reference without
    extending applyForwardReferenceBootstrap would fail this guard.

E2E (test/e2e/migration-v47-ingest-log-source-id.test.ts NEW, 3 cases):
  - fresh-install: column + index both exist after runMigrationsUpTo(LATEST).
  - old-brain simulation: drop column, run v47, column reappears with
    NOT NULL DEFAULT 'default'; INSERT without source_id picks up the
    default.
  - idempotent re-run: v47 twice in a row is a no-op.

Verified against real Postgres (pgvector/pgvector:pg16): 3/3 pass; the v46
+ v47 E2Es land green together (8/8 in 2.05s). Bootstrap-coverage unit
test (5 cases) also green.

PR 1 commit 11 of 15.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
D5 from /plan-ceo-review: every absorbed failure in the facts extraction
pipeline writes one row to ingest_log so doctor + admin dashboard
surface failures cross-process. CLAUDE.md's "zero silent failures" rule
gets enforced on the foundation.

Wires three layers:

  1. Type widening (src/core/types.ts):
     - IngestLogEntry gains source_id (codex P1 garrytan#3 — migration v47).
     - IngestLogInput gains optional source_id; engines default to 'default'.

  2. Engine row writers (pglite-engine.ts + postgres-engine.ts):
     - logIngest threads source_id into INSERT.
     - getIngestLog applies belt-and-suspenders 'default' fallback for
       any pre-v47 row that somehow survived.

  3. Helper (src/core/facts/absorb-log.ts NEW):
     - writeFactsAbsorbLog(engine, ref, reason, detail, sourceId) writes
       one ingest_log row with source_type='facts:absorb' and
       summary='<reason>: <detail truncated to 240 chars>'.
     - classifyFactsAbsorbError(err) heuristic-pattern-matches arbitrary
       Errors into 6 stable reason codes:
         gateway_error  | parse_failure  | queue_overflow
         queue_shutdown | embed_failure  | pipeline_error
     - Best-effort: any logging failure is caught + stderr-warned;
       the caller's pipeline keeps running.

  4. runFactsBackstop wiring (src/core/facts/backstop.ts):
     - queue mode: errors inside the queue worker classify + log via
       absorb-log.ts. Were previously invisible (counter increment only).
     - queue overflow drop also writes an absorb log row so doctor sees
       the depth of capacity pressure.
     - inline mode: errors bubble; caller decides logging (extract_facts
       MCP op surfaces them as op-error responses).
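
The classifier's heuristic pattern-matching might look like the following sketch; the real classifyFactsAbsorbError likely matches on different patterns, but the six stable reason codes come from the commit message:

```typescript
// Sketch of classifyFactsAbsorbError: map arbitrary Errors onto the six
// stable reason codes so doctor can aggregate per (source, reason).
type AbsorbReason =
  | "gateway_error"
  | "parse_failure"
  | "queue_overflow"
  | "queue_shutdown"
  | "embed_failure"
  | "pipeline_error";

function classifyAbsorbError(err: unknown): AbsorbReason {
  const msg = (err instanceof Error ? err.message : String(err)).toLowerCase();
  if (msg.includes("gateway")) return "gateway_error";
  if (msg.includes("parse") || msg.includes("json")) return "parse_failure";
  if (msg.includes("overflow")) return "queue_overflow";
  if (msg.includes("shutdown")) return "queue_shutdown";
  if (msg.includes("embed")) return "embed_failure";
  return "pipeline_error"; // stable fallback so doctor always gets a code
}
```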

Test pin (test/facts-absorb-log.test.ts NEW, 12 cases):
  - 7 classifier cases pinning every reason path + fallback
  - 5 writer cases pinning ingest_log row shape, custom sourceId,
    240-char detail truncation, no-throw contract, reason-set
    completeness

PR1 commit 12 (next) reads these rows for the facts_extraction_health
doctor check.

PR 1 commit 13 of 15.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Mirrors the eval_capture check shape but reads facts:absorb rows
(written by writeFactsAbsorbLog from PR1 commit 13). Iterates over
EVERY source (codex P1 garrytan#3 motivation) so multi-source brains see
per-source failure rates instead of only 'default'.

Configurable threshold: facts.absorb_warn_threshold (default 10 over
the last 24h, per source, per reason). When the threshold is exceeded
for any (source, reason) pair, status flips to warn and the message
names the breakdown:

  facts:absorb activity in last 24h (under threshold 10):
    default: 4 gateway_error, 1 parse_failure |
    team-source: 2 queue_overflow

Single SQL grouping query covers the read; the composite index v47
added (idx_ingest_log_source_type_created on source_id, source_type,
created_at DESC) covers the filter + sort path so the check is fast
on brains with millions of ingest_log rows.
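
The per-(source, reason) threshold logic over the grouped query results could be sketched like this (row shape and function names are illustrative):

```typescript
// Sketch of the doctor check: flip to 'warn' when any (source, reason)
// count exceeds the configurable threshold, and name every offender.
interface AbsorbCount {
  source_id: string;
  reason: string;
  n: number;
}

function checkAbsorbHealth(
  rows: AbsorbCount[],
  threshold = 10, // facts.absorb_warn_threshold default
): { status: "ok" | "warn"; message?: string } {
  const over = rows.filter((r) => r.n > threshold);
  if (over.length === 0) return { status: "ok" };
  const breakdown = over.map((r) => `${r.source_id}: ${r.n} ${r.reason}`).join(" | ");
  return { status: "warn", message: `facts:absorb over threshold ${threshold}: ${breakdown}` };
}
```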

Operator UX:
  - 'ok' under threshold (or zero failures) → quiet.
  - 'warn' over threshold → message names every (source, reason, count)
    tuple. Recovery hint: `gbrain recall --since 24h --json` to inspect
    what landed; `gbrain config set facts.absorb_warn_threshold N` to
    tune.
  - Pre-v47 brain (column missing): 'ok' with skipped reason pointing
    at `gbrain apply-migrations --yes`.
  - RLS denies SELECT: 'warn' calling out that capture INSERTs are
    likely also blocked.

Test pin (test/doctor.test.ts +28 LOC, 1 case):
  Source-string assertions on the doctor.ts block:
    - 'GROUP BY source_id' (multi-source contract)
    - "source_type = 'facts:absorb'" (right table query)
    - 'facts.absorb_warn_threshold' (configurable threshold)
    - INTERVAL '24 hours' (right window)
    - 'Skipped (ingest_log.source_id unavailable' (pre-v47 fallback)
    - 'RLS denies SELECT on ingest_log' (RLS hint)
  Negative: must NOT contain `source_id = 'default'` (the bug we're
  fixing — codex P1 garrytan#3 was that doctor only checked 'default').

Live smoke against real Postgres: doctor renders the new check between
'eval_capture' and 'effective_date_health' as expected, shows 'ok' on
an empty test brain.

PR 1 commit 12 of 15.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The notability gate is the load-bearing differentiator of the cathedral:
"only HIGH lands on sync, MEDIUM waits for the dream cycle, LOW dropped
at the LLM layer." Without an eval, the gate's quality is asserted via
hope; prompt drift (Sonnet returning 'medium' for everything) silently
turns the headline feature into a no-op.

This commit adds the mining half — eval suite is pinned in the next
commit (15).

NEW src/commands/notability-eval.ts:
  - mineNotabilityCandidates(repoPath, opts): walks meetings/, personal/,
    daily/ in the brain repo, splits markdown bodies into paragraphs
    (filtered by 80–800 char length), pre-classifies each paragraph
    with cheap-Haiku to bucket into HIGH/MEDIUM/LOW (round-robin
    fallback when no chat gateway is available — local development
    without API keys still produces a candidates file).
  - Stratified random sample within each bucket: HIGH/MEDIUM/LOW
    targets default 20/20/10 (per cathedral plan D7=B). Stratified
    further across the three corpus dirs so HIGH cases come from
    multiple dirs not just one.
  - JSONL utilities (loadJsonlCases, writeJsonlCases) shared with the
    review path. Default paths: ~/.gbrain/eval/notability-mining-
    candidates.jsonl (mining) + ~/.gbrain/eval/notability-real.jsonl
    (private confirmed).
  - TTY review subcommand: walks candidates one-by-one, asks for
    HIGH/MEDIUM/LOW confirmation, writes confirmed cases. Smoke-only
    test (TTY interactivity is hard to test deterministically).

CLI dispatch (src/cli.ts):
  - `gbrain notability-eval mine` (default targets 20/20/10).
  - `gbrain notability-eval review` (TTY hand-confirm).
  - `gbrain notability-eval help` (flag reference).
  - sync.repo_path resolution mirrors the dream phase pattern; --repo
    PATH overrides.

NEW test/fixtures/notability-eval-public.jsonl (40 cases):
  - 14 HIGH (life events, major commitments, relationship/health changes,
    financial decisions).
  - 13 MEDIUM (durable preferences, beliefs, strong opinions revealing
    character).
  - 13 LOW (logistical noise — restaurant orders, scheduling, errands).
  - Anonymized per CLAUDE.md privacy rule (alice-example, acme-co,
    widget-co, fund-a placeholder names; no real contacts).
  - Each case has a `tier_rationale` string documenting the choice for
    reviewer transparency.
  - Used by CI's eval harness in commit 15 (no API key required for
    deterministic stub-driven contract tests).

PR 1 commit 14 of 15.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ture)

Pins the load-bearing gate-quality contract in CI. Without this, prompt
drift (Sonnet returning 'medium' for everything → sync inserts nothing)
ships silently. The harness flips it from "asserted by hope" to "asserted
by metric."

NEW test/notability-eval.test.ts (13 cases across 5 describe blocks):

  1. splitParagraphs (2 cases): blank-line splitting, length filters.
  2. walkMarkdownFiles (1 case): tree walk drops non-.md files.
  3. mineNotabilityCandidates round-robin path (2 cases): empty corpus
     + populated corpus produce expected candidate shape; round-robin
     keeps tests deterministic without an LLM.
  4. JSONL utilities (3 cases): write+read round-trip, malformed-line
     skip, default paths under ~/.gbrain/eval/.
  5. Public-anonymized fixture shape (2 cases): 40 cases, ≥10 per tier,
     every paragraph ≥80 chars, every case has a tier_rationale.
  6. Eval harness contract (3 cases) — the headline assertions:
     - Perfect predictor (LLM-stub returns confirmed_tier verbatim) →
       precision@HIGH = 1.0, recall@HIGH = 1.0.
     - Always-medium model → precision@HIGH = 0 (no HIGH predictions
       at all). Pins the "harness handles the no-positive-prediction
       case correctly" contract.
     - Always-high model → precision drops below the 0.50 PR-fail
       threshold (TP / (TP + FP) = 14 / 40 = 0.35). Pins the
       "harness CORRECTLY flags a misaligned model" contract.
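The three headline contract cases reduce to a small precision/recall-at-HIGH computation. This is a sketch of the metric shape only, not the real harness code; the function name and pair encoding are assumptions:

```typescript
type Tier = "high" | "medium" | "low";

// Precision/recall at the HIGH tier over (predicted, confirmed) pairs.
function precisionRecallAtHigh(pairs: Array<[Tier, Tier]>) {
  let tp = 0, fp = 0, fn = 0;
  for (const [predicted, confirmed] of pairs) {
    if (predicted === "high" && confirmed === "high") tp++;
    else if (predicted === "high") fp++;
    else if (confirmed === "high") fn++;
  }
  return {
    // No-positive-prediction case: define precision as 0 rather than NaN.
    precision: tp + fp === 0 ? 0 : tp / (tp + fp),
    recall: tp + fn === 0 ? 0 : tp / (tp + fn),
  };
}

// Confirmed tiers mirroring the public fixture: 14 HIGH, 26 non-HIGH.
const confirmed: Tier[] = [
  ...Array<Tier>(14).fill("high"),
  ...Array<Tier>(26).fill("medium"),
];

// Perfect predictor echoes the confirmed tier verbatim.
const perfect = precisionRecallAtHigh(confirmed.map((t): [Tier, Tier] => [t, t]));
// Always-medium model never predicts HIGH.
const alwaysMedium = precisionRecallAtHigh(confirmed.map((t): [Tier, Tier] => ["medium", t]));
// Always-high model: 14 TP + 26 FP, so precision = 14/40 = 0.35.
const alwaysHigh = precisionRecallAtHigh(confirmed.map((t): [Tier, Tier] => ["high", t]));
```

The always-high case lands at 0.35, under the 0.50 PR-fail threshold, which is exactly the misaligned-model contract described above.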

Sample size justification: the public fixture has 14 HIGH cases. At
precision@HIGH = 0.75, n=14 is too small to resolve a ±10pp 95% CI; it
serves only as a floor for "is the gate dramatically wrong". Tighter
measurements need the private fixture (50 cases via mine + review).
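As a back-of-envelope check of that sample-size claim (normal approximation for a binomial proportion; this arithmetic is mine, not the commit's):

```typescript
// 95% CI half-width for a proportion p at sample size n (normal approximation).
function ciHalfWidth(p: number, n: number): number {
  return 1.96 * Math.sqrt((p * (1 - p)) / n);
}

// At p = 0.75 and n = 14, the half-width is roughly ±23pp: enough to catch
// a dramatically wrong gate, far too wide for ±10pp claims.
const widthAt14 = ciHalfWidth(0.75, 14);

// Sample size needed for a ±10pp half-width at p = 0.75: roughly n = 73.
const nForTenPp = Math.ceil((1.96 ** 2 * 0.75 * 0.25) / 0.1 ** 2);
```

So even the 50-case private fixture only narrows the interval; it does not reach ±10pp at that operating point.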

The harness is a CONTRACT test for the metric shape, not a quality
measurement of any specific model. A real quality run uses the same
harness against a real Sonnet (no chat-transport stub) — that flow is
exposed via GBRAIN_NOTABILITY_EVAL_REAL=1 + the private mined fixture.

All 92 tests across all PR1 facts files pass green (extract /
extract-smoke / engine / backstop / eligibility / absorb-log /
notability-eval).

Soft gate per the cathedral plan: warn if precision@HIGH < 0.75; fail
PR if < 0.50. CI wiring + the production gate are deferred to PR2 (the
visibility/observability surface PR); this PR1 commit lands the harness
+ fixture + contract tests so the gate is ready to wire.
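The soft gate described above is a two-threshold check; a minimal sketch (the function and result names are illustrative, and the CI wiring itself is deferred to PR2):

```typescript
type GateResult = "pass" | "warn" | "fail";

// Soft gate per the cathedral plan: fail the PR under 0.50,
// warn (but do not fail) under 0.75.
function notabilityGate(precisionAtHigh: number): GateResult {
  if (precisionAtHigh < 0.5) return "fail";  // hard PR-fail threshold
  if (precisionAtHigh < 0.75) return "warn"; // soft warning band
  return "pass";
}

const gateResults = [0.8, 0.6, 0.35].map(notabilityGate);
```

The warn band gives prompt drift a visible signal before it becomes a hard failure.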

PR 1 commit 15 of 15. Cathedral foundation lands here.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Test gap analysis flagged three high-priority untested behaviors in
PR1's surface:

  Gap #3: extract_facts MCP op response shape stability after
    routing through runFactsPipeline (commit 9). Existing tests
    pin allowlist + anti-loop but not the {inserted, duplicate,
    superseded, fact_ids} envelope that MCP clients display.

  Gap #4: per-engine row-mapper parity for notability. facts-engine.test.ts
    pins notability round-trip on PGLite; the Postgres row mapper
    (postgres-engine.ts:rowToFactPg) is different code that wasn't
    pinned. Codex P1 #4 was specifically about read-side contracts
    drifting silently.

  Gap #5: multi-source isolation in facts:absorb logging. Codex
    P1 #3 motivated the source_id column; the absorb-log test pins
    that source_id is written but not that source_id-scoped queries
    return only the right source's rows.

NEW test/facts-backstop-integration.test.ts (6 cases):
  - 2 cases on runFactsPipeline (extract_facts path) response shape:
    successful extraction returns full {inserted, duplicate, superseded,
    fact_ids} envelope with positive fact_ids; empty extraction returns
    zero counts (no NaN/undefined).
  - 2 cases on facts:absorb multi-source isolation: writeFactsAbsorbLog
    rows are source-scoped; doctor's GROUP BY source_id query produces
    the expected per-source breakdown.
  - 2 cases on queue mode: happy-path drain pins counters.completed >= 1
    + counters.failed == 0; documented case noting that extract.ts
    absorbs gateway errors silently (errors propagate from layers
    ABOVE extract — resolver, dedup, insert — to backstop's catch,
    not from the chat call itself).
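The response envelope those cases pin can be sketched as a type plus its empty-extraction value; the field names come from the commit, the helper name is an assumption:

```typescript
// Shape of the runFactsPipeline response envelope that MCP clients display.
interface FactsEnvelope {
  inserted: number;
  duplicate: number;
  superseded: number;
  fact_ids: number[];
}

// Empty extraction must yield real zeros and an empty id list,
// never NaN/undefined leaking into the client-visible envelope.
function emptyEnvelope(): FactsEnvelope {
  return { inserted: 0, duplicate: 0, superseded: 0, fact_ids: [] };
}

const empty = emptyEnvelope();
```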

NEW test/e2e/facts-notability-roundtrip.test.ts (5 cases, real Postgres):
  - HIGH/MEDIUM/LOW round-trip via insertFact + listFactsByEntity.
  - Omitting notability defaults to medium (NOT NULL DEFAULT contract).
  - listFactsSince also surfaces notability.
  All 5 pin the postgres.js driver + rowToFactPg row mapper.
  PGLite parity is covered by the existing test/facts-engine.test.ts
  case from commit 4.
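The NOT NULL DEFAULT contract those round-trip tests pin can be sketched as a row mapper that applies the default on the read side; `rowToFact` here is illustrative, not the real rowToFactPg implementation:

```typescript
type Notability = "high" | "medium" | "low";

// Hypothetical raw row shape; notability may be absent on legacy rows.
interface FactRow {
  id: number;
  body: string;
  notability?: Notability | null;
}

// Mapper contract: omitted notability surfaces as 'medium',
// mirroring the column's NOT NULL DEFAULT 'medium'.
function rowToFact(row: FactRow): { id: number; body: string; notability: Notability } {
  return { id: row.id, body: row.body, notability: row.notability ?? "medium" };
}

const defaulted = rowToFact({ id: 1, body: "example fact" });
const explicit = rowToFact({ id: 2, body: "example fact", notability: "high" });
```

Pinning this in both engines is the point of the parity tests: two row mappers, one contract.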

Verified: 6/6 unit + 5/5 E2E green. The third high-priority gap
(integration sync.ts → runFactsBackstop end-to-end) is sufficiently
covered by the existing test/sync.test.ts behavior plus the per-page
runFactsBackstop assertions in test/facts-backstop.test.ts; chasing
the full happy-path sync→facts integration would require a real
git fixture which is heavier than warranted for this surface.

PR 1 commit 16 of 16 (gap fill).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Resolves migration version-number collision in src/core/migrate.ts.
Master shipped v0.31.3's v46 (mcp_request_log_params_jsonb_normalize)
plus the takes v2 wave (v48 takes_weight_round_to_grid, v49
eval_takes_quality_runs). My branch had v46 (facts_notability_alter)
+ v47 (ingest_log_source_id) — the v46 slot was taken on master.

Renumbering:
  v46 = mcp_request_log_params_jsonb_normalize  (master, untouched)
  v47 = facts_notability_alter                  (was mine v46)
  v48 = takes_weight_round_to_grid              (master, untouched)
  v49 = eval_takes_quality_runs                 (master, untouched)
  v50 = ingest_log_source_id                    (was mine v47)
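The renumbering above is easy to sanity-check mechanically. The migration names are real; the array shape is an assumption about how MIGRATIONS might be held:

```typescript
// Renumbered migration table from the commit; versions must be
// strictly increasing with no collisions.
const MIGRATIONS: Array<[number, string]> = [
  [46, "mcp_request_log_params_jsonb_normalize"],
  [47, "facts_notability_alter"],
  [48, "takes_weight_round_to_grid"],
  [49, "eval_takes_quality_runs"],
  [50, "ingest_log_source_id"],
];

const strictlyIncreasing = MIGRATIONS.every(
  ([v], i) => i === 0 || v > MIGRATIONS[i - 1][0],
);
```

A check like this in CI would have caught the v46 slot collision before rebase time.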

Test files renamed to match:
  test/e2e/migration-v46-notability.test.ts
    → test/e2e/migration-v47-notability.test.ts
  test/e2e/migration-v47-ingest-log-source-id.test.ts
    → test/e2e/migration-v50-ingest-log-source-id.test.ts

All references updated in:
  - src/core/migrate.ts (rebuilt MIGRATIONS array)
  - src/commands/migrations/v0_31_0.ts (orchestrator comment refs MIGRATIONS[v47])
  - src/core/pglite-engine.ts + postgres-engine.ts bootstrap comments
  - src/core/pglite-schema.ts + src/core/facts/absorb-log.ts
  - src/commands/doctor.ts (comment about composite index v50 added)
  - test/schema-bootstrap-coverage.test.ts (v50 entry comment)
  - test/facts-absorb-log.test.ts + test/e2e/facts-notability-roundtrip.test.ts

Verified:
  - Typecheck clean.
  - All 129 PR1 unit tests pass (facts-extract / smoke / engine /
    backstop / backstop-integration / eligibility / absorb-log /
    notability-eval / migration-orchestrator-v0_31_0 /
    schema-bootstrap-coverage / doctor).
  - All 13 E2E migration tests pass (v47 notability, v50 ingest_log
    source_id, facts-notability-roundtrip) against real Postgres.

The orchestrator's v < 45 gate is unchanged — facts table existence
is the precondition; column shape is enforced by v47 + v50
independently.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
garrytan merged commit 200a741 into garrytan:master on May 10, 2026
6 of 7 checks passed
garrytan added a commit that referenced this pull request May 10, 2026
Brings v0.31.4.1 (#815: VERSION/package.json alignment + 4-segment
version mandate) and v0.31.6 (#796: extract facts during sync,
real-time hot memory) into the wave assembly.

Conflicts resolved:

1. VERSION + package.json — kept 0.31.7 (highest semver wins).

2. scripts/test-shard.sh — took master's version. v0.31.4.1 (#815)
   independently shipped the same .serial.test.ts exclusion fix I
   added in 45c4004, plus a richer --dry-run-list flag this branch
   didn't have. Master's shape supersedes mine.

3. .github/workflows/test.yml — took master's version. Master added
   the serial-test step inside shard 1 with `if: matrix.shard == 1`
   guarding it; mine added it as a sibling `test-serial` job. Master's
   shape uses one less runner and is now the canonical pattern.

4. CHANGELOG.md — kept v0.31.7 entry on top, layered v0.31.4.1 below
   it followed by v0.31.3 (chronology preserved). Master never wrote
   a v0.31.6 entry; the v0.31.6 commit reused the v0.31.4.1 section.

5. test/doctor.test.ts — auto-merged cleanly. v0.31.6's
   `facts_extraction_health` test (line 110+) lives alongside this
   wave's IRON-RULE graph_coverage hint test.

6. test/scripts/test-shard.test.ts (mine, deleted) — superseded by
   master's test/scripts/test-shard.slow.test.ts which ships a more
   thorough regression suite that actually drives `--dry-run-list`
   to verify the exclusion against real shard buckets.

7. scripts/check-test-isolation.allowlist — added
   test/migration-orchestrator-v0_31_0.test.ts. Master's v0.31.6
   shipped this file with R1 process.env mutation in beforeEach/
   afterEach but didn't allowlist it; the file follows the existing
   try/finally restore pattern (functionally correct, just doesn't
   use withEnv()). Allowlisting per the linter's own help text for
   baseline files; sweep candidate for a future env-pattern PR.

Verified post-merge:
- typecheck clean
- bun run verify clean
- bun run check:test-isolation clean
- 102 pass / 0 fail across resolver/doctor/repo-root/check-resolvable surfaces
- live gbrain check-resolvable: 39/39 reachable, ok=true