v0.31.6 feat: extract facts during sync (real-time hot memory)#796
garrytan merged 20 commits on May 10, 2026
Wire facts extraction into the sync pipeline so that pages imported via
git have their facts extracted immediately, not only through MCP put_page.
Changes:
- Add notability field (high/medium/low) to facts extraction schema
- Upgrade default extraction model from Haiku to Sonnet (configurable
via facts.extraction_model brain_config)
- Add notability-gated facts extraction to sync post-import hook:
- Only HIGH notability facts inserted during sync (life events,
major commitments, relationship/health changes)
- MEDIUM facts deferred to dream cycle
- LOW facts (logistical noise) dropped entirely
- Add notability column to facts table DDL
- Pass engine to extraction for config-aware model selection
Before: facts only extracted via MCP put_page (never during git sync)
After: meetings, conversations, personal pages get facts extracted
immediately on sync, with salience filtering
Closes the hot-memory gap where brain content committed via git was
invisible to the facts table until manually processed.
Resolved conflict in src/commands/sync.ts by keeping the PR's facts
extraction block and master's expanded auto-embed TODO. Adapted the
facts extraction to master's v0.31 API:
- page.body → page.compiled_truth (Page interface)
- resolveEntitySlug now imported from core/entities/resolve.ts
- engine.insertFact takes (NewFact, { source_id }) — source_id is
the brain source ('default'), not the page slug; slug stays in
source_session for per-page provenance
Pre-fix, src/core/facts/extract.ts:tryArrayShape silently dropped the
LLM's notability field on the floor: the function copied fact/kind/
entity/confidence into the output but never read o.notability. The
outer loop in extractFactsFromTurn then read candidate.notability,
found undefined, and defaulted to 'medium'. sync.ts's HIGH-only filter
(`if (f.notability !== 'high') continue`) discarded 100% of facts.
Net: real-time facts on sync was a no-op despite Sonnet running and
costing money. Headline feature was dead on the happy path.
Fix is a one-line change in tryArrayShape. Two layers of test pin it:
1. Parser-pin (test/facts-extract.test.ts +75 LOC, 5 cases):
- notability passes through when LLM emits it
- notability omitted defaults to undefined (legacy compat)
- non-string notability is dropped defensively
- every documented field survives the parse (future field-drop guard)
- fenced JSON output (markdown code blocks) still threads correctly
2. End-to-end smoke (test/facts-extract-smoke.test.ts NEW, 145 LOC,
4 cases): drives extractFactsFromTurn with a stubbed gateway chat
transport. Asserts HIGH input → notability:'high' all the way out.
Guards against future prompt drift where Sonnet returns 'medium'
for everything; smoke fails loudly so the eval-mining flow gets
triggered.
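The shape of the pass-through fix can be sketched as follows. This is a minimal sketch, not the real `src/core/facts/extract.ts`: the candidate field names come from the commit message, but the exact `tryArrayShape` signature and defaults are assumptions.

```typescript
// Sketch of the parser pass-through fix (names from the commit message;
// the real tryArrayShape signature and defaults are assumptions).
type Notability = "high" | "medium" | "low";

interface FactCandidate {
  fact: string;
  kind: string;
  entity: string;
  confidence: number;
  notability?: Notability; // the field that was silently dropped pre-fix
}

function tryArrayShape(raw: unknown): FactCandidate[] {
  if (!Array.isArray(raw)) return [];
  const out: FactCandidate[] = [];
  for (const o of raw as Record<string, unknown>[]) {
    if (typeof o?.fact !== "string") continue;
    const candidate: FactCandidate = {
      fact: o.fact,
      kind: typeof o.kind === "string" ? o.kind : "statement",
      entity: typeof o.entity === "string" ? o.entity : "",
      confidence: typeof o.confidence === "number" ? o.confidence : 0.5,
    };
    // The one-line fix: thread notability through, dropping non-string
    // values defensively. Pre-fix this field was never read, so the
    // downstream default of 'medium' applied to every fact and the
    // HIGH-only sync filter discarded everything.
    if (o.notability === "high" || o.notability === "medium" || o.notability === "low") {
      candidate.notability = o.notability;
    }
    out.push(candidate);
  }
  return out;
}
```

An omitted field stays `undefined` (legacy compat) so the outer loop's `'medium'` default still applies for old-style LLM output.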
Adds the chat test seam to enable the smoke test:
src/core/ai/gateway.ts: __setChatTransportForTests(fn) mirrors
v0.28.7's __setEmbedTransportForTests pattern. When set, chat()
routes through the stub; isAvailable('chat') returns true so tests
don't need full gateway configuration. resetGateway() clears it.
Test files stay regular .test.ts (parallel-safe; no mock.module).
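The seam pattern reads roughly like this. A minimal sketch only: the exported names mirror the commit message, but the real gateway's `ChatTransport` signature and configured-path behavior are assumptions.

```typescript
// Sketch of the chat test seam (mirrors the __setEmbedTransportForTests
// pattern the commit cites). The transport signature is an assumption.
type ChatTransport = (prompt: string) => Promise<string>;

let chatTransportOverride: ChatTransport | null = null;

export function __setChatTransportForTests(fn: ChatTransport): void {
  chatTransportOverride = fn;
}

export function resetGateway(): void {
  chatTransportOverride = null;
}

export function isAvailable(capability: "chat"): boolean {
  // With a stub installed, tests don't need full gateway configuration.
  // (The real gateway also returns true when properly configured.)
  return chatTransportOverride !== null;
}

export async function chat(prompt: string): Promise<string> {
  if (chatTransportOverride) return chatTransportOverride(prompt);
  throw new Error("no chat transport configured");
}
```

Because the override is plain module state (no `mock.module`), test files stay parallel-safe as the commit notes.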
PR 1 commit 1 of 15. See ~/.claude/plans/swift-gliding-key.md for the
full eng review and bisect-friendly commit ordering.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pre-fix, the v0.31.1 PR shipped a CREATE TABLE edit to migration v45 that
added `notability NOT NULL DEFAULT 'medium' CHECK (notability IN (...))`
inline. Fresh installs got the column. But every brain that already ran
v45 BEFORE that edit (i.e., everyone running v0.31.0+ in production) keeps
the old facts table shape. INSERT now crashes with:
column "notability" of relation "facts" does not exist
This is the canonical "embedded schema mutation breaks upgrades" trap that
CLAUDE.md cites: "bit users 10+ times across 6 schema versions over 2 years."
Fix: new migration v46 ALTER. Idempotent under all four states:
1. Fresh install (v45 already added column inline)
→ ADD COLUMN IF NOT EXISTS no-ops; named CHECK probe finds existing
constraint → skip. Postgres emits a NOTICE; no error.
2. Old brain pre-edit (no column)
→ ADD COLUMN adds it with NOT NULL DEFAULT 'medium'; named CHECK
probe finds nothing → adds the constraint.
3. Partial state (column exists, CHECK missing)
→ ADD COLUMN no-ops; CHECK probe adds the named constraint.
4. Re-run after success
→ all probes skip; no error, no state change.
Implementation notes:
- CHECK constraint is named `facts_notability_check` (not autogen) so the
information_schema-equivalent probe via `pg_constraint` can find it
deterministically.
- Column-level CHECK in v45 inline (autogen-named) and the named CHECK
here are additive and non-conflicting — Postgres allows multiple CHECKs
covering the same predicate. Codex flagged this concern; the named
constraint addresses it cleanly.
- Both engines run the same SQL. PGLite is real Postgres in WASM and
supports DO $$ blocks. PGLite users with persistent older brains hit
the same bug.
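A sketch of what the idempotent v46 shape could look like, consistent with the four states above. Column and constraint names are from the commit message; the exact facts-table DDL and check predicate are assumptions.

```sql
-- Sketch of the idempotent v46 migration (names from the commit message;
-- exact facts-table DDL is an assumption). Safe under all four states.
ALTER TABLE facts
  ADD COLUMN IF NOT EXISTS notability TEXT NOT NULL DEFAULT 'medium';

DO $$
BEGIN
  -- Named-constraint probe: the pg_constraint lookup is deterministic
  -- because the constraint is named, not autogenerated.
  IF NOT EXISTS (
    SELECT 1 FROM pg_constraint
    WHERE conname = 'facts_notability_check'
      AND conrelid = 'facts'::regclass
  ) THEN
    ALTER TABLE facts
      ADD CONSTRAINT facts_notability_check
      CHECK (notability IN ('high', 'medium', 'low'));
  END IF;
END $$;
```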
E2E coverage (test/e2e/migration-v46-notability.test.ts, 5 cases):
- fresh-install fully-migrated: column + named CHECK both exist
- old brain (column dropped): v46 adds both back
- partial state (column exists, CHECK missing): v46 adds CHECK
- idempotent re-run on fully-migrated: no error, state unchanged
- CHECK constraint actually rejects out-of-domain values
Verified against real Postgres (pgvector/pgvector:pg16): 5/5 pass in 696ms.
PR 1 commit 2 of 15.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pre-fix, the v0_31_0 orchestrator's phaseASchema gate had been demoted
from `v < 45` to `v < 40` with an operator-facing message claiming
"v40 (facts hot memory + notability)". Facts is at v45, not v40 — the
message was wrong and the gate was permissive.
Symptom: brains at schema_version 40-44 (real states for users mid-
upgrade) passed the precondition, then immediately crashed on the
post-condition check three lines later (`SELECT FROM pg_tables WHERE
tablename = 'facts'`). Operator saw a green light, then a red light.
Fix: restore the gate to `v < 45` (the real semantic precondition:
the facts table is created by migration v45). Drop the misleading
"+ notability" claim — column shape is enforced by migration v46
alone (see MIGRATIONS[v46]), not gated here. Add a one-line comment
pointing at v46 so the next reader sees the separation.
Test coverage (test/migration-orchestrator-v0_31_0.test.ts NEW, 4 cases):
- schema_version < 45 fails with operator-facing message naming v45
+ recovery command. Negative assertions guard against regression
to the "v >= 40" / "+ notability" prior text.
- schema_version >= 45 with facts table present → status complete.
- dryRun short-circuits before any DB read.
- null engine short-circuits with no_brain_configured.
Verified: 4/4 pass; v45 + v46 both apply cleanly during test setup.
PR 1 commit 3 of 15.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Codex's outside-voice pass on the cathedral plan flagged P1 garrytan#4:
the read-side contract was behind the write-side schema. notability lived
in DDL and the insertFact INSERT, but the FactRow type omitted it and
both row mappers (pglite-engine + postgres-engine) silently dropped the
column. Every consumer above the engine (recall op, MCP _meta hook, CLI
JSON output) returned facts without their salience tier. PR2/PR3
surfaces that need to filter or display notability would have required
contract surgery first; this lands the contract widening as the
foundation.
Changes:
- src/core/engine.ts: add `notability: 'high' | 'medium' | 'low'` to
  FactRow with a doc comment naming the row source (column added by
  migration v46) and the consumers (recall, daily-page, admin, MCP).
- src/core/postgres-engine.ts: FactRowSqlShape gains notability;
  rowToFactPg propagates it with a `?? 'medium'` belt-and-suspenders
  fallback (NOT NULL DEFAULT in DDL is the primary; this is the second
  line for any pre-v46 row that survives a SELECT).
- src/core/pglite-engine.ts: same pair (interface + mapper).
- src/core/operations.ts: recall op response shape adds notability.
- src/core/facts/meta-hook.ts: `_meta.brain_hot_memory` payload surfaces
  notability so connected agents can filter or weight HIGH-tier facts
  in their context budget.
- src/commands/recall.ts: `--json` output adds notability.
Test contract pin (test/facts-engine.test.ts):
- Existing 'inserts a fact' case asserts default 'medium' on the read
  side (caller-omits-notability path).
- New 'notability round-trips for each tier' case inserts HIGH / MEDIUM /
  LOW explicitly and reads back the same tier — without this assertion,
  codex P1 garrytan#4 reappears silently.
Test fixtures (facts-classify.test.ts + facts-decay.test.ts) also
updated: makeFact() factories now construct complete FactRow objects
with notability:'medium' to match the tightened type.
PR 1 commit 4 of 15.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Single source of truth for "should this page write fire the facts
extraction backstop?" Before this extraction, the predicate lived inline
at operations.ts:633 where only put_page could see it; sync.ts had its
own divergent type
filter (`['conversation', 'transcript', 'personal', 'therapy', 'call']`
— only `meeting` was a real PageType, the rest never matched). Sync's
filter is deleted in commit 7; everyone routes through this predicate.
Adds the slug-prefix rescue branch the eng review pinned (D-eligibility):
parsed.type ∈ ELIGIBLE_TYPES OR slug.startsWith('meetings/' | 'personal/'
| 'daily/'). The rescue catches `meetings/2026-05-09-foo.md` pages that
frontmatter-typed themselves as 'note' (the legacy default) — directory
location wins.
Test pin (test/facts-eligibility.test.ts NEW, 28 cases):
- 4 BRANCH cases: typed-only, slug-only (each prefix), both, neither
- 7 GUARD cases: null/undefined parsed, wiki/agents/, dream_generated,
body length thresholds (< 80, exactly 80, whitespace-only)
- 14 COVERAGE cases: every eligible PageType on arbitrary slug → ok;
every non-eligible PageType on non-rescued slug → kind:<type> reason
Pure-function tests; no DB. The full predicate covered without spinning
a brain.
Existing test/facts-backstop-gating.test.ts still passes (it tests the
predicate via put_page; the move is transparent to that surface).
PR 1 commit 5 of 15.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ert pipeline
Single shared facts pipeline used by every brain write surface that
wants real-time hot memory extraction. Replaces five divergent
implementations:
- put_page MCP backstop hook (operations.ts:556)
- extract_facts MCP op (operations.ts:2438-2486)
- sync.ts post-import block (deleted in commit 7)
- file_upload + code_import (wired in commit 10)
Encapsulates the v0.31 smart pipeline:
extract → resolve → dedup (cosine @ 0.95) → insert
(matches extract_facts op precedent at operations.ts:2460.)
Two execution modes (D8):
- 'queue' (default): fire-and-forget via getFactsQueue().enqueue.
Caller awaits ~zero (just enqueue + microtask). Sync stays fast
on a 50-page batch.
- 'inline': await full pipeline; return real {inserted, duplicate,
superseded, fact_ids} counts. Used by extract_facts MCP op.
Discriminated return shape so TypeScript catches mode/result mismatches
at the call site:
| { mode: 'queue'; enqueued; queueDepth; skipped? }
| { mode: 'inline'; inserted; duplicate; superseded; fact_ids; skipped? }
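The discriminated union above lets the compiler enforce the mode/result pairing; a consumer sketch (the field types beyond the commit's own list are assumptions):

```typescript
// Sketch of the discriminated return shape and a narrowed consumer.
type BackstopResult =
  | { mode: "queue"; enqueued: number; queueDepth: number; skipped?: string }
  | { mode: "inline"; inserted: number; duplicate: number; superseded: number; fact_ids: string[]; skipped?: string };

function describeResult(r: BackstopResult): string {
  switch (r.mode) {
    case "queue":
      // Narrowed: only queue-mode fields are visible here.
      return `enqueued ${r.enqueued} (depth ${r.queueDepth})`;
    case "inline":
      // Narrowed: inline counters available; reading r.enqueued here
      // would be a compile error — the mismatch the commit mentions.
      return `inserted ${r.inserted}, duplicate ${r.duplicate}, superseded ${r.superseded}`;
  }
}
```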
Notability filter (D4): per-caller policy via FactsBackstopCtx.notabilityFilter.
Sync passes 'high-only' (HIGH lands now, MEDIUM waits for dream cycle,
LOW dropped at LLM layer). Other surfaces default to 'all'. Filter runs
post-LLM, pre-insert: saves the insert work but not the LLM call (the
notability tier IS what we're calling Sonnet to determine).
Eligibility + kill-switch gates run before any LLM cost. Skipped reasons
are stable strings the future facts:absorb writer (commit 13) and doctor
check (commit 12) consume.
Re-throws AbortError; absorbs gateway/parse/queue errors as `skipped: '...'`
envelope. Operator visibility lands via PR1 commit 13's ingest_log writer
(facts:absorb source_type).
Test pin (test/facts-backstop.test.ts NEW, 12 cases):
- 3 eligibility/kill-switch cases (extraction_disabled, subagent_namespace,
dream_generated)
- 5 inline-mode cases (insert + counts, notability filter, source string,
empty extraction, abort)
- 3 queue-mode cases (default mode, explicit mode, kill-switch envelope)
- 1 dedup contract case (insertions without embeddings short-circuit
cleanly; embedding-driven dedup is exercised by E2E with real gateway)
PGLite in-memory; LLM stubbed via __setChatTransportForTests (commit 1's
seam). 12/12 pass in 912ms.
PR 1 commit 6 of 15.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pre-fix sync.ts had a 60-line inline facts extraction block carrying:
1. Dead-code eligibility filter: ['meeting', 'conversation',
'transcript', 'personal', 'therapy', 'call'] — only `meeting` is
a real PageType. The other five never matched anything; eligibility
rested on the slug-prefix branch alone.
2. Divergent shape from put_page's backstop: no dedup, no supersede,
raw extract→insert. Garbage rows on re-sync.
3. Sequential per-page LLM calls in sync's request path: a 50-page
sync = 50 Sonnet calls in series ≈ 5+ minutes blocking.
Replaced with `runFactsBackstop(parsedPage, ctx)` from PR1 commit 6:
- Queue mode (fire-and-forget) so sync stays fast on multi-page batches.
- 'high-only' notabilityFilter (cathedral spec: HIGH lands now,
MEDIUM waits for dream cycle, LOW dropped at LLM).
- isFactsBackstopEligible (commit 5) — eligibility lives in one place.
- extract → resolve → dedup (cosine @ 0.95) → insert pipeline shared
with put_page + extract_facts.
Per-page try/catch survives so one failed page doesn't blow up the
whole sync (best-effort posture preserved).
Existing test/sync.test.ts (39 cases) passes unchanged — sync's outer
contract is untouched, only the inner facts-extract block changed.
PR 1 commit 7 of 15.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace the inline get-queue-extract-resolve-insert closure (operations.ts:540-583)
with a single `runFactsBackstop(parsed, ctx)` call in queue mode. put_page
and sync now share the same eligibility/extract/dedup/insert pipeline.
Behavioral preservation:
- Response shape `{queued: true} | {skipped: '<reason>'}` unchanged for
MCP clients. The helper's namespaced 'eligibility_failed:<reason>'
discriminator is mapped back to the bare reason ('kind:guide',
'too_short', 'subagent_namespace', 'dream_generated') before write
to factsQueued. test/facts-backstop-gating.test.ts (5 cases) passes
without modification.
- Default 'all' notabilityFilter (MEDIUM facts continue to land via
put_page; only sync filters to HIGH-only). This matches the
pre-v0.31.2 surface: put_page's prior shape inserted everything the
LLM returned, with the dream cycle's consolidate phase doing the
salience clustering overnight.
Net: -32 LOC of inline pipeline; one shared call site + one mapping
shim; same observable shape.
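The mapping shim is a one-liner in spirit; a sketch (the helper name is hypothetical, the prefix and bare reasons are from the commit):

```typescript
// Sketch of the reason-mapping shim: the helper's namespaced
// 'eligibility_failed:<reason>' discriminator is mapped back to the bare
// reason before it reaches MCP clients, preserving the pre-refactor shape.
function toMcpSkipReason(helperReason: string): string {
  const prefix = "eligibility_failed:";
  return helperReason.startsWith(prefix)
    ? helperReason.slice(prefix.length) // e.g. 'kind:guide', 'too_short'
    : helperReason; // kill-switch reasons pass through unchanged
}
```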
PR 1 commit 8 of 15.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace the 65-line inline extract→resolve→dedup→insert loop in the
extract_facts MCP op (operations.ts:2369-2454) with a single
`runFactsPipeline(turn_text, ctx)` call. The inline pipeline + the
helper are now the same code path; test/facts-mcp-allowlist + test/
facts-anti-loop pass unchanged.
Architecture: the helper has two entry points now —
- `runFactsBackstop(parsedPage, ctx)` — page-write hook with
eligibility + kill-switch + queue mode dispatch (PR1 commit 6).
Used by put_page, sync, file_upload, code_import.
- `runFactsPipeline(turnText, ctx)` — raw turn-text entry that
skips the page-shape eligibility predicate. Used by extract_facts
MCP op (this commit).
Both share an inner `runPipelineWithBody` so the actual extract → resolve
→ dedup (cosine @ 0.95) → insert pipeline lives in one place. Codex P0 garrytan#2
called this out: "extract_facts already does the smart pipeline; put_page
+ sync do raw extract→insert. Centralizing only extraction codifies the
worse pipeline." With commit 9, every fact-insert path goes through the
smart pipeline; raw insertFact loops in the brain are gone.
Behavioral preservation:
- extraction_disabled kill-switch envelope unchanged.
- is_dream_generated → returns {skipped: 'dream_generated'} envelope
(the predicate-bypass path; eligibility doesn't apply on raw
turn_text but dream_generated still does). Pre-fix the extractor
itself short-circuited; new shape surfaces the skip explicitly to
MCP clients.
- Visibility ('private' | 'world') threading preserved.
- Response shape {inserted, duplicate, superseded, fact_ids} identical
to pre-fix.
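The two-entry-point delegation can be sketched as below. Only the delegation shape and function names are from the commit; the inner pipeline is a stand-in and the eligibility check is abbreviated.

```typescript
// Sketch of the two entry points sharing one inner pipeline (commit 9).
interface Parsed { slug: string; body: string }

async function runPipelineWithBody(body: string): Promise<{ inserted: number }> {
  // Stand-in for the real extract → resolve → dedup (cosine @ 0.95) → insert.
  return { inserted: body.length > 0 ? 1 : 0 };
}

// Page-write hook: eligibility + kill-switch gates run before any LLM cost.
async function runFactsBackstop(page: Parsed): Promise<{ inserted: number } | { skipped: string }> {
  if (page.body.trim().length < 80) return { skipped: "too_short" }; // abbreviated gate
  return runPipelineWithBody(page.body);
}

// Raw turn-text entry: skips the page-shape eligibility predicate.
async function runFactsPipeline(turnText: string): Promise<{ inserted: number }> {
  return runPipelineWithBody(turnText);
}
```

Both paths now exercise the same smart pipeline, which is the point of Codex P0 garrytan#2: centralizing extraction alone would have codified the worse raw extract→insert shape.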
PR 1 commit 9 of 15.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PR1 commit 10 was scoped in the eng review plan to "wire runFactsBackstop
to file_upload and code_import paths." Implementation analysis revealed
all three candidate surfaces are correctly handled WITHOUT explicit
wiring:
1. file_upload (operations.ts:1713) doesn't write a page. It uploads
a file to storage + inserts a `files` row. The associated page is
written separately via put_page, which already fires runFactsBackstop
in queue mode (commit 8). No double-firing needed.
2. importCodeFile (this file) writes pages with type='code'. The
isFactsBackstopEligible predicate rejects 'code' kind with reason
`kind:code`. Wiring runFactsBackstop here would always return the
skipped envelope. When README / doc-comment extraction lands in a
future release, the eligibility predicate is the single place to
update — adding 'code' to ELIGIBLE_TYPES makes existing call sites
auto-cover the change.
3. `gbrain import` (commands/import.ts) is bulk markdown import. Firing
facts extraction on every imported page would cost-spike on first-
time bulk imports of large brain repos (10K+ pages × Sonnet =
hundreds of dollars). User runs `gbrain dream` or the consolidate
phase to backfill facts from bulk-imported pages.
Adds a docstring above importCodeFile capturing all three rationales so
the next maintainer doesn't re-do this analysis.
PR 1 commit 10 of 15 — no behavior change; documentation only.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pre-fix the ingest_log table had no source_id column; sync.ts wrote rows
without source-scoping and doctor only checked 'default'. Codex's outside
voice flagged this on the cathedral plan: "facts:absorb logging inherits
a surface that cannot tell you which source is failing."
This commit closes the multi-source observability gap on the foundation:
- PR1 commit 13's facts:absorb writer (next) writes ingest_log rows
with source_id so multi-source brains scope failures per source.
- PR1 commit 12's doctor's facts_extraction_health check (after that)
iterates over `SELECT DISTINCT id FROM sources` instead of hardcoded
'default'.
Migration v47 (idempotent, both engines):
ALTER TABLE ingest_log ADD COLUMN IF NOT EXISTS source_id TEXT
NOT NULL DEFAULT 'default';
CREATE INDEX IF NOT EXISTS idx_ingest_log_source_type_created
ON ingest_log (source_id, source_type, created_at DESC);
Schema-bootstrap coverage:
- schema.sql / pglite-schema.ts inline definitions add source_id +
the new index for fresh installs.
- applyForwardReferenceBootstrap (both PGLite + Postgres) probes for
`ingest_log.source_id` and adds the column BEFORE SCHEMA_SQL replay
builds the new composite index. Without this, old brains running
initSchema() on the new schema-embedded.ts would crash on the index
creation (the column doesn't exist yet at replay time).
- test/schema-bootstrap-coverage.test.ts pins ingest_log.source_id as
REQUIRED_BOOTSTRAP_COVERAGE — adding a forward reference without
extending applyForwardReferenceBootstrap would fail this guard.
E2E (test/e2e/migration-v47-ingest-log-source-id.test.ts NEW, 3 cases):
- fresh-install: column + index both exist after runMigrationsUpTo(LATEST).
- old-brain simulation: drop column, run v47, column reappears with
NOT NULL DEFAULT 'default'; INSERT without source_id picks up the
default.
- idempotent re-run: v47 twice in a row is a no-op.
Verified against real Postgres (pgvector/pgvector:pg16): 3/3 pass; the v46
+ v47 E2Es land green together (8/8 in 2.05s). Bootstrap-coverage unit
test (5 cases) also green.
PR 1 commit 11 of 15.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
D5 from /plan-ceo-review: every absorbed failure in the facts extraction
pipeline writes one row to ingest_log so doctor + admin dashboard
surface failures cross-process. CLAUDE.md's "zero silent failures" rule
gets enforced on the foundation.
Wires four layers:
1. Type widening (src/core/types.ts):
- IngestLogEntry gains source_id (codex P1 garrytan#3 — migration v47).
- IngestLogInput gains optional source_id; engines default to 'default'.
2. Engine row writers (pglite-engine.ts + postgres-engine.ts):
- logIngest threads source_id into INSERT.
- getIngestLog applies belt-and-suspenders 'default' fallback for
any pre-v47 row that somehow survived.
3. Helper (src/core/facts/absorb-log.ts NEW):
- writeFactsAbsorbLog(engine, ref, reason, detail, sourceId) writes
one ingest_log row with source_type='facts:absorb' and
summary='<reason>: <detail truncated to 240 chars>'.
- classifyFactsAbsorbError(err) heuristic-pattern-matches arbitrary
Errors into 6 stable reason codes:
gateway_error | parse_failure | queue_overflow
queue_shutdown | embed_failure | pipeline_error
- Best-effort: any logging failure is caught + stderr-warned;
the caller's pipeline keeps running.
4. runFactsBackstop wiring (src/core/facts/backstop.ts):
- queue mode: errors inside the queue worker classify + log via
absorb-log.ts. Were previously invisible (counter increment only).
- queue overflow drop also writes an absorb log row so doctor sees
the depth of capacity pressure.
- inline mode: errors bubble; caller decides logging (extract_facts
MCP op surfaces them as op-error responses).
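The classifier and the 240-char summary contract can be sketched as follows. The six reason codes and the truncation limit are from the commit; the matching heuristics themselves are assumptions.

```typescript
// Sketch of classifyFactsAbsorbError + the summary shape (reason codes
// and 240-char truncation from the commit; match patterns are assumptions).
type AbsorbReason =
  | "gateway_error" | "parse_failure" | "queue_overflow"
  | "queue_shutdown" | "embed_failure" | "pipeline_error";

function classifyFactsAbsorbError(err: unknown): AbsorbReason {
  const msg = err instanceof Error ? err.message.toLowerCase() : String(err).toLowerCase();
  if (/overflow|queue full|dropped/.test(msg)) return "queue_overflow";
  if (/shutdown|shutting down/.test(msg)) return "queue_shutdown";
  if (/embed/.test(msg)) return "embed_failure";
  if (/json|parse|unexpected token/.test(msg)) return "parse_failure";
  if (/gateway|fetch|econnreset|timeout|429/.test(msg)) return "gateway_error";
  return "pipeline_error"; // stable fallback reason
}

// summary='<reason>: <detail truncated to 240 chars>' per the commit.
function summarize(reason: AbsorbReason, detail: string): string {
  return `${reason}: ${detail.slice(0, 240)}`;
}
```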
Test pin (test/facts-absorb-log.test.ts NEW, 12 cases):
- 7 classifier cases pinning every reason path + fallback
- 5 writer cases pinning ingest_log row shape, custom sourceId,
240-char detail truncation, no-throw contract, reason-set
completeness
PR1 commit 12 (next) reads these rows for the facts_extraction_health
doctor check.
PR 1 commit 13 of 15.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Mirrors the eval_capture check shape but reads facts:absorb rows (written
by writeFactsAbsorbLog from PR1 commit 13). Iterates over EVERY source
(codex P1 garrytan#3 motivation) so multi-source brains see per-source
failure rates instead of only 'default'.
Configurable threshold: facts.absorb_warn_threshold (default 10 over the
last 24h, per source, per reason). When the threshold is exceeded for any
(source, reason) pair, status flips to warn and the message names the
breakdown:
  facts:absorb activity in last 24h (under threshold 10):
  default: 4 gateway_error, 1 parse_failure | team-source: 2 queue_overflow
A single SQL grouping query covers the read; the composite index v47
added (idx_ingest_log_source_type_created on source_id, source_type,
created_at DESC) covers the filter + sort path so the check is fast on
brains with millions of ingest_log rows.
Operator UX:
- 'ok' under threshold (or zero failures) → quiet.
- 'warn' over threshold → message names every (source, reason, count)
  tuple. Recovery hint: `gbrain recall --since 24h --json` to inspect
  what landed; `gbrain config set facts.absorb_warn_threshold N` to tune.
- Pre-v47 brain (column missing): 'ok' with skipped reason pointing at
  `gbrain apply-migrations --yes`.
- RLS denies SELECT: 'warn' calling out that capture INSERTs are likely
  also blocked.
Test pin (test/doctor.test.ts +28 LOC, 1 case): source-string assertions
on the doctor.ts block:
- 'GROUP BY source_id' (multi-source contract)
- "source_type = 'facts:absorb'" (right table query)
- 'facts.absorb_warn_threshold' (configurable threshold)
- INTERVAL '24 hours' (right window)
- 'Skipped (ingest_log.source_id unavailable' (pre-v47 fallback)
- 'RLS denies SELECT on ingest_log' (RLS hint)
Negative: must NOT contain `source_id = 'default'` (the bug we're fixing
— codex P1 garrytan#3 was that doctor only checked 'default').
Live smoke against real Postgres: doctor renders the new check between
'eval_capture' and 'effective_date_health' as expected, shows 'ok' on an
empty test brain.
PR 1 commit 12 of 15.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The notability gate is the load-bearing differentiator of the cathedral:
"only HIGH lands on sync, MEDIUM waits for the dream cycle, LOW dropped
at the LLM layer." Without an eval, the gate's quality is asserted via
hope; prompt drift (Sonnet returning 'medium' for everything) silently
turns the headline feature into a no-op.
This commit adds the mining half — eval suite is pinned in the next
commit (15).
NEW src/commands/notability-eval.ts:
- mineNotabilityCandidates(repoPath, opts): walks meetings/, personal/,
daily/ in the brain repo, splits markdown bodies into paragraphs
(filtered by 80–800 char length), pre-classifies each paragraph
with cheap-Haiku to bucket into HIGH/MEDIUM/LOW (round-robin
fallback when no chat gateway is available — local development
without API keys still produces a candidates file).
- Stratified random sample within each bucket: HIGH/MEDIUM/LOW
targets default 20/20/10 (per cathedral plan D7=B). Stratified
further across the three corpus dirs so HIGH cases come from
multiple dirs not just one.
- JSONL utilities (loadJsonlCases, writeJsonlCases) shared with the
review path. Default paths: ~/.gbrain/eval/notability-mining-
candidates.jsonl (mining) + ~/.gbrain/eval/notability-real.jsonl
(private confirmed).
- TTY review subcommand: walks candidates one-by-one, asks for
HIGH/MEDIUM/LOW confirmation, writes confirmed cases. Smoke-only
test (TTY interactivity is hard to test deterministically).
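The per-bucket stratified sample can be sketched as below. Only the 20/20/10 targets and the per-dir stratification goal are from the commit; the candidate shape and the round-robin drain are assumptions.

```typescript
// Sketch of the stratified sample (targets from cathedral plan D7=B;
// candidate shape and round-robin drain are assumptions).
type Tier = "high" | "medium" | "low";

interface Candidate { paragraph: string; tier: Tier; dir: string }

function stratifiedSample(
  candidates: Candidate[],
  targets: Record<Tier, number> = { high: 20, medium: 20, low: 10 },
): Candidate[] {
  const out: Candidate[] = [];
  for (const tier of ["high", "medium", "low"] as const) {
    // Group this tier's bucket by corpus dir so one directory can't
    // dominate the sample.
    const byDir = new Map<string, Candidate[]>();
    for (const c of candidates) {
      if (c.tier !== tier) continue;
      (byDir.get(c.dir) ?? byDir.set(c.dir, []).get(c.dir)!).push(c);
    }
    const dirs = [...byDir.keys()];
    let taken = 0;
    let i = 0;
    // Round-robin across dirs until the tier target (or the bucket) is exhausted.
    while (taken < targets[tier] && dirs.length > 0) {
      const dir = dirs[i % dirs.length];
      const next = byDir.get(dir)!.shift();
      if (next) {
        out.push(next);
        taken++;
        i++;
      } else {
        dirs.splice(dirs.indexOf(dir), 1); // dir exhausted
      }
    }
  }
  return out;
}
```

If a bucket is smaller than its target (common for HIGH in small corpora), the sample simply takes everything available.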
CLI dispatch (src/cli.ts):
- `gbrain notability-eval mine` (default targets 20/20/10).
- `gbrain notability-eval review` (TTY hand-confirm).
- `gbrain notability-eval help` (flag reference).
- sync.repo_path resolution mirrors the dream phase pattern; --repo
PATH overrides.
NEW test/fixtures/notability-eval-public.jsonl (40 cases):
- 14 HIGH (life events, major commitments, relationship/health changes,
financial decisions).
- 13 MEDIUM (durable preferences, beliefs, strong opinions revealing
character).
- 13 LOW (logistical noise — restaurant orders, scheduling, errands).
- Anonymized per CLAUDE.md privacy rule (alice-example, acme-co,
widget-co, fund-a placeholder names; no real contacts).
- Each case has a `tier_rationale` string documenting the choice for
reviewer transparency.
- Used by CI's eval harness in commit 15 (no API key required for
deterministic stub-driven contract tests).
PR 1 commit 14 of 15.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ture)
Pins the load-bearing gate-quality contract in CI. Without this, prompt
drift (Sonnet returning 'medium' for everything → sync inserts nothing)
ships silently. The harness flips it from "asserted by hope" to "asserted
by metric."
NEW test/notability-eval.test.ts (13 cases across 6 describe blocks):
1. splitParagraphs (2 cases): blank-line splitting, length filters.
2. walkMarkdownFiles (1 case): tree walk drops non-.md files.
3. mineNotabilityCandidates round-robin path (2 cases): empty corpus
+ populated corpus produce expected candidate shape; round-robin
keeps tests deterministic without an LLM.
4. JSONL utilities (3 cases): write+read round-trip, malformed-line
skip, default paths under ~/.gbrain/eval/.
5. Public-anonymized fixture shape (2 cases): 40 cases, ≥10 per tier,
every paragraph ≥80 chars, every case has a tier_rationale.
6. Eval harness contract (3 cases) — the headline assertions:
- Perfect predictor (LLM-stub returns confirmed_tier verbatim) →
precision@HIGH = 1.0, recall@HIGH = 1.0.
- Always-medium model → precision@HIGH = 0 (no HIGH predictions
at all). Pins the "harness handles the no-positive-prediction
case correctly" contract.
- Always-high model → precision drops below the 0.50 PR-fail
threshold (TP / (TP + FP) = 14 / 40 = 0.35). Pins the
"harness CORRECTLY flags a misaligned model" contract.
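The metric computation those contract cases pin is standard precision/recall at the HIGH tier; a sketch (the case shape is an assumption, the math is not):

```typescript
// Sketch of the precision/recall@HIGH computation the harness pins.
interface EvalCase { confirmed_tier: "high" | "medium" | "low" }

function precisionRecallAtHigh(
  cases: EvalCase[],
  predict: (c: EvalCase) => "high" | "medium" | "low",
): { precision: number; recall: number } {
  let tp = 0, fp = 0, fn = 0;
  for (const c of cases) {
    const predictedHigh = predict(c) === "high";
    const actuallyHigh = c.confirmed_tier === "high";
    if (predictedHigh && actuallyHigh) tp++;
    else if (predictedHigh && !actuallyHigh) fp++;
    else if (!predictedHigh && actuallyHigh) fn++;
  }
  return {
    // No-positive-prediction case (always-medium model): precision is
    // defined as 0 here, not NaN — the contract the second case pins.
    precision: tp + fp === 0 ? 0 : tp / (tp + fp),
    recall: tp + fn === 0 ? 0 : tp / (tp + fn),
  };
}
```

Against the 40-case fixture (14 HIGH), an always-high model gets precision 14/40 = 0.35, below the 0.50 fail threshold, which is exactly the misaligned-model case above.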
Sample size justification: the public fixture has 14 HIGH cases. For
precision@HIGH = 0.75 with a 95% CI ±10pp, n=14 gives the right floor
for "is the gate dramatically wrong" — tighter measurements need the
private fixture (50 cases via mine + review).
The harness is a CONTRACT test for the metric shape, not a quality
measurement of any specific model. A real quality run uses the same
harness against a real Sonnet (no chat-transport stub) — that flow is
exposed via GBRAIN_NOTABILITY_EVAL_REAL=1 + the private mined fixture.
All 92 tests across all PR1 facts files pass green (extract / extract-
smoke / engine / backstop / eligibility / absorb-log / notability-eval).
Soft gate per the cathedral plan: warn if precision@HIGH < 0.75; fail
PR if < 0.50. CI wiring + the production gate are deferred to PR2 (the
visibility/observability surface PR); this PR1 commit lands the harness
+ fixture + contract tests so the gate is ready to wire.
PR 1 commit 15 of 15. Cathedral foundation lands here.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Test gap analysis flagged three high-priority untested behaviors in
PR1's surface:
Gap garrytan#3: extract_facts MCP op response shape stability after
routing through runFactsPipeline (commit 9). Existing tests pin allowlist
+ anti-loop but not the {inserted, duplicate, superseded, fact_ids}
envelope that MCP clients display.
Gap garrytan#4: per-engine row-mapper parity for notability.
facts-engine.test.ts pins the notability round-trip on PGLite; the
Postgres row mapper (postgres-engine.ts:rowToFactPg) is different code
that wasn't pinned. Codex P1 garrytan#4 was specifically about read-side
contracts drifting silently.
Gap garrytan#5: multi-source isolation in facts:absorb logging. Codex P1
garrytan#3 motivated the source_id column; the absorb-log test pins that
source_id is written but not that source_id-scoped queries return only
the right source's rows.
NEW test/facts-backstop-integration.test.ts (6 cases):
- 2 cases on runFactsPipeline (extract_facts path) response shape:
  successful extraction returns the full {inserted, duplicate,
  superseded, fact_ids} envelope with positive fact_ids; empty
  extraction returns zero counts (no NaN/undefined).
- 2 cases on facts:absorb multi-source isolation: writeFactsAbsorbLog
  rows are source-scoped; doctor's GROUP BY source_id query produces
  the expected per-source breakdown.
- 2 cases on queue mode: happy-path drain pins counters.completed >= 1
  + counters.failed == 0; documented case noting that extract.ts absorbs
  gateway errors silently (errors propagate from layers ABOVE extract —
  resolver, dedup, insert — to backstop's catch, not from the chat call
  itself).
NEW test/e2e/facts-notability-roundtrip.test.ts (5 cases, real Postgres):
- HIGH/MEDIUM/LOW round-trip via insertFact + listFactsByEntity.
- Omitting notability defaults to medium (NOT NULL DEFAULT contract).
- listFactsSince also surfaces notability.
All 5 pin the postgres.js driver + rowToFactPg row mapper. PGLite parity
is covered by the existing test/facts-engine.test.ts case from commit 4.
Verified: 6/6 unit + 5/5 E2E green.
The third high-priority gap (integration sync.ts → runFactsBackstop
end-to-end) is sufficiently covered by the existing test/sync.test.ts
behavior plus the per-page runFactsBackstop assertions in
test/facts-backstop.test.ts; chasing the full happy-path sync→facts
integration would require a real git fixture, which is heavier than
warranted for this surface.
PR 1 commit 16 of 16 (gap fill).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Resolves migration version-number collision in src/core/migrate.ts.
Master shipped v0.31.3's v46 (mcp_request_log_params_jsonb_normalize)
plus the takes v2 wave (v48 takes_weight_round_to_grid, v49
eval_takes_quality_runs). My branch had v46 (facts_notability_alter)
+ v47 (ingest_log_source_id) — the v46 slot was taken on master.
Renumbering:
v46 = mcp_request_log_params_jsonb_normalize (master, untouched)
v47 = facts_notability_alter (was mine v46)
v48 = takes_weight_round_to_grid (master, untouched)
v49 = eval_takes_quality_runs (master, untouched)
v50 = ingest_log_source_id (was mine v47)
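The renumbering above can be sketched as an ordered list. The real MIGRATIONS array in src/core/migrate.ts carries SQL bodies; the labels here are just the migration names from this commit, and the density check is an illustration of why the collision had to be resolved by renumbering rather than reuse.

```typescript
// Sketch of the renumbered migration sequence after the merge.
const MIGRATIONS: ReadonlyArray<{ version: number; name: string }> = [
  { version: 46, name: "mcp_request_log_params_jsonb_normalize" }, // master, untouched
  { version: 47, name: "facts_notability_alter" },                 // was this branch's v46
  { version: 48, name: "takes_weight_round_to_grid" },             // master, untouched
  { version: 49, name: "eval_takes_quality_runs" },                // master, untouched
  { version: 50, name: "ingest_log_source_id" },                   // was this branch's v47
];

// A collision-free renumbering keeps versions dense and strictly
// increasing, so every engine replays them in one unambiguous order.
const dense = MIGRATIONS.every(
  (m, i) => i === 0 || m.version === MIGRATIONS[i - 1].version + 1,
);
```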
Test files renamed to match:
test/e2e/migration-v46-notability.test.ts
→ test/e2e/migration-v47-notability.test.ts
test/e2e/migration-v47-ingest-log-source-id.test.ts
→ test/e2e/migration-v50-ingest-log-source-id.test.ts
All references updated in:
- src/core/migrate.ts (rebuilt MIGRATIONS array)
- src/commands/migrations/v0_31_0.ts (orchestrator comment refs MIGRATIONS[v47])
- src/core/pglite-engine.ts + postgres-engine.ts bootstrap comments
- src/core/pglite-schema.ts + src/core/facts/absorb-log.ts
- src/commands/doctor.ts (comment about composite index v50 added)
- test/schema-bootstrap-coverage.test.ts (v50 entry comment)
- test/facts-absorb-log.test.ts + test/e2e/facts-notability-roundtrip.test.ts
Verified:
- Typecheck clean.
- All 129 PR1 unit tests pass (facts-extract / smoke / engine /
backstop / backstop-integration / eligibility / absorb-log /
notability-eval / migration-orchestrator-v0_31_0 / schema-
bootstrap-coverage / doctor).
- All 13 E2E migration tests pass (v47 notability, v50 ingest_log
source_id, facts-notability-roundtrip) against real Postgres.
The orchestrator's v < 45 gate is unchanged — facts table existence
is the precondition; column shape is enforced by v47 + v50
independently.
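The v < 45 gate described above might reduce to a predicate like the following; the function name and shape are illustrative, not the orchestrator's real API.

```typescript
// Illustrative form of the orchestrator gate: below v45 the facts
// table does not exist yet, so the notability (v47) and source_id
// (v50) alters have nothing to modify and must wait. Column shape
// itself is enforced by those migrations, not by this gate.
function factsTablePresent(schemaVersion: number): boolean {
  return schemaVersion >= 45;
}
```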
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
garrytan added a commit that referenced this pull request on May 10, 2026
Brings v0.31.4.1 (#815: VERSION/package.json alignment + 4-segment
version mandate) and v0.31.6 (#796: extract facts during sync,
real-time hot memory) into the wave assembly.

Conflicts resolved:
1. VERSION + package.json — kept 0.31.7 (highest semver wins).
2. scripts/test-shard.sh — took master's version. v0.31.4.1 (#815)
   independently shipped the same .serial.test.ts exclusion fix I
   added in 45c4004, plus a richer --dry-run-list flag this branch
   didn't have. Master's shape supersedes mine.
3. .github/workflows/test.yml — took master's version. Master added
   the serial-test step inside shard 1 with `if: matrix.shard == 1`
   guarding it; mine added it as a sibling `test-serial` job.
   Master's shape uses one less runner and is now the canonical
   pattern.
4. CHANGELOG.md — kept the v0.31.7 entry on top, layered v0.31.4.1
   below it followed by v0.31.3 (chronology preserved). Master never
   wrote a v0.31.6 entry; the v0.31.6 commit reused the v0.31.4.1
   section.
5. test/doctor.test.ts — auto-merged cleanly. v0.31.6's
   `facts_extraction_health` test (line 110+) lives alongside this
   wave's IRON-RULE graph_coverage hint test.
6. test/scripts/test-shard.test.ts (mine, deleted) — superseded by
   master's test/scripts/test-shard.slow.test.ts, which ships a more
   thorough regression suite that actually drives `--dry-run-list`
   to verify the exclusion against real shard buckets.
7. scripts/check-test-isolation.allowlist — added
   test/migration-orchestrator-v0_31_0.test.ts. Master's v0.31.6
   shipped this file with R1 process.env mutation in beforeEach/
   afterEach but didn't allowlist it; the file follows the existing
   try/finally restore pattern (functionally correct, just doesn't
   use withEnv()). Allowlisted per the linter's own help text for
   baseline files; sweep candidate for a future env-pattern PR.
Verified post-merge:
- typecheck clean
- bun run verify clean
- bun run check:test-isolation clean
- 102 pass / 0 fail across resolver/doctor/repo-root/check-resolvable
  surfaces
- live gbrain check-resolvable: 39/39 reachable, ok=true
What
Wire facts extraction into the sync pipeline so pages imported via
git get facts extracted immediately — not only through MCP put_page.

Why

The facts table exists but has zero rows because our actual content
pipeline is git-first: Circleback transcript → brain page → git
commit → gbrain sync → Supabase. But sync never called
extractFactsFromTurn. Facts extraction only fired via MCP put_page,
which we rarely use.

Result: high-salience life events like "I told my father about the
separation" were invisible to hot memory.
Changes
1. Notability salience gate

Added a notability field (high/medium/low) to the extraction schema.

2. Model upgrade: Haiku → Sonnet
Notability judgment requires a sophisticated model. Default changed
to claude-sonnet-4-6, configurable via the facts.extraction_model
brain_config key.

3. Sync hook
After page import + link/timeline extraction, sync now runs facts extraction on eligible pages:
- meeting, conversation, transcript, personal, therapy, and call
  type pages
- pages under meetings/ or personal/

4. Engine + schema
- notability column added to the facts table DDL
- insertFact updated in both Postgres and PGlite engines
- getFactsExtractionModel() reads config, defaults to Sonnet

Before/After
Before: facts only extracted via MCP put_page (never during git
sync). Facts table: 0 rows.

After: meetings and conversations synced via git get HIGH-notability
facts extracted immediately. "I told my dad about the separation"
lands in hot memory within seconds of sync.
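The sync-side gate described in the Changes section can be sketched as follows. The helper names (`isEligible`, `factsToInsertNow`, the constant names) are hypothetical; the eligible page types, path prefixes, and HIGH-only filter come from this PR's notes.

```typescript
type Notability = "high" | "medium" | "low";

interface ExtractedFact {
  fact: string;
  entity: string;
  notability: Notability;
}

// Page types and path prefixes eligible for sync-time extraction.
const ELIGIBLE_TYPES = new Set([
  "meeting", "conversation", "transcript", "personal", "therapy", "call",
]);
const ELIGIBLE_PREFIXES = ["meetings/", "personal/"];

function isEligible(slug: string, pageType: string): boolean {
  return (
    ELIGIBLE_TYPES.has(pageType) ||
    ELIGIBLE_PREFIXES.some((p) => slug.startsWith(p))
  );
}

// Only HIGH notability facts are inserted during sync; MEDIUM is
// deferred to the dream cycle and LOW is dropped entirely.
function factsToInsertNow(candidates: ExtractedFact[]): ExtractedFact[] {
  return candidates.filter((f) => f.notability === "high");
}
```

This matches the filter quoted earlier in the thread (`if (f.notability !== 'high') continue`), just expressed as a pure function for readability.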
Testing