Store node/edge meta as JSON, promote hot keys to columns #107

Merged: zzet merged 7 commits into main from feat/json-meta-storage on Jun 17, 2026

@zzet zzet commented Jun 17, 2026

Why

The daemon could sit at multi-hundred-percent CPU and climb to multi-GB RSS while seemingly idle. Live pprof traced it to the SQLite store's meta codec: encodeMeta/decodeMeta built a fresh gob encoder/decoder per blob, so gob recompiled its type-decode engine on every edge. Over a day of uptime that was ~16.6 TB of allocation, ~86% of it through scanEdge → decodeMeta, saturating the GC. A whole-graph resolve walking the edges turned into an hours-long, lock-held grind.

What

Node/edge Meta is now stored as JSON instead of gob, plus supporting cleanups. Each change is its own commit.

  • Meta codec → JSON. encodeMeta is json.Marshal; decodeMeta routes the document through metaWire, a typed DTO that parses each known key as its exact Go type (int / int64 / float64 / *contracts.Shape / []string / []map[string]any), with a key-type table normalising the open tail and nested maps. The in-memory map a caller receives is type-identical to what gob produced: JSON's float64/[]any widening never reaches a reader, so no reader changes. JSON needs no per-call engine compilation and no custom binary versioning.
  • Backward compatible. decodeMeta sniffs the leading byte ({ ⇒ JSON) and falls back to gob for existing on-disk rows, which migrate to JSON on their next write. No schema migration, no forced reindex.
  • Promote 4 hot node keys to columns. signature (the single hottest meta read), visibility, doc, and external move into dedicated nullable columns. They are stripped from the JSON blob on write and restored into Meta on read (transparent to the in-memory model), so they become queryable and the common blob shrinks. A NULL column means "not set", so legacy rows keep their blob values; pre-existing databases gain the columns via ALTER on the next Open. Every node-shaped SELECT now resolves to one column-list constant so the projection and scan order can't drift.
  • Fetch only the edge kinds a pass needs. The dataflow materialisation and contract-edge reconcile scanned (and meta-decoded) the whole edge set, then filtered to two or three kinds; they now fetch those kinds through the edges_by_kind index. Behaviour unchanged.
  • Two pre-existing bugs the audit surfaced (independent of the codec): the stale_code inspection read last_authored as a string (it's a map) and gated on a never-written flag, so it surfaced nothing — now reads via the shared blame helper with a 365-day threshold; and contract route lookups read method/path at the node's top level when they live under nested contract_meta.
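The backward-compatible decode path described in the list above can be sketched roughly as follows. This is a minimal illustration, not the project's code: it reuses the names encodeMeta and decodeMeta from the description, but the bodies, the empty-blob handling, and the helper encodeLegacyGob (which exists only to fabricate a legacy blob) are assumptions. It also skips the typed normalisation step entirely.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"encoding/json"
	"fmt"
)

// encodeMeta writes every new row as JSON.
func encodeMeta(meta map[string]any) ([]byte, error) {
	return json.Marshal(meta)
}

// decodeMeta sniffs the leading byte: '{' is treated as JSON,
// anything else is assumed to be a legacy gob blob.
func decodeMeta(blob []byte) (map[string]any, error) {
	meta := map[string]any{}
	if len(blob) == 0 {
		return meta, nil // assumption: an empty blob means "no meta"
	}
	if blob[0] == '{' {
		err := json.Unmarshal(blob, &meta)
		return meta, err
	}
	err := gob.NewDecoder(bytes.NewReader(blob)).Decode(&meta)
	return meta, err
}

// encodeLegacyGob is a hypothetical helper used only to build a
// legacy-format blob for the demonstration.
func encodeLegacyGob(meta map[string]any) ([]byte, error) {
	var buf bytes.Buffer
	err := gob.NewEncoder(&buf).Encode(meta)
	return buf.Bytes(), err
}

func main() {
	legacy, err := encodeLegacyGob(map[string]any{"signature": "func Run()"})
	if err != nil {
		panic(err)
	}
	meta, err := decodeMeta(legacy) // takes the gob fallback
	if err != nil {
		panic(err)
	}
	fresh, err := encodeMeta(meta) // the "next write" migrates the row to JSON
	if err != nil {
		panic(err)
	}
	fmt.Println(meta["signature"], string(fresh))
}
```

Because the format is decided per blob at read time, old and new rows can coexist in one table, which is what lets the change avoid a schema migration.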

Testing

  • New round-trip tests assert every audited reader (numeric, slice, *contracts.Shape, nested-map, the integral-float case) survives a persist→reload cycle with its exact type, plus the gob-legacy fallback and the column ALTER migration.
  • store_sqlite (incl. the conformance suite) green under -race; mcp (2327) and indexer (578) suites green; go build ./..., go vet, golangci-lint, and the cmd/gortex wire-contract golden all clean. No graph.Node/Edge struct fields changed (promotion is storage-layer only), so the wire contract is unaffected.

Deploy note

The codec lives in the store; deploying it is a rebuild + reinstall. Existing stores keep working via the gob fallback and migrate lazily.

zzet added 7 commits June 17, 2026 13:28
Meta was gob-encoded with a fresh encoder/decoder constructed per blob,
so gob recompiled its type-decode engine on every edge — that dominated
cold-load CPU and allocation and could pin the daemon at multi-hundred
percent CPU while a whole-graph resolve walked the edges.

Encode meta as JSON and decode it through metaWire, a typed DTO whose
fields parse each known key as its exact Go type (int / int64 / float64 /
*contracts.Shape / []string / []map[string]any). The open tail and nested
maps are normalised with a small key-type table, so the in-memory map a
caller receives is type-identical to what gob produced and no reader
changes. JSON needs no per-call engine compilation and carries no custom
binary versioning.
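To illustrate why a typed decode is needed at all: a plain json.Unmarshal into map[string]any turns every number into float64 and every array into []any. The sketch below shows one way a key-type table could restore exact types. It is not metaWire itself; the key names (line, mtime, tags), their types, and the helpers as and decodeTyped are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// as decodes a raw JSON value into exactly type T.
func as[T any](msg json.RawMessage) (any, error) {
	var v T
	err := json.Unmarshal(msg, &v)
	return v, err
}

// keyDecoders is a hypothetical key-type table; the real key set
// and types are not shown in the PR.
var keyDecoders = map[string]func(json.RawMessage) (any, error){
	"line":  as[int],
	"mtime": as[int64],
	"tags":  as[[]string],
}

// decodeTyped parses known keys as their exact Go type and lets
// unknown keys fall through to the default JSON mapping.
func decodeTyped(blob []byte) (map[string]any, error) {
	var raw map[string]json.RawMessage
	if err := json.Unmarshal(blob, &raw); err != nil {
		return nil, err
	}
	out := make(map[string]any, len(raw))
	for key, msg := range raw {
		decode, known := keyDecoders[key]
		if !known {
			decode = as[any] // open tail: default float64 / []any mapping
		}
		val, err := decode(msg)
		if err != nil {
			return nil, fmt.Errorf("meta key %q: %w", key, err)
		}
		out[key] = val
	}
	return out, nil
}

func main() {
	blob := []byte(`{"line":42,"mtime":1700000000,"tags":["a","b"],"ratio":0.5}`)

	var naive map[string]any
	if err := json.Unmarshal(blob, &naive); err != nil {
		panic(err)
	}
	typed, err := decodeTyped(blob)
	if err != nil {
		panic(err)
	}
	fmt.Printf("naive line: %T, typed line: %T\n", naive["line"], typed["line"])
	fmt.Printf("naive tags: %T, typed tags: %T\n", naive["tags"], typed["tags"])
}
```

A reader that does `meta["line"].(int)` keeps working against the typed map, whereas the same assertion against the naive map would fail.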

Existing on-disk stores hold gob blobs; decodeMeta sniffs the leading
byte ('{' => JSON) and falls back to gob for legacy rows, which migrate
to JSON on their next write. No schema migration required.

runStaleCodeInspection asserted n.Meta["last_authored"].(string), but
blame writes last_authored as a nested map (commit / email / timestamp),
so the assertion always missed; it additionally gated on an is_stale
flag that nothing ever writes. The inspection surfaced nothing.

Read last_authored through the shared lastAuthoredFrom helper (blame
sidecar with node-meta fallback) and apply the same 365-day age
threshold analyze stale_code uses, so the inspection lists genuinely
stale functions/methods with their age and author.

routeMethodAndPath read method / path / service / topic / operation off
a contract node's top-level Meta, but the contract-to-node build nests
the contract's own Meta under Meta["contract_meta"] — the node top level
only holds type / role / symbol_id / line / confidence. Every route
lookup therefore returned empty.
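The layout described above, and a lookup that respects it, might look like the following. This is a minimal sketch: routeField is a hypothetical helper (the PR's function is routeMethodAndPath, whose body is not shown), and the sample values for type, role, method, and path are invented for illustration.

```go
package main

import "fmt"

// routeField reads a route key from the nested contract_meta map first
// and falls back to the node's top-level meta.
func routeField(meta map[string]any, key string) string {
	if nested, ok := meta["contract_meta"].(map[string]any); ok {
		if v, ok := nested[key].(string); ok && v != "" {
			return v
		}
	}
	v, _ := meta[key].(string) // missing or non-string yields ""
	return v
}

func main() {
	// A contract node: route fields live one level down.
	node := map[string]any{
		"type": "http",
		"role": "provider",
		"contract_meta": map[string]any{
			"method": "GET",
			"path":   "/users/{id}",
		},
	}

	_, atTop := node["method"]
	fmt.Println("method at top level:", atTop)
	fmt.Println(routeField(node, "method"), routeField(node, "path"))
}
```

Checking the nested map first and the top level second keeps the lookup working for any node that stamps the route fields directly.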

Read the route fields from the nested contract_meta map, falling back to
the top level for any node that stamps them directly.

The signature, visibility, doc, and external node meta keys are universal and hot-read (signature is the
single hottest meta read in the graph). Lift them into dedicated nullable
columns: stripped from the JSON meta blob on write and restored into Meta
on read, so the in-memory map is unchanged while the keys become
queryable and the common blob shrinks.
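The strip-on-write, restore-on-read shape can be sketched without a database by modelling nullable columns with sql.NullString. Everything here is an assumption made for illustration: splitForWrite and mergeOnRead are hypothetical names, only string-valued keys are handled, and external is left out because its type is not stated in the PR.

```go
package main

import (
	"database/sql"
	"fmt"
)

// promotedKeys lists the string-valued keys lifted into columns in
// this sketch.
var promotedKeys = []string{"signature", "visibility", "doc"}

// splitForWrite moves promoted keys out of the blob and into column
// values. Keys that are absent stay NULL (Valid == false).
func splitForWrite(meta map[string]any) (map[string]sql.NullString, map[string]any) {
	cols := make(map[string]sql.NullString, len(promotedKeys))
	blob := make(map[string]any, len(meta))
	for k, v := range meta {
		blob[k] = v
	}
	for _, k := range promotedKeys {
		if s, ok := meta[k].(string); ok {
			cols[k] = sql.NullString{String: s, Valid: true}
			delete(blob, k)
		}
	}
	return cols, blob
}

// mergeOnRead rebuilds the in-memory meta. A NULL column means
// "not set", so whatever a legacy blob carries is left in place.
func mergeOnRead(cols map[string]sql.NullString, blob map[string]any) map[string]any {
	meta := make(map[string]any, len(blob)+len(promotedKeys))
	for k, v := range blob {
		meta[k] = v
	}
	for _, k := range promotedKeys {
		if c := cols[k]; c.Valid {
			meta[k] = c.String
		}
	}
	return meta
}

func main() {
	cols, blob := splitForWrite(map[string]any{"signature": "func Run()", "line": 7})
	fmt.Println(len(blob), cols["signature"].String)

	// A legacy row: columns are all NULL, the blob still has the key.
	legacy := mergeOnRead(nil, map[string]any{"signature": "func Old()"})
	fmt.Println(legacy["signature"])
}
```

Treating NULL as "not set" rather than "empty" is what lets rows written before the columns existed keep returning their blob values unchanged.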

A NULL column means "not set", so a legacy row that still carries the
keys in its (gob) blob is left untouched; databases created before the
columns existed gain them via ALTER on the next Open. Every node-shaped
SELECT now resolves to a single column-list constant so the projection
and scanNode order can never drift apart again.

materializeDataflowParams and ReconcileContractEdges scanned the entire
edge set via AllEdges and filtered down to two or three kinds — decoding
every edge's meta along the way. On the sqlite backend that is a
full-table read plus a meta decode per edge on every resolve, when the
pass only ever touches arg_of/returns_to (dataflow) or
matches/produces_topic/consumes_topic (reconcile).
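To make the cost difference concrete, the sketch below contrasts scan-then-filter with a per-kind lookup on an in-memory stand-in. It is illustrative only: the store type, its fields, and both methods are hypothetical, the byKind map stands in for the edges_by_kind index, and the decoded counter stands in for rows read and meta-decoded.

```go
package main

import "fmt"

type edge struct{ From, To, Kind string }

// store is an in-memory stand-in for the edge table.
type store struct {
	edges   []edge
	byKind  map[string][]int // stand-in for an index on edge kind
	decoded int              // rows touched (each would cost a meta decode)
}

func newStore(edges []edge) *store {
	s := &store{edges: edges, byKind: map[string][]int{}}
	for i, e := range edges {
		s.byKind[e.Kind] = append(s.byKind[e.Kind], i)
	}
	return s
}

// scanAndFilter models the old pattern: read every edge, keep a few.
func (s *store) scanAndFilter(kinds ...string) []edge {
	want := make(map[string]bool, len(kinds))
	for _, k := range kinds {
		want[k] = true
	}
	var out []edge
	for _, e := range s.edges {
		s.decoded++
		if want[e.Kind] {
			out = append(out, e)
		}
	}
	return out
}

// edgesByKind models the indexed fetch: only matching rows are read.
// Results come back grouped by kind rather than in insertion order.
func (s *store) edgesByKind(kinds ...string) []edge {
	var out []edge
	for _, k := range kinds {
		for _, i := range s.byKind[k] {
			s.decoded++
			out = append(out, s.edges[i])
		}
	}
	return out
}

func main() {
	s := newStore([]edge{
		{"a", "b", "calls"}, {"x", "f", "arg_of"}, {"b", "c", "calls"},
		{"f", "y", "returns_to"}, {"m", "n", "imports"},
	})
	got := s.scanAndFilter("arg_of", "returns_to")
	fmt.Println("scan:", len(got), "edges,", s.decoded, "rows touched")

	s.decoded = 0
	got = s.edgesByKind("arg_of", "returns_to")
	fmt.Println("index:", len(got), "edges,", s.decoded, "rows touched")
}
```

Both paths return the same set of edges; only the number of rows touched changes, and on a real graph the requested kinds are a small fraction of the table.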

Fetch those kinds directly through the edges_by_kind index instead, so
only the relevant rows are read and only their meta is decoded. Behaviour
is unchanged: the same edges are processed.

errcheck: route rows.Close()/s.Close() through the package's
"_ = ...Close()" convention in ensureNodeColumns and its test.

The meta column now stores JSON, so update the in-code docs that still
called it gob-encoded (package doc, the constant_values / churn sidecar
rationale, and the analysis read paths). References to the separate
gob+gzip persistence snapshot and to legacy gob rows are intentionally
left untouched — those are accurate.
@zzet zzet merged commit ed483c7 into main Jun 17, 2026
10 checks passed
@zzet zzet deleted the feat/json-meta-storage branch June 17, 2026 17:53