Store node/edge meta as JSON, promote hot keys to columns #107
Merged
Meta was gob-encoded with a fresh encoder/decoder constructed per blob,
so gob recompiled its type-decode engine on every edge — that dominated
cold-load CPU and allocation and could pin the daemon at multi-hundred
percent CPU while a whole-graph resolve walked the edges.
Encode meta as JSON and decode it through metaWire, a typed DTO whose
fields parse each known key as its exact Go type (int / int64 / float64 /
*contracts.Shape / []string / []map[string]any). The open tail and nested
maps are normalised with a small key-type table, so the in-memory map a
caller receives is type-identical to what gob produced and no reader
needs to change. JSON needs no per-call engine compilation and carries no custom
binary versioning.
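The typed-DTO idea can be illustrated with a cut-down key set. This is a sketch, not the PR's metaWire. line (int) and confidence (float64) are node keys named elsewhere in this PR, while tags ([]string) is a hypothetical key. metaWireSketch and decodeMetaSketch are also hypothetical names. Decoding twice, once into the struct and once into a generic map, is simply the shortest way to show that typed fields escape encoding/json's float64 / []any widening. The PR's decoder also normalises the open tail with its key-type table. This sketch omits that step, so unknown numeric keys still come back as float64.

```go
package main

import "encoding/json"

// metaWireSketch is a hypothetical, trimmed-down stand-in for metaWire:
// each known key gets a field of its exact Go type, so encoding/json
// parses it directly instead of widening it to float64 / []any.
type metaWireSketch struct {
	Line       *int     `json:"line,omitempty"`
	Confidence *float64 `json:"confidence,omitempty"`
	Tags       []string `json:"tags,omitempty"` // hypothetical key
}

// decodeMetaSketch decodes a JSON meta blob into map[string]any, replacing
// the generically decoded values of known keys with their typed versions.
// Unknown keys are left as encoding/json's generic values (no tail
// normalisation in this sketch).
func decodeMetaSketch(blob []byte) (map[string]any, error) {
	var wire metaWireSketch
	if err := json.Unmarshal(blob, &wire); err != nil {
		return nil, err
	}
	var tail map[string]any
	if err := json.Unmarshal(blob, &tail); err != nil {
		return nil, err
	}
	// Overwrite the widened generic values with their typed counterparts.
	if wire.Line != nil {
		tail["line"] = *wire.Line
	}
	if wire.Confidence != nil {
		tail["confidence"] = *wire.Confidence
	}
	if wire.Tags != nil {
		tail["tags"] = wire.Tags
	}
	return tail, nil
}
```

A reader asserting m["line"].(int) succeeds here. A plain json.Unmarshal into map[string]any would have produced float64 and broken that assertion.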
Existing on-disk stores hold gob blobs; decodeMeta sniffs the leading
byte ('{' => JSON) and falls back to gob for legacy rows, which migrate
to JSON on their next write. No schema migration required.
runStaleCodeInspection asserted n.Meta["last_authored"].(string), but blame writes last_authored as a nested map (commit / email / timestamp), so the assertion always missed; it additionally gated on an is_stale flag that nothing ever writes. The inspection surfaced nothing. Read last_authored through the shared lastAuthoredFrom helper (blame sidecar with node-meta fallback) and apply the same 365-day age threshold analyze stale_code uses, so the inspection lists genuinely stale functions/methods with their age and author.
routeMethodAndPath read method / path / service / topic / operation off a contract node's top-level Meta, but the contract-to-node build nests the contract's own Meta under Meta["contract_meta"] — the node top level only holds type / role / symbol_id / line / confidence. Every route lookup therefore returned empty. Read the route fields from the nested contract_meta map, falling back to the top level for any node that stamps them directly.
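The corrected lookup order can be sketched as follows. routeFieldSketch and routeMethodAndPathSketch are hypothetical names. Only method and path are shown; service, topic and operation would go through the same helper. Treating an empty nested string as missing is an assumption of this sketch.

```go
package main

// routeFieldSketch prefers the nested contract_meta map and falls back to the
// node's top-level Meta for nodes that stamp route fields directly.
func routeFieldSketch(meta map[string]any, key string) string {
	if cm, ok := meta["contract_meta"].(map[string]any); ok {
		if v, ok := cm[key].(string); ok && v != "" {
			return v
		}
	}
	v, _ := meta[key].(string)
	return v
}

// routeMethodAndPathSketch returns the route method and path for a contract node.
func routeMethodAndPathSketch(meta map[string]any) (method, path string) {
	return routeFieldSketch(meta, "method"), routeFieldSketch(meta, "path")
}
```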
The four node meta keys signature, visibility, doc and external are universal and hot-read (signature is the single hottest meta read in the graph). Lift them into dedicated nullable columns: stripped from the JSON meta blob on write and restored into Meta on read, so the in-memory map is unchanged while the keys become queryable and the common blob shrinks. A NULL column means "not set", so a legacy row that still carries the keys in its (gob) blob is left untouched; databases created before the columns existed gain them via ALTER on the next Open. Every node-shaped SELECT now resolves to a single column-list constant so the projection and scanNode order can never drift apart again.
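The write/read split can be modelled in memory. The sketch uses the four promoted keys named in the PR: signature, visibility, doc and external. splitPromoted and restorePromoted are hypothetical names. A present entry in cols stands for a non-NULL column and an absent entry stands for NULL, so no particular Go type is assumed for the column values.

```go
package main

// promotedKeys are the node meta keys moved into dedicated nullable columns.
var promotedKeys = []string{"signature", "visibility", "doc", "external"}

// splitPromoted is the write path: it copies meta, moves each promoted key
// into cols (a present entry models a non-NULL column), and returns the
// remainder that would be JSON-encoded into the meta blob.
func splitPromoted(meta map[string]any) (blob map[string]any, cols map[string]any) {
	blob = make(map[string]any, len(meta))
	for k, v := range meta {
		blob[k] = v
	}
	cols = make(map[string]any, len(promotedKeys))
	for _, k := range promotedKeys {
		if v, ok := blob[k]; ok {
			cols[k] = v
			delete(blob, k)
		}
	}
	return blob, cols
}

// restorePromoted is the read path: column values are layered over the blob.
// A missing entry (NULL column) means "not set", so any value a legacy blob
// still carries for that key survives untouched.
func restorePromoted(blob, cols map[string]any) map[string]any {
	meta := make(map[string]any, len(blob)+len(cols))
	for k, v := range blob {
		meta[k] = v
	}
	for k, v := range cols {
		meta[k] = v
	}
	return meta
}
```

Because NULL never overwrites anything, a legacy row works without a backfill. Its keys simply stay in the blob until the row is next written, at which point they move to the columns.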
materializeDataflowParams and ReconcileContractEdges scanned the entire edge set via AllEdges and filtered down to two or three kinds — decoding every edge's meta along the way. On the sqlite backend that is a full-table read plus a meta decode per edge on every resolve, when the pass only ever touches arg_of/returns_to (dataflow) or matches/produces_topic/consumes_topic (reconcile). Fetch those kinds directly through the edges_by_kind index instead, so only the relevant rows are read and only their meta is decoded. Behaviour is unchanged — the same edges are processed.
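Fetching by kind amounts to pushing the filter into SQL so that the edges_by_kind index can serve it. The sketch below builds such a query. The edges table and its src, dst, kind and meta columns are assumptions, not the store's real schema, and edgesByKindQuery is a hypothetical name. With database/sql the result would be run as db.Query(q, args...), and only the returned rows would then have their meta decoded.

```go
package main

import "strings"

// edgesByKindQuery builds a parameterised SELECT restricted to the given edge
// kinds, letting SQLite use an index on the kind column instead of scanning
// and decoding every edge. Table and column names are assumptions.
func edgesByKindQuery(kinds []string) (string, []any) {
	placeholders := make([]string, len(kinds))
	args := make([]any, len(kinds))
	for i, k := range kinds {
		placeholders[i] = "?"
		args[i] = k
	}
	q := "SELECT src, dst, kind, meta FROM edges WHERE kind IN (" +
		strings.Join(placeholders, ", ") + ")"
	return q, args
}
```

The dataflow pass would request arg_of and returns_to. The reconcile pass would request matches, produces_topic and consumes_topic. Binding the kinds as parameters, rather than splicing them into the SQL text, keeps the query injection-safe.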
errcheck: route rows.Close()/s.Close() through the package's "_ = ...Close()" convention in ensureNodeColumns and its test.
The meta column now stores JSON, so update the in-code docs that still called it gob-encoded (package doc, the constant_values / churn sidecar rationale, and the analysis read paths). References to the separate gob+gzip persistence snapshot and to legacy gob rows are intentionally left untouched — those are accurate.
Why
The daemon could burn several cores (multi-hundred-percent CPU) and climb to multi-GB RSS while seemingly idle. Live pprof traced it to the SQLite store's meta codec:
encodeMeta/decodeMeta built a fresh gob encoder/decoder per blob, so gob recompiled its type-decode engine on every edge. Over a day of uptime that was ~16.6 TB of allocation, ~86% of it through scanEdge → decodeMeta, saturating the GC, and a whole-graph resolve walking the edges turned into an hours-long, lock-held grind.
What
Node/edge Meta is now stored as JSON instead of gob, plus supporting cleanups. Each change is its own commit.
encodeMeta is json.Marshal; decodeMeta routes the document through metaWire, a typed DTO that parses each known key as its exact Go type (int / int64 / float64 / *contracts.Shape / []string / []map[string]any), with a key-type table normalising the open tail and nested maps. The in-memory map a caller receives is type-identical to what gob produced: JSON's float64 / []any widening never reaches a reader, so no reader changes. JSON needs no per-call engine compilation and no custom binary versioning.
decodeMeta sniffs the leading byte ('{' ⇒ JSON) and falls back to gob for existing on-disk rows, which migrate to JSON on their next write. No schema migration, no forced reindex.
signature (the single hottest meta read), visibility, doc and external move into dedicated nullable columns, stripped from the JSON blob on write and restored into Meta on read (transparent to the in-memory model), so they become queryable and the common blob shrinks. A NULL column means "not set", so legacy rows keep their blob values; pre-existing databases gain the columns via ALTER on the next Open. Every node-shaped SELECT now resolves to one column-list constant so the projection and scan order can't drift.
materializeDataflowParams and ReconcileContractEdges fetch only the edge kinds they process through the edges_by_kind index instead of scanning every edge via AllEdges. Behaviour unchanged.
The stale_code inspection read last_authored as a string (it's a map) and gated on a never-written flag, so it surfaced nothing; it now reads via the shared blame helper with a 365-day threshold. Contract route lookups read method/path at the node's top level when they live under nested contract_meta; they now read the nested map first.
Testing
*contracts.Shape, nested-map, the integral-float case) survives a persist→reload cycle with its exact type, plus the gob-legacy fallback and the column ALTER migration.
store_sqlite (incl. the conformance suite) green under -race; mcp (2327) and indexer (578) suites green; go build ./..., go vet, golangci-lint, and the cmd/gortex wire-contract golden all clean. No graph.Node/Edge struct fields changed (promotion is storage-layer only), so the wire contract is unaffected.
Deploy note
The codec lives in the store; deploying it is a rebuild + reinstall. Existing stores keep working via the gob fallback and migrate lazily.