feat(scripts): post-mortem snapshot analyzer with replay + motion-history detectors #121
Adds scripts/analyze-snapshot.ts — a pure-Node CLI for offline analysis of
downloaded debug snapshots (DebugSnapshot envelope from src/platform/debug-snapshot.ts).
Phase A — replay-from-seed byte-equality check:
createScenario(seed) → tick() through the captured inputLog up to
snapshot.tick, then JSON-compare serializeWorldState(replay) against the
captured snapshot. A divergence here is a free SCEN-06 regression signal.
Exits 1 on mismatch so CI / scripts can detect a determinism regression
without parsing stdout.
Also reports tile-occupancy clusters (≥5 ants on one tile) and underground
ants standing on non-Open tiles (Solid / Marked / BeingDug — stuck-in-dirt).
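The Phase A check can be sketched with a toy simulation. Everything below (`ToyWorld`, `ToyCommand`, `createToyScenario`, `toyTick`, `replayMatches`) is a hypothetical stand-in for the real `createScenario` / `tick()` / `serializeWorldState` API, which is not shown here; only the shape of the check (rebuild from seed, replay the input log to the captured tick, JSON-compare) follows the description above.

```typescript
// Hypothetical stand-ins for the real sim types; not the project's API.
interface ToyCommand { issuedAtTick: number; delta: number }
interface ToyWorld { tick: number; rng: number; value: number }

function createToyScenario(seed: number): ToyWorld {
  return { tick: 0, rng: seed >>> 0, value: 0 };
}

// One deterministic step: advance a 32-bit LCG, then apply this tick's commands.
function toyTick(world: ToyWorld, commands: ToyCommand[]): void {
  world.rng = (Math.imul(world.rng, 1664525) + 1013904223) >>> 0;
  world.value += world.rng % 7;
  for (const c of commands) world.value += c.delta;
  world.tick += 1;
}

// Rebuild from the seed, replay up to the captured tick, compare serialized state.
function replayMatches(seed: number, inputLog: ToyCommand[], captured: ToyWorld): boolean {
  const replay = createToyScenario(seed);
  while (replay.tick < captured.tick) {
    toyTick(replay, inputLog.filter((c) => c.issuedAtTick === replay.tick));
  }
  return JSON.stringify(replay) === JSON.stringify(captured);
}

// Usage: capture a world at tick 10, then check a faithful and a tampered copy.
const toyInputLog: ToyCommand[] = [{ issuedAtTick: 3, delta: 10 }];
const capturedWorld = createToyScenario(42);
while (capturedWorld.tick < 10) {
  toyTick(capturedWorld, toyInputLog.filter((c) => c.issuedAtTick === capturedWorld.tick));
}
const replayOk = replayMatches(42, toyInputLog, capturedWorld);
const tamperedOk = replayMatches(42, toyInputLog, { ...capturedWorld, value: capturedWorld.value + 1 });
```

In the sketch, `replayOk` is true and `tamperedOk` is false; a real CLI would presumably map the second case to the exit code 1 described above.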
Phase B — motion-history detectors:
During replay, samples each live ant's (zone, tile, task, subTask, colony)
every 20 ticks (= 1 sim sec) into a 30-sample sliding window (= 30 sim sec).
Two classes are reported, with dominant-task annotation:
- Stationary: top tile dominates ≥90% of the window. Grouped by
(zone, tile, colony) so a 75-ant pile-up surfaces as one line.
- Oscillating: ≤3 unique tiles AND not stationary. Canonical tile-set
key so A↔B and B↔A collapse to one eddy.
Queens are filtered (they never move by design).
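A minimal sketch of the two detectors, using the thresholds quoted above. The names (`pushSample`, `classifyWindow`, `canonicalTileSetKey`), the string encoding of a tile, and the choice to skip ants with a partial window are assumptions made for illustration, not taken from the script.

```typescript
type MotionClass = "stationary" | "oscillating" | "moving";

const WINDOW_SIZE = 30;          // samples kept per ant
const STATIONARY_SHARE = 0.9;    // top tile must cover at least 90% of the window
const MAX_OSCILLATION_TILES = 3; // at most 3 unique tiles counts as oscillating

// Append a sample and drop the oldest once the window is full.
function pushSample(samples: string[], tileKey: string): void {
  samples.push(tileKey);
  if (samples.length > WINDOW_SIZE) samples.shift();
}

function classifyWindow(samples: string[]): MotionClass {
  // Assumption: ants with a partial window are not classified yet.
  if (samples.length < WINDOW_SIZE) return "moving";
  const counts = new Map<string, number>();
  for (const key of samples) counts.set(key, (counts.get(key) ?? 0) + 1);
  let top = 0;
  counts.forEach((n) => { if (n > top) top = n; });
  if (top / samples.length >= STATIONARY_SHARE) return "stationary";
  if (counts.size <= MAX_OSCILLATION_TILES) return "oscillating";
  return "moving";
}

// Order-independent key, so A<->B and B<->A land in the same group.
function canonicalTileSetKey(samples: string[]): string {
  return Array.from(new Set(samples)).sort().join(" | ");
}

// Usage: three synthetic ants sampled 40 times each.
const parkedAnt: string[] = [];
const pacingAnt: string[] = [];
const roamingAnt: string[] = [];
for (let i = 0; i < 40; i++) {
  pushSample(parkedAnt, "underground:104,0");
  pushSample(pacingAnt, i % 2 === 0 ? "underground:104,0" : "underground:104,1");
  pushSample(roamingAnt, `surface:${i},0`);
}
```

Checking the stationary rule first is what makes "not stationary" part of the oscillating rule: an ant that spends 29 of 30 samples on one tile and one sample on a neighbour is reported once, as stationary.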
The dominant-task labeling is the key UX: a human can scan the report and
instantly tell a real bug ("70 Foraging/CarryingFood ants stationary on
(104,0)") apart from expected stuck cases ("18 Fighting ants at the rally
point").
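One way the dominant-task label could be computed for a group is sketched below. `dominantTaskLabel` is a hypothetical helper, and the reading of "+k others" as the number of other distinct task labels in the group is an assumption; the script may count something else.

```typescript
// Hypothetical helper: label a group of ants by its most common task/subTask string.
function dominantTaskLabel(taskLabels: string[]): string {
  const counts = new Map<string, number>();
  for (const label of taskLabels) counts.set(label, (counts.get(label) ?? 0) + 1);
  let best = "";
  let bestCount = 0;
  // Strictly-greater comparison: on a tie, the label seen first wins.
  counts.forEach((n, label) => {
    if (n > bestCount) { best = label; bestCount = n; }
  });
  if (counts.size <= 1) return best; // uniform group: plain label
  // Assumption: k is the number of other distinct labels in the group.
  return `${best} (${bestCount}/${taskLabels.length}, +${counts.size - 1} others)`;
}

// Usage: a 75-ant pile-up that is mostly carriers.
const pileUpLabels: string[] = [];
for (let i = 0; i < 70; i++) pileUpLabels.push("Foraging/CarryingFood");
for (let i = 0; i < 3; i++) pileUpLabels.push("Idle");
for (let i = 0; i < 2; i++) pileUpLabels.push("Nursing");
const pileUpLabel = dominantTaskLabel(pileUpLabels);
const rallyLabel = dominantTaskLabel(["Fighting", "Fighting", "Fighting"]);
```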
Run:
node --experimental-transform-types scripts/analyze-snapshot.ts <snapshot.json>
--experimental-transform-types (not --experimental-strip-types) is required
because src/platform/save.ts uses constructor parameter properties, which
strip-types rejects. The script installs the same .js→.ts resolve hook used
by scripts/run-sim.ts.
Verified against the May 11 91342-tick snapshot that motivated bug #120:
replay PASS in 173s; analyzer flagged 70 carrier ants stationary at the
underground chamber boundary plus expected non-bug stationary clusters at
the rally point.
Envelope is validated up front (missing/wrong-shape fields exit 3 with a
clear message); malformed JSON or unreadable file also exits 3. scripts/* is
excluded from tsconfig.json's include, consistent with scripts/run-sim.ts.
No tests — this is a diagnostic CLI; existing replay tests
(src/sim/issue-44-snapshot-replay.test.ts) already exercise the
deserializeWorldState + tick() path in CI.
References #120 (carriers wedged at chamber boundary).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c28fa91a04
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
- Correct usage text: --experimental-transform-types (strip-types fails on src/platform/save.ts parameter properties).
- Validate every inputLog entry up front: non-object entries or missing / non-integer / negative issuedAtTick now bail(3) with a clear error pointing to the offending index instead of throwing a TypeError mid-replay.
- On replay FAIL, distinguish trustworthy reports (cluster, stuck-in-dirt; both built from the captured snapshot) from caveat-laden ones (motion history; built from the divergent replayed trajectory) so a determinism regression doesn't silently misdirect debugging.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
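The up-front inputLog validation described in this commit might look roughly like the sketch below. `findInvalidInputLogEntry` and its message wording are hypothetical; only the rules (object entries, integer and non-negative `issuedAtTick`, error names the offending index) come from the commit message. A caller would presumably print the message and exit with code 3.

```typescript
// Hypothetical validator: returns an error message, or null when the log is usable.
function findInvalidInputLogEntry(inputLog: unknown): string | null {
  if (!Array.isArray(inputLog)) return '"inputLog" must be an array';
  for (let i = 0; i < inputLog.length; i++) {
    const entry: unknown = inputLog[i];
    if (typeof entry !== "object" || entry === null) {
      return `inputLog[${i}] is not an object`;
    }
    const issuedAtTick = (entry as { issuedAtTick?: unknown }).issuedAtTick;
    // Number.isInteger is false for missing, NaN, infinite, and fractional values.
    if (typeof issuedAtTick !== "number" || !Number.isInteger(issuedAtTick) || issuedAtTick < 0) {
      return `inputLog[${i}].issuedAtTick must be a non-negative integer (got ${String(issuedAtTick)})`;
    }
  }
  return null;
}

// Usage: one clean log and three broken ones.
const cleanLogError = findInvalidInputLogEntry([{ issuedAtTick: 0 }, { issuedAtTick: 40 }]);
const negativeTickError = findInvalidInputLogEntry([{ issuedAtTick: 0 }, { issuedAtTick: -1 }]);
const nonObjectError = findInvalidInputLogEntry([{ issuedAtTick: 0 }, "dig"]);
const missingTickError = findInvalidInputLogEntry([{}]);
```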
@codex review
💡 Codex Review
Reviewed commit: abd231ad6b
Per save.ts Pitfall 7, serializeWorldState preserves world.commandQueue: debug snapshots and autosaves can fire between ticks while input handlers have staged pending commands. The replay path only processes commands already drained into inputLog, so replay.commandQueue is always [] at the end. A naive byte-equality compare therefore reports a false-positive FAIL on any snapshot captured between input and drain, even though sim determinism is intact.
Strip commandQueue from both sides before comparing. Surface the captured pending-command count in the result line so a non-zero queue is visible when reading the report.
Addresses codex P1 on PR #121.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
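The fix can be illustrated with plain JSON objects. `WorldJson`, `stripCommandQueue`, and `compareIgnoringCommandQueue` are hypothetical names, and the sample states are invented for the example; the only detail taken from the commit is that `commandQueue` is dropped from both sides and the captured pending count is surfaced.

```typescript
type WorldJson = Record<string, unknown>;

// Shallow copy without the commandQueue key; the input is left untouched.
function stripCommandQueue(state: WorldJson): WorldJson {
  const copy: WorldJson = { ...state };
  delete copy["commandQueue"];
  return copy;
}

function compareIgnoringCommandQueue(captured: WorldJson, replayed: WorldJson) {
  const queue = captured["commandQueue"];
  const pendingCommands = Array.isArray(queue) ? queue.length : 0;
  const equal =
    JSON.stringify(stripCommandQueue(captured)) === JSON.stringify(stripCommandQueue(replayed));
  return { equal, pendingCommands };
}

// Usage: a snapshot captured with one staged command versus a replay with a drained queue.
const capturedState: WorldJson = { tick: 5, ants: [1, 2], commandQueue: [{ kind: "dig" }] };
const replayedState: WorldJson = { tick: 5, ants: [1, 2], commandQueue: [] };
const naiveEqual = JSON.stringify(capturedState) === JSON.stringify(replayedState);
const queueAwareResult = compareIgnoringCommandQueue(capturedState, replayedState);
```

Here `naiveEqual` is false while `queueAwareResult.equal` is true with `pendingCommands` equal to 1, which is the false-positive case the commit describes.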
@codex review
💡 Codex Review
Reviewed commit: 61eb1e42f0
typeof === 'number' lets NaN, +/-Infinity, and fractional values through.
Concrete harm:
- tick = Infinity → unbounded replay loop
- tick = NaN → loop skipped silently, replay reports PASS on no work
- seed = NaN → createScenario coerces via `seed >>> 0` to 0, producing
a misleading "wrong scenario" replay instead of a clear error
Switch to Number.isInteger (rejects all of the above plus BigInts) and
include the offending value in the error message.
Addresses codex P1 follow-up on PR #121.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
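The difference between the two checks is easy to demonstrate in isolation. `describeNonInteger` and its message text are hypothetical; the behaviour of `typeof`, `Number.isInteger`, and `>>>` is standard JavaScript.

```typescript
// Values that pass `typeof === 'number'` but are unusable as a tick or seed.
const suspectValues: number[] = [NaN, Infinity, -Infinity, 1.5];

const looseAccepted = suspectValues.filter((v) => typeof v === "number").length;
const strictAccepted = suspectValues.filter((v) => Number.isInteger(v)).length;

// The coercion mentioned above: NaN silently becomes seed 0.
const coercedSeed = NaN >>> 0;

// Hypothetical helper: null when the value is an integer, otherwise a message
// that includes the offending value.
function describeNonInteger(fieldName: string, value: unknown): string | null {
  if (Number.isInteger(value)) return null;
  return `"${fieldName}" must be an integer (got ${String(value)})`;
}

const goodTick = describeNonInteger("tick", 91342);
const badTick = describeNonInteger("tick", Infinity);
const badSeed = describeNonInteger("seed", NaN);
```

All four suspect values pass the loose check and none pass the strict one, so a single `Number.isInteger` call covers the three failure modes listed in the commit.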
@codex review
Codex Review: Didn't find any major issues. Swish!
Adds a section under "Building and Running" pointing future contributors (human or AI) at scripts/analyze-snapshot.ts as the standard way to dig into a debug snapshot. Names the F9 download convention, the run command (including the --experimental-transform-types flag the script requires), and what the analyzer reports.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Adds scripts/analyze-snapshot.ts, a pure-Node CLI for offline analysis of downloaded debug snapshots. No sim-side changes; no save-format changes; no perf cost in-game. References #120.
Phase A: replay + byte-equality.
createScenario(seed) → tick() through the captured inputLog up to snapshot.tick, then JSON-compares serializeWorldState(replay) against the captured snapshot. A divergence is a free SCEN-06 regression signal: the analyzer exits 1 on mismatch so CI can detect it without parsing stdout. Also reports tile-occupancy clusters (≥5 ants on one tile) and underground ants on non-Open tiles (Solid / Marked / BeingDug, i.e. stuck-in-dirt).
Phase B: motion-history detectors.
During replay, samples each live ant's (zone, tile, task, subTask, colony) every 20 ticks (= 1 sim sec) into a 30-sample sliding window (= 30 sim sec). Reports:
- Stationary: top tile dominates ≥90% of the window. Grouped by (zone, tile, colony) so a pile-up surfaces as one line.
- Oscillating: ≤3 unique tiles AND not stationary. Canonical tile-set key so A↔B and B↔A collapse to one eddy.
Queens are filtered (they never move by design). Each group is labeled with its dominant (task, subTask) so the reader can tell a real bug ("70 Foraging/CarryingFood ants stationary on (104,0)") from expected stuck cases ("18 Fighting ants at the rally point").
Run
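node --experimental-transform-types scripts/analyze-snapshot.ts <snapshot.json>
--experimental-transform-types (not --experimental-strip-types) is required because src/platform/save.ts uses constructor parameter properties, which strip-types rejects. The script installs the same .js→.ts resolve hook as scripts/run-sim.ts.
Verification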
Ran against ~/Downloads/subterrans-debug-seed400367819-tick91342.json (the May 11 snapshot that motivated #120):
- […] (104,0) ↔ (104,1) ↔ (104,2) Foraging/CarryingFood: same chamber, same bug class.
- Expected non-bug: 18 Fighting at the rally point, 4 Nurses oscillating near the queen chamber, ~20 Idle young workers.
Caveats
- […] foodPiles[].pickupsInitial didn't exist yet. Save-compat artifact; analyzer exits with a clear error in that case.
- […] JSON.stringify byte-equality; in the unlikely event a future change reorders object keys in serializeWorldState, the analyzer will report FAIL even though state is semantically identical. The FAIL warning calls this out.
- scripts/* is excluded from tsconfig.json's include, consistent with scripts/run-sim.ts and scripts/check-foraging-survival.ts. No tests added: this is a diagnostic CLI; src/sim/issue-44-snapshot-replay.test.ts already exercises the deserializeWorldState + tick() path in CI.
UAT: quick tests you can run locally
- End-to-end on a real snapshot. Download a debug snapshot from the game's debug UI, then run the analyzer on it.
  Expect: prints Snapshot: …, Replaying from seed for N ticks…, then PASS, Live ants: …, Tile-occupancy clusters …, Underground ants on non-Open tiles …, Motion-history analysis …, Done. Exit code 0.
- Missing-argument error path.
  Expect: usage message on stderr; exit=2.
- Missing file error path.
  Expect: Invalid snapshot: could not read "..." : ENOENT... on stderr; exit=3. (No stack trace.)
- Malformed JSON.
  Expect: Invalid snapshot: not valid JSON (...); exit=3.
- Wrong-shape envelope.
  Expect: Invalid snapshot: missing or non-numeric "seed"; exit=3.
- Determinism regression signal. On a real snapshot, temporarily tweak something deterministic in src/sim/ (any byte-changing edit), then re-run the analyzer.
  Expect: replay vs captured byte-equality: FAIL, the SCEN-06 warning printed, then Done., then exit=1. Revert the edit afterwards.
- Dominant-task labels render. On a real snapshot, eyeball the Stationary and Oscillating sections: each line should show a Foraging/CarryingFood / Fighting / Nursing / Idle style label between the count and the [ids…] sample. Mixed-task groups should show Foo (n/total, +k others).
🤖 Generated with Claude Code