feat(scripts): post-mortem snapshot analyzer with replay + motion-history detectors#121

Merged: LightAxe merged 5 commits into main from feat/analyze-snapshot-cli on May 13, 2026

@LightAxe
Owner

Summary

Adds scripts/analyze-snapshot.ts — a pure-Node CLI for offline analysis of downloaded debug snapshots. No sim-side changes; no save-format changes; no perf cost in-game. References #120.

Phase A — replay + byte-equality: createScenario(seed) → tick() through the captured inputLog up to snapshot.tick, then JSON-compares serializeWorldState(replay) against the captured snapshot. A divergence is a free SCEN-06 regression signal — exits 1 on mismatch so CI can detect it without parsing stdout. Also reports tile-occupancy clusters (≥5 ants on one tile) and underground ants on non-Open tiles (Solid / Marked / BeingDug — stuck-in-dirt).
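A minimal sketch of the Phase A idea, using toy stand-ins rather than the repo's code. `createToyScenario`, `toyTick`, `ToyWorld`, and `ToySnapshot` are hypothetical names invented for illustration; the real `createScenario`, `tick()`, and `serializeWorldState` signatures are not shown in this PR. The shape is the same: rebuild from the seed, apply logged inputs at their ticks, then compare serialized bytes.

```typescript
// Toy stand-ins only: the real sim types and functions live in src/sim.
interface ToyWorld { tick: number; rng: number; total: number }
interface ToyCommand { issuedAtTick: number; amount: number }
interface ToySnapshot { seed: number; tick: number; inputLog: ToyCommand[]; state: string }

function createToyScenario(seed: number): ToyWorld {
  return { tick: 0, rng: seed >>> 0, total: 0 };
}

function toyTick(world: ToyWorld, due: ToyCommand[]): void {
  // Deterministic LCG step: same seed + same inputs => same state.
  world.rng = (Math.imul(world.rng, 1664525) + 1013904223) >>> 0;
  for (const c of due) world.total += c.amount;
  world.total += world.rng % 7;
  world.tick += 1;
}

function replay(seed: number, inputLog: ToyCommand[], untilTick: number): ToyWorld {
  const world = createToyScenario(seed);
  while (world.tick < untilTick) {
    // Apply only the commands that were issued on the current tick.
    const due = inputLog.filter((c) => c.issuedAtTick === world.tick);
    toyTick(world, due);
  }
  return world;
}

// Returns true on byte-equality; a CLI would map false to a non-zero exit code.
function replayMatches(snapshot: ToySnapshot): boolean {
  const replayed = replay(snapshot.seed, snapshot.inputLog, snapshot.tick);
  return JSON.stringify(replayed) === snapshot.state;
}
```

Returning a boolean keeps the comparison separate from process-exit handling, so the same check could be reused from a test.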

Phase B — motion-history detectors: during replay, samples each live ant's (zone, tile, task, subTask, colony) every 20 ticks (= 1 sim sec) into a 30-sample sliding window (= 30 sim sec). Reports:

  • Stationary — top tile dominates ≥90% of the window. Grouped by (zone, tile, colony) so a pile-up surfaces as one line.
  • Oscillating — ≤3 unique tiles AND not stationary. Canonical tile-set key so A↔B and B↔A collapse to one eddy.

Queens are filtered (they never move by design). Each group is labeled with its dominant (task, subTask) so the reader can tell a real bug ("70 Foraging/CarryingFood ants stationary on (104,0)") from expected stuck cases ("18 Fighting ants at the rally point").
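The two detectors can be sketched roughly as follows. This is an illustration under assumptions, not the script's code: tile keys are plain strings such as "underground:104,0" (a hypothetical encoding), an empty window is treated as "moving", and the constant and function names are made up here. The thresholds (30 samples, ≥90%, ≤3 unique tiles) are the ones stated above.

```typescript
type MotionKind = "stationary" | "oscillating" | "moving";

const WINDOW_SIZE = 30;          // samples kept per ant
const STATIONARY_FRACTION = 0.9; // top tile must cover >= 90% of the window
const MAX_OSCILLATION_TILES = 3; // <= 3 unique tiles counts as an eddy

// Append one sample and drop the oldest once the window is full.
function pushSample(window: string[], tileKey: string): void {
  window.push(tileKey);
  if (window.length > WINDOW_SIZE) window.shift();
}

function classify(window: string[]): { kind: MotionKind; key: string } {
  if (window.length === 0) return { kind: "moving", key: "" };

  // Count how often each tile appears in the window.
  const counts = new Map<string, number>();
  for (const t of window) counts.set(t, (counts.get(t) ?? 0) + 1);

  let topTile = "";
  let topCount = 0;
  counts.forEach((n, t) => {
    if (n > topCount) { topTile = t; topCount = n; }
  });

  if (topCount / window.length >= STATIONARY_FRACTION) {
    return { kind: "stationary", key: topTile };
  }
  if (counts.size <= MAX_OSCILLATION_TILES) {
    // Sorted tile set, so A<->B and B<->A produce the same group key.
    return { kind: "oscillating", key: Array.from(counts.keys()).sort().join("|") };
  }
  return { kind: "moving", key: "" };
}
```

Grouping by the returned `key` (plus colony, for the stationary case) is what would let many ants on one tile collapse into a single report line.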

Run

node --experimental-transform-types scripts/analyze-snapshot.ts <snapshot.json>

--experimental-transform-types (not --experimental-strip-types) is required because src/platform/save.ts uses constructor parameter properties, which strip-types rejects. The script installs the same .js→.ts resolve hook as scripts/run-sim.ts.

Verification

Ran against ~/Downloads/subterrans-debug-seed400367819-tick91342.json (the May 11 snapshot that motivated #120):

  • Replay: 91342 ticks in 173.8s. Byte-equality PASS.
  • Tile clusters: 75 ants on underground (104,0) colony 2; 18 on surface (104,64) colony 2.
  • Motion: 70 stationary Foraging/CarryingFood ants on underground (104,0), the symptom reported in #120 ("[BUG] Ants get stuck oscillating between adjacent tiles (and stacking on hot tiles) as ant count grows"): carriers wedged at the chamber boundary. Ant 252 oscillates (104,0) ↔ (104,1) ↔ (104,2) as Foraging/CarryingFood: same chamber, same bug class. Expected non-bugs: 18 Fighting at the rally point, 4 Nurses oscillating near the queen chamber, ~20 Idle young workers.

Caveats

  • Old snapshots (from before #113, "feat(sim,render,save): finite food piles with runtime respawn (closes #112)") won't deserialize because foodPiles[].pickupsInitial didn't exist yet. This is a save-compat artifact; the analyzer exits with a clear error in that case.
  • Replay time is O(captured tick count) — 91k ticks ≈ 3 min. Post-mortem use, not real-time.
  • Replay determinism check uses JSON.stringify byte-equality; in the unlikely event a future change reorders object keys in serializeWorldState, the analyzer will report FAIL even though state is semantically identical. The FAIL warning calls this out.
  • scripts/* is excluded from tsconfig.json's include, consistent with scripts/run-sim.ts and scripts/check-foraging-survival.ts. No tests added — this is a diagnostic CLI; src/sim/issue-44-snapshot-replay.test.ts already exercises the deserializeWorldState + tick() path in CI.
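To make the key-order caveat concrete, here is a small illustration of why JSON.stringify byte-equality is order-sensitive. The `canonicalize` helper is a hypothetical alternative shown only for contrast; the analyzer as described compares raw JSON.stringify output and does not sort keys.

```typescript
// Same data, different insertion order => different bytes.
const capturedJson = JSON.stringify({ tick: 10, ants: 3 });
const reorderedJson = JSON.stringify({ ants: 3, tick: 10 });
const bytesDiffer = capturedJson !== reorderedJson; // true: a byte compare would report FAIL

// Hypothetical order-insensitive variant: rebuild every object with sorted keys.
function canonicalize(value: unknown): unknown {
  if (Array.isArray(value)) return value.map(canonicalize);
  if (value !== null && typeof value === "object") {
    const src = value as Record<string, unknown>;
    const out: Record<string, unknown> = {};
    for (const k of Object.keys(src).sort()) out[k] = canonicalize(src[k]);
    return out;
  }
  return value;
}

const semanticallyEqual =
  JSON.stringify(canonicalize(JSON.parse(capturedJson))) ===
  JSON.stringify(canonicalize(JSON.parse(reorderedJson)));
```

The trade-off is that a sorted compare would also hide a genuine serialization-order change, which may be why a plain byte compare plus a warning is the simpler choice here.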

UAT — quick tests you can run locally

  1. End-to-end on a real snapshot. Download a debug snapshot from the game's debug UI, then:

    node --experimental-transform-types scripts/analyze-snapshot.ts ~/Downloads/subterrans-debug-seed*.json

    Expect: prints Snapshot: …, Replaying from seed for N ticks…, then PASS, Live ants: …, Tile-occupancy clusters …, Underground ants on non-Open tiles …, Motion-history analysis …, Done. Exit code 0.

  2. Missing-argument error path.

    node --experimental-transform-types scripts/analyze-snapshot.ts
    echo "exit=$?"

    Expect: usage message on stderr; exit=2.

  3. Missing file error path.

    node --experimental-transform-types scripts/analyze-snapshot.ts /tmp/does-not-exist.json
    echo "exit=$?"

    Expect: Invalid snapshot: could not read "..." : ENOENT... on stderr; exit=3. (No stack trace.)

  4. Malformed JSON.

    echo 'not json' > /tmp/bad.json
    node --experimental-transform-types scripts/analyze-snapshot.ts /tmp/bad.json
    echo "exit=$?"

    Expect: Invalid snapshot: not valid JSON (...); exit=3.

  5. Wrong-shape envelope.

    echo '{}' > /tmp/empty.json
    node --experimental-transform-types scripts/analyze-snapshot.ts /tmp/empty.json
    echo "exit=$?"

    Expect: Invalid snapshot: missing or non-numeric "seed"; exit=3.

  6. Determinism regression signal. On a real snapshot, temporarily tweak something deterministic in src/sim/ (any byte-changing edit), re-run the analyzer. Expect replay vs captured byte-equality: FAIL, the SCEN-06 warning printed, then Done., then exit=1. Revert the edit afterwards.

  7. Dominant-task labels render. On a real snapshot, eyeball the Stationary and Oscillating sections — each line should show a Foraging/CarryingFood / Fighting / Nursing / Idle style label between the count and the [ids…] sample. Mixed-task groups should show Foo (n/total, +k others).
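For anyone wrapping the analyzer in a CI step, the exit codes exercised by the tests above could be interpreted with a small lookup like the one below. The outcome labels and the function name are made up for illustration; only the numeric codes (0, 1, 2, 3) come from this description.

```typescript
type AnalyzerOutcome =
  | "ok"
  | "determinism-regression"
  | "usage-error"
  | "invalid-snapshot"
  | "unexpected";

// Maps the analyzer's documented exit codes to a readable outcome.
function interpretExitCode(code: number): AnalyzerOutcome {
  switch (code) {
    case 0: return "ok";                     // replay matched the captured snapshot
    case 1: return "determinism-regression"; // byte-equality FAIL
    case 2: return "usage-error";            // missing argument
    case 3: return "invalid-snapshot";       // unreadable file, bad JSON, wrong shape
    default: return "unexpected";            // anything else, e.g. an uncaught crash
  }
}
```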

🤖 Generated with Claude Code

…tory detectors

Adds scripts/analyze-snapshot.ts — a pure-Node CLI for offline analysis of
downloaded debug snapshots (DebugSnapshot envelope from src/platform/debug-snapshot.ts).

Phase A — replay-from-seed byte-equality check:
  createScenario(seed) → tick() through the captured inputLog up to
  snapshot.tick, then JSON-compare serializeWorldState(replay) against the
  captured snapshot. A divergence here is a free SCEN-06 regression signal.
  Exits 1 on mismatch so CI / scripts can detect a determinism regression
  without parsing stdout.

  Also reports tile-occupancy clusters (≥5 ants on one tile) and underground
  ants standing on non-Open tiles (Solid / Marked / BeingDug — stuck-in-dirt).

Phase B — motion-history detectors:
  During replay, samples each live ant's (zone, tile, task, subTask, colony)
  every 20 ticks into a 30-sample sliding window (= 30 sim sec). Two classes
  reported, with dominant-task annotation:
    - Stationary: top tile dominates ≥90% of the window. Grouped by
      (zone, tile, colony) so a 75-ant pile-up surfaces as one line.
    - Oscillating: ≤3 unique tiles AND not stationary. Canonical tile-set
      key so A↔B and B↔A collapse to one eddy.
  Queens are filtered (they never move by design).

The dominant-task labeling is the key UX: a human can scan the report and
instantly tell a real bug ("70 Foraging/CarryingFood ants stationary on
(104,0)") apart from expected stuck cases ("18 Fighting ants at the rally
point").

Run:
  node --experimental-transform-types scripts/analyze-snapshot.ts <snapshot.json>

--transform-types (not --strip-types) is required because src/platform/save.ts
uses constructor parameter properties, which strip-types rejects. The script
installs the same .js→.ts resolve hook used by scripts/run-sim.ts.

Verified against the May 11 91342-tick snapshot that motivated bug #120:
replay PASS in 173s; analyzer flagged 70 carrier ants stationary at the
underground chamber boundary plus expected non-bug stationary clusters at
the rally point.

Envelope is validated up front (missing/wrong-shape fields exit 3 with a
clear message); malformed JSON or unreadable file also exits 3. scripts/* is
excluded from tsconfig.json's include, consistent with scripts/run-sim.ts.
No tests — this is a diagnostic CLI; existing replay tests
(src/sim/issue-44-snapshot-replay.test.ts) already exercise the
deserializeWorldState + tick() path in CI.

References #120 (carriers wedged at chamber boundary).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c28fa91a04

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread scripts/analyze-snapshot.ts Outdated
Comment thread scripts/analyze-snapshot.ts Outdated
Comment thread scripts/analyze-snapshot.ts
- Correct usage text: --experimental-transform-types (strip-types fails on
  src/platform/save.ts parameter properties).
- Validate every inputLog entry up front — non-object entries or missing /
  non-integer / negative issuedAtTick now bail(3) with a clear error
  pointing to the offending index instead of TypeErroring mid-replay.
- On replay FAIL, distinguish trustworthy reports (cluster, stuck-in-dirt
  — built from the captured snapshot) from caveat-laden ones (motion
  history — built from the divergent replayed trajectory) so a determinism
  regression doesn't silently misdirect debugging.
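A rough sketch of the up-front inputLog validation described in the second bullet. The function name and the exact message wording are assumptions; only the rule (each entry must be an object with a non-negative integer issuedAtTick, and the error names the offending index) comes from this commit message.

```typescript
// Returns null when the log is valid, otherwise a message naming the bad index.
function validateInputLog(inputLog: unknown): string | null {
  if (!Array.isArray(inputLog)) return "inputLog is not an array";
  for (let i = 0; i < inputLog.length; i++) {
    const entry: unknown = inputLog[i];
    if (entry === null || typeof entry !== "object") {
      return `inputLog[${i}] is not an object`;
    }
    const issuedAtTick = (entry as { issuedAtTick?: unknown }).issuedAtTick;
    if (
      typeof issuedAtTick !== "number" ||
      !Number.isInteger(issuedAtTick) ||
      issuedAtTick < 0
    ) {
      return `inputLog[${i}].issuedAtTick must be a non-negative integer (got ${String(issuedAtTick)})`;
    }
  }
  return null;
}
```

A caller would print the message and exit 3 when the result is non-null, before any replay work starts.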

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@LightAxe
Owner Author

@codex review


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: abd231ad6b

Comment thread scripts/analyze-snapshot.ts Outdated
Per save.ts Pitfall 7, serializeWorldState preserves world.commandQueue —
debug snapshots and autosaves can fire between ticks while input handlers
have staged pending commands. The replay path only processes commands
already drained into inputLog, so replay.commandQueue is always [] at the
end. A naive byte-equality compare therefore false-positives FAIL on any
snapshot captured between input and drain, even though sim determinism is
intact.

Strip commandQueue from both sides before comparing. Surface the captured
pending-command count in the result line so a non-zero queue is visible
when reading the report.
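The fix can be illustrated with a small sketch. The state shape is reduced to a generic record and the helper names are hypothetical; the idea is only to drop commandQueue from both sides before the string compare and to report how many commands were pending in the captured state.

```typescript
// Shallow copy without the commandQueue field; remaining key order is preserved.
function withoutCommandQueue(state: Record<string, unknown>): Record<string, unknown> {
  const copy: Record<string, unknown> = { ...state };
  delete copy["commandQueue"];
  return copy;
}

function compareIgnoringQueue(
  captured: Record<string, unknown>,
  replayed: Record<string, unknown>,
): { equal: boolean; pendingCommands: number } {
  const queue = captured["commandQueue"];
  return {
    equal:
      JSON.stringify(withoutCommandQueue(captured)) ===
      JSON.stringify(withoutCommandQueue(replayed)),
    // Surfaced so a non-zero queue is visible in the report line.
    pendingCommands: Array.isArray(queue) ? queue.length : 0,
  };
}
```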

Addresses codex P1 on PR #121.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@LightAxe
Owner Author

@codex review


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 61eb1e42f0

Comment thread scripts/analyze-snapshot.ts Outdated
typeof === 'number' lets NaN, +/-Infinity, and fractional values through.
Concrete harm:
  - tick = Infinity → unbounded replay loop
  - tick = NaN → loop skipped silently, replay reports PASS on no work
  - seed = NaN → createScenario coerces via `seed >>> 0` to 0, producing
    a misleading "wrong scenario" replay instead of a clear error
Switch to Number.isInteger (rejects all of the above plus BigInts) and
include the offending value in the error message.
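A short illustration of the difference between the two checks. `looseCheck` and `strictCheck` are hypothetical names for the before and after predicates; the behavior shown is standard JavaScript.

```typescript
const looseCheck = (v: unknown): boolean => typeof v === "number";
const strictCheck = (v: unknown): boolean => Number.isInteger(v);

// Values that typeof accepts but that are unsafe as a seed or tick.
const suspicious: unknown[] = [NaN, Infinity, -Infinity, 1.5];
const passesLoose = suspicious.filter(looseCheck).length;   // 4: all slip through
const passesStrict = suspicious.filter(strictCheck).length; // 0: all rejected

// Why the NaN cases are harmful:
const nanSeedCoerced = NaN >>> 0;  // 0, so a NaN seed silently becomes seed 0
const nanLoopRuns = 0 < NaN;       // false, so a `t < tick` loop never executes
const infLoopEnds = 1e9 < Infinity; // true, so a `t < tick` loop never terminates
```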

Addresses codex P1 follow-up on PR #121.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@LightAxe
Owner Author

@codex review

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. Swish!

Adds a section under "Building and Running" pointing future contributors
(human or AI) at scripts/analyze-snapshot.ts as the standard way to dig
into a debug snapshot. Names the F9 download convention, the run command
(including the --transform-types flag the script requires), and what the
analyzer reports.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@LightAxe LightAxe merged commit 8370950 into main May 13, 2026
1 check passed
@LightAxe LightAxe deleted the feat/analyze-snapshot-cli branch May 13, 2026 04:23