Add difficulty curve explorer tool for game balance tuning #7
Open
adewale wants to merge 11 commits into
Single-file HTML tool at tools/difficulty-curve.html for visualising and tuning the difficulty inputs (per-player-count speed/shoot multipliers, grid sizes, lives, barriers) plus a hypothetical wave ramp. Loads Chart.js from a CDN — no build step. Surfaces the asymmetry players have been reporting: at wave 1 the 4p difficulty score is several multiples of solo, and there is no across-wave ramp in the shipped game (only barriers degrade because they persist between waves). Includes presets for the current shipped values, an "easier multi" tuning, a flat (no scaling) tuning, and a classic-Invaders-style per-wave ramp.
Specifies the three-part tuning loop discussed for fixing multiplayer difficulty: (1) extract the hardcoded scaling table into a DifficultyConfig document loadable via env override, (2) a deterministic bot simulation harness over the pure reducer to screen candidate configs, (3) wide-event telemetry additions and an analysis script to measure the real difficulty curve. Section 6 defines the judging gates: refactor produces bit-identical behavior (golden tests + seed-identical sim runs), candidate configs must bring 2-4p median final wave within ±1 wave of solo without breaking solo or making the game unlosable, and production telemetry must show the wave-2 bounce-rate gap closing to ≤1.5x solo.
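A minimal sketch of the median gate from Section 6, assuming each simulated game yields a final wave number. `median` and `passesMedianGate` are hypothetical names, not functions from the spec; the ±1 tolerance is the only value taken from the text above.

```typescript
// Hypothetical helper: median of a non-empty list of final-wave numbers.
function median(values: number[]): number {
  if (values.length === 0) throw new Error("median of empty list");
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// True when every multiplayer cell's median final wave is within
// `tolerance` waves of the solo median.
function passesMedianGate(
  soloFinalWaves: number[],
  multiFinalWaves: Record<number, number[]>, // keyed by player count (2..4)
  tolerance = 1,
): boolean {
  const solo = median(soloFinalWaves);
  return Object.values(multiFinalWaves).every(
    (waves) => Math.abs(median(waves) - solo) <= tolerance,
  );
}
```

The other gates (solo unchanged, game still losable, bounce-rate gap) would need their own checks; they are left out here because the text above does not give thresholds for all of them.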
Audit for the deterministic-simulation prerequisite (per the TigerBeetle VOPR and Jane Street library-level-simulation patterns in adewale/testing-best-practices): the reducer is already free of wall-clock, unseeded randomness, timers, and network, but three pieces of game-progression logic live in GameRoom orchestration — wave-2+ alien spawning at the wipe_hold→wipe_reveal transition, nextWave() (barrier persistence), and nextEntityId instance state. Spec now requires moving these into the reducer rather than duplicating them in the sim runner, making gameReducer the complete state machine and the sim a single-process for-loop with zero network.
Move the three pieces of game-progression logic that lived in GameRoom orchestration into the pure reducer, so reducer-only simulation can run complete multi-wave games with zero Durable Object involvement:
1. Entity id generation: nextEntityId is now a GameState field (was DO instance state). migrateGameState derives a non-colliding default from existing e_<n> ids for old persisted states; GameRoom adopts the legacy SQLite column value on rehydration and keeps writing it for backward compat, but state is the source of truth.
2. Wave-2+ alien spawning: the wipe_hold → wipe_reveal transition in the reducer now spawns the scaled formation (entering=true) itself, using createAlienFormation with ids from state.nextEntityId.
3. nextWave(): wave increment, barrier-only entity prune, alienDirection reset, and wipe_exit start now happen in the same reducer TICK that emits wave_complete (which also returns persist=true). GameRoom's nextWave() shrinks to side effects only: persistState() and the wave_complete wide event with unchanged fields (wave, nextWave, survivors).
GameRoom.tick() is now a transport/persistence shell: dispatch actions, broadcast, persist.
Tests that encoded the old split (barrier PBT states with zero aliens, GameRoom-simulated spawning) were updated to the new behavior; new reducer tests cover a pure full wave transition and 2000-tick determinism. Web change is the contract-test allowlist entry classifying nextEntityId as server-only.
https://claude.ai/code/session_01N2xMGpx5Vq9TjHwxZ37iNv
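The migration default for nextEntityId can be sketched as follows. This is an illustration under assumptions, not the actual migrateGameState code: `deriveNextEntityId` and `allocateEntityId` are hypothetical names, and the rule used here (largest existing `e_<n>` suffix plus one, or 1 when none exist) is just one way to get a non-colliding value.

```typescript
// Hypothetical sketch: choose a next id that cannot collide with any
// existing "e_<n>" entity id in an old persisted state.
function deriveNextEntityId(entityIds: string[]): number {
  let maxSeen = 0;
  for (const id of entityIds) {
    const match = /^e_(\d+)$/.exec(id);
    if (match) maxSeen = Math.max(maxSeen, Number(match[1]));
  }
  return maxSeen + 1; // assumption: new ids only ever count upwards
}

// Allocation stays pure: return the new id together with the updated counter,
// so the reducer can store the counter back into state.
function allocateEntityId(nextEntityId: number): { id: string; nextEntityId: number } {
  return { id: `e_${nextEntityId}`, nextEntityId: nextEntityId + 1 };
}
```

Keeping the counter in state rather than on the room instance is what lets two runs from the same starting state produce the same ids.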
Any difficulty configuration is now one JSON document loaded at game
start; the shipped values are just the default document. Behavior with
DEFAULT_DIFFICULTY is bit-identical to before — golden tests enforce it.
- shared/types.ts: new DifficultyConfig interface + DEFAULT_DIFFICULTY
("ship-v1": base 0.016/18, the old scaleTable per player count, lives
3/5/5/5, barriers 3/4/4/4, livesMode 'shared', zero waveRamp capped at
wave 8) and validateDifficultyConfig (structural issues list).
ScaledConfig gains a `barriers` field. GameState gains a `difficulty`
snapshot so running games are self-describing and reducer-only sims
can vary the config per run.
- getScaledConfig(playerCount, baseConfig) → (playerCount, wave,
difficulty): reads the document and applies the wave ramp
speedMult(w) = speedMult * (1 + speedPctPerWave * (min(w, cap) - 1))
(same for shoot), floors the move interval with a ≥1 clamp, and sizes
the lives pool (per-player mode multiplies by player count).
Out-of-range player counts fall back to the 1-player entry wholesale
(previously cols/rows fell back but lives were hardcoded to 5; the
divergence only affected impossible counts like 0/5/-1).
- Reducer reads state.difficulty (tick, wipe spawn, START_SOLO lives) —
never a module constant. GameRoom.startGame() resolves the document:
env.DIFFICULTY_CONFIG (JSON string) if present and valid, else
DEFAULT_DIFFICULTY; bad configs log difficulty_config_invalid and
fall back, never crash a room. game_start now logs
difficultyConfigName. createBarriers() takes the count from the
scaled config instead of min(4, playerCount + 2); the now-dead
MAX_BARRIER_COUNT / BARRIER_PLAYER_OFFSET constants are removed.
- wrangler.jsonc documents the var with a commented-out example;
Env.DIFFICULTY_CONFIG added to worker/src/env.ts. Old persisted
states migrate via GAME_STATE_DEFAULTS / migrateGameState.
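The wave-ramp formula above, written out as a small sketch. Two things here are assumptions rather than facts from the commit: that the per-wave percentages are stored as fractions (0.02 for +2%), and that the move interval is the base interval divided by the speed multiplier before flooring. The function names are hypothetical.

```typescript
// Ramp a per-player-count multiplier by wave, following the formula in the
// commit message: mult * (1 + pctPerWave * (min(wave, cap) - 1)).
// Assumption: pctPerWave is a fraction (0.02 means +2% per wave).
function rampedMultiplier(mult: number, pctPerWave: number, wave: number, capWave: number): number {
  return mult * (1 + pctPerWave * (Math.min(wave, capWave) - 1));
}

// Assumption: the move interval is the base interval divided by the speed
// multiplier, floored, and never below one tick.
function moveIntervalTicks(baseIntervalTicks: number, speedMult: number): number {
  return Math.max(1, Math.floor(baseIntervalTicks / speedMult));
}
```

With a zero ramp, as in ship-v1, `rampedMultiplier` returns the same value at every wave, which is why the default document reproduces the old behavior.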
Tests: golden per-player-count values (18/14/12/10 ticks, 0.016–0.040
shoot, 11×5→13×6 grids, 3/5 lives, 3/4 barriers), wave-ramp math + cap,
livesMode pool sizing, JSON round-trip, validateDifficultyConfig units,
and GameRoom env-override/fallback integration (valid, malformed JSON,
structurally invalid). Web contract allowlist classifies `difficulty`
as server-only.
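A sketch of the env-override fallback that the integration tests exercise (valid, malformed JSON, structurally invalid). The types and the `resolveDifficulty` name are simplified stand-ins for illustration; the only behavior taken from the commit is "use the env document if it parses and validates, otherwise log and fall back to the default".

```typescript
// Simplified stand-in; the real DifficultyConfig has more fields.
interface DifficultyConfigSketch {
  name: string;
}

// Mirrors the described shape of validateDifficultyConfig: a list of issues,
// empty when the document is structurally valid.
type Validator = (candidate: unknown) => string[];

function resolveDifficulty(
  raw: string | undefined,
  fallback: DifficultyConfigSketch,
  validate: Validator,
  logInvalid: (issues: string[]) => void,
): DifficultyConfigSketch {
  if (raw === undefined || raw === "") return fallback;
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    logInvalid(["malformed JSON"]);
    return fallback; // never crash a room on a bad config
  }
  const issues = validate(parsed);
  if (issues.length > 0) {
    logInvalid(issues);
    return fallback;
  }
  return parsed as DifficultyConfigSketch;
}
```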
https://claude.ai/code/session_01N2xMGpx5Vq9TjHwxZ37iNv
Headless simulation of full games through the pure gameReducer:
- worker/src/sim/bot.ts: three fixed bot policies (random, novice, competent) as pure functions of (GameState, slot, gameSeed). Bots act through the same reducer actions real clients use (PLAYER_INPUT held keys + PLAYER_SHOOT); randomness is a pure hash of (gameSeed, slot, tick), with no Math.random anywhere.
- worker/src/sim/runner.ts: runSimGame() builds a startGame()-equivalent initial state (wipe_hold, barriers per difficulty config, seeded RNG) and loops bot intents + TICK to game_over or tick cap, returning a GameResult per spec §3.5. Default cap is 18,000 ticks (10 sim-min): the reducer's per-tick structuredClone limits throughput to ~5k ticks/s, so the spec's 54,000 default would make 500-game cells take ~30 min (§3.6 says lower the cap before the sample count).
- worker/src/sim/experiment.ts: (config × playerCount × policy × seed) grid with per-cell median/p25/p75, defeat-before-wave-2 rate, mean wave-1 lives lost, outcome breakdown.
- worker/src/sim/report.ts: markdown table grouped by config + JSON.
- worker/src/sim/cli.ts: `bun run sim -- --configs ... --games N`; built-in configs resolve from worker/src/sim/configs/.
- worker/src/sim/configs/: ship-v1 (= DEFAULT_DIFFICULTY), easier-multi, flat, classic-ramp (the four visualizer presets).
- worker/src/sim/sim.test.ts: §3.7 validation: determinism (same seed → deep-equal GameResult), skill ordering over 20 seeds (median survival competent > novice > random: ~13k > ~1k > ~0.6k ticks solo), wave progression (competent clears wave 1), config validation.
Not wired into the worker entry graph: src/index.ts does not import sim/, so nothing here ships in the production bundle.
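To illustrate what "randomness is a pure hash of (gameSeed, slot, tick)" can look like, here is a minimal sketch. The mixing function is a generic murmur3-style 32-bit finalizer chosen for illustration, and `mix32` / `botRandom` are hypothetical names; the hash actually used in bot.ts may differ.

```typescript
// Generic 32-bit integer mixer (murmur3-style finalizer).
// Assumption: this is NOT necessarily the hash used in bot.ts.
function mix32(x: number): number {
  x = Math.imul(x ^ (x >>> 16), 0x85ebca6b);
  x = Math.imul(x ^ (x >>> 13), 0xc2b2ae35);
  return (x ^ (x >>> 16)) >>> 0;
}

// Pure function of (gameSeed, slot, tick): the same inputs always produce
// the same value in [0, 1), with no hidden state and no Math.random.
function botRandom(gameSeed: number, slot: number, tick: number): number {
  let h = mix32(gameSeed | 0);
  h = mix32(h ^ Math.imul(slot + 1, 0x9e3779b1));
  h = mix32(h ^ Math.imul(tick + 1, 0x85ebca77));
  return h / 0x100000000;
}
```

Because the value depends only on its arguments, replaying a game with the same seed yields the same bot decisions at every tick, which is what the determinism test relies on.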
First measurement (ship-v1, competent, 20 seeds): solo median final wave 16.5 with 0% defeat before wave 2; 4-player games end by invasion at tick 190 with 100% defeat before wave 2 — the 13-column formation is wider than the alien movement range, so it drops every move interval. This is the quantified version of the "multiplayer is unfair" report. https://claude.ai/code/session_01N2xMGpx5Vq9TjHwxZ37iNv
The full experiment grid revealed that every config with 13-column alien grids at 3p/4p ends in invasion within 6-10 seconds: the formation is 115 cells wide vs an alien movement range of 110 cells, so the leftmost and rightmost columns hit the walls every move interval and the formation drops on every move. This is a layout bug, not a difficulty problem. ship-v2 keeps ship-v1's exact multipliers and lives but caps cols at 11 everywhere (4p moves to 11×6 = 66 aliens, vs the broken 13×6 = 78). Solo is bit-identical to ship-v1 (G1 preserved). Re-running the experiment grid will confirm whether the layout fix alone is enough or whether the multipliers need re-tuning. https://claude.ai/code/session_01N2xMGpx5Vq9TjHwxZ37iNv
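The arithmetic behind the layout bug, as a small sketch. Only the quoted numbers (115-cell formation, 110-cell range, 13×6 and 11×6 grids) come from the commit message; `horizontalSlack` and `alienCount` are hypothetical helpers, and how formation width is computed from the column count is deliberately not modelled here.

```typescript
// Cells the formation can travel sideways before touching a wall.
// Zero or negative slack means it is against a wall on every move,
// so every move triggers a drop.
function horizontalSlack(formationWidthCells: number, movementRangeCells: number): number {
  return movementRangeCells - formationWidthCells;
}

// Total aliens in a cols × rows grid.
function alienCount(cols: number, rows: number): number {
  return cols * rows;
}

const brokenSlack = horizontalSlack(115, 110); // 13-column formation from the text
const shipV1FourPlayer = alienCount(13, 6);    // broken 4p grid
const shipV2FourPlayer = alienCount(11, 6);    // capped 4p grid in ship-v2
```

A check like `horizontalSlack(...) > 0` could plausibly live in config validation so an oversized grid is rejected before any game starts, though the commits above do not say that such a check exists.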
Both are ship-v2 (11-col layout fix) plus a modest wave ramp, addressing the T3 "later waves never get harder" gap: v2.1 ramps +2% speed / +3% shoot per wave, v2.2 ramps +3% / +5%, both capped at wave 10. Experiment results to follow. https://claude.ai/code/session_01N2xMGpx5Vq9TjHwxZ37iNv
Designer-facing static HTML report (tools/difficulty-report.html) that visualizes bot-sim results JSON: per-wave hazard curves (Aponte/Levieux/Natkin failure probability), Kaplan-Meier survival with cap-censoring, flow-channel challenge proxy with tunable band, wave pacing, and summary cards anchored to the first config's solo cell. Loads files via drag-drop, file picker, or bundled sample dataset (ship-v1 vs ship-v2, 800 games). Pure transforms live in tools/difficulty-report-lib.js (UMD: browser script + CommonJS) and are tested by `bun tools/difficulty-report-test.js` (47 assertions against the real sample data, incl. the known ship-v1 4p wave-1 wipe yielding hazard 1.0 and ship-v2 fixing it). https://claude.ai/code/session_01N2xMGpx5Vq9TjHwxZ37iNv
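A minimal sketch of the two survival transforms named above, assuming each sim result reduces to a final wave plus a flag for whether the game was lost or stopped at the tick cap. The `SimGame` shape and function names are hypothetical and are not the API of difficulty-report-lib.js.

```typescript
interface SimGame {
  finalWave: number;  // wave the game ended on
  defeated: boolean;  // true = lost on that wave; false = hit the tick cap (censored)
}

// Per-wave hazard: of the games that reached wave w, the fraction that lost on it.
// Cap-censored games count as "at risk" up to their final wave but never as losses.
function hazardByWave(games: SimGame[], maxWave: number): number[] {
  const hazards: number[] = [];
  for (let w = 1; w <= maxWave; w++) {
    const atRisk = games.filter((g) => g.finalWave >= w).length;
    const losses = games.filter((g) => g.defeated && g.finalWave === w).length;
    hazards.push(atRisk === 0 ? 0 : losses / atRisk);
  }
  return hazards;
}

// Kaplan-Meier estimate: S(w) = product over k <= w of (1 - hazard(k)).
function kaplanMeier(games: SimGame[], maxWave: number): number[] {
  let survival = 1;
  return hazardByWave(games, maxWave).map((h) => (survival *= 1 - h));
}
```

If every game in a cell is lost on wave 1, `hazardByWave` returns 1 for that wave, matching the hazard 1.0 reported for the ship-v1 4p cell.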
The ship-v2 commit added a fifth builtin without updating the "exactly four built-in configs" assertion, and ship-v2.1/v2.2 were only loadable by file path. All seven shipped configs are now registered and the test validates each against validateDifficultyConfig. https://claude.ai/code/session_01N2xMGpx5Vq9TjHwxZ37iNv
Drop both sample files onto tools/difficulty-report.html to compare ship-v1, ship-v2, ship-v2.1 and ship-v2.2 across all views. Verdict from this grid: ship-v2.1 (+2% speed/+3% shoot per wave, cap wave 10) keeps solo within the G1 band (median wave 14 vs 15) while cutting 3p/4p cap-outs by half to two-thirds; ship-v2.2 over-corrects and drags solo down 33%. https://claude.ai/code/session_01N2xMGpx5Vq9TjHwxZ37iNv
Summary
Add an interactive web-based tool for exploring and tuning the difficulty curve of the game across different player counts and waves. This allows designers to visualize how scaling parameters affect game difficulty and balance.
Key Changes
New tool: tools/difficulty-curve.html, a standalone HTML/Chart.js application that provides:
- Difficulty scoring: Implements a composite metric combining threat-per-second, effective lives per player, alien pace, and barrier protection to quantify relative difficulty
- Lives mode toggle: Switch between shared pool and per-player lives accounting to see how different life systems affect balance
- Dark theme UI: GitHub-inspired dark color scheme with responsive grid layout
Implementation Details
worker/src/game/scaling.ts and GameRoom.createBarriers defaults
https://claude.ai/code/session_01N2xMGpx5Vq9TjHwxZ37iNv