Add difficulty curve explorer tool for game balance tuning by adewale · Pull Request #7 · adewale/vaders

adewale · 2026-05-01T14:54:04Z

Summary

Add an interactive web-based tool for exploring and tuning the difficulty curve of the game across different player counts and waves. This allows designers to visualize how scaling parameters affect game difficulty and balance.

Key Changes

New tool: tools/difficulty-curve.html, a standalone HTML/Chart.js application that provides:
- Real-time visualization of difficulty metrics across 1–4 player counts
- Interactive sliders for tuning base alien behavior (shoot rate, move speed)
- Per-player-count scaling grid for speed, shoot rate, alien grid dimensions, lives, and barriers
- Hypothetical wave ramp controls (speed/shoot bonuses per wave, barrier degradation)
- Four preset configurations (current ship settings, easier multiplayer, flat scaling, classic-style ramp)
- Four bar charts showing wave-1 metrics: alien shots/sec, move pace, lives per player, time-to-loss
- Line chart tracking difficulty score across 10 waves for all player counts
- Summary cards and detailed table comparing difficulty ratios relative to solo play
Difficulty scoring: Implements a composite metric combining threat-per-second, effective lives per player, alien pace, and barrier protection to quantify relative difficulty
Lives mode toggle: Switch between shared pool and per-player lives accounting to see how different life systems affect balance
Dark theme UI: GitHub-inspired dark color scheme with responsive grid layout

Implementation Details

Uses Chart.js 4.4.1 for all visualizations
Metrics computed on-the-fly from configuration inputs with 200ms animation
Barrier HP model: 9 segments × 4 HP per barrier, with configurable wave-based degradation
Tick rate hardcoded to 30Hz to match game engine
All parameters match current worker/src/game/scaling.ts and GameRoom.createBarriers defaults
No external dependencies beyond Chart.js; fully self-contained single HTML file
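
To make the barrier HP model concrete, here is a minimal sketch of the arithmetic. The 9 × 4 figures come from the description above; the linear `degradePerWave` parameter and the function name are assumptions for illustration only, since the tool's actual degradation formula is not shown here.

```typescript
// From the PR description: 9 segments per barrier, 4 HP per segment.
const SEGMENTS_PER_BARRIER = 9;
const HP_PER_SEGMENT = 4;

// Hypothetical model: barriers lose a fixed fraction of their full HP for each
// wave already played, clamped so the remaining HP never goes negative.
function barrierHpAtWave(
  barrierCount: number,
  wave: number,
  degradePerWave: number,
): number {
  const fullHp = barrierCount * SEGMENTS_PER_BARRIER * HP_PER_SEGMENT;
  const remainingFraction = Math.max(0, 1 - degradePerWave * (wave - 1));
  return fullHp * remainingFraction;
}

// Wave 1 with the shipped barrier counts (3 solo, 4 multiplayer):
// barrierHpAtWave(3, 1, 0) -> 108
// barrierHpAtWave(4, 1, 0) -> 144
```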

https://claude.ai/code/session_01N2xMGpx5Vq9TjHwxZ37iNv

Single-file HTML tool at tools/difficulty-curve.html for visualising and tuning the difficulty inputs (per-player-count speed/shoot multipliers, grid sizes, lives, barriers) plus a hypothetical wave ramp. Loads Chart.js from a CDN — no build step. Surfaces the asymmetry players have been reporting: at wave 1 the 4p difficulty score is several multiples of solo, and there is no across-wave ramp in the shipped game (only barriers degrade because they persist between waves). Includes presets for the current shipped values, an "easier multi" tuning, a flat (no scaling) tuning, and a classic-Invaders-style per-wave ramp.

Specifies the three-part tuning loop discussed for fixing multiplayer difficulty: (1) extract the hardcoded scaling table into a DifficultyConfig document loadable via env override, (2) a deterministic bot simulation harness over the pure reducer to screen candidate configs, (3) wide-event telemetry additions and an analysis script to measure the real difficulty curve. Section 6 defines the judging gates: refactor produces bit-identical behavior (golden tests + seed-identical sim runs), candidate configs must bring 2-4p median final wave within ±1 wave of solo without breaking solo or making the game unlosable, and production telemetry must show the wave-2 bounce-rate gap closing to ≤1.5x solo.
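
As a rough illustration of two of the judging gates, the checks below encode "2-4p median final wave within ±1 wave of solo" and "wave-2 bounce-rate gap ≤1.5x solo". The type, function names, and the handling of a zero solo baseline are assumptions, not taken from the spec.

```typescript
// Median final wave per player count, as an experiment cell might report it.
type MedianFinalWave = Record<1 | 2 | 3 | 4, number>;

// Sim gate: every multiplayer median must sit within one wave of solo.
function multiplayerWithinOneWaveOfSolo(medians: MedianFinalWave): boolean {
  const solo = medians[1];
  return ([2, 3, 4] as const).every((pc) => Math.abs(medians[pc] - solo) <= 1);
}

// Telemetry gate: multiplayer wave-2 bounce rate at most 1.5x the solo rate.
function bounceGapClosed(soloRate: number, multiRate: number): boolean {
  // Assumption: with a zero solo baseline, only a zero multiplayer rate passes.
  if (soloRate <= 0) return multiRate <= 0;
  return multiRate / soloRate <= 1.5;
}
```

Usage might look like `multiplayerWithinOneWaveOfSolo({ 1: 15, 2: 14.5, 3: 16, 4: 14 })`, which passes, while a grid where 4p ends on wave 1 fails.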

Audit for the deterministic-simulation prerequisite (per the TigerBeetle VOPR and Jane Street library-level-simulation patterns in adewale/testing-best-practices): the reducer is already free of wall-clock, unseeded randomness, timers, and network, but three pieces of game-progression logic live in GameRoom orchestration — wave-2+ alien spawning at the wipe_hold→wipe_reveal transition, nextWave() (barrier persistence), and nextEntityId instance state. Spec now requires moving these into the reducer rather than duplicating them in the sim runner, making gameReducer the complete state machine and the sim a single-process for-loop with zero network.

Move the three pieces of game-progression logic that lived in GameRoom orchestration into the pure reducer, so reducer-only simulation can run complete multi-wave games with zero Durable Object involvement:

1. Entity id generation: nextEntityId is now a GameState field (was DO instance state). migrateGameState derives a non-colliding default from existing e_<n> ids for old persisted states; GameRoom adopts the legacy SQLite column value on rehydration and keeps writing it for backward compat, but state is the source of truth.
2. Wave-2+ alien spawning: the wipe_hold → wipe_reveal transition in the reducer now spawns the scaled formation (entering=true) itself, using createAlienFormation with ids from state.nextEntityId.
3. nextWave(): wave increment, barrier-only entity prune, alienDirection reset, and wipe_exit start now happen in the same reducer TICK that emits wave_complete (which also returns persist=true). GameRoom's nextWave() shrinks to side effects only: persistState() and the wave_complete wide event with unchanged fields (wave, nextWave, survivors).

GameRoom.tick() is now a transport/persistence shell: dispatch actions, broadcast, persist.

Tests that encoded the old split (barrier PBT states with zero aliens, GameRoom-simulated spawning) were updated to the new behavior; new reducer tests cover a pure full wave transition and 2000-tick determinism. Web change is the contract-test allowlist entry classifying nextEntityId as server-only.

https://claude.ai/code/session_01N2xMGpx5Vq9TjHwxZ37iNv
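
One way to derive a non-colliding nextEntityId from existing e_<n> ids is to take the largest numeric suffix and add one. The helper below is a hypothetical sketch of that idea; the real migrateGameState logic is not shown in this PR text and may differ, including in what it returns when no entities exist.

```typescript
// Hypothetical helper: scan ids of the form "e_<n>" and return max(n) + 1.
// Ids that do not match the pattern are ignored.
function deriveNextEntityId(existingIds: readonly string[]): number {
  let max = 0;
  for (const id of existingIds) {
    const match = /^e_(\d+)$/.exec(id);
    if (match) {
      max = Math.max(max, Number(match[1]));
    }
  }
  // Assumption: numbering starts at 1 when no "e_<n>" ids are present.
  return max + 1;
}

// deriveNextEntityId(["e_1", "e_7", "player_2"]) -> 8
```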

Any difficulty configuration is now one JSON document loaded at game start; the shipped values are just the default document. Behavior with DEFAULT_DIFFICULTY is bit-identical to before, and golden tests enforce it.

- shared/types.ts: new DifficultyConfig interface + DEFAULT_DIFFICULTY ("ship-v1": base 0.016/18, the old scaleTable per player count, lives 3/5/5/5, barriers 3/4/4/4, livesMode 'shared', zero waveRamp capped at wave 8) and validateDifficultyConfig (structural issues list). ScaledConfig gains a `barriers` field. GameState gains a `difficulty` snapshot so running games are self-describing and reducer-only sims can vary the config per run.
- getScaledConfig(playerCount, baseConfig) → (playerCount, wave, difficulty): reads the document and applies the wave ramp speedMult(w) = speedMult * (1 + speedPctPerWave * (min(w, cap) - 1)) (same for shoot), floors the move interval with a ≥1 clamp, and sizes the lives pool (per-player mode multiplies by player count). Out-of-range player counts fall back to the 1-player entry wholesale (previously cols/rows fell back but lives were hardcoded to 5; the divergence only affected impossible counts like 0/5/-1).
- Reducer reads state.difficulty (tick, wipe spawn, START_SOLO lives), never a module constant. GameRoom.startGame() resolves the document: env.DIFFICULTY_CONFIG (JSON string) if present and valid, else DEFAULT_DIFFICULTY; bad configs log difficulty_config_invalid and fall back, never crash a room. game_start now logs difficultyConfigName. createBarriers() takes the count from the scaled config instead of min(4, playerCount + 2); the now-dead MAX_BARRIER_COUNT / BARRIER_PLAYER_OFFSET constants are removed.
- wrangler.jsonc documents the var with a commented-out example; Env.DIFFICULTY_CONFIG added to worker/src/env.ts. Old persisted states migrate via GAME_STATE_DEFAULTS / migrateGameState.

Tests: golden per-player-count values (18/14/12/10 ticks, 0.016–0.040 shoot, 11×5→13×6 grids, 3/5 lives, 3/4 barriers), wave-ramp math + cap, livesMode pool sizing, JSON round-trip, validateDifficultyConfig units, and GameRoom env-override/fallback integration (valid, malformed JSON, structurally invalid). Web contract allowlist classifies `difficulty` as server-only.

https://claude.ai/code/session_01N2xMGpx5Vq9TjHwxZ37iNv
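
The env-override resolution described above (use env.DIFFICULTY_CONFIG if present and valid, otherwise log and fall back) can be sketched roughly as follows. The reduced DifficultyConfig shape, the stand-in validator, the 'per-player' literal, and the `log` callback signature are all assumptions for illustration; only the names DEFAULT_DIFFICULTY, validateDifficultyConfig, and difficulty_config_invalid come from the commit message.

```typescript
// Reduced, hypothetical shape; the real DifficultyConfig carries more fields.
interface DifficultyConfig {
  name: string;
  livesMode: "shared" | "per-player"; // 'per-player' literal is assumed
}

const DEFAULT_DIFFICULTY: DifficultyConfig = { name: "ship-v1", livesMode: "shared" };

// Stand-in validator: returns a list of structural issues (empty means valid).
function validateDifficultyConfig(value: unknown): string[] {
  if (typeof value !== "object" || value === null) return ["config must be an object"];
  const v = value as Record<string, unknown>;
  const issues: string[] = [];
  if (typeof v.name !== "string") issues.push("name must be a string");
  if (v.livesMode !== "shared" && v.livesMode !== "per-player") {
    issues.push("livesMode must be 'shared' or 'per-player'");
  }
  return issues;
}

// Resolve the config for a new game; never throws, so a bad env value
// cannot crash a room.
function resolveDifficulty(
  raw: string | undefined,
  log: (event: string, detail: unknown) => void,
): DifficultyConfig {
  if (raw === undefined || raw === "") return DEFAULT_DIFFICULTY;
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    log("difficulty_config_invalid", { reason: "malformed JSON" });
    return DEFAULT_DIFFICULTY;
  }
  const issues = validateDifficultyConfig(parsed);
  if (issues.length > 0) {
    log("difficulty_config_invalid", { issues });
    return DEFAULT_DIFFICULTY;
  }
  return parsed as DifficultyConfig;
}
```

The three test cases named in the commit (valid, malformed JSON, structurally invalid) map onto the three return paths after the empty-value check.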

Headless simulation of full games through the pure gameReducer:

- worker/src/sim/bot.ts: three fixed bot policies (random, novice, competent) as pure functions of (GameState, slot, gameSeed). Bots act through the same reducer actions real clients use (PLAYER_INPUT held keys + PLAYER_SHOOT); randomness is a pure hash of (gameSeed, slot, tick), with no Math.random anywhere.
- worker/src/sim/runner.ts: runSimGame() builds a startGame()-equivalent initial state (wipe_hold, barriers per difficulty config, seeded RNG) and loops bot intents + TICK to game_over or tick cap, returning a GameResult per spec §3.5. Default cap is 18,000 ticks (10 sim-min): the reducer's per-tick structuredClone limits throughput to ~5k ticks/s, so the spec's 54,000 default would make 500-game cells take ~30 min (§3.6 says lower the cap before the sample count).
- worker/src/sim/experiment.ts: (config × playerCount × policy × seed) grid with per-cell median/p25/p75, defeat-before-wave-2 rate, mean wave-1 lives lost, outcome breakdown.
- worker/src/sim/report.ts: markdown table grouped by config + JSON.
- worker/src/sim/cli.ts: `bun run sim -- --configs ... --games N`; built-in configs resolve from worker/src/sim/configs/.
- worker/src/sim/configs/: ship-v1 (= DEFAULT_DIFFICULTY), easier-multi, flat, classic-ramp (the four visualizer presets).
- worker/src/sim/sim.test.ts: §3.7 validation: determinism (same seed → deep-equal GameResult), skill ordering over 20 seeds (median survival competent > novice > random: ~13k > ~1k > ~0.6k ticks solo), wave progression (competent clears wave 1), config validation.

Not wired into the worker entry graph: src/index.ts does not import sim/, so nothing here ships in the production bundle.

First measurement (ship-v1, competent, 20 seeds): solo median final wave 16.5 with 0% defeat before wave 2; 4-player games end by invasion at tick 190 with 100% defeat before wave 2, because the 13-column formation is wider than the alien movement range, so it drops every move interval. This is the quantified version of the "multiplayer is unfair" report.

https://claude.ai/code/session_01N2xMGpx5Vq9TjHwxZ37iNv
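
To illustrate what "randomness is a pure hash of (gameSeed, slot, tick)" can look like, here is a minimal stateless sketch. The mixing steps borrow the widely used murmur3 32-bit finalizer constants; the hash actually used in worker/src/sim/bot.ts is not shown in this PR text, and both function names and the per-tick shoot decision are hypothetical.

```typescript
// Mix three integers into a float in [0, 1) with no hidden state, so the
// same (gameSeed, slot, tick) always yields the same value.
function hashToUnit(gameSeed: number, slot: number, tick: number): number {
  let h =
    (gameSeed | 0) ^
    Math.imul(slot | 0, 0x9e3779b1) ^
    Math.imul(tick | 0, 0x85ebca6b);
  // murmur3 fmix32 avalanche steps
  h ^= h >>> 16;
  h = Math.imul(h, 0x85ebca6b);
  h ^= h >>> 13;
  h = Math.imul(h, 0xc2b2ae35);
  h ^= h >>> 16;
  // Convert the unsigned 32-bit result to [0, 1).
  return (h >>> 0) / 4294967296;
}

// Hypothetical bot decision: shoot with a fixed probability on each tick.
function shouldShoot(
  gameSeed: number,
  slot: number,
  tick: number,
  shootChance: number,
): boolean {
  return hashToUnit(gameSeed, slot, tick) < shootChance;
}
```

Because the value depends only on its arguments, two runs with the same seed make identical decisions regardless of call order, which is the property the determinism test relies on.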

The full experiment grid revealed that every config with 13-column alien grids at 3p/4p ends in invasion within 6-10 seconds: the formation is 115 cells wide vs an alien movement range of 110 cells, so the leftmost and rightmost columns hit the walls on every move interval and the formation drops on every move. This is not a difficulty problem but a layout bug. ship-v2 keeps ship-v1's exact multipliers and lives but caps cols at 11 everywhere (4p moves to 11×6 = 66 aliens, vs the broken 13×6 = 78). Solo is bit-identical to ship-v1 (G1 preserved). Re-running the experiment grid will confirm whether the layout fix alone is enough or whether the multipliers need re-tuning. https://claude.ai/code/session_01N2xMGpx5Vq9TjHwxZ37iNv
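
The arithmetic behind the layout bug can be checked with a small sketch. The 115/110 cell figures, the 30Hz tick rate, and the tick-190 invasion are from this PR; the helper names and the simplified "no positive slack means a wall hit on every move" model are assumptions.

```typescript
const TICK_RATE_HZ = 30; // tick rate stated in the PR description

// Cells the formation can travel sideways before touching a wall.
function horizontalSlack(formationWidth: number, movementRange: number): number {
  return movementRange - formationWidth;
}

// Simplified model: with no positive slack the formation is always against
// (or past) a wall, so every move triggers a drop.
function hitsWallEveryMove(formationWidth: number, movementRange: number): boolean {
  return horizontalSlack(formationWidth, movementRange) <= 0;
}

function ticksToSeconds(ticks: number): number {
  return ticks / TICK_RATE_HZ;
}

// horizontalSlack(115, 110)   -> -5 (formation is 5 cells wider than its range)
// hitsWallEveryMove(115, 110) -> true
// ticksToSeconds(190)         -> about 6.33 s, matching the 6-10 second window
```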

Both are ship-v2 (11-col layout fix) plus a modest wave ramp, addressing the T3 "later waves never get harder" gap: v2.1 ramps +2% speed / +3% shoot per wave, v2.2 ramps +3% / +5%, both capped at wave 10. Experiment results to follow. https://claude.ai/code/session_01N2xMGpx5Vq9TjHwxZ37iNv
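
Plugging the two ramps into the wave-ramp formula quoted in the DifficultyConfig commit, mult * (1 + pctPerWave * (min(w, cap) - 1)), gives a feel for their size. This is a sketch: speedPctPerWave appears in that commit message, while shootPctPerWave, capWave, and the function name are assumed names.

```typescript
interface WaveRamp {
  speedPctPerWave: number; // 0.02 means +2% per wave
  shootPctPerWave: number; // assumed field name, mirroring the speed field
  capWave: number;         // assumed field name; the ramp stops growing here
}

// Wave-ramp formula as quoted in the commit message.
function rampedMult(
  baseMult: number,
  pctPerWave: number,
  wave: number,
  capWave: number,
): number {
  return baseMult * (1 + pctPerWave * (Math.min(wave, capWave) - 1));
}

const shipV21: WaveRamp = { speedPctPerWave: 0.02, shootPctPerWave: 0.03, capWave: 10 };
const shipV22: WaveRamp = { speedPctPerWave: 0.03, shootPctPerWave: 0.05, capWave: 10 };

// Relative multipliers at the cap (wave 10), starting from a base of 1:
// v2.1 speed: 1 + 0.02 * 9 = about 1.18, shoot: 1 + 0.03 * 9 = about 1.27
// v2.2 speed: 1 + 0.03 * 9 = about 1.27, shoot: 1 + 0.05 * 9 = about 1.45
const v21SpeedAtCap = rampedMult(1, shipV21.speedPctPerWave, 10, shipV21.capWave);
const v22ShootAtCap = rampedMult(1, shipV22.shootPctPerWave, 10, shipV22.capWave);
```

Past the cap the multiplier stays flat, so wave 15 yields the same value as wave 10.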

Designer-facing static HTML report (tools/difficulty-report.html) that visualizes bot-sim results JSON: per-wave hazard curves (Aponte/Levieux/Natkin failure probability), Kaplan-Meier survival with cap-censoring, flow-channel challenge proxy with tunable band, wave pacing, and summary cards anchored to the first config's solo cell. Loads files via drag-drop, file picker, or bundled sample dataset (ship-v1 vs ship-v2, 800 games). Pure transforms live in tools/difficulty-report-lib.js (UMD: browser script + CommonJS) and are tested by `bun tools/difficulty-report-test.js` (47 assertions against the real sample data, incl. the known ship-v1 4p wave-1 wipe yielding hazard 1.0 and ship-v2 fixing it). https://claude.ai/code/session_01N2xMGpx5Vq9TjHwxZ37iNv
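
For readers unfamiliar with the two survival views, the sketch below computes a per-wave hazard (failures at wave w divided by games still at risk at w) and the Kaplan-Meier survival curve as the running product of (1 - hazard). Games that hit the tick cap are treated as censored: they count as at risk through their final wave but never as failures. The SimOutcome shape and function name are hypothetical and are not the API of tools/difficulty-report-lib.js.

```typescript
interface SimOutcome {
  finalWave: number; // 1-based wave the run ended on
  censored: boolean; // true when the run hit the tick cap instead of losing
}

interface WaveStat {
  wave: number;
  atRisk: number;   // runs that reached this wave
  failures: number; // runs that lost on this wave
  hazard: number;   // failures / atRisk
  survival: number; // Kaplan-Meier estimate of surviving past this wave
}

function survivalByWave(outcomes: readonly SimOutcome[]): WaveStat[] {
  const maxWave = outcomes.reduce((m, o) => Math.max(m, o.finalWave), 0);
  const stats: WaveStat[] = [];
  let survival = 1;
  for (let wave = 1; wave <= maxWave; wave++) {
    const atRisk = outcomes.filter((o) => o.finalWave >= wave).length;
    const failures = outcomes.filter(
      (o) => o.finalWave === wave && !o.censored,
    ).length;
    const hazard = atRisk === 0 ? 0 : failures / atRisk;
    survival *= 1 - hazard;
    stats.push({ wave, atRisk, failures, hazard, survival });
  }
  return stats;
}
```

With this definition, a cell where every game is lost on wave 1 has a wave-1 hazard of 1.0, which is the shape of the ship-v1 4p wipe mentioned above.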

The ship-v2 commit added a fifth builtin without updating the "exactly four built-in configs" assertion, and ship-v2.1/v2.2 were only loadable by file path. All seven shipped configs are now registered and the test validates each against validateDifficultyConfig. https://claude.ai/code/session_01N2xMGpx5Vq9TjHwxZ37iNv

Drop both sample files onto tools/difficulty-report.html to compare ship-v1, ship-v2, ship-v2.1 and ship-v2.2 across all views. Verdict from this grid: ship-v2.1 (+2% speed/+3% shoot per wave, cap wave 10) keeps solo within the G1 band (median wave 14 vs 15) while cutting 3p/4p cap-outs by half to two-thirds; ship-v2.2 over-corrects and drags solo down 33%. https://claude.ai/code/session_01N2xMGpx5Vq9TjHwxZ37iNv

adewale force-pushed the claude/difficulty-scaling-multiplayer-sbKE3 branch from 9ee64b8 to 490f96a Compare May 1, 2026 14:56

claude added 10 commits June 10, 2026 08:46

adewale wants to merge 11 commits into main from claude/difficulty-scaling-multiplayer-sbKE3
