Add difficulty curve explorer tool for game balance tuning #7

Open
adewale wants to merge 11 commits into main from claude/difficulty-scaling-multiplayer-sbKE3

@adewale (Owner) commented May 1, 2026

Summary

Add an interactive web-based tool for exploring and tuning the difficulty curve of the game across different player counts and waves. This allows designers to visualize how scaling parameters affect game difficulty and balance.

Key Changes

  • New tool: tools/difficulty-curve.html, a standalone HTML/Chart.js application that provides:

    • Real-time visualization of difficulty metrics across 1–4 player counts
    • Interactive sliders for tuning base alien behavior (shoot rate, move speed)
    • Per-player-count scaling grid for speed, shoot rate, alien grid dimensions, lives, and barriers
    • Hypothetical wave ramp controls (speed/shoot bonuses per wave, barrier degradation)
    • Four preset configurations (current shipped settings, easier multiplayer, flat scaling, classic-style ramp)
    • Four bar charts showing wave-1 metrics: alien shots/sec, move pace, lives per player, time-to-loss
    • Line chart tracking difficulty score across 10 waves for all player counts
    • Summary cards and detailed table comparing difficulty ratios relative to solo play

  • Difficulty scoring: Implements a composite metric combining threat-per-second, effective lives per player, alien pace, and barrier protection to quantify relative difficulty

  • Lives mode toggle: Switch between shared pool and per-player lives accounting to see how different life systems affect balance

  • Dark theme UI: GitHub-inspired dark color scheme with responsive grid layout

Implementation Details

  • Uses Chart.js 4.4.1 for all visualizations
  • Metrics are recomputed on the fly from the configuration inputs, with a 200ms chart animation
  • Barrier HP model: 9 segments × 4 HP per barrier, with configurable wave-based degradation
  • Tick rate hardcoded to 30Hz to match game engine
  • All parameters match current worker/src/game/scaling.ts and GameRoom.createBarriers defaults
  • Single HTML file with no build step; Chart.js, loaded from a CDN, is the only external dependency
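For reference, the unit conversions implied by the constants above can be sketched as follows. The constants (30Hz, 9 segments, 4 HP) come from this description; the helper names are hypothetical and are not taken from the tool's source.

```typescript
// Constants stated in the PR description; helper names are illustrative only.
const TICK_RATE_HZ = 30;
const SEGMENTS_PER_BARRIER = 9;
const HP_PER_SEGMENT = 4;

/** Convert a tick count into seconds at the fixed tick rate. */
function ticksToSeconds(ticks: number): number {
  return ticks / TICK_RATE_HZ;
}

/** Alien formation moves per second for a given move interval in ticks. */
function movesPerSecond(moveIntervalTicks: number): number {
  return TICK_RATE_HZ / moveIntervalTicks;
}

/** Total barrier hit points on the field before any degradation. */
function totalBarrierHp(barrierCount: number): number {
  return barrierCount * SEGMENTS_PER_BARRIER * HP_PER_SEGMENT;
}

console.log(totalBarrierHp(3)); // 3 barriers x 36 HP = 108
console.log(movesPerSecond(18)); // roughly 1.67 moves per second
```

Keeping the tick rate in one constant means the charts would only need one change if the engine's rate ever moved away from 30Hz.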

https://claude.ai/code/session_01N2xMGpx5Vq9TjHwxZ37iNv

Single-file HTML tool at tools/difficulty-curve.html for visualising and
tuning the difficulty inputs (per-player-count speed/shoot multipliers,
grid sizes, lives, barriers) plus a hypothetical wave ramp. Loads
Chart.js from a CDN — no build step.

Surfaces the asymmetry players have been reporting: at wave 1 the 4p
difficulty score is several multiples of solo, and there is no
across-wave ramp in the shipped game (only barriers degrade because
they persist between waves).

Includes presets for the current shipped values, an "easier multi"
tuning, a flat (no scaling) tuning, and a classic-Invaders-style
per-wave ramp.

adewale force-pushed the claude/difficulty-scaling-multiplayer-sbKE3 branch from 9ee64b8 to 490f96a on May 1, 2026 14:56

claude added 10 commits June 10, 2026 08:46

Specifies the three-part tuning loop discussed for fixing multiplayer
difficulty: (1) extract the hardcoded scaling table into a
DifficultyConfig document loadable via env override, (2) a deterministic
bot simulation harness over the pure reducer to screen candidate
configs, (3) wide-event telemetry additions and an analysis script to
measure the real difficulty curve.

Section 6 defines the judging gates: refactor produces bit-identical
behavior (golden tests + seed-identical sim runs), candidate configs
must bring 2-4p median final wave within ±1 wave of solo without
breaking solo or making the game unlosable, and production telemetry
must show the wave-2 bounce-rate gap closing to ≤1.5x solo.

Audit for the deterministic-simulation prerequisite (per the TigerBeetle
VOPR and Jane Street library-level-simulation patterns in
adewale/testing-best-practices): the reducer is already free of
wall-clock, unseeded randomness, timers, and network, but three pieces
of game-progression logic live in GameRoom orchestration — wave-2+
alien spawning at the wipe_hold→wipe_reveal transition, nextWave()
(barrier persistence), and nextEntityId instance state.

Spec now requires moving these into the reducer rather than duplicating
them in the sim runner, making gameReducer the complete state machine
and the sim a single-process for-loop with zero network.

Move the three pieces of game-progression logic that lived in GameRoom
orchestration into the pure reducer, so reducer-only simulation can run
complete multi-wave games with zero Durable Object involvement:

1. Entity id generation: nextEntityId is now a GameState field (was DO
   instance state). migrateGameState derives a non-colliding default
   from existing e_<n> ids for old persisted states; GameRoom adopts the
   legacy SQLite column value on rehydration and keeps writing it for
   backward compat, but state is the source of truth.

2. Wave-2+ alien spawning: the wipe_hold → wipe_reveal transition in the
   reducer now spawns the scaled formation (entering=true) itself,
   using createAlienFormation with ids from state.nextEntityId.

3. nextWave(): wave increment, barrier-only entity prune, alienDirection
   reset, and wipe_exit start now happen in the same reducer TICK that
   emits wave_complete (which also returns persist=true). GameRoom's
   nextWave() shrinks to side effects only: persistState() and the
   wave_complete wide event with unchanged fields (wave, nextWave,
   survivors).
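A minimal sketch of item 1, assuming the non-colliding default is "largest existing e_<n> suffix plus one" and that ids are allocated by returning a new state rather than mutating. The function names and the reduced IdState shape are hypothetical; the real migrateGameState may differ.

```typescript
// Hypothetical reduced state: only the field relevant to id allocation.
interface IdState {
  nextEntityId: number;
}

/** Derive a counter that cannot collide with any existing e_<n> id. */
function deriveNextEntityId(existingIds: string[]): number {
  let maxSeen = 0;
  for (const id of existingIds) {
    const match = /^e_(\d+)$/.exec(id);
    if (match !== null) {
      maxSeen = Math.max(maxSeen, Number(match[1]));
    }
  }
  return maxSeen + 1;
}

/** Pure allocation: returns the new id and a new state, never mutates. */
function allocateEntityId(state: IdState): { id: string; state: IdState } {
  return {
    id: `e_${state.nextEntityId}`,
    state: { ...state, nextEntityId: state.nextEntityId + 1 },
  };
}

const migrated: IdState = { nextEntityId: deriveNextEntityId(["e_3", "e_10"]) };
console.log(allocateEntityId(migrated).id); // "e_11"
```

Because allocation is a function of state alone, two reducer runs from the same state hand out the same ids, which is what the determinism tests rely on.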

GameRoom.tick() is now a transport/persistence shell: dispatch actions,
broadcast, persist. Tests that encoded the old split (barrier PBT states
with zero aliens, GameRoom-simulated spawning) were updated to the new
behavior; new reducer tests cover a pure full wave transition and
2000-tick determinism. Web change is the contract-test allowlist entry
classifying nextEntityId as server-only.

https://claude.ai/code/session_01N2xMGpx5Vq9TjHwxZ37iNv
Any difficulty configuration is now one JSON document loaded at game
start; the shipped values are just the default document. Behavior with
DEFAULT_DIFFICULTY is bit-identical to before — golden tests enforce it.

- shared/types.ts: new DifficultyConfig interface + DEFAULT_DIFFICULTY
  ("ship-v1": base 0.016/18, the old scaleTable per player count, lives
  3/5/5/5, barriers 3/4/4/4, livesMode 'shared', zero waveRamp capped at
  wave 8) and validateDifficultyConfig (structural issues list).
  ScaledConfig gains a `barriers` field. GameState gains a `difficulty`
  snapshot so running games are self-describing and reducer-only sims
  can vary the config per run.

- getScaledConfig(playerCount, baseConfig) → (playerCount, wave,
  difficulty): reads the document and applies the wave ramp
  speedMult(w) = speedMult * (1 + speedPctPerWave * (min(w, cap) - 1))
  (same for shoot), floors the move interval with a ≥1 clamp, and sizes
  the lives pool (per-player mode multiplies by player count).
  Out-of-range player counts fall back to the 1-player entry wholesale
  (previously cols/rows fell back but lives were hardcoded to 5; the
  divergence only affected impossible counts like 0/5/-1).

- Reducer reads state.difficulty (tick, wipe spawn, START_SOLO lives) —
  never a module constant. GameRoom.startGame() resolves the document:
  env.DIFFICULTY_CONFIG (JSON string) if present and valid, else
  DEFAULT_DIFFICULTY; bad configs log difficulty_config_invalid and
  fall back, never crash a room. game_start now logs
  difficultyConfigName. createBarriers() takes the count from the
  scaled config instead of min(4, playerCount + 2); the now-dead
  MAX_BARRIER_COUNT / BARRIER_PLAYER_OFFSET constants are removed.

- wrangler.jsonc documents the var with a commented-out example;
  Env.DIFFICULTY_CONFIG added to worker/src/env.ts. Old persisted
  states migrate via GAME_STATE_DEFAULTS / migrateGameState.
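The wave-ramp and lives-pool rules from the getScaledConfig bullet above can be sketched as below. The formula is the one quoted in the commit message; everything else is an assumption for illustration: speedPctPerWave is treated as a fraction (0.02 for +2%), the move interval is assumed to be the base interval divided by the multiplier, and the "perPlayer" literal and all function names are hypothetical.

```typescript
// "shared" appears in the commit message; "perPlayer" is an assumed literal.
type LivesMode = "shared" | "perPlayer";

/** mult(w) = mult * (1 + pctPerWave * (min(w, cap) - 1)), as quoted above. */
function rampedMultiplier(
  baseMult: number,
  pctPerWave: number,
  wave: number,
  capWave: number,
): number {
  return baseMult * (1 + pctPerWave * (Math.min(wave, capWave) - 1));
}

/** Assumed mapping from multiplier to interval: floor, then clamp to >= 1. */
function moveIntervalTicks(baseIntervalTicks: number, speedMult: number): number {
  return Math.max(1, Math.floor(baseIntervalTicks / speedMult));
}

/** Per-player mode multiplies the configured lives by the player count. */
function livesPool(lives: number, mode: LivesMode, playerCount: number): number {
  return mode === "perPlayer" ? lives * playerCount : lives;
}

// A +2% per-wave ramp capped at wave 10 stops growing after wave 10.
console.log(rampedMultiplier(1, 0.02, 10, 10));
console.log(rampedMultiplier(1, 0.02, 25, 10));
console.log(livesPool(5, "perPlayer", 4)); // 20
```

With a zero ramp (the ship-v1 default) the multiplier is unchanged at every wave, which is why the refactor can stay bit-identical.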

Tests: golden per-player-count values (18/14/12/10 ticks, 0.016–0.040
shoot, 11×5→13×6 grids, 3/5 lives, 3/4 barriers), wave-ramp math + cap,
livesMode pool sizing, JSON round-trip, validateDifficultyConfig units,
and GameRoom env-override/fallback integration (valid, malformed JSON,
structurally invalid). Web contract allowlist classifies `difficulty`
as server-only.
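The override-with-fallback behavior these integration tests exercise might look roughly like this. It is a sketch under stated assumptions: MiniConfig is a deliberately tiny stand-in for DifficultyConfig, the validation rules and the "perPlayer" literal are invented for illustration, and resolveDifficulty is a hypothetical name. Only the event name difficulty_config_invalid and the "never crash a room" rule come from the commit message.

```typescript
// Tiny stand-in for DifficultyConfig; the real interface has more fields.
interface MiniConfig {
  name: string;
  livesMode: string;
}

const DEFAULT_CONFIG: MiniConfig = { name: "ship-v1", livesMode: "shared" };

/** Returns a list of structural issues; an empty list means valid. */
function validateMiniConfig(value: unknown): string[] {
  if (typeof value !== "object" || value === null) {
    return ["config must be an object"];
  }
  const record = value as Record<string, unknown>;
  const issues: string[] = [];
  const name = record["name"];
  if (typeof name !== "string" || name.length === 0) {
    issues.push("name must be a non-empty string");
  }
  const mode = record["livesMode"];
  if (mode !== "shared" && mode !== "perPlayer") {
    issues.push("livesMode must be 'shared' or 'perPlayer'");
  }
  return issues;
}

/** Use the env override if it parses and validates, else the default. */
function resolveDifficulty(
  raw: string | undefined,
  log: (event: string, issues: string[]) => void,
): MiniConfig {
  if (raw === undefined) return DEFAULT_CONFIG;
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    log("difficulty_config_invalid", ["malformed JSON"]);
    return DEFAULT_CONFIG;
  }
  const issues = validateMiniConfig(parsed);
  if (issues.length > 0) {
    log("difficulty_config_invalid", issues);
    return DEFAULT_CONFIG;
  }
  return parsed as MiniConfig;
}

const events: string[] = [];
const resolved = resolveDifficulty("{not json", (event) => events.push(event));
console.log(resolved.name, events);
```

Returning an issues list rather than throwing keeps the caller in control: the room logs every problem at once and still starts with the default document.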

https://claude.ai/code/session_01N2xMGpx5Vq9TjHwxZ37iNv
Headless simulation of full games through the pure gameReducer:

- worker/src/sim/bot.ts: three fixed bot policies (random, novice,
  competent) as pure functions of (GameState, slot, gameSeed). Bots act
  through the same reducer actions real clients use (PLAYER_INPUT held
  keys + PLAYER_SHOOT); randomness is a pure hash of
  (gameSeed, slot, tick) — no Math.random anywhere.
- worker/src/sim/runner.ts: runSimGame() builds a startGame()-equivalent
  initial state (wipe_hold, barriers per difficulty config, seeded RNG)
  and loops bot intents + TICK to game_over or tick cap, returning a
  GameResult per spec §3.5. Default cap is 18,000 ticks (10 sim-min):
  the reducer's per-tick structuredClone limits throughput to ~5k
  ticks/s, so the spec's 54,000 default would make 500-game cells take
  ~30 min (§3.6 says lower the cap before the sample count).
- worker/src/sim/experiment.ts: (config × playerCount × policy × seed)
  grid with per-cell median/p25/p75, defeat-before-wave-2 rate, mean
  wave-1 lives lost, outcome breakdown.
- worker/src/sim/report.ts: markdown table grouped by config + JSON.
- worker/src/sim/cli.ts: `bun run sim -- --configs ... --games N`;
  built-in configs resolve from worker/src/sim/configs/.
- worker/src/sim/configs/: ship-v1 (= DEFAULT_DIFFICULTY), easier-multi,
  flat, classic-ramp — the four visualizer presets.
- worker/src/sim/sim.test.ts: §3.7 validation — determinism (same seed
  → deep-equal GameResult), skill ordering over 20 seeds (median
  survival competent > novice > random: ~13k > ~1k > ~0.6k ticks solo),
  wave progression (competent clears wave 1), config validation.
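To illustrate what "randomness is a pure hash of (gameSeed, slot, tick)" can look like, here is a sketch built on the MurmurHash3 32-bit finalizer. The choice of mixer is an assumption made for this example, not necessarily what bot.ts uses, and the function names are hypothetical.

```typescript
/** MurmurHash3 32-bit finalizer: a bijective integer mixer. */
function mix32(input: number): number {
  let h = input | 0;
  h ^= h >>> 16;
  h = Math.imul(h, 0x85ebca6b);
  h ^= h >>> 13;
  h = Math.imul(h, 0xc2b2ae35);
  h ^= h >>> 16;
  return h >>> 0;
}

/** Deterministic value in [0, 1) derived only from the three inputs. */
function botRandom(gameSeed: number, slot: number, tick: number): number {
  let h = mix32(gameSeed);
  h = mix32(h ^ slot);
  h = mix32(h ^ tick);
  return h / 0x100000000;
}

// Same inputs always give the same value; no hidden generator state.
console.log(botRandom(42, 0, 100) === botRandom(42, 0, 100)); // true
```

Because there is no generator object to advance, a bot's decision at tick N does not depend on how many random draws other bots made earlier, so adding or removing a player slot cannot perturb the others.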

Not wired into the worker entry graph — src/index.ts does not import
sim/, so nothing here ships in the production bundle.

First measurement (ship-v1, competent, 20 seeds): solo median final
wave 16.5 with 0% defeat before wave 2; 4-player games end by invasion
at tick 190 with 100% defeat before wave 2 — the 13-column formation is
wider than the alien movement range, so it drops every move interval.
This is the quantified version of the "multiplayer is unfair" report.
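Per-cell statistics such as the median final wave quoted above could be computed along these lines. This is a sketch: linear interpolation between order statistics is an assumed method (it is consistent with a half-wave median such as 16.5 from an even number of seeds), and the function names are hypothetical rather than taken from experiment.ts.

```typescript
/** Quantile with linear interpolation between neighboring order statistics. */
function quantile(values: number[], q: number): number {
  if (values.length === 0) {
    throw new Error("quantile of an empty sample is undefined");
  }
  const sorted = values.slice().sort((a, b) => a - b);
  const pos = (sorted.length - 1) * q;
  const lo = Math.floor(pos);
  const hi = Math.ceil(pos);
  return sorted[lo] + (sorted[hi] - sorted[lo]) * (pos - lo);
}

/** Summary of one (config, playerCount, policy) cell of final waves. */
function summarizeCell(finalWaves: number[]): { p25: number; median: number; p75: number } {
  return {
    p25: quantile(finalWaves, 0.25),
    median: quantile(finalWaves, 0.5),
    p75: quantile(finalWaves, 0.75),
  };
}

console.log(summarizeCell([1, 2, 3, 4, 5])); // p25 2, median 3, p75 4
```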

https://claude.ai/code/session_01N2xMGpx5Vq9TjHwxZ37iNv
The full experiment grid revealed that every config with 13-column
alien grids at 3p/4p ends in invasion within 6-10 seconds — the
formation is 115 cells wide vs an alien movement range of 110 cells,
so leftmost+rightmost columns hit walls every move interval and the
formation drops on every move. Not a difficulty problem; a layout bug.
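The failure mode can be reproduced with a one-dimensional toy model. This is a sketch, not the game's movement code: it assumes the formation steps one cell per move and, when the step would leave the movement range, reverses direction and drops instead. The function name and the step size are hypothetical.

```typescript
/**
 * Count how many moves end in a drop for a formation of the given width
 * inside a horizontal range, under the assumed reverse-and-drop rule.
 */
function countDrops(widthCells: number, rangeCells: number, moves: number): number {
  let left = 0;
  let direction = 1;
  let drops = 0;
  for (let i = 0; i < moves; i++) {
    const nextLeft = left + direction;
    const hitsWall = nextLeft < 0 || nextLeft + widthCells > rangeCells;
    if (hitsWall) {
      direction = -direction; // reverse and descend instead of stepping
      drops++;
    } else {
      left = nextLeft;
    }
  }
  return drops;
}

// 115-wide formation in a 110-cell range: every move is a drop.
console.log(countDrops(115, 110, 19)); // 19
// A formation that fits only drops when it actually reaches a wall.
console.log(countDrops(100, 110, 20)); // 1
```

In this model, 19 moves at a 10-tick move interval correspond to tick 190, which matches where the 4-player ship-v1 games were reported to end. A width check of this kind would also be a cheap guard to run against any candidate config before simulating it.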

ship-v2 keeps ship-v1's exact multipliers and lives but caps cols at
11 everywhere (4p moves to 11×6 = 66 aliens, vs the broken 13×6 = 78).
Solo is bit-identical to ship-v1 (G1 preserved). Re-running the
experiment grid will confirm whether the layout fix alone is enough
or whether the multipliers need re-tuning.

https://claude.ai/code/session_01N2xMGpx5Vq9TjHwxZ37iNv
Both are ship-v2 (11-col layout fix) plus a modest wave ramp, addressing
the T3 "later waves never get harder" gap: v2.1 ramps +2% speed / +3%
shoot per wave, v2.2 ramps +3% / +5%, both capped at wave 10.
Experiment results to follow.

https://claude.ai/code/session_01N2xMGpx5Vq9TjHwxZ37iNv
Designer-facing static HTML report (tools/difficulty-report.html) that
visualizes bot-sim results JSON: per-wave hazard curves (Aponte/Levieux/
Natkin failure probability), Kaplan-Meier survival with cap-censoring,
flow-channel challenge proxy with tunable band, wave pacing, and summary
cards anchored to the first config's solo cell. Loads files via drag-drop,
file picker, or bundled sample dataset (ship-v1 vs ship-v2, 800 games).

Pure transforms live in tools/difficulty-report-lib.js (UMD: browser
script + CommonJS) and are tested by `bun tools/difficulty-report-test.js`
(47 assertions against the real sample data, incl. the known ship-v1 4p
wave-1 wipe yielding hazard 1.0 and ship-v2 fixing it).
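As a rough illustration of the per-wave hazard and Kaplan-Meier views, the sketch below treats a game that hit the tick cap as censored at its final wave. The SimOutcome shape and function name are hypothetical and may not match difficulty-report-lib.js or the results JSON schema.

```typescript
// Hypothetical per-game record; censored means the tick cap ended the game.
interface SimOutcome {
  finalWave: number;
  censored: boolean;
}

interface SurvivalPoint {
  wave: number;
  atRisk: number; // games that reached this wave
  defeats: number; // games lost in this wave
  hazard: number; // defeats / atRisk
  survival: number; // Kaplan-Meier estimate of surviving past this wave
}

function kaplanMeier(outcomes: SimOutcome[]): SurvivalPoint[] {
  const defeatWaves = Array.from(
    new Set(outcomes.filter((o) => !o.censored).map((o) => o.finalWave)),
  ).sort((a, b) => a - b);

  const points: SurvivalPoint[] = [];
  let survival = 1;
  for (const wave of defeatWaves) {
    // Censored games still count as at risk up to and including their last wave.
    const atRisk = outcomes.filter((o) => o.finalWave >= wave).length;
    const defeats = outcomes.filter((o) => !o.censored && o.finalWave === wave).length;
    const hazard = defeats / atRisk;
    survival *= 1 - hazard;
    points.push({ wave, atRisk, defeats, hazard, survival });
  }
  return points;
}

// Every game lost in wave 1 gives hazard 1.0 at wave 1, as in the ship-v1 4p case.
const wipe = [1, 1, 1].map((finalWave) => ({ finalWave, censored: false }));
console.log(kaplanMeier(wipe)[0].hazard); // 1
```

Censoring matters here because capped games did not lose; dropping them, or counting them as defeats, would bias the survival curve for the configs where bots survive longest.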

https://claude.ai/code/session_01N2xMGpx5Vq9TjHwxZ37iNv
The ship-v2 commit added a fifth builtin without updating the
"exactly four built-in configs" assertion, and ship-v2.1/v2.2 were
only loadable by file path. All seven shipped configs are now
registered and the test validates each against
validateDifficultyConfig.

https://claude.ai/code/session_01N2xMGpx5Vq9TjHwxZ37iNv
Drop both sample files onto tools/difficulty-report.html to compare
ship-v1, ship-v2, ship-v2.1 and ship-v2.2 across all views. Verdict
from this grid: ship-v2.1 (+2% speed/+3% shoot per wave, cap wave 10)
keeps solo within the G1 band (median wave 14 vs 15) while cutting
3p/4p cap-outs by half to two-thirds; ship-v2.2 over-corrects and
drags solo down 33%.

https://claude.ai/code/session_01N2xMGpx5Vq9TjHwxZ37iNv