(WIP) do not merge - Add nightly speed runs to check how fast we can solve a single map by eugenevinitsky · Pull Request #439 · Emerge-Lab/PufferDrive

eugenevinitsky · 2026-05-23T15:09:34Z

No description provided.

New [env] knob goal_placement (gigaflow): 0 = route (existing forward-route waypoints), 1 = random. In random mode, compute_goals samples each target waypoint at a random drivable point anywhere on the map, rejecting points within min_waypoint_spacing of the agent, so goals can land in any direction, including behind it. The route and path are still built at spawn for lane observations and progression; only the goal positions change. Goal-reached stays a purely Euclidean check, so random goals are rewarded correctly. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
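A minimal sketch of why a distance-only goal check is indifferent to where the goal sits relative to the agent. The function name `goal_reached` and the `radius` parameter are hypothetical and are not identifiers from drive.h.

```python
import math


def goal_reached(agent_xy, goal_xy, radius):
    """Return True when the agent is within `radius` of the goal.

    Only distance is consulted, not heading or route progress, so a goal
    behind the agent counts exactly like one ahead of it.
    """
    dx = goal_xy[0] - agent_xy[0]
    dy = goal_xy[1] - agent_xy[1]
    return math.hypot(dx, dy) <= radius
```

With this shape of check, switching from route goals to random goals needs no change on the reward side.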

Manual launcher for single-agent-per-map training: one agent per env on Town02 only, many envs, goal_placement=random (goals anywhere on the map incl. behind). Auto-creates the Town02-only map dir, runs gpu_heartbeat alongside, wandb-tracked. Scheduling and scale (NUM_AGENTS) are overridable without editing the file. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Launch the single-agent speed run as a 3-task SLURM array (one seed each via --train.seed), all logged to wandb group "Nightly Test". max_goal_position=1000 and total_timesteps capped at 1B. Town02-folder create is atomic so concurrent array tasks can't read a partial map file. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
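One common way to make a directory create atomic is to stage the copy in a private sibling directory and rename it into place. The sketch below assumes that approach; the launcher's actual mechanism is not shown in this PR page, and `publish_map_dir` and the file names used with it are hypothetical.

```python
import os
import shutil
import tempfile


def publish_map_dir(src_file, final_dir):
    """Create `final_dir` holding a copy of `src_file` without exposing a partial copy."""
    if os.path.isdir(final_dir):
        return final_dir  # another task already published it
    parent = os.path.dirname(os.path.abspath(final_dir))
    os.makedirs(parent, exist_ok=True)
    # Stage in a private sibling directory that readers never look at.
    staging = tempfile.mkdtemp(prefix=".staging-", dir=parent)
    try:
        shutil.copy2(src_file, staging)
        try:
            # rename() is atomic on POSIX within a single filesystem.
            os.rename(staging, final_dir)
        except OSError:
            # Lost the race: a concurrent task published first.
            if not os.path.isdir(final_dir):
                raise
    finally:
        # Clean up the staging directory if the rename did not consume it.
        if os.path.isdir(staging):
            shutil.rmtree(staging, ignore_errors=True)
    return final_dir
```

Readers either see no directory or a complete one, which is the property concurrent array tasks need.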

Adds an opt-in [env] knob use_map_cache (default 0). When set, environments loading the same map file share one read-only copy of the static geometry -- road_elements, the spatial grid (cells + neighbor cache), and the lane graph (incl. its O(n^2) distance matrix) -- via a per-process, reference-counted cache keyed by map filename. Per-env mutable state (agents, traffic-light states) is never shared, so dynamic traffic lights stay correct.

This removes the per-env map duplication that OOMs single-map / many-env runs (e.g. 1 map x 1024 single-agent envs): the geometry is built once and borrowed.

Lifecycle: init() builds-or-borrows; c_close() decrements the refcount and frees the entry only on the last reference. A getpid() guard prevents a forked worker from freeing the parent's copy-on-write geometry (the use-after-free that the upstream WIP #346 hit).

Default off keeps existing single-owner behavior; the single_agent_speed_run launcher passes --env.use_map_cache 1. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
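The build-or-borrow lifecycle and the pid guard can be illustrated in a few lines. This is a Python sketch of the idea only, not the C implementation in drive.h; the class `MapCache` and its method names are hypothetical.

```python
import os


class MapCache:
    """Per-process, reference-counted cache keyed by map filename."""

    def __init__(self, build_fn):
        self._build = build_fn         # builds static geometry from a filename
        self._entries = {}             # filename -> [geometry, refcount]
        self._owner_pid = os.getpid()  # process that created the cache

    def acquire(self, filename):
        """Build the geometry on first use, otherwise borrow the shared copy."""
        entry = self._entries.get(filename)
        if entry is None:
            entry = [self._build(filename), 0]
            self._entries[filename] = entry
        entry[1] += 1
        return entry[0]

    def release(self, filename):
        """Drop one reference; return True only if the entry was freed."""
        # A forked child inherits the entries but must never free them.
        if os.getpid() != self._owner_pid:
            return False
        entry = self._entries[filename]
        entry[1] -= 1
        if entry[1] == 0:
            del self._entries[filename]
            return True
        return False
```

Two environments that acquire the same filename receive the same object, and only the second release frees it.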

Pass --env.traffic_light_behavior 0 so the agent isn't stopped/penalized at lights during point-to-point speed runs. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

The dynamic pufferl argparser registers dotted [env]/[train] keys with hyphens (map_dir -> --env.map-dir), so the underscore forms were rejected as unrecognized arguments. Convert all dotted overrides to hyphens. Also disable the nuPlan-based evaluators (validation_replay + behavior buckets) for single-map CARLA training via --eval.<name>.enabled 0; validation_gigaflow stays enabled. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
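The rejection is easy to reproduce with plain argparse. The sketch below does not use pufferl's real parser construction, which is not shown here; `cli_flag` is a hypothetical helper and `maps/town02` is a placeholder value.

```python
import argparse


def cli_flag(section, key):
    """Build the dotted flag for a config key, with underscores as hyphens."""
    return f"--{section}.{key.replace('_', '-')}"


parser = argparse.ArgumentParser()
parser.add_argument(cli_flag("env", "map_dir"))  # registers --env.map-dir

# The hyphenated form parses; argparse stores it under dest "env.map_dir".
args, unknown = parser.parse_known_args(["--env.map-dir", "maps/town02"])
parsed_value = vars(args)["env.map_dir"]

# The underscore form was never registered, so it comes back unrecognized.
_, rejected = parser.parse_known_args(["--env.map_dir", "maps/town02"])
```

With `parse_args` instead of `parse_known_args`, the second call would exit with an "unrecognized arguments" error, which matches the failure described above.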

validation_replay inherited enabled from validation_defaults, so the argparser never registered --eval.validation-replay.enabled and CLI overrides were rejected. Declaring it explicitly (same true default) makes it toggleable, matching the behaviors_* sections. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Dedicated, disabled-by-default evaluator mirroring the single-agent speed-run task (Town02, one agent, random goals, lights off) with render_backend=obs_html. Lets us faithfully obs-render any checkpoint via puffer eval puffer_drive --evaluator single_agent_obs --load-model-path <ckpt> without it firing inline during training (enabled=false; standalone-by-name still runs it). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

With goal_placement=random, min_waypoint_spacing is the rejection floor on goal distance; 0 lets goals land right next to the agent (near goals, not only far). Set in the training launcher and the single_agent_obs render config. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Remove the [eval.single_agent_obs] section and the [eval.validation_replay] enabled= line; keep drive.ini as the shared default. The general goal_placement and use_map_cache [env] knobs stay (default off) since they're required for the CLI flags to exist. The launcher no longer tries to disable validation_replay (its enabled is inherited, not CLI-overridable without a drive.ini line). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…_obs) These are eval-scoped and do not alter default training: validation_replay's enabled was already inherited-true (now explicit, CLI-toggleable); single_agent_obs is enabled=false so it never runs inline. Re-enable the launcher's validation_replay disable for single-agent runs. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

traffic_light_behavior=0 only stops the env force-halting at reds; the agent still learns to stop because a red-light violation zeroes the reward multiplier (no_red_light, drive.h). The red-light metric is gated on max_traffic_control_observations, so setting it to 0 removes lights from the observation AND keeps red_light_violation_rate at 0 (multiplier never zeroed) -- the agent can neither see nor be penalized by lights. Set in the launcher and the single_agent_obs render config. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
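The gating described here reduces to a small truth table. The sketch below is an assumption about the shape of the logic, not a transcription of drive.h; the function name `red_light_multiplier` is hypothetical.

```python
def red_light_multiplier(ran_red_light, max_traffic_control_observations):
    """Return the reward multiplier contributed by the red-light term.

    Assumption: a violation is only registered when traffic controls are
    observable, i.e. max_traffic_control_observations > 0.
    """
    violation = ran_red_light and max_traffic_control_observations > 0
    return 0.0 if violation else 1.0
```

Under this reading, setting max_traffic_control_observations to 0 leaves the multiplier at 1.0 regardless of what the lights do.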

Copilot

Pull request overview

Adds a SLURM “nightly speed run” training job and extends the Drive environment to support (1) random goal placement across the map and (2) an optional per-process shared map-geometry cache to reduce repeated map load/build overhead.

Changes:

- Introduce scripts/single_agent_speed_run.sbatch for a 3-seed nightly single-map (Town02) training sweep with evaluator disabling.
- Add goal_placement (route vs random) and use_map_cache (share static map geometry) plumbing from config → Python env → C env.
- Extend drive.ini with new env keys and a disabled-by-default evaluator preset (eval.single_agent_obs) for manual obs_html rendering.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| scripts/single_agent_speed_run.sbatch | New SLURM script to run the single-agent nightly “speed run” training sweep. |
| pufferlib/ocean/drive/drive.py | Adds validation and passes through new env kwargs (`goal_placement`, `use_map_cache`). |
| pufferlib/ocean/drive/drive.h | Implements random goal sampling and a shared map-geometry cache with refcounted entries. |
| pufferlib/ocean/drive/binding.c | Unpacks new kwargs into the C Drive struct. |
| pufferlib/config/ocean/drive.ini | Adds new env keys and a disabled-by-default evaluator config for the speed-run task. |

+; Goal placement (gigaflow) - options: 0 - route (forward waypoints), 1 - random (anywhere on map)
+goal_placement = 0
+; Share static map geometry (roads/grid/lane-graph) across envs using the same map - 0 off, 1 on
+use_map_cache = 0


+        for (int attempt = 0; attempt < MAX_GOAL_ATTEMPTS; attempt++) {
+            int list_idx = rand() % env->grid_map->num_drivable_grid_cell;
+            int grid_idx = env->grid_map->grid_index_drivable[list_idx];
+


+            GridMapEntity cell_candidates[MAX_ENTITIES_PER_CELL];
+            int candidate_count = 0;
+            for (int j = 0; j < env->grid_map->cell_entities_count[grid_idx]; j++) {
+                GridMapEntity entity = env->grid_map->cells[grid_idx][j];
+                if (is_drivable_road_lane(env->road_elements[entity.entity_idx].type)) {
+                    cell_candidates[candidate_count++] = entity;
+                }
+            }
+            if (candidate_count == 0) {
+                continue;
+            }
+
+            GridMapEntity chosen = cell_candidates[rand() % candidate_count];


+// Per-process map cache. Built lazily in init(). g_map_cache_pid stamps the
+// process that built it: a forked worker inherits these pointers via copy-on-write
+// and must never free them (it would corrupt the parent's heap), so the free path
+// in c_close is guarded by getpid() == g_map_cache_pid.
+static struct SharedMapData **g_map_cache = NULL;
+static int g_map_cache_count = 0;
+static pid_t g_map_cache_pid = 0;


Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Set WANDB_NAME=<YYYY-MM-DD>_single_agent_seed<N> (WandbLogger passes no name=, so the env var sets the run name). Date-first so runs sort chronologically per launch. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
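A small sketch of the naming scheme, assuming the name is assembled in Python before the run starts (the launcher itself may do this in shell). `speed_run_name` is a hypothetical helper; `WANDB_NAME` is the environment variable wandb reads for the run name.

```python
import datetime
import os


def speed_run_name(seed, today=None):
    """Return '<YYYY-MM-DD>_single_agent_seed<N>' for the given seed."""
    today = today or datetime.date.today()
    return f"{today.isoformat()}_single_agent_seed{seed}"


# wandb picks the run name up from the environment when no name= is passed.
os.environ["WANDB_NAME"] = speed_run_name(seed=0)
```

Because ISO dates sort lexicographically, a plain alphabetical sort of run names is also a chronological sort.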

… bound compute_goals_random previously only enforced a min distance and sampled uniformly over the whole map (far-skewed, no cap). Now it accepts points in [min_waypoint_spacing, max_waypoint_spacing], with a closest-to-band fallback so a tight max still lands a nearby goal. The random-mode contexts (single-agent launcher + single_agent_obs render) default max_waypoint_spacing to a huge value (= anywhere on the map), preserving prior behavior; the global [env] default stays 60 for route mode / default training. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
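The band-with-fallback rule can be sketched as follows. This is an illustration in Python, not the C code of compute_goals_random; `sample_goal`, the flat list of candidate points, and the default of 32 attempts are all assumptions made for the example.

```python
import math
import random


def sample_goal(agent_xy, drivable_points, min_spacing, max_spacing,
                max_attempts=32, rng=random):
    """Pick a goal whose distance from the agent lies in [min_spacing, max_spacing].

    If no attempt lands inside the band, return the sampled point whose
    distance was closest to the band.
    """
    best_point, best_gap = None, math.inf
    for _ in range(max_attempts):
        point = rng.choice(drivable_points)
        dist = math.dist(agent_xy, point)
        if min_spacing <= dist <= max_spacing:
            return point
        # Outside the band: measure the distance to the nearest band edge.
        gap = min_spacing - dist if dist < min_spacing else dist - max_spacing
        if gap < best_gap:
            best_point, best_gap = point, gap
    return best_point
```

A very large max_spacing makes the upper bound inactive, which corresponds to the "anywhere on the map" behavior, and a min_spacing of 0 lets goals land right next to the agent.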

…ngle_agent_speed_runs

free_shared_map_data NULLs an entry's slot on refcount zero; reuse those holes on the next insert so a resample cycle (free all, rebuild) keeps g_map_cache_count bounded by the number of distinct maps instead of appending past NULL holes. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
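Hole reuse on insert looks roughly like this. The sketch models the slot array as a Python list with `None` standing in for a NULLed slot; `insert_entry` is a hypothetical name.

```python
def insert_entry(slots, entry):
    """Store `entry` in the first freed (None) slot, appending only if none exists."""
    for i, slot in enumerate(slots):
        if slot is None:
            slots[i] = entry
            return i
    slots.append(entry)
    return len(slots) - 1
```

After a free-all-then-rebuild cycle every slot is a hole, so the rebuilt entries land back in the existing slots and the array length stays at the number of distinct maps.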

Move the single-agent Town02 / random-goal / lights-off knobs into a program_config YAML so the run goes through the canonical submit_cluster.py path (code isolation + container + heartbeat) instead of a hand-rolled sbatch. Launch 3 seeds via --args train.seed=0:1:2. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

submit_cluster.py joins the inner command into a bash -c string without quoting arg values, so a space in --wandb-group split the arg and crashed argparse. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
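The failure mode is reproducible with the standard library. The commit sidesteps it by renaming the group to Nightly_Test; the sketch below only demonstrates the split and shows quoting as an alternative, and `train_cmd` is a placeholder rather than a real command.

```python
import shlex

inner = ["train_cmd", "--wandb-group", "Nightly Test"]

# Joining with plain spaces loses the argument boundary once the shell re-splits it.
unquoted = " ".join(inner)
resplit_unquoted = shlex.split(unquoted)

# shlex.join quotes each argument, so the space survives the round trip.
quoted = shlex.join(inner)
resplit_quoted = shlex.split(quoted)
```

Here `resplit_unquoted` has four elements, with `Test` arriving as a stray positional argument, while `resplit_quoted` equals the original three-element list.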

Superseded by the submit_cluster.py path (scripts/cluster_configs/single_agent_speed_run.yaml), which provides per-run code isolation, container wrapping, and the heartbeat. The hand-rolled sbatch ran from the shared checkout with no isolation, which is what let a rebuild SIGBUS the live seeds. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…d ablation) Self-contained variant of single_agent_speed_run.yaml with lane (lane_align, lane_center) and velocity (vel_align, velocity, overspeed) rewards zeroed, so the ablation is defined by a config file rather than ad-hoc --args. seed stays on --args. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Override the inherited validation_gigaflow scenario_length=500 / render_max_steps=300 so the obs render matches the single-agent training episode length (1280 steps). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Eugene Vinitsky and others added 11 commits May 22, 2026 22:09

scripts: turn traffic lights off for single-agent speed runs

befb01e

Copilot AI review requested due to automatic review settings May 23, 2026 15:09

Copilot started reviewing on behalf of eugenevinitsky May 23, 2026 15:09

Copilot AI reviewed May 23, 2026

Eugene Vinitsky and others added 10 commits May 23, 2026 11:26

scripts: drop stale scheduling/logs comment from single-agent launcher

6a6c2e8

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

scripts: date-stamp wandb run name for single-agent launches

2def571

Merge remote-tracking branch 'origin/emerge/temp_training' into ev/si…

54bea3d

…ngle_agent_speed_runs

scripts: single-agent wandb_group Nightly_Test (no space)

7d84e78

drive.ini: single_agent_obs renders full 1280-step episodes

7d931ef

eugenevinitsky wants to merge 22 commits into emerge/temp_training from ev/single_agent_speed_runs

2 participants
