(WIP) do not merge - Add nightly speed runs to check how fast we can solve a single map #439
Open
eugenevinitsky wants to merge 22 commits into
New [env] knob goal_placement (gigaflow): 0 = route (existing forward-route waypoints), 1 = random. In random mode compute_goals samples each target waypoint at a random drivable point anywhere on the map, rejecting points within min_waypoint_spacing of the agent, so goals can land in any direction including behind it. The route/path are still built at spawn for lane observations and progression; only the goal positions change. Goal-reached stays a pure euclidean check, so random goals reward correctly. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Manual launcher for single-agent-per-map training: one agent per env on Town02 only, many envs, goal_placement=random (goals anywhere on the map incl. behind). Auto-creates the Town02-only map dir, runs gpu_heartbeat alongside, wandb-tracked. Scheduling and scale (NUM_AGENTS) are overridable without editing the file. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Launch the single-agent speed run as a 3-task SLURM array (one seed each via --train.seed), all logged to wandb group "Nightly Test". max_goal_position=1000 and total_timesteps capped at 1B. Town02-folder create is atomic so concurrent array tasks can't read a partial map file. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
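The commit does not say how the atomic folder creation is done. One common approach, sketched here as an assumption, is to stage the copy in a temporary sibling directory and rename it into place, so a concurrent task sees either no directory or a complete one. The function name `ensure_map_dir` and the file and folder names in the usage note are hypothetical.

```python
import os
import shutil
import tempfile


def ensure_map_dir(src_file, dest_dir):
    """Create dest_dir containing a copy of src_file, without exposing a partial copy."""
    if os.path.isdir(dest_dir):
        return dest_dir  # already published by this or another task
    parent = os.path.dirname(os.path.abspath(dest_dir))
    os.makedirs(parent, exist_ok=True)
    # Stage on the same filesystem so the final rename is a single atomic step.
    staging = tempfile.mkdtemp(dir=parent)
    try:
        shutil.copy2(src_file, staging)
        os.rename(staging, dest_dir)
    except OSError:
        # Either the copy failed or another task published dest_dir first.
        shutil.rmtree(staging, ignore_errors=True)
        if not os.path.isdir(dest_dir):
            raise
    return dest_dir
```

Each array task could call `ensure_map_dir("maps/Town02.bin", "maps_town02_only")` before training; the losing tasks discard their staging directory and reuse the published one.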
Adds an opt-in [env] knob use_map_cache (default 0). When set, environments loading the same map file share one read-only copy of the static geometry -- road_elements, the spatial grid (cells + neighbor cache), and the lane graph (incl. its O(n^2) distance matrix) -- via a per-process, reference-counted cache keyed by map filename. Per-env mutable state (agents, traffic-light states) is never shared, so dynamic traffic lights stay correct. This removes the per-env map duplication that OOMs single-map / many-env runs (e.g. 1 map x 1024 single-agent envs): the geometry is built once and borrowed. Lifecycle: init() builds-or-borrows; c_close() decrements the refcount and frees the entry only on the last reference. A getpid() guard prevents a forked worker from freeing the parent's copy-on-write geometry (the use-after-free that the upstream WIP #346 hit). Default off keeps existing single-owner behavior; the single_agent_speed_run launcher passes --env.use_map_cache 1. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
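The build-or-borrow lifecycle described above can be sketched as follows. This is a Python illustration of the idea only; the real cache lives in C in drive.h, and the class name `MapCache`, its methods, and the loader callback are hypothetical.

```python
import os


class MapCache:
    """Per-process, reference-counted cache of read-only map geometry (illustrative)."""

    def __init__(self, build_fn):
        self._build_fn = build_fn      # hypothetical loader: filename -> geometry
        self._entries = {}             # filename -> [geometry, refcount]
        self._owner_pid = os.getpid()  # process that created the cache

    def acquire(self, filename):
        """Build the geometry on first use, otherwise borrow the shared copy."""
        entry = self._entries.get(filename)
        if entry is None:
            entry = [self._build_fn(filename), 0]
            self._entries[filename] = entry
        entry[1] += 1
        return entry[0]

    def release(self, filename):
        """Drop one reference; return True only if the entry was actually freed."""
        entry = self._entries[filename]
        entry[1] -= 1
        if entry[1] > 0:
            return False
        if os.getpid() != self._owner_pid:
            # A forked worker inherited this entry; it must not free the parent's copy.
            return False
        del self._entries[filename]
        return True
```

In this sketch two environments that load the same file share one geometry object, and only the last `release` from the owning process frees it.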
Pass --env.traffic_light_behavior 0 so the agent isn't stopped/penalized at lights during point-to-point speed runs. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The dynamic pufferl argparser registers dotted [env]/[train] keys with hyphens (map_dir -> --env.map-dir), so the underscore forms were rejected as unrecognized arguments. Convert all dotted overrides to hyphens. Also disable the nuPlan-based evaluators (validation_replay + behavior buckets) for single-map CARLA training via --eval.<name>.enabled 0; validation_gigaflow stays enabled. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
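To illustrate why the underscore forms were rejected, here is a minimal sketch using plain `argparse`. It assumes flags are registered from dotted config keys with underscores turned into hyphens; `to_cli_flag` and `build_parser` are hypothetical helpers, not pufferl's actual code.

```python
import argparse


def to_cli_flag(dotted_key):
    """Map a dotted config key to its hyphenated flag, e.g. env.map_dir -> --env.map-dir."""
    return "--" + dotted_key.replace("_", "-")


def build_parser(dotted_keys):
    """Register one optional flag per config key (illustrative only)."""
    parser = argparse.ArgumentParser()
    for key in dotted_keys:
        parser.add_argument(to_cli_flag(key))
    return parser


parser = build_parser(["env.map_dir"])
# The hyphenated form is recognized; argparse stores it under dest "env.map_dir".
known, unknown = parser.parse_known_args(["--env.map-dir", "maps"])
# The underscore form matches no registered option and ends up in `unknown`.
_, rejected = parser.parse_known_args(["--env.map_dir", "maps"])
```

With `parse_args` instead of `parse_known_args`, the second call would exit with an "unrecognized arguments" error, which matches the failure described above.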
[eval.validation_replay] inherited enabled from validation_defaults, so the argparser never registered --eval.validation-replay.enabled and CLI overrides were rejected. Declaring it explicitly (same true default) makes it toggleable, matching the behaviors_* sections. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Dedicated, disabled-by-default evaluator mirroring the single-agent speed-run task (Town02, one agent, random goals, lights off) with render_backend=obs_html. Lets us faithfully obs-render any checkpoint via puffer eval puffer_drive --evaluator single_agent_obs --load-model-path <ckpt> without it firing inline during training (enabled=false; standalone-by-name still runs it). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
With goal_placement=random, min_waypoint_spacing is the rejection floor on goal distance; 0 lets goals land right next to the agent (near goals, not only far). Set in the training launcher and the single_agent_obs render config. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Remove the [eval.single_agent_obs] section and the [eval.validation_replay] enabled= line; keep drive.ini as the shared default. The general goal_placement and use_map_cache [env] knobs stay (default off) since they're required for the CLI flags to exist. The launcher no longer tries to disable validation_replay (its enabled is inherited, not CLI-overridable without a drive.ini line). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…_obs) These are eval-scoped and do not alter default training: validation_replay's enabled was already inherited-true (now explicit, CLI-toggleable); single_agent_obs is enabled=false so it never runs inline. Re-enable the launcher's validation_replay disable for single-agent runs. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
traffic_light_behavior=0 only stops the env force-halting at reds; the agent still learns to stop because a red-light violation zeroes the reward multiplier (no_red_light, drive.h). The red-light metric is gated on max_traffic_control_observations, so setting it to 0 removes lights from the observation AND keeps red_light_violation_rate at 0 (multiplier never zeroed) -- the agent can neither see nor be penalized by lights. Set in the launcher and the single_agent_obs render config. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Pull request overview
Adds a SLURM “nightly speed run” training job and extends the Drive environment to support (1) random goal placement across the map and (2) an optional per-process shared map-geometry cache to reduce repeated map load/build overhead.
Changes:
- Introduce `scripts/single_agent_speed_run.sbatch` for a 3-seed nightly single-map (Town02) training sweep with evaluator disabling.
- Add `goal_placement` (route vs random) and `use_map_cache` (share static map geometry) plumbing from config → Python env → C env.
- Extend `drive.ini` with new env keys and a disabled-by-default evaluator preset (`eval.single_agent_obs`) for manual obs_html rendering.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| scripts/single_agent_speed_run.sbatch | New SLURM script to run the single-agent nightly “speed run” training sweep. |
| pufferlib/ocean/drive/drive.py | Adds validation and passes through new env kwargs (goal_placement, use_map_cache). |
| pufferlib/ocean/drive/drive.h | Implements random goal sampling and a shared map-geometry cache with refcounted entries. |
| pufferlib/ocean/drive/binding.c | Unpacks new kwargs into the C Drive struct. |
| pufferlib/config/ocean/drive.ini | Adds new env keys and a disabled-by-default evaluator config for the speed-run task. |
Comment on lines +57 to +60

    ; Goal placement (gigaflow) - options: 0 - route (forward waypoints), 1 - random (anywhere on map)
    goal_placement = 0
    ; Share static map geometry (roads/grid/lane-graph) across envs using the same map - 0 off, 1 on
    use_map_cache = 0

Comment on lines +1918 to +1921

    for (int attempt = 0; attempt < MAX_GOAL_ATTEMPTS; attempt++) {
        int list_idx = rand() % env->grid_map->num_drivable_grid_cell;
        int grid_idx = env->grid_map->grid_index_drivable[list_idx];

Comment on lines +1922 to +1934

    GridMapEntity cell_candidates[MAX_ENTITIES_PER_CELL];
    int candidate_count = 0;
    for (int j = 0; j < env->grid_map->cell_entities_count[grid_idx]; j++) {
        GridMapEntity entity = env->grid_map->cells[grid_idx][j];
        if (is_drivable_road_lane(env->road_elements[entity.entity_idx].type)) {
            cell_candidates[candidate_count++] = entity;
        }
    }
    if (candidate_count == 0) {
        continue;
    }

    GridMapEntity chosen = cell_candidates[rand() % candidate_count];

Comment on lines +307 to +313

    // Per-process map cache. Built lazily in init(). g_map_cache_pid stamps the
    // process that built it: a forked worker inherits these pointers via copy-on-write
    // and must never free them (it would corrupt the parent's heap), so the free path
    // in c_close is guarded by getpid() == g_map_cache_pid.
    static struct SharedMapData **g_map_cache = NULL;
    static int g_map_cache_count = 0;
    static pid_t g_map_cache_pid = 0;

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Set WANDB_NAME=<YYYY-MM-DD>_single_agent_seed<N> (WandbLogger passes no name=, so the env var sets the run name). Date-first so runs sort chronologically per launch. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
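As a small illustration of the naming scheme, the sketch below builds the run name and exports it through the `WANDB_NAME` environment variable. The helper `wandb_run_name` is hypothetical; only the name format comes from the commit message.

```python
import datetime
import os


def wandb_run_name(seed, today=None):
    """Build '<YYYY-MM-DD>_single_agent_seed<N>' so names sort by launch date first."""
    today = today or datetime.date.today()
    return f"{today.isoformat()}_single_agent_seed{seed}"


# Export the name before the logger starts, since no explicit name= is passed.
os.environ["WANDB_NAME"] = wandb_run_name(seed=0)
```

Because ISO dates sort lexicographically, a plain string sort of run names groups runs by launch date and then by seed.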
… bound compute_goals_random previously only enforced a min distance and sampled uniformly over the whole map (far-skewed, no cap). Now it accepts points in [min_waypoint_spacing, max_waypoint_spacing], with a closest-to-band fallback so a tight max still lands a nearby goal. The random-mode contexts (single-agent launcher + single_agent_obs render) default max_waypoint_spacing to a huge value (= anywhere on the map), preserving prior behavior; the global [env] default stays 60 for route mode / default training. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
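The band-plus-fallback rule can be sketched like this. It is a Python illustration under assumptions, not the C code in drive.h: `sample_goal`, the flat list of drivable points, and the default attempt budget of 32 are all hypothetical.

```python
import math
import random


def sample_goal(agent_xy, drivable_points, min_spacing, max_spacing, rng, max_attempts=32):
    """Sample a goal whose distance to the agent lies in [min_spacing, max_spacing].

    If no sampled candidate falls inside the band within max_attempts tries,
    return the sampled candidate closest to the band instead.
    """
    best, best_gap = None, math.inf
    for _ in range(max_attempts):
        candidate = rng.choice(drivable_points)
        dist = math.dist(agent_xy, candidate)
        if min_spacing <= dist <= max_spacing:
            return candidate
        # Distance from the band: below the floor or above the cap.
        gap = min_spacing - dist if dist < min_spacing else dist - max_spacing
        if gap < best_gap:
            best, best_gap = candidate, gap
    return best
```

Setting `max_spacing` to a very large value reduces this to the earlier min-distance-only behavior, while a tight `max_spacing` still yields a nearby goal through the fallback.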
…ngle_agent_speed_runs
free_shared_map_data NULLs an entry's slot on refcount zero; reuse those holes on the next insert so a resample cycle (free all, rebuild) keeps g_map_cache_count bounded by the number of distinct maps instead of appending past NULL holes. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
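The hole-reuse idea can be shown with a short sketch, using a Python list with `None` standing in for the NULLed slots. `insert_entry` is a hypothetical name; the real logic is in C.

```python
def insert_entry(slots, entry):
    """Store entry in the first free (None) slot; append only when no slot is free."""
    for index, slot in enumerate(slots):
        if slot is None:
            slots[index] = entry
            return index
    slots.append(entry)
    return len(slots) - 1


# A resample cycle: free every entry, then rebuild the same maps.
slots = ["Town02"]
slots[0] = None                 # freed on the last reference
insert_entry(slots, "Town02")   # reuses slot 0 instead of growing the table
```

After any number of free-and-rebuild cycles the table length stays at the number of distinct maps held at once.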
Move the single-agent Town02 / random-goal / lights-off knobs into a program_config YAML so the run goes through the canonical submit_cluster.py path (code isolation + container + heartbeat) instead of a hand-rolled sbatch. Launch 3 seeds via --args train.seed=0:1:2. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
submit_cluster.py joins the inner command into a bash -c string without quoting arg values, so a space in --wandb-group split the arg and crashed argparse. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
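The failure mode is easy to reproduce. The sketch below contrasts a naive join with a quoted join using the standard library's `shlex.quote`; it is one possible remedy, not necessarily what this commit did, and the command line shown is hypothetical.

```python
import shlex


def join_naive(args):
    """Join without quoting: an argument containing a space splits in two."""
    return " ".join(args)


def join_quoted(args):
    """Quote each argument so `bash -c` sees the original argument boundaries."""
    return " ".join(shlex.quote(arg) for arg in args)


args = ["python", "train.py", "--wandb-group", "Nightly Test"]
naive_tokens = shlex.split(join_naive(args))    # "Nightly" and "Test" become separate tokens
quoted_tokens = shlex.split(join_quoted(args))  # round-trips to the original list
```

Avoiding spaces in the group name also sidesteps the problem without touching the submit script.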
Superseded by the submit_cluster.py path (scripts/cluster_configs/single_agent_speed_run.yaml), which provides per-run code isolation, container wrapping, and the heartbeat. The hand-rolled sbatch ran from the shared checkout with no isolation, which is what let a rebuild SIGBUS the live seeds. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…d ablation) Self-contained variant of single_agent_speed_run.yaml with lane (lane_align, lane_center) and velocity (vel_align, velocity, overspeed) rewards zeroed, so the ablation is defined by a config file rather than ad-hoc --args. seed stays on --args. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Override the inherited validation_gigaflow scenario_length=500 / render_max_steps=300 so the obs render matches the single-agent training episode length (1280 steps). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
No description provided.