This tutorial makes three assumptions:
- It targets an AlpaSim user rather than an AlpaSim developer.
- It treats `docker compose` as the primary execution environment.
- It focuses on letting the user do simple things quickly and leaves detail for later. This is reflected in the subdivision into three levels of complexity.
In level 1 we run a default simulation with the VaVAM driver policy, learn how to interpret the results, and perform basic debugging.
AlpaSim consists of multiple networked microservices (renderer, physics simulation, runtime, controller, driver, traffic simulation). The AlpaSim runtime requests observed video frames from the renderer and egomotion history from the controller, communicates with the physics microservice to constrain actors to the road surface, and provides the information to the driver, with the expectation of receiving driving decisions in return to close the loop.
This repository contains the implementations of a subset of the services needed to execute the simulation as well as config files and infra code necessary to bring the microservices up via docker/enroot.
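The per-step loop described above can be sketched with stand-in objects; every class and method name below is illustrative, not the real gRPC interfaces (see the protocol buffer definitions for those):

```python
from dataclasses import dataclass

@dataclass
class Observation:
    frames: list       # rendered camera frames (stand-in)
    ego_history: list  # egomotion history from the controller (stand-in)

class StubRenderer:
    def render(self, step):
        return [f"frame-{step}"]

class StubController:
    def __init__(self):
        self.history = []
    def egomotion_history(self):
        return list(self.history)
    def apply(self, trajectory):
        self.history.append(trajectory)

class StubDriver:
    def drive(self, obs):
        # A real driver returns a trajectory for the ego vehicle to follow
        return f"trajectory@{len(obs.ego_history)}"

def run_loop(renderer, controller, driver, n_steps):
    """One rollout: gather observations, query the driver, close the loop."""
    trajectories = []
    for step in range(n_steps):
        obs = Observation(frames=renderer.render(step),
                          ego_history=controller.egomotion_history())
        traj = driver.drive(obs)  # driving decision closes the loop
        controller.apply(traj)
        trajectories.append(traj)
    return trajectories
```

The runtime plays the role of `run_loop` here, brokering data between the other services each simulation step.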
Let's start by executing a run with default settings.
- Follow the instructions in onboarding to ensure the necessary dependencies have been installed.
- Set up your environment with:

  ```bash
  source setup_local_env.sh
  ```

  This will compile protos, download an example driver model, ensure you have a valid Hugging Face token, and install the `alpasim_wizard` command line tool.
- Run the wizard to create the necessary config files, download the scene (if necessary), and run a simulation:

  ```bash
  alpasim_wizard +deploy=local wizard.log_dir=$PWD/tutorial
  ```

  This will create a `tutorial/` directory with all necessary config files and run the simulation.
The simulation logs/output will be in the created `tutorial/` directory. For a visualization of the results, an mp4 file is created at `tutorial/eval/videos/clipgt-026d..._0.mp4`. The full results should look something like:
```
tutorial/
├── aggregate
│   ├── metrics_results.png
│   ├── metrics_results.txt
│   ├── metrics_unprocessed.parquet
│   └── videos
│       ├── all
│       │   └── clipgt-026d6a39-bd8f-4175-bc61-fe50ed0403a3_814f3c22-bb78-11f0-a5f3-2f64b47b8685_0.mp4
│       └── violations
│           ├── collision_at_fault
│           ├── collision_rear
│           ├── dist_to_gt_trajectory
│           │   └── clipgt-026d6a39-bd8f-4175-bc61-fe50ed0403a3_814f3c22-bb78-11f0-a5f3-2f64b47b8685_0.mp4 -> ../../all/clipgt-026d6a39-bd8f-4175-bc61-fe50ed0403a3_814f3c22-bb78-11f0-a5f3-2f64b47b8685_0.mp4
│           └── offroad
├── asl
│   └── clipgt-026d6a39-bd8f-4175-bc61-fe50ed0403a3
│       └── 814f3c22-bb78-11f0-a5f3-2f64b47b8685
│           ├── 0.asl
│           └── _complete
├── avmf-config.yaml
├── controller
│   └── alpasim_controller_814f3c22-bb78-11f0-a5f3-2f64b47b8685.csv
├── docker-compose.yaml
├── driver
│   └── vam-driver.yaml
├── driver-config.yaml
├── eval
│   ├── metrics_unprocessed.parquet
│   └── videos
│       └── clipgt-026d6a39-bd8f-4175-bc61-fe50ed0403a3_814f3c22-bb78-11f0-a5f3-2f64b47b8685_0.mp4
├── eval-config.yaml
├── generated-network-config.yaml
├── generated-user-config-0.yaml
├── metrics
├── run_metadata.yaml
├── run.sh
├── trafficsim-config.yaml
├── txt-logs
├── wizard-config-loadable.yaml
└── wizard-config.yaml
```
Some noteworthy files and directories:
- `asl/` contains logs of simulation messages between components in each rollout and can be used to analyze AV behavior and calculate metrics. The logs are organized into `asl/{scenario.scene_id}/{rollout_id}.*`; in this case we have one scene with one batch of a single rollout. The `.asl` files record the messages exchanged within the simulation and are useful for debugging simulator behavior and replaying events.
- `eval/` contains per-rollout evaluation results:
  - `metrics_unprocessed.parquet` - raw driving quality metrics for each rollout
  - `videos/` - video recordings of each rollout
- `aggregate/` contains aggregated results across all rollouts:
  - `metrics_results.txt` - formatted table of driving scores (mean, std, quantiles)
  - `metrics_results.png` - visual summary of driving quality metrics
  - `metrics_unprocessed.parquet` - combined metrics from all rollouts
  - `videos/` - videos organized by violation type (`collision_at_fault`, `offroad`, etc.)
- `metrics/` contains performance profiling data (see OPERATIONS.md for details):
  - `metrics.prom` - Prometheus metrics from the simulation
  - `metrics_plot.png` - performance visualization (CPU/GPU/RPC metrics)
- `driver/` is a directory with logs written by the driver service, useful for debugging policy-internal problems.
- `wizard-config.yaml` contains the config the wizard used for this run after applying hydra inheritance. This is useful for debugging configuration issues.
- `generated-user-config-{ARRAY_ID}.yaml` contains an expanded version of the simulation config provided by the user, possibly split into chunks when simulating on multiple nodes.
- `trafficsim-config.yaml` is a copy of the traffic simulation config used for the simulation, useful for debugging traffic simulation.
- `generated-network-config.yaml` describes which services listen on which ports during simulation. Not useful unless debugging the simulator itself.
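As a sketch of how one might inspect these results, the parquet files can be loaded with pandas; note that the column names `metric` and `value` below are assumptions for illustration, not the documented schema of `metrics_unprocessed.parquet`:

```python
import pandas as pd

def summarize(df: pd.DataFrame) -> pd.DataFrame:
    # Mean/std per metric, mirroring what aggregate/metrics_results.txt reports
    return df.groupby("metric")["value"].agg(["mean", "std"])

# In a real run you would load the parquet file instead:
# df = pd.read_parquet("tutorial/eval/metrics_unprocessed.parquet")
df = pd.DataFrame({
    "metric": ["offroad", "offroad", "collision_rear", "collision_rear"],
    "value":  [0.0, 1.0, 0.0, 0.0],
})
print(summarize(df))
```

Checking the actual column names with `df.columns` after loading the real file is the first step before adapting this.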
If everything went correctly, `asl` and `eval` are usually the only results of interest. For
understanding driving quality metrics and performance tuning, see the
Operations Guide.
⚠️ This section is about debugging the configuration of the simulator itself (not of vehicle behavior within simulation)
The console contains logs from all microservices, and is the first place one should look when
something goes wrong. When an error happens (for example the asl directory does not appear), it's
best to consult that log to see where the first errors occurred. The microservices may produce
additional logs that can be useful for debugging, but that is not covered here.
In level 2 we learn to customize the simulation (i.e. change the driver policy, change simulated scenes, etc.) and understand the architecture in more depth.
AlpaSim wizard is configured via hydra and takes in a .yaml
configuration file and arbitrary command line overrides. Example config files are in
src/wizard/configs/. We suggest reading base_config.yaml,
which has detailed comments on the configuration fields.
Under the top-level `runtime` item in `base_config.yaml`, we describe the details of the
simulation to be performed (as opposed to deployment settings under `wizard.*` and `services.*`).
The important configurable fields of runtime are:
- `save_dir` - the name of the directory where to save `asl` logs. It needs to be kept in sync with the mount points of certain wizard modules.
- `endpoints` - used to configure simulator scaling properties.
- `default_scenario_parameters` - specifies all the simulation parameters (e.g. timing, cameras, vehicle configuration, etc.).
For example, one might change the number of rollouts per scene by running the wizard as follows:

```bash
alpasim_wizard +deploy=local wizard.log_dir=<dir> runtime.default_scenario_parameters.n_rollouts=8
```

The driver in AlpaSim is a policy for the ego vehicle that takes in sensor inputs and optional navigation commands, and outputs a trajectory for the ego vehicle to follow, along with other optional outputs, such as chain-of-causation reasoning text.
The driver is specified by a pair of config files under src/wizard/configs/, one for the driver
service itself, and one for the runtime (so that it provides the inputs required for the specific
driver).
The wizard uses VaVAM as the default driver. To explicitly define the driver config, one can use:
```bash
alpasim_wizard +deploy=local wizard.log_dir=$PWD/tutorial_alpamayo driver=[vavam,vavam_runtime_configs]
```

To run with the Alpamayo-R1 10B model, use `driver=[ar1,ar1_runtime_configs]`.
First, one may download the model weights from HuggingFace:
```bash
huggingface-cli download nvidia/Alpamayo-R1-10B
```

The wizard will use the HF_HOME environment variable to find the system Hugging Face cache
(~/.cache/huggingface by default). If the model weights do not exist locally, the driver service
will automatically download them, but the download may time out, requiring you to re-run.
Alternatively, you can specify the path to the model directory by setting the
`model.checkpoint_path` configuration field.
Then run the wizard with the following command:
```bash
alpasim_wizard +deploy=local wizard.log_dir=$PWD/tutorial_alpamayo driver=[ar1,ar1_runtime_configs]
```
⚠️ The Alpamayo R1 model is large (10B parameters); please ensure that your GPU has the capacity to run it.
To visualize the predicted chain-of-causation reasoning, you can change the generated video layout:

```bash
alpasim_wizard +deploy=local wizard.log_dir=$PWD/tutorial_alpamayo driver=[ar1,ar1_runtime_configs] eval.video.video_layouts=[REASONING_OVERLAY]
```

As an example of how to integrate a different driver model, we provide a provisional integration for the Transfuser policy, specifically the Latent TransFuser v6 (LTFv6) model developed for NAVSIM.
To run with the Transfuser model use driver=[transfuser,transfuser_runtime_configs].
First, one must download the Transfuser model weights/config from HuggingFace:
```bash
huggingface-cli download longpollehn/tfv6_navsim model_0060.pth --local-dir=data/drivers/transfuser/
huggingface-cli download longpollehn/tfv6_navsim config.json --local-dir=data/drivers/transfuser/
```

Then, run the wizard with the following command:
```bash
alpasim_wizard +deploy=local wizard.log_dir=$PWD/tutorial_transfuser driver=[transfuser,transfuser_runtime_configs]
```

If you would like to force the ego vehicle to follow its recorded trajectory, instead of following
the predictions of a policy, you can set
`runtime.endpoints.{physics,trafficsim,controller}.skip: true`,
`runtime.default_scenario_parameters.physics_update_mode: NONE`, and
`runtime.default_scenario_parameters.force_gt_duration_us` to a very high value (20s+).
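Putting those overrides together, a replay-style invocation might look like the following (the log directory name is illustrative, and 30000000 us is just one choice of a 20s+ value):

```shell
alpasim_wizard +deploy=local wizard.log_dir=$PWD/tutorial_replay \
  runtime.endpoints.physics.skip=true \
  runtime.endpoints.trafficsim.skip=true \
  runtime.endpoints.controller.skip=true \
  runtime.default_scenario_parameters.physics_update_mode=NONE \
  runtime.default_scenario_parameters.force_gt_duration_us=30000000
```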
The scene in AlpaSim is a NuRec reconstruction of a real-world driving log.
Publicly available NuRec scenes are stored on
Hugging Face
and, once downloaded, are placed under data/nre-artifacts/all-usdzs. The scenes are identified by
their uuid, rather than their filenames, to prevent versioning issues. The list of currently
available scenes is in the scenes set, and the set of available suites is in scene suites.
For custom scene selection, you can specify scenes manually using scenes.scene_ids:
```bash
alpasim_wizard +deploy=local wizard.log_dir=$PWD/tutorial_2 scenes.scene_ids=['clipgt-02eadd92-02f1-46d8-86fe-a9e338fed0b6']
```

If necessary, the scene will automatically be downloaded from Hugging Face to your local
data/nre-artifacts/all-usdzs directory. If a download is necessary, ensure you have set your
Hugging Face token in the HF_TOKEN environment variable as described in the onboarding
instructions.
📗 Scene ids are defined/viewable in
data/scenes/sim_scenes.csv

⚠️ A scene id does not uniquely identify the `usdz` file, as the scene id comes from the `metadata.yaml` file inside the `usdz` zip file. The proper artifact file will be chosen to satisfy the NRE version requirements.
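To browse the available scene ids programmatically, one can read that CSV directly; this sketch assumes the scene id is in the first column and that the file has a header row, which may not match the actual schema of `sim_scenes.csv`:

```python
import csv

def load_scene_ids(csv_path):
    """Return the first column of each data row (header skipped)."""
    with open(csv_path, newline="") as f:
        reader = csv.reader(f)
        next(reader, None)  # skip the assumed header row
        return [row[0] for row in reader if row]

# Usage, with the path from the note above:
# ids = load_scene_ids("data/scenes/sim_scenes.csv")
```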
Scene suites provide pre-validated collections of scenes for testing. To use the public scene set of validated scenes (⚠️ this will download all the scenes):

```bash
alpasim_wizard +deploy=local scenes.test_suite_id=public_2507_ex_failures wizard.log_dir=$PWD/tutorial_suite
```

This will run simulations across all scenes in the public_2507_ex_failures suite, which
excludes problematic scenes from the full 25.07 release dataset.
Code changes in the repo are automatically mounted into the docker containers at runtime, with the
exception that the virtual environment of the container is not synced, so changes that rely on new
dependencies will require rebuilding the container image. To try this out, one can add some logging
statements to the driver code in src/driver/src/alpasim_driver/ and rerun the wizard.
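For instance, a logging helper one might temporarily drop into the driver code (the function and logger names here are made up for illustration):

```python
import logging

logger = logging.getLogger("alpasim_driver.tutorial")

def log_observation(frames):
    """Log (and return) a one-line summary of what the driver received."""
    msg = f"driver received {len(frames)} camera frames"
    logger.info(msg)
    return msg
```

After rerunning the wizard, the new log lines should appear in the console output and under the `driver/` log directory.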
The simulation is split into multiple microservices, each running in its own docker container. The
primary requirement for a custom container image is that it exposes a gRPC endpoint compatible with
the expected service interface. The default images used for each service are specified in
stable_manifest; however, these can be overridden
by setting `services.<service>.image` to the desired image name and updating the relevant service
command via `services.<service>.command`. For more information about the service interfaces, please see
the protocol buffer definitions.
The `.asl` files contain most of the messages exchanged in the course of a batch simulation as size-delimited
protobuf messages. These files can be read to access detailed information about the course of the
simulation. Aside from being used for evaluation, they can also be useful for debugging model or
simulation behavior. This notebook shows an
example of reading an asl log and "replaying the stimuli" on a driver instance, allowing for
reproducing behavior with your favorite debugger attached.
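As a sketch of what reading such a file involves, the following assumes each message is prefixed with a varint length, the convention used by protobuf's delimited format; verify this against the actual .asl layout before relying on it:

```python
import io

def read_varint(stream):
    """Decode a base-128 varint from the stream; return None at EOF."""
    shift, result = 0, 0
    while True:
        b = stream.read(1)
        if not b:
            return None
        byte = b[0]
        result |= (byte & 0x7F) << shift
        if not (byte & 0x80):
            return result
        shift += 7

def iter_delimited(stream):
    """Yield raw message payloads from a size-delimited stream."""
    while (size := read_varint(stream)) is not None:
        payload = stream.read(size)
        if len(payload) < size:
            raise EOFError("truncated message")
        yield payload

# Usage: parse each payload with the matching generated protobuf class, e.g.
# msg = SimMessage.FromString(payload)   # class name illustrative
```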
In level 3 we show how to bypass the components managed by alpasim_wizard: this enables use cases
such as breakpoint debugging in individual components or even replacing components entirely. The
basic idea behind the approach is to:
- Use the `alpasim_wizard` to generate config files without actually running the simulation.
- Manually start the desired components with the generated config files.
- Use the `alpasim_wizard` generated config files to run the rest of the simulation as normal.
The following steps show how to debug the controller component with breakpoints in the context of a full simulation.
1. (Terminal 1) Run the wizard to generate config files without running the simulation:

   ```bash
   alpasim_wizard +deploy=local wizard.log_dir=$PWD/tutorial_dbg wizard.run_method=NONE wizard.debug_flags.use_localhost=True
   ```

2. (Terminal 1) `cd` to the generated directory (tutorial_dbg) and note the command/port of the component to be replaced in docker-compose.yaml. For the simulation case, we are looking for components in the `sim` profile, which includes `controller-0`, `driver-0`, `physics-0`, `runtime-0`, and `sensorsim-0`. Here we will replace `controller-0`, which in this case has been allocated port 6003.

3. (Terminal 2) `cd` into the controller src directory (`<repo_root>/src/controller/`) and prepare to start the controller. Note that there are various ways to accomplish this, including through an IDE. Add breakpoints as desired in the controller code and then start the controller with:

   ```bash
   cd <repo_root>/src/controller/
   mkdir my_controller_log_dir
   # Note: port (6003 in this case) must match the port allocated in docker-compose.yaml
   uv run python -m alpasim_controller.server --port=6003 --log_dir=my_controller_log_dir --log-level=INFO
   ```

4. (Terminal 1) Start the rest of the simulation with docker compose:

   ```bash
   docker compose -f docker-compose.yaml --profile sim up runtime-0 driver-0 physics-0 sensorsim-0
   ```
For VSCode users, instead of running the controller from the command line (step 3), you can use the built-in debugger:
- Create or update `.vscode/launch.json` with:
  ```json
  {
    "version": "0.2.0",
    "configurations": [
      {
        "name": "Debug Controller (Level 3 Tutorial)",
        "type": "debugpy",
        "request": "launch",
        "module": "alpasim_controller.server",
        "justMyCode": false,
        "cwd": "${workspaceFolder}/src/controller",
        "args": ["--port=6003", "--log_dir=my_controller_log_dir", "--log-level=INFO"],
        "console": "integratedTerminal"
      }
    ]
  }
  ```
- Set breakpoints in the controller code
- Press F5 (or go to Run and Debug → "Debug Controller")
- Your breakpoints will hit as the simulation runs!
Note: Make sure the --port argument matches the port allocated in docker-compose.yaml.
If the runtime is the service being debugged, there are a few things that change. For one, it is
expected that the other services are up and running before the runtime is brought up, so the
ordering of steps will change. Additionally, one can speed up iteration by preventing the simulation
from shutting down the docker containers after each simulation by setting
runtime.endpoints.do_shutdown=False in the wizard command line.
1. (Terminal 1) Run the wizard to generate config files without running the simulation:

   ```bash
   alpasim_wizard +deploy=local \
     wizard.log_dir=$PWD/tutorial_dbg_runtime \
     wizard.run_method=NONE \
     wizard.debug_flags.use_localhost=True \
     runtime.endpoints.do_shutdown=False
   ```

2. (Terminal 1) `cd` to the generated directory (tutorial_dbg_runtime) and start the non-runtime services:

   ```bash
   docker compose -f docker-compose.yaml --profile sim up driver-0 controller-0 physics-0 sensorsim-0
   ```

3. (Terminal 2) `cd` into the runtime src directory (`<repo_root>/src/runtime/`) and prepare to start the runtime. The exact command paths will vary, but, to use the configuration generated from the earlier steps, an example command would be:

   ```bash
   cd <repo_root>/src/runtime/
   # The following command is based on the docker-compose.yaml generated by the wizard
   uv run python -m alpasim_runtime.simulate \
     --usdz-glob=../../data/nre-artifacts/all-usdzs/**/*.usdz \
     --user-config=../../tutorial_dbg_runtime/generated-user-config-0.yaml \
     --network-config=../../tutorial_dbg_runtime/generated-network-config.yaml \
     --log-dir=../../tutorial_dbg_runtime \
     --log-level=INFO
   ```
For VSCode users, instead of running the runtime from the command line (step 3), you can use the built-in debugger:
- Add this configuration to `.vscode/launch.json`:

  ```json
  {
    "name": "Debug Runtime (Level 3 Tutorial)",
    "type": "debugpy",
    "request": "launch",
    "module": "alpasim_runtime.simulate",
    "justMyCode": false,
    "cwd": "${workspaceFolder}/src/runtime",
    "args": [
      "--usdz-glob=../../data/nre-artifacts/all-usdzs/**/*.usdz",
      "--user-config=../../tutorial_dbg_runtime/generated-user-config-0.yaml",
      "--network-config=../../tutorial_dbg_runtime/generated-network-config.yaml",
      "--log-dir=../../tutorial_dbg_runtime",
      "--log-level=INFO"
    ],
    "console": "integratedTerminal"
  }
  ```
- Set breakpoints in the runtime code
- Press F5 (or go to Run and Debug → "Debug Runtime")
- Your breakpoints will hit as the simulation runs!