Add Nuplan central token extraction and USDZ dataset support for Alpasim integration #62

Open
WCJ-BERT wants to merge 4 commits into NVlabs:alpasim from WCJ-BERT:alpasim

@WCJ-BERT

Overview

This PR enables trajdata as a unified data source for Alpasim by adding:

  1. USDZ dataset support - a dataset implementation for Alpasim's USDZ data format
  2. Nuplan central token extraction - efficient scenario fragment extraction
  3. Multiprocessing parameter passing - proper propagation of dataset configuration to worker processes in parallel mode

All changes are fully backward compatible with existing trajdata usage and are ready to support the Unified Scene Data Flow changes in Alpasim.

WCJ-BERT and others added 4 commits March 19, 2026 10:00
Add USDZ dataset implementation with the following features:
- Wrap alpasim_utils.Artifact for stable USDZ parsing
- Convert Artifact data to trajdata's standard DataFrame format
- Extract velocity and acceleration from trajectory positions
- Use project's arr_utils.quaternion_to_yaw() for consistency
- Calculate actual dt from timestamps (fallback to 0.1s)
- Optimize derivative computation with single groupby pass
- Support maps and agent metadata extraction
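
The time-step and derivative bullets above can be sketched roughly as follows. This is an illustration under assumptions, not the code in this PR: the column names (`agent_id`, `scene_ts`, `x`, `y`, `vx`, `vy`, `ax`, `ay`) and the helper names `estimate_dt` and `add_derivatives` are hypothetical, and rows without enough history are simply set to zero.

```python
import numpy as np
import pandas as pd

DEFAULT_DT = 0.1  # fallback time step in seconds, as stated in the commit message


def estimate_dt(timestamps_s: np.ndarray, default: float = DEFAULT_DT) -> float:
    """Median spacing of the distinct timestamps, or `default` if it cannot be computed."""
    diffs = np.diff(np.unique(timestamps_s))  # np.unique also sorts
    if diffs.size == 0:
        return default
    dt = float(np.median(diffs))
    return dt if np.isfinite(dt) and dt > 0 else default


def add_derivatives(df: pd.DataFrame, dt: float) -> pd.DataFrame:
    """Add per-agent velocity and acceleration columns via backward differences."""
    out = df.sort_values(["agent_id", "scene_ts"]).copy()
    pos = out[["x", "y"]]
    # One groupby object is built once and reused for both shifts.
    grouped = out.groupby("agent_id", sort=False)[["x", "y"]]
    prev1 = grouped.shift(1)  # position one step earlier, per agent
    prev2 = grouped.shift(2)  # position two steps earlier, per agent
    # Rows lacking enough history produce NaN, which is replaced by 0 here.
    vel = ((pos - prev1) / dt).fillna(0.0)
    acc = ((pos - 2.0 * prev1 + prev2) / dt**2).fillna(0.0)
    out["vx"], out["vy"] = vel["x"], vel["y"]
    out["ax"], out["ay"] = acc["x"], acc["y"]
    return out


frame = pd.DataFrame(
    {
        "agent_id": ["ego", "ego", "ego"],
        "scene_ts": [0, 1, 2],
        "x": [0.0, 1.0, 3.0],
        "y": [0.0, 0.0, 0.0],
    }
)
dt = estimate_dt(np.array([0.0, 0.5, 1.0]))
print(add_derivatives(frame, dt)[["scene_ts", "vx", "ax"]])
```

Using second differences of position keeps the whole computation on a single groupby object; a central-difference scheme such as `np.gradient` would be a reasonable alternative if smoother estimates are needed.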

Technical improvements:
- Proper quaternion order conversion ([x,y,z,w] -> [w,x,y,z])
- Dynamic time step calculation from trajectory timestamps
- Efficient pandas operations for velocity/acceleration
- Clean integration with env_utils
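
For illustration, the quaternion reordering mentioned above might look like the sketch below. The function names are hypothetical, and `yaw_from_wxyz` is a stand-in written for this example using the standard yaw formula for unit quaternions; it is not the project's `arr_utils.quaternion_to_yaw()`, whose exact signature is not shown in this PR description.

```python
import numpy as np


def xyzw_to_wxyz(q) -> np.ndarray:
    """Reorder quaternion components from [x, y, z, w] to [w, x, y, z]."""
    q = np.asarray(q, dtype=float)
    # Rolling the last axis by one moves w from the end to the front.
    return np.roll(q, 1, axis=-1)


def yaw_from_wxyz(q) -> np.ndarray:
    """Yaw (rotation about z) of unit quaternions given in [w, x, y, z] order."""
    w, x, y, z = np.moveaxis(np.asarray(q, dtype=float), -1, 0)
    return np.arctan2(2.0 * (w * z + x * y), 1.0 - 2.0 * (y * y + z * z))


# A 90-degree rotation about the z axis, written in [x, y, z, w] order.
half = np.pi / 4
q_xyzw = [0.0, 0.0, np.sin(half), np.cos(half)]
print(yaw_from_wxyz(xyzw_to_wxyz(q_xyzw)))
```

Both helpers operate on the last axis, so they accept a single quaternion or an `(N, 4)` array of them.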

Total: 461 lines of new USDZ dataset code

- Add dataset_kwargs parameter to ParallelDatasetPreprocessor
- Pass dataset_kwargs through multiprocessing to child processes
- Save dataset_kwargs in UnifiedDataset for parallel preprocessing
- This enables dataset-specific configurations to work in parallel mode

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
…path

Major changes:
- Change dataset_kwargs format from flat to nested dict structure
  Old: dataset_kwargs={'param': value} (shared by all datasets)
  New: dataset_kwargs={'dataset_name': {'param': value}} (per-dataset)

- Remove yaml_config_path parameter from NuplanDataset and NuPlanObject
  Use central_tokens_config directly instead

- Update env_utils.get_raw_datasets() to support nested dict format
  Each dataset now receives only its own specific parameters

- Optimize parallel preprocessing to avoid redundant parameter extraction
  ParallelDatasetPreprocessor now correctly handles nested dict format

- Clean up df_cache.py: remove unused resolution parameter
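
A minimal sketch of how the nested format could be consumed, assuming a lookup keyed by dataset name. The helper `kwargs_for_dataset`, the dataset names, and the contents of `central_tokens_config` are all hypothetical and only illustrate the shape described above; they are not taken from `env_utils.get_raw_datasets()`.

```python
from typing import Any, Dict, Mapping, Optional


def kwargs_for_dataset(
    dataset_kwargs: Optional[Mapping[str, Mapping[str, Any]]],
    dataset_name: str,
) -> Dict[str, Any]:
    """Return a copy of the parameters registered for one dataset, or {} if none."""
    if not dataset_kwargs:
        return {}
    # Copy so a dataset cannot mutate the shared configuration.
    return dict(dataset_kwargs.get(dataset_name, {}))


# Hypothetical per-dataset configuration in the new nested format.
dataset_kwargs = {
    "nuplan_mini": {"central_tokens_config": {"example_key": "example_value"}},
    "usdz": {"example_flag": True},
}

print(kwargs_for_dataset(dataset_kwargs, "usdz"))
print(kwargs_for_dataset(dataset_kwargs, "some_other_dataset"))
```

With this shape, each dataset constructor sees only its own parameters, and datasets with no entry fall back to their defaults.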

Benefits:
- Clearer separation of per-dataset parameters
- More flexible multi-dataset configuration
- Simpler parallel worker logic

Co-Authored-By: Claude Sonnet 4.5 (1M context) <noreply@anthropic.com>