CFG weight has minimal impact on performance — is this expected? #17
Description
Hi, thanks for the great work on FlowPlanner!
I've been conducting experiments on the sensitivity of cfg_weight during inference using your official pre-trained weights from HuggingFace (ttwhy/flow-planner), and found surprisingly minimal performance variation across a wide range of cfg_weight values:
Open-loop evaluation (15,000 val scenarios, 5 runs per w to reduce ODE noise):
| cfg_weight | Mean ADE (m) |
|---|---|
| 0.5 | 1.9789 |
| 1.0 | 2.0024 |
| 1.5 | 1.9901 |
| 2.0 | 1.9810 |
| 2.5 | 2.0190 |
| 3.0 | 1.9988 |
| 4.0 | 2.0053 |

→ Total ADE range across all w: about 0.04 m (1.9789 at w = 0.5 to 2.0190 at w = 2.5)

Closed-loop evaluation (Val14 NR, 1,115 scenarios):
- cfg_weight=1.8: Overall NR = 81.19%
- cfg_weight=2.5: Overall NR = 80.71%

→ Difference: only 0.48 points
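
For reference, the spreads quoted above can be recomputed from the reported numbers. This is only arithmetic on the values in this post; the variable names are my own, and nothing here calls into the FlowPlanner code.

```python
# Mean ADE (metres) per cfg_weight, copied from the open-loop table in this post.
ade_by_weight = {
    0.5: 1.9789,
    1.0: 2.0024,
    1.5: 1.9901,
    2.0: 1.9810,
    2.5: 2.0190,
    3.0: 1.9988,
    4.0: 2.0053,
}

# Best and worst guidance weights by mean ADE (lower is better).
best_w = min(ade_by_weight, key=ade_by_weight.get)
worst_w = max(ade_by_weight, key=ade_by_weight.get)

# Absolute spread, and spread relative to the average ADE over all weights.
ade_range = ade_by_weight[worst_w] - ade_by_weight[best_w]
mean_ade = sum(ade_by_weight.values()) / len(ade_by_weight)
relative_spread = ade_range / mean_ade

# Closed-loop Val14 NR scores (percent) for the two weights I tested.
nr_scores = {1.8: 81.19, 2.5: 80.71}
nr_gap = nr_scores[1.8] - nr_scores[2.5]

print(f"best w = {best_w}, worst w = {worst_w}")
print(f"ADE range = {ade_range:.4f} m ({100 * relative_spread:.2f}% of mean ADE)")
print(f"closed-loop NR gap = {nr_gap:.2f} points")
```

The relative spread puts the open-loop variation at roughly 2% of the mean ADE, which is the sense in which I call the effect minimal.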
I noticed the config uses cfg_prob: 0.3 for training dropout, which seems reasonable. My questions:
1. Did you observe similar insensitivity to cfg_weight during your experiments? I noticed you fixed it at 1.8. Was this chosen after a hyperparameter search, and if so, how much variation did you see?
2. Could you share some intuition on why cfg_weight has such little impact? My hypothesis is that the unconditional output v_uncond (with neighbor info dropped out) already produces reasonable trajectories since lane/route information is preserved, making v_cond - v_uncond relatively small in magnitude.
3. Is the CFG mechanism primarily effective in specific scenario types (e.g., dense interactions, unprotected turns) rather than globally? If so, could you point to any per-scenario analysis?
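
To make the hypothesis about v_cond - v_uncond being small concrete, here is a minimal sketch of the diagnostic I have in mind. It assumes the common CFG combination v = v_uncond + w * (v_cond - v_uncond); the repository may parameterize guidance differently. The function names, tensor shapes, and synthetic inputs are all hypothetical stand-ins, not FlowPlanner's API.

```python
import numpy as np


def guided_velocity(v_cond, v_uncond, cfg_weight):
    """Assumed CFG form: v_uncond + w * (v_cond - v_uncond)."""
    return v_uncond + cfg_weight * (v_cond - v_uncond)


def relative_guidance_magnitude(v_cond, v_uncond, eps=1e-8):
    """Per-sample ratio ||v_cond - v_uncond|| / ||v_cond||."""
    batch = len(v_cond)
    diff = (v_cond - v_uncond).reshape(batch, -1)
    ref = v_cond.reshape(batch, -1)
    return np.linalg.norm(diff, axis=1) / (np.linalg.norm(ref, axis=1) + eps)


# Synthetic stand-ins for the two model outputs at one ODE step.
# Shape (batch, horizon, xy) is a hypothetical choice for illustration.
rng = np.random.default_rng(0)
v_cond = rng.normal(size=(4, 80, 2))
v_uncond = v_cond + 0.01 * rng.normal(size=v_cond.shape)

# If this ratio is small, changing w can only move the velocity slightly.
ratio = relative_guidance_magnitude(v_cond, v_uncond)

# Largest element-wise change in the guided velocity between w = 0.5 and w = 4.0.
shift = np.abs(
    guided_velocity(v_cond, v_uncond, 4.0) - guided_velocity(v_cond, v_uncond, 0.5)
).max()

print("relative guidance magnitude per sample:", ratio)
print("max velocity shift between w=0.5 and w=4.0:", shift)
```

Under the assumed form, the velocity change between two weights is exactly (w2 - w1) * (v_cond - v_uncond), so logging this ratio on real model outputs, ideally split by scenario type, would show directly whether the guidance term is too small to matter.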
Any insights would be greatly appreciated. This would help me better understand CFG's role in the framework and explore potential improvements like scenario-adaptive guidance.
Thanks!