CFG weight has minimal impact on performance — is this expected?

 Hi, thanks for the great work on FlowPlanner!

I've been conducting experiments on the sensitivity of cfg_weight during inference using your official pre-trained weights from HuggingFace (ttwhy/flow-planner), and found surprisingly minimal performance variation across a wide range of cfg_weight values:

Open-loop evaluation (15,000 val scenarios, 5 runs per w to reduce ODE noise):

cfg_weight | Mean ADE (m)
-- | --
0.5 | 1.9789
1.0 | 2.0024
1.5 | 1.9901
2.0 | 1.9810
2.5 | 2.0190
3.0 | 1.9988
4.0 | 2.0053

→ Total ADE range across all w: about 0.04 m (2.0190 - 1.9789 = 0.0401 m)

Closed-loop evaluation (Val14 NR, 1,115 scenarios):

- cfg_weight=1.8: Overall NR = 81.19%
- cfg_weight=2.5: Overall NR = 80.71%

→ Difference: only 0.48 points
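
For anyone who wants to double-check the spread, here is a small sketch that recomputes the summary numbers from the values reported above. It only uses the figures already listed in this issue; the variable names are my own and nothing here comes from the FlowPlanner codebase.

```python
import numpy as np

# Open-loop results copied from the table above (mean ADE in metres).
cfg_weights = np.array([0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 4.0])
mean_ade = np.array([1.9789, 2.0024, 1.9901, 1.9810, 2.0190, 1.9988, 2.0053])

# Absolute spread and spread relative to the average ADE.
ade_range = mean_ade.max() - mean_ade.min()
relative_range = ade_range / mean_ade.mean()

# Which weights give the lowest and highest ADE.
best_w = cfg_weights[mean_ade.argmin()]
worst_w = cfg_weights[mean_ade.argmax()]

# Closed-loop results copied from the two runs above (overall NR score, %).
nr_scores = {1.8: 81.19, 2.5: 80.71}
nr_gap = nr_scores[1.8] - nr_scores[2.5]

print(f"ADE range: {ade_range:.4f} m ({100 * relative_range:.2f}% of the mean)")
print(f"lowest ADE at w={best_w}, highest at w={worst_w}")
print(f"closed-loop NR gap: {nr_gap:.2f} points")
```

The lowest ADE is at w=0.5 and the highest at w=2.5, with no monotonic trend in between, which is part of why the variation looks more like noise than a guidance effect to me.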

I noticed the config uses cfg_prob: 0.3 for training dropout, which seems reasonable. My questions:

Did you observe similar insensitivity to cfg_weight during your experiments? I noticed you fixed it at 1.8 — was this chosen after a hyperparameter search, and if so, how much variation did you see?

Could you share some intuition on why cfg_weight has so little impact? My hypothesis is that the unconditional output v_uncond (with neighbor info dropped out) already produces reasonable trajectories since lane/route information is preserved, making v_cond - v_uncond relatively small in magnitude.
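
To make the hypothesis testable, here is a minimal sketch of the quantity I would measure. It assumes the common CFG combination v = v_uncond + w * (v_cond - v_uncond); I have not confirmed that FlowPlanner uses exactly this form. The function names, the (batch, timesteps, xy) tensor layout, and the random arrays are all placeholders of mine, not real model outputs.

```python
import numpy as np

def guided_velocity(v_cond, v_uncond, w):
    """Assumed CFG form: w=0 gives v_uncond, w=1 gives v_cond."""
    return v_uncond + w * (v_cond - v_uncond)

def relative_guidance_gap(v_cond, v_uncond, eps=1e-8):
    """Per-sample ||v_cond - v_uncond|| / ||v_cond|| over the trajectory axes."""
    gap = np.linalg.norm(v_cond - v_uncond, axis=(-2, -1))
    scale = np.linalg.norm(v_cond, axis=(-2, -1))
    return gap / (scale + eps)

# Synthetic stand-ins for model outputs, shape (batch, timesteps, xy).
rng = np.random.default_rng(0)
v_cond = rng.normal(size=(4, 80, 2))
v_uncond = v_cond + 0.01 * rng.normal(size=v_cond.shape)

ratios = relative_guidance_gap(v_cond, v_uncond)
print("relative guidance gap per sample:", ratios)

# Under the assumed form, changing w from w1 to w2 shifts the output by
# exactly (w2 - w1) * (v_cond - v_uncond), so a small gap implies low sensitivity.
shift = guided_velocity(v_cond, v_uncond, 2.5) - guided_velocity(v_cond, v_uncond, 1.8)
print("max |shift| between w=1.8 and w=2.5:", np.abs(shift).max())
```

If the ratio logged on real val scenarios is consistently tiny at every ODE step, that would support the hypothesis; if it is large but the metrics still do not move, something else is going on.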

Is the CFG mechanism primarily effective in specific scenario types (e.g., dense interactions, unprotected turns) rather than globally? If so, could you point to any per-scenario analysis?
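
In case it helps clarify what I mean by per-scenario analysis, this is roughly the breakdown I have in mind: compare per-scenario ADE at two guidance weights and average the difference within each scenario type. The helper name, the scenario type labels, and the toy numbers below are hypothetical placeholders, not measured results.

```python
from collections import defaultdict

import numpy as np

def ade_delta_by_type(scenario_types, ade_a, ade_b):
    """Mean of (ade_b - ade_a) and sample count for each scenario type."""
    buckets = defaultdict(list)
    for scenario_type, a, b in zip(scenario_types, ade_a, ade_b):
        buckets[scenario_type].append(b - a)
    return {t: (float(np.mean(d)), len(d)) for t, d in buckets.items()}

# Toy placeholder inputs; real inputs would be per-scenario ADE at two weights.
types = ["unprotected_turn", "lane_follow", "unprotected_turn", "lane_follow"]
ade_w_low = [2.0, 1.0, 3.0, 1.5]
ade_w_high = [1.5, 1.0, 2.5, 1.5]

summary = ade_delta_by_type(types, ade_w_low, ade_w_high)
for scenario_type, (mean_delta, count) in summary.items():
    print(f"{scenario_type}: mean ADE change {mean_delta:+.3f} m over {count} scenarios")
```

A near-zero global average could still hide sizeable changes of opposite sign in a few scenario types, which is what this kind of grouping would reveal.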

Any insights would be greatly appreciated. This would help me better understand CFG's role in the framework and explore potential improvements like scenario-adaptive guidance.

Thanks!
 

CFG weight has minimal impact on performance — is this expected? #17
