Question about RF ablation #17

@dazazh

Thanks for the great work. I have a small question about the ablation results in Table 2.

As I understand it, the "w/o RF inference" setting removes the rolling window only at inference time, while the model is still trained with the full RF-style procedure. Wouldn't this introduce a train-test mismatch? If so, the large degradation in this setting may not be caused solely by removing the rolling window itself.

In contrast, "w/o RF training" seems to be a cleaner comparison, since both training and inference use the frame-by-frame paradigm. Interestingly, its quality drift is much smaller than that of "w/o RF inference", while "w/o attention sink" shows a much larger drift.

So I wonder whether it is fair to interpret the results as follows:

  • the attention sink / initial KV cache plays a more direct role in suppressing long-term drift;
  • the rolling window mainly improves local joint denoising, temporal consistency, and subject/background quality.

Have you considered whether selecting more informative keyframes as anchors could further mitigate long-video drift? I wonder if this type of keyframe-based anchoring might be a complementary or even more effective alternative to the rolling-window mechanism for maintaining long-term consistency.
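
To make the keyframe idea concrete, here is a minimal sketch of what I have in mind. It is not taken from your codebase: `visible_frames`, `window`, `num_sink`, and `anchors` are hypothetical names, and the sketch only models which cached past frames a new frame could attend to. It does not model the joint denoising that happens inside the rolling window, and it says nothing about how "informative" keyframes would actually be chosen.

```python
def visible_frames(t, window, num_sink=0, anchors=()):
    """Return the sorted indices of past frames that frame `t` may attend to.

    Hypothetical illustration only (not the repository's implementation):
      - `window`   : number of most recent frames kept in the cache
      - `num_sink` : number of initial frames kept permanently (attention sink)
      - `anchors`  : extra keyframe indices kept permanently (assumed given)
    """
    if t < 0 or window < 0 or num_sink < 0:
        raise ValueError("t, window and num_sink must be non-negative")

    # Most recent `window` frames before t.
    recent = set(range(max(0, t - window), t))
    # First `num_sink` frames, never evicted.
    sink = set(range(min(num_sink, t)))
    # Keyframe anchors that already exist at time t.
    extra = {a for a in anchors if 0 <= a < t}

    return sorted(recent | sink | extra)


if __name__ == "__main__":
    print(visible_frames(10, window=3))                            # [7, 8, 9]
    print(visible_frames(10, window=3, num_sink=1))                # [0, 7, 8, 9]
    print(visible_frames(10, window=3, num_sink=1, anchors=(4,)))  # [0, 4, 7, 8, 9]
```

In this picture the attention sink is just the special case where the only permanent anchor is the start of the video, so keyframe anchoring would generalize it rather than replace the rolling window.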
