Are all I-frame tokens intended to be preserved in the current implementation? #113

@Wenbo-Nie

Hi, thanks for the great work on this HEVC-based token selection pipeline. I have a question about how I-frames are handled in ap_dataloader_dali_codec.py.

My understanding from the paper is that all tokens from I-frames are preserved, while Top-K selection is only applied to P-frame patches based on codec-derived saliency. In particular, Equation (2) seems to describe the HEVC input as keeping the full patchified I-frame and applying the visibility mask only to decoded P-frames.
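To make my reading concrete, here is a minimal sketch of the behavior I expected from the paper's description. Everything in it is my own assumption for illustration: the function name `expected_visibility_mask`, the `(T, N)` score layout, and the choice of a single Top-K budget pooled across all P-frames are hypothetical and are not taken from the repository or the paper.

```python
import numpy as np

def expected_visibility_mask(scores, is_i_frame, k):
    """Hypothetical helper, not from the repository.

    scores:     (T, N) per-patch saliency for T frames with N patches each
    is_i_frame: (T,) boolean array marking I-frame positions
    k:          number of P-frame patches to keep (assumed pooled over all P-frames)
    """
    T, N = scores.shape
    mask = np.zeros((T, N), dtype=bool)
    mask[is_i_frame] = True                        # keep every I-frame token
    p_rows = np.flatnonzero(~is_i_frame)           # frame indices of P-frames
    p_scores = scores[p_rows].ravel()
    k = min(k, p_scores.size)
    top = np.argsort(-p_scores, kind="stable")[:k] # Top-K among P-frame patches only
    mask[p_rows[top // N], top % N] = True
    return mask

# Toy input: frame 0 is an I-frame, frames 1 and 2 are P-frames.
scores = np.array([[0, 0, 0, 0],
                   [5, 1, 9, 2],
                   [3, 8, 0, 7]], dtype=np.float64)
is_i_frame = np.array([True, False, False])
mask = expected_visibility_mask(scores, is_i_frame, k=3)
```

Under this reading, the I-frame row of the mask is fully visible regardless of its scores, and only the P-frame rows compete for the Top-K budget.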

However, in get_frame_id_list, I noticed that residuals at I-frame positions are explicitly zeroed out:

    if pos in I_pos_set:
        residuals_y[pos] = np.zeros((H0, W0), dtype=dtype0 or np.uint8)

Since patch scores in compute_visible_indices_cpu are computed from residual energy, this seems to imply that all I-frame patches receive a score of 0 and therefore would not be selected by Top-K, except possibly through tie-breaking or the static_fallback path.
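The concern can be reproduced in isolation with a toy example. This is only a sketch under my own assumptions: `patch_energy` and `topk_indices` are hypothetical stand-ins, the score is assumed to be the sum of absolute residual values per patch, and none of this is the actual code of `compute_visible_indices_cpu`.

```python
import numpy as np

def patch_energy(residual, patch):
    """Hypothetical score: sum of absolute residual values in each non-overlapping patch."""
    h, w = residual.shape
    blocks = residual.reshape(h // patch, patch, w // patch, patch)
    return np.abs(blocks.astype(np.int64)).sum(axis=(1, 3)).ravel()

def topk_indices(scores, k):
    """Hypothetical Top-K: indices of the k highest scores, ties broken by lowest index."""
    order = np.argsort(-scores, kind="stable")
    return np.sort(order[:k])

rng = np.random.default_rng(0)
H0, W0, patch = 8, 8, 4                      # toy sizes: 4 patches per frame
residuals_y = [rng.integers(1, 50, size=(H0, W0), dtype=np.uint8) for _ in range(3)]
I_pos_set = {0}                              # frame 0 is the I-frame

# Same zeroing step as in the snippet quoted above.
for pos in range(len(residuals_y)):
    if pos in I_pos_set:
        residuals_y[pos] = np.zeros((H0, W0), dtype=np.uint8)

scores = np.concatenate([patch_energy(r, patch) for r in residuals_y])
selected = topk_indices(scores, k=6)

patches_per_frame = (H0 // patch) * (W0 // patch)
i_frame_selected = [int(i) for i in selected if i // patches_per_frame in I_pos_set]
```

In this toy setup every P-frame patch has strictly positive energy, so as long as `k` does not exceed the number of P-frame patches, `i_frame_selected` stays empty and no I-frame patch survives the selection.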

So I wanted to check whether I am misunderstanding the implementation, or whether the current code is using a different behavior from what I inferred from the paper. If I-frame tokens are indeed intended to be fully preserved, could you clarify where that happens in the pipeline?

Thanks!
