Clarification on DB5 HF checkpoint/config vs paper/repo setup #12

Description

@cyx-alpha

Hi, thank you for releasing the code and checkpoints.

I am opening this issue on GitHub because it seems to be the main place for reproducibility discussion, although the specific files I refer to below are mainly from the released Hugging Face DB5 checkpoint/config.

While trying to reproduce the DB5 downstream results, I found several points that seem inconsistent between the paper, the repository commands/code, and the released HF files. I would really appreciate some clarification.

1. DB5 HF checkpoint/config appears to use patchify = 4

From the released DB5 HF checkpoint/config, it looks like the patch size is set to 4 samples.

If this corresponds to the 200 ms DB5 setting at 200 Hz, then:

  • 200 ms = 40 samples
  • patch size = 4 samples

So each patch would cover only 20 ms, which seems surprisingly small and a bit difficult to interpret physiologically.
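For reference, this is the arithmetic I am using, written as a small sketch. The 200 Hz sampling rate and the patch size of 4 are my reading of the released files, and the helper names are my own, not taken from the repository.

```python
FS_HZ = 200  # sampling rate assumed for DB5 in this issue (my assumption)
PATCH_SAMPLES = 4  # patch size as it appears in the released HF config


def ms_to_samples(duration_ms: float, fs_hz: int = FS_HZ) -> int:
    """Convert a duration in milliseconds to a whole number of samples."""
    return round(duration_ms * fs_hz / 1000)


def samples_to_ms(n_samples: int, fs_hz: int = FS_HZ) -> float:
    """Convert a number of samples to a duration in milliseconds."""
    return n_samples * 1000 / fs_hz


window_samples = ms_to_samples(200)                   # 200 ms window in samples
patch_ms = samples_to_ms(PATCH_SAMPLES)               # duration covered by one patch
patches_per_window = window_samples // PATCH_SAMPLES  # patches per 200 ms window

print(window_samples, patch_ms, patches_per_window)
```

Under these assumptions a 200 ms window is 40 samples and splits into 10 patches of 20 ms each.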

Could you please confirm whether the released DB5 HF model indeed uses patch size = 4 samples?
If yes, could you also explain the motivation for this choice?

2. Repository preprocessing command for DB5 seems inconsistent with the paper

In the repository, the DB5 preprocessing command appears to use:

  • window_size = 200, stride = 50

and another setting like:

  • window_size = 1000, stride = 250

If these values are in samples at 200 Hz, they correspond to:

  • 200 samples = 1 s, stride 50 = 0.25 s, i.e. 75% overlap
  • 1000 samples = 5 s, stride 250 = 1.25 s, i.e. 75% overlap

However, in the paper, the DB5 downstream setting is described as:

  • 200 ms and 1000 ms windows
  • 25% overlap

So the repository command and the paper description do not seem to match.
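To make the mismatch concrete, here is a minimal sketch of how I computed the overlap. It assumes overlap is defined as 1 - stride / window and that the paper's 200 ms and 1000 ms windows are sampled at 200 Hz; both function names are hypothetical and not from the repository.

```python
def overlap_ratio(window: int, stride: int) -> float:
    """Fraction of each window shared with the next one (1 - stride / window)."""
    if window <= 0 or not 0 < stride <= window:
        raise ValueError("need window > 0 and 0 < stride <= window")
    return 1 - stride / window


def stride_for_overlap(window: int, overlap: float) -> int:
    """Stride (in samples) that yields the requested overlap fraction."""
    if not 0 <= overlap < 1:
        raise ValueError("overlap must be in [0, 1)")
    return round(window * (1 - overlap))


# Values from the repository command, read as samples.
repo_short = overlap_ratio(200, 50)
repo_long = overlap_ratio(1000, 250)

# Paper description: 200 ms and 1000 ms windows at an assumed 200 Hz, 25% overlap.
paper_short_stride = stride_for_overlap(40, 0.25)
paper_long_stride = stride_for_overlap(200, 0.25)

print(repo_short, repo_long, paper_short_stride, paper_long_stride)
```

With this definition, both repository settings give 75% overlap, whereas 25% overlap would need strides of 30 and 150 samples for the 40-sample and 200-sample windows.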

Could you clarify:

  • whether the repository command is only an example and not the exact DB5 setup used in the paper,
  • and what the actual DB5 window length / stride / overlap were for the reported results?

3. Downstream fine-tuning code seems to use data augmentation, but the paper does not mention it

From the downstream training code, it seems that augmentation-related options are present and used during fine-tuning.

However, I could not find a clear statement in the paper that the reported DB5 downstream results used data augmentation.

Could you clarify:

  • whether data augmentation was used for the reported DB5 results,
  • which augmentation(s) were enabled,
  • and whether they were enabled by default in the released training pipeline?

Why this matters

These details seem very important for DB5 reproducibility, because DB5 appears to be highly sensitive to:

  • window duration,
  • overlap ratio,
  • transition-region label noise,
  • and training tricks such as augmentation.

So it would be very helpful if the exact protocol corresponding to the reported DB5 results could be clarified.
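As a rough illustration of how much the segmentation choice alone changes the data, the sketch below counts the sliding windows extracted from a hypothetical 10 s recording at an assumed 200 Hz under the two readings discussed above. The recording length and the function name are mine, chosen only for illustration.

```python
def window_starts(n_samples: int, window: int, stride: int) -> list[int]:
    """Start indices of all full windows that fit inside the recording."""
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    return list(range(0, n_samples - window + 1, stride))


n_samples = 10 * 200  # hypothetical 10 s recording at an assumed 200 Hz

# Repository command read as samples: window 200, stride 50 (75% overlap).
repo_windows = window_starts(n_samples, window=200, stride=50)

# Paper description: 200 ms window (40 samples) with 25% overlap (stride 30).
paper_windows = window_starts(n_samples, window=40, stride=30)

print(len(repo_windows), len(paper_windows))
```

The two settings produce different numbers of training examples, each covering a different duration, so results obtained under one setting are not directly comparable with the other.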

Thanks again for open-sourcing the project.
