Hello,
I would be interested in training an audio-only model (or perhaps a bimodal audio-text one) on CMU-MOSEI data.
I would be recomputing the audio embeddings.
So I would only need the links to the videos, plus the timestamps and the annotated emotions for each timestamp range.
How would I go about extracting this information?
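To make the question concrete, below is the kind of extraction I have in mind, using the CMU Multimodal SDK (mmsdk). The sequence name `CMU_MOSEI_Labels`, the label layout, and the assumption that the ids are YouTube video ids are guesses on my part from the docs, so please correct me if this is off:

```python
# Sketch only: pull timestamps and emotion annotations from the
# CMU-MOSEI labels computational sequence via the CMU Multimodal SDK.
from mmsdk import mmdatasdk

# Download just the labels sequence; I don't need the precomputed features,
# since I will recompute the audio embeddings myself.
recipe = dict(mmdatasdk.cmu_mosei.labels)  # assumed: {'CMU_MOSEI_Labels': <csd url>}
dataset = mmdatasdk.mmdataset(recipe, "cmumosei_labels/")

labels = dataset.computational_sequences["CMU_MOSEI_Labels"].data
for key, seq in labels.items():
    video_id = key.split("[")[0]  # assumed key format: "<video_id>" or "<video_id>[<segment>]"
    url = f"https://www.youtube.com/watch?v={video_id}"  # assuming the ids are YouTube ids
    for (start, end), feats in zip(seq["intervals"], seq["features"]):
        # assumed label layout: [sentiment, happy, sad, anger, surprise, disgust, fear]
        sentiment, emotions = feats[0], feats[1:]
        print(url, start, end, emotions)
```

Is this roughly the right approach, or is there a simpler way to get the video links and the per-segment emotion annotations?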
Thanks,
Ed