Thank you for your work. Just a quick question: in the dataset you provided, why does each episode contain only 5 to 6 frames of images and actions? These look like keyframes only, which might not be sufficient for training, right? Do we need to collect the missing frames and complete the episodes ourselves?