First of all, thank you for sharing such great work and code.
I have two questions:
- When running inference with video_gen (image_pair), why is the reconstructed rendering result passed in instead of the original image?
- The `noise_timestep` appears to be fixed in the inference code, and I can't find any DDPNet-related code. Is it included somewhere?