Pixel-level diffusion seems to be catching up, and I think diffusers could play a nice role in accelerating its adoption and research.
Code and checkpoints are available here: https://github.com/LTH14/JiT.git
Even though the released checkpoints were trained on ImageNet, I believe we can support this model the same way we supported the original DiT when it came out.
Cc: @LTH14 (first author), @kashif