Hi, thanks for sharing your work. I'd like to try the model. Is there a minimum GPU memory requirement for running inference? Thanks.