Hi, I noticed that no padding mask is used when training the model. However, a batch of data can contain many padding tokens. How does Mamba handle these padding tokens?