Thanks for this great work. In the paper, you report the accuracy with both the SA and AS blocks but without the co-attention fusion module. How did you get the result in that case? Did you put an FC layer directly at the end of the attention modules? How can we replicate that result?