I have a question regarding the computation of cosine similarity between task vectors. Should the cosine similarity be computed by flattening all parameters into a single vector (which has a high memory cost), or should it be calculated separately for different parts of the model (e.g., MLP, CNN, RNN...) and then aggregated by a weighted average (where the weights are based on the dimensionality of each layer)?
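
To make the two options concrete, here is a minimal sketch in plain Python. The function names (`global_cosine`, `layerwise_weighted_cosine`), the dict-of-lists layout for a task vector, and the toy numbers are my own assumptions for illustration, not taken from any particular library. As far as I can tell, the flattened cosine can be accumulated layer by layer from dot products and squared norms, so it may not require concatenating everything in memory, and the two options generally give different values.

```python
import math

def global_cosine(tv_a, tv_b):
    """Cosine over all parameters, accumulated layer by layer.

    Equivalent to flattening everything into one vector, but only one
    layer is touched at a time. Assumes both task vectors share the same
    layer names and have non-zero overall norm.
    """
    dot = sq_a = sq_b = 0.0
    for name in tv_a:
        a, b = tv_a[name], tv_b[name]
        dot += sum(x * y for x, y in zip(a, b))
        sq_a += sum(x * x for x in a)
        sq_b += sum(y * y for y in b)
    return dot / (math.sqrt(sq_a) * math.sqrt(sq_b))

def layerwise_weighted_cosine(tv_a, tv_b):
    """Per-layer cosine, averaged with weights proportional to layer size.

    Assumes every layer has non-zero norm in both task vectors.
    """
    total = sum(len(v) for v in tv_a.values())
    result = 0.0
    for name in tv_a:
        a, b = tv_a[name], tv_b[name]
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        result += (len(a) / total) * (dot / norm)
    return result

# Toy task vectors with made-up numbers: the "conv" layer has a much
# larger magnitude than the "mlp" layer.
tv_a = {"mlp": [1.0, 0.0], "conv": [0.0, 10.0]}
tv_b = {"mlp": [0.0, 1.0], "conv": [0.0, 10.0]}

print(global_cosine(tv_a, tv_b))              # about 0.990 (100 / 101)
print(layerwise_weighted_cosine(tv_a, tv_b))  # 0.5
```

In this toy case the flattened cosine is dominated by the large-magnitude layer, while the dimension-weighted average treats both layers equally because they have the same size, so which one is intended matters for the result.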
Thank you for your time and assistance.