Question
Qwen3 adds self.cfg.attn_implementation = "eager" to the config.
It further adds the following code to the rotary setup component testing method:
if hasattr(hf_model, "config") and hasattr(hf_model.config, "_attn_implementation"):
hf_model.config._attn_implementation = "eager"
if hasattr(hf_model, "model") and hasattr(hf_model.model, "layers"):
for layer in hf_model.model.layers:
if hasattr(layer, "self_attn") and hasattr(layer.self_attn, "config"):
layer.self_attn.config._attn_implementation = "eager"
Is this something we should replicate elsewhere? Isn't eager the default attention implementation? So the code block exists just to ensure the bridge component and HF instance both execute eager attention?
Question
Qwen3 adds
self.cfg.attn_implementation = "eager"to the config.It further adds the following code to the rotary setup component testing method:
Is this something we should replicate elsewhere? Isn't eager the default attention implementation? So the code block exists just to ensure the bridge component and HF instance both execute eager attention?