See comments in #395. - add torch fallbacks - adapt kda and gdn unit tests to adhere to attention structure (Fast-LLM/tests/test_attention.py) Postponed because this is not on the critical path.