Overview
This issue tracks the addition of end-to-end TileGym kernel support for Liquid AI's LFM2 MoE model family, starting with: LiquidAI/LFM2-8B-A1B
LFM2-8B-A1B is a hybrid MoE model with 8.3B total parameters and 1.5B active parameters. It combines full-attention blocks, convolution-style blocks, GQA attention, RoPE, RMSNorm-style normalization, SwiGLU-style expert FFNs, and top-k MoE routing.
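For reference when checking the top-k MoE routing mentioned above, here is a minimal pure-Python sketch of per-token routing. It is not taken from the LFM2 or TileGym sources. The helper name `route_topk`, the `score_fn` switch, and the rule that `expert_bias` affects expert selection but not the returned weights are all assumptions made for illustration; `norm_topk_prob` and `routed_scaling_factor` are used the way those names usually suggest, and that needs confirming against the model implementation.

```python
import math


def route_topk(logits, top_k, norm_topk_prob=True, routed_scaling_factor=1.0,
               score_fn="softmax", expert_bias=None):
    """Hypothetical reference: return (expert_indices, routing_weights) for one token."""
    if score_fn == "softmax":
        peak = max(logits)
        exps = [math.exp(x - peak) for x in logits]  # shift by the max for stability
        total = sum(exps)
        scores = [e / total for e in exps]
    elif score_fn == "sigmoid":
        scores = [1.0 / (1.0 + math.exp(-x)) for x in logits]
    else:
        raise ValueError(f"unknown score_fn: {score_fn}")

    # Assumption: the bias only shifts which experts are selected;
    # the returned weights come from the unbiased scores.
    if expert_bias is None:
        select = scores
    else:
        select = [s + b for s, b in zip(scores, expert_bias)]

    order = sorted(range(len(scores)), key=lambda i: select[i], reverse=True)
    indices = order[:top_k]
    weights = [scores[i] for i in indices]

    if norm_topk_prob:
        denom = sum(weights)
        weights = [w / denom for w in weights]  # selected weights now sum to 1

    # Assumption: the scaling factor is applied after the optional renormalization.
    weights = [w * routed_scaling_factor for w in weights]
    return indices, weights


indices, weights = route_topk([0.0, 2.0, 1.0, -1.0], top_k=2)
print(indices, weights)
```

A reference like this can be compared against the fused kernel's routing output on a handful of tokens before the expert matmuls are involved.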
Planned Steps
- Add `apply_tilegym_kernel_to_lfm2_moe` in `monkey_patch.py`
- Register `lfm2_moe` in `MODEL_TYPE_TO_APPLY_TILEGYM_FN`
- Patch compatible common kernels where applicable:
  - RoPE
  - RMSNorm
  - attention path
  - SwiGLU / expert MLP path
- Add an LFM2 MoE wrapper that reuses TileGym `fused_moe`
- Verify expert weight layout and routing semantics:
  - gate/up projection order
  - down projection layout
  - top-k routing weights
  - `norm_topk_prob`
  - `routed_scaling_factor`
  - expert bias handling
- Add an E2E inference / benchmark script for `LiquidAI/LFM2-8B-A1B`
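The first two steps above could take roughly the following shape. This is a sketch under stated assumptions, not TileGym's actual code: the dictionary here is a local stand-in for the real `MODEL_TYPE_TO_APPLY_TILEGYM_FN`, the keyword flags and the `apply_for_model_type` helper are hypothetical, and the function only records which patches it would apply instead of replacing any modules.

```python
from typing import Callable, Dict, List

# Local stand-in for the registry named in the steps; the real mapping lives in TileGym.
MODEL_TYPE_TO_APPLY_TILEGYM_FN: Dict[str, Callable[..., List[str]]] = {}


def apply_tilegym_kernel_to_lfm2_moe(rope: bool = True, rms_norm: bool = True,
                                     attn: bool = True, swiglu: bool = True,
                                     moe: bool = True) -> List[str]:
    """Hypothetical signature: one flag per kernel family listed in the steps.

    A real implementation would swap model classes or forward functions here;
    this sketch only returns the names of the patches it would apply.
    """
    patched: List[str] = []
    if rope:
        patched.append("rope")
    if rms_norm:
        patched.append("rms_norm")
    if attn:
        patched.append("attn")
    if swiglu:
        patched.append("swiglu")
    if moe:
        patched.append("fused_moe")
    return patched


# Register under the model type string used in the steps.
MODEL_TYPE_TO_APPLY_TILEGYM_FN["lfm2_moe"] = apply_tilegym_kernel_to_lfm2_moe


def apply_for_model_type(model_type: str, **kwargs) -> List[str]:
    """Hypothetical dispatcher: look up the apply function by model type."""
    try:
        apply_fn = MODEL_TYPE_TO_APPLY_TILEGYM_FN[model_type]
    except KeyError:
        raise ValueError(f"no TileGym patch registered for {model_type!r}") from None
    return apply_fn(**kwargs)


print(apply_for_model_type("lfm2_moe", attn=False))
```

Keeping one flag per kernel family would make it possible to disable a single patch (for example the attention path) while validating the others in the E2E script.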
Questions
- Is `LiquidAI/LFM2-8B-A1B` an acceptable validation target for a 5090-friendly E2E model integration?
- Are there existing TileGym conventions for hybrid architectures with non-attention convolution blocks that this integration should follow?