[E2E Model Support] Add TileGym kernel integration for LFM2 MoE #154

### Overview
This issue tracks the addition of end-to-end TileGym kernel support for Liquid AI's LFM2 MoE model family, starting with `LiquidAI/LFM2-8B-A1B`.
LFM2-8B-A1B is a hybrid MoE model with 8.3B total parameters and 1.5B active parameters. It combines full-attention blocks, convolution-style blocks, GQA attention, RoPE, RMSNorm-style normalization, SwiGLU-style expert FFNs, and top-k MoE routing.

### Planned Steps
- Add `apply_tilegym_kernel_to_lfm2_moe` in `monkey_patch.py`
- Register `lfm2_moe` in `MODEL_TYPE_TO_APPLY_TILEGYM_FN`
- Patch compatible common kernels where applicable:
  - RoPE
  - RMSNorm
  - attention path
  - SwiGLU / expert MLP path
- Add an LFM2 MoE wrapper that reuses TileGym `fused_moe`
- Verify expert weight layout and routing semantics:
  - gate/up projection order
  - down projection layout
  - top-k routing weights
  - `norm_topk_prob`
  - `routed_scaling_factor`
  - expert bias handling
- Add an E2E inference / benchmark script for `LiquidAI/LFM2-8B-A1B`
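
To make the routing-semantics check concrete, below is a minimal pure-Python reference for routing a single token. It is a sketch under assumptions, not the model's confirmed behavior: the function name `reference_route` is hypothetical, and the choices of sigmoid scoring, an expert bias that affects expert selection only, and normalization applied before scaling all need to be confirmed against the upstream LFM2 MoE implementation before this is used as ground truth.

```python
import math
from typing import List, Optional, Tuple


def reference_route(
    router_logits: List[float],
    top_k: int,
    norm_topk_prob: bool = True,
    routed_scaling_factor: float = 1.0,
    expert_bias: Optional[List[float]] = None,
) -> Tuple[List[int], List[float]]:
    """Route one token and return (expert ids, routing weights).

    Hypothetical reference used only to compare against a fused kernel.
    """
    # ASSUMPTION: per-expert sigmoid scores (not a softmax over experts).
    scores = [1.0 / (1.0 + math.exp(-x)) for x in router_logits]

    # ASSUMPTION: the bias shifts which experts are selected,
    # but the returned weights come from the unbiased scores.
    if expert_bias is None:
        selection = scores
    else:
        selection = [s + b for s, b in zip(scores, expert_bias)]

    # Pick the top_k experts by selection score, highest first.
    order = sorted(range(len(scores)), key=lambda i: selection[i], reverse=True)
    top_ids = order[:top_k]
    weights = [scores[i] for i in top_ids]

    # Optionally renormalize the selected weights so they sum to 1.
    if norm_topk_prob:
        total = sum(weights)
        weights = [w / total for w in weights]

    # ASSUMPTION: scaling is applied after normalization.
    weights = [w * routed_scaling_factor for w in weights]
    return top_ids, weights


if __name__ == "__main__":
    ids, weights = reference_route([0.0, 2.0, -1.0, 1.0], top_k=2)
    print(ids, weights)
```

Comparing the expert ids and weights passed into `fused_moe` against a reference like this, token by token, is one way to catch a mismatch in `norm_topk_prob`, `routed_scaling_factor`, or bias handling before running the full E2E benchmark.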

### Questions 
- Is `LiquidAI/LFM2-8B-A1B` an acceptable validation target for a 5090-friendly E2E model integration?
- Are there existing TileGym conventions for hybrid architectures with non-attention convolution blocks that this integration should follow?

