
Feat/refactoring: migration to Transformers v5, removing custom MoE backends, and misc. improvements#9

Open
fedebotu wants to merge 7 commits into main from feat/refactoring

Conversation

@fedebotu
Member

@fedebotu fedebotu commented Feb 22, 2026

Overview

This PR replaces the custom MoE backend system (vLLM, SGLang, FlashInfer, HF; note that newer versions of these backends were quite buggy) with the recently released transformers v5's built-in Qwen3MoeExperts and Qwen3MoeTopKRouter. This drops ~250 lines of backend-specific code and lets transformers handle kernel dispatch automatically.

Additionally, several quality-of-life improvements are included: linting, bug fixes, and documentation updates.

The older version that included the backend code remains available on the backends branch for future reference.

Major Changes

Model (rnd/modeling_rnd.py)

  • Rewrote RND1SparseMoeBlock to use Qwen3MoeExperts + Qwen3MoeTopKRouter instead of manually routing tokens through per-expert MLPs
  • Removed all vLLM/SGLang/FlashInfer imports, weight-packing logic, and backend selection
  • Registered rnd1 in transformers' _MODEL_TO_CONVERSION_PATTERN so per-expert checkpoint weights are automatically fused into the 3D tensor format during loading
  • Added rotary embedding inv_freq recomputation in from_pretrained (these buffers are non-persistent and not stored in safetensors)
  • Simplified RND1DecoderLayer and RND1Attention by removing backend-conditional RMSNorm class selection
  • Replaced _init_weights with a no-op (weights always come from a checkpoint)
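The routing step that a top-k MoE router such as Qwen3MoeTopKRouter performs can be sketched in plain NumPy (an illustration of the routing math only; the function name `topk_route` and the shapes are ours, not the transformers API):

```python
import numpy as np

def topk_route(router_logits, top_k=2):
    """Illustrative top-k MoE routing: softmax over expert logits, keep the
    top-k experts per token, and renormalize their weights. This mirrors the
    general Qwen3-style routing scheme, not the exact transformers code."""
    # router_logits: (num_tokens, num_experts)
    probs = np.exp(router_logits - router_logits.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)
    # pick the top-k experts per token (descending by probability)
    topk_idx = np.argsort(probs, axis=-1)[:, ::-1][:, :top_k]
    topk_w = np.take_along_axis(probs, topk_idx, axis=-1)
    topk_w /= topk_w.sum(axis=-1, keepdims=True)  # renormalize over selected experts
    return topk_idx, topk_w

idx, w = topk_route(np.array([[2.0, 0.5, 1.0, -1.0]]), top_k=2)
print(idx)  # -> [[0 2]]: experts 0 and 2 win for the single token
```

Each token's hidden state is then sent only to its selected experts, and the expert outputs are combined with the renormalized weights.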

Config (rnd/configuration_rnd.py)

  • Removed moe_backend parameter
  • Switched RoPE config to the v5 rope_parameters format
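As a rough before/after sketch of the RoPE config change (key names follow the v5 `rope_parameters` convention; the values here are placeholders, not RND1's actual settings):

```python
# Hypothetical illustration of the migration from separate RoPE fields to the
# consolidated rope_parameters dict used by transformers v5 configs.
old_style = {
    "rope_theta": 1000000.0,  # placeholder value
    "rope_scaling": None,
}
new_style = {
    "rope_parameters": {
        "rope_type": "default",
        "rope_theta": 1000000.0,  # same placeholder, now nested
    },
}
```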

Demo script (demo_rnd_generation.py)

  • Replaced --moe_backend with --experts-implementation (optional, auto-detected when not set)
  • Normalized all CLI args to kebab-case (--top-k, --num-steps, etc.) for consistency with other projects
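The normalized CLI surface might look like the following argparse sketch (`--top-k`, `--num-steps`, and `--experts-implementation` come from the PR; the defaults and help strings are placeholders). Note that argparse maps kebab-case flags to underscore attributes:

```python
import argparse

# Sketch of the demo script's argument parser after the kebab-case rename.
parser = argparse.ArgumentParser(description="RND1 generation demo (sketch)")
parser.add_argument("--top-k", type=int, default=None)
parser.add_argument("--num-steps", type=int, default=64)  # placeholder default
parser.add_argument(
    "--experts-implementation",
    type=str,
    default=None,
    help="Expert kernel implementation; auto-detected when omitted",
)

args = parser.parse_args(["--top-k", "50", "--num-steps", "128"])
print(args.top_k, args.num_steps, args.experts_implementation)  # -> 50 128 None
```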

Project setup

  • Bumped minimum: transformers>=5.0.0, torch>=2.8
  • Removed optional deps for vllm, sglang, flashinfer
  • Reduced ruff line-length from 120 to 100, added bugbear/simplify/isort rules
  • Cleaned up pre-commit config (removed generic hooks, updated ruff, fixed uvx commands)
  • Removed unused SVG asset
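The lint changes above correspond to a pyproject.toml fragment along these lines (illustrative; the repo's exact rule selection may differ):

```toml
[tool.ruff]
line-length = 100  # reduced from 120

[tool.ruff.lint]
extend-select = [
  "B",   # flake8-bugbear
  "SIM", # flake8-simplify
  "I",   # isort
]
```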

@fedebotu fedebotu changed the title Feat/refactoring Feat/refactoring: migration to Transformers v5 and removing custom MoE backends Feb 22, 2026
@fedebotu fedebotu marked this pull request as ready for review February 22, 2026 11:34
@fedebotu fedebotu requested a review from keshik6 February 22, 2026 11:35
@fedebotu fedebotu changed the title Feat/refactoring: migration to Transformers v5 and removing custom MoE backends Feat/refactoring: migration to Transformers v5, removing custom MoE backends, and misc. improvements Feb 22, 2026