[refactor] Simplify and unify the TornadoVM layer planner infrastructure#101

Open
orionpapadakis wants to merge 13 commits into main from refactor/simplify-layerplanner

Conversation


orionpapadakis commented Mar 27, 2026

This PR is a structural refactoring of the tornadovm package — no behavior changes, no new model capabilities. The goal is to eliminate duplication across the FFN layer and layer planner hierarchies and establish a cleaner, more maintainable package structure.

Changes:

FFN layer hierarchy (tornadovm/layers/)

  • Introduced AbstractLogitsLayer to centralize shared setup logic for all logits layers (LogitsFP16Layer, LogitsQ8_0Layer, and Granite variants), replacing duplicated task graph and
    scheduler boilerplate
  • Generalized AbstractFFNLayers<W, C> with type parameters for Weights and Configuration, giving typed access to weights and config in subclasses
  • Unified setupFFNLayerTaskGraphs() across all FFN subclasses; removed redundant fields and methods surfaced during the cleanup
  • Added MistralFP16FFNLayers and MistralQ8_0FFNLayers — Mistral-specific FFN layer implementations using MistralConfiguration, required after the generics tightening
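The generics tightening above can be sketched roughly as follows. All class bodies here are hypothetical stand-ins (field names, the string label, and the simplified `setupFFNLayerTaskGraphs()` are illustrative; the real method builds TornadoVM task graphs):

```java
// Stand-ins for the real tornadovm classes; this sketch shows only the
// typing pattern of AbstractFFNLayers<W, C>, not the actual implementation.
class MistralConfiguration { int hiddenSize = 4096; }   // hypothetical field
class LlamaTornadoWeights { String layout = "FP16"; }   // hypothetical field

abstract class AbstractFFNLayers<W, C> {
    protected final W weights;  // typed access: subclasses need no casts
    protected final C config;

    protected AbstractFFNLayers(W weights, C config) {
        this.weights = weights;
        this.config = config;
    }

    // Unified across all FFN subclasses; returns a label here instead of
    // building real task graphs.
    abstract String setupFFNLayerTaskGraphs();
}

// In the spirit of MistralFP16FFNLayers: the type parameters pin the
// exact weight and configuration classes at compile time.
class MistralFP16FFNLayers
        extends AbstractFFNLayers<LlamaTornadoWeights, MistralConfiguration> {
    MistralFP16FFNLayers(LlamaTornadoWeights w, MistralConfiguration c) {
        super(w, c);
    }

    @Override
    String setupFFNLayerTaskGraphs() {
        // config is statically a MistralConfiguration here.
        return "ffn-" + weights.layout + "-" + config.hiddenSize;
    }
}

public class FFNSketch {
    public static void main(String[] args) {
        var ffn = new MistralFP16FFNLayers(
                new LlamaTornadoWeights(), new MistralConfiguration());
        System.out.println(ffn.setupFFNLayerTaskGraphs()); // prints ffn-FP16-4096
    }
}
```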

Layer planner hierarchy (tornadovm/layerplanner/)

  • Centralized createTornadoInferencePlan(), all shared fields (activationLayer, ffnLayers, logitsLayer, scheduler, task graphs), and the GenericLayerPlanner interface implementations
    into QuantizedLayerPlanner, eliminating near-total duplication between FP16LayerPlanner and Q8_0LayerPlanner
  • Added MistralFP16LayerPlanner and MistralQ8_0LayerPlanner with correct generic types (LlamaState, MistralConfiguration, LlamaTornadoWeights), fixing a ClassCastException caused by
    Mistral being incorrectly routed to the Llama planners
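A minimal sketch of the routing bug and the typed fix, using hypothetical stand-in classes (the real Mistral planners take LlamaState, MistralConfiguration, and LlamaTornadoWeights as type arguments; field names and values below are made up for illustration):

```java
// Hypothetical stand-ins: MistralConfiguration is deliberately NOT a
// LlamaConfiguration, mirroring the mis-routing described above.
class LlamaConfiguration { }
class MistralConfiguration { int contextLength = 32768; } // hypothetical field

// Before: the planner held an untyped configuration and cast internally,
// so routing Mistral to a Llama planner only failed at run time.
class UncheckedLlamaPlanner {
    private final Object config;
    UncheckedLlamaPlanner(Object config) { this.config = config; }
    LlamaConfiguration plan() {
        return (LlamaConfiguration) config; // ClassCastException for Mistral
    }
}

// After, in the spirit of MistralFP16LayerPlanner: the configuration type
// is a generic parameter, so mis-routing no longer compiles.
class TypedPlanner<C> {
    private final C config;
    TypedPlanner(C config) { this.config = config; }
    C plan() { return config; }
}

public class PlannerSketch {
    public static void main(String[] args) {
        boolean castFailed = false;
        try {
            new UncheckedLlamaPlanner(new MistralConfiguration()).plan();
        } catch (ClassCastException e) {
            castFailed = true; // the bug the dedicated Mistral planners remove
        }
        System.out.println("unchecked cast failed: " + castFailed); // true

        TypedPlanner<MistralConfiguration> planner =
                new TypedPlanner<>(new MistralConfiguration());
        System.out.println(planner.plan().contextLength); // prints 32768
    }
}
```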

Package reorganization

  • GenericLayerPlanner and QuantizationPlannerFactory moved to the layerplanner/ root (previously in a base/ subpackage)
  • QuantizedLayerPlanner moved to layerplanner/ root
  • FP16LayerPlanner co-located with FP16 concrete planners in model/fp16/
  • Q8_0LayerPlanner co-located with Q8_0 concrete planners in model/q8_0/

DeepSeek-R1-Distill-Qwen fix

  • Introduced DeepSeekR1Qwen model class (extends Qwen2) that correctly overrides getModelType() and shouldAddBeginOfText(), fixing a repetition loop caused by missing BOS token and
    <think> prefix injection
  • Fixed Qwen3ChatFormat.getBeginOfText() to fall back to startHeader (<|begin▁of▁sentence|>) when no BOS alias is registered — DeepSeek reuses its first role-marker token as BOS
  • Fixed Qwen3Tokenizer.encode() to byte-map non-special text parts before BPE encoding, resolving a NoSuchElementException when encoding "\n" after splitting on special tokens like
    <think>
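The tokenizer fix can be illustrated with a toy encoder. This assumes a GPT-2-style byte-to-unicode mapping (only '\n' → 'Ċ' is modeled) and a two-entry stand-in vocabulary; the real Qwen3Tokenizer vocabulary, token IDs, and BPE merge loop are not reproduced here:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class EncodeSketch {
    // Tiny stand-in vocabulary; the plain newline exists only in its
    // byte-mapped form, as in GPT-2-style vocabularies.
    static final Map<String, Integer> VOCAB =
            Map.of("<think>", 151667, "Ċ", 198); // illustrative IDs

    // Minimal stand-in for the byte-to-unicode mapping: only '\n' here.
    static String byteMap(String text) {
        return text.replace("\n", "Ċ");
    }

    static List<Integer> encode(String input) {
        List<Integer> ids = new ArrayList<>();
        // Split on the special token while keeping it as its own part.
        for (String part : input.split("(?<=<think>)|(?=<think>)")) {
            if (part.equals("<think>")) {
                ids.add(VOCAB.get("<think>")); // special token: direct lookup
            } else if (!part.isEmpty()) {
                // The fix: byte-map BEFORE the vocabulary/BPE lookup; a raw
                // "\n" is not in the vocabulary, which is what previously
                // surfaced as a NoSuchElementException.
                ids.add(VOCAB.get(byteMap(part)));
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        System.out.println(encode("<think>\n")); // prints [151667, 198]
    }
}
```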

orionpapadakis force-pushed the refactor/simplify-layerplanner branch from 3fffabd to a3f1450 on March 27, 2026 at 10:57