feat: auto-configure MTP models with metadata-based detection #7

Merged
offbyonebit merged 1 commit into main from mtp-clean
May 21, 2026

@offbyonebit
Owner

Add three MTP-aware behaviors that inspect GGUF metadata (not filenames):

  1. MTP head detection — read nextn_predict_layers and architecture from the GGUF kv store to determine whether a file actually contains MTP heads. Auto-enable --spec-type draft-mtp and -ub 8 at add/scan time.

  2. Safety wiring at launch time:

    • Auto-inject -ub 8 for any model with MTP heads (prevents SSM compute-buffer OOM during speculative decode verification batches).
    • Warn if the user explicitly set spec_type=draft-mtp on a GGUF that lacks MTP heads.
  3. Backend recommendation for hybrid SSM+attention MTP models on Xe2 (Battlemage, Lunar Lake): log a note that SYCL MTP is net-negative on this hardware due to GDN serial state passes, and suggest Vulkan instead (~+9%).
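
The first two behaviors above can be sketched roughly as follows. This is an illustration, not the code from this PR: it assumes the GGUF key/value store has already been read into a plain dict, that the MTP layer count lives under `<architecture>.nextn_predict_layers` with the architecture name under `general.architecture`, and that an explicit user `-ub` is left untouched. The function names `mtp_head_count`, `suggest_config`, and `build_launch_args` are hypothetical.

```python
from typing import Any, Dict, List, Mapping, Optional, Tuple

# Short and long forms of the micro-batch flag.
UBATCH_FLAGS = ("-ub", "--ubatch-size")


def mtp_head_count(metadata: Mapping[str, Any]) -> int:
    """Return the number of MTP layers reported by the metadata, or 0."""
    arch = metadata.get("general.architecture")
    if not isinstance(arch, str):
        return 0
    # Assumed key layout: "<architecture>.nextn_predict_layers".
    raw = metadata.get(f"{arch}.nextn_predict_layers", 0)
    try:
        return max(int(raw), 0)
    except (TypeError, ValueError):
        return 0


def suggest_config(metadata: Mapping[str, Any]) -> Dict[str, Any]:
    """Add/scan time: propose settings only when MTP heads are present."""
    if mtp_head_count(metadata) > 0:
        return {"spec_type": "draft-mtp", "ubatch_size": 8}
    return {}


def build_launch_args(
    metadata: Mapping[str, Any],
    user_args: List[str],
    spec_type: Optional[str] = None,
) -> Tuple[List[str], List[str]]:
    """Launch time: inject -ub 8 for MTP models and collect warnings."""
    args = list(user_args)
    warnings: List[str] = []
    has_mtp = mtp_head_count(metadata) > 0
    # Assumption: do not override a micro-batch size the user set explicitly.
    if has_mtp and not any(flag in args for flag in UBATCH_FLAGS):
        args += ["-ub", "8"]
    if spec_type == "draft-mtp" and not has_mtp:
        warnings.append(
            "spec_type=draft-mtp is set, but the GGUF metadata reports no MTP heads"
        )
    return args, warnings
```

Because the decision depends only on metadata, a file named like an MTP model but lacking the heads yields an empty suggestion and a warning rather than a silently misconfigured launch.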

Also adds:

  • mtp-info <path.gguf> CLI command for quick diagnostics
  • Admin edit endpoint accepts spec_type and ubatch_size
  • gguf>=0.10 as a hard dependency
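
As a rough idea of what an `mtp-info`-style diagnostic might print, the sketch below formats a short report from already-loaded metadata. The report layout, the `format_mtp_report` name, and the injected `load_metadata` callable are all hypothetical; the real command presumably reads the file through the `gguf` package, which is deliberately left out here so the example runs with the standard library alone.

```python
import argparse
from typing import Any, Callable, List, Mapping


def format_mtp_report(path: str, metadata: Mapping[str, Any]) -> str:
    """Build a human-readable MTP summary for one GGUF file."""
    arch = metadata.get("general.architecture", "unknown")
    # Assumed key layout: "<architecture>.nextn_predict_layers".
    raw = metadata.get(f"{arch}.nextn_predict_layers", 0)
    try:
        heads = max(int(raw), 0)
    except (TypeError, ValueError):
        heads = 0
    lines = [
        f"file: {path}",
        f"architecture: {arch}",
        f"nextn_predict_layers: {heads}",
        f"mtp heads: {'yes' if heads > 0 else 'no'}",
    ]
    if heads > 0:
        lines.append("suggested: --spec-type draft-mtp -ub 8")
    return "\n".join(lines)


def main(argv: List[str], load_metadata: Callable[[str], Mapping[str, Any]]) -> int:
    """Parse 'mtp-info <path.gguf>' and print the report."""
    parser = argparse.ArgumentParser(prog="mtp-info")
    parser.add_argument("path", help="path to a .gguf file")
    ns = parser.parse_args(argv)
    # load_metadata is a stand-in for whatever reads the GGUF kv store.
    print(format_mtp_report(ns.path, load_metadata(ns.path)))
    return 0
```

Passing the loader in as a parameter keeps the formatting logic testable without a real model file on disk.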

@offbyonebit merged commit 97f77d9 into main on May 21, 2026
0 of 3 checks passed