feat: auto-configure MTP models with metadata-based detection #6
Open
offbyonebit wants to merge 2 commits into
Add support for discovering and serving models from Ollama instances and
OpenAI-compatible endpoints alongside locally managed models.
- New UpstreamConfig in config.toml for explicit upstream URLs
  ([[upstreams]] url = "http://127.0.0.1:11435" name = "proxy")
- discover_ollama() probes ports 11434-11436 for /api/tags
- discover_openai_endpoints() scans ports 8080-8088, 18080 for /v1/models
  (opt-in via --discover-openai or configured upstreams)
- discover_upstreams() combines all sources with dedup
- Server merges upstream models into /v1/models and /admin/status,
  forwards chat/completion requests to the matching upstream
- Web UI shows upstream models with source pill (ollama/upstream)
- CLI: arc-llama add-upstream <url>, arc-llama scan --discover-openai
- Startup cache-warm prevents first-request latency
- Also includes: launcher log file handle fix, router fast-path optimization
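The description does not spell out how discover_upstreams() deduplicates. The sketch below is one plausible reading, not the PR's code: it assumes endpoints are keyed by a normalized scheme://host:port, that localhost and 127.0.0.1 count as the same endpoint, and that explicitly configured upstreams take priority over probed ones. The Upstream record and the merge_upstreams and normalize helpers are hypothetical names.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass(frozen=True)
class Upstream:
    # Hypothetical record; the PR's UpstreamConfig may carry other fields.
    url: str
    name: str
    source: str  # "ollama" or "upstream", matching the Web UI source pill


def normalize(url: str) -> str:
    """Reduce a URL to scheme://host:port so different spellings collide."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host == "localhost":
        host = "127.0.0.1"  # assumption: both spellings name the same endpoint
    port = parts.port or (443 if parts.scheme == "https" else 80)
    return f"{parts.scheme}://{host}:{port}"


def merge_upstreams(*sources: list[Upstream]) -> list[Upstream]:
    """Combine sources in priority order, keeping the first entry per endpoint."""
    seen: dict[str, Upstream] = {}
    for source in sources:
        for upstream in source:
            seen.setdefault(normalize(upstream.url), upstream)
    return list(seen.values())


# Configured entries are passed first so they win over probed duplicates.
configured = [Upstream("http://127.0.0.1:11435", "proxy", "upstream")]
probed = [
    Upstream("http://localhost:11434", "ollama-11434", "ollama"),
    Upstream("http://localhost:11435/", "ollama-11435", "ollama"),
]
merged = merge_upstreams(configured, probed)
```

Here the probed entry on port 11435 collapses into the configured "proxy" upstream, so merged holds two endpoints rather than three.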
Add three MTP-aware behaviors that inspect GGUF metadata (not filenames):
1. MTP head detection — read nextn_predict_layers and architecture from
the GGUF kv store to determine whether a file actually contains MTP
heads. Auto-enable --spec-type draft-mtp and -ub 8 at add/scan time.
2. Safety wiring at launch time:
• Auto-inject -ub 8 for any model with MTP heads (prevents SSM
compute-buffer OOM during speculative decode verification batches).
• Warn if the user explicitly set spec_type=draft-mtp on a GGUF that
lacks MTP heads.
3. Backend recommendation for hybrid SSM+attention MTP models on Xe2
(Battlemage, Lunar Lake): log a note that SYCL MTP is net-negative
here due to GDN serial state passes, suggesting Vulkan for ~+9%.
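Behaviors 1 and 2 above can be sketched as a pure function over metadata that has already been read into a dict. This is a sketch under stated assumptions, not the PR's implementation: the has_mtp_heads and launch_args names are hypothetical, the key is assumed to follow the "<architecture>.nextn_predict_layers" naming convention, and an explicit user-supplied ubatch_size is assumed to take precedence over the injected default of 8.

```python
from typing import Any, Mapping, Optional


def has_mtp_heads(metadata: Mapping[str, Any]) -> bool:
    """True if the metadata reports at least one next-token-prediction layer."""
    arch = metadata.get("general.architecture")
    if not arch:
        return False
    # Assumption: the count lives under "<arch>.nextn_predict_layers".
    return int(metadata.get(f"{arch}.nextn_predict_layers", 0)) > 0


def launch_args(
    metadata: Mapping[str, Any],
    spec_type: Optional[str] = None,
    ubatch_size: Optional[int] = None,
) -> tuple[list[str], list[str]]:
    """Return (extra launch arguments, warnings) for one model."""
    args: list[str] = []
    warnings: list[str] = []
    mtp = has_mtp_heads(metadata)

    if spec_type == "draft-mtp" and not mtp:
        warnings.append("spec_type=draft-mtp is set, but the GGUF has no MTP heads")
    if mtp and spec_type is None:
        spec_type = "draft-mtp"  # auto-enable when heads are present
    if spec_type is not None:
        args += ["--spec-type", spec_type]

    if mtp and ubatch_size is None:
        ubatch_size = 8  # safety default for speculative verification batches
    if ubatch_size is not None:
        args += ["-ub", str(ubatch_size)]
    return args, warnings


# "examplearch" is a placeholder architecture name, not a real model family.
mtp_meta = {"general.architecture": "examplearch", "examplearch.nextn_predict_layers": 1}
plain_meta = {"general.architecture": "llama"}

auto_args, auto_warnings = launch_args(mtp_meta)
bad_args, bad_warnings = launch_args(plain_meta, spec_type="draft-mtp")
```

Keeping the decision separate from file I/O means the same function could serve both the add/scan path and the launch path, and it can be exercised without a real GGUF on disk.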
Also add an 'mtp-info <path.gguf>' CLI command for quick diagnostics,
and extend the admin edit endpoint to accept spec_type and ubatch_size.
New dependency: gguf>=0.10 (hard dep, not optional).
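For readers who want to see what an mtp-info style check looks at, here is a rough standalone sketch. It deliberately swaps in a small standard-library header parser instead of the gguf package this PR depends on, so that it runs without extra installs. It assumes a little-endian GGUF v2 or v3 header and the "<architecture>.nextn_predict_layers" key naming; read_gguf_metadata and mtp_info are hypothetical names, and the report format is invented for illustration.

```python
import io
import struct
from typing import Any, BinaryIO

# GGUF scalar value-type ids mapped to struct format characters.
_SCALARS = {0: "B", 1: "b", 2: "H", 3: "h", 4: "I", 5: "i",
            6: "f", 7: "?", 10: "Q", 11: "q", 12: "d"}
_STRING, _ARRAY = 8, 9


def _unpack(f: BinaryIO, fmt: str) -> Any:
    """Read one little-endian value described by a struct format character."""
    size = struct.calcsize("<" + fmt)
    data = f.read(size)
    if len(data) != size:
        raise ValueError("truncated GGUF header")
    return struct.unpack("<" + fmt, data)[0]


def _read_string(f: BinaryIO) -> str:
    length = _unpack(f, "Q")  # strings are length-prefixed with a uint64
    return f.read(length).decode("utf-8")


def _read_value(f: BinaryIO, vtype: int) -> Any:
    if vtype in _SCALARS:
        return _unpack(f, _SCALARS[vtype])
    if vtype == _STRING:
        return _read_string(f)
    if vtype == _ARRAY:
        elem_type = _unpack(f, "I")
        count = _unpack(f, "Q")
        return [_read_value(f, elem_type) for _ in range(count)]
    raise ValueError(f"unknown GGUF value type {vtype}")


def read_gguf_metadata(f: BinaryIO) -> dict[str, Any]:
    """Parse only the key/value section; tensor data is never touched."""
    if f.read(4) != b"GGUF":
        raise ValueError("not a GGUF file")
    _version = _unpack(f, "I")
    _tensor_count = _unpack(f, "Q")
    kv_count = _unpack(f, "Q")
    metadata: dict[str, Any] = {}
    for _ in range(kv_count):
        key = _read_string(f)
        metadata[key] = _read_value(f, _unpack(f, "I"))
    return metadata


def mtp_info(f: BinaryIO) -> str:
    meta = read_gguf_metadata(f)
    arch = meta.get("general.architecture", "unknown")
    layers = int(meta.get(f"{arch}.nextn_predict_layers", 0))
    heads = "yes" if layers > 0 else "no"
    return f"architecture={arch} nextn_predict_layers={layers} mtp_heads={heads}"


def _kv_string(key: str, value: str) -> bytes:
    k, v = key.encode(), value.encode()
    return struct.pack("<Q", len(k)) + k + struct.pack("<IQ", _STRING, len(v)) + v


def _kv_uint32(key: str, value: int) -> bytes:
    k = key.encode()
    return struct.pack("<Q", len(k)) + k + struct.pack("<II", 4, value)


# Tiny in-memory header with a placeholder architecture, for demonstration only.
blob = (b"GGUF" + struct.pack("<IQQ", 3, 0, 2)
        + _kv_string("general.architecture", "examplearch")
        + _kv_uint32("examplearch.nextn_predict_layers", 1))
report = mtp_info(io.BytesIO(blob))
```

On a real file the same call would be mtp_info(open(path, "rb")). Because only the header is parsed, the check stays cheap even for multi-gigabyte models.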