Enhance multimodal support and speculative decoding in atomic-llama-c…#14

Merged: Ooooze merged 1 commit into feature/turboquant-kv-cache from b1-mtp-qwen-rebase on May 13, 2026

@Ooooze commented May 13, 2026

…pp-turboquant

- Updated NEXTN.md to document the integration of `--mmproj` with speculative decoding types `mtp`, `nextn`, and `eagle3`, allowing coexistence on a single slot.
- Revised README.md to reflect the new multimodal capabilities and their implications for text and image processing.
- Added functions in `common/speculative.cpp` and `common/speculative.h` to check compatibility of speculative types with multimodal settings.
- Enhanced server context handling to manage multimodal prompts and ensure correct behavior during speculative decoding.
- Introduced a new script for running Gemma 4 with multimodal projector support, detailing expected behavior for text and image turns.
- Updated documentation in `docs/speculative.md` to clarify per-turn behavior and future roadmap for draft acceleration on vision turns.
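
As a rough illustration of the per-turn behavior described in the list above, here is a minimal sketch of what a compatibility check might look like. Everything in it is hypothetical: the `spec_type` enum and the helpers `spec_type_from_name` and `can_draft_on_turn` are made up for this example and are not the functions added in `common/speculative.cpp`. It also assumes that image turns currently run without draft acceleration, which is inferred from the "future roadmap" wording rather than stated outright.

```cpp
#include <string>

// Hypothetical enum: the names mirror the speculative types listed in this PR
// (mtp, nextn, eagle3); the real identifiers in common/speculative.h may differ.
enum class spec_type { none, mtp, nextn, eagle3 };

// Hypothetical helper: map a type name to the enum.
// Unknown names fall back to spec_type::none (speculation disabled).
spec_type spec_type_from_name(const std::string & name) {
    if (name == "mtp")    return spec_type::mtp;
    if (name == "nextn")  return spec_type::nextn;
    if (name == "eagle3") return spec_type::eagle3;
    return spec_type::none;
}

// Hypothetical per-turn decision, assuming image turns are decoded without
// drafting while text turns on the same slot keep using the draft path.
bool can_draft_on_turn(spec_type type, bool turn_has_image) {
    if (type == spec_type::none) {
        return false; // speculation is not enabled at all
    }
    if (turn_has_image) {
        return false; // assumption: no draft acceleration on vision turns yet
    }
    return true;      // text turn with mtp, nextn, or eagle3
}
```

Under these assumptions, `can_draft_on_turn(spec_type_from_name("mtp"), false)` is true for a text turn, and the same call with `true` as the second argument is false for an image turn, so a single slot can serve both kinds of turn without reloading the model.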

@Ooooze Ooooze merged commit 0a635dc into feature/turboquant-kv-cache May 13, 2026
@github-actions bot added the documentation, examples, server, and script labels May 13, 2026