adds --lora and --lora-scaled support (aligned with llama.cpp api) #786
Open
loganpowell wants to merge 9 commits into mozilla-ai:main from
Conversation
added 2 commits · August 8, 2025 10:23
Implements full LoRA (Low-Rank Adaptation) adapter support compatible with llama.cpp, enabling fine-tuning capabilities in llamafile server mode.

Features:
- Multiple LoRA adapter support with individual scaling factors
- New command-line flags: --lora, --lora-scaled, --lora-base
- Automatic memory-mapping disabling for LoRA compatibility
- Per-slot adapter application during initialization
- Clean resource management and cleanup on shutdown

Changes:
- flags.cpp: Add LoRA flag parsing and global adapter management
- prog.cpp: Implement adapter loading, validation, and cleanup
- slot.cpp/slot.h: Add slot-level adapter application logic
- llamafile.h: Define LoRA adapter data structures and constants
- README.md: Add comprehensive LoRA usage documentation
- RELEASE.md: Document new LoRA features for release notes

The implementation follows llama.cpp patterns for maximum compatibility and provides a solid foundation for advanced fine-tuning workflows. Tested with Llama 3 8B + LoRA adapters, supporting both single and multiple adapter configurations with custom scaling factors.

Resolves mozilla-ai#697
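The new flags follow llama.cpp's CLI conventions: `--lora FNAME` applies an adapter at full strength, while `--lora-scaled FNAME SCALE` applies it with a custom scaling factor. A minimal usage sketch; the model and adapter filenames below are placeholders, not files shipped with this PR:

```shell
# Serve a base model with two LoRA adapters applied.
# Filenames are hypothetical; substitute your own GGUF files.
# Memory mapping is disabled automatically when adapters are loaded.
./llamafile --server -m Meta-Llama-3-8B.Q4_K_M.gguf \
  --lora adapter-style.gguf \
  --lora-scaled adapter-domain.gguf 0.5
```

Multiple `--lora`/`--lora-scaled` flags may be combined, with each adapter keeping its own scale.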
…pply (mirroring llama.cpp functionality)
- Removes redundant code by deferring to llama.cpp for LoRA structures
- Add Slot::mark_for_refresh() to flag slots for context refresh after LoRA changes
- Integrate needs_refresh_ flag and logic into Slot class and prefill() method
- Update LoRA adapter API handlers to call mark_for_refresh() after applying or updating adapters
- Ensure system prompts and context are preserved using the slot's intelligent prefill mechanism
- Remove naive KV cache clearing logic in favor of slot-managed refresh
- Improves runtime LoRA scale update reliability
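llama.cpp's server exposes a `/lora-adapters` endpoint for listing adapters and updating their scales at runtime. Assuming this PR mirrors that API (an assumption; the exact route is not shown in the excerpt above), a runtime scale update might look like:

```shell
# Hypothetical runtime scale update, modeled on llama.cpp's
# GET/POST /lora-adapters server endpoint. Adapter ids are assigned
# in the order the adapters were passed on the command line.
curl http://localhost:8080/lora-adapters          # list adapters and current scales
curl -X POST http://localhost:8080/lora-adapters \
  -H "Content-Type: application/json" \
  -d '[{"id": 0, "scale": 0.0}, {"id": 1, "scale": 0.8}]'
```

After such an update, affected slots are marked for refresh via mark_for_refresh(), so context is rebuilt lazily on the next prefill instead of clearing the KV cache outright.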
error: unknown argument: --lora #697