adds --lora and --lora-scaled support (aligned with llama.cpp api) #786
Open
loganpowell wants to merge 9 commits into mozilla-ai:main from
Conversation
added 2 commits · August 8, 2025 10:23
Implements full LoRA (Low-Rank Adaptation) adapter support compatible with llama.cpp, enabling fine-tuning capabilities in llamafile server mode.

Features:
- Multiple LoRA adapter support with individual scaling factors
- New command-line flags: --lora, --lora-scaled, --lora-base
- Automatic memory-mapping disabling for LoRA compatibility
- Per-slot adapter application during initialization
- Clean resource management and cleanup on shutdown

Changes:
- flags.cpp: Add LoRA flag parsing and global adapter management
- prog.cpp: Implement adapter loading, validation, and cleanup
- slot.cpp/slot.h: Add slot-level adapter application logic
- llamafile.h: Define LoRA adapter data structures and constants
- README.md: Add comprehensive LoRA usage documentation
- RELEASE.md: Document new LoRA features for release notes

The implementation follows llama.cpp patterns for maximum compatibility and provides a solid foundation for advanced fine-tuning workflows. Tested with Llama 3 8B + LoRA adapters, supporting both single and multiple adapter configurations with custom scaling factors.

Resolves mozilla-ai#697
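The new flags follow llama.cpp's CLI conventions: `--lora FNAME` applies an adapter at full strength, while `--lora-scaled FNAME SCALE` applies it with a custom scaling factor. A minimal usage sketch; the model and adapter filenames below are placeholders, not files shipped with this PR:

```shell
# Serve a base model with two LoRA adapters applied.
# Filenames are hypothetical; substitute your own GGUF files.
# Memory mapping is disabled automatically when adapters are loaded.
./llamafile --server -m Meta-Llama-3-8B.Q4_K_M.gguf \
  --lora adapter-style.gguf \
  --lora-scaled adapter-domain.gguf 0.5
```

Multiple `--lora`/`--lora-scaled` flags may be combined, with each adapter keeping its own scale.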
…pply (mirroring llama.cpp functionality)
- Removes redundant code by deferring to llama.cpp for LoRA structures
- Add Slot::mark_for_refresh() to flag slots for context refresh after LoRA changes
- Integrate needs_refresh_ flag and logic into Slot class and prefill() method
- Update LoRA adapter API handlers to call mark_for_refresh() after applying or updating adapters
- Ensure system prompts and context are preserved using the slot's intelligent prefill mechanism
- Remove naive KV cache clearing logic in favor of slot-managed refresh
- Improves runtime LoRA scale update reliability
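llama.cpp's server exposes a `/lora-adapters` endpoint for listing adapters and updating their scales at runtime. Assuming this PR mirrors that API (an assumption; the exact route is not shown in the excerpt above), a runtime scale update might look like:

```shell
# Hypothetical runtime scale update, modeled on llama.cpp's
# GET/POST /lora-adapters server endpoint. Adapter ids are assigned
# in the order the adapters were passed on the command line.
curl http://localhost:8080/lora-adapters          # list adapters and current scales
curl -X POST http://localhost:8080/lora-adapters \
  -H "Content-Type: application/json" \
  -d '[{"id": 0, "scale": 0.0}, {"id": 1, "scale": 0.8}]'
```

After such an update, affected slots are marked for refresh via mark_for_refresh(), so context is rebuilt lazily on the next prefill instead of clearing the KV cache outright.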
error: unknown argument: --lora #697