feat: Enable LoRA checkpoint utils for ScatterMoE #523
Draft
willmj wants to merge 18 commits into foundation-model-stack:main from
Conversation
Thanks for making a pull request! 😃
kmehant
reviewed
Apr 16, 2025
- LoRA tuning with ScatterMoE is supported, but because of inference restrictions in vLLM/vanilla PEFT, experts should not be trained as `target_modules` for models being tuned with ScatterMoE. Users have control over which `target_modules` they wish to train:
  - Passing `all-linear` to adapter layers will include the router, which is a linear layer, and all attn layers. This **will not** train the expert layers.
  - To train only attention layers, specify the target modules explicitly (i.e. `target_modules: ["q_proj", "v_proj", "o_proj", "k_proj"]`).
  - To train expert layers, specify `input_linear` and `output_linear` in target modules along with `router` (i.e. `target_modules: ["q_proj", "v_proj", "o_proj", "k_proj", "router", "input_linear", "output_linear"]`). If you specify these layers, inference with vLLM/vanilla HF PEFT **is not possible**.
Collaborator
Suggested change
- To train expert layers, specify `input_linear` and `output_linear` in target modules along with `router` (i.e. `target_modules: ["q_proj", "v_proj", "o_proj", "k_proj", "router", "input_linear", "output_linear"]`). If you specify these layers, inference with vLLM/vanilla HF PEFT **is not possible**.
- To train expert layers, specify `input_linear` and `output_linear` in target modules along with `router` (i.e. `target_modules: ["q_proj", "v_proj", "o_proj", "k_proj", "router", "input_linear", "output_linear"]`). If you specify these layers, inference with vLLM/vanilla HF PEFT **is not currently supported**.
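For readers following the `target_modules` discussion above, the two configurations map onto a standard PEFT `LoraConfig`. This is only a minimal sketch assuming the usual `peft` API; the `r`/`lora_alpha` values are arbitrary and the module names are taken from the quoted docs:

```python
from peft import LoraConfig

# Attention-only LoRA: remains loadable for vLLM / vanilla HF PEFT inference.
attn_only = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

# Router + expert LoRA: also adapts ScatterMoE's fused expert layers.
# Per the docs above, checkpoints trained this way cannot currently be
# served with vLLM / vanilla HF PEFT without conversion.
with_experts = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "router", "input_linear", "output_linear",
    ],
)
```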
kmehant
reviewed
Apr 16, 2025
- Passing `all-linear` to adapter layers will include the router, which is a linear layer, and all attn layers. This **will not** train the expert layers.
- To train only attention layers, specify the target modules explicitly (i.e. `target_modules: ["q_proj", "v_proj", "o_proj", "k_proj"]`).
- To train expert layers, specify `input_linear` and `output_linear` in target modules along with `router` (i.e. `target_modules: ["q_proj", "v_proj", "o_proj", "k_proj", "router", "input_linear", "output_linear"]`). If you specify these layers, inference with vLLM/vanilla HF PEFT **is not possible**.
- When LoRA tuning with ScatterMoE, the values `--fast_moe 1` or `--fast_moe True` are not expected to work, as FSDP must be enabled when LoRA tuning. Run either `--fast_moe False` or `--fast_moe x>1`.
Collaborator
Didn't quite get your point here yet. `--fast_moe True` disables expert parallel; however, the experts are sharded by FSDP, so FSDP is active here.
BTW, don't `--fast_moe 1` and `--fast_moe False` both have the same effect? In both settings, all experts are replicated and deferred from FSDP, while the other layers are under FSDP sharding.
Maybe, if you are comfortable with a support matrix table, let's do that and pinpoint case by case.
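One possible seed for the support matrix suggested here, written as a plain dict so it can be corrected entry by entry; it only restates what is claimed in this thread and the quoted docs, and the `--fast_moe 1` / `--fast_moe True` rows are marked as still open:

```python
# Draft LoRA + ScatterMoE support matrix, seeded only from statements in this thread.
lora_scattermoe_fast_moe = {
    "--fast_moe False": "supported per the quoted docs",
    "--fast_moe n (n > 1)": "supported per the quoted docs (expert parallel)",
    "--fast_moe 1": "docs say not expected to work; questioned above (experts replicated, other layers under FSDP)",
    "--fast_moe True": "docs say not expected to work; questioned above (no expert parallel, experts sharded by FSDP)",
}
```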
Description of the change
PR to be merged after the fms-acceleration changes.
Enables checkpoint utils for ScatterMoE on LoRA-tuned models, converting them back to their original structure.
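For context, a rough illustration of what such a conversion involves (this is not the code in the PR; the helper name, the fused-weight layout, and the per-expert key template are all assumptions): ScatterMoE stores every expert's weights fused into single `input_linear` / `output_linear` tensors, so restoring the original structure means splitting them back into per-expert entries in the adapter state dict.

```python
import torch

def split_fused_expert_weights(state_dict, num_experts, expert_key_fmt):
    """Hypothetical sketch: split ScatterMoE's fused expert tensors back into
    per-expert keys. `expert_key_fmt` stands in for the base architecture's
    naming scheme and is not part of this PR."""
    converted = {}
    for key, tensor in state_dict.items():
        if "input_linear" in key or "output_linear" in key:
            # Assumed layout: experts stacked along dim 0 of the fused weight.
            for idx, chunk in enumerate(torch.chunk(tensor, num_experts, dim=0)):
                converted[expert_key_fmt.format(expert=idx, orig=key)] = chunk
        else:
            converted[key] = tensor
    return converted
```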
Related issue number
How to verify the PR
Was the PR tested