diff --git a/docs/source/common_options.rst b/docs/source/common_options.rst
index dd35acb..785dfce 100644
--- a/docs/source/common_options.rst
+++ b/docs/source/common_options.rst
@@ -525,7 +525,7 @@ Only the adapter is saved. Merge it back with the base model to deploy:
     --lora_path ./checkpoint/llama3-8b-rm \
     --output_path ./checkpoint/llama-3-8b-rm-combined \
     --is_rm \
-    --param_dtype bf16
+    --ds.param_dtype bf16
 
 Use ``--is_rm`` when merging a reward model (preserves the score head).