13 changes: 13 additions & 0 deletions auto_round/utils/model.py
@@ -328,6 +328,7 @@ def llm_load_model(

     model = model.eval()
     check_and_mark_quantized_module(model)
+    handle_generation_config(model)

Copilot AI commented Feb 12, 2026:

This call mutates model.generation_config during load, which can change downstream generation behavior (enabling sampling) even if the caller did not intend behavior changes at load time. Since the PR goal is to address generation_config saving failures, consider moving this normalization to the save/export path (or applying it only immediately before serialization) to avoid surprising side effects during loading.

Suggested change (delete the added call):
    handle_generation_config(model)

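A minimal sketch of the alternative described above, assuming an illustrative export entry point (auto-round's actual save path may be named differently): leave the loaded model untouched and normalize the config only right before serialization.

    def export_quantized_model(model, output_dir):
        # Normalize sampling flags only at save time, so loading a model never
        # changes its runtime generation behavior.
        handle_generation_config(model)
        # save_pretrained writes generation_config.json alongside the weights
        model.save_pretrained(output_dir)
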
@hshen14 (Contributor) commented Feb 12, 2026:

Add a TODO with a link to huggingface/transformers#43937. Once Transformers has a fix, we can remove the workaround.
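
A sketch of the requested marker at the call site (comment wording assumed, not from the PR):

    # TODO: workaround for huggingface/transformers#43937; remove once fixed upstream
    handle_generation_config(model)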

     model = _to_model_dtype(model, model_dtype)

     return model, tokenizer
@@ -477,6 +478,7 @@ def mllm_load_model(

     model = model.eval()
     check_and_mark_quantized_module(model)
+    handle_generation_config(model)

Copilot AI commented Feb 12, 2026:

Same concern as in llm_load_model: mutating generation settings during model load can unexpectedly change runtime generation behavior. If the intent is specifically to avoid GenerationConfig validation errors on save, prefer applying this right before saving rather than at load time.

Suggested change (delete the added call):
    handle_generation_config(model)

     model = _to_model_dtype(model, model_dtype)

     return model, processor, tokenizer, image_processor
@@ -1549,3 +1551,14 @@ def is_separate_tensor(model: torch.nn.Module, tensor_name: str) -> bool:
         return True
     else:
         return False


+def handle_generation_config(model):
+    if hasattr(model, "generation_config"):
+        generation_config = model.generation_config
+        if hasattr(generation_config, "top_p") and generation_config.top_p != 1.0:
+            model.generation_config.do_sample = True
+        if hasattr(generation_config, "top_k") and generation_config.top_k != 0:
+            model.generation_config.do_sample = True
+        if hasattr(generation_config, "temperature") and generation_config.temperature != 1.0:
+            model.generation_config.do_sample = True
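
For context, a minimal illustration (an assumed reproduction, not from the PR) of the failure mode this helper appears to target: transformers validates a GenerationConfig when it is saved, and sampling parameters combined with do_sample=False produce validation warnings that, depending on the transformers version, can make save_pretrained refuse to write the config.

    from transformers import GenerationConfig

    cfg = GenerationConfig(do_sample=False, temperature=0.6, top_p=0.9)
    # Depending on the transformers version, this emits validation warnings or raises
    # because the config mixes sampling parameters with do_sample=False.
    cfg.save_pretrained("/tmp/generation_config_demo")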

Copilot AI commented Feb 12, 2026:

If the intent is to prevent GenerationConfig validation/saving failures caused by inconsistent sampling settings, this handling looks incomplete: Transformers' validation can also consider other sampling-related fields (e.g., typical_p, min_p, epsilon_cutoff, eta_cutoff, etc.). With the current implementation, save/validate can still fail when those are set away from defaults while do_sample remains False. Consider expanding the normalization condition to cover all sampling parameters that require do_sample=True.

Suggested change (extend the checks):
            model.generation_config.do_sample = True
        # Additional sampling-related parameters that also imply do_sample=True
        if hasattr(generation_config, "typical_p") and generation_config.typical_p is not None:
            model.generation_config.do_sample = True
        if hasattr(generation_config, "min_p") and generation_config.min_p is not None:
            model.generation_config.do_sample = True
        if hasattr(generation_config, "epsilon_cutoff") and generation_config.epsilon_cutoff is not None:
            model.generation_config.do_sample = True
        if hasattr(generation_config, "eta_cutoff") and generation_config.eta_cutoff is not None:
            model.generation_config.do_sample = True

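One way to generalize the normalization without hard-coding every parameter (a sketch only; the field list and the choice to compare against library defaults are assumptions, not auto-round's implementation): read the defaults from a fresh GenerationConfig and enable sampling whenever any sampling-related field deviates from them.

    from transformers import GenerationConfig

    # Illustrative list of sampling-related fields on transformers' GenerationConfig.
    _SAMPLING_FIELDS = (
        "temperature", "top_p", "top_k", "typical_p",
        "min_p", "epsilon_cutoff", "eta_cutoff",
    )

    def needs_sampling(generation_config) -> bool:
        defaults = GenerationConfig()
        return any(
            hasattr(generation_config, name)
            and getattr(generation_config, name) != getattr(defaults, name, None)
            for name in _SAMPLING_FIELDS
        )

Note that this treats the library default for top_k (50) as non-sampling, which differs from the PR's top_k != 0 check.
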
Comment on lines +1559 to +1564

Copilot AI commented Feb 12, 2026:

This repeats both the attribute checks and the assignment to model.generation_config.do_sample. Since generation_config is already a local variable, it would be clearer to set generation_config.do_sample once based on a combined condition (e.g., compute a boolean like needs_sampling = ... and then assign once). This reduces duplication and makes it easier to extend the list of parameters consistently.

Suggested change (replace the three per-parameter checks with a single combined condition):
        needs_sampling = (
            (hasattr(generation_config, "top_p") and generation_config.top_p != 1.0)
            or (hasattr(generation_config, "top_k") and generation_config.top_k != 0)
            or (
                hasattr(generation_config, "temperature")
                and generation_config.temperature != 1.0
            )
        )
        if needs_sampling:
            generation_config.do_sample = True
