[On Policy Distillation] resolve log prob dimension mismatch in on-policy distillation with CP > 1#1135

Open
Yuchen-Cao wants to merge 3 commits into THUDM:main from Yuchen-Cao:fix_cp_on_policy_distillation

Conversation

@Yuchen-Cao Yuchen-Cao commented Dec 17, 2025

This PR addresses a dimension mismatch issue in on_policy_distillation when Context Parallel (CP) is enabled (cp_size > 1).

Problem:
Previously, the implementation did not slice the teacher's log probs according to the Context Parallel rank. The student's log probs were CP-sliced, but the teacher's remained unsliced, so the student and teacher tensors had mismatched shapes during loss computation.

Solution:
I have updated the logic to apply the same CP slicing to the teacher's log probs, so they align with the student's log probs. This fixes the dimension error and enables proper support for on-policy distillation with Context Parallelism.
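To illustrate the idea, here is a minimal sketch of per-rank CP slicing of a log-prob tensor. The function name `slice_log_probs_for_cp` is hypothetical (slime's actual helper may differ), and it assumes the Megatron-style load-balanced layout in which the sequence is split into `2 * cp_size` chunks and rank `r` owns chunks `r` and `2 * cp_size - 1 - r`:

```python
import torch


def slice_log_probs_for_cp(log_probs: torch.Tensor, cp_rank: int, cp_size: int) -> torch.Tensor:
    """Return the slice of a full-sequence log-prob tensor owned by this CP rank.

    Hypothetical sketch: assumes the Megatron-style zigzag layout where the
    sequence dimension is split into 2 * cp_size equal chunks and rank r is
    assigned chunks r and (2 * cp_size - 1 - r) for load balancing.
    """
    seq_len = log_probs.shape[0]
    chunk = seq_len // (2 * cp_size)
    # First chunk owned by this rank.
    first = log_probs[cp_rank * chunk : (cp_rank + 1) * chunk]
    # Mirrored chunk from the other end of the sequence.
    second_idx = 2 * cp_size - 1 - cp_rank
    second = log_probs[second_idx * chunk : (second_idx + 1) * chunk]
    return torch.cat([first, second], dim=0)
```

Applying the same slicing function to both the teacher's and the student's log probs guarantees that the subtraction in the advantage computation sees matching shapes on every CP rank.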

Comment thread on slime/backends/megatron_utils/loss.py (outdated):

```diff
 advantages = [
-    teacher_log_prob - student_log_prob
+    t_log_prob[-response_length:] - s_log_prob[-response_length:]
     for teacher_log_prob, student_log_prob in zip(teacher_log_probs, student_log_probs, strict=False)
 ]
```
Collaborator


I think we shouldn't slice the logp with response_length, because the current logp is already CP sliced?

Author


Oh yes!

That was just a leftover from earlier attempts to match the sizes between the two tensors. I'll change this line back.

Author


Done

Collaborator

@yitianlian yitianlian left a comment


LGTM! Sorry for the late reply :(.

@yitianlian
Collaborator

Maybe you should run pre-commit to fix the formatting errors.
