[On Policy Distillation] resolve log prob dimension mismatch in on-policy distillation with CP > 1 #1135
Open
Yuchen-Cao wants to merge 3 commits into THUDM:main
Conversation
yitianlian reviewed on Dec 17, 2025
```diff
 advantages = [
-    teacher_log_prob - student_log_prob
-    for teacher_log_prob, student_log_prob in zip(teacher_log_probs, student_log_probs, strict=False)
+    t_log_prob[-response_length:] - s_log_prob[-response_length:]
```
Collaborator
I think we shouldn't slice the logp with response_length, because the current logp is already CP-sliced?
Author
Oh yes!
That was left over from some earlier attempts to match the sizes between them. I'll change this line back.
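To illustrate the reviewer's point, here is a minimal sketch (all shapes and numbers are hypothetical; only `response_length` and the `t_log_prob`/`s_log_prob` names come from the diff above) of why slicing with `response_length` goes wrong once the log probs are already CP-sliced:

```python
import torch

# Illustrative shapes only: with context parallelism, each rank holds a
# 1/cp_size slice of the sequence dimension.
cp_size, seq_len, response_length = 2, 4096, 1024
local_len = seq_len // cp_size  # 2048 log-prob entries per CP rank

s_log_prob = torch.randn(local_len)  # student log probs, already CP-sliced
t_log_prob = torch.randn(local_len)  # teacher log probs, once CP-sliced too

# Wrong under CP: [-response_length:] indexes into the *local* slice, so each
# rank would pick out different absolute token positions.
# advantage = t_log_prob[-response_length:] - s_log_prob[-response_length:]

# Once both tensors cover the same local token span, plain subtraction works.
advantage = t_log_prob - s_log_prob
```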
yitianlian approved these changes on Dec 24, 2025
Collaborator
yitianlian left a comment
LGTM! Sorry for the late reply :(.
Collaborator
Maybe you should run pre-commit to fix the format error.
This PR addresses a dimension mismatch issue in on_policy_distillation when Context Parallel (CP) is enabled (cp_size > 1).
Problem:
Previously, the implementation did not slice the teacher's log probs according to the Context Parallel rank: while the student's log probs were CP-sliced, the teacher's remained full-length. This inconsistency caused a shape mismatch error between the student and teacher tensors during loss computation.
Solution:
I have updated the logic to apply CP slicing to the teacher's log probs so that they align with the student's. This fixes the dimension error and enables proper support for on-policy distillation with Context Parallelism.
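As a rough sketch of the idea (the `cp_slice` helper and the contiguous chunking below are illustrative assumptions, not the repository's actual CP layout, which may interleave chunks differently):

```python
import torch

def cp_slice(log_prob: torch.Tensor, cp_rank: int, cp_size: int) -> torch.Tensor:
    """Return the chunk of `log_prob` owned by `cp_rank` under a simple
    contiguous split. Real CP implementations often use a different layout
    (e.g. zigzag interleaving), so this is a sketch, not the repo's scheme."""
    return log_prob.chunk(cp_size, dim=-1)[cp_rank]

cp_rank, cp_size = 0, 2
teacher_log_prob = torch.randn(4096)             # full-length, never CP-sliced
student_log_prob = torch.randn(4096 // cp_size)  # already CP-sliced by the model

# Apply the same CP slicing to the teacher so both tensors cover the same
# token positions on this rank.
teacher_local = cp_slice(teacher_log_prob, cp_rank, cp_size)
advantage = teacher_local - student_log_prob     # shapes now match: (2048,)
```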