Skip to content
This repository was archived by the owner on Feb 21, 2026. It is now read-only.

Fix scheduler stepping logic and checkpoint state persistence in PR#11#12

Open
Copilot wants to merge 4 commits intomainfrom
copilot/review-pr-11
Open

Fix scheduler stepping logic and checkpoint state persistence in PR#11#12
Copilot wants to merge 4 commits intomainfrom
copilot/review-pr-11

Conversation

Copy link
Contributor

Copilot AI commented Feb 6, 2026

Code review of PR#11 revealed three critical correctness issues in the optimizer/scheduler factory implementation that would cause incorrect training behavior.

Issues Fixed

1. LinearWarmupWrapper dual counter desynchronization

The wrapper maintained both _step_count and inherited last_epoch, causing warmup phase to be off by one step. Now uses only last_epoch from parent _LRScheduler.

Before:

def __init__(self, ...):
    self._step_count = 0  # Custom counter
    super().__init__(optimizer, last_epoch)  # Parent has last_epoch=-1
    
def get_lr(self):
    if self._step_count < self.warmup_steps:  # Uses custom counter
        alpha = self._step_count / self.warmup_steps

After:

def __init__(self, ...):
    # Removed self._step_count
    super().__init__(optimizer, last_epoch)
    
def get_lr(self):
    current_step = self.last_epoch + 1  # Uses parent counter consistently
    if current_step < self.warmup_steps:
        alpha = current_step / self.warmup_steps

2. EMA shadow weights lost on checkpoint resume

EMA.state_dict() existed but was never called. Training resume would reinitialize EMA from current weights, losing accumulated shadow.

  • Added scheduler and ema parameters to save_checkpoint_bundle() and load_checkpoint_into()
  • Reorganized initialization: EMA and scheduler created before checkpoint resume
  • Both save_checkpoint_bundle() call sites updated

3. Scheduler state not persisted

Similar to EMA - scheduler would restart from step 0 on resume, breaking learning rate schedule continuity.

Testing

  • All 12 unit tests pass
  • Ruff linting clean
  • CodeQL: 0 vulnerabilities

Files Modified

  • rtdetr_pose/rtdetr_pose/sched_factory.py - Fixed LinearWarmupWrapper logic
  • rtdetr_pose/tools/train_minimal.py - Added checkpoint persistence for scheduler/EMA state
Original prompt

review PR#11


✨ Let Copilot coding agent set things up for you — coding agent works faster and does higher quality work when set up for your repo.

Copilot AI and others added 2 commits February 6, 2026 09:28
Co-authored-by: thinksyncs <42225585+thinksyncs@users.noreply.github.com>
Copilot AI changed the title [WIP] Review changes in pull request #11 Fix scheduler stepping logic and checkpoint state persistence in PR#11 Feb 6, 2026
Copilot AI requested a review from thinksyncs February 6, 2026 09:33
@thinksyncs thinksyncs marked this pull request as ready for review February 6, 2026 09:40
Copilot AI review requested due to automatic review settings February 6, 2026 09:40
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR attempts to fix three critical issues identified in PR#11's optimizer and scheduler factory implementation: dual counter desynchronization in LinearWarmupWrapper, missing EMA state persistence in checkpoints, and missing scheduler state persistence in checkpoints.

Changes:

  • Added checkpoint persistence for scheduler and EMA state by extending save_checkpoint_bundle() and load_checkpoint_into() functions
  • Reorganized initialization order to create scheduler and EMA before checkpoint resume to enable state loading
  • Removed dual counter from LinearWarmupWrapper to use only the inherited last_epoch counter

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 6 comments.

File Description
rtdetr_pose/rtdetr_pose/sched_factory.py New scheduler factory module with LinearWarmupWrapper, build_scheduler(), and EMA class implementation
rtdetr_pose/tools/train_minimal.py Added scheduler/EMA parameters to checkpoint functions, integrated factory scheduler/EMA into training loop, added CLI arguments for scheduler/EMA configuration
SECURITY_SUMMARY.md Added security review summary documenting CodeQL scan results and security considerations for PR#11
PR11_REVIEW_FIXES.md Documentation describing the three issues found in PR#11 and the fixes applied

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants