Bug: RuntimeError inplace update to inference tensor in model_runner.py (repetition penalty) #403
Open
Bug Report
Description
Generating music with lm_repetition_penalty != 1.0 always fails with:
RuntimeError: Inplace update to inference tensor outside InferenceMode is not allowed.
You can make a clone to get a normal tensor before doing inplace update.
Root Cause
In acestep/third_parts/nano-vllm/nanovllm/engine/model_runner.py, self.run_model() returns a tensor created inside torch.inference_mode(). The code then does an inplace assignment on a slice of that tensor before cloning it:
logits = self.run_model(input_ids, positions, is_prefill) # inference tensor
reset_context()
# ... inside the repetition penalty block:
logits[i] = torch.where(token_mask, penalty_scores, logits[i]) # CRASH: inplace on inference tensor
# ... only later:
logits = logits.clone() # too late - clone comes AFTER the inplace write
The clone() call even has the comment "# Clone logits to avoid in-place update issues in inference mode", confirming awareness of the issue, but it is placed after the problematic line.
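The failure mode can be reproduced outside the project with a few lines of PyTorch. This is a minimal sketch, not the nano-vllm code: make_inference_logits is a hypothetical stand-in for self.run_model(), and the tensor shapes are arbitrary.

```python
import torch


def make_inference_logits() -> torch.Tensor:
    # Hypothetical stand-in for self.run_model(): the tensor is
    # allocated inside inference mode, so it is an inference tensor.
    with torch.inference_mode():
        return torch.zeros(2, 4)


logits = make_inference_logits()

# In-place write on the inference tensor, outside inference mode.
try:
    logits[0] = torch.ones(4)
    inplace_error = None
except RuntimeError as exc:
    inplace_error = str(exc)

# Cloning outside inference mode yields a normal tensor that can be
# written to in place.
safe = logits.clone()
safe[0] = torch.ones(4)

print("in-place write on inference tensor raised:", inplace_error is not None)
print("clone is an inference tensor:", safe.is_inference())
```

The order matters: the clone has to happen before the first slice assignment, which is the same ordering problem described above.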
Fix
Move logits = logits.clone() to immediately after reset_context(), before any inplace writes:
logits = self.run_model(input_ids, positions, is_prefill)
reset_context()
logits = logits.clone() # clone before any inplace writes
if self.rank == 0:
    if repetition_penalties is not None:
        for i, seq in enumerate(seqs):
            # repetition penalty logic is now safe
            logits[i] = torch.where(token_mask, penalty_scores, logits[i]) # OK
    # Remove the old misplaced clone
    for i, seq in enumerate(seqs):
        # logits processor ...
Steps to Reproduce
- Call any generation endpoint with lm_repetition_penalty set to any value other than 1.0
- 100% failure rate
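The proposed fix (clone first, then apply the penalty) can be sketched in isolation as follows. Everything here is an assumption for illustration: apply_repetition_penalty and seen_token_ids are hypothetical names, and the penalty formula (divide positive logits by the penalty, multiply negative ones) is the common convention, not necessarily what model_runner.py computes for penalty_scores.

```python
import torch


def apply_repetition_penalty(logits, seen_token_ids, penalty):
    """Penalize already-generated tokens, one logits row per sequence.

    Hypothetical helper: names and penalty formula are assumptions.
    """
    # Clone before any in-place write so that an inference tensor
    # becomes a normal tensor.
    logits = logits.clone()
    for i, token_ids in enumerate(seen_token_ids):
        # Boolean mask over the vocabulary marking tokens seen so far.
        token_mask = torch.zeros(logits.shape[1], dtype=torch.bool)
        token_mask[token_ids] = True
        row = logits[i]
        # Assumed convention: shrink positive logits, push negative ones lower.
        penalty_scores = torch.where(row > 0, row / penalty, row * penalty)
        logits[i] = torch.where(token_mask, penalty_scores, row)
    return logits


# Logits created inside inference mode, as in the failing code path.
with torch.inference_mode():
    raw_logits = torch.tensor([[2.0, -1.0, 0.5], [1.0, 3.0, -2.0]])

penalized = apply_repetition_penalty(raw_logits, [[0, 1], [2]], penalty=2.0)
print(penalized)
```

Without the clone on the first line of the helper, the slice assignment inside the loop would hit the RuntimeError quoted in the description.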
Impact
Any call with repetition penalty enabled always fails, including calls that use the default recommended value of 1.3 for avoiding token loops. The feature is completely unusable.