Bug: RuntimeError inplace update to inference tensor in model_runner.py (repetition penalty) #403

@robcqm-bot

Bug Report

Description

Generating music with lm_repetition_penalty != 1.0 always fails with:

RuntimeError: Inplace update to inference tensor outside InferenceMode is not allowed.
You can make a clone to get a normal tensor before doing inplace update.

Root Cause

In acestep/third_parts/nano-vllm/nanovllm/engine/model_runner.py, self.run_model() returns a tensor created inside torch.inference_mode(). The code then does an inplace assignment on a slice of that tensor before cloning it:

logits = self.run_model(input_ids, positions, is_prefill)  # inference tensor
reset_context()

# ... inside the repetition penalty block:
logits[i] = torch.where(token_mask, penalty_scores, logits[i])  # CRASH: inplace on inference tensor

# ... only later:
logits = logits.clone()  # too late - clone comes AFTER the inplace write

The clone() call even has the comment # Clone logits to avoid in-place update issues in inference mode, confirming awareness of the issue - but it's placed after the problematic line.

Fix

Move logits = logits.clone() to immediately after reset_context(), before any inplace writes:

logits = self.run_model(input_ids, positions, is_prefill)
reset_context()
logits = logits.clone()  # clone before any inplace writes

if self.rank == 0:
    if repetition_penalties is not None:
        for i, seq in enumerate(seqs):
            # repetition penalty logic is now safe
            logits[i] = torch.where(token_mask, penalty_scores, logits[i])  # OK

    # The old, misplaced logits.clone() that used to sit here is removed.
    for i, seq in enumerate(seqs):
        ...  # logits processor logic (unchanged)
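For illustration, the clone-first ordering can be sketched as a standalone function. Everything here is an assumption rather than the project's actual code: `apply_repetition_penalty`, `seen_token_ids`, and `penalties` are hypothetical names, and the divide-if-positive / multiply-if-negative rule is one common repetition penalty convention that may differ from what model_runner.py implements.

```python
import torch


def apply_repetition_penalty(logits, seen_token_ids, penalties):
    """Hypothetical helper: penalize already-generated tokens, row by row."""
    # Clone first so the in-place row writes below target a normal tensor.
    # This must run outside torch.inference_mode() to have that effect.
    logits = logits.clone()
    vocab_size = logits.shape[-1]
    for i, (token_ids, penalty) in enumerate(zip(seen_token_ids, penalties)):
        if penalty == 1.0 or not token_ids:
            continue  # a penalty of 1.0 (or no history) is a no-op
        token_mask = torch.zeros(vocab_size, dtype=torch.bool, device=logits.device)
        token_mask[torch.tensor(token_ids, device=logits.device)] = True
        row = logits[i]
        # Assumed convention: shrink positive logits, push negative ones lower.
        penalty_scores = torch.where(row > 0, row / penalty, row * penalty)
        logits[i] = torch.where(token_mask, penalty_scores, row)  # safe after clone
    return logits


# Simulate the model output being produced under inference mode.
with torch.inference_mode():
    raw = torch.tensor([[2.0, -1.0, 0.5], [1.0, 1.0, 1.0]])

out = apply_repetition_penalty(raw, seen_token_ids=[[0, 1], []], penalties=[2.0, 1.3])
print(out)
```

Cloning once up front costs one extra copy of the logits per step, but it keeps every later in-place write (penalties, logits processors) safe regardless of the order in which they run.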

Steps to Reproduce

  1. Call any generation endpoint with lm_repetition_penalty set to any value other than 1.0
  2. Observe the RuntimeError shown above; it occurs on every request (100% failure rate)

Impact

Any call with repetition penalty enabled (the default recommended value to avoid token loops is 1.3) always fails, so the feature is completely unusable.
