Commit 0ff742c

fixed dynamic range in case of subfunc issue and nonmatching ctx, prefill seq_len for prefill_only gpt_oss model
Signed-off-by: Onkar Chougule <ochougul@qti.qualcomm.com>
1 parent 37f3681 commit 0ff742c

2 files changed: +2 -1 lines changed

QEfficient/transformers/models/gpt_oss/modeling_gpt_oss.py

Lines changed: 1 addition & 1 deletion
@@ -663,7 +663,7 @@ def forward(
         }
         if self.sliding_window is not None:
             sliding_window_len = past_key_value.sliding_window_len
-            short_read_idx = torch.arange(sliding_window_len)
+            short_read_idx = torch.arange(past_key_value.key_cache[self.layer_idx].shape[2])
             read_idx = short_read_idx + torch.where(
                 position_ids.max() > sliding_window_len - 1, position_ids.max() - sliding_window_len + 1, 0
             )
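
For readers following the hunk above, here is a minimal pure-Python sketch of the index arithmetic it changes. It is an illustration, not the QEfficient implementation: the function name `sliding_read_indices` and its parameters are hypothetical, `cache_len` stands in for `past_key_value.key_cache[self.layer_idx].shape[2]`, and `max_position` stands in for `position_ids.max()`. How `read_idx` is consumed afterwards is not shown in the diff and is not modeled here.

```python
from typing import List


def sliding_read_indices(cache_len: int, sliding_window_len: int, max_position: int) -> List[int]:
    """Mirror the read_idx arithmetic from the hunk using plain integers.

    Hypothetical helper: cache_len plays the role of key_cache[...].shape[2],
    max_position plays the role of position_ids.max().
    """
    # The offset stays at 0 until the position runs past the window,
    # then advances by one for each additional token.
    if max_position > sliding_window_len - 1:
        offset = max_position - sliding_window_len + 1
    else:
        offset = 0
    # After the commit, the number of indices follows the cache's
    # sequence dimension rather than sliding_window_len.
    return [i + offset for i in range(cache_len)]


if __name__ == "__main__":
    print(sliding_read_indices(cache_len=4, sliding_window_len=4, max_position=2))
    print(sliding_read_indices(cache_len=4, sliding_window_len=4, max_position=5))
    print(sliding_read_indices(cache_len=6, sliding_window_len=4, max_position=5))
```

The third call is the case the commit appears to target: when the cache's sequence dimension differs from `sliding_window_len`, the index range now has the cache's length while the offset is still derived from the window length.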

QEfficient/transformers/models/modeling_auto.py

Lines changed: 1 addition & 0 deletions
@@ -2800,6 +2800,7 @@ def compile(
                     batch_size=batch_size,
                     kv_cache_batch_size=kv_cache_batch_size,
                     full_batch_size=full_batch_size,
+                    prefill_only=prefill_only,
                 )
             )
         if prefill_only is None or not prefill_only:
