Commit d856cd9

added errors for prefill-only mode
Signed-off-by: Onkar Chougule <ochougul@qti.qualcomm.com>
1 parent 74ae064 commit d856cd9

1 file changed, +10 -0 lines changed
QEfficient/transformers/models/modeling_auto.py

Lines changed: 10 additions & 0 deletions
@@ -3011,6 +3011,16 @@ def compile(
                     "KV caching requires continuous batching. Please set `full_batch_size` and "
                     "enable `continuous_batching=True` in `from_pretrained`."
                 )
+        else:
+            if self.continuous_batching:
+                if not enable_chunking:
+                    raise NotImplementedError(
+                        "Looks like you are trying to run prefix-caching without chunking, this feature is not available yet!"
+                    )
+                if not isinstance(kv_cache_batch_size, int):
+                    raise ValueError(
+                        "Please pass valid integer for kv_cache_batch_size as continuous_batching is enabled for prefill-only model"
+                    )
 
         # For supporting VLLM and Disaggregated with CCL
         if "comp_ctx_lengths_prefill" in compiler_options and "comp_ctx_lengths_decode" in compiler_options:
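
The added checks can be tried in isolation with a small standalone sketch. This is not code from the repository: the function name `validate_prefill_only_args` is hypothetical, and it assumes the new `else` branch is the prefill-only path (the condition it pairs with is outside the hunk, so this is inferred from the commit message and the error text). The argument names and error messages are copied from the diff.

```python
from typing import Optional


def validate_prefill_only_args(
    continuous_batching: bool,
    enable_chunking: bool,
    kv_cache_batch_size: Optional[int],
) -> None:
    """Hypothetical standalone mirror of the checks added in this commit.

    Assumes the caller has already determined that prefill-only mode is active.
    """
    if not continuous_batching:
        # The new checks only apply when continuous batching is enabled.
        return
    if not enable_chunking:
        raise NotImplementedError(
            "Looks like you are trying to run prefix-caching without chunking, this feature is not available yet!"
        )
    if not isinstance(kv_cache_batch_size, int):
        raise ValueError(
            "Please pass valid integer for kv_cache_batch_size as continuous_batching is enabled for prefill-only model"
        )


# Passes: continuous batching with chunking and an integer KV cache batch size.
validate_prefill_only_args(True, True, 4)

# Raises ValueError: kv_cache_batch_size is missing.
try:
    validate_prefill_only_args(True, True, None)
except ValueError as exc:
    print(f"rejected: {exc}")
```

Note that the order of the checks matters: with chunking disabled, `NotImplementedError` is raised before `kv_cache_batch_size` is inspected at all. Also, because `bool` is a subclass of `int` in Python, a value such as `True` would pass the `isinstance` check.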