Commit e8d1128

added errors for prefill-only mode
Signed-off-by: Onkar Chougule <ochougul@qti.qualcomm.com>
1 parent 1b60a5f commit e8d1128

1 file changed: +10 -0 lines changed

QEfficient/transformers/models/modeling_auto.py

Lines changed: 10 additions & 0 deletions
@@ -3069,6 +3069,16 @@ def compile(
                     "KV caching requires continuous batching. Please set `full_batch_size` and "
                     "enable `continuous_batching=True` in `from_pretrained`."
                 )
+        else:
+            if self.continuous_batching:
+                if not enable_chunking:
+                    raise NotImplementedError(
+                        "Looks like you are trying to run prefix-caching without chunking, this feature is not available yet!"
+                    )
+                if not isinstance(kv_cache_batch_size, int):
+                    raise ValueError(
+                        "Please pass valid integer for kv_cache_batch_size as continuous_batching is enabled for prefill-only model"
+                    )
 
         # if ccl_enabled is True read Compute-Context-Length lists
         if self.ccl_enabled:
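
The two checks added in the `else` branch can be exercised in isolation. The sketch below is a minimal standalone mirror of that logic, not code from QEfficient: the function names `validate_prefill_only_args` and `outcome` are hypothetical, the error messages are shortened, and it assumes the `else` branch is the prefill-only path, as the commit message suggests.

```python
from typing import Optional


def validate_prefill_only_args(
    continuous_batching: bool,
    enable_chunking: bool,
    kv_cache_batch_size: Optional[int],
) -> None:
    """Hypothetical standalone mirror of the checks added in the else branch."""
    if not continuous_batching:
        # The added checks only run when continuous batching is enabled.
        return
    if not enable_chunking:
        raise NotImplementedError(
            "prefix caching without chunking is not available yet"
        )
    if not isinstance(kv_cache_batch_size, int):
        raise ValueError(
            "kv_cache_batch_size must be an integer when continuous batching is enabled"
        )


def outcome(**kwargs) -> str:
    """Return the name of the raised exception, or "ok" if validation passes."""
    try:
        validate_prefill_only_args(**kwargs)
    except (NotImplementedError, ValueError) as exc:
        return type(exc).__name__
    return "ok"


if __name__ == "__main__":
    print(outcome(continuous_batching=True, enable_chunking=False, kv_cache_batch_size=4))
    print(outcome(continuous_batching=True, enable_chunking=True, kv_cache_batch_size=None))
    print(outcome(continuous_batching=True, enable_chunking=True, kv_cache_batch_size=4))
```

Because the chunking check runs first, a call with both problems reports `NotImplementedError` rather than `ValueError`. Note also that `bool` is a subclass of `int` in Python, so `kv_cache_batch_size=True` would pass an `isinstance(..., int)` check as written.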
