
Commit a2d0e7e

lookup, lookahead: fix crash when n_ctx not specified
Since PR ggml-org#16653 (Dec 15, 2025), the default n_ctx is 0 to enable automatic GPU memory fitting. This causes llama-lookup and llama-lookahead to crash when run without an explicit -c flag:

    GGML_ASSERT(batch.seq_id[batch.n_tokens] && "llama_batch size exceeded")

Root cause: Both examples use params.n_ctx directly for batch initialization, but params.n_ctx remains 0 even after the context is properly initialized to n_ctx_train internally.

Bug history:
- Nov 2023: lookahead.cpp created (PR ggml-org#4207) with the params.n_ctx pattern
- Dec 2023: lookup.cpp created (PR ggml-org#4484) with the same pattern
- Nov 2024: default n_ctx changed to 4096 (PR ggml-org#10136) - bug dormant
- Dec 2025: default n_ctx changed to 0 (PR ggml-org#16653) - bug activated

The bug was dormant for 2+ years because params.n_ctx defaulted to 512, then 4096. PR ggml-org#16653 changed it to 0 for GPU auto-fitting, triggering the crash.

Fix: Use llama_n_ctx(ctx) to get the actual runtime context size, matching the pattern already used elsewhere in lookup.cpp (line 72) and in speculative.cpp/speculative-simple.cpp.

Tested: llama-lookup now works without the -c flag (12.5% acceptance on Gemma-3-1B).

Note: llama-lookahead has a separate pre-existing issue with sequence initialization (n_seq_max=1 vs the W+G+1 needed) that is unrelated to this fix.
1 parent 9ac2693 commit a2d0e7e

2 files changed: 2 additions & 2 deletions


examples/lookahead/lookahead.cpp

Lines changed: 1 addition & 1 deletion
@@ -115,7 +115,7 @@ int main(int argc, char ** argv) {
     // seq_id == 0      : the current input token
     // seq_id [1, W]    : tokens from the past N - 1 Jacobi iterations
     // seq_id [W + 1, W + G] : verification n-grams
-    llama_batch batch = llama_batch_init(params.n_ctx, 0, W + G + 1);
+    llama_batch batch = llama_batch_init(llama_n_ctx(ctx), 0, W + G + 1);
 
     // target model sampling context
     struct common_sampler * smpl = common_sampler_init(model, params.sampling);

examples/lookup/lookup.cpp

Lines changed: 1 addition & 1 deletion
@@ -106,7 +106,7 @@ int main(int argc, char ** argv){
 
     std::vector<llama_token> draft;
 
-    llama_batch batch_tgt = llama_batch_init(params.n_ctx, 0, 1);
+    llama_batch batch_tgt = llama_batch_init(llama_n_ctx(ctx), 0, 1);
 
     const auto t_dec_start = ggml_time_us();
112112
