whisper : validate vocab size and per-token length when loading model (#3674) #3767
Open
achyutbenz19 wants to merge 1 commit into
Conversation
whisper_model_load reads n_vocab (int32) and per-token length (uint32) directly from the model file with no bounds check. A malformed or fuzzed model (e.g. an 8-byte AFL++ finding) can set these to values that cause std::vector::resize to throw bad_alloc, which is uncaught and terminates the process with SIGABRT (signal 6) before any error is reported. Cap n_vocab at 2^20 tokens (real models top out around 52k) and each per-token length at 2^16 bytes. On violation, log a clear error message and return false so whisper_init_from_file_with_params_no_state can fail gracefully. Fixes ggml-org#3674
Summary
Fixes #3674.
`whisper_model_load` reads `n_vocab` (int32_t) and the per-token length (uint32_t) directly from the model file using `read_safe`, which reads raw bytes without any bounds check. A malformed or fuzzed model file (the reporter found one with AFL++ at 8 bytes) can set these values to e.g. `999999999` and `0xFFFFFFFF`, which then feed into `std::vector::resize`. The allocation fails with `std::bad_alloc`, nothing catches it, and the process terminates via SIGABRT before any error is reported to the caller.
Scope of the change
`src/whisper.cpp`, `whisper_model_load` vocab block, +19/-5. Add two constexpr upper bounds:
- `max_n_vocab = 1 << 20` (1,048,576). The largest real Whisper models use ~52,000 tokens, so a million is generous.
- `max_word_len = 1 << 16` (65,536). Real vocab entries are typically a few bytes of BPE.
After reading `n_vocab`, reject values outside `[0, max_n_vocab]` with a clear log line and return `false`. Inside the per-token loop, after reading `len`, reject values greater than `max_word_len` with a clear log line and return `false`. Returning `false` is the documented failure path from `whisper_model_load`; the caller (`whisper_init_from_file_with_params_no_state`) already handles it and emits `failed to load model` to stderr.
Reproduction
Craft an 8-byte-ish malformed model file with a huge `n_vocab`. On current master (`166c20b`), loading it terminates the process with SIGABRT. With this patch, the load returns cleanly with an error: no SIGABRT, no hang.
Differential matrix
model ∈ {base, small}, fixture ∈ {speech-en, speech-ru, long-en-70s, zero-1.2s-16k}: 8 cells per build, all valid-model runs (no targeted cells; malformed-model handling is tested separately above). Every valid-model transcription is byte-identical before and after the patch. The bounds are wide enough that no real model can hit them.
What this does not do
It does not change `read_safe` itself. That remains a raw read; the bounds live at the caller, where semantic context is available.
Tools used
`git`, `cmake`, `whisper-cli`, and `audiokit` for the differential matrix on valid models.
Disclosure
I am an AI assistant (Anthropic's Claude) helping a user contribute this fix. The numbers above come from actual runs against commit `166c20b` on an Apple Silicon Mac. The malformed-model fixture and regression config are available; happy to share.