Fix integer overflow DoS vulnerability in tokenization (#835) #839
Open
anivar wants to merge 1 commit into mozilla-ai:main from
Conversation
Fixes mozilla-ai#835

When an extremely large prompt (>2^31 characters) is sent to the llamafile server, the tokenization function experiences integer overflow, crashing with std::length_error and terminating the entire server process.

Root cause: in llamafile/llama.cpp line 50, text.size() (a 64-bit size_t) was added to a small value and assigned to an int (32-bit), overflowing when text.size() exceeded INT_MAX.

Fix: added a bounds check before the addition to prevent overflow. If the input text is too large, we now throw std::length_error with the same error message that llama.cpp naturally throws, which the worker exception handler will catch and log. This matches the behavior of standalone llama.cpp, which has internal bounds checks in std::vector and returns a controlled 500 error rather than crashing the process.

Security impact: prevents a remote unauthenticated DoS attack in which an attacker could crash the llamafile server by sending an oversized prompt.
Force-pushed from d05e8ce to 8eca66e
Fixes #835
This PR addresses a remote DoS vulnerability where attackers can crash the llamafile server by sending requests with extremely large prompts.
The Problem
When someone sends a prompt with more than 2.1 billion characters, the server tries to allocate memory for tokenization. The code adds the text length to a small number and stores the result in a 32-bit integer, but the text length is a 64-bit value. When the text is too large, the addition overflows: the value wraps around to negative, the vector allocation fails with std::length_error, and the entire process crashes.
The bug is in llamafile/llama.cpp line 50:
The Fix
Check the text size before doing the math. If it's too large, throw an exception that gets caught by the existing error handler instead of letting the overflow happen:
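(The actual diff is not visible on this page. A minimal sketch of the guard, assuming the pattern above; the helper names and the exact error message here are illustrative, not taken from the PR:)

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <stdexcept>

// Returns true when text_size + n_extra still fits in a 32-bit int.
bool fits_in_int(size_t text_size, int n_extra) {
    return text_size <= static_cast<size_t>(INT_MAX) - static_cast<size_t>(n_extra);
}

// Guard applied before the addition. The thrown std::length_error is the
// same type llama.cpp's std::vector would throw, so the worker's existing
// exception handler catches it instead of the process dying.
int checked_token_count(size_t text_size, int n_extra) {
    if (!fits_in_int(text_size, n_extra))
        throw std::length_error("prompt too large to tokenize"); // message illustrative
    return static_cast<int>(text_size) + n_extra;
}
```

With the check in place, an oversized prompt produces a catchable exception before any arithmetic can wrap.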
The worker's exception handler (worker.cpp:122) already catches these exceptions and logs them, so the server stays up instead of crashing.
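(A sketch of that catch-and-log pattern, based only on the description above; this is illustrative, not the actual worker.cpp code:)

```cpp
#include <climits>
#include <cstddef>
#include <iostream>
#include <stdexcept>

// Simplified stand-in for a request handler whose tokenization step may
// throw std::length_error on an oversized prompt.
int handle_request(size_t prompt_size) {
    try {
        if (prompt_size > static_cast<size_t>(INT_MAX) - 2)
            throw std::length_error("prompt too large to tokenize");
        return 200; // request handled normally
    } catch (const std::exception &e) {
        // Log and return an error status; the process keeps serving.
        std::cerr << "worker caught exception: " << e.what() << '\n';
        return 500;
    }
}
```

The key property is that the exception is absorbed per-request: one bad prompt yields one error response rather than taking down the server.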
Impact
This closes a remote DoS vector. An attacker can no longer crash the server just by sending an oversized prompt. The fix makes llamafile behave like standalone llama.cpp, which handles this case gracefully.