Fix integer overflow DoS vulnerability in tokenization (#835) #839
Open
anivar wants to merge 1 commit into mozilla-ai:main from
Conversation
Fixes mozilla-ai#835

When an extremely large prompt (>2^31 characters) is sent to the llamafile server, the tokenization function experiences integer overflow, crashing with std::length_error and terminating the entire server process.

Root cause: in llamafile/llama.cpp line 50, text.size() (a 64-bit size_t) was added to a small value and assigned to an int (32-bit), overflowing when text.size() exceeded INT_MAX.

Fix: added a bounds check before the addition to prevent overflow. If the input text is too large, we now throw std::length_error with the same error message that llama.cpp naturally throws, which the worker exception handler will catch and log. This matches the behavior of standalone llama.cpp, which has internal bounds checks in std::vector and returns a controlled 500 error rather than crashing the process.

Security impact: prevents a remote unauthenticated DoS attack in which an attacker could crash the llamafile server by sending an oversized prompt.
Force-pushed from d05e8ce to 8eca66e
Fixes #835
This PR addresses a remote DoS vulnerability where attackers can crash the llamafile server by sending requests with extremely large prompts.
The Problem
When someone sends a prompt with more than 2.1 billion characters, the server tries to allocate memory for tokenization. The code adds the text length to a small number and stores the result in a 32-bit integer, but the text length is a 64-bit value. When the text is too large, the addition overflows: the value wraps around to negative, the vector allocation fails with std::length_error, and the entire process crashes.
The bug is in llamafile/llama.cpp line 50:
The Fix
Check the text size before doing the math. If it's too large, throw an exception that gets caught by the existing error handler instead of letting the overflow happen:
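(The actual diff is not visible on this page. A minimal sketch of the guard, assuming the pattern above; the helper names and the exact error message here are illustrative, not taken from the PR:)

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <stdexcept>

// Returns true when text_size + n_extra still fits in a 32-bit int.
bool fits_in_int(size_t text_size, int n_extra) {
    return text_size <= static_cast<size_t>(INT_MAX) - static_cast<size_t>(n_extra);
}

// Guard applied before the addition. The thrown std::length_error is the
// same type llama.cpp's std::vector would throw, so the worker's existing
// exception handler catches it instead of the process dying.
int checked_token_count(size_t text_size, int n_extra) {
    if (!fits_in_int(text_size, n_extra))
        throw std::length_error("prompt too large to tokenize"); // message illustrative
    return static_cast<int>(text_size) + n_extra;
}
```

With the check in place, an oversized prompt produces a catchable exception before any arithmetic can wrap.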
The worker's exception handler (worker.cpp:122) already catches these exceptions and logs them, so the server stays up instead of crashing.
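(A sketch of that catch-and-log pattern, based only on the description above; this is illustrative, not the actual worker.cpp code:)

```cpp
#include <climits>
#include <cstddef>
#include <iostream>
#include <stdexcept>

// Simplified stand-in for a request handler whose tokenization step may
// throw std::length_error on an oversized prompt.
int handle_request(size_t prompt_size) {
    try {
        if (prompt_size > static_cast<size_t>(INT_MAX) - 2)
            throw std::length_error("prompt too large to tokenize");
        return 200; // request handled normally
    } catch (const std::exception &e) {
        // Log and return an error status; the process keeps serving.
        std::cerr << "worker caught exception: " << e.what() << '\n';
        return 500;
    }
}
```

The key property is that the exception is absorbed per-request: one bad prompt yields one error response rather than taking down the server.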
Impact
This closes a remote DoS vector. An attacker can no longer crash the server just by sending an oversized prompt. The fix makes llamafile behave like standalone llama.cpp, which handles this case gracefully.