Commit d05e8ce

Fix integer overflow DoS vulnerability in tokenization
Fixes #835

When an extremely large prompt (more than 2^31 characters) was sent to the llamafile server, the tokenization function experienced integer overflow, crashing with std::length_error and terminating the entire server process.

Root cause: In llamafile/llama.cpp line 50, text.size() (a size_t, i.e. uint64) was added to a small value and assigned to an int (int32), overflowing when text.size() exceeded INT_MAX.

Fix: Added a bounds check before the addition to prevent overflow. If the input text is too large, we now throw std::length_error with the same error message that llama.cpp naturally throws, which the worker exception handler catches and logs. This matches the behavior of standalone llama.cpp, whose internal std::vector bounds checks produce a controlled 500 error rather than crashing the process.

Security impact: Prevents a remote unauthenticated DoS attack in which an attacker could crash the llamafile server by sending an oversized prompt.
1 parent 78a2261 commit d05e8ce

File tree

1 file changed: 8 additions, 0 deletions

llamafile/llama.cpp

Lines changed: 8 additions & 0 deletions
@@ -18,6 +18,8 @@
 #include "llama.h"
 #include "llama.cpp/llama.h"
 #include <cassert>
+#include <climits>
+#include <stdexcept>
 #include <string>
 #include <vector>

@@ -47,6 +49,12 @@ std::string llamafile_token_to_piece(const llama_context *ctx, llama_token token
 std::vector<llama_token> llamafile_tokenize(const struct llama_model *model,
                                             const std::string_view &text, bool add_special,
                                             bool parse_special) {
+    // Prevent integer overflow: ensure text.size() + 2 fits in an int.
+    // INT_MAX is typically 2147483647, so check before the addition.
+    if (text.size() > static_cast<size_t>(INT_MAX) - 2) {
+        throw std::length_error("cannot create std::vector larger than max_size()");
+    }
+
     int n_tokens = text.size() + 2 * add_special;
     std::vector<llama_token> result(n_tokens);
     n_tokens = llama_tokenize(model, text.data(), text.size(), result.data(), result.size(),
