
UPSTREAM PR #21405: vendor : update cpp-httplib to 0.40.1 (#1331)

Open
loci-dev wants to merge 1 commit into main from loci/pr-21405-cpp-httplib-041

Conversation

@loci-dev loci-dev commented Apr 4, 2026

Note

Source pull request: ggml-org/llama.cpp#21405

Overview

Additional information

Requirements


loci-review bot commented Apr 4, 2026

Overview

A single commit updates the vendored cpp-httplib library (0.40.0 → 0.40.1). Analysis covers 125,669 functions across 15 binaries: 146 modified (0.12%), 186 new (0.15%), 93 removed (0.07%), 125,244 unchanged (99.66%).

Power consumption changes (all <0.2%):

  • build.bin.llama-bench: +0.185%
  • build.bin.llama-cvector-generator: +0.176%
  • build.bin.llama-tts: +0.092%
  • All other binaries (libllama.so, libmtmd.so, llama-tokenize, llama-quantize, llama-qwen2vl-cli, llama-gemma3-cli, llama-gguf-split, llama-llava-cli, llama-minicpmv-cli, libggml.so, libggml-cpu.so, libggml-base.so): 0.000%

Key finding: Major HTTP/WebSocket client initialization improvements (85-88% faster) with minor regressions in non-critical utility functions.

Function Analysis

Major Improvements (HTTP Client Initialization):

httplib::Client::Client() (llama-bench, llama-tts, llama-cvector-generator):

  • Response time: 61,787-61,825 ns → 7,131-7,165 ns (-88.4% to -88.5%, ~54,000 ns savings)
  • Throughput time: 591-609 ns → 379 ns (-36% to -38%)
  • Source change: Replaced regex-based URL parsing (std::regex_match, 43,000 ns compilation + 7,400 ns matching) with structured detail::parse_url() (1,733 ns) using UrlComponents. Eliminated expensive regex compilation from constructor hot path.

httplib::WebSocketClient::WebSocketClient() (llama-bench, llama-tts, llama-cvector-generator):

  • Response time: 62,761-62,819 ns → 8,893-8,945 ns (-85.7% to -85.8%, ~54,000 ns savings)
  • Throughput time: 629-638 ns → 442-444 ns (-29.7% to -30.4%)
  • Source change: Same optimization pattern—replaced regex validation with structured parsing, applied move semantics for host/path assignments.

STL Template Improvements (llama-tts):

  • std::vector<jinja::token>::begin(): 265 ns → 84 ns (-68.2%)
  • nlohmann::json::get(): 243 ns → 61 ns throughput (-75.0%)
  • __gnu_cxx::__ops::__pred_iter(): 269 ns → 93 ns throughput (-65.6%)
  • Cause: cpp-httplib eliminated std::any and std::regex, reducing template bloat and enabling better compiler optimizations for all STL templates.

Minor Regressions (Non-Critical Paths):

std::make_error_code() (llama-bench):

  • Response time: 145 ns → 333 ns (+128.7%, +187 ns)
  • Throughput time: 109 ns → 296 ns (+171.5%, +187 ns)
  • Cause: Compiler-generated entry block indirection (9 blocks → 11 blocks). Error handling path, not in hot path.

serialize_parser_variant() (llama-cvector-generator):

  • Throughput time: 62 ns → 182 ns (+193.3%, +120 ns)
  • Cause: Unnecessary entry block indirection from compiler code generation. Called once per grammar compilation, not per token.

httplib::ClientImpl::stop() (llama-bench):

  • Throughput time: 130 ns → 308 ns (+137.3%, +178 ns)
  • Cause: Entry point indirect branching. Client cleanup function, infrequently called.

Other analyzed functions (std::swap, std::function::operator=, SSEClient::set_headers, Jinja template utilities) showed minor regressions (65-86 ns) in non-critical paths due to compiler code generation changes.

Flame Graph Comparison

Selected function: httplib::Client::Client() (llama-tts) — best illustrates the 88% response time improvement from regex elimination.

Base version:

Base version flame graph

Target version:

Target version flame graph

The base version is dominated by regex compilation (_M_compile: 42,990 ns, 68% of total) and matching (7,374 ns). The target version eliminates all regex operations, replacing them with the lightweight parse_url (1,730 ns, 24% of total). Call depth drops from 10 to 4 levels, for an 8.7x speedup.

Additional Findings

Zero impact on core inference operations: All changes isolated to HTTP/networking infrastructure. No modifications to performance-critical paths: matrix operations (GEMM), attention mechanisms, KV cache, quantization kernels, or GPU backends (CUDA, Metal, HIP, Vulkan, SYCL). Core inference libraries (libllama.so, libggml.so, libggml-cpu.so) show 0.000% power consumption change.

Architectural alignment: Update follows llama.cpp's philosophy of using specialized implementations over generic libraries. HTTP client initialization improvements benefit model downloading, API communication, and benchmarking infrastructure without touching inference engine.

💬 Questions? Tag @loci-dev

@loci-dev loci-dev force-pushed the main branch 8 times, most recently from 34734bc to 55afbee on April 11, 2026 02:17
@loci-dev loci-dev force-pushed the main branch 9 times, most recently from d101579 to 63ab8d1 on April 18, 2026 02:17
@loci-dev loci-dev force-pushed the main branch 2 times, most recently from 7638ab4 to f1b46d5 on April 20, 2026 02:19