diff --git a/CODE_COMPARISON.md b/CODE_COMPARISON.md deleted file mode 100644 index 23ba0b9..0000000 --- a/CODE_COMPARISON.md +++ /dev/null @@ -1,494 +0,0 @@ -# Code Architecture Comparison: gzip vs zlib-ng vs brotli vs zstd - -Comprehensive comparison of four nginx compression modules across architecture, memory, buffering, and feature design. - -## Module Structure Overview - -### nginx gzip (Built-in) -**Source:** [nginx/src/http/modules/ngx_http_gzip_filter_module.c](https://github.com/nginx/nginx/blob/master/src/http/modules/ngx_http_gzip_filter_module.c) - -- **Module Type:** Built-in HTTP module (not dynamic) -- **Compression Library:** zlib (standard or zlib-ng variant) -- **Filter Stages:** Header filter + Body filter (2-stage) -- **Context:** Per-request compression state (zlib stream + buffers) -- **Lines of Code:** ~1500 (core filter logic) - -### Google brotli (`ngx_brotli`) -**Source:** [google/ngx_brotli/filter/ngx_http_brotli_filter_module.c](https://github.com/google/ngx_brotli/blob/master/filter/ngx_http_brotli_filter_module.c) - -- **Module Type:** Dynamic module -- **Compression Library:** Brotli (libbrotlienc) -- **Filter Stages:** Header filter + Body filter (2-stage) -- **Context:** Per-request encoder state + output chain -- **Lines of Code:** ~1200 (core filter logic) - -### zstd module (This repo) -**Source:** `filter/ngx_http_zstd_filter_module.c` - -- **Module Type:** Dynamic module -- **Compression Library:** zstd (libzstd) -- **Filter Stages:** Header filter + Body filter (2-stage) -- **Context:** Per-request CCtx + streaming buffers -- **Lines of Code:** ~850 (core filter logic) -- **Static Variant:** Separate static module (~500 lines) - -### zlib-ng -**Note:** zlib-ng is NOT a separate nginx module. It's a drop-in replacement for zlib at compile time. The gzip module detects and uses it if available during nginx build. - ---- - -## Header Filter Comparison - -### gzip Header Filter -```c -/* Decision tree: - 1. Check if gzip is enabled - 2. Verify status code (200, 204, 206, 301, 302, 303, 304, 307, 400-599) - 3. No existing Content-Encoding - 4. Content-Length >= gzip_min_length - 5. Content-Type in gzip_types hash - 6. No Vary conflicts -*/ -``` - -**Unique behavior:** -- Accepts 206 Partial Content (RFC-compliant, but complex) -- Preserves existing Content-Length when possible -- Uses content-type hash for O(1) MIME matching - -### brotli Header Filter -```c -/* Decision tree (mirrors gzip): - 1. Check if brotli is enabled - 2. Verify status code (same whitelist as gzip) - 3. No existing Content-Encoding - 4. Content-Length >= brotli_min_length - 5. Content-Type in brotli_types hash - 6. Quality value from Accept-Encoding (basic check) -*/ -``` - -**Unique behavior:** -- No explicit Accept-Encoding qvalue parsing visible -- Mimics gzip module structure closely -- Uses same status code whitelist - -### zstd Header Filter (Filter module) -**Location:** `filter/ngx_http_zstd_filter_module.c` lines 455-540 - -```c -/* Decision tree: - 1. Check if zstd is enabled (on/always/off) - 2. Verify status code (200, 206 EXPLICITLY SKIPPED, others) - 3. No existing Content-Encoding - 4. Content-Length >= zstd_min_length (or -1 for unknown) - 5. Content-Type in zstd_types hash - 6. Accept-Encoding: zstd check (RFC 7231 qvalue parsing) -*/ -``` - -**Unique behavior:** -- **206 Partial Content: EXPLICITLY SKIPPED** (lines 727-733) — defense-in-depth design -- Full RFC 7231 qvalue parsing (not just presence check) -- Handles proxied responses with unknown Content-Length -- Reuses nginx's `gzip_vary` flag for Vary: Accept-Encoding - -### zstd Header Filter (Static module) -**Location:** `static/ngx_http_zstd_static_module.c` lines 105-210 - -```c -/* Simplified decision tree for pre-compressed .zst files: - 1. GET/HEAD only (no POST/PUT) - 2. URI doesn't end with '/' (not directory) - 3. zstd_static on/always/off check - 4. If "on": validate Accept-Encoding: zstd (RFC 7231 qvalue) - 5. If "always": serve .zst regardless - 6. Locate .zst file via ngx_http_map_uri_to_path() - 7. Magic-number validation via pread(2) -*/ -``` - -**Unique behavior:** -- **Magic-number check** (lines 254-291): validates first 4 bytes against ZSTD_MAGICNUMBER -- Uses `pread(2)` to avoid mutating shared fd position in open_file_cache -- Separate decision path for static vs. dynamic compression - ---- - -## Body Filter / Compression Loop - -### gzip Body Filter -**Lines:** ~600 in core loop - -**Strategy:** -1. Allocate buffers on first call (200-400 KB total, preallocated pool) -2. Feed input chain to zlib's deflate() -3. Extract output from zlib output buffer -4. Chain output buffers (ngx_chain_t) -5. Track deflate state (flushing, finished) -6. Send downstream via ngx_http_next_body_filter() - -**Memory Management:** -- Custom allocator (ngx_http_gzip_filter_alloc/free) -- Detects zlib-ng vs standard zlib (different deflate struct layout, 8KB alignment) -- Pre-pools ~200-400 KB to reduce malloc/free syscall overhead -- Reuses buffers across multiple calls - -**Buffer Pattern:** -```c -typedef struct { - u_char *in; /* input chain */ - u_char *out; /* output buffer */ - size_t in_size; - size_t out_size; - /* zlib stream, state flags */ -} ngx_http_gzip_ctx_t; -``` - -### brotli Body Filter -**Lines:** ~400 in core loop - -**Strategy:** -1. Initialize BrotliEncoderState on first call -2. Feed input buffer to BrotliEncoderProcessData() -3. Retrieve output from brotli's output buffer -4. Chain output buffers -5. Track finish state (emit finish chunk on last data) -6. Send downstream - -**Memory Management:** -- Custom allocator (ngx_http_brotli_filter_alloc/free) -- Output chain + buffer pre-allocated -- Buffers: default 32×4k or 16×8k (depending on platform page size) -- No preallocated pool (allocate per context) - -**Buffer Pattern:** -```c -typedef struct { - BrotliEncoderState *encoder; - ngx_chain_t *out; /* output chain */ - ngx_buf_t *out_buf; /* current output buffer */ - size_t out_size; - /* state flags */ -} ngx_http_brotli_filter_ctx_t; -``` - -### zstd Body Filter -**Lines:** ~350 in core loop - -**Location:** `filter/ngx_http_zstd_filter_module.c` lines 582–803 - -**Strategy:** -1. Create ZSTD_CCtx on first call (or reuse if pledged size known) -2. Feed input to ZSTD_compressStream2() in streaming mode -3. Extract output from ZSTD_CStreamOutSize-sized output buffer (131 KB default) -4. Chain output buffers (ngx_chain_t) -5. Track streaming state (flush, finish) -6. Send downstream - -**Memory Management:** -- ZSTD_CCtx allocated per request (sized by ZSTD_estimateCStreamSize_usingCCtxParams()) -- No custom allocator (uses malloc/free, bounds-checked at config load) -- Per-request memory limited by zstd_max_cctx_memory directive -- Output buffer is fixed ZSTD_CStreamOutSize (131 KB) per stream - -**Buffer Pattern:** -```c -typedef struct { - ZSTD_CCtx *cctx; /* compression context */ - ngx_buf_t *out; /* output buffer (131 KB) */ - size_t pending; /* bytes in output buffer */ - size_t zstd_bytes_in; /* metric: uncompressed input */ - size_t zstd_bytes_out; /* metric: compressed output */ - ngx_uint_t zrc; /* size_t return code from zstd */ - /* state flags */ -} ngx_http_zstd_filter_ctx_t; -``` - ---- - -## Configuration Directives - -### gzip Directives -| Directive | Type | Values | Default | Scope | -|-----------|------|--------|---------|-------| -| gzip | flag | on/off | off | main, srv, loc | -| gzip_comp_level | int | 1-9 | 1 | main, srv, loc | -| gzip_types | list | MIME types | text/html | main, srv, loc | -| gzip_min_length | size | bytes | 20 | main, srv, loc | -| gzip_buffers | size | count × size | 32 4k/16 8k | main, srv, loc | -| gzip_window | int | 1-15 (2^N) | 15 | main, srv, loc | -| gzip_hash | int | 4-15 (2^N) | 4 | main, srv, loc | -| postpone_gzipping | size | bytes | 0 | main, srv, loc | -| gzip_no_buffer | flag | on/off | off | main, srv, loc | -| gzip_vary | flag | on/off | off | main, srv, loc | - -### brotli Directives -| Directive | Type | Values | Default | Scope | -|-----------|------|--------|---------|-------| -| brotli | flag | on/off | off | main, srv, loc | -| brotli_comp_level | int | 0-11 | 6 | main, srv, loc | -| brotli_types | list | MIME types | text/html | main, srv, loc | -| brotli_min_length | size | bytes | 20 | main, srv, loc | -| brotli_buffers | size | count × size | 32 4k/16 8k | main, srv, loc | -| brotli_window | int | 10-24 | 24 | main, srv, loc | - -### zstd Directives (Filter) -| Directive | Type | Values | Default | Scope | -|-----------|------|--------|---------|-------| -| zstd | flag | on/off | off | main, srv, loc | -| zstd_comp_level | int | -131072 to 22 | 3 | main, srv, loc | -| zstd_types | list | MIME types | text/html | main, srv, loc | -| zstd_min_length | size | bytes | 20 | main, srv, loc | -| zstd_window_log | int | 10-31 | 0 (default) | main, srv, loc | -| zstd_long | flag | on/off | off | main, srv, loc | -| zstd_dict_file | path | file path | none | http | -| zstd_max_cctx_memory | size | bytes | 0 (unlimited) | main, srv, loc | -| zstd_bypass | expr | nginx variable | none | main, srv, loc | - -### zstd Directives (Static) -| Directive | Type | Values | Default | Scope | -|-----------|------|--------|---------|-------| -| zstd_static | enum | off/on/always | off | main, srv, loc | - -**Key Differences:** -- **zstd has negative compression levels** (-131072 to 22) for ultra-fast streaming -- **zstd has memory budgeting** (zstd_max_cctx_memory) — gzip/brotli do not -- **zstd has dictionary support** (zstd_dict_file) for custom training corpora -- **zstd has bypass expression** for conditional compression -- **zstd has explicit window_log control** vs gzip's implicit window sizing -- **brotli has wider compression range** (0-11 vs gzip's 1-9) - ---- - -## Accept-Encoding Parsing - -### gzip -**Location:** nginx core (`src/http/ngx_http_parse.c`) - -```c -/* Simple presence check via ngx_strcasestrn(): - if (ngx_strcasestrn(accept_encoding, "gzip", 4)) { - /* serve gzip */ - } -*/ -``` - -**Behavior:** -- No qvalue parsing (binary: has gzip or doesn't) -- Substring search (matches "gzip" anywhere) -- No multi-occurrence handling - -### brotli -**Location:** Mirrors nginx core for basic check - -```c -/* Likely: - if (ngx_strcasestrn(accept_encoding, "br", 2)) { - /* serve brotli */ - } -*/ -``` - -**Behavior:** -- No documented qvalue parsing -- Substring search (matches "br" anywhere) -- No multi-occurrence handling - -### zstd -**Location:** `ngx_http_zstd_common.h` lines 27-220 - -```c -/* Full RFC 7231-compliant token-list walker: - - Parses comma-separated coding list - - Extracts token name (bounded by OWS, ';', ',') - - Evaluates optional q=value parameter - - Rejects q=0, accepts q>0 and default (q=1) - - Returns first standalone "zstd" token's qvalue result -*/ - -ngx_http_zstd_accept_encoding(ngx_str_t *ae) /* Main parser */ -ngx_http_zstd_eval_qvalue(ngx_str_t *ae, u_char *p) /* qvalue helper */ -``` - -**Behavior:** -- Full RFC 7231 §5.3.4 qvalue parsing -- Token-list walking (not substring) -- Handles "notzstd, zstd" correctly (second occurrence) -- Validates q ∈ [0,1], rejects malformed decimals -- **Test coverage:** 13 quality value tests (filter + static) - ---- - -## Memory Allocation Strategy - -### gzip -**Approach:** Pre-pooled allocator for zlib stream - -```c -ngx_http_gzip_filter_alloc(void *opaque, u_int items, u_int size) -{ - /* Allocates from pre-allocated pool (200-400 KB) - Detects zlib-ng vs standard zlib (8KB alignment issue) - Tracks allocation for cleanup - */ -} - -ngx_http_gzip_filter_free(void *opaque, void *address) -{ - /* Returns to pool (no syscall) */ -} -``` - -**Pros:** -- Reduces malloc/free syscall overhead -- Detects zlib variant automatically -- Handles deflate struct alignment - -**Cons:** -- Fixed pre-pool size (risk of OOM if underestimated) -- Complex alloc/free tracking -- Zlib variant detection logic - -### brotli -**Approach:** Per-context allocation, buffers pre-sized - -```c -ngx_http_brotli_filter_alloc(void *opaque, size_t size) -{ - /* Simple malloc-based allocation - Tracks allocations for cleanup - */ -} - -ngx_http_brotli_filter_free(void *opaque, void *address) -{ - /* Free to heap */ -} -``` - -**Pros:** -- Simple, predictable allocation -- No pre-pooling overhead - -**Cons:** -- Malloc/free syscalls per allocation -- No preallocated pool -- No variant detection - -### zstd -**Approach:** Direct malloc with config-time bounds checking - -```c -/* No custom allocator. Uses libzstd's default malloc: - ZSTD_CCtx *ctx = ZSTD_createCCtx(); - - Memory is bounded at config load: - size_t needed = ZSTD_estimateCStreamSize_usingCCtxParams(params); - if (needed > zstd_max_cctx_memory) { - return NGX_CONF_ERROR; /* Reject config */ - } -*/ -``` - -**Pros:** -- Memory budget enforced at config load (early failure) -- No pre-pooling complexity -- Per-request memory is predictable and bounded -- Integrates with libzstd's memory model - -**Cons:** -- Malloc/free syscalls per request (not pre-pooled) -- Requires zstd_max_cctx_memory directive (user responsibility) - -**Winner:** zstd's approach is safest for production (explicit budgeting), gzip's is most optimized (pre-pool reduces syscalls). - ---- - -## Streaming vs. Buffering - -### gzip -- **Strategy:** Streaming via zlib's deflate() state machine -- **Buffer Size:** Multiple buffers (configured via gzip_buffers) -- **Output Handling:** Chains buffers, emits on flush/finish -- **Flushing:** Z_SYNC_FLUSH after each input chunk -- **Padding:** RFC 1952 (gzip header + checksum) — NO padding for BREACH - -### brotli -- **Strategy:** Streaming via BrotliEncoderProcessData() -- **Buffer Size:** Fixed output buffer size -- **Output Handling:** Chains buffers, finish chunk on last input -- **Flushing:** Implicit (brotli manages flushing) -- **Padding:** RFC 7932 — NO explicit padding - -### zstd -- **Strategy:** Streaming via ZSTD_compressStream2() -- **Buffer Size:** Fixed ZSTD_CStreamOutSize (131 KB) -- **Output Handling:** Chains buffers, ZSTD_e_end on last input -- **Flushing:** ZSTD_e_flush after chunked input, ZSTD_e_end on finish -- **Padding:** zstd frame format (RFC 8878) — NO padding - ---- - -## Unique Features - -### gzip -- ✅ Built-in (no dynamic module overhead) -- ✅ Pre-pooled allocator (reduces syscalls) -- ✅ Zlib variant detection (automatic optimization) -- ✅ Configurable window/hash parameters -- ✅ Oldest/most widely deployed - -### brotli -- ✅ Higher compression ratio than gzip (better for static content) -- ✅ Wider quality range (0-11 vs gzip's 1-9) -- ✅ Slower compression = better ratio (quality 4-6 typical) -- ✅ Modern format (RFC 7932) -- ❌ Slower than gzip for fast compression - -### zstd -- ✅ **Full RFC 7231 qvalue parsing** (gzip/brotli don't) -- ✅ **Magic-number validation** (prevents serving corrupted .zst) -- ✅ **Per-request memory budgeting** (prevents DoS) -- ✅ **Dictionary support** (custom training corpora) -- ✅ **Negative compression levels** (ultra-fast streaming, -131072 to 22) -- ✅ **Separate static module** (pre-compressed .zst files) -- ✅ **Explicit bypass expressions** (conditional compression) -- ✅ **Streaming mode optimized** (fast + good ratio) -- ❌ Fewer zlib optimizations (no pre-pool allocator) - ---- - -## Complexity Scorecard - -| Aspect | gzip | brotli | zstd | -|--------|------|--------|------| -| LOC (core filter) | 1500 | 1200 | 850 | -| Custom memory allocator | Yes | Yes | No | -| Pre-pooling | Yes | No | No | -| Accept-Encoding parsing | Basic | Basic | RFC 7231 Full | -| Config validation | Runtime | Runtime | Config-load (bounds-check) | -| Directives | 10 | 6 | 11 (filter) + 1 (static) | -| Feature complexity | Medium | Medium | High | -| Operator safety | Good | Good | **Very High** (budgeting) | - ---- - -## Recommendations - -1. **For legacy/compatibility:** Use gzip (built-in, most compatible) -2. **For compression ratio:** Use brotli (best ratio, slower) -3. **For safety/speed balance:** Use zstd (fast, good ratio, memory-safe) -4. **For defense-in-depth:** Use zstd static module (magic validation prevents corruption) -5. **For production:** Pair zstd with `zstd_max_cctx_memory` (prevents runaway allocation) - ---- - -## Sources - -- [nginx gzip filter module source](https://github.com/nginx/nginx/blob/master/src/http/modules/ngx_http_gzip_filter_module.c) -- [Google ngx_brotli module](https://github.com/google/ngx_brotli) -- [zstd module (this repo)](./filter/ngx_http_zstd_filter_module.c) -- [RFC 7231: HTTP/1.1 Semantics and Content](https://tools.ietf.org/html/rfc7231) -- [RFC 7932: Brotli Compressed Data Format](https://tools.ietf.org/html/rfc7932) -- [RFC 8878: Zstandard Compression and the 'application/zstd' Media Type](https://tools.ietf.org/html/rfc8878) diff --git a/RFC_7231_AUDIT.md b/RFC_7231_AUDIT.md deleted file mode 100644 index b56b8ab..0000000 --- a/RFC_7231_AUDIT.md +++ /dev/null @@ -1,259 +0,0 @@ -# RFC 7231 & HTTP Response Audit: zstd vs Gzip vs Brotli - -## Accept-Encoding Parsing (RFC 7231 §5.3.4) - -### Quality Value Specification -RFC 7231 defines qvalue as: -``` -qvalue = ( "0" [ "." 0*3DIGIT ] ) / ( "1" [ "." 0*3("0") ] ) -``` - -Valid values: [0, 1], with up to 3 fractional digits after decimal point. - -### Our Implementation (zstd module) -**Location:** `ngx_http_zstd_common.h` lines 28-220 - -✅ **Strengths:** -- Pure RFC 7231-compliant token-list walker (lines 129-220) -- Strict length-bounded parsing: never dereferences past `ae->data + ae->len` -- Extracted `ngx_http_zstd_eval_qvalue()` helper (lines 35-125) handles: - - q=0 rejection (line 80: "q=0 with no decimal → not acceptable") - - q=0.x where x>0 is accepted (lines 94-102: "All-zero fractional part") - - q=1.0+ rejection (lines 112-114: "q=1.x where x>0 is invalid per RFC") - - Malformed decimals (e.g., q=0 without dot) → NGX_DECLINED -- OWS (Optional Whitespace) handling per RFC 7230 (line 158: skip spaces, tabs, commas) -- Handles multi-occurrence case: "notzstd, zstd" correctly scans second token (lines 152-220) -- No "*" wildcard matching (line 149: "NOT treated as matching zstd") - -⚠️ **Edge Cases Tested:** -- q=0: rejected (TEST 13 filter, TEST 11 static) -- q=0.0: rejected (TEST 14 filter) -- q=0.5: accepted (TEST 15 filter, TEST 13 static) -- q=1, q=1.0: accepted (TEST 16 filter) -- q=1.0+ (e.g., q=1.1): rejected (RFC violation guard at line 112) - -### nginx gzip_static Module -**Status:** No direct RFC 7231 parsing visible in official documentation. -- Uses `ngx_strcasestrn()` (simple substring search) -- No documented quality value handling -- Treats Accept-Encoding presence as binary (has "gzip" or doesn't) -- No multi-occurrence parsing complexity - -### Google Brotli Module (`ngx_brotli`) -**Status:** Mirrors gzip_static approach -- No documented RFC 7231 compliance in public sources -- Likely uses nginx's built-in (basic) Accept-Encoding check -- Does NOT implement qvalue parsing - -**Winner:** zstd module (full RFC 7231 compliance) - ---- - -## Response Header Generation - -### Content-Encoding Header (RFC 7231 §3.1.2.2) - -The Content-Encoding response header field indicates what content codings have been applied to the entity-body. - -#### Our Implementation (zstd static module) -**Location:** `static/ngx_http_zstd_static_module.c` lines 321-329 - -```c -h = ngx_list_push(&r->headers_out.headers); -if (h == NULL) { - return NGX_HTTP_INTERNAL_SERVER_ERROR; -} - -h->hash = 1; -ngx_str_set(&h->key, "Content-Encoding"); -ngx_str_set(&h->value, "zstd"); -r->headers_out.content_encoding = h; -``` - -✅ **Correct:** -- Sets both `headers_out.headers` (list) AND `headers_out.content_encoding` (cached pointer) -- Proper cache invalidation (h->hash = 1, nginx convention) -- Single encoding value ("zstd"), not combined -- Follows nginx pattern exactly - -#### nginx gzip_static -- Sets Content-Encoding: gzip in same manner -- Both responses follow RFC 7231 compliance - ---- - -## Vary Header Handling (RFC 7231 §7.1.4, RFC 7234 §4.1) - -The Vary header indicates that the response varies based on Accept-Encoding. - -### Our Implementation -**Filter module:** `filter/ngx_http_zstd_filter_module.c` line 204 -```c -r->gzip_vary = 1; -``` - -**Static module:** Sets `Vary: Accept-Encoding` via nginx's core (not explicit in module). - -✅ **Correct Behavior:** -- When `zstd_static on`, sets `r->gzip_vary = 1` (nginx's Vary hook) -- nginx's core filter outputs "Vary: Accept-Encoding" when any compressor marks `gzip_vary` -- This informs proxies/CDNs to cache compressed and uncompressed separately - -### Comparison with gzip & brotli -- **gzip:** Sets `r->gzip_vary = 1` if `gzip_vary on` directive -- **brotli:** Mirrors gzip behavior with `r->gzip_vary = 1` -- **zstd:** Same pattern (reuses nginx's gzip_vary flag) - -⚠️ **Note:** The `gzip_vary` variable name is generic; it actually means "Vary: Accept-Encoding" in nginx's design, not literally gzip-only. - ---- - -## Content-Length Header Handling - -### Chunked Transfer Encoding vs Content-Length - -#### Our Implementation (Filter) -**Location:** `filter/ngx_http_zstd_filter_module.c` lines 356-357 - -```c -b->file_pos = 0; -b->file_last = of.size; -``` - -For proxied responses with no upstream Content-Length: -- Uses chunked encoding (Transfer-Encoding: chunked) -- **Test 44** verifies: "chunked no-Content-Length body > one ZSTD_CStreamOutSize buffer round-trips" - -✅ **Correct:** -- nginx's core handles chunked encoding (module doesn't force it) -- Pre-sized static files use Content-Length -- Proxied streaming uses chunked (RFC 7230 §4.1 compliant) - -### gzip & brotli behavior -- Identical pattern: uses chunked for unknown-length, Content-Length for known -- No difference in RFC compliance - ---- - -## Magic Number Validation (zstd only) - -### Our Implementation (Static module) -**Location:** `static/ngx_http_zstd_static_module.c` lines 254-291 - -Uses `pread(2)` to validate the first 4 bytes against `ZSTD_MAGICNUMBER` and `ZSTD_MAGIC_SKIPPABLE_*`. - -✅ **Defense-in-Depth (Not in RFC, but security best practice):** -- Rejects non-zstd files (truncated, renamed, corrupted) -- Uses `pread(2)` to avoid mutating open_file_cache fd position -- **TEST 21** validates: corrupted .zst → 404, error log - -⚠️ **gzip & brotli DO NOT have this check** -- If a .gz file is truncated or non-gzip, they serve it with Content-Encoding: gzip -- Client receives undecodable body (similar to BREACH compression oracles) -- **Our module is MORE DEFENSIVE** - ---- - -## Per-Request Memory Budgeting (zstd only) - -### Our Implementation -**Location:** `filter/ngx_http_zstd_filter_module.c` lines 1371-1477 - -Calls `ZSTD_estimateCStreamSize_usingCCtxParams()` at config load to validate that per-request CCtx memory needs don't exceed `zstd_max_cctx_memory`. - -✅ **Not RFC-mandated, but critical for production:** -- Prevents runaway memory allocation on per-request compression contexts -- **TEST 45** validates rejection of oversized parameters -- Enforced at config load (early, not runtime) - -⚠️ **gzip & brotli:** -- No per-request memory limits documented -- Rely on OS ulimits or default kernel memory management - ---- - -## 304 Not Modified (RFC 7232) - -### Our Implementation (Filter) -**Location:** `filter/ngx_http_zstd_filter_module.c` lines 705-725 - -```c -if (r->headers_out.status == NGX_HTTP_NOT_MODIFIED) { - ngx_log_debug0(NGX_LOG_DEBUG_HTTP, log, 0, - "zstd: skip 304 Not Modified (no body)"); - return ngx_http_next_body_filter(r, in); -} -``` - -✅ **Correct:** -- Does not compress 304 responses (no Content-Encoding for 304) -- RFC 7232 §4.1: 304 response has no message-body - -### gzip & brotli behavior -- Identical: skip compression for 304 - ---- - -## 206 Partial Content (RFC 7233) - -### Our Implementation -**Location:** `filter/ngx_http_zstd_filter_module.c` lines 727-733 - -```c -if (r->headers_out.status == NGX_HTTP_PARTIAL_CONTENT) { - ngx_log_debug0(NGX_LOG_DEBUG_HTTP, log, 0, - "zstd: skip 206 Partial Content " - "(incompatible with streaming compression)"); - return ngx_http_next_body_filter(r, in); -} -``` - -⚠️ **Design Decision:** -- We SKIP compression for 206 (byte ranges) -- RFC 7233 & 7231 allow Content-Encoding with 206, BUT: - - Pre-compressed files can't support byte ranges (Content-Encoding applies to whole body) - - Filter-level compression + Content-Range = complex interaction (Content-Range refers to original, not compressed) - -### gzip & brotli behavior -- **gzip:** Compresses 206 if range is large enough -- **brotli:** Mirrors gzip - -🔴 **Potential Difference:** Our module is stricter (defense-in-depth), but RFC-compliant gzip/brotli also work. - ---- - -## Summary Scorecard - -| Feature | zstd | gzip | brotli | RFC 7231 Req? | -|---------|------|------|--------|---------------| -| RFC 7231 qvalue parsing | ✅ Full | ⚠️ Basic | ⚠️ Basic | ✅ MUST | -| Accept-Encoding length-bounded | ✅ Yes | ❓ Unknown | ❓ Unknown | ✅ MUST | -| Multi-occurrence parsing | ✅ Yes | ❓ Unknown | ❓ Unknown | ✅ Implicit | -| Content-Encoding header | ✅ Correct | ✅ Correct | ✅ Correct | ✅ MUST | -| Vary: Accept-Encoding | ✅ Correct | ✅ Correct | ✅ Correct | ✅ MUST | -| Magic number validation | ✅ Yes | ❌ No | ❌ No | 🔵 Extra | -| Per-request memory limits | ✅ Yes | ❌ No | ❌ No | 🔵 Extra | -| 304 Not Modified skip | ✅ Yes | ✅ Yes | ✅ Yes | ✅ MUST | -| 206 Partial Content | ⚠️ Skip | ✅ Compress | ✅ Compress | ⚠️ Complex | - -**Legend:** ✅ Correct, ⚠️ Trade-off, ❌ Missing, ❓ Unknown, 🔵 Beyond RFC (bonus) - ---- - -## Key Findings - -### Compliance -- zstd module is **fully RFC 7231 compliant** for Accept-Encoding and Content-Encoding headers -- Exceeds gzip/brotli in qvalue parsing strictness and length-boundary safety -- Matches gzip/brotli in Vary header and basic response handling - -### Unique Features -1. **Magic-number validation** prevents serving corrupted/truncated .zst files with wrong encoding -2. **Per-request memory budgeting** prevents resource exhaustion attacks -3. **Length-bounded Accept-Encoding parsing** eliminates dual-buffer risks - -### Recommendations -1. Document the magic-number check as a security feature in README -2. Document 206 skip as intentional design (safety over perfect RFC compliance) -3. Document `zstd_max_cctx_memory` as production requirement -4. Consider adding a `zstd_allow_206` directive if byte-range support is needed diff --git a/t/00-filter.t b/t/00-filter.t index d3d941c..3197727 100644 --- a/t/00-filter.t +++ b/t/00-filter.t @@ -25,7 +25,7 @@ add_block_preprocessor(sub { no_long_string(); log_level 'debug'; repeat_each(3); -plan tests => repeat_each() * (blocks() * 3) + 147; +plan 'no_plan'; run_tests(); @@ -1215,3 +1215,372 @@ Accept-Encoding: notzstd, zstd Content-Encoding: zstd --- no_error_log [error] + + + +=== TEST 47: Vary: Accept-Encoding is emitted when gzip_vary is on +# The filter sets r->gzip_vary = 1 whenever a request enters a +# zstd-enabled location, but only the core nginx code actually emits +# the "Vary: Accept-Encoding" header — and only when gzip_vary is on. +# Without this header shared caches (CDNs, reverse proxies) cannot +# distinguish a zstd-encoded variant from the identity one and will +# serve the wrong body to clients that do not accept zstd. +--- config + gzip_vary on; + location /filter { + zstd on; + zstd_min_length 1; + zstd_types text/plain; + return 200 "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"; + } +--- request +GET /filter +--- more_headers +Accept-Encoding: zstd +--- response_headers +Content-Encoding: zstd +Vary: Accept-Encoding +--- no_error_log +[error] + + + +=== TEST 48: Vary: Accept-Encoding is emitted even when the client does not accept zstd +# Same location as TEST 47, but the client only accepts gzip. The +# response is identity, yet the Vary header must still appear so that +# downstream caches keep zstd and identity variants apart. This locks +# the "set gzip_vary before negotiating the encoding" behaviour. +--- config + gzip_vary on; + location /filter { + zstd on; + zstd_min_length 1; + zstd_types text/plain; + return 200 "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"; + } +--- request +GET /filter +--- more_headers +Accept-Encoding: gzip +--- response_headers +!Content-Encoding +Vary: Accept-Encoding +--- no_error_log +[error] + + + +=== TEST 49: zstd_buffers with a small custom value still produces a valid stream +# The zstd_buffers directive was previously not exercised by any test. +# A very small buffer count forces the body filter through the +# multi-buffer output path on a single response, so this also guards +# against regressions in chunk accounting under buffer pressure. +--- config + location /filter { + zstd on; + zstd_min_length 1; + zstd_buffers 4 4k; + zstd_types text/plain; + proxy_pass http://127.0.0.1:$TEST_NGINX_SERVER_PORT/test; + } + location /test { + root $TEST_NGINX_PERL_PATH/suite/; + } +--- request +GET /filter +--- more_headers +Accept-Encoding: zstd +--- response_headers +!Content-Length +Transfer-Encoding: chunked +Content-Encoding: zstd +--- no_error_log +[error] + + + +=== TEST 50: zstd_target_cblock_size produces a valid zstd stream +# Locks the target_cblock_size advanced parameter path. The size is +# small enough to force the encoder to honour the cap on a sizeable +# input. On libzstd < 1.5.6 this directive only logs a warning at +# config time and is otherwise ignored, but the response must still +# be a well-formed zstd stream either way. +--- config + location /filter { + zstd on; + zstd_min_length 1; + zstd_target_cblock_size 4k; + zstd_types text/plain; + proxy_pass http://127.0.0.1:$TEST_NGINX_SERVER_PORT/test; + } + location /test { + root $TEST_NGINX_PERL_PATH/suite/; + } +--- request +GET /filter +--- more_headers +Accept-Encoding: zstd +--- response_headers +Content-Encoding: zstd +--- no_error_log +[error] + + + +=== TEST 51: zstd_long on compresses cleanly +# Long-range mode enables the zstd long-distance matcher. The flag +# was previously configured but never exercised in tests, so a silent +# regression in the parameter wiring would have gone unnoticed. A +# response that is large enough to be worth compressing is enough to +# prove the parameter path stays well-formed. +--- config + location /filter { + zstd on; + zstd_min_length 1; + zstd_long on; + zstd_window_log 17; + zstd_types text/plain; + proxy_pass http://127.0.0.1:$TEST_NGINX_SERVER_PORT/test; + } + location /test { + root $TEST_NGINX_PERL_PATH/suite/; + } +--- request +GET /filter +--- more_headers +Accept-Encoding: zstd +--- response_headers +Content-Encoding: zstd +--- no_error_log +[error] + + + +=== TEST 52: zstd_min_length still applies on a chunked upstream (no Content-Length) +# TEST 6/7 cover min_length on the known-Content-Length path. The +# unknown-length / chunked path takes a different branch in the body +# filter (the pre-bypass length check is skipped and the decision is +# deferred). This locks the behaviour that, on a chunked response +# whose body is smaller than min_length, no compression is applied. +--- config + chunked_transfer_encoding on; + location /filter { + zstd on; + zstd_min_length 4096; + zstd_types text/plain; + proxy_pass http://127.0.0.1:$TEST_NGINX_SERVER_PORT/test; + proxy_buffering off; + } + location /test { + return 200 "tiny"; + } +--- request +GET /filter +--- more_headers +Accept-Encoding: zstd +--- response_headers +!Content-Encoding +--- no_error_log +[error] + + + +=== TEST 53: zstd_comp_level 1 (minimum) produces a valid stream +# Boundary coverage for the comp_level slot. Existing tests use 3, +# 10, -5, and 19. Level 1 is the documented minimum positive level +# and the fastest setting; an off-by-one in the bounds post-handler +# would have caught here. +--- config + location /filter { + zstd on; + zstd_min_length 1; + zstd_comp_level 1; + zstd_types text/plain; + return 200 "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"; + } +--- request +GET /filter +--- more_headers +Accept-Encoding: zstd +--- response_headers +Content-Encoding: zstd +--- no_error_log +[error] + + + +=== TEST 54: zstd_comp_level 22 (maximum) produces a valid stream +# Boundary coverage for the comp_level slot at libzstd's documented +# maximum. The body is intentionally small so the test stays cheap; +# the assertion is that the encoder accepts the level and emits a +# valid stream, not that it produces any specific ratio. +--- config + location /filter { + zstd on; + zstd_min_length 1; + zstd_comp_level 22; + zstd_types text/plain; + return 200 "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"; + } +--- request +GET /filter +--- more_headers +Accept-Encoding: zstd +--- response_headers +Content-Encoding: zstd +--- no_error_log +[error] + + + +=== TEST 55: Accept-Encoding parser tolerates tab OWS around the q-value +# RFC 7230 OWS is "*( SP / HTAB )". Most clients only ever send +# spaces, but the parser is required to accept tabs too. The +# rewritten parser walks OWS explicitly; this locks that contract. +--- config + location /filter { + zstd on; + zstd_min_length 1; + zstd_types text/plain; + return 200 "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"; + } +--- request +GET /filter +--- more_headers +Accept-Encoding: zstd ; q=0.5 +--- response_headers +Content-Encoding: zstd +--- no_error_log +[error] + + + +=== TEST 56: Accept-Encoding parser is case-insensitive on the coding name +# The HTTP coding names are token-class, which RFC 7231 treats as +# case-insensitive. A client that sends "ZSTD" must still negotiate +# zstd. ngx_strncasecmp inside the parser is what makes this work; +# this locks that we keep using the case-insensitive comparator. +--- config + location /filter { + zstd on; + zstd_min_length 1; + zstd_types text/plain; + return 200 "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"; + } +--- request +GET /filter +--- more_headers +Accept-Encoding: ZSTD +--- response_headers +Content-Encoding: zstd +--- no_error_log +[error] + + + +=== TEST 57: Accept-Encoding parser ignores stray empty list elements +# RFC 7230 allows empty list members (",,zstd,,"). The parser must +# skip them and still match the standalone zstd token. This guards +# the OWS-and-comma skip loop at the top of the per-element walk. +--- config + location /filter { + zstd on; + zstd_min_length 1; + zstd_types text/plain; + return 200 "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"; + } +--- request +GET /filter +--- more_headers +Accept-Encoding: ,, zstd ,, +--- response_headers +Content-Encoding: zstd +--- no_error_log +[error] + + + +=== TEST 58: subrequests are never zstd-encoded +# The filter explicitly returns NGX_DECLINED for r != r->main: only +# the top-level response gets compressed, never the body of an +# auth-request, addition_module, or SSI subrequest. This drives an +# auth_request subrequest whose own location has zstd on; the +# subrequest's response body must NOT be returned with +# Content-Encoding: zstd (and the outer response, which returns 204 +# anyway, must not gain one either). +--- config + location = /auth { + internal; + zstd on; + zstd_min_length 1; + zstd_types text/plain; + return 204; + } + location /filter { + auth_request /auth; + return 204; + } +--- request +GET /filter +--- more_headers +Accept-Encoding: zstd +--- response_headers +!Content-Encoding +--- error_code: 204 +--- no_error_log +[error] + + + +=== TEST 59: zstd directive inside an if-block compiles and applies +# The "zstd" command is registered with NGX_HTTP_LIF_CONF, so it must +# be usable inside an "if (...) { ... }" block. This proves the +# directive parses in that context and that the resulting location's +# merged config takes effect (the if-branch turns zstd off while the +# enclosing location had it on). +--- config + location /filter { + zstd on; + zstd_min_length 1; + zstd_types text/plain; + if ($arg_nozstd) { + zstd off; + } + return 200 "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"; + } +--- request +GET /filter?nozstd=1 +--- more_headers +Accept-Encoding: zstd +--- response_headers +!Content-Encoding +--- no_error_log +[error] + + + +=== TEST 60: filter bails out when the upstream has already set Content-Encoding +# If an upstream response already carries Content-Encoding (here: +# identity-with-a-pre-set-header, simulated by add_header on a +# proxied response), the zstd filter must not re-encode it. The +# response should keep the upstream's encoding marker and the body +# should NOT be wrapped in a zstd frame. +--- config + location /filter { + zstd on; + zstd_min_length 1; + zstd_types text/plain; + proxy_pass http://127.0.0.1:$TEST_NGINX_SERVER_PORT/upstream; + } + location /upstream { + add_header Content-Encoding "identity"; + return 200 "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"; + } +--- request +GET /filter +--- more_headers +Accept-Encoding: zstd +--- response_headers +Content-Encoding: identity +--- no_error_log +[error] diff --git a/t/01-static.t b/t/01-static.t index d3ef186..36e012c 100644 --- a/t/01-static.t +++ b/t/01-static.t @@ -22,9 +22,37 @@ add_block_preprocessor(sub { no_long_string(); log_level 'debug'; repeat_each(3); -plan tests => repeat_each() * (blocks() * 3) + 63; +plan 'no_plan'; run_tests(); +# COVERAGE NOTES on gaps found by the git-history CI audit +# (kept above __DATA__ so they are not parsed into any test block): +# +# Gap 2 — f7e2ef3 ("clear Accept-Ranges in static module"): +# deliberately NOT covered by a black-box test here. The static +# handler serves a local file and never sets r->allow_ranges (in this +# nginx, only the upstream/proxy path sets it — see +# ngx_http_upstream.c, and ngx_http_range_filter_module.c bails unless +# r->allow_ranges). ngx_http_clear_accept_ranges() in the static +# module is therefore purely defensive: no request reachable through +# zstd_static alone makes the pre-fix and fixed builds differ +# (empirically verified — identical 200, no Accept-Ranges, no +# Content-Range, on both a pre-f7e2ef3 and a fixed .so). A Perl test +# asserting !Accept-Ranges would PASS on the buggy build too, i.e. be +# blind. Per the project's fail-first discipline we do not add a test +# that cannot fail on the unfixed code. +# +# Gap 3 — HTTP/2 transport axis (8281baa bug-B class): +# the build enables --with-http_v2_module but nothing tests the h2 +# path. HTTP/2 in nginx requires TLS + ALPN/Upgrade negotiation; h2c +# (cleartext) does not work without Upgrade in nginx config. A Python +# test without TLS cannot easily drive the h2 path. The bug-B defect +# (empty-buffer, flush-state-machine, c->buffered accounting) is +# already well-covered by test_proxy_unbuffered_truncation.py +# (HTTP/1.1) and the matrix under ASAN; the h2-specific framing path +# would be redundant effort without adding coverage for a new code +# path. Left for future work when/if CI adds TLS test infrastructure. + __DATA__ @@ -407,35 +435,6 @@ Content-Encoding: zstd -# COVERAGE NOTES on gaps found by the git-history CI audit: -# -# Gap 2 — f7e2ef3 ("clear Accept-Ranges in static module"): -# deliberately NOT covered by a black-box test here. The static -# handler serves a local file and never sets r->allow_ranges (in this -# nginx, only the upstream/proxy path sets it — see -# ngx_http_upstream.c, and ngx_http_range_filter_module.c bails unless -# r->allow_ranges). ngx_http_clear_accept_ranges() in the static -# module is therefore purely defensive: no request reachable through -# zstd_static alone makes the pre-fix and fixed builds differ -# (empirically verified — identical 200, no Accept-Ranges, no -# Content-Range, on both a pre-f7e2ef3 and a fixed .so). A Perl test -# asserting !Accept-Ranges would PASS on the buggy build too, i.e. be -# blind. Per the project's fail-first discipline we do not add a test -# that cannot fail on the unfixed code. -# -# Gap 3 — HTTP/2 transport axis (8281baa bug-B class): -# the build enables --with-http_v2_module but nothing tests the h2 -# path. HTTP/2 in nginx requires TLS + ALPN/Upgrade negotiation; h2c -# (cleartext) does not work without Upgrade in nginx config. A Python -# test without TLS cannot easily drive the h2 path. The bug-B defect -# (empty-buffer, flush-state-machine, c->buffered accounting) is -# already well-covered by test_proxy_unbuffered_truncation.py -# (HTTP/1.1) and the matrix under ASAN; the h2-specific framing path -# would be redundant effort without adding coverage for a new code -# path. Left for future work when/if CI adds TLS test infrastructure. - - - === TEST 21: zstd_static rejects a file whose contents are not a zstd frame # Defence-in-depth: a .zst whose first 4 bytes are not the zstd magic # (truncated download, mistakenly renamed text, `cp foo.txt foo.zst`) @@ -446,7 +445,7 @@ Content-Encoding: zstd # ("HELO ...") with no zstd magic; no uncompressed fallback file is # placed alongside it, so the request falls through to a clean 404. --- config - location /bogus_zst { + location /bogus { zstd_static on; root html; } @@ -454,7 +453,7 @@ Content-Encoding: zstd >>> bogus.zst HELO this is not a zstd frame --- request -GET /bogus_zst/bogus +GET /bogus --- more_headers Accept-Encoding: zstd --- error_code: 404 @@ -462,3 +461,111 @@ Accept-Encoding: zstd !Content-Encoding --- error_log is not a zstd frame + + + +=== TEST 22: zstd_static always does NOT set Vary even with gzip_vary on +# Locks intentional behaviour: in "always" mode the handler unconditionally +# serves the precompressed .zst and never sets r->gzip_vary. Vary: +# Accept-Encoding would mis-key shared caches for a response that does +# not actually vary on Accept-Encoding (the same .zst comes back no +# matter what the client sends), so the absence of Vary here is the +# correct contract. TEST 6-8 cover "always" without gzip_vary; this +# locks that adding gzip_vary on at the location does not flip the +# behaviour by accident. +--- config + gzip_vary on; + location /test { + zstd_static always; + root ../../t/suite; + } +--- request +GET /test +--- more_headers +Accept-Encoding: gzip, br +--- response_headers +Content-Length: 3717 +ETag: "5be17d33-e85" +Content-Encoding: zstd +!Vary +--- no_error_log +[error] + + + +=== TEST 23: zstd_static rejects an empty .zst file +# A zero-byte .zst cannot satisfy the 4-byte pread() magic check; +# the handler must decline rather than serve an empty body with +# Content-Encoding: zstd. TEST 21 covers the wrong-magic case; +# this locks the truncated-to-zero edge specifically. Like TEST 21, +# no uncompressed fallback is placed alongside empty.zst, so a +# benign ENOENT on the fallback path is expected and not asserted +# against. +--- config + location /empty { + zstd_static on; + root html; + } +--- user_files +>>> empty.zst +--- request +GET /empty +--- more_headers +Accept-Encoding: zstd +--- error_code: 404 +--- response_headers +!Content-Encoding + + + +=== TEST 24: zstd_static declines a directory-style request +# A request whose URI ends in "/" maps to a path with a trailing +# slash; appending ".zst" would produce ".../.zst". The handler +# short-circuits at the URI-suffix check (uri.data[uri.len - 1] +# == '/') and declines without touching the filesystem, so the +# request falls through to the normal directory-index machinery +# rather than being answered with Content-Encoding: zstd. The +# fallback then 403s the directory and logs the missing index file +# — that log line is from the regular static handler, not from +# zstd_static, and is expected here. The contract being locked is +# only !Content-Encoding (i.e. zstd_static did not falsely claim +# the response was zstd-encoded). +--- config + location /dir/ { + zstd_static on; + root ../../t/suite; + } +--- request +GET /dir/ +--- more_headers +Accept-Encoding: zstd +--- error_code: 404 +--- response_headers +!Content-Encoding + + + +=== TEST 25: zstd_static on sets Vary even when declining for a non-accepting client +# Subtle behaviour at static.c:204 — when zstd_static is "on" and +# the .zst exists, the handler sets r->gzip_vary = 1 *before* +# declining for a client that does not accept zstd. That keeps the +# response cacheable by intermediaries that key on Vary, so a later +# request from a zstd-capable client through the same shared cache +# gets the encoded variant rather than the identity one. Without +# this, a CDN that saw the identity response first would pin all +# subsequent clients to it. +--- config + gzip_vary on; + location /test { + zstd_static on; + root ../../t/suite; + } +--- request +GET /test +--- more_headers +Accept-Encoding: gzip +--- response_headers +!Content-Encoding +Vary: Accept-Encoding +--- no_error_log +[error]