Performance characteristics, benchmarks, optimizations, and deployment patterns for zigttp.
zigts outperforms QuickJS in historical benchmark runs (QuickJS as external baseline). See the zigttp-bench repository for raw results and scripts.
| Benchmark | zigts | QuickJS | Ratio |
|---|---|---|---|
| stringOps | 16.3M ops/s | 258K ops/s | 63x faster |
| objectCreate | 8.1M ops/s | 1.7M ops/s | 4.8x faster |
| propertyAccess | 13.2M ops/s | 3.4M ops/s | 3.9x faster |
| httpHandler | 1.0M ops/s | 332K ops/s | 3.1x faster |
| functionCalls | 12.4M ops/s | 5.1M ops/s | 2.4x faster |
| stringConcat | 8.3M ops/s | 6.2M ops/s | 1.3x faster |
| arrayOps | 8.7M ops/s | 6.6M ops/s | 1.3x faster |
| jsonOps | 77K ops/s | 71K ops/s | 1.1x faster |
Run Zig-native benchmarks with: zig build bench
Trivial JSON handler, hey -n 300000 -c 50 on loopback, Apple M4 Pro / macOS 26,
each runtime in its default configuration (measured 2026-05-22):
| Runtime | Req/sec | p50 latency | p99 latency |
|---|---|---|---|
| zigttp | 112,393 | 0.20 ms | 2.00 ms |
| Deno | 113,058 | 0.40 ms | 0.90 ms |
| Node.js | 86,927 | 0.60 ms | 1.20 ms |
zigttp matches Deno on throughput (within 1%) and runs roughly 30% ahead of
Node.js on this workload, although its p99 latency (2.00 ms) is the highest of
the three. The benchmark harness and raw run-to-run results live in the
zigttp-bench repository.
JIT compilation is enabled by default for hot functions (after JIT_THRESHOLD executions).
Cold start is wall time from process launch to the first complete HTTP
response, measured with zigttp serve -e <handler> on Apple M4 Pro, macOS 26,
ReleaseFast (2026-05-22):
| Measure | Value |
|---|---|
| Floor (best case) | ~3.5 ms |
| Typical (p50) | ~7-15 ms, depending on host load |
| Tail (p95 to max) | up to ~60 ms under host scheduling contention |
Cold start is variance-dominated. The floor is runtime initialization itself - roughly 3 ms to load or parse the handler, start the I/O backend, build the handler pool, and bind the listener. The spread above the floor is host scheduling jitter, not work zigttp performs: a quiet host lands near the floor, a busy host stretches the tail. Treat cold start as a distribution - cite the floor and the p50, not a single figure.
Embedding the handler bytecode at build time (zig build -Dhandler=... or
zigttp build) removes handler parsing and compilation from the floor; the
self-extracting binary loads bytecode directly from memory.
Raw measurements, the harness, and the full run-to-run distribution live in the
zigttp-bench repository.
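
To report cold start as a distribution rather than a single figure, the samples can be reduced to a floor, a p50, and a tail value. The following is a minimal sketch, not part of the zigttp-bench harness: the function names and the nearest-rank percentile method are assumptions chosen for illustration.

```typescript
// Summarize cold-start samples (milliseconds) as floor / p50 / p95.
// Hypothetical helper; not taken from the zigttp-bench harness.

interface ColdStartSummary {
  floorMs: number; // best case: runtime initialization with no scheduling jitter
  p50Ms: number;   // typical case
  p95Ms: number;   // tail, dominated by host scheduling contention
}

// Nearest-rank percentile over an ascending-sorted array.
function percentile(sorted: number[], p: number): number {
  const rank = Math.ceil((p / 100) * sorted.length);
  const index = Math.min(sorted.length - 1, Math.max(0, rank - 1));
  return sorted[index];
}

function summarizeColdStarts(samplesMs: number[]): ColdStartSummary {
  if (samplesMs.length === 0) {
    throw new Error("need at least one sample");
  }
  const sorted = samplesMs.slice().sort((a, b) => a - b);
  return {
    floorMs: sorted[0],
    p50Ms: percentile(sorted, 50),
    p95Ms: percentile(sorted, 95),
  };
}
```

Feeding it the wall times from repeated process launches gives the three numbers the table above reports; a single outlier then shows up in the tail without distorting the floor or the p50.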
HandlerPool reuses pre-warmed contexts for subsequent requests:
- O(1) pool slot acquisition via free_hint atomic
- Zero runtime parsing overhead (bytecode already compiled)
- Minimal GC overhead via hybrid arena allocation
- Predictable latency (no JIT compilation jitter)
Shape Preallocation (packages/zigts/src/context.zig:352-434): HTTP Request and Response objects use preallocated hidden class shapes, eliminating transitions. Direct slot writes via setSlot() bypass property lookup.
Shapes:
- Request: method, url, path, query, body, headers (6 props)
- Response: body, status, statusText, ok, headers (5 props)
- Response headers: content-type, content-length, cache-control (3 props)
- Request headers: authorization, content-type, accept, host, user-agent, accept-encoding, connection (7 props)
Polymorphic Inline Cache (PIC) (packages/zigts/src/interpreter.zig:259-335): 8-entry cache per property access site with last-hit optimization. O(1) monomorphic lookups, megamorphic transition after 9th distinct shape.
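
The cache behaviour described above can be modelled in a few lines. This is a simplified sketch, not the code in interpreter.zig: the class name, the numeric shape ids, and the choice to clear entries on the megamorphic transition are assumptions.

```typescript
const PIC_ENTRIES = 8; // capacity stated in the text

interface PicEntry {
  shapeId: number; // hidden-class identifier (hypothetical representation)
  slot: number;    // property slot index cached for that shape
}

class PropertySiteCache {
  private entries: PicEntry[] = [];
  private lastHit = 0;
  megamorphic = false;

  // Returns the cached slot for shapeId, or -1 on a miss.
  lookup(shapeId: number): number {
    if (this.megamorphic || this.entries.length === 0) return -1;
    // Last-hit check: a monomorphic site resolves here in one comparison.
    const hot = this.entries[this.lastHit];
    if (hot.shapeId === shapeId) return hot.slot;
    for (let i = 0; i < this.entries.length; i++) {
      if (this.entries[i].shapeId === shapeId) {
        this.lastHit = i;
        return this.entries[i].slot;
      }
    }
    return -1;
  }

  // Call after a miss. A 9th distinct shape flips the site to megamorphic.
  record(shapeId: number, slot: number): void {
    if (this.megamorphic) return;
    if (this.entries.length === PIC_ENTRIES) {
      this.megamorphic = true;
      this.entries = [];
      return;
    }
    this.entries.push({ shapeId, slot });
    this.lastHit = this.entries.length - 1;
  }
}
```

A site that only ever sees one shape never leaves the first comparison in lookup, which is the O(1) monomorphic case; the linear scan over at most eight entries is the polymorphic fallback.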
Binary Search for Large Objects (packages/zigts/src/object.zig:751, 831-835): Objects with 8+ properties use binary search on sorted property arrays (BINARY_SEARCH_THRESHOLD = 8).
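
A sketch of the threshold switch, assuming property keys are numeric atom ids kept in ascending order (the key representation and the function name are illustrative, not taken from object.zig):

```typescript
const BINARY_SEARCH_THRESHOLD = 8; // value stated in the text

// keys must be sorted ascending; returns the index of key, or -1 if absent.
function findProperty(keys: number[], key: number): number {
  if (keys.length < BINARY_SEARCH_THRESHOLD) {
    // Small objects: a linear scan is cheap and branch-predictable.
    for (let i = 0; i < keys.length; i++) {
      if (keys[i] === key) return i;
    }
    return -1;
  }
  // 8+ properties: binary search over the sorted key array.
  let lo = 0;
  let hi = keys.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >>> 1;
    if (keys[mid] === key) return mid;
    if (keys[mid] < key) lo = mid + 1;
    else hi = mid - 1;
  }
  return -1;
}
```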
JIT Baseline IC Integration (packages/zigts/src/jit/baseline.zig:1604-1765): x86-64 and ARM64 fast paths check PIC entry[0] inline, falling back to helper on miss.
JIT Object Literal Shapes (packages/zigts/src/context.zig:746-779, packages/zigts/src/jit/baseline.zig:3646-3670): Object literals with static keys use pre-compiled hidden class shapes. new_object_literal creates objects with the final hidden class directly, and set_slot writes inline without lookup. Fast path uses arena bump allocation.
Type Feedback (packages/zigts/src/type_feedback.zig): Call site and value type profiling for JIT decisions. Inlining threshold lowered to 5 calls (from 10) for faster FaaS warmup.
Lazy String Hashing (packages/zigts/src/string.zig:18-24, 44-54): Hash deferred until needed via hash_computed flag. Reduces overhead for strings never used as hash keys.
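
The deferral can be illustrated with a small model. The class below is a sketch only: the polynomial hash is a stand-in (the hash function zigts uses is not stated here), and hashComputations exists purely to make the laziness observable.

```typescript
class LazyHashedString {
  readonly value: string;
  private hash = 0;
  private hashComputed = false; // mirrors the hash_computed flag named in the text
  hashComputations = 0;         // instrumentation for this example only

  constructor(value: string) {
    this.value = value;
  }

  // Computes the hash on first use and reuses it afterwards.
  getHash(): number {
    if (!this.hashComputed) {
      this.hash = polynomialHash(this.value);
      this.hashComputed = true;
      this.hashComputations++;
    }
    return this.hash;
  }
}

// Simple 32-bit polynomial hash; a placeholder, not the zigts hash function.
function polynomialHash(text: string): number {
  let h = 0;
  for (let i = 0; i < text.length; i++) {
    h = (h * 31 + text.charCodeAt(i)) >>> 0;
  }
  return h;
}
```

A string that is only concatenated or written to a response body never pays for hashing; the cost appears once, on the first use as a key.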
Pre-interned HTTP Atoms (packages/zigts/src/object.zig:237-264): 27 common headers with O(1) lookup: content-type, content-length, accept, host, user-agent, authorization, cache-control, CORS headers, connection, accept-encoding, cookie, x-forwarded-for, x-request-id, content-encoding, transfer-encoding, vary.
HTTP String Cache (packages/zigts/src/context.zig:111-135, 462+): Pre-allocated status texts, content-type strings, and HTTP method strings.
When the compile-time contract proves a handler is pure or deterministic+read_only, the runtime caches GET/HEAD responses by request hash (method + URL including query string). Cache hits return the memoized response directly from Zig memory without acquiring a runtime or entering JS.
- Activation: automatic when contract proves the required properties. No configuration needed.
- Cache key: Wyhash of method + URL (path + query string).
- Thread safety: RwLock allows concurrent cache reads; writes take an exclusive lock.
- Eviction: FIFO with configurable capacity (default 1024 entries).
- TTL: configurable per-entry expiry (default 5 minutes). Expired entries are lazily evicted.
- Max body: responses larger than 256KB (default) are not cached.
- Visibility: cached responses carry X-Zigttp-Proof-Cache: hit header.
- Implementation: packages/runtime/src/proof_adapter.zig, integrated into packages/runtime/src/server.zig on the supported threaded HTTP path.
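
The caching policy in the list above (GET/HEAD only, FIFO eviction, lazy TTL expiry, body-size cap) can be sketched as follows. This is an illustrative model, not the proof_adapter.zig implementation: it keys on the concatenated method and URL instead of a Wyhash, omits locking, takes the clock as a parameter, and approximates body size by string length.

```typescript
interface CachedResponse {
  status: number;
  body: string;
  storedAtMs: number;
}

class ProofCacheSketch {
  private entries: Record<string, CachedResponse> = {};
  private order: string[] = []; // insertion order, oldest first (FIFO)

  constructor(
    private capacity = 1024,           // default stated in the text
    private ttlMs = 5 * 60 * 1000,     // default stated in the text
    private maxBodyBytes = 256 * 1024  // default stated in the text
  ) {}

  get(method: string, url: string, nowMs: number): CachedResponse | null {
    if (method !== "GET" && method !== "HEAD") return null;
    const key = method + " " + url;
    const hit = this.entries[key];
    if (hit === undefined) return null;
    if (nowMs - hit.storedAtMs >= this.ttlMs) {
      this.remove(key); // expired entries are evicted lazily, on access
      return null;
    }
    return hit;
  }

  put(method: string, url: string, status: number, body: string, nowMs: number): boolean {
    if (method !== "GET" && method !== "HEAD") return false;
    if (body.length > this.maxBodyBytes) return false; // too large to cache
    const key = method + " " + url;
    if (this.entries[key] === undefined) {
      if (this.order.length >= this.capacity) {
        this.remove(this.order[0]); // FIFO: drop the oldest insertion
      }
      this.order.push(key);
    }
    this.entries[key] = { status, body, storedAtMs: nowMs };
    return true;
  }

  private remove(key: string): void {
    delete this.entries[key];
    const i = this.order.indexOf(key);
    if (i >= 0) this.order.splice(i, 1);
  }
}
```

Because the query string is part of the key, /items?page=1 and /items?page=2 occupy separate entries, which matches the cache-key rule above.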
Pool Slot Hint (packages/zigts/src/pool.zig): free_hint atomic reduces slot acquisition from O(N) to O(1).
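
A single-threaded model of the hint, as a sketch: the real pool uses an atomic, and the class, field names, and probes counter here are assumptions made for illustration. Acquisition starts at the hinted index, so it inspects one slot when the hint is accurate and only falls back to scanning when it is stale.

```typescript
class SlotPool {
  private inUse: boolean[] = [];
  private freeHint = 0; // index where the next search starts
  probes = 0;           // instrumentation: slots inspected by the last acquire

  constructor(size: number) {
    for (let i = 0; i < size; i++) this.inUse.push(false);
  }

  // Returns a slot index, or -1 when the pool is exhausted.
  acquire(): number {
    const n = this.inUse.length;
    this.probes = 0;
    for (let step = 0; step < n; step++) {
      const i = (this.freeHint + step) % n;
      this.probes++;
      if (!this.inUse[i]) {
        this.inUse[i] = true;
        this.freeHint = (i + 1) % n; // the neighbour is the likeliest free slot
        return i;
      }
    }
    return -1;
  }

  release(slot: number): void {
    this.inUse[slot] = false;
    this.freeHint = slot; // the slot just released is known to be free
  }
}
```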
Relaxed Atomic Ordering (packages/runtime/src/runtime_pool.zig): in_use counter uses .monotonic ordering (metrics only, not synchronization).
LRU Static Cache (packages/runtime/src/server.zig): Doubly-linked list LRU eviction instead of clear-all, eliminating latency spikes.
Adaptive Backoff (packages/runtime/src/runtime_pool.zig): Three-phase pool contention handling:
- Phase 1: 10 spin iterations using spinLoopHint
- Phase 2: Sleep 10us-1ms with jitter (prevents thundering herd)
- Phase 3: Circuit breaker fails fast after 100 retries
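
The three phases can be expressed as a pure decision function. This is a sketch under assumptions: the exponential growth of the sleep ceiling and the uniform jitter are illustrative choices, since the text only gives the 10us-1ms range; the names are hypothetical and not taken from runtime_pool.zig.

```typescript
type BackoffAction =
  | { kind: "spin" }
  | { kind: "sleep"; micros: number }
  | { kind: "fail" };

const SPIN_ITERATIONS = 10; // Phase 1 length from the text
const MAX_RETRIES = 100;    // Phase 3 threshold from the text
const MIN_SLEEP_US = 10;    // Phase 2 lower bound from the text
const MAX_SLEEP_US = 1000;  // Phase 2 upper bound from the text

// attempt is zero-based; random must return a value in [0, 1).
function nextBackoff(attempt: number, random: () => number): BackoffAction {
  if (attempt >= MAX_RETRIES) return { kind: "fail" };   // circuit breaker
  if (attempt < SPIN_ITERATIONS) return { kind: "spin" }; // busy-wait briefly
  // Assumed curve: the ceiling doubles per attempt and is capped at 1 ms.
  const exponent = Math.min(attempt - SPIN_ITERATIONS, 7);
  const ceiling = Math.min(MAX_SLEEP_US, MIN_SLEEP_US * Math.pow(2, exponent));
  // Jitter spreads wake-ups between the minimum and the current ceiling.
  const micros = MIN_SLEEP_US + Math.floor(random() * (ceiling - MIN_SLEEP_US + 1));
  return { kind: "sleep", micros: Math.min(micros, MAX_SLEEP_US) };
}
```

Passing the random source in keeps the policy deterministic under test; a caller would loop, incrementing attempt until a slot is acquired or the function returns fail.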
Zero-Copy Response (packages/runtime/src/zruntime.zig): Borrowed mode avoids memcpy when arena lifetime is guaranteed.
When the BoolChecker can prove that both operands of an arithmetic or comparison operation are numbers, it records that fact in a NodeTypeMap. The CodeGen reads this map and emits specialized opcodes that skip runtime type dispatch:
| Specialized Opcode | Replaces | Benefit |
|---|---|---|
| add_num | add | Skips string-concatenation check |
| sub_num | sub | Skips type coercion |
| mul_num | mul | Skips type coercion |
| div_num | div | Skips type coercion |
| lt_num, gt_num, lte_num, gte_num | lt, gt, lte, gte | Skip polymorphic comparison |
| concat_2 | add (string case) | Dedicated string concatenation |
These opcodes also omit type feedback recording, providing faster cold-start execution before JIT warmup. Type-directed codegen is active in precompiled handlers (-Dhandler). Dev mode (zig build run) uses generic opcodes because BoolChecker type annotations are not wired to the dev-mode CodeGen path.
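
The selection rule can be sketched as a lookup over proven operand types. This is a simplified model, not the CodeGen implementation: the StaticType values stand in for whatever the NodeTypeMap stores, and requiring both operands to be proven strings for concat_2 is an assumption.

```typescript
type StaticType = "number" | "string" | "unknown";
type BinaryOp = "add" | "sub" | "mul" | "div" | "lt" | "gt" | "lte" | "gte";

// Picks the opcode to emit for a binary expression given proven operand types.
function selectOpcode(op: BinaryOp, left: StaticType, right: StaticType): string {
  if (left === "number" && right === "number") {
    return op + "_num"; // e.g. add -> add_num, lt -> lt_num
  }
  if (op === "add" && left === "string" && right === "string") {
    return "concat_2"; // dedicated string concatenation
  }
  return op; // generic opcode with runtime type dispatch
}
```

Anything the checker cannot prove falls through to the generic opcode, so the specialization never changes behaviour; it only removes dispatch work where the types are already known.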
Handler Precompilation (packages/tools/src/precompile.zig, build.zig): -Dhandler=<path> compiles handlers at build time. Bytecode embedded in binary, eliminating runtime parsing.
Build flow: precompile.zig compiles handler, serializes bytecode with atoms and shapes, generates packages/runtime/generated/embedded_handler.zig. Server loads via loadFromCachedBytecode().
The precompile pipeline also supports:
- -Dverify - compile-time handler verification (see verification.md)
- -Dcontract - emit contract.json with handler properties and proven capabilities
- zigttp deploy - one-command local build, proof, attestation, and self-contained runtime packaging
- -Dtest-file=tests.jsonl - run declarative handler tests at build time
- -Dreplay=traces.jsonl - replay-verify recorded traces before embedding
- -Dgenerate-tests=true - exhaustive path enumeration and fault coverage analysis
Structured concurrency (packages/zigts/src/modules/workflow/io.zig, packages/runtime/src/zruntime.zig):
parallel() and race() overlap outbound HTTP without async/await overhead.
The three-phase model (collect descriptors, dispatch to OS threads, join results)
avoids event loop machinery entirely.
Performance characteristics:
- Single fetch: inline execution, zero thread overhead
- Multiple fetches: one OS thread per descriptor, each with its own std.http.Client and I/O backend - no contention between workers
- Maximum 8 concurrent operations per call
- Thread spawn failure degrades gracefully to sequential inline execution
- The JS heap is never touched from worker threads, so no locking or write barriers are needed during the concurrent phase
The latency of a parallel() call equals the slowest fetch plus thread
spawn/join overhead (typically under 100us). For a handler making 3 API calls
at 50ms each, parallel() reduces total I/O time from ~150ms to ~50ms.
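
The arithmetic behind that estimate, as a small model. The function names are illustrative, and rejecting more than 8 fetches is a simplification of this sketch rather than documented runtime behaviour.

```typescript
const MAX_CONCURRENT = 8; // per-call limit stated in the text

// Sequential calls pay for every fetch in turn.
function sequentialLatencyMs(fetchesMs: number[]): number {
  let total = 0;
  for (let i = 0; i < fetchesMs.length; i++) total += fetchesMs[i];
  return total;
}

// parallel() pays for the slowest fetch plus thread spawn/join overhead.
// overheadMs is an input; the text puts it typically under 0.1 ms.
function parallelLatencyMs(fetchesMs: number[], overheadMs: number): number {
  if (fetchesMs.length === 0) return 0;
  if (fetchesMs.length === 1) return fetchesMs[0]; // single fetch runs inline
  if (fetchesMs.length > MAX_CONCURRENT) {
    throw new Error("this model only covers up to 8 operations per call");
  }
  let slowest = 0;
  for (let i = 0; i < fetchesMs.length; i++) {
    if (fetchesMs[i] > slowest) slowest = fetchesMs[i];
  }
  return slowest + overheadMs;
}
```

For three 50 ms calls the sequential model gives 150 ms and the parallel model gives 50 ms plus the overhead term, which is the ~150ms to ~50ms reduction quoted above. With uneven fetches the saving shrinks toward the share taken by the slowest call.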
For request-scoped workloads, zigts uses a hybrid memory model:
- Arena allocator: O(1) bulk reset between requests, zero per-object overhead
- Escape detection: Write barriers prevent arena objects from leaking to persistent storage
- GC disabled in hybrid mode: No collection pauses during request handling
# Default (0 = no limit)
./zig-out/bin/zigttp serve handler.js
# Set explicit limit (1MB)
./zig-out/bin/zigttp serve -m 1m handler.js
# Smaller limit (64KB)
./zig-out/bin/zigttp serve -m 64k handler.js

Native code emitted by the baseline and optimized JIT tiers is allocated per
CompiledFunction through packages/zigts/src/jit/alloc.zig. Lifetime follows
the function object: code is freed when the function is collected, and the
whole cache is reset on --watch hot-swap (live_reload rebuilds the handler
and discards the prior runtime pool).
There is no size-bounded LRU on the JIT cache in v0.1.0. The design assumes
short-lived processes (FaaS cold-start to N requests to teardown), where the
working set of compiled functions is bounded by the handler's static call
graph and tier promotion stops after warmup. For long-running servers with
many distinct handlers loaded in the same process, code-cache eviction is
deferred to a future release. If you observe unbounded growth, capture
zigttp doctor output and file a report - the mitigation is currently to
recycle the process.
./zigttp serve handler.js

Each instance handles one request at a time for isolation.
FROM scratch
COPY zig-out/bin/zigttp /zigttp
COPY handler.js /handler.js
EXPOSE 8080
ENTRYPOINT ["/zigttp", "serve", "-q", "-h", "0.0.0.0", "/handler.js"]

Binary size: ~4.8 MB with embedded handler (single self-contained binary, no handler file needed); the developer CLI is ~9.1 MB. Both ReleaseFast, measured 2026-05-22.
# AWS Lambda (x86-64)
zig build -Doptimize=ReleaseFast -Dtarget=x86_64-linux
# AWS Lambda (ARM64, recommended for price/performance)
zig build -Doptimize=ReleaseFast -Dtarget=aarch64-linux
# With precompiled handler (fastest cold starts)
zig build -Doptimize=ReleaseFast -Dtarget=x86_64-linux -Dhandler=handler.js

Run the Zig-native benchmark suite: zig build bench
For HTTP load testing and comparative benchmarks, see the zigttp-bench repository.
Load test with external tools:
wrk -t4 -c100 -d30s http://localhost:8080/
hey -n 10000 -c 100 http://localhost:8080/

Nine-phase plan delivered end-to-end. Baseline promoted twice during the
cycle: first after Phase 4 to capture the call_ic emission win, then
again after Phase 7 to lock the String-method inline fast path in as the
new regression floor.
Aggregate result vs pre-plan baseline (75th-percentile of 7 runs
each, zig build bench -Doptimize=ReleaseFast -- --json --quiet):
| Benchmark | Before | After | Ratio |
|---|---|---|---|
| functionCalls | 11.42M/s | 16.77M/s | 1.469x |
| stringOps | 17.93M/s | 19.56M/s | 1.091x |
| stringConcat | 26.32M/s | 27.28M/s | 1.036x |
| arrayOps | 16.06M/s | 16.14M/s | 1.005x |
| intArithmetic | 19.45M/s | 19.48M/s | 1.002x |
| propertyAccess | 17.29M/s | 17.40M/s | 1.007x |
| forOfLoop | 2.00G/s | 2.00G/s | 1.000x |
| recursion | 3,380/s | 3,365/s | 0.996x |
| httpHandlerHeavy | 1.23M/s | 1.22M/s | 0.992x |
| httpHandler | 6.96M/s | 6.83M/s | 0.981x |
| gcPressure | 8.46M/s | 8.23M/s | 0.973x |
| objectCreate | 14.48M/s | 13.98M/s | 0.965x |
| jsonOps | 3.99M/s | 3.77M/s | 0.944x |
| geomean | | | 1.035x |
Best-of-5 sampling on the same post-Phase-7 code shows stringOps at 1.111x vs the prior baseline; the 75th-percentile table above smooths the peak toward the median for a more conservative floor. The per-bench drops in objectCreate and gcPressure sit inside the ±5% noise band the microbench harness exhibits on a quiet machine; jsonOps, at -5.6%, falls just outside it. None of the three trips the bench-check gate (8% per-bench, 3% geomean).
Hardware: Apple Darwin 25.3.0 arm64. Zig 0.16.0.
Commit range: 10684c0 (pre-plan tip) through the post-Phase-7
baseline commit.
Per-phase highlights.
- Phase 1 froze the benchmark JSON at schema_version: 1 and added a snapshot test so downstream tooling can consume it.
- Phase 2 shipped scripts/bench-diff.sh and the zig build bench-check step that compares a best-of-N run against the committed baseline; the gate trips on any per-bench regression >8% or geomean regression >3%.
- Phase 3 added the drop_goto superinstruction (fires 1-3 times per compiled function via bytecode_opt.zig:tryFuseAt).
- Phase 4 turned call_ic from a stub into a fully-wired opcode: the interpreter records feedback at the call site, baseline and optimized JIT walk/emit past it, the baseline inliner blacklist no longer rejects it, and codegen emits it at non-method call sites behind enable_call_ic_emission. A compensating fusion rule folds push_const + call_ic back into push_const_call so constant-callee sites do not regress. The aggregate functionCalls +47% is almost entirely this phase.
- Phase 5 added megamorphic recovery to the PIC and aligned PIC polymorphic capacity with TypeFeedbackSite so the two layers report matching monomorphic/polymorphic/megamorphic classifications.
- Phase 6 added deopt-storm suppression in profileFunctionEntry so functions that deopt three times in a thousand invocations stop being re-promoted to the optimized tier.
- Phase 7 added an inline native fast path at the top of the .call_method handler covering String.prototype.indexOf and String.prototype.slice; it bypasses the generic doCall prologue (trace defers, guard check, arg collection loop, isCallable check) and falls through to doCall for any mismatch. That fast path is where the stringOps +9% comes from.
- Phase 8 shipped packages/runtime/src/compile_benchmark.zig and zig build compile-bench for parse+codegen ns/bytes/IR-node measurement, then used its numbers to replace three allocator.dupe calls in CodeGen.emitFunctionExpr with toOwnedSlice (same heap ownership, skips the copy) and recalibrate CodeGen.reserveCapacity from node_count * 4 down to @max(32, node_count) because the old formula was over-reserving 10-25x. Compile-bench codegen_bytes dropped 9-11% across fixtures without touching runtime numbers.
Configuration and rollback. Three perf comptime flags carry the bytecode-emission changes:
- packages/zigts/src/parser/codegen.zig:63 - enable_peephole_opt - default true
- packages/zigts/src/parser/codegen.zig:68 - enable_call_ic_emission - default true
- packages/zigts/src/interpreter.zig:345 - pic_entries_tracks_feedback - default true
Flip any one to false and rebuild to disable the corresponding change. A
master -Dperf-opts=off build option is not yet wired; it is the obvious
follow-up if we start needing to A/B flags against one another in CI.
Runtime-policy toggles documented elsewhere in this file remain env-driven
and take effect on restart.
Protected benches. scripts/bench-diff.sh exempts forOfLoop,
httpHandler, and httpHandlerHeavy from the per-bench regression check.
These are sub-millisecond microbenches whose per-run variance exceeds 5%
even on a quiet machine; they stay in the JSON report and still count
toward geomean, but a single-bench regression on them does not block a
merge. Revisit once the harness runs each bench long enough to push
per-iteration cost comfortably above timer resolution.
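
The gate logic described here (8% per-bench limit, 3% geomean limit, three exempt benches that still count toward the geomean) can be sketched as below. This is an illustrative model, not a port of scripts/bench-diff.sh; the types and function names are hypothetical.

```typescript
interface BenchResult {
  name: string;
  baseline: number; // ops/s in the committed baseline
  current: number;  // ops/s in the candidate run
}

const PER_BENCH_LIMIT = 0.08; // per-bench regression limit from the text
const GEOMEAN_LIMIT = 0.03;   // geomean regression limit from the text
const EXEMPT = ["forOfLoop", "httpHandler", "httpHandlerHeavy"];

// Geometric mean of current/baseline ratios, computed in log space.
function geomeanRatio(results: BenchResult[]): number {
  let logSum = 0;
  for (let i = 0; i < results.length; i++) {
    logSum += Math.log(results[i].current / results[i].baseline);
  }
  return Math.exp(logSum / results.length);
}

// Returns the names of everything that trips the gate; empty means pass.
function checkGate(results: BenchResult[]): string[] {
  const failures: string[] = [];
  for (let i = 0; i < results.length; i++) {
    const r = results[i];
    const ratio = r.current / r.baseline;
    const exempt = EXEMPT.indexOf(r.name) >= 0;
    if (!exempt && ratio < 1 - PER_BENCH_LIMIT) failures.push(r.name);
  }
  // Exempt benches are skipped above but still contribute to the geomean.
  if (results.length > 0 && geomeanRatio(results) < 1 - GEOMEAN_LIMIT) {
    failures.push("geomean");
  }
  return failures;
}
```

Under this model a 15% drop in httpHandler alone does not block a merge, while a uniform 4% drop across every bench does, because it passes each per-bench check yet pulls the geomean below the 3% limit.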
Open follow-ups. The Phase 7 fast-path switch covers two natives;
extending it to String.prototype.substring, charCodeAt, or whatever
surfaces next from production feedback_summary is a drop-in add. The
Phase 4 interpreter monomorphic fast path for .call_ic is also still
unbuilt; the baseline JIT already exploits monomorphic call sites via
getInlineCandidate, and the interpreter-side win is speculative until
measured against a richer corpus. The compile-bench counter-allocator
currently wraps parser + codegen together; narrowing it further by
passing distinct allocators to Parser vs CodeGen (the
parseWithCodegenAllocator hook landed in Phase 8) is a one-line swap
when the tuner wants that precision.