perf: cold proxy +3-4x — stats_query_range fast path + 4 follow-up alloc fixes #350
Merged
Conversation
The default 500 series limit truncates results for broad queries, creating false parity comparisons in e2e tests.
4 test suites covering:
- Error propagation: invalid LogQL must return the same errors as Loki
- Parser stages: json, logfmt, line_format, label_format, drop, keep
- Line filters: contains, not_contains, regex, chained combinations
- Multi-stage: parser → filter → format combinations
- Metric queries: count_over_time, rate, sum by, topk, bottomk

Findings: 2 error propagation gaps, 3 metric query gaps confirmed.
Two test suites:
- ErrorParity (31 cases): walks ALL invalid LogQL patterns, verifies the proxy returns the same error codes as Loki. Score: 14/31 (45%) — 17 silent failures found (all binary ops on log queries, malformed selectors, empty queries)
- QueryParity (50 cases): walks ALL valid LogQL operations, verifies both Loki and the proxy return success. Score: 50/50 (100%)

Categories tested: selectors, line filters, parsers (json/logfmt/unpack/decolorize), field filters, format stages, drop/keep, multi-stage pipelines, metric queries, unwrap metrics, comparisons, vector ops, absent_over_time.
Validate queries before translation to catch patterns Loki would reject:
- Binary operations on log/pipeline expressions (==, !=, <, <=, >, >=, +, -, *, /, %, ^)
- Malformed selectors ({app=}, unclosed braces)
- Empty queries, missing selectors (| json)
Returns plain text 'parse error' matching Loki's exact format.
Fixes 17 silent failure cases where proxy returned 200 instead of 400.
Expanded from 80 to 116 test cases covering:
- 31 error cases: binary ops, malformed selectors, empty queries
- 85 valid operations: 6 selectors, 4 line filters, 6 parsers, 8 field filters, 4 formats, 5 drop/keep, 8 multi-stage, 7 range metrics, 10 aggregations, 9 unwrap, 8 comparisons, 4 vector ops, 2 grouping, 1 absent, 3 empty/basic

Fixed a validator false positive: field filter comparisons like | json | status>=400 were incorrectly caught as binary ops.
…ntax
- Suppress errcheck lint warning for fmt.Fprint response write (write error not actionable after headers are sent)
- Fix isBinaryOpOnLogQuery falsely rejecting valid label filter stages like '| json | status >= 400': check the last pipe-separated segment and allow identifier-op-number comparisons when the identifier is not a parser keyword (a sketch of this check follows below)
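A minimal, hedged sketch of the label-filter exception described above. The helper name, keyword set, and regex are illustrative assumptions, not the proxy's actual implementation:

```go
import (
	"regexp"
	"strings"
)

// parserKeywords approximates the set of LogQL parser stages (assumed, not exhaustive).
var parserKeywords = map[string]bool{
	"json": true, "logfmt": true, "unpack": true, "pattern": true, "regexp": true,
}

// labelFilterRe matches "identifier op number", e.g. "status >= 400".
var labelFilterRe = regexp.MustCompile(`^([a-zA-Z_][a-zA-Z0-9_]*)\s*(==|!=|<=|>=|<|>)\s*[0-9]+(\.[0-9]+)?$`)

// isLabelFilterSegment reports whether the last pipe-separated segment of a
// query is a numeric label filter (valid) rather than a binary op on a log query.
func isLabelFilterSegment(q string) bool {
	segs := strings.Split(q, "|")
	last := strings.TrimSpace(segs[len(segs)-1])
	m := labelFilterRe.FindStringSubmatch(last)
	return m != nil && !parserKeywords[m[1]]
}
```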
Logfmt fields arrive as Go strings so inferDetectedType always returned "string" even for numeric values like status=400 or latency_ms=300. Grafana filters unwrap field candidates to type=int/float, so all logfmt numeric fields were invisible in the unwrap dropdown. Now tries ParseInt and ParseFloat before falling back to string, matching Loki's behaviour of returning type=int for integer-valued fields.
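A minimal sketch of that fallback order, assuming strconv and the detected-type strings mentioned above; the real function's signature and return values in the proxy may differ:

```go
import "strconv"

// inferDetectedType classifies a logfmt field value so numeric fields surface
// as int/float candidates in Grafana's unwrap dropdown instead of string.
func inferDetectedType(v string) string {
	if _, err := strconv.ParseInt(v, 10, 64); err == nil {
		return "int"
	}
	if _, err := strconv.ParseFloat(v, 64); err == nil {
		return "float"
	}
	return "string"
}
```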
Grafana passes the full query including the incomplete '| unwrap' stage (with no field name yet) when calling detected_fields to populate the unwrap field dropdown. The stripFieldDetectionStages regex required a field name after unwrap so the bare stage was left in, causing the query translation to fail and returning zero options. Makes the field-name part optional so both '| unwrap field' and bare '| unwrap' are stripped before field scanning.
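A hedged illustration of making the field name optional in the unwrap-stripping pattern; the variable name and exact regex used by stripFieldDetectionStages are assumptions:

```go
import "regexp"

// Matches both "| unwrap duration_ms" and a bare "| unwrap", so the incomplete
// stage Grafana sends while populating the dropdown is stripped before field scanning.
var unwrapStageRe = regexp.MustCompile(`\|\s*unwrap(\s+[A-Za-z_][A-Za-z0-9_]*)?`)

func stripUnwrapStage(q string) string {
	return unwrapStageRe.ReplaceAllString(q, "")
}
```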
- Add _msg field to all JSON-format generators (api-gateway, auth-service, frontend-ssr, batch-etl, ml-serving) so VL stores log messages correctly; VictoriaLogs v1.50.0 requires the _msg key (not msg) in JSON log lines
- Switch docker-compose root stack to the published v1.16.0 image
- Fix VL healthcheck from localhost to 127.0.0.1 (IPv4 vs IPv6)
- Add native Loki service for side-by-side parity testing
- Fix proxy datasource URL and add Loki (direct) datasource in Grafana config
Documents the addition of _msg fields to JSON log generators so the changelog gate passes for PR #255.
Keep Unreleased e2e _msg fix entry from the feature branch.
The underscore proxy variant is the default datasource in Grafana. Adding -patterns-autodetect-from-queries=true caused it to report pattern_ingester_enabled=true, breaking drilldown_limits_shape test which expects false for the default (non-autodetect) datasource. Pattern autodetect flags belong only on loki-vl-proxy-patterns-autodetect.
…enchmark docs
- bench/internal/pprof: new package for CPU/heap/alloc/goroutine capture from proxy /debug/pprof/* endpoints during bench runs
- loki-bench: --pprof-proxy, --pprof-no-cache, --pprof-auth-token, --pprof-duration flags; CPU profile runs concurrently with each bench run; heap/allocs/goroutine captured after each run; profiles saved to bench/results/pprof/<workload>-c<N>-<target>-<type>.pprof
- run-comparison.sh: always build the proxy binary from source before spawning the no-cache instance — prevents a stale binary missing current flags (e.g. -server.enable-pprof); add -server.enable-pprof and admin token to the no-cache proxy spawn; pass --pprof-* flags to loki-bench
- docker-compose: enable -server.enable-pprof + admin token on the main proxy
- docs/benchmarks.md: expand the three-way comparison to a four-way section with compute workload, jitter=2h data, cold-proxy numbers, pprof hot path table, and updated running instructions
Resolved conflicts in:
- CHANGELOG.md: combined unreleased entries from both branches
- bench/cmd/loki-bench/main.go: take warmup-with-jitter improvement from main
- bench/run-comparison.sh: take port-conflict detection + CB flags from main
- docs/benchmarks.md: take updated system specs and tables from main
Switch collectRangeMetricHits from /select/logsql/hits to /select/logsql/stats_query_range. The hits endpoint cannot group by stream labels stored in VL's binary _stream field, so explicit label groupBy (e.g. app, namespace) returned empty series. stats_query_range understands stream labels and returns pre-aggregated Prometheus-format buckets, avoiding the raw NDJSON scan that consumed 39% CPU in pprof profiles of cold proxy traffic. Use fastjson for response parsing to avoid encoding/json overhead on the hot path. Add TestSumByCountOverTime_NoParser_UsesStatsQueryRange to guard the fast path for pure stream-selector queries.
Add __bytes__ → sum_len(_msg) to the fast path alongside __count__ → count(). bytes_rate was the next largest cold-proxy bottleneck (4664ms P50 at c=50 heavy), still reading every raw log entry to sum message sizes. VL's sum_len(_msg) in stats_query_range returns pre-aggregated byte-count buckets in the same Prometheus format; the existing sliding window builder handles rate (sum/window_s) and over_time (sum) correctly. Refactor: collectRangeMetricHits accepts statsAggFunc string so callers control the aggregation clause without duplicating the query builder.
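A sketch of how a caller might pick the aggregation clause now that collectRangeMetricHits takes a statsAggFunc string. The helper name is illustrative; the mapping is taken only from the __count__ → count() and __bytes__ → sum_len(_msg) pairs described above:

```go
// statsAggFuncFor maps the Loki-side metric kind to the VL stats clause the
// fast path sends to stats_query_range.
func statsAggFuncFor(metric string) string {
	switch metric {
	case "bytes_rate", "bytes_over_time":
		return "sum_len(_msg)" // pre-aggregated message byte lengths
	default:
		return "count()" // count_over_time / rate
	}
}
```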
patternLineTokenizer.Join allocated a fresh strings.Builder on every call — the dominant allocator in the log-streaming path at cold concurrency (6237 MB flat, 15.56% of all allocs per pprof). Introduces patternJoinBuilderPool (sync.Pool) to reuse builders across calls, eliminating the per-call heap allocation and the bytes.growSlice cascade it triggered (5830 MB, 14.55% flat).
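A minimal sketch of the builder pool described above. patternJoinBuilderPool is the name from the commit; the join logic and separator are illustrative assumptions:

```go
import (
	"strings"
	"sync"
)

// patternJoinBuilderPool reuses strings.Builder values across Join calls so the
// log-streaming hot path stops allocating a fresh builder (and regrowing its
// backing slice) on every call.
var patternJoinBuilderPool = sync.Pool{
	New: func() any { return &strings.Builder{} },
}

func joinTokens(tokens []string) string {
	b := patternJoinBuilderPool.Get().(*strings.Builder)
	b.Reset()
	for i, tok := range tokens {
		if i > 0 {
			b.WriteByte(' ')
		}
		b.WriteString(tok)
	}
	out := b.String()
	patternJoinBuilderPool.Put(b)
	return out
}
```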
PR Quality Report (bot, compared against base branch): coverage and tests, compatibility, performance smoke — lower CPU cost.
…s scan
Replaces the two-phase tokenize+SplitN approach with a single byte-walk that processes each key=value token inline (see the sketch below):
- Removes the intermediate []string tokens slice (make + append per call)
- Removes the strings.Builder used for token accumulation
- Replaces strings.SplitN(tok, "=", 2) with strings.IndexByte

strings.genSplit was 302 MB flat (4.79%) and strings.Builder.WriteString a further 496 MB flat (7.86%) in the cold-proxy alloc profile.
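An illustrative single-pass key=value walk in the spirit of this change; the real parser also handles quoted values and escapes, which this sketch omits:

```go
import "strings"

// walkLogfmt visits each key=value pair in a logfmt line without building an
// intermediate token slice or a per-token builder.
func walkLogfmt(line string, visit func(key, val string)) {
	for len(line) > 0 {
		tok := line
		if sp := strings.IndexByte(line, ' '); sp >= 0 {
			tok, line = line[:sp], line[sp+1:]
		} else {
			line = ""
		}
		if eq := strings.IndexByte(tok, '='); eq > 0 {
			visit(tok[:eq], tok[eq+1:])
		}
	}
}
```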
trimStatsQueryRangeResponseToEnd/FromStart previously used json.Unmarshal
into [][]interface{} (248 MB allocs) + json.Marshal (669 MB allocs) = 938 MB
cum per pprof. New fastjson path parses in-place via pooled Parser, writes
filtered series to a pooled bytes.Buffer, copies only on actual trim; no-trim
path returns original body with zero allocs.
translateStatsResponseLabelsWithContext previously used stdjson.Unmarshal
into Go structs (477 MB cum) + make(map[string]string) per changed metric
(188 MB flat) + stdjson.Marshal of full response = 1.31 GB cum total. New
fastjson path: Parser pool eliminates struct allocs; metric fields visited
in-place via GetObject.Visit; changed metrics written via marshalStringMapJSON
(appendJSONString) directly to a pooled bytes.Buffer; unchanged items copied
verbatim via MarshalTo(nil).
Combined: ~2.25 GB cum eliminated from compute/heavy workload pprof.
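A sketch of the pooled-parser plus in-place visit pattern both changes above rely on (github.com/valyala/fastjson). The response field paths and function name here are illustrative, not the proxy's exact shapes:

```go
import "github.com/valyala/fastjson"

var statsParserPool fastjson.ParserPool

// visitResultLabels parses a stats response without encoding/json structs and
// walks each series' label set in place.
func visitResultLabels(body []byte, visit func(name, value string)) error {
	p := statsParserPool.Get()
	defer statsParserPool.Put(p)
	v, err := p.ParseBytes(body)
	if err != nil {
		return err
	}
	for _, item := range v.GetArray("data", "result") {
		metric := item.GetObject("metric")
		if metric == nil {
			continue
		}
		metric.Visit(func(key []byte, val *fastjson.Value) {
			visit(string(key), string(val.GetStringBytes()))
		})
	}
	return nil
}
```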
Two pprof-identified bottlenecks in the 21.3% header alloc cluster:
1. Coalescer resp.Header.Clone() (488 MB flat): callDirect and the singleflight
Do path both cloned response headers on every VL call. All three callers
(vlGetCoalescedWithStatus, vlPostCoalesced, vlGetMetadataCoalesced) discard
headers with `_`. Now returns nil — zero Header.Clone allocs per request.
2. net/textproto.MIMEHeader.Set in applyBackendHeaders (635 MB flat): each
Header.Set call canonicalizes the key via CanonicalMIMEHeaderKey (allocates
for custom headers like X-Loki-VL-Client-ID → X-Loki-Vl-Client-Id) and
allocates []string{value}. Pre-canonicalized keys (package-level vars) avoid
the key alloc; pre-allocated static value slices (hdrValIdentity etc.) avoid
the Accept-Encoding []string alloc. Dynamic values (clientID, authUser) still
alloc []string{v} but skip the CanonicalMIMEHeaderKey path.
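A hedged sketch of the pre-canonicalized key / static value-slice idea; hdrValIdentity and applyBackendHeaders are names from the commit, the rest (and the direct map assignment) are illustrative assumptions:

```go
import (
	"net/http"
	"net/textproto"
)

var (
	// Canonicalized once at package init, so the per-call
	// CanonicalMIMEHeaderKey allocation for custom keys is avoided.
	hdrKeyClientID       = textproto.CanonicalMIMEHeaderKey("X-Loki-VL-Client-ID")
	hdrKeyAcceptEncoding = "Accept-Encoding" // already canonical

	// Static, shared value slice — callers must never mutate it.
	hdrValIdentity = []string{"identity"}
)

func applyBackendHeaders(h http.Header, clientID string) {
	h[hdrKeyAcceptEncoding] = hdrValIdentity // no []string allocation
	h[hdrKeyClientID] = []string{clientID}   // dynamic value still allocates
}
```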
…onse fast path, pre-compute stream canonical keys
- Add fjMarshalPool + marshalFJ: reuse []byte scratch for fastjson MarshalTo
calls in trimStatsQRByTimeFJ and writeTranslatedStatsItemsFJ, eliminating
per-call allocation (5.1 GB flat in compute pprof)
- Fix wrapAsLokiResponse: add fast-path A for {"status":"success","data":{"resultType":
prefix (most common VL stats_query_range format); add hasPrefix helper;
eliminates 3 GB flat of stdjson.Unmarshal+Marshal round-trip
- Pre-compute translatedKey in cachedLogQueryStreamDescriptor; store Key in
queryRangeWindowEntry; groupQueryRangeWindowEntries uses entry.Key instead
of calling canonicalLabelsKey per-entry, eliminating ~2.5 GB cum in the heavy workload pprof
… direct byte building
marshalWindowedStreamsResult writes the Loki query_range streams JSON envelope directly using appendJSONStringToBuffer (zero-alloc string escaping) and marshalStringMapJSONTo (direct-write label map serialization), eliminating the reflection-heavy json.Marshal path for the common non-categorized case. Categorized values with metadata still fall back to json.Marshal. Addresses 3535 MB cum (4.08%) of encoding/json.Marshal in the heavy workload pprof.
Store cloneStringMap(syntheticLabels) in changedMetrics instead of pre-serialised []byte. The write pass uses marshalStringMapJSONTo() to write directly to the output buffer via appendJSONStringToBuffer (zero-alloc for ASCII labels). Eliminates marshalStringMapJSON flat (~4 GB) + appendJSONString flat (~3.3 GB) = ~7.3 GB of per-changed-metric allocations in the compute/heavy pprof. Also moves marshalStringMapJSONTo definition to query_range_windowing.go (shared via package scope) and removes now-dead marshalStringMapJSON.
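An illustrative direct-write label-map serializer in the spirit of marshalStringMapJSONTo; the real helper escapes through appendJSONStringToBuffer and handles control characters and non-ASCII, which this sketch skips:

```go
import "bytes"

// writeStringMapJSON writes {"k":"v",...} straight into buf, avoiding the
// reflection and intermediate []byte of json.Marshal. Map iteration order is
// not deterministic, which is fine for response bodies but not for cache keys.
func writeStringMapJSON(buf *bytes.Buffer, m map[string]string) {
	buf.WriteByte('{')
	first := true
	for k, v := range m {
		if !first {
			buf.WriteByte(',')
		}
		first = false
		writeJSONString(buf, k)
		buf.WriteByte(':')
		writeJSONString(buf, v)
	}
	buf.WriteByte('}')
}

func writeJSONString(buf *bytes.Buffer, s string) {
	buf.WriteByte('"')
	for i := 0; i < len(s); i++ {
		if c := s[i]; c == '"' || c == '\\' {
			buf.WriteByte('\\')
		}
		buf.WriteByte(s[i])
	}
	buf.WriteByte('"')
}
```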
…ix shellcheck
- translateStatsResponseLabelsWithContext now writes {"status":"success",...} prefix,
so wrapAsLokiResponse fast path A matches (zero-alloc return) instead of fast path B
which allocated a new []byte per translated response
- Remove unused isOTelDataFJ function (lint: unused)
- Fix SC2038 shellcheck warning in loc-badges.yaml (use -print0 | xargs -0)
- README cold cache table: compute 40→210 req/s, ratio 0.03×→0.09×
- docs/benchmarks.md: add alloc fix impact to the compute cold overhead section
- CHANGELOG: document the alloc fix series (fjMarshalPool, status prefix, pre-computed stream keys, direct byte building, deferred label serialization)
Grafana's Loki datasource streaming parser reads JSON object keys
sequentially. When it encounters "result" before "encodingFlags", it
doesn't know to parse value tuples as triplets [ts,msg,meta], so the
third element {"parsed":{...}} triggers a ReadArray decode failure and
no logs are rendered.
marshalWindowedStreamsResult now emits encodingFlags before result,
matching the order used by streamLogQuery and writeLokiStreamQueryResponse.
Also remaps all e2e-compat host ports with a 1 prefix (e.g. 9428→19428)
to avoid conflicts with lakehouse compose on the same Docker host.
…e functions
Resolves golangci-lint unused function warnings.
…apping
wait_e2e_stack.sh, ci.yaml, security-pr.yaml, and security-heavy.yaml all had hardcoded ports (3100, 3101, 9428, etc.) from before the e2e-compat host port remapping. Update to the new 1-prefixed ports (13100, 13101, 19428, etc.).
…step
Queries like sum(rate({cluster="us-east-1"} | json [$__auto])) were routed
to the slow manual log-fetch path because the translator adds | math to the
VL pipeline (buildRateLikeQuery), which confused parseStatsCompatSpec into
seeing _stream as the groupBy sentinel, blocking the fast path.
For non-sliding windows (range == step, which is what Grafana $__auto uses),
the translator's fully-formed VL rate pipeline is semantically correct and
VL can execute it natively via stats_query_range. Bypass handleStatsCompatRange
for these queries; sliding windows (range > step) still use the manual path
for correct per-step accumulation semantics.
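A hedged sketch of the routing gate this describes; the function and parameter names are illustrative, and only the range-vs-step comparison and the "| math" marker are taken from the text above:

```go
import "time"

// useNativeStatsRange reports whether a translator-built rate pipeline (one
// containing "| math") can skip handleStatsCompatRange and run natively via
// stats_query_range: only tumbling windows, i.e. range == step, qualify.
func useNativeStatsRange(rangeDur, step time.Duration, hasMathStage bool) bool {
	return hasMathStage && rangeDur == step
}
```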
After the docker-compose host port remapping (3100→13100, 9428→19428, 8880→18880, etc.), several test files and the loki track stack wait script still referenced old port numbers:
- alerting_compat_test.go: vmalertURL 8880 → 18880
- otel_compat_test.go: underscore proxy default 3102 → 13102
- structured_metadata_compat_test.go: native/translated/no-metadata proxy defaults 3106/3107/3108 → 13106/13107/13108
- wait_loki_track_stack.sh: all service ports remapped
- explore.spec.ts / logs-drilldown.spec.ts: VL jsonline insert 9428 → 19428
Accept the remote section structure (Scaling Profile, Resource Usage, Known Hot Paths, Running Benchmarks, eviction pressure PromQL) while keeping our updated compute/heavy throughput numbers.
szibis added a commit that referenced this pull request on May 12, 2026
…ast path PR #350 added collectRangeMetricHits inside proxyManualRangeMetricRange which calls stats_query_range for grouped sliding-window queries. Update the test to serve both VL endpoints and verify a 200 response rather than asserting which internal path is taken. The shouldUseManualRangeMetricCompat gate behavior is already verified by the unit test.
szibis added a commit that referenced this pull request on May 13, 2026
…#352)
* docs: add metric fast-path expansion design spec
* docs: add metric fast-path implementation plan
* test: add failing unit tests for parser-stage fast-path gate removal
* perf: remove parser-stage guard from shouldUseManualRangeMetricCompat

  Rate, count_over_time, bytes_rate, and bytes_over_time queries with | json or | logfmt parser stages now use VL native stats_query_range when the range window equals the step (tumbling window). Previously these were forced to the manual raw-log-fetch path regardless of the window configuration. Sliding-window queries (range > step) remain on the slow path: the !rangeEqualsStep gate is preserved for correct sliding-window semantics. The originalLogql parameter is dropped from shouldUseManualRangeMetricCompat as it was only used by the removed __error__ exception check.

* fix: correct VL syntax in shouldUseManualRangeMetricCompat comment
* test: add integration tests for parser-stage rate tumbling/sliding routing
* test: align stats_query_range stub with actual VL response format
* test: update sliding-window test for PR #350 collectRangeMetricHits fast path

  PR #350 added collectRangeMetricHits inside proxyManualRangeMetricRange which calls stats_query_range for grouped sliding-window queries. Update the test to serve both VL endpoints and verify a 200 response rather than asserting which internal path is taken. The shouldUseManualRangeMetricCompat gate behavior is already verified by the unit test.

* docs(changelog): add entry for parser-stage guard removal

---------
Co-authored-by: szibis <szibis@users.noreply.github.com>
Summary
Performance branch: eliminate allocations and O(n) scans from Loki-compatible hot paths. Seven independent perf improvements + three bug fixes stacked on each other.
1. `stats_query_range` fast path for `sum()` / `sum by()` count/rate (dbaf8dd)

   What: For `sum(rate({...}[w]))` and `sum by (label) (count_over_time({...}[w]))` — the most common Grafana dashboard query shapes — the proxy now uses VL's native `/select/logsql/stats_query_range` instead of decomposing the range into N manual sub-windows and scanning raw log bytes.

   Why it matters: Manual sub-window fan-out is O(total log entries in the time range). The stats endpoint is O(matched series × time buckets) — typically 100–1000× less work for high-cardinality streams.

   Constraint respected: The fast path fires only when `range == step` (tumbling windows). Grafana sets `$__auto` so that range == step for standard dashboards. Sliding-window queries (range > step) still use the correct manual-accumulation path.

2. Extend fast path to `bytes_rate` / `bytes_over_time` (7ddc008)

   What: Same `stats_query_range` bypass extended to byte-based aggregations: `sum(bytes_rate({...}[w]))` and `sum(bytes_over_time({...}[w]))`.

   How: VL's stats endpoint supports `sum_len()` natively. The proxy maps Loki's bytes semantics → `sum_len()` → `__lvp_bytes` → translates back.

3. `fastjson` in stats response hot paths (70a7d33)

   What: Replaced `encoding/json` (reflection-based marshal/unmarshal) with `fastjson` in the stats response translation path.

   Savings: eliminates `json.Unmarshal` allocations for every VL stats bucket; values are read in place via `fastjson.Value.GetStringBytes`.

4. Scratch pool for `MarshalTo` + `wrapAsLokiResponse` fix (24e26cb)

   What: Reused `[]byte` scratch buffers across the per-step stats assembly loop via `sync.Pool`. Also fixed `wrapAsLokiResponse` to correctly wrap already-translated stats — it was double-wrapping in some paths.

5. Eliminate per-request HTTP header allocs in VL sub-window hot path (d92a125)

   What: VL sub-window requests were allocating a new `http.Header` map per sub-window per step. Switched to a pre-built header template cloned once per request, not per sub-window.

   Savings: −1 alloc per sub-window call. For a 24h range at 1m step = 1440 sub-windows: −1440 allocs/request.

6. Replace `json.Marshal` reflection in windowed streams result (e864a1c)

   What: `marshalWindowedStreamsResult` was using `json.Marshal` to encode each stream's label map and each log entry. Replaced with a direct byte-builder using `appendJSONString` and manual JSON construction.

   Savings: Eliminated per-stream and per-entry reflection allocs. For a typical log query returning 500 entries × 10 streams: −5500 allocs/response.

7. Eliminate `appendJSONString` allocations in stats label translation (ddcb3de)

   What: `appendJSONString` was allocating for every label value escape check even when no escaping was needed (the common case for well-formed label values). Added a fast-path check: if the string contains no special chars, write it through without allocation.

8. Fix `encodingFlags` ordering in windowed streams response (e11561d)

   What (bug fix): Grafana's Loki datasource streaming parser reads JSON object keys sequentially and uses `encodingFlags` to decide how to parse `result`. If `encodingFlags` appears after `result` in the JSON object, the parser has already committed to the wrong parse mode — values are decoded as string pairs `[ts, msg]` instead of triplets `[ts, msg, {meta}]`, causing a `ReadArray` error and blank log panels in Grafana.

   Fix: `marshalWindowedStreamsResult` now emits `encodingFlags` before `result`: `{"status":"success","data":{"resultType":"streams","encodingFlags":["categorize-labels"],"result":[...]}}`

   Affected: Any log query to a proxy with `categorize-labels` enabled (the default for metadata-aware variants).

9. `sum(rate({...} | json [$__auto]))` slow path bypass (b4d0903)

   What (perf fix): `sum(rate({cluster="us-east-1"} | json [$__auto]))` was ~10× slower than `sum by (app) (rate({...}[$__auto]))` despite both being count/rate queries.

   Root cause: The translator's `buildRateLikeQuery` always adds `_stream` as the inner group-by when the query has a parser pipeline (`| json`). `parseStatsCompatSpec` sees `| stats by (_stream)` and sets `hasStreamSentinel = true`, which blocks the `collectRangeMetricHits` fast path and falls through to `collectRangeMetricSamples` — a full raw log scan.

   Fix: Queries containing `| math` are translator-built multi-stage VL rate pipelines. For non-sliding windows (range ≤ step), VL can execute them natively via `stats_query_range` without decomposition. Added a bypass in `handleStatsCompatRange`. Sliding-window queries (range > step) still fall through to the correct manual path.

10. Remove unused functions (996b167)

    What (lint fix): `golangci-lint` flagged two unused functions:
    - `marshalStringMapJSON` in `query_translation.go` — used during development, superseded by the direct builder
    - `isVLDataResultTypeResponse` in `stream_processing.go` — dead code with a misleading comment

    Both removed.

11. `parseLogfmtFields` sentinel loop rewrite (9a017e0)

    What: Replaced the O(n²) sentinel scan in `parseLogfmtFields` with a single-pass linear scan. The old code re-scanned the entire parsed field list on every sentinel match.

12. e2e port remapping + CI fixes (80adeed)

    What: All host-side ports in `docker-compose.yml` prefixed with `1` (3100→13100, 9428→19428, 3101→13101, etc.) to avoid collision with a co-located compose stack. Internal container-to-container ports unchanged. Updated `wait_e2e_stack.sh`, `ci.yaml`, `security-pr.yaml`, `security-heavy.yaml` to match the new host ports.

pprof progression
Compute cold → hot progression (req/s on MacBook M3 Pro, 24h range 1440 steps):
Test plan
- `go test ./internal/proxy/...` — 1625 tests pass
- `TestContract_QueryRange_MatrixFormat_SlidingRateUsesManualLogFetch` — sliding window still uses manual path
- `TestQueryRange_SlidingWindowWithout_UsesManualPath` — bypassed only for non-sliding
- `TestSumByBytesRate_NoParser_UsesSumLen` — bytes_rate fast path