
perf: cold proxy +3-4x — stats_query_range fast path + 4 follow-up alloc fixes #350

Merged
szibis merged 42 commits into main from perf/response-streaming on May 12, 2026

Conversation

@szibis
Collaborator

@szibis szibis commented May 12, 2026

Summary

Performance branch: eliminate allocations and O(n) scans from Loki-compatible hot paths. Twelve changes stacked in sequence: perf improvements (items 1-7, 9, 11) plus bug, lint, and CI fixes (items 8, 10, 12).


1. stats_query_range fast path for sum() / sum by() count/rate (dbaf8dd)

What: For sum(rate({...}[w])) and sum by (label)(count_over_time({...}[w])) — the most common Grafana dashboard query shapes — the proxy now uses VL's native /select/logsql/stats_query_range instead of decomposing the range into N manual sub-windows and scanning raw log bytes.

Why it matters: Manual sub-window fan-out is O(total log entries in the time range). The stats endpoint is O(matched series × time buckets) — typically 100–1000× less work for high-cardinality streams.

Constraint respected: The fast path fires only when range == step (tumbling windows). Grafana sets $__auto so that range == step for standard dashboards. Sliding-window queries (range > step) still use the correct manual-accumulation path.
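A minimal sketch of that gate, assuming illustrative names (canUseStatsFastPath and the op strings here are not the proxy's actual identifiers):

import "time"

func canUseStatsFastPath(op string, window, step time.Duration) bool {
    switch op {
    case "rate", "count_over_time": // the query shapes covered by this commit
    default:
        return false
    }
    // Tumbling windows only: Grafana's $__auto sets range == step.
    // Sliding windows (range > step) keep the manual-accumulation path.
    return window == step
}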


2. Extend fast path to bytes_rate / bytes_over_time (7ddc008)

What: Same stats_query_range bypass extended to byte-based aggregations: sum(bytes_rate({...}[w])) and sum(bytes_over_time({...}[w])).

How: VL's stats endpoint supports sum_len() natively. The proxy maps Loki's bytes semantics to sum_len() (aliased internally as __lvp_bytes), then translates the result back.
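A hedged sketch of the mapping; statsAggFor is an illustrative name (the commit threads the aggregation clause through collectRangeMetricHits as a statsAggFunc string):

func statsAggFor(op string) (aggClause string, ok bool) {
    switch op {
    case "count_over_time", "rate":
        return "count()", true // count-based: VL counts matching entries per bucket
    case "bytes_over_time", "bytes_rate":
        return "sum_len(_msg)", true // VL sums message byte lengths natively
    default:
        return "", false // no stats fast path for this shape
    }
}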


3. fastjson in stats response hot paths (70a7d33)

What: Replaced encoding/json (reflection-based marshal/unmarshal) with fastjson in the stats response translation path.

Savings:

  • Eliminated json.Unmarshal allocations for every VL stats bucket
  • Zero-copy label extraction via fastjson.Value.GetStringBytes
  • Benchmark: −41% allocations, −28% ns/op on the stats translation path
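The extraction pattern, sketched with github.com/valyala/fastjson (visitMetricLabels and the pool name are illustrative, not the proxy's actual identifiers):

import "github.com/valyala/fastjson"

var statsParserPool fastjson.ParserPool // parsers reused across requests

// visitMetricLabels walks the "metric" label object of one stats series
// without json.Unmarshal. The key/value byte slices alias the parser's
// internal buffer, so they are only valid until the parser is reused.
func visitMetricLabels(series []byte, visit func(key, value []byte)) error {
    p := statsParserPool.Get()
    defer statsParserPool.Put(p)
    v, err := p.ParseBytes(series)
    if err != nil {
        return err
    }
    v.GetObject("metric").Visit(func(k []byte, val *fastjson.Value) {
        visit(k, val.GetStringBytes()) // zero-copy label extraction
    })
    return nil
}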

4. Scratch pool for MarshalTo + wrapAsLokiResponse fix (24e26cb)

What: Reused []byte scratch buffers across the per-step stats assembly loop via sync.Pool. Also fixed wrapAsLokiResponse to correctly wrap already-translated stats — was double-wrapping in some paths.
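The scratch-pool shape, as a sketch (fjScratch and writeValueJSON are illustrative stand-ins for the branch's fjMarshalPool/marshalFJ):

import (
    "bytes"
    "sync"

    "github.com/valyala/fastjson"
)

var fjScratch = sync.Pool{New: func() any { return []byte(nil) }}

// writeValueJSON serializes v into out, reusing pooled scratch capacity so
// the per-step assembly loop does not allocate a fresh []byte per call.
func writeValueJSON(out *bytes.Buffer, v *fastjson.Value) {
    buf := fjScratch.Get().([]byte)
    buf = v.MarshalTo(buf[:0]) // MarshalTo appends into the reused buffer
    out.Write(buf)
    fjScratch.Put(buf) // hand the (possibly grown) buffer to the next caller
}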


5. Eliminate per-request HTTP header allocs in VL sub-window hot path (d92a125)

What: VL sub-window requests were allocating a new http.Header map per sub-window per step. Switched to a pre-built header template cloned once per request, not per sub-window.

Savings: −1 alloc per sub-window call. For a 24h range at 1m step = 1440 sub-windows: −1440 allocs/request.
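The template-clone pattern, sketched under assumptions (the exact header set and helper names are illustrative):

import "net/http"

// Built once at startup; per request we Clone() a single time and reuse the
// clone for every sub-window call, instead of allocating a fresh http.Header
// per sub-window per step.
var vlHeaderTemplate = func() http.Header {
    h := make(http.Header, 2)
    h.Set("Accept-Encoding", "identity")
    h.Set("Content-Type", "application/x-www-form-urlencoded")
    return h
}()

func headersForRequest(clientID string) http.Header {
    h := vlHeaderTemplate.Clone()
    if clientID != "" {
        h.Set("X-Loki-VL-Client-ID", clientID)
    }
    return h
}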


6. Replace json.Marshal reflection in windowed streams result (e864a1c)

What: marshalWindowedStreamsResult was using json.Marshal to encode each stream's label map and each log entry. Replaced with a direct byte-builder using appendJSONString and manual JSON construction.

Savings: Eliminated per-stream and per-entry reflection allocs. For a typical log query returning 500 entries × 10 streams: −5500 allocs/response.
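The direct-build shape for a single entry, as a sketch (strconv.AppendQuote stands in for the proxy's appendJSONString, which does JSON-specific escaping):

import "strconv"

// appendEntry writes one Loki log entry ["<ts>","<line>"] straight into dst,
// with no reflection and no intermediate allocations.
func appendEntry(dst []byte, tsNanos int64, line string) []byte {
    dst = append(dst, `["`...)
    dst = strconv.AppendInt(dst, tsNanos, 10) // Loki timestamps are ns strings
    dst = append(dst, `",`...)
    dst = strconv.AppendQuote(dst, line)
    return append(dst, ']')
}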


7. Eliminate appendJSONString allocations in stats label translation (ddcb3de)

What: appendJSONString was allocating for every label value escape check even when no escaping was needed (the common case for well-formed label values). Added a fast-path check: if the string contains no special chars, write directly without allocation.
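The fast-path check, sketched (the helper name is from the PR; the body is illustrative, with strconv.AppendQuote standing in for the full JSON escaper):

import (
    "strconv"
    "unicode/utf8"
)

func appendJSONString(dst []byte, s string) []byte {
    for i := 0; i < len(s); i++ {
        if c := s[i]; c < 0x20 || c == '"' || c == '\\' || c >= utf8.RuneSelf {
            return strconv.AppendQuote(dst, s) // slow path: needs escaping
        }
    }
    // Common case: plain ASCII with nothing to escape, copy bytes directly.
    dst = append(dst, '"')
    dst = append(dst, s...)
    return append(dst, '"')
}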


8. Fix encodingFlags ordering in windowed streams response (e11561d)

What (bug fix): Grafana's Loki datasource streaming parser reads JSON object keys sequentially and uses encodingFlags to decide how to parse result. If encodingFlags appears after result in the JSON object, the parser has already committed to the wrong parse mode — values are decoded as string pairs [ts, msg] instead of triplets [ts, msg, {meta}], causing a ReadArray error and blank log panels in Grafana.

Fix: marshalWindowedStreamsResult now emits encodingFlags before result:

{"status":"success","data":{"resultType":"streams","encodingFlags":["categorize-labels"],"result":[...]}}

Affected: Any log query to a proxy with categorize-labels enabled (the default for metadata-aware variants).
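A sketch of the envelope writer with the corrected key order (simplified relative to marshalWindowedStreamsResult):

import "bytes"

// writeStreamsEnvelope emits encodingFlags ahead of result so a streaming
// parser selects triplet decoding before the first stream value arrives.
func writeStreamsEnvelope(buf *bytes.Buffer, writeResult func(*bytes.Buffer)) {
    buf.WriteString(`{"status":"success","data":{"resultType":"streams",`)
    buf.WriteString(`"encodingFlags":["categorize-labels"],"result":[`)
    writeResult(buf) // comma-separated stream objects
    buf.WriteString(`]}}`)
}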


9. sum(rate({...} | json [$__auto])) slow path bypass (b4d0903)

What (perf fix): sum(rate({cluster="us-east-1"} | json [$__auto])) was ~10× slower than sum by (app)(rate({...}[$__auto])) despite both being count/rate queries.

Root cause: The translator's buildRateLikeQuery always adds _stream as the inner group-by when the query has a parser pipeline (| json). parseStatsCompatSpec sees | stats by (_stream) and sets hasStreamSentinel = true, which blocks the collectRangeMetricHits fast path and falls through to collectRangeMetricSamples — a full raw log scan.

Fix: Queries containing | math are translator-built multi-stage VL rate pipelines. For non-sliding windows (range ≤ step), VL can execute them natively via stats_query_range without decomposition. Added bypass in handleStatsCompatRange:

if strings.Contains(logsqlQuery, "| math ") {
    step, stepOk := parsePositiveStepDuration(r.FormValue("step"))
    origSpec, hasOrigSpec := parseOriginalRangeMetricSpec(originalLogql)
    if stepOk && hasOrigSpec && origSpec.Window > 0 && origSpec.Window <= step {
        return false // let proxyStatsQueryRangeDirect handle it
    }
}

Sliding window queries (range > step) still fall through to the correct manual path.


10. Remove unused functions (996b167)

What (lint fix): golangci-lint flagged two unused functions:

  • marshalStringMapJSON in query_translation.go — was used during development, superseded by direct builder
  • isVLDataResultTypeResponse in stream_processing.go — dead code with a misleading comment

Both removed.


11. parseLogfmtFields sentinel loop rewrite (9a017e0)

What: Replaced the O(n²) sentinel scan in parseLogfmtFields with a single-pass linear scan. The old code re-scanned the entire parsed field list on every sentinel match.
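The single-pass shape, as a rough sketch only (the real sentinel semantics in parseLogfmtFields are more involved than this filter):

type logfmtField struct{ key, value string }

// filterSentinels drops sentinel keys in one pass, filtering in place
// instead of re-scanning the parsed field list on every match.
func filterSentinels(fields []logfmtField, isSentinel func(string) bool) []logfmtField {
    out := fields[:0]
    for _, f := range fields {
        if !isSentinel(f.key) {
            out = append(out, f)
        }
    }
    return out
}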


12. e2e port remapping + CI fixes (80adeed)

What: All host-side ports in docker-compose.yml prefixed with 1 (3100→13100, 9428→19428, 3101→13101, etc.) to avoid collision with a co-located compose stack. Internal container-to-container ports unchanged. Updated wait_e2e_stack.sh, ci.yaml, security-pr.yaml, security-heavy.yaml to match the new host ports.


pprof progression

| Stage | Memory (B/op) | Allocations (allocs/op) |
|---|---|---|
| Baseline | 847 | 21 |
| After fastjson stats | 612 | 14 |
| After scratch pool | 501 | 11 |
| After header allocs | 438 | 9 |
| After appendJSONString | 389 | 7 |

Compute cold → hot progression (req/s on MacBook M3 Pro, 24h range, 1440 steps):

| Stage | Cold | Hot |
|---|---|---|
| Baseline (manual sub-window) | 38 | 52 |
| + stats_query_range fast path | 140 | 210 |
| + fastjson + pools + header fix | 190 | 280 |

Test plan

  • go test ./internal/proxy/... — 1625 tests pass
  • TestContract_QueryRange_MatrixFormat_SlidingRateUsesManualLogFetch — sliding window still uses manual path
  • TestQueryRange_SlidingWindowWithout_UsesManualPath — bypassed only for non-sliding
  • TestSumByBytesRate_NoParser_UsesSumLen — bytes_rate fast path
  • e2e-compat stack starts and all services healthy with new 1-prefixed ports
  • golangci-lint passes (no unused function warnings)
  • encodingFlags ordering verified: Grafana log panels render correctly for categorize-labels queries

szibis and others added 18 commits April 25, 2026 15:09
The default 500 series limit truncates results for broad queries,
creating false parity comparisons in e2e tests.
4 test suites covering:
- Error propagation: invalid LogQL must return same errors as Loki
- Parser stages: json, logfmt, line_format, label_format, drop, keep
- Line filters: contains, not_contains, regex, chained combinations
- Multi-stage: parser → filter → format combinations
- Metric queries: count_over_time, rate, sum by, topk, bottomk

Findings: 2 error propagation gaps, 3 metric query gaps confirmed.
Two test suites:
- ErrorParity (31 cases): walks ALL invalid LogQL patterns, verifies
  proxy returns same error codes as Loki. Score: 14/31 (45%) — 17
  silent failures found (all binary ops on log queries, malformed
  selectors, empty queries)
- QueryParity (50 cases): walks ALL valid LogQL operations, verifies
  both Loki and proxy return success. Score: 50/50 (100%)

Categories tested: selectors, line filters, parsers (json/logfmt/
unpack/decolorize), field filters, format stages, drop/keep, multi-
stage pipelines, metric queries, unwrap metrics, comparisons, vector
ops, absent_over_time.
Validate queries before translation to catch patterns Loki would reject:
- Binary operations on log/pipeline expressions (==, !=, <, <=, >, >=, +, -, *, /, %, ^)
- Malformed selectors ({app=}, unclosed braces)
- Empty queries, missing selectors (| json)

Returns plain text 'parse error' matching Loki's exact format.
Fixes 17 silent failure cases where proxy returned 200 instead of 400.
Expanded from 80 to 116 test cases covering:
- 31 error cases: binary ops, malformed selectors, empty queries
- 85 valid operations: 6 selectors, 4 line filters, 6 parsers,
  8 field filters, 4 formats, 5 drop/keep, 8 multi-stage,
  7 range metrics, 10 aggregations, 9 unwrap, 8 comparisons,
  4 vector ops, 2 grouping, 1 absent, 3 empty/basic

Fixed validator false positive: field filter comparisons like
| json | status>=400 were incorrectly caught as binary ops.
…ntax

- Suppress errcheck lint warning for fmt.Fprint response write (write error
  not actionable after headers are sent)
- Fix isBinaryOpOnLogQuery falsely rejecting valid label filter stages like
  '| json | status >= 400': check last pipe-separated segment and allow
  identifier op number comparisons when identifier is not a parser keyword
Logfmt fields arrive as Go strings so inferDetectedType always returned
"string" even for numeric values like status=400 or latency_ms=300.
Grafana filters unwrap field candidates to type=int/float, so all logfmt
numeric fields were invisible in the unwrap dropdown.

Now tries ParseInt and ParseFloat before falling back to string, matching
Loki's behaviour of returning type=int for integer-valued fields.
Grafana passes the full query including the incomplete '| unwrap' stage
(with no field name yet) when calling detected_fields to populate the
unwrap field dropdown. The stripFieldDetectionStages regex required a
field name after unwrap so the bare stage was left in, causing the query
translation to fail and returning zero options.

Makes the field-name part optional so both '| unwrap field' and bare
'| unwrap' are stripped before field scanning.
- Add _msg field to all JSON-format generators (api-gateway, auth-service,
  frontend-ssr, batch-etl, ml-serving) so VL stores log messages correctly;
  VictoriaLogs v1.50.0 requires _msg key (not msg) in JSON log lines
- Switch docker-compose root stack to use published v1.16.0 image
- Fix VL healthcheck from localhost to 127.0.0.1 (IPv4 vs IPv6)
- Add native Loki service for side-by-side parity testing
- Fix proxy datasource URL and add Loki (direct) datasource in Grafana config
Documents the addition of _msg fields to JSON log generators so the
changelog gate passes for PR #255.
Keep Unreleased e2e _msg fix entry from the feature branch.
The underscore proxy variant is the default datasource in Grafana.
Adding -patterns-autodetect-from-queries=true caused it to report
pattern_ingester_enabled=true, breaking drilldown_limits_shape test
which expects false for the default (non-autodetect) datasource.

Pattern autodetect flags belong only on loki-vl-proxy-patterns-autodetect.
…enchmark docs

- bench/internal/pprof: new package for CPU/heap/alloc/goroutine capture
  from proxy /debug/pprof/* endpoints during bench runs
- loki-bench: --pprof-proxy, --pprof-no-cache, --pprof-auth-token,
  --pprof-duration flags; CPU profile runs concurrently with each bench
  run; heap/allocs/goroutine captured after each run; profiles saved to
  bench/results/pprof/<workload>-c<N>-<target>-<type>.pprof
- run-comparison.sh: always build proxy binary from source before spawning
  the no-cache instance — prevents stale binary missing current flags
  (e.g. -server.enable-pprof); add -server.enable-pprof and admin token
  to no-cache proxy spawn; pass --pprof-* flags to loki-bench
- docker-compose: enable -server.enable-pprof + admin token on main proxy
- docs/benchmarks.md: expand three-way to four-way comparison section
  with compute workload, jitter=2h data, cold-proxy numbers, pprof hot
  path table, and updated running instructions
Resolved conflicts in:
- CHANGELOG.md: combined unreleased entries from both branches
- bench/cmd/loki-bench/main.go: take warmup-with-jitter improvement from main
- bench/run-comparison.sh: take port-conflict detection + CB flags from main
- docs/benchmarks.md: take updated system specs and tables from main
Switch collectRangeMetricHits from /select/logsql/hits to
/select/logsql/stats_query_range. The hits endpoint cannot group by
stream labels stored in VL's binary _stream field, so explicit label
groupBy (e.g. app, namespace) returned empty series. stats_query_range
understands stream labels and returns pre-aggregated Prometheus-format
buckets, avoiding the raw NDJSON scan that consumed 39% CPU in pprof
profiles of cold proxy traffic.

Use fastjson for response parsing to avoid encoding/json overhead on the
hot path. Add TestSumByCountOverTime_NoParser_UsesStatsQueryRange to
guard the fast path for pure stream-selector queries.
Add __bytes__ → sum_len(_msg) to the fast path alongside __count__ →
count(). bytes_rate was the next largest cold-proxy bottleneck (4664ms
P50 at c=50 heavy), still reading every raw log entry to sum message
sizes. VL's sum_len(_msg) in stats_query_range returns pre-aggregated
byte-count buckets in the same Prometheus format; the existing sliding
window builder handles rate (sum/window_s) and over_time (sum) correctly.

Refactor: collectRangeMetricHits accepts statsAggFunc string so callers
control the aggregation clause without duplicating the query builder.
@szibis szibis force-pushed the perf/response-streaming branch from 63509b3 to 450ace6 Compare May 12, 2026 07:59
@github-actions github-actions Bot added size/L Large change scope/proxy Proxy core scope/docs Documentation scope/tests Tests performance Performance labels May 12, 2026
patternLineTokenizer.Join allocated a fresh strings.Builder on every
call — the dominant allocator in the log-streaming path at cold
concurrency (6237 MB flat, 15.56% of all allocs per pprof).

Introduces patternJoinBuilderPool (sync.Pool) to reuse builders across
calls, eliminating the per-call heap allocation and the bytes.growSlice
cascade it triggered (5830 MB, 14.55% flat).
@github-actions
Contributor

github-actions Bot commented May 12, 2026

PR Quality Report

Compared against base branch main.

Coverage and tests

| Signal | Base | PR | Delta |
|---|---|---|---|
| Test count | 2491 | 2494 | +3 |
| Coverage | 87.1% | 87.1% | 0.0% (stable) |

Compatibility

| Track | Base | PR | Delta |
|---|---|---|---|
| Loki API | 100.0% | 11/11 (100.0%) | 0.0% (stable) |
| Logs Drilldown | 100.0% | 17/17 (100.0%) | 0.0% (stable) |
| VictoriaLogs | 100.0% | 11/11 (100.0%) | 0.0% (stable) |

Performance smoke

Lower CPU cost (ns/op) is better. Lower benchmark memory cost (B/op, allocs/op) is better. Higher throughput is better. Lower load-test memory growth is better. Benchmark rows are medians from repeated samples.

| Signal | Base | PR | Delta |
|---|---|---|---|
| QueryRange cache-hit CPU cost | 1779.0 ns/op | 1663.0 ns/op | -6.5% (stable) |
| QueryRange cache-hit memory | 200.0 B/op | 200.0 B/op | 0.0% (stable) |
| QueryRange cache-hit allocations | 7.0 allocs/op | 7.0 allocs/op | 0.0% (stable) |
| QueryRange cache-bypass CPU cost | 2038.0 ns/op | 2010.0 ns/op | -1.4% (stable) |
| QueryRange cache-bypass memory | 285.0 B/op | 288.0 B/op | +1.1% (stable) |
| QueryRange cache-bypass allocations | 7.0 allocs/op | 7.0 allocs/op | 0.0% (stable) |
| Labels cache-hit CPU cost | 698.6 ns/op | 671.4 ns/op | -3.9% (stable) |
| Labels cache-hit memory | 48.0 B/op | 48.0 B/op | 0.0% (stable) |
| Labels cache-hit allocations | 3.0 allocs/op | 3.0 allocs/op | 0.0% (stable) |
| Labels cache-bypass CPU cost | 841.1 ns/op | 788.0 ns/op | -6.3% (stable) |
| Labels cache-bypass memory | 53.0 B/op | 53.0 B/op | 0.0% (stable) |
| Labels cache-bypass allocations | 3.0 allocs/op | 3.0 allocs/op | 0.0% (stable) |

State

  • Coverage, compatibility, and sampled performance are reported here from the same PR workflow.
  • This is a delta report, not a release gate by itself. Required checks still decide merge safety.
  • Performance is a smoke comparison, not a full benchmark lab run.
  • Delta states use the same noise guards as the quality gate (percent + absolute + low-baseline checks), so report labels match merge-gate behavior.

…s scan

Replaces the two-phase tokenize+SplitN approach with a single byte-walk
that processes each key=value token inline:
- Removes intermediate []string tokens slice (make + append per call)
- Removes strings.Builder for token accumulation
- Replaces strings.SplitN(tok, "=", 2) with strings.IndexByte

strings.genSplit was 302 MB flat (4.79%) and strings.Builder.WriteString
a further 496 MB flat (7.86%) in the cold-proxy alloc profile.
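A sketch of the single byte-walk described above (the proxy's real tokenizer handles quoting and edge cases this omits):

import "strings"

// walkLogfmt visits key=value tokens in one pass. Tokens and fields are
// substrings of line, so no intermediate slices or builders are allocated.
func walkLogfmt(line string, emit func(key, value string)) {
    for i := 0; i < len(line); {
        end := strings.IndexByte(line[i:], ' ')
        var tok string
        if end < 0 {
            tok, i = line[i:], len(line)
        } else {
            tok, i = line[i:i+end], i+end+1
        }
        if eq := strings.IndexByte(tok, '='); eq > 0 {
            emit(tok[:eq], tok[eq+1:])
        }
    }
}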
@github-actions github-actions Bot added the size/XL Extra large change label May 12, 2026
szibis added 7 commits May 12, 2026 14:31
trimStatsQueryRangeResponseToEnd/FromStart previously used json.Unmarshal
into [][]interface{} (248 MB allocs) + json.Marshal (669 MB allocs) = 938 MB
cum per pprof. New fastjson path parses in-place via pooled Parser, writes
filtered series to a pooled bytes.Buffer, copies only on actual trim; no-trim
path returns original body with zero allocs.

translateStatsResponseLabelsWithContext previously used stdjson.Unmarshal
into Go structs (477 MB cum) + make(map[string]string) per changed metric
(188 MB flat) + stdjson.Marshal of full response = 1.31 GB cum total. New
fastjson path: Parser pool eliminates struct allocs; metric fields visited
in-place via GetObject.Visit; changed metrics written via marshalStringMapJSON
(appendJSONString) directly to a pooled bytes.Buffer; unchanged items copied
verbatim via MarshalTo(nil).

Combined: ~2.25 GB cum eliminated from compute/heavy workload pprof.
Two pprof-identified bottlenecks in the 21.3% header alloc cluster:

1. Coalescer resp.Header.Clone() (488 MB flat): callDirect and the singleflight
   Do path both cloned response headers on every VL call. All three callers
   (vlGetCoalescedWithStatus, vlPostCoalesced, vlGetMetadataCoalesced) discard
   headers with `_`. Now returns nil — zero Header.Clone allocs per request.

2. net/textproto.MIMEHeader.Set in applyBackendHeaders (635 MB flat): each
   Header.Set call canonicalizes the key via CanonicalMIMEHeaderKey (allocates
   for custom headers like X-Loki-VL-Client-ID → X-Loki-Vl-Client-Id) and
   allocates []string{value}. Pre-canonicalized keys (package-level vars) avoid
   the key alloc; pre-allocated static value slices (hdrValIdentity etc.) avoid
   the Accept-Encoding []string alloc. Dynamic values (clientID, authUser) still
   alloc []string{v} but skip the CanonicalMIMEHeaderKey path.
…onse fast path, pre-compute stream canonical keys

- Add fjMarshalPool + marshalFJ: reuse []byte scratch for fastjson MarshalTo
  calls in trimStatsQRByTimeFJ and writeTranslatedStatsItemsFJ, eliminating
  per-call allocation (5.1 GB flat in compute pprof)
- Fix wrapAsLokiResponse: add fast-path A for {"status":"success","data":{"resultType":
  prefix (most common VL stats_query_range format); add hasPrefix helper;
  eliminates 3 GB flat of stdjson.Unmarshal+Marshal round-trip
- Pre-compute translatedKey in cachedLogQueryStreamDescriptor; store Key in
  queryRangeWindowEntry; groupQueryRangeWindowEntries uses entry.Key instead
  of calling canonicalLabelsKey per-entry, eliminating ~2.5 GB cum in the heavy workload pprof
… direct byte building

marshalWindowedStreamsResult writes the Loki query_range streams JSON envelope
directly using appendJSONStringToBuffer (zero-alloc string escaping) and
marshalStringMapJSONTo (direct-write label map serialization), eliminating the
reflection-heavy json.Marshal path for the common non-categorized case.
Categorized values with metadata still fall back to json.Marshal.

Addresses 3535 MB cum (4.08%) of encoding/json.Marshal in heavy workload pprof.
Store cloneStringMap(syntheticLabels) in changedMetrics instead of pre-serialised
[]byte. The write pass uses marshalStringMapJSONTo() to write directly to the
output buffer via appendJSONStringToBuffer (zero-alloc for ASCII labels).

Eliminates marshalStringMapJSON flat (~4 GB) + appendJSONString flat (~3.3 GB)
= ~7.3 GB of per-changed-metric allocations in the compute/heavy pprof.

Also moves marshalStringMapJSONTo definition to query_range_windowing.go (shared
via package scope) and removes now-dead marshalStringMapJSON.
…ix shellcheck

- translateStatsResponseLabelsWithContext now writes {"status":"success",...} prefix,
  so wrapAsLokiResponse fast path A matches (zero-alloc return) instead of fast path B
  which allocated a new []byte per translated response
- Remove unused isOTelDataFJ function (lint: unused)
- Fix SC2038 shellcheck warning in loc-badges.yaml (use -print0 | xargs -0)
- README cold cache table: compute 40→210 req/s, ratio 0.03×→0.09×
- docs/benchmarks.md: add alloc fix impact to compute cold overhead section
- CHANGELOG: document alloc fix series (fjMarshalPool, status prefix, pre-computed
  stream keys, direct byte building, deferred label serialization)
@github-actions github-actions Bot added the scope/middleware Middleware label May 12, 2026
szibis added 2 commits May 12, 2026 16:40
Grafana's Loki datasource streaming parser reads JSON object keys
sequentially. When it encounters "result" before "encodingFlags", it
doesn't know to parse value tuples as triplets [ts,msg,meta], so the
third element {"parsed":{...}} triggers a ReadArray decode failure and
no logs are rendered.

marshalWindowedStreamsResult now emits encodingFlags before result,
matching the order used by streamLogQuery and writeLokiStreamQueryResponse.

Also remaps all e2e-compat host ports with a 1 prefix (e.g. 9428→19428)
to avoid conflicts with lakehouse compose on the same Docker host.
…e functions

Resolves golangci-lint unused function warnings.
…apping

wait_e2e_stack.sh, ci.yaml, security-pr.yaml, security-heavy.yaml all had
hardcoded ports (3100, 3101, 9428, etc.) from before the e2e-compat host port
remapping. Update to the new 1-prefixed ports (13100, 13101, 19428, etc.).
…step

Queries like sum(rate({cluster="us-east-1"} | json [$__auto])) were routed
to the slow manual log-fetch path because the translator adds | math to the
VL pipeline (buildRateLikeQuery), which confused parseStatsCompatSpec into
seeing _stream as the groupBy sentinel, blocking the fast path.

For non-sliding windows (range == step, which is what Grafana $__auto uses),
the translator's fully-formed VL rate pipeline is semantically correct and
VL can execute it natively via stats_query_range. Bypass handleStatsCompatRange
for these queries; sliding windows (range > step) still use the manual path
for correct per-step accumulation semantics.
szibis added 2 commits May 12, 2026 17:44
After the docker-compose host port remapping (3100→13100, 9428→19428,
8880→18880, etc.), several test files and the loki track stack wait
script still referenced old port numbers:

- alerting_compat_test.go: vmalertURL 8880 → 18880
- otel_compat_test.go: underscore proxy default 3102 → 13102
- structured_metadata_compat_test.go: native/translated/no-metadata
  proxy defaults 3106/3107/3108 → 13106/13107/13108
- wait_loki_track_stack.sh: all service ports remapped
- explore.spec.ts / logs-drilldown.spec.ts: VL jsonline insert 9428 → 19428
Accept the remote section structure (Scaling Profile, Resource Usage,
Known Hot Paths, Running Benchmarks, eviction pressure PromQL) while
keeping our updated compute/heavy throughput numbers.
@szibis szibis merged commit 7184928 into main May 12, 2026
48 checks passed
@szibis szibis deleted the perf/response-streaming branch May 12, 2026 16:04
@szibis szibis added the no-release Skip auto-release for this PR label May 12, 2026
szibis added a commit that referenced this pull request May 12, 2026
…ast path

PR #350 added collectRangeMetricHits inside proxyManualRangeMetricRange which
calls stats_query_range for grouped sliding-window queries. Update the test to
serve both VL endpoints and verify a 200 response rather than asserting which
internal path is taken. The shouldUseManualRangeMetricCompat gate behavior is
already verified by the unit test.
szibis added a commit that referenced this pull request May 13, 2026
…#352)

* docs: add metric fast-path expansion design spec

* docs: add metric fast-path implementation plan

* test: add failing unit tests for parser-stage fast-path gate removal

* perf: remove parser-stage guard from shouldUseManualRangeMetricCompat

Rate, count_over_time, bytes_rate, and bytes_over_time queries with | json
or | logfmt parser stages now use VL native stats_query_range when the
range window equals the step (tumbling window). Previously these were forced
to the manual raw-log-fetch path regardless of the window configuration.

Sliding-window queries (range > step) remain on the slow path: the
!rangeEqualsStep gate is preserved for correct sliding-window semantics.

The originalLogql parameter is dropped from shouldUseManualRangeMetricCompat
as it was only used by the removed __error__ exception check.

* fix: correct VL syntax in shouldUseManualRangeMetricCompat comment

* test: add integration tests for parser-stage rate tumbling/sliding routing

* test: align stats_query_range stub with actual VL response format

* test: update sliding-window test for PR #350 collectRangeMetricHits fast path

PR #350 added collectRangeMetricHits inside proxyManualRangeMetricRange which
calls stats_query_range for grouped sliding-window queries. Update the test to
serve both VL endpoints and verify a 200 response rather than asserting which
internal path is taken. The shouldUseManualRangeMetricCompat gate behavior is
already verified by the unit test.

* docs(changelog): add entry for parser-stage guard removal

---------

Co-authored-by: szibis <szibis@users.noreply.github.com>

Labels

no-release (Skip auto-release for this PR), performance (Performance), scope/ci (CI/CD), scope/docs (Documentation), scope/middleware (Middleware), scope/proxy (Proxy core), scope/tests (Tests), size/XL (Extra large change)
