perf: remove parser-stage guard to expand stats_query_range fast path by szibis · Pull Request #352 · ReliablyObserve/loki-vl-proxy

szibis · 2026-05-12T20:56:47Z

Summary

Removes the queryUsesParserStages guard from shouldUseManualRangeMetricCompat for rate, bytes_rate, count_over_time, and bytes_over_time. Queries with | json or | logfmt parser stages now use VL native stats_query_range when range == step (tumbling window).
Drops the now-unused originalLogql 4th parameter from shouldUseManualRangeMetricCompat and updates both callers.
Sliding-window queries (range > step) remain on the slow manual path; the !rangeEqualsStep gate is preserved for correct semantics.
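
A minimal sketch of the resulting gate, for illustration only. The name useManualPath, the fastPathFuncs set, and the signature are hypothetical stand-ins and not the actual shouldUseManualRangeMetricCompat code; the handling of functions outside the four listed above is an assumption.

```go
package main

import (
	"fmt"
	"time"
)

// fastPathFuncs holds the four range functions named in the summary.
// The variable name is hypothetical.
var fastPathFuncs = map[string]bool{
	"rate":            true,
	"bytes_rate":      true,
	"count_over_time": true,
	"bytes_over_time": true,
}

// useManualPath is a hypothetical stand-in for the gate. It returns true
// when a query must take the manual raw-log-fetch path.
// Note that parser stages are no longer an input to the decision.
func useManualPath(fn string, rangeWindow, step time.Duration) bool {
	if !fastPathFuncs[fn] {
		// Assumption: functions outside the listed set are not modelled
		// here and are kept on the manual path in this sketch.
		return true
	}
	// Only a tumbling window (range == step) qualifies for the fast path.
	return rangeWindow != step
}

func main() {
	fmt.Println(useManualPath("rate", time.Minute, time.Minute))   // false: tumbling, fast path
	fmt.Println(useManualPath("rate", 5*time.Minute, time.Minute)) // true: sliding, manual path
}
```

The point of the sketch is that the window shape alone decides the path for these four functions, whether or not the query contains | json or | logfmt.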

Why

The guard was conservative: VL stats_query_range natively handles inline filter pipelines, including | unpack_json and | unpack_logfmt. Parser-stage queries were forced onto the slow path (raw log fetch plus O(N) aggregation) even in tumbling-window configurations where VL native stats is semantically equivalent. Removing the guard eliminates a significant bottleneck for common Grafana panels where step == $__interval == range.
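
To illustrate why the window shape matters, here is a small sketch of a count evaluated at each step over raw timestamps, which is roughly what an O(N)-per-step manual aggregation looks like. The function countOverTime and the left-open window (t-range, t] are assumptions for illustration and are not taken from the proxy code.

```go
package main

import "fmt"

// countOverTime counts, at each evaluation time t in [start, end], the
// timestamps falling in the window (t-rangeSec, t]. All values are Unix
// seconds. step must be positive. Hypothetical helper for illustration.
func countOverTime(ts []int64, start, end, step, rangeSec int64) []int {
	var out []int
	for t := start; t <= end; t += step {
		n := 0
		for _, x := range ts { // O(N) scan per evaluation step
			if x > t-rangeSec && x <= t {
				n++
			}
		}
		out = append(out, n)
	}
	return out
}

func main() {
	ts := []int64{5, 30, 65, 70, 110}
	// Tumbling: range == step, windows are disjoint, each event counted once.
	fmt.Println(countOverTime(ts, 60, 120, 60, 60)) // [2 3]
	// Sliding: range > step, windows overlap, events are counted repeatedly.
	fmt.Println(countOverTime(ts, 60, 120, 60, 120)) // [2 5]
}
```

With range == step the windows partition the timeline, so per-step bucketing gives the same counts as the manual scan. With range > step the windows overlap, which is why the sliding case is kept on the manual path.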

Test Plan

Unit tests: TestShouldUseManualRangeMetricCompat_ParserStageRate (9 cases) — parser-stage rate/count/bytes with tumbling window → fast path, sliding window → slow path preserved
Integration tests: TestQueryRange_RateParserStageTumblingUsesStatsQueryRange and TestQueryRange_RateParserStageSlidingUsesSlowPath
1634 unit tests pass (go test ./internal/proxy/... -count=1)
TestLogQL_Exhaustive_ErrorParity — 32 error parity cases pass
TestLogQL_Exhaustive_QueryParity — 84 query parity cases pass
TestPipeline_MetricQueries — parser-stage metric queries pass
TestRangeMetricCompatibilityMatrix failures (rate, count_over_time, bytes_rate, bytes_over_time with step=60) are pre-existing and unrelated to this change (queries use no parser stages; code path is identical before/after)

Rate, count_over_time, bytes_rate, and bytes_over_time queries with | json or | logfmt parser stages now use VL native stats_query_range when the range window equals the step (tumbling window). Previously these were forced to the manual raw-log-fetch path regardless of the window configuration. Sliding-window queries (range > step) remain on the slow path: the !rangeEqualsStep gate is preserved for correct sliding-window semantics. The originalLogql parameter is dropped from shouldUseManualRangeMetricCompat as it was only used by the removed __error__ exception check.

…ast path

PR #350 added collectRangeMetricHits inside proxyManualRangeMetricRange, which calls stats_query_range for grouped sliding-window queries. Update the test to serve both VL endpoints and verify a 200 response rather than asserting which internal path is taken. The shouldUseManualRangeMetricCompat gate behavior is already verified by the unit test.
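
A rough sketch of a stub backend that serves both VL endpoints, so a test can assert on the response status instead of on the internal path. The helpers newVLStub and statusOf are hypothetical; the real test goes through the proxy, which this sketch does not model. The stub bodies are minimal placeholders rather than captured VL responses.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// newVLStub returns a test server answering both VictoriaLogs endpoints
// that a range-metric request might reach. Hypothetical helper.
func newVLStub() *httptest.Server {
	mux := http.NewServeMux()
	mux.HandleFunc("/select/logsql/stats_query_range", func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		// Placeholder: an empty Prometheus-style matrix result.
		io.WriteString(w, `{"status":"success","data":{"resultType":"matrix","result":[]}}`)
	})
	mux.HandleFunc("/select/logsql/query", func(w http.ResponseWriter, r *http.Request) {
		// Placeholder: an empty body stands for "no log lines matched".
		w.Header().Set("Content-Type", "application/stream+json")
	})
	return httptest.NewServer(mux)
}

// statusOf issues a GET request and returns the HTTP status code.
func statusOf(url string) (int, error) {
	resp, err := http.Get(url)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	io.Copy(io.Discard, resp.Body) // drain the body before closing
	return resp.StatusCode, nil
}

func main() {
	srv := newVLStub()
	defer srv.Close()
	for _, p := range []string{"/select/logsql/stats_query_range", "/select/logsql/query"} {
		code, err := statusOf(srv.URL + p)
		fmt.Println(p, code, err)
	}
}
```

Serving both endpoints keeps the test valid whichever path the proxy picks, which matches the intent described in the commit message.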

github-actions · 2026-05-12T21:12:36Z

PR Quality Report

Compared against base branch main.

Coverage and tests

Signal	Base	PR	Delta
Test count	2494	2506	12
Coverage	87.1%	87.2%	+0.1% (improved)

Compatibility

Track	Base	PR	Delta
Loki API	100.0%	11/11 (100.0%)	0.0% (stable)
Logs Drilldown	100.0%	17/17 (100.0%)	0.0% (stable)
VictoriaLogs	100.0%	11/11 (100.0%)	0.0% (stable)

Performance smoke

Lower CPU cost (ns/op) is better. Lower benchmark memory cost (B/op, allocs/op) is better. Higher throughput is better. Lower load-test memory growth is better. Benchmark rows are medians from repeated samples.

Signal	Base	PR	Delta
QueryRange cache-hit CPU cost	1753.0 ns/op	1756.0 ns/op	+0.2% (stable)
QueryRange cache-hit memory	200.0 B/op	200.0 B/op	0.0% (stable)
QueryRange cache-hit allocations	7.0 allocs/op	7.0 allocs/op	0.0% (stable)
QueryRange cache-bypass CPU cost	2044.0 ns/op	1975.0 ns/op	-3.4% (stable)
QueryRange cache-bypass memory	286.0 B/op	286.0 B/op	0.0% (stable)
QueryRange cache-bypass allocations	7.0 allocs/op	7.0 allocs/op	0.0% (stable)
Labels cache-hit CPU cost	692.8 ns/op	682.0 ns/op	-1.6% (stable)
Labels cache-hit memory	48.0 B/op	48.0 B/op	0.0% (stable)
Labels cache-hit allocations	3.0 allocs/op	3.0 allocs/op	0.0% (stable)
Labels cache-bypass CPU cost	828.6 ns/op	806.0 ns/op	-2.7% (stable)
Labels cache-bypass memory	53.0 B/op	53.0 B/op	0.0% (stable)
Labels cache-bypass allocations	3.0 allocs/op	3.0 allocs/op	0.0% (stable)

State

Coverage, compatibility, and sampled performance are reported here from the same PR workflow.
This is a delta report, not a release gate by itself. Required checks still decide merge safety.
Performance is a smoke comparison, not a full benchmark lab run.
Delta states use the same noise guards as the quality gate (percent + absolute + low-baseline checks), so report labels match merge-gate behavior.
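
As an illustration of how a combined percent, absolute, and low-baseline guard could label a delta, here is a hedged sketch. The guard type, the classify function, and every threshold value are assumptions; the actual quality-gate logic and thresholds are not shown in this report.

```go
package main

import (
	"fmt"
	"math"
)

// guard holds hypothetical noise thresholds.
type guard struct {
	minPercent  float64 // relative change (in %) needed to leave "stable"
	minAbsolute float64 // absolute change needed to leave "stable"
	lowBaseline float64 // below this base value the percent check is skipped
}

// classify labels a base-to-PR delta as "stable", "improved", or "regressed".
func classify(base, pr float64, lowerIsBetter bool, g guard) string {
	delta := pr - base
	if math.Abs(delta) < g.minAbsolute {
		return "stable" // too small in absolute terms
	}
	// Percentages on tiny baselines are noisy, so only apply the
	// percent check when the baseline is large enough.
	if math.Abs(base) >= g.lowBaseline {
		if math.Abs(delta)/math.Abs(base)*100 < g.minPercent {
			return "stable"
		}
	}
	if (delta < 0) == lowerIsBetter {
		return "improved"
	}
	return "regressed"
}

func main() {
	g := guard{minPercent: 5, minAbsolute: 10, lowBaseline: 100} // assumed values
	// Cache-bypass CPU cost from the table above: 2044.0 -> 1975.0 ns/op.
	fmt.Println(classify(2044, 1975, true, g)) // stable
	fmt.Println(classify(1000, 1200, true, g)) // regressed
}
```

Under these assumed thresholds, the -3.4% change in the table stays within noise, consistent with its "stable" label.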

szibis added 8 commits May 12, 2026 22:59

docs: add metric fast-path expansion design spec

305557a

docs: add metric fast-path implementation plan

9b6fc79

test: add failing unit tests for parser-stage fast-path gate removal

8bd7f81

fix: correct VL syntax in shouldUseManualRangeMetricCompat comment

149cbe5

test: add integration tests for parser-stage rate tumbling/sliding routing

a4cd0c5

test: align stats_query_range stub with actual VL response format

4635e96

szibis force-pushed the perf/metric-fast-path branch from fa1f677 to 7ec5e78 Compare May 12, 2026 21:03

github-actions Bot added the size/XL (Extra large change), scope/proxy (Proxy core), scope/docs (Documentation), scope/tests (Tests), and performance (Performance) labels May 12, 2026

docs(changelog): add entry for parser-stage guard removal

af0dd93

github-actions Bot added size/XL (Extra large change) and removed size/XL (Extra large change) labels May 13, 2026

szibis merged commit df9a988 into main May 13, 2026
48 checks passed

szibis deleted the perf/metric-fast-path branch May 13, 2026 10:01

szibis merged 9 commits into main from perf/metric-fast-path
