fix: address code review issues — parser-stage correctness, memory cap, O(N²) chunked fetch, and docs by szibis · Pull Request #355 · ReliablyObserve/loki-vl-proxy

szibis · 2026-05-13T11:50:31Z

Summary

Fix 1 (HIGH correctness): Restore __error__ guard for parser-stage metric queries. VL stats_query_range counts parse-failed lines while Loki excludes them. All rate/bytes_rate/count_over_time/bytes_over_time queries with parser stages now route to the slow log-fetch path unless the caller explicitly adds | drop __error__ to opt in to VL's count-all semantics. The opt-in check uses the inner LogQL base query (without outer aggregation or range window brackets) so hasDropErrorOnlyPostParserStage correctly identifies the clause.
Fix 2 (MEDIUM performance): Eliminate O(N²) copy amplification in coldBackwardChunkedFetch. Replaced the append(chunkBody, accumulated...) prepend loop with a slice of chunks reversed and joined once at the end.
Fix 3 (MEDIUM memory): Cap unbounded io.ReadAll in collectRangeMetricHits at 64 MB via the existing readBodyLimited helper to bound RSS on oversized VL error response bodies.
Fix 4 & 5 (documentation): Document optional tenant header behaviour and multi-tenant serial fanout in KNOWN_ISSUES.md. Add a TODO comment in multitenant.go referencing the tracking entry.

Test Plan

go test ./internal/proxy/... passes (1638 tests, 0 failures)
TestQueryRange_RateParserStageTumblingUsesSlowPath verifies parser-stage rate without | drop __error__ uses slow path
TestQueryRange_RateParserStageDropErrTumblingUsesStatsQueryRange verifies | drop __error__ opt-in restores fast path
TestQueryRange_CountOverTimeParserUsesDirectStatsRange verifies grouped count_over_time with | drop __error__, __error_details__ uses stats_query_range
TestCompatHelpers_ParseQuantileAndUnwrapErrorName verifies updated unit assertions for parser-stage slow path

Fix 1 (correctness): Restore __error__ guard to parser-stage metric queries. VL native stats_query_range counts parse-failed lines while Loki excludes them. Queries with parser stages (| json, | logfmt, etc.) now route to the slow log-fetch path unless the caller explicitly opts in via " | drop __error__". The opt-in check uses origSpec.BaseQuery (inner LogQL pipeline without outer aggregation or range window brackets) so hasDropErrorOnlyPostParserStage correctly identifies the drop-error clause. The | math guard for grouped rate queries and a new pre-check for non-| math queries (count_over_time etc.) both apply this fast-path bypass before shouldUseManualRangeMetricCompat is called. Fix 2 (performance): Eliminate O(N²) copy amplification in coldBackwardChunkedFetch. Replaced prepend-into-growing-buffer (accumulated = append(chunkBody, accumulated...)) with a slice of chunks that is reversed once at the end and joined. Fix 3 (memory): Cap unbounded io.ReadAll in collectRangeMetricHits at 64 MB via the existing readBodyLimited helper to bound RSS on oversized VL error bodies. Fix 4 (documentation): Document optional tenant header behaviour in KNOWN_ISSUES.md. Fix 5 (documentation + code): Document multi-tenant serial fanout in KNOWN_ISSUES.md and add a TODO comment in multitenant.go referencing the tracking entry.

github-actions · 2026-05-13T11:59:33Z

PR Quality Report

Compared against base branch main.

Coverage and tests

Signal	Base	PR	Delta
Test count	2506	2507	1
Coverage	87.2%	87.2%	0.0% (stable)

Compatibility

Track	Base	PR	Delta
Loki API	100.0%	11/11 (100.0%)	0.0% (stable)
Logs Drilldown	100.0%	17/17 (100.0%)	0.0% (stable)
VictoriaLogs	100.0%	11/11 (100.0%)	0.0% (stable)

Performance smoke

Lower CPU cost (ns/op) is better. Lower benchmark memory cost (B/op, allocs/op) is better. Higher throughput is better. Lower load-test memory growth is better. Benchmark rows are medians from repeated samples.

Signal	Base	PR	Delta
QueryRange cache-hit CPU cost	1603.0 ns/op	1653.0 ns/op	+3.1% (stable)
QueryRange cache-hit memory	200.0 B/op	200.0 B/op	0.0% (stable)
QueryRange cache-hit allocations	7.0 allocs/op	7.0 allocs/op	0.0% (stable)
QueryRange cache-bypass CPU cost	1916.0 ns/op	1942.0 ns/op	+1.4% (stable)
QueryRange cache-bypass memory	281.0 B/op	282.0 B/op	+0.4% (stable)
QueryRange cache-bypass allocations	7.0 allocs/op	7.0 allocs/op	0.0% (stable)
Labels cache-hit CPU cost	631.8 ns/op	666.5 ns/op	+5.5% (stable)
Labels cache-hit memory	48.0 B/op	48.0 B/op	0.0% (stable)
Labels cache-hit allocations	3.0 allocs/op	3.0 allocs/op	0.0% (stable)
Labels cache-bypass CPU cost	773.2 ns/op	779.0 ns/op	+0.8% (stable)
Labels cache-bypass memory	53.0 B/op	52.0 B/op	-1.9% (stable)
Labels cache-bypass allocations	3.0 allocs/op	3.0 allocs/op	0.0% (stable)

State

Coverage, compatibility, and sampled performance are reported here from the same PR workflow.
This is a delta report, not a release gate by itself. Required checks still decide merge safety.
Performance is a smoke comparison, not a full benchmark lab run.
Delta states use the same noise guards as the quality gate (percent + absolute + low-baseline checks), so report labels match merge-gate behavior.

szibis added 2 commits May 13, 2026 12:24

docs(changelog): add entry for documentation accuracy audit

51bfcd1

github-actions Bot added size/M Medium change scope/proxy Proxy core scope/docs Documentation scope/tests Tests bugfix Bug fix labels May 13, 2026

szibis merged commit 7eef6a9 into main May 13, 2026
48 checks passed

szibis deleted the docs/accuracy-audit-pr branch May 13, 2026 15:55

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: address code review issues — parser-stage correctness, memory cap, O(N²) chunked fetch, and docs#355

fix: address code review issues — parser-stage correctness, memory cap, O(N²) chunked fetch, and docs#355
szibis merged 2 commits into
mainfrom
docs/accuracy-audit-pr

szibis commented May 13, 2026

Uh oh!

github-actions Bot commented May 13, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

szibis commented May 13, 2026

Summary

Test Plan

Uh oh!

github-actions Bot commented May 13, 2026

PR Quality Report

Coverage and tests

Compatibility

Performance smoke

State

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant