fix(e2e): log-generator always-on, 5× more volume for stable pattern clustering#360

Open
szibis wants to merge 1 commit into main from fix/e2e-patterns-volume
Conversation


@szibis szibis commented May 14, 2026

Summary

  • Remove profiles: ["ui"]: log-generator now starts with a plain docker-compose up -d. It was gated behind a profile, so stacks started without --profile ui had zero data flowing and never showed any patterns in Drilldown
  • LOG_INTERVAL 10→2, LOG_BATCH 8→15 (~8.6 lines/s → ~75 lines/s). The pattern miner splits a 1h range into ~96 buckets of ~37s each; at the old rate each bucket held only ~30 lines (too sparse for stable clustering), while at the new rate each bucket holds ~2,775 lines, which is enough for consistent multi-window acceptance and a continuous Drilldown timeline instead of sparse isolated points
  • Add GOMEMLIMIT=2GiB to loki-vl-proxy-patterns-autodetect, consistent with other proxy variants in the stack
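Taken together, the summary maps to a compose change along these lines (a sketch only; any service or environment key not quoted in the PR is an assumption):

```yaml
services:
  log-generator:
    # profiles: ["ui"]   # removed: the generator now starts with plain `docker-compose up -d`
    environment:
      LOG_INTERVAL: "2"   # was 10 (seconds between batches)
      LOG_BATCH: "15"     # was 8 (lines per batch)

  loki-vl-proxy-patterns-autodetect:
    environment:
      GOMEMLIMIT: 2GiB    # match the other proxy variants in the stack
```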

Root cause

The pattern miner requires ~200 lines per ~37s window to form stable clusters. At 8.6 lines/s the generator produced only ~30 lines per window — windowAccepted stayed near 0 and the proxy fell through to the persistence snapshot fallback. On a cold stack with no snapshot yet, the response was empty data: []. Persistence only saves after a successful clustering run, so cold stacks were permanently stuck until manually primed.
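The fallback chain described above can be sketched as follows (illustrative names, not the proxy's actual code): live clusters win when any window was accepted, then the persisted snapshot, and a cold stack with neither returns empty data.

```go
package main

import "fmt"

// patternSource sketches the response-selection order described in the
// root cause: live clustering output when enough windows were accepted,
// otherwise the persisted snapshot, otherwise an empty result.
func patternSource(windowsAccepted int, haveSnapshot bool) string {
	switch {
	case windowsAccepted > 0:
		return "live-clusters" // normal path once volume is sufficient
	case haveSnapshot:
		return "snapshot" // persistence fallback after a restart
	default:
		return "empty" // cold stack: responds with data: []
	}
}

func main() {
	fmt.Println(patternSource(0, false)) // cold stack at the old, sparse rate
	fmt.Println(patternSource(0, true))  // warm stack loading the snapshot
	fmt.Println(patternSource(12, false))
}
```

Because persistence only writes after a successful clustering run, the "empty" branch is sticky: a cold stack never produces the snapshot that would let it escape.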

Test Plan

  • docker-compose up -d (no --profile ui) — log-generator starts automatically
  • After ~2 minutes, patterns appear in Grafana Logs Drilldown (port 3002) for the patterns-autodetect datasource
  • Pattern timeline is continuous/linear, not sparse scattered points
  • Restart loki-vl-proxy-patterns-autodetect — patterns load from /cache/patterns-snapshot.json within 30s
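The waiting steps above can be scripted with a small retry helper (a sketch; the endpoint path and port in the example are assumptions, not taken from the PR):

```shell
#!/bin/sh
# wait_for <attempts> <delay> <cmd...>: retry cmd until it prints
# non-empty output, echoing that output on success. Returns 1 if the
# command never produced output within the attempt budget.
wait_for() {
  attempts=$1; delay=$2; shift 2
  i=0
  while [ "$i" -lt "$attempts" ]; do
    out=$("$@" 2>/dev/null)
    if [ -n "$out" ]; then
      printf '%s\n' "$out"
      return 0
    fi
    i=$((i + 1)); sleep "$delay"
  done
  return 1
}

# Example usage against the patterns-autodetect proxy (URL is illustrative):
# wait_for 24 5 curl -sf 'http://localhost:3102/loki/api/v1/patterns?query={app=~".%2B"}'
```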

@github-actions github-actions bot added the size/XS (Extra small change) and bugfix (Bug fix) labels on May 14, 2026
@github-actions

PR Quality Report

Compared against base branch main.

Coverage and tests

| Signal | Base | PR | Delta |
| --- | --- | --- | --- |
| Test count | 2524 | 2524 | 0 |
| Coverage | 87.2% | 87.2% | 0.0% (stable) |

Compatibility

| Track | Base | PR | Delta |
| --- | --- | --- | --- |
| Loki API | 100.0% | 11/11 (100.0%) | 0.0% (stable) |
| Logs Drilldown | 100.0% | 17/17 (100.0%) | 0.0% (stable) |
| VictoriaLogs | 100.0% | 11/11 (100.0%) | 0.0% (stable) |

Performance smoke

Lower CPU cost (ns/op) is better. Lower benchmark memory cost (B/op, allocs/op) is better. Higher throughput is better. Lower load-test memory growth is better. Benchmark rows are medians from repeated samples.

| Signal | Base | PR | Delta |
| --- | --- | --- | --- |
| QueryRange cache-hit CPU cost | 1770.0 ns/op | 1788.0 ns/op | +1.0% (stable) |
| QueryRange cache-hit memory | 200.0 B/op | 200.0 B/op | 0.0% (stable) |
| QueryRange cache-hit allocations | 7.0 allocs/op | 7.0 allocs/op | 0.0% (stable) |
| QueryRange cache-bypass CPU cost | 2030.0 ns/op | 2055.0 ns/op | +1.2% (stable) |
| QueryRange cache-bypass memory | 286.0 B/op | 288.0 B/op | +0.7% (stable) |
| QueryRange cache-bypass allocations | 7.0 allocs/op | 7.0 allocs/op | 0.0% (stable) |
| Labels cache-hit CPU cost | 675.5 ns/op | 691.7 ns/op | +2.4% (stable) |
| Labels cache-hit memory | 48.0 B/op | 48.0 B/op | 0.0% (stable) |
| Labels cache-hit allocations | 3.0 allocs/op | 3.0 allocs/op | 0.0% (stable) |
| Labels cache-bypass CPU cost | 806.2 ns/op | 842.2 ns/op | +4.5% (stable) |
| Labels cache-bypass memory | 53.0 B/op | 53.0 B/op | 0.0% (stable) |
| Labels cache-bypass allocations | 3.0 allocs/op | 3.0 allocs/op | 0.0% (stable) |

State

  • Coverage, compatibility, and sampled performance are reported here from the same PR workflow.
  • This is a delta report, not a release gate by itself. Required checks still decide merge safety.
  • Performance is a smoke comparison, not a full benchmark lab run.
  • Delta states use the same noise guards as the quality gate (percent + absolute + low-baseline checks), so report labels match merge-gate behavior.
