feat(EC-1816): add multi-component stress benchmark #3331

Open
dheerajodha wants to merge 4 commits into conforma:main from dheerajodha:EC-1816

@dheerajodha

Contributor
  • Adds a stress benchmark under benchmark/stress/ that validates a multi-component snapshot with 35 workers, simulating the workload that caused the OOM incident (EC-1805)
  • Component count (EC_STRESS_COMPONENTS, default 10) and worker count (EC_STRESS_WORKERS, default 35) are parameterized via env vars for CI tuning
  • Reuses existing benchmark infrastructure (benchmark/internal/suite, registry, untar) and the same golden-container image data, duplicated across components at runtime

Resolves: EC-1816

@coderabbitai

coderabbitai Bot commented Jun 4, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Enterprise

Run ID: 3249f5c6-bf15-4b84-9b7f-b909fa477858

📥 Commits

Reviewing files that changed from the base of the PR and between e99c4c3 and bd9ea39.

📒 Files selected for processing (4)
  • benchmark/README.md
  • benchmark/stress/prepare_data.sh
  • benchmark/stress/push_data.sh
  • benchmark/stress/stress.go
✅ Files skipped from review due to trivial changes (1)
  • benchmark/README.md
🚧 Files skipped from review as they are similar to previous changes (3)
  • benchmark/stress/prepare_data.sh
  • benchmark/stress/push_data.sh
  • benchmark/stress/stress.go

📝 Walkthrough

The PR adds stress benchmark data preparation and publish scripts, a new stress benchmark runner that loads archived data and executes validation workloads, and README documentation for the new benchmark entry.

Changes

Stress benchmark infrastructure

| Layer / File(s) | Summary |
| --- | --- |
| Archive workflow (benchmark/stress/prepare_data.sh, benchmark/stress/push_data.sh, benchmark/README.md) | prepare_data.sh pulls stress-v1 from Quay or regenerates data.tar.gz locally; push_data.sh verifies data.tar.gz before pushing it to quay.io/conforma/benchmark-data:stress-v1; benchmark/README.md adds the stress benchmark entry and environment-variable guidance. |
| Benchmark setup and snapshot (benchmark/stress/stress.go) | Adds the Stress benchmark entry point, environment parsing for EC_STRESS_COMPONENTS and EC_STRESS_WORKERS, archive extraction, registry startup and cleanup, and JSON snapshot generation for repeated golden-container components. |
| Policy execution and parallel run (benchmark/stress/stress.go) | Builds the execution policy JSON, invokes suite.Execute with validate/image, snapshot, worker count, and fixed timing inputs, and wraps each benchmark iteration with driver.Parallel. |

Estimated code review effort: 2 (Simple) | ~12 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly summarizes the main change: adding a multi-component stress benchmark. |
| Description check | ✅ Passed | The description matches the changeset and accurately describes the new stress benchmark and its configuration. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands.

@fullsend-ai-review

fullsend-ai-review Bot commented Jun 4, 2026

Looks good to me

Previous run

Review

Findings

Low

  • [edge-case] benchmark/stress/stress.go:38 — The envInt function panics on invalid or non-positive environment variable values. This is consistent with the established pattern across all benchmark code (simple/simple.go uses panic identically, as does internal/untar/untar.go), so no change is required. A future improvement could use fmt.Fprintf(os.Stderr, ...) + os.Exit(1) for marginally better user experience on configuration errors.

Labels: PR adds Go benchmark/testing infrastructure under benchmark/stress/.

Previous run (2)

Review

Findings

Medium

  • [stale-reference] benchmark/stress/stress.go:97 — The git source URL for golden-container uses the old organization name enterprise-contract (https://github.com/enterprise-contract/golden-container.git) while the rest of the codebase (simple benchmark, hack scripts) has migrated to https://github.com/conforma/golden-container. The git revision 8327c1ce7472b017b9396fe26d5d5e1ed0eb61cc also differs from the simple benchmark's 2dec8f515a64ef2f21ee3e7b1ed41da77a5c5a9a, suggesting it may reference a commit in the old repo that could become unavailable if the old repo is archived.
    Remediation: Use https://github.com/conforma/golden-container to match the existing simple benchmark pattern, and verify the revision hash exists in the conforma fork.

Low

  • [edge-case] benchmark/stress/stress.go:42 — The envInt function panics on values < 1 but does not guard against unreasonably large values. For EC_STRESS_COMPONENTS, an extremely large value would cause buildSnapshot to allocate a massive slice, likely causing an OOM before the benchmark runs. Minor robustness concern since this is a developer tool.

  • [incomplete-doc] benchmark/README.md — The benchmark README describes benchmarks generically but doesn't mention the new stress benchmark, its env vars (EC_STRESS_COMPONENTS, EC_STRESS_WORKERS), or its distinct purpose of simulating multi-component workloads.

Info

  • [pattern-violation] benchmark/stress/stress.go:119 — The policy JSON string in the stress benchmark is well-formed JSON (no trailing commas), while the simple benchmark's policy string contains trailing commas (invalid JSON). The stress benchmark is more correct here, but the inconsistency between benchmarks is notable.
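
To illustrate the trailing-comma point, Go's standard `encoding/json` package rejects trailing commas, which can be checked with `json.Valid`. The policy fragments below are made up for the demonstration and are not the strings used in either benchmark; the helper name `isStrictJSON` is likewise hypothetical. Whether the CLI's own policy loader tolerates trailing commas is a separate question that this sketch does not answer.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// isStrictJSON reports whether s is valid JSON according to encoding/json.
func isStrictJSON(s string) bool {
	return json.Valid([]byte(s))
}

func main() {
	// Illustrative fragments only.
	wellFormed := `{"sources": [{"policy": ["a", "b"]}]}`
	trailing := `{"sources": [{"policy": ["a", "b",]},]}`

	fmt.Println("well-formed:", isStrictJSON(wellFormed))
	fmt.Println("trailing commas:", isStrictJSON(trailing))
}
```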
Previous run (3)

Review

Findings

Low

  • [resource-leak] benchmark/stress/stress.go:68 — In setup(), if registry.Launch() fails, the temporary directory created by untar.UnTar() is never cleaned up. The panic exits without removing the temp dir. This matches the existing pattern in benchmark/simple/simple.go and the OS reclaims the directory on process exit, so practical risk is minimal.
    Remediation: Call os.RemoveAll(dir) before panicking on registry.Launch failure, or defer cleanup unconditionally.

  • [missing-input-validation] benchmark/stress/stress.go:52 - envInt accepts zero and negative values for EC_STRESS_COMPONENTS and EC_STRESS_WORKERS. Setting these to zero or negative values could produce confusing benchmark results.
    Remediation: Add a check that the returned value is at least 1.

Info

  • [sub-agent-failure] The style-conventions, intent-coherence, and docs-currency sub-agents did not return findings: model claude-sonnet-4-5@20250929 not available on the deployment. These are sonnet-tier dimensions; correctness (opus) completed successfully.
Previous run (4)

Review

Findings

Low

  • [edge-case] benchmark/stress/stress.go:56 - envInt does not validate that the returned integer is positive. Setting EC_STRESS_COMPONENTS=0 produces a snapshot with zero components, and EC_STRESS_WORKERS=0 or a negative value is passed directly to --workers. These degenerate inputs silently produce meaningless benchmark results rather than failing fast. Consider adding a lower-bound check (e.g., if n < 1 { panic(...) }).

Info

  • [design-direction] benchmark/stress/ — The stress benchmark is well-placed in the benchmark infrastructure alongside the existing simple benchmark. Both use golang.org/x/benchmarks/driver for performance measurement. If the goal evolves toward OOM reproduction/regression testing specifically, acceptance tests could complement this benchmark.
  • [sub-agent-gap] The style-conventions sub-agent could not access PR branch files. Manual inspection confirms the stress benchmark follows the established patterns from benchmark/simple/: identical setup()/Closer lifecycle, same driver.Parallel(n, 1, fn) shape, consistent license headers, and proper reuse of benchmark/internal/ packages.

fullsend-ai-review[bot]

This comment was marked as outdated.

@fullsend-ai-review fullsend-ai-review Bot added the ready-for-merge All reviewers approved — ready to merge label Jun 4, 2026
@codecov

codecov Bot commented Jun 4, 2026

Codecov Report

❌ Patch coverage is 0% with 80 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| benchmark/stress/stress.go | 0.00% | 80 Missing ⚠️ |

| Flag | Coverage Δ |
| --- | --- |
| acceptance | 53.44% <ø> (+<0.01%) ⬆️ |
| generative | 16.68% <0.00%> (-0.11%) ⬇️ |
| integration | 27.49% <0.00%> (-0.18%) ⬇️ |
| unit | 68.69% <0.00%> (-0.44%) ⬇️ |

Flags with carried forward coverage won't be shown.

| Files with missing lines | Coverage Δ |
| --- | --- |
| benchmark/stress/stress.go | 0.00% <0.00%> (ø) |

@fullsend-ai-review

fullsend-ai-review Bot commented Jun 11, 2026

🤖 Finished Review · ✅ Success · Started 12:58 PM UTC · Completed 1:06 PM UTC
Commit: 47d3320 · View workflow run →

fullsend-ai-review[bot]

This comment was marked as outdated.

@fullsend-ai-review fullsend-ai-review Bot added ready-for-merge All reviewers approved — ready to merge and removed ready-for-merge All reviewers approved — ready to merge labels Jun 11, 2026
@github-actions github-actions Bot added size: XL and removed size: L labels Jun 16, 2026
@fullsend-ai-review

🤖 Review · Started 12:25 PM UTC
Commit: 47d3320 · View workflow run →

@dheerajodha dheerajodha marked this pull request as ready for review June 16, 2026 12:32
@fullsend-ai-review

fullsend-ai-review Bot commented Jun 16, 2026

🤖 Finished Review · ✅ Success · Started 12:34 PM UTC · Completed 12:44 PM UTC
Commit: 47d3320 · View workflow run →

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@benchmark/stress/prepare_data.sh`:
- Around line 30-36: The oras pull command suppresses error output with
2>/dev/null and the script always falls back to regenerating from upstream on
failure, making CI runs non-reproducible and hiding infrastructure issues.
Remove the error suppression (2>/dev/null) from the oras pull command on line 30
and restructure the logic so that if the oras pull fails, the script exits with
an error rather than continuing to the regeneration fallback. This ensures
benchmark input remains deterministic and surfaces any Quay or authentication
failures instead of silently working around them.

In `@benchmark/stress/stress.go`:
- Around line 26-38: The imports in the stress.go file are not properly ordered
according to the gci formatting standards. Run the project's Go import
formatting tool (typically gci write or go fmt) on the stress.go file to
automatically reorder the imports into the correct grouping: standard library
imports first, followed by blank line, then third-party imports (like
golang.org/x/benchmarks), followed by blank line, then local package imports
(like github.com/conforma/cli). This will resolve the gci formatting check
failure.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Enterprise

Run ID: 03e32d5c-c048-4271-92dd-eba0be016eaa

📥 Commits

Reviewing files that changed from the base of the PR and between c6df9ad and aa42a5a.

📒 Files selected for processing (3)
  • benchmark/stress/prepare_data.sh
  • benchmark/stress/push_data.sh
  • benchmark/stress/stress.go

Comment thread benchmark/stress/prepare_data.sh
Comment thread benchmark/stress/stress.go
fullsend-ai-review[bot]

This comment was marked as outdated.

@fullsend-ai-review fullsend-ai-review Bot added requires-manual-review Review requires human judgment and removed ready-for-merge All reviewers approved — ready to merge labels Jun 16, 2026
st3penta previously approved these changes Jun 22, 2026

@st3penta st3penta left a comment

Contributor

just one comment, but overall LGTM

Comment thread benchmark/stress/stress.go Outdated
@fullsend-ai-review

fullsend-ai-review Bot commented Jun 30, 2026

🤖 Finished Review · ✅ Success · Started 1:06 PM UTC · Completed 1:18 PM UTC
Commit: 47d3320 · View workflow run →

fullsend-ai-review[bot]

This comment was marked as outdated.

@fullsend-ai-review fullsend-ai-review Bot added ready-for-merge All reviewers approved — ready to merge go Pull requests that update Go code testing and removed requires-manual-review Review requires human judgment labels Jun 30, 2026
Add a stress benchmark under benchmark/stress/ that validates a
multi-component snapshot with configurable worker count, simulating
real-world release pipeline workloads that caused OOM (EC-1805).

- Component count controlled via EC_STRESS_COMPONENTS (default 10)
- Worker count controlled via EC_STRESS_WORKERS (default 35)
- Uses the same golden-container image as the simple benchmark,
  duplicated across components at runtime
- Reuses the existing benchmark/internal/suite harness
- Includes prepare_data.sh to regenerate offline data archive
- Automatically supported by make benchmark_stress via Makefile
  wildcard rules

Resolves: EC-1816

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

dheerajodha and others added 3 commits July 2, 2026 19:14

Pull pre-built data.tar.gz from quay.io/conforma/benchmark-data
in prepare_data.sh, falling back to upstream regeneration. Add
push_data.sh for uploading the archive.

Resolves: EC-1816

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Reject zero and negative values for EC_STRESS_COMPONENTS and
EC_STRESS_WORKERS to fail fast instead of producing meaningless
benchmark results.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Replace deprecated --json-input with --images, add benchmark listing
to README.

Resolves: EC-1816

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

@dheerajodha

Contributor Author

I pushed nothing new in this PR; I had to rebase it to get the CVE fixes in.

@fullsend-ai-review

fullsend-ai-review Bot commented Jul 2, 2026

🤖 Finished Review · ✅ Success · Started 1:48 PM UTC · Completed 2:00 PM UTC
Commit: 47d3320 · View workflow run →

}

type source struct {
	Git gitSource `json:"git"`

[low] edge-case

The buildSnapshot function names components as 'golden-container-0', 'golden-container-1', etc. When EC_STRESS_COMPONENTS=1, this produces 'golden-container-0', while the simple benchmark names its single component 'golden-container' (no index suffix). Functionally correct but worth noting for result comparison consistency between benchmarks.
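
The naming behavior described in this comment can be sketched as below. Only the `buildSnapshot` name and the `golden-container-N` pattern come from the review; the struct shapes, JSON field names, function signature, and the image reference are assumptions chosen for illustration, and the source/git fields visible in the diff context are omitted.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// component and snapshot are simplified, hypothetical shapes.
type component struct {
	Name           string `json:"name"`
	ContainerImage string `json:"containerImage"`
}

type snapshot struct {
	Components []component `json:"components"`
}

// buildSnapshot repeats the same image n times, naming each entry
// golden-container-<index>, starting from 0 even when n is 1.
func buildSnapshot(image string, n int) snapshot {
	s := snapshot{Components: make([]component, 0, n)}
	for i := 0; i < n; i++ {
		s.Components = append(s.Components, component{
			Name:           fmt.Sprintf("golden-container-%d", i),
			ContainerImage: image,
		})
	}
	return s
}

func main() {
	// Placeholder image reference, not the one used by the benchmark.
	snap := buildSnapshot("registry.example/golden-container:latest", 2)
	out, err := json.MarshalIndent(snap, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

With `n` set to 1 the only component is named `golden-container-0`, which is the difference from the simple benchmark's unsuffixed `golden-container` that the comment points out.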

@fullsend-ai-review fullsend-ai-review Bot added ready-for-merge All reviewers approved — ready to merge and removed ready-for-merge All reviewers approved — ready to merge labels Jul 2, 2026
@dheerajodha

Contributor Author

/retest

Labels

go Pull requests that update Go code ready-for-merge All reviewers approved — ready to merge size: XL testing

2 participants