Skip to content

TRT-2633: Link symptoms to triage records with summaries and filtering#3497

Open
smg247 wants to merge 3 commits intoopenshift:mainfrom
smg247:triage-symptoms
Open

TRT-2633: Link symptoms to triage records with summaries and filtering#3497
smg247 wants to merge 3 commits intoopenshift:mainfrom
smg247:triage-symptoms

Conversation

@smg247
Copy link
Copy Markdown
Member

@smg247 smg247 commented May 5, 2026

Surfaces job run symptoms on triage detail pages by syncing them into a
triage_symptoms junction table during regression cache loading, then
exposing per-symptom summaries (with regression IDs and job run counts)
via the expand=symptoms query parameter. Adds frontend symptom chips on
regression rows and a filterable symptom summary panel. Includes unit
and e2e tests for symptom syncing, API expansion, and cascade deletion.

Triage Details page with Symptoms:
Screenshot 2026-05-05 at 7 31 37 PM

Filtered:
Screenshot 2026-05-05 at 7 31 51 PM

Audit logging was left out of this effort. In the future, symptoms might be manually added to the triage. At that time, we would want to consider how to handle the audit logging (likely similar to what we do with regressions).

Summary by CodeRabbit

  • New Features

    • Symptoms are now tracked and persisted on triage records; API returns per-symptom summaries with regression counts, job-run counts and percentages. Frontend shows a "Symptoms" section, per-regression symptom chips, and symptom-based filtering.
  • Chores

    • DB migrations and schema updates to store symptom associations and enable cascade deletes; background sync added to compute and upsert symptom counts idempotently.
  • Tests

    • New unit and e2e tests for symptom propagation, syncing/idempotency, API expansion, UI filtering, and cascade-delete behavior.

  Surfaces job run symptoms on triage detail pages by syncing them into a
  triage_symptoms junction table during regression cache loading, then
  exposing per-symptom summaries (with regression IDs and job run counts)
  via the expand=symptoms query parameter. Adds frontend symptom chips on
  regression rows and a filterable symptom summary panel. Includes unit
  and e2e tests for symptom syncing, API expansion, and cascade deletion.
@openshift-merge-bot
Copy link
Copy Markdown
Contributor

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will utilize /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To trigger manually all jobs from second stage use /pipeline required command.

This repository is configured in: automatic mode

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label May 5, 2026
@openshift-ci-robot
Copy link
Copy Markdown

openshift-ci-robot commented May 5, 2026

@smg247: This pull request references TRT-2633 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "5.0.0" version, but no target version was set.

Details

In response to this:

Surfaces job run symptoms on triage detail pages by syncing them into a
triage_symptoms junction table during regression cache loading, then
exposing per-symptom summaries (with regression IDs and job run counts)
via the expand=symptoms query parameter. Adds frontend symptom chips on
regression rows and a filterable symptom summary panel. Includes unit
and e2e tests for symptom syncing, API expansion, and cascade deletion.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 5, 2026

Walkthrough

Adds end-to-end symptom tracking: BigQuery collects job symptom IDs, pipeline carries them into RegressionJobRun records, a new triage_symptoms junction table stores per-triage/per-regression symptom counts via an idempotent SyncTriageSymptoms, API exposes symptom summaries, and the frontend shows a Symptoms section with filtering and per-regression chips.

Changes

Symptom Association Feature

Layer / File(s) Summary
Documentation & Planning
docs/plans/trt-2633-triage-symptoms.md
Specification describing BigQuery→pipeline→DB→API→frontend changes, migration notes, cascade rules, and a test plan.
Data Shape — Types & Models
pkg/apis/api/componentreport/crstatus/types.go, pkg/apis/api/componentreport/testdetails/types.go, pkg/db/models/triage.go
Add transient JobSymptoms to TestJobRunRows and JobRunStats; add RegressionJobRun.JobSymptoms pq.StringArray; new TriageSymptom model and Triage.TriageSymptoms relationship.
BigQuery Query & Deserialization
pkg/api/componentreadiness/dataprovider/bigquery/querygenerators.go
BigQuery subquery aggregates distinct non-empty symptom_id into job_symptoms; deserializer splits comma-separated column into cts.JobSymptoms.
Pipeline Propagation
pkg/api/componentreadiness/test_details.go, pkg/api/componentreadiness/regressiontracker.go
getJobRunStats populates JobRunStats.JobSymptoms; FailedJobRunsFromTestDetails and MergeJobRuns persist JobSymptoms on created/updated RegressionJobRun (Assign on upsert).
Triage-Symptom Synchronization
pkg/api/componentreadiness/regressiontracker.go
New RegressionStore.SyncTriageSymptoms([]*models.TestRegression) interface method and Postgres implementation: preload regressions with triages & job runs, dedupe symptoms per job run, compute per-triage/per-symptom job_run_count, and upsert triage_symptoms idempotently.
Regression Cache Loader Integration
pkg/dataloader/regressioncacheloader/regressioncacheloader.go
After processing releases, loader collects active regression IDs and invokes SyncTriageSymptoms, appending any errors to loader error state.
Schema Migration & Constraints
pkg/db/db.go
Add &models.TriageSymptom{} to migrations; introduce ensureTriageSymptomCascade to create ON DELETE CASCADE FKs for triage_symptoms and remove previous backfill step.
Triage Symptom Summaries API
pkg/api/componentreadiness/triage.go
Add TriageSymptomSummary type and GetTriageSymptomSummaries() to aggregate counts from triage_symptoms, load symptom metadata, assemble regression ID lists, compute percentages, and sort by regression count.
Triage Expansion Endpoint
pkg/sippyserver/server.go
jsonGetTriageByID accepts comma-separated expand fields (e.g., regressions,symptoms); ExpandedTriage gains SymptomSummaries and conditionally populates summaries and/or regressed tests.
Frontend — Utilities & UI
sippy-ng/src/component_readiness/CompReadyUtils.js, sippy-ng/src/component_readiness/Triage.js, sippy-ng/src/component_readiness/TriageSymptoms.js, sippy-ng/src/component_readiness/TriagedRegressionTestList.js
Add symptomColor(symptomId) utility; Triage page requests expand=regressions,symptoms, holds symptomFilter state, renders TriageSymptoms component; TriageSymptoms displays symptom list, counts, percentages, and filter toggle; TriagedRegressionTestList adds optional Symptoms column, builds regressionSymptomMap, and filters regressions by symptomFilter.
Tests & Helpers
pkg/api/componentreadiness/regressiontracker_test.go, test/e2e/.../regressiontracker_test.go, test/e2e/.../triageapi_test.go, test/e2e/util/db.go
Unit tests updated for JobSymptoms propagation; new E2E tests for SyncTriageSymptoms, idempotency, JobRunCount accuracy, multi-symptom support, API expand=symptoms behavior, and cascade deletion. Added test helpers SeedSymptom, CleanupSymptoms, CleanupTriageSymptoms.

Sequence Diagram

sequenceDiagram
    participant BQ as BigQuery
    participant Pipeline as Regression Pipeline
    participant Store as Regression Store
    participant Cache as Regression Cache Loader
    participant API as Triage API
    participant FE as Frontend UI

    BQ->>BQ: Aggregate distinct symptom_ids into\njob_symptoms (comma-separated)
    BQ->>Pipeline: Return job_symptoms column
    Pipeline->>Pipeline: Deserialize job_symptoms -> JobRunStats.JobSymptoms
    Pipeline->>Pipeline: Include JobSymptoms in RegressionJobRun
    Pipeline->>Store: MergeJobRuns (upsert) with JobSymptoms
    Cache->>Store: Call SyncTriageSymptoms(active regressions)
    Store->>Store: Load regressions with triages & job runs
    Store->>Store: For each regression: dedupe symptoms per run, count job_run occurrences per symptom
    Store->>Store: Upsert triage_symptoms rows (idempotent)
    API->>Store: GetTriageSymptomSummaries on expand=symptoms
    Store->>API: Return symptom summaries with counts and regression IDs
    API->>FE: Respond with triage including symptom_summaries
    FE->>FE: Render Symptoms section, allow filtering, render symptom chips per regression
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 12 | ❌ 5

❌ Failed checks (4 warnings, 1 inconclusive)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 42.31% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Go Error Handling ⚠️ Warning CleanupSymptoms in test/e2e/util/db.go ignores database errors without checking them, violating the requirement to never ignore returned errors without justification. Add error checking to CleanupSymptoms to capture and check the delete result, logging errors like CleanupTriageSymptoms (lines 53-58) does.
Excessive Css In React Should Use Styles ⚠️ Warning TriagedRegressionTestList.js has inline sx with 4 style properties on Chip in renderCell, exceeding the acceptable 3-4 property threshold. Extract Chip sx styling to useStyles hook for maintainability and consistency with established patterns in the component_readiness codebase.
Test Coverage For New Features ⚠️ Warning GetTriageSymptomSummaries lacks unit tests. Complex function with DB queries, calculations, and edge cases has only indirect e2e coverage. CodeRabbit reviewer recommended unit tests. Add unit tests for GetTriageSymptomSummaries covering: totalRegressions==0, percentage calculation, sort order, missing metadata, empty results. Consider React component tests for TriageSymptoms.
Test Structure And Quality ❓ Inconclusive Custom check specifies Ginkgo test quality requirements, but PR contains only standard Go tests using t.Run() subtests with testify, not Ginkgo patterns. PR uses standard Go tests. Tests show single responsibility, proper defer cleanup. Assertion message coverage partial (62% vs 100% goal). No cluster timeouts needed. Consistent with existing patterns.
✅ Passed checks (12 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main changes: linking symptoms to triage records with public API summaries and frontend filtering capabilities.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Sql Injection Prevention ✅ Passed SQL queries use proper parameterization (? for GORM, @Name for BigQuery). BigQuery variant values are sanitized with param.Cleanse(). Hardcoded constraint names in db migrations avoid injection risks.
Single Responsibility And Clear Naming ✅ Passed Context-specific packages, focused structs, action-oriented method names. No generic Manager/Util names. Appropriate complexity with proper responsibility delegation throughout.
Stable And Deterministic Test Names ✅ Passed No Ginkgo tests are present in this PR. All tests added use Go's standard t.Run() pattern. The custom check about "Ginkgo test names" is not applicable to this pull request.
Microshift Test Compatibility ✅ Passed New tests use standard Go testing.T (not Ginkgo), test only Sippy application logic, no cluster APIs used. Check not applicable.
Single Node Openshift (Sno) Test Compatibility ✅ Passed The new e2e tests are database and API integration tests, not OpenShift Ginkgo tests. They use standard Go testing, operate on local PostgreSQL, and have no multi-node or SNO compatibility concerns.
Topology-Aware Scheduling Compatibility ✅ Passed This PR does not introduce Kubernetes manifests, operator code, or controllers. All changes are to backend API, database models, frontend UI, and tests. No scheduling constraints are present.
Ote Binary Stdout Contract ✅ Passed No OTE stdout contract violations found. Sippy is a web server, not an OTE binary. No process-level stdout writes in any modified code. Logging uses logrus (outputs to stderr).
Ipv6 And Disconnected Network Test Compatibility ✅ Passed New tests are Go unit tests with local PostgreSQL database only. No IPv4 hardcoding, external connectivity, or IPv6/disconnected environment incompatibilities found.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (2.12.1)

Error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/docs/product/migration-guide for migration instructions
The command is terminated due to an error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/docs/product/migration-guide for migration instructions


Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci openshift-ci Bot requested review from dgoodwin and sosiouxme May 5, 2026 18:25
@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented May 5, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: smg247

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 5, 2026
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/api/componentreadiness/regressiontracker.go`:
- Around line 113-119: The DB upsert currently only Assigns JobLabels and
JobSymptoms, leaving other fields stale; change the upsert so it assigns the
full models.RegressionJobRun from the incoming jobRuns[i] (or explicitly include
ProwJobURL, StartTime, TestFailed, TestFailures, etc.) instead of the partial
struct so FirstOrCreate updates all metadata for an existing prow_job_run_id;
locate the call using
prs.dbc.DB.Where(...).Assign(...).FirstOrCreate(&jobRuns[i]) and replace the
Assign payload with the complete jobRuns[i] data (or use Updates with
jobRuns[i]) to refresh the whole record on re-merge.
- Around line 151-181: During the recount loop you currently only upsert
observed TriageSymptom rows (models.TriageSymptom) but never remove obsolete
rows, which leaves stale triage_symptoms visible in GetTriageSymptomSummaries;
fix by computing the observed symptom IDs per regression/triage and deleting any
db rows that are not in that observed set before/after the FirstOrCreate/update
step. Concretely, inside the regs loop (use reg.ID and triage.ID) build a slice
or set of observed symptomIDs from symptomCounts for that regression, then run
prs.dbc.DB.Where("regression_id = ? AND triage_id = ? AND symptom_id NOT IN
(?)", reg.ID, triage.ID, observedSlice).Delete(&models.TriageSymptom{}) (or
handle empty observedSlice by deleting all for that triage/regression); then
continue with the existing upsert/update logic so only current symptoms remain.

In `@pkg/sippyserver/server.go`:
- Around line 1712-1718: The handler currently logs errors from
componentreadiness.GetTriageSymptomSummaries and continues, causing callers to
receive a 200 with missing data; update the expandFields["symptoms"] branch to
propagate the error instead of swallowing it: when
componentreadiness.GetTriageSymptomSummaries(s.db, triage.ID,
len(triage.Regressions)) returns an err, return that error (or wrap it with
context about triage.ID and the expand key) from the surrounding function so
callers see a failure; only set et.SymptomSummaries when no error is returned.

In `@sippy-ng/src/component_readiness/TriageSymptoms.js`:
- Around line 82-99: The IconButton is currently an unnamed icon-only toggle;
add an accessible name and state by passing an aria-label and aria-pressed prop:
set aria-label to something descriptive like `Filter regressions to
${ss.symptom.name || ss.symptom.id}` and set aria-pressed={symptomFilter ===
ss.symptom.id}; keep the Tooltip but do not rely on it for accessibility, and
preserve the existing onClick and color logic (symbols: IconButton,
setSymptomFilter, symptomFilter, ss.symptom.id, FilterList).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 965165f9-7ba0-4667-a4ff-c2ddf229f4fb

📥 Commits

Reviewing files that changed from the base of the PR and between 7ad8b10 and bc991a8.

📒 Files selected for processing (18)
  • docs/plans/trt-2633-triage-symptoms.md
  • pkg/api/componentreadiness/dataprovider/bigquery/querygenerators.go
  • pkg/api/componentreadiness/regressiontracker.go
  • pkg/api/componentreadiness/regressiontracker_test.go
  • pkg/api/componentreadiness/test_details.go
  • pkg/api/componentreadiness/triage.go
  • pkg/apis/api/componentreport/crstatus/types.go
  • pkg/apis/api/componentreport/testdetails/types.go
  • pkg/dataloader/regressioncacheloader/regressioncacheloader.go
  • pkg/db/db.go
  • pkg/db/models/triage.go
  • pkg/sippyserver/server.go
  • sippy-ng/src/component_readiness/CompReadyUtils.js
  • sippy-ng/src/component_readiness/Triage.js
  • sippy-ng/src/component_readiness/TriageSymptoms.js
  • sippy-ng/src/component_readiness/TriagedRegressionTestList.js
  • test/e2e/componentreadiness/regressiontracker/regressiontracker_test.go
  • test/e2e/componentreadiness/triage/triageapi_test.go

Comment thread pkg/api/componentreadiness/regressiontracker.go
Comment thread pkg/api/componentreadiness/regressiontracker.go
Comment thread pkg/sippyserver/server.go
Comment thread sippy-ng/src/component_readiness/TriageSymptoms.js
@openshift-merge-bot
Copy link
Copy Markdown
Contributor

Scheduling required tests:
/test e2e

   with ON DELETE CASCADE so deleting a symptom definition cleans up
   associated triage_symptoms rows. Also fix error propagation for symptom
   summary lookups and add accessibility attributes to the symptom filter
   button.
@openshift-merge-bot openshift-merge-bot Bot added the ready-for-human-review Indicates a PR has been reviewed by automated tools and is ready for human review label May 5, 2026
@openshift-merge-bot
Copy link
Copy Markdown
Contributor

Scheduling required tests:
/test e2e

Copy link
Copy Markdown
Member

@stbenjam stbenjam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Panel Verdict

Disposition: REQUEST_CHANGES — Two data integrity gaps around stale/orphaned triage_symptoms rows require fixes before merge.


Specialist Findings

Architecture Reviewer:

  • SUGGESTION — SyncTriageSymptoms uses FirstOrCreate + separate Update instead of the Assign + FirstOrCreate pattern already used by MergeJobRuns in the same file, doubling DB round-trips per symptom (regressiontracker.go:165-180).
  • SUGGESTION — Preload("JobRuns") for all active regressions could load tens of thousands of rows into memory at once (regressiontracker.go:142-146). Consider batching or a SQL-based aggregation.
  • SUGGESTION — GetTriageSymptomSummaries makes 3 DB queries where 2 would suffice — query #3 re-fetches rows already partially aggregated in query #1 (triage.go:878-941).
  • NOTE — No FK constraint from triage_symptoms.regression_id to test_regressions.id, risking orphaned rows on regression deletion.
  • NOTE — regressionSymptomMap is referenced before its declaration in TriagedRegressionTestList.js (works at runtime due to closure semantics but is fragile).
  • NOTE — Removal of backfillClosedRegressionViews is unrelated cleanup bundled into this PR.
  • NOTE — Overall architecture is sound, follows existing patterns well. Error handling is correct throughout.

Security & Supply Chain Reviewer:

  • SUGGESTION — TriageSymptomSummary embeds the full jobrunscan.Symptom model, exposing internal matching rules (MatchString, FilePattern, MatcherType) in API responses. Consider a slimmed-down DTO with just ID and Summary (triage.go:868).
  • NOTE — expand parameter silently ignores unknown values — safe but could mask typos.
  • NOTE — Error messages include internal details, consistent with existing codebase patterns.
  • NOTE — No dependency changes, no new external calls, no injection vectors, no XSS concerns. Supply chain is clean.

UX & API Reviewer:

  • SUGGESTION — No validation of invalid expand values; ?expand=symtpoms (typo) silently returns no symptom data (server.go:1680-1684).
  • SUGGESTION — Stale triage_symptoms rows are never cleaned up, leading to inaccurate data over time when symptoms disappear from job runs (regressiontracker.go:131-184).
  • SUGGESTION — TriageSymptoms field on the Triage model has json:"triage_symptoms,omitempty" — raw junction rows could leak into API responses if someone adds preloading. Consider json:"-" (models/triage.go:52).
  • SUGGESTION — Tooltip on the "Symptoms" heading could be more discoverable with an info icon (TriageSymptoms.js:26-33).
  • NOTE — Symptom chip truncation at 12 characters is arbitrary; CSS text-overflow might adapt better.
  • NOTE — Good accessibility practices with aria-label and aria-pressed. Deterministic symptomColor is a good design choice.

Codebase Consistency Reviewer:

  • SUGGESTION — seedSymptom and cleanupTriageSymptoms test helpers are duplicated across two e2e packages with inconsistent implementations (one uses FirstOrCreate, the other uses Create).
  • SUGGESTION — Duplicate imports from './CompReadyUtils' in TriagedRegressionTestList.js (lines 5 and 8) — should be combined into one.
  • SUGGESTION — SyncTriageSymptoms uses two-query upsert instead of the Assign pattern established by MergeJobRuns in the same file.
  • SUGGESTION — RegressedTests JSON tag missing omitempty, inconsistent with SymptomSummaries which has it (server.go:1672).
  • NOTE — sort.Slice used instead of slices.SortFunc despite slices already being imported (triage.go:938).
  • NOTE — Overall strong consistency with existing patterns in GORM models, error wrapping, BigQuery queries, logging, and frontend conventions.

QA Engineer:

  • BLOCKING — No unit test for GetTriageSymptomSummaries, a non-trivial exported function with aggregation logic, silent-skip behavior, and percentage computation (triage.go:878).
  • BLOCKING — SyncTriageSymptoms never removes stale junction rows. No test covers the scenario where a symptom disappears from all job runs after a sync (regressiontracker.go:128-184).
  • SUGGESTION — No test for expand parameter with unknown/invalid values.
  • SUGGESTION — No test for symptomColor utility function (CompReadyUtils.js:914-920).
  • SUGGESTION — No test for duplicate symptom IDs within a single job run's JobSymptoms array.
  • NOTE — LinearProgress percentage could theoretically exceed 100% if data is inconsistent. Consider clamping.
  • NOTE — No frontend tests for TriageSymptoms component, consistent with existing codebase (no existing tests for these components).

Devil's Advocate:

  • BLOCKING — No FK from triage_symptoms.regression_id to test_regressions.id. When a regression is deleted, orphaned triage_symptoms rows persist and inflate symptom counts. When a regression is unlinked from a triage, same problem (models/triage.go:271-276).
  • BLOCKING — SyncTriageSymptoms never deletes stale junction rows. If BigQuery returns different symptoms for a job run on a subsequent query (e.g., symptom definition deleted, BQ data corrected), old rows persist with stale counts (regressiontracker.go:131-184).
  • SUGGESTION — N+1 DB queries: 200 regressions * 3 triages * 5 symptoms * 2 ops = 6000 round-trips per loader run. Use bulk ON CONFLICT ... DO UPDATE.
  • SUGGESTION — Preload("JobRuns") loads all job runs into memory. Consider SQL-based aggregation to avoid the memory spike.
  • SUGGESTION — backfillClosedRegressionViews removal is unrelated; confirm it has run in all environments.
  • NOTE — SyncTriageSymptoms includes regressions from errored releases — intentional but symptom counts may be understated.

Technical Writer:

  • SUGGESTION — Plan doc states symptoms are "also computed when regressions is requested" but the implementation handles them independently; frontend explicitly requests both (docs/plans/trt-2633-triage-symptoms.md:412-419).
  • SUGGESTION — Plan specifies Regressions column should display "2 / 5" (fraction format) but implementation only shows the count.
  • SUGGESTION — Plan specifies symptom rows should link to Job Artifact Query page, but implementation renders plain chips with no links.
  • NOTE — New public Go functions have appropriate godoc comments. No existing external documentation requires updating.

Panel Synthesis

Three independently-arriving specialists (Architecture, UX/API, Devil's Advocate, QA) all converge on the same two data integrity issues:

  1. SyncTriageSymptoms is additive-only and never cleans up stale junction rows. The design doc claims symptoms "only accumulate," but MergeJobRuns uses Assign which replaces JobSymptoms on each sync. If BigQuery data changes, symptom counts become permanently inflated.

  2. No FK constraint from triage_symptoms.regression_id to test_regressions.id. When a regression is deleted, orphaned rows persist and inflate symptom counts.


Required Actions Before Merge

  1. Add cleanup of stale triage_symptoms rows in SyncTriageSymptoms (pkg/api/componentreadiness/regressiontracker.go:131-184). After computing symptomCounts for each (triage, regression) pair, delete any existing triage_symptoms rows whose symptom_id is not in the current set.

  2. Add FK constraint from triage_symptoms.regression_id to test_regressions.id with ON DELETE CASCADE (pkg/db/db.go). Follow the existing ensureTriageSymptomCascade pattern.


Optional Follow-ups

  • Consolidate FirstOrCreate + Update into Assign + FirstOrCreate (or batch ON CONFLICT ... DO UPDATE) to reduce DB round-trips
  • Add unit tests for GetTriageSymptomSummaries covering edge cases: zero regressions, stale symptom IDs, percentage rounding, sort order
  • Slim down TriageSymptomSummary to expose only ID and Summary from the Symptom model rather than the full struct with matching rules
  • Add json:"-" to the TriageSymptoms field on the Triage model to prevent raw junction rows from leaking into API responses
  • Deduplicate seedSymptom/cleanupTriageSymptoms test helpers across e2e packages
  • Update plan document to reflect implementation deviations (independent expand handling, count-only display, deferred JAQ links)

Review generated by multi-specialist panel (Architecture, Security & Supply Chain, UX/API, Codebase Consistency, QA, Devil's Advocate, Technical Writer + Panel Arbiter)

@stbenjam
Copy link
Copy Markdown
Member

stbenjam commented May 5, 2026

From the panel review, the required requests seem to make sense to me, and the unit test ask seems reasonable if its testable.

@coderabbitai What do you think about the panel review above? ^^

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 5, 2026

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

Let me pull up the actual review comments on this PR to understand what you're referring to.


🏁 Script executed:

#!/bin/bash
# Get review comments and reviews on the PR
gh pr view 3497 --repo openshift/sippy --json reviews,comments

echo "---REVIEW COMMENTS---"
gh api repos/openshift/sippy/pulls/3497/reviews --jq '.[] | select(.user.login == "coderabbitai") | {id: .id, state: .state, body: .body}'

echo "---INLINE COMMENTS---"
gh api repos/openshift/sippy/pulls/3497/comments --jq '.[] | select(.user.login == "coderabbitai") | {id: .id, path: .path, body: .body[0:500]}'

Length of output: 67911


@stbenjam I agree with the panel review's assessment. The two required actions are both valid data integrity concerns that I independently flagged in my own review as well:

Required Action 1 (stale triage_symptoms rows): This is the more impactful of the two. SyncTriageSymptoms is purely additive — if BigQuery data changes (a symptom definition is removed, or a job run is re-processed with different symptoms), old rows persist with stale counts. MergeJobRuns already handles this correctly for JobSymptoms by using Assign, which replaces the field on each sync, so there's precedent in this file for the expected behavior.

Required Action 2 (regression_id FK): A straightforward addition following the exact pattern already established by ensureTriageSymptomCascade in pkg/db/db.go. One new ensureTriageSymptomRegressionCascade call handles it cleanly.

On the unit test ask: Yes, GetTriageSymptomSummaries is testable and worth covering. It has non-trivial logic — silent-skip when totalRegressions == 0, percentage computation, sort order, and the stale-symptom-ID case. The existing e2e infrastructure in test/e2e/componentreadiness/triage/triageapi_test.go already seeds the necessary data (symptoms, regressions, triage records), so adding a focused test for this function's edge cases is low-friction. The stale-row-removal scenario is also testable with the same scaffolding: seed → sync → remove symptom → re-sync → assert row is gone.

One item from my own review that the panel didn't elevate to "required" but is worth noting: in pkg/sippyserver/server.go around line 1712-1718, errors from GetTriageSymptomSummaries are currently logged and swallowed, returning a 200 with missing symptom data. That's a silent failure the caller can't distinguish from "no symptoms found." Worth fixing alongside the other two items.

@stbenjam
Copy link
Copy Markdown
Member

stbenjam commented May 5, 2026

@coderabbitai

Required Action 1 (stale triage_symptoms rows): This is the more impactful of the two. SyncTriageSymptoms is purely additive — if BigQuery data changes (a symptom definition is removed, or a job run is re-processed with different symptoms), old rows persist with stale counts. MergeJobRuns already handles this correctly for JobSymptoms by using Assign, which replaces the field on each sync, so there's precedent in this file for the expected behavior.

BigQuery doesn't really ever change, though.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 5, 2026

Tip

For best results, initiate chat on the files or code changes.

@stbenjam That's a fair correction — if BigQuery is append-only and historical job run records are immutable, then the additive-only behavior of SyncTriageSymptoms is not a problem. The stale-row concern doesn't apply. I'll withdraw that as a required action.


📝 Learnings were identified and not saved (knowledge base disabled). Enable

@smg247
Copy link
Copy Markdown
Member Author

smg247 commented May 5, 2026

This was designed with the assumption that job runs would not stop showing symptoms unless the symptom itself was deleted. I can't think of any reason why we would want to account for that possibility, and I believe it will make the logic more complex to do so.

I will add the FK constraint from triage_symptoms.regression_id to test_regressions.id as that is certainly desired.

@smg247
Copy link
Copy Markdown
Member Author

smg247 commented May 5, 2026

Waiting on #3481 merge to add additional unit testing

  Add FK cascades from triage_symptoms to both job_run_symptoms and
  test_regressions. Use ON CONFLICT DO UPDATE for single-roundtrip upserts.
  Slim TriageSymptomSummary to expose only symptom ID and summary. Hide
  raw junction rows from API via json:"-". Deduplicate e2e test helpers
  into test/e2e/util.
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
test/e2e/util/db.go (1)

53-57: ⚡ Quick win

Avoid table-wide cleanup in this shared e2e helper.

CleanupTriageSymptoms wipes the entire junction table, so one symptom e2e package can erase another package's fixtures if they share the same SIPPY_E2E_DSN. Prefer deleting by the triage or regression IDs created by the current test instead of issuing a global delete here.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e/util/db.go` around lines 53 - 57, CleanupTriageSymptoms currently
does a table-wide delete on models.TriageSymptom via dbc.DB.Where("1 =
1").Delete(...), which can wipe other tests' fixtures; change the function
signature CleanupTriageSymptoms to accept identifiers from the current test
(e.g., triageIDs []int or regressionIDs []int or a filter struct) and replace
the global delete with a scoped delete using Where with those IDs (e.g.,
"triage_id IN (?)" or "regression_id IN (?)") so only rows created by the
current test are removed; locate references to CleanupTriageSymptoms and update
callers to pass the appropriate IDs created during the test setup.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@test/e2e/util/db.go`:
- Around line 53-57: CleanupTriageSymptoms currently does a table-wide delete on
models.TriageSymptom via dbc.DB.Where("1 = 1").Delete(...), which can wipe other
tests' fixtures; change the function signature CleanupTriageSymptoms to accept
identifiers from the current test (e.g., triageIDs []int or regressionIDs []int
or a filter struct) and replace the global delete with a scoped delete using
Where with those IDs (e.g., "triage_id IN (?)" or "regression_id IN (?)") so
only rows created by the current test are removed; locate references to
CleanupTriageSymptoms and update callers to pass the appropriate IDs created
during the test setup.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: b3d14cb1-aed8-40f1-b31c-5361f60b0dff

📥 Commits

Reviewing files that changed from the base of the PR and between b4e7c38 and 36a8820.

📒 Files selected for processing (8)
  • docs/plans/trt-2633-triage-symptoms.md
  • pkg/api/componentreadiness/regressiontracker.go
  • pkg/api/componentreadiness/triage.go
  • pkg/db/db.go
  • pkg/db/models/triage.go
  • test/e2e/componentreadiness/regressiontracker/regressiontracker_test.go
  • test/e2e/componentreadiness/triage/triageapi_test.go
  • test/e2e/util/db.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • pkg/api/componentreadiness/triage.go

@openshift-merge-bot
Copy link
Copy Markdown
Contributor

Scheduling required tests:
/test e2e

@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented May 6, 2026

@smg247: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. ready-for-human-review Indicates a PR has been reviewed by automated tools and is ready for human review

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants