TRT-2633: Link symptoms to triage records with summaries and filtering#3497
smg247 wants to merge 3 commits into openshift:main from
Surfaces job run symptoms on triage detail pages by syncing them into a triage_symptoms junction table during regression cache loading, then exposing per-symptom summaries (with regression IDs and job run counts) via the expand=symptoms query parameter. Adds frontend symptom chips on regression rows and a filterable symptom summary panel. Includes unit and e2e tests for symptom syncing, API expansion, and cascade deletion.
|
Pipeline controller notification For optional jobs, comment This repository is configured in: automatic mode |
|
@smg247: This pull request references TRT-2633, which is a valid jira issue.
Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "5.0.0" version, but no target version was set.
Details
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
Walkthrough
Adds end-to-end symptom tracking: BigQuery collects job symptom IDs, pipeline carries them into RegressionJobRun records, a new
Changes
Symptom Association Feature
Sequence Diagram
sequenceDiagram
participant BQ as BigQuery
participant Pipeline as Regression Pipeline
participant Store as Regression Store
participant Cache as Regression Cache Loader
participant API as Triage API
participant FE as Frontend UI
BQ->>BQ: Aggregate distinct symptom_ids into\njob_symptoms (comma-separated)
BQ->>Pipeline: Return job_symptoms column
Pipeline->>Pipeline: Deserialize job_symptoms -> JobRunStats.JobSymptoms
Pipeline->>Pipeline: Include JobSymptoms in RegressionJobRun
Pipeline->>Store: MergeJobRuns (upsert) with JobSymptoms
Cache->>Store: Call SyncTriageSymptoms(active regressions)
Store->>Store: Load regressions with triages & job runs
Store->>Store: For each regression: dedupe symptoms per run, count job_run occurrences per symptom
Store->>Store: Upsert triage_symptoms rows (idempotent)
API->>Store: GetTriageSymptomSummaries on expand=symptoms
Store->>API: Return symptom summaries with counts and regression IDs
API->>FE: Respond with triage including symptom_summaries
FE->>FE: Render Symptoms section, allow filtering, render symptom chips per regression
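The deserialize and per-regression counting steps in the diagram can be sketched in plain Go. This is a minimal illustration, not the PR's code: `parseJobSymptoms` and `countRunsPerSymptom` are hypothetical names, and symptom IDs are assumed to be strings.

```go
package main

import (
	"fmt"
	"strings"
)

// parseJobSymptoms splits a comma-separated job_symptoms value into symptom
// IDs, trimming whitespace and dropping empty entries.
// Hypothetical helper; string IDs are an assumption.
func parseJobSymptoms(raw string) []string {
	var ids []string
	for _, part := range strings.Split(raw, ",") {
		if id := strings.TrimSpace(part); id != "" {
			ids = append(ids, id)
		}
	}
	return ids
}

// countRunsPerSymptom returns, for each symptom ID, the number of job runs
// that showed it. A symptom repeated within one run counts once for that run.
func countRunsPerSymptom(runs [][]string) map[string]int {
	counts := make(map[string]int)
	for _, symptoms := range runs {
		seen := make(map[string]struct{}, len(symptoms))
		for _, id := range symptoms {
			if _, dup := seen[id]; dup {
				continue
			}
			seen[id] = struct{}{}
			counts[id]++
		}
	}
	return counts
}

func main() {
	runs := [][]string{
		parseJobSymptoms("timeout, oom,timeout"),
		parseJobSymptoms("timeout"),
		parseJobSymptoms(""),
	}
	counts := countRunsPerSymptom(runs)
	fmt.Println(counts["timeout"], counts["oom"]) // prints "2 1"
}
```

Deduplicating per run before counting keeps a run that lists the same symptom twice from inflating the job run count.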
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks | ✅ 12 | ❌ 5
❌ Failed checks (4 warnings, 1 inconclusive)
✅ Passed checks (12 passed)
Warning
There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.
🔧 golangci-lint (2.12.1)
Error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/docs/product/migration-guide for migration instructions |
|
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: smg247
The full list of commands accepted by this bot can be found here. The pull request process is described here
Details
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@pkg/api/componentreadiness/regressiontracker.go`:
- Around line 113-119: The DB upsert currently only Assigns JobLabels and
JobSymptoms, leaving other fields stale; change the upsert so it assigns the
full models.RegressionJobRun from the incoming jobRuns[i] (or explicitly include
ProwJobURL, StartTime, TestFailed, TestFailures, etc.) instead of the partial
struct so FirstOrCreate updates all metadata for an existing prow_job_run_id;
locate the call using
prs.dbc.DB.Where(...).Assign(...).FirstOrCreate(&jobRuns[i]) and replace the
Assign payload with the complete jobRuns[i] data (or use Updates with
jobRuns[i]) to refresh the whole record on re-merge.
- Around line 151-181: During the recount loop you currently only upsert
observed TriageSymptom rows (models.TriageSymptom) but never remove obsolete
rows, which leaves stale triage_symptoms visible in GetTriageSymptomSummaries;
fix by computing the observed symptom IDs per regression/triage and deleting any
db rows that are not in that observed set before/after the FirstOrCreate/update
step. Concretely, inside the regs loop (use reg.ID and triage.ID) build a slice
or set of observed symptomIDs from symptomCounts for that regression, then run
prs.dbc.DB.Where("regression_id = ? AND triage_id = ? AND symptom_id NOT IN
(?)", reg.ID, triage.ID, observedSlice).Delete(&models.TriageSymptom{}) (or
handle empty observedSlice by deleting all for that triage/regression); then
continue with the existing upsert/update logic so only current symptoms remain.
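The "observed set" step described above amounts to a set difference between stored rows and freshly counted symptoms. A minimal sketch, assuming string symptom IDs and a hypothetical helper name `staleSymptomIDs`; the actual delete would still be issued through the store:

```go
package main

import (
	"fmt"
	"sort"
)

// staleSymptomIDs returns the IDs that exist in stored triage_symptoms rows
// for one (triage, regression) pair but are absent from the freshly observed
// counts. The result is sorted so output is deterministic.
// Hypothetical helper, not part of the PR.
func staleSymptomIDs(existing []string, observed map[string]int) []string {
	var stale []string
	for _, id := range existing {
		if _, ok := observed[id]; !ok {
			stale = append(stale, id)
		}
	}
	sort.Strings(stale)
	return stale
}

func main() {
	existing := []string{"timeout", "oom", "dns"}
	observed := map[string]int{"timeout": 4}
	fmt.Println(staleSymptomIDs(existing, observed)) // prints "[dns oom]"
}
```

When `observed` is empty, every existing ID is reported as stale, which matches the comment's note about deleting all rows for that triage/regression pair.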
In `@pkg/sippyserver/server.go`:
- Around line 1712-1718: The handler currently logs errors from
componentreadiness.GetTriageSymptomSummaries and continues, causing callers to
receive a 200 with missing data; update the expandFields["symptoms"] branch to
propagate the error instead of swallowing it: when
componentreadiness.GetTriageSymptomSummaries(s.db, triage.ID,
len(triage.Regressions)) returns an err, return that error (or wrap it with
context about triage.ID and the expand key) from the surrounding function so
callers see a failure; only set et.SymptomSummaries when no error is returned.
In `@sippy-ng/src/component_readiness/TriageSymptoms.js`:
- Around line 82-99: The IconButton is currently an unnamed icon-only toggle;
add an accessible name and state by passing an aria-label and aria-pressed prop:
set aria-label to something descriptive like `Filter regressions to
${ss.symptom.name || ss.symptom.id}` and set aria-pressed={symptomFilter ===
ss.symptom.id}; keep the Tooltip but do not rely on it for accessibility, and
preserve the existing onClick and color logic (symbols: IconButton,
setSymptomFilter, symptomFilter, ss.symptom.id, FilterList).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 965165f9-7ba0-4667-a4ff-c2ddf229f4fb
📒 Files selected for processing (18)
- docs/plans/trt-2633-triage-symptoms.md
- pkg/api/componentreadiness/dataprovider/bigquery/querygenerators.go
- pkg/api/componentreadiness/regressiontracker.go
- pkg/api/componentreadiness/regressiontracker_test.go
- pkg/api/componentreadiness/test_details.go
- pkg/api/componentreadiness/triage.go
- pkg/apis/api/componentreport/crstatus/types.go
- pkg/apis/api/componentreport/testdetails/types.go
- pkg/dataloader/regressioncacheloader/regressioncacheloader.go
- pkg/db/db.go
- pkg/db/models/triage.go
- pkg/sippyserver/server.go
- sippy-ng/src/component_readiness/CompReadyUtils.js
- sippy-ng/src/component_readiness/Triage.js
- sippy-ng/src/component_readiness/TriageSymptoms.js
- sippy-ng/src/component_readiness/TriagedRegressionTestList.js
- test/e2e/componentreadiness/regressiontracker/regressiontracker_test.go
- test/e2e/componentreadiness/triage/triageapi_test.go
|
Scheduling required tests: |
with ON DELETE CASCADE so deleting a symptom definition cleans up associated triage_symptoms rows. Also fix error propagation for symptom summary lookups and add accessibility attributes to the symptom filter button.
|
Scheduling required tests: |
Review Panel Verdict
Disposition: REQUEST_CHANGES — Two data integrity gaps around stale/orphaned triage_symptoms rows require fixes before merge.
Specialist Findings
Architecture Reviewer:
- SUGGESTION: `SyncTriageSymptoms` uses `FirstOrCreate` + separate `Update` instead of the `Assign` + `FirstOrCreate` pattern already used by `MergeJobRuns` in the same file, doubling DB round-trips per symptom (regressiontracker.go:165-180).
- SUGGESTION: `Preload("JobRuns")` for all active regressions could load tens of thousands of rows into memory at once (regressiontracker.go:142-146). Consider batching or a SQL-based aggregation.
- SUGGESTION: `GetTriageSymptomSummaries` makes 3 DB queries where 2 would suffice; query #3 re-fetches rows already partially aggregated in query #1 (triage.go:878-941).
- NOTE: No FK constraint from `triage_symptoms.regression_id` to `test_regressions.id`, risking orphaned rows on regression deletion.
- NOTE: `regressionSymptomMap` is referenced before its declaration in `TriagedRegressionTestList.js` (works at runtime due to closure semantics but is fragile).
- NOTE: Removal of `backfillClosedRegressionViews` is unrelated cleanup bundled into this PR.
- NOTE: Overall architecture is sound, follows existing patterns well. Error handling is correct throughout.
Security & Supply Chain Reviewer:
- SUGGESTION: `TriageSymptomSummary` embeds the full `jobrunscan.Symptom` model, exposing internal matching rules (`MatchString`, `FilePattern`, `MatcherType`) in API responses. Consider a slimmed-down DTO with just `ID` and `Summary` (triage.go:868).
- NOTE: The `expand` parameter silently ignores unknown values; safe, but it could mask typos.
- NOTE: Error messages include internal details, consistent with existing codebase patterns.
- NOTE: No dependency changes, no new external calls, no injection vectors, no XSS concerns. Supply chain is clean.
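A slimmed-down DTO of the kind suggested above might look like this. The `symptom` struct is a hypothetical stand-in for `jobrunscan.Symptom` (only the field names mentioned in the review are used, and string types are assumed); `symptomDTO` and `marshalSymptom` are illustrative names.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// symptom is a hypothetical stand-in for the full model, including the
// matcher fields the review wants kept out of API responses.
type symptom struct {
	ID          string
	Summary     string
	MatcherType string
	MatchString string
	FilePattern string
}

// symptomDTO carries only the fields intended for API consumers.
type symptomDTO struct {
	ID      string `json:"id"`
	Summary string `json:"summary"`
}

// marshalSymptom copies the public fields into the DTO before encoding, so
// matcher internals never reach the JSON output.
func marshalSymptom(s symptom) (string, error) {
	b, err := json.Marshal(symptomDTO{ID: s.ID, Summary: s.Summary})
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	out, err := marshalSymptom(symptom{
		ID:          "s1",
		Summary:     "node not ready",
		MatcherType: "regex",
		MatchString: "NotReady.*",
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // prints {"id":"s1","summary":"node not ready"}
}
```

An explicit copy into a separate type is more robust than tagging fields with `json:"-"` on the model, since new model fields stay hidden by default.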
UX & API Reviewer:
- SUGGESTION: No validation of invalid `expand` values; `?expand=symtpoms` (typo) silently returns no symptom data (server.go:1680-1684).
- SUGGESTION: Stale `triage_symptoms` rows are never cleaned up, leading to inaccurate data over time when symptoms disappear from job runs (regressiontracker.go:131-184).
- SUGGESTION: The `TriageSymptoms` field on the `Triage` model has `json:"triage_symptoms,omitempty"`; raw junction rows could leak into API responses if someone adds preloading. Consider `json:"-"` (models/triage.go:52).
- SUGGESTION: Tooltip on the "Symptoms" heading could be more discoverable with an info icon (TriageSymptoms.js:26-33).
- NOTE: Symptom chip truncation at 12 characters is arbitrary; CSS `text-overflow` might adapt better.
- NOTE: Good accessibility practices with `aria-label` and `aria-pressed`. Deterministic `symptomColor` is a good design choice.
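One way to surface typos such as `?expand=symtpoms` is to validate against an allow-list. The sketch below assumes a comma-separated parameter and that `regressions` and `symptoms` are the only valid values; both points are assumptions, and `parseExpand` is a hypothetical name.

```go
package main

import (
	"fmt"
	"strings"
)

// allowedExpand lists the values this sketch accepts. The real set of
// supported expand fields is an assumption here.
var allowedExpand = map[string]bool{"regressions": true, "symptoms": true}

// parseExpand turns a comma-separated expand value into a set and rejects
// anything outside the allow-list, so a typo fails loudly.
func parseExpand(raw string) (map[string]bool, error) {
	fields := make(map[string]bool)
	for _, part := range strings.Split(raw, ",") {
		name := strings.TrimSpace(part)
		if name == "" {
			continue
		}
		if !allowedExpand[name] {
			return nil, fmt.Errorf("unknown expand value %q", name)
		}
		fields[name] = true
	}
	return fields, nil
}

func main() {
	if _, err := parseExpand("regressions,symtpoms"); err != nil {
		fmt.Println(err) // prints: unknown expand value "symtpoms"
	}
}
```

A handler could translate the returned error into a 400 response, which would also make the missing "unknown expand value" test straightforward to write.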
Codebase Consistency Reviewer:
- SUGGESTION: `seedSymptom` and `cleanupTriageSymptoms` test helpers are duplicated across two e2e packages with inconsistent implementations (one uses `FirstOrCreate`, the other uses `Create`).
- SUGGESTION: Duplicate imports from `'./CompReadyUtils'` in `TriagedRegressionTestList.js` (lines 5 and 8) should be combined into one.
- SUGGESTION: `SyncTriageSymptoms` uses two-query upsert instead of the `Assign` pattern established by `MergeJobRuns` in the same file.
- SUGGESTION: `RegressedTests` JSON tag missing `omitempty`, inconsistent with `SymptomSummaries` which has it (server.go:1672).
- NOTE: `sort.Slice` used instead of `slices.SortFunc` despite `slices` already being imported (triage.go:938).
- NOTE: Overall strong consistency with existing patterns in GORM models, error wrapping, BigQuery queries, logging, and frontend conventions.
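For reference, the `slices.SortFunc` form of a summary sort could look like this (Go 1.21+). The `summary` type and the ordering (job run count descending, then symptom ID ascending) are assumptions for illustration; the PR's actual sort key is not shown in this thread.

```go
package main

import (
	"cmp"
	"fmt"
	"slices"
)

// summary is a hypothetical stand-in for the per-symptom summary type.
type summary struct {
	SymptomID   string
	JobRunCount int
}

// sortSummaries orders by job run count descending, breaking ties by
// symptom ID ascending so the result is deterministic.
func sortSummaries(s []summary) {
	slices.SortFunc(s, func(a, b summary) int {
		if c := cmp.Compare(b.JobRunCount, a.JobRunCount); c != 0 {
			return c
		}
		return cmp.Compare(a.SymptomID, b.SymptomID)
	})
}

func main() {
	s := []summary{{"oom", 2}, {"timeout", 5}, {"dns", 2}}
	sortSummaries(s)
	fmt.Println(s) // prints "[{timeout 5} {dns 2} {oom 2}]"
}
```

Compared with `sort.Slice`, the comparison function is typed, so no index-based closure over the slice is needed.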
QA Engineer:
- BLOCKING: No unit test for `GetTriageSymptomSummaries`, a non-trivial exported function with aggregation logic, silent-skip behavior, and percentage computation (triage.go:878).
- BLOCKING: `SyncTriageSymptoms` never removes stale junction rows. No test covers the scenario where a symptom disappears from all job runs after a sync (regressiontracker.go:128-184).
- SUGGESTION: No test for `expand` parameter with unknown/invalid values.
- SUGGESTION: No test for `symptomColor` utility function (CompReadyUtils.js:914-920).
- SUGGESTION: No test for duplicate symptom IDs within a single job run's `JobSymptoms` array.
- NOTE: `LinearProgress` percentage could theoretically exceed 100% if data is inconsistent. Consider clamping.
- NOTE: No frontend tests for `TriageSymptoms` component, consistent with existing codebase (no existing tests for these components).
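The clamping note could also be handled where the percentage is computed rather than in the progress bar. A minimal sketch, assuming the percentage is a simple count over a total; `clampedPercent` is a hypothetical name and the real computation in `GetTriageSymptomSummaries` may differ.

```go
package main

import (
	"fmt"
	"math"
)

// clampedPercent returns count/total as a percentage limited to [0, 100].
// A non-positive total yields 0 instead of dividing by zero.
func clampedPercent(count, total int) float64 {
	if total <= 0 {
		return 0
	}
	pct := float64(count) / float64(total) * 100
	return math.Min(100, math.Max(0, pct))
}

func main() {
	fmt.Println(clampedPercent(1, 4)) // prints "25"
	fmt.Println(clampedPercent(7, 5)) // prints "100"
	fmt.Println(clampedPercent(3, 0)) // prints "0"
}
```

Clamping hides inconsistent data rather than fixing it, so logging the unclamped value when it exceeds 100 may be worth considering.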
Devil's Advocate:
- BLOCKING: No FK from `triage_symptoms.regression_id` to `test_regressions.id`. When a regression is deleted, orphaned `triage_symptoms` rows persist and inflate symptom counts. When a regression is unlinked from a triage, same problem (models/triage.go:271-276).
- BLOCKING: `SyncTriageSymptoms` never deletes stale junction rows. If BigQuery returns different symptoms for a job run on a subsequent query (e.g., symptom definition deleted, BQ data corrected), old rows persist with stale counts (regressiontracker.go:131-184).
- SUGGESTION: N+1 DB queries: `200 regressions * 3 triages * 5 symptoms * 2 ops = 6000` round-trips per loader run. Use bulk `ON CONFLICT ... DO UPDATE`.
- SUGGESTION: `Preload("JobRuns")` loads all job runs into memory. Consider SQL-based aggregation to avoid the memory spike.
- SUGGESTION: `backfillClosedRegressionViews` removal is unrelated; confirm it has run in all environments.
- NOTE: `SyncTriageSymptoms` includes regressions from errored releases; intentional, but symptom counts may be understated.
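To illustrate the bulk `ON CONFLICT ... DO UPDATE` suggestion, the sketch below builds one PostgreSQL statement for many rows instead of two operations per symptom. The column `job_run_count` and the unique key `(triage_id, regression_id, symptom_id)` are assumptions about the table, and `buildBulkUpsert` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"strings"
)

// triageSymptomRow mirrors the junction table as assumed in this sketch.
type triageSymptomRow struct {
	TriageID     int
	RegressionID int
	SymptomID    string
	JobRunCount  int
}

// buildBulkUpsert returns a single parameterized INSERT that updates the
// count on conflict, plus its arguments in placeholder order.
func buildBulkUpsert(rows []triageSymptomRow) (string, []any) {
	if len(rows) == 0 {
		return "", nil
	}
	placeholders := make([]string, 0, len(rows))
	args := make([]any, 0, len(rows)*4)
	for i, r := range rows {
		base := i * 4
		placeholders = append(placeholders,
			fmt.Sprintf("($%d, $%d, $%d, $%d)", base+1, base+2, base+3, base+4))
		args = append(args, r.TriageID, r.RegressionID, r.SymptomID, r.JobRunCount)
	}
	query := "INSERT INTO triage_symptoms (triage_id, regression_id, symptom_id, job_run_count) VALUES " +
		strings.Join(placeholders, ", ") +
		" ON CONFLICT (triage_id, regression_id, symptom_id) DO UPDATE SET job_run_count = EXCLUDED.job_run_count"
	return query, args
}

func main() {
	query, args := buildBulkUpsert([]triageSymptomRow{
		{TriageID: 1, RegressionID: 10, SymptomID: "timeout", JobRunCount: 4},
		{TriageID: 1, RegressionID: 10, SymptomID: "oom", JobRunCount: 1},
	})
	fmt.Println(query)
	fmt.Println(len(args))
}
```

With the reviewer's example numbers, the 6000 round-trips would collapse to one statement per batch; very large batches would still need chunking to stay under the driver's parameter limit.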
Technical Writer:
- SUGGESTION: Plan doc states symptoms are "also computed when `regressions` is requested" but the implementation handles them independently; frontend explicitly requests both (docs/plans/trt-2633-triage-symptoms.md:412-419).
- SUGGESTION: Plan specifies Regressions column should display "2 / 5" (fraction format) but implementation only shows the count.
- SUGGESTION: Plan specifies symptom rows should link to Job Artifact Query page, but implementation renders plain chips with no links.
- NOTE: New public Go functions have appropriate godoc comments. No existing external documentation requires updating.
Panel Synthesis
Four specialists arriving independently (Architecture, UX/API, Devil's Advocate, QA) converge on the same two data integrity issues:
- `SyncTriageSymptoms` is additive-only and never cleans up stale junction rows. The design doc claims symptoms "only accumulate," but `MergeJobRuns` uses `Assign`, which replaces `JobSymptoms` on each sync. If BigQuery data changes, symptom counts become permanently inflated.
- No FK constraint from `triage_symptoms.regression_id` to `test_regressions.id`. When a regression is deleted, orphaned rows persist and inflate symptom counts.
Required Actions Before Merge
- Add cleanup of stale `triage_symptoms` rows in `SyncTriageSymptoms` (pkg/api/componentreadiness/regressiontracker.go:131-184). After computing `symptomCounts` for each `(triage, regression)` pair, delete any existing `triage_symptoms` rows whose `symptom_id` is not in the current set.
- Add FK constraint from `triage_symptoms.regression_id` to `test_regressions.id` with `ON DELETE CASCADE` (pkg/db/db.go). Follow the existing `ensureTriageSymptomCascade` pattern.
Optional Follow-ups
- Consolidate `FirstOrCreate` + `Update` into `Assign` + `FirstOrCreate` (or batch `ON CONFLICT ... DO UPDATE`) to reduce DB round-trips
- Add unit tests for `GetTriageSymptomSummaries` covering edge cases: zero regressions, stale symptom IDs, percentage rounding, sort order
- Slim down `TriageSymptomSummary` to expose only `ID` and `Summary` from the `Symptom` model rather than the full struct with matching rules
- Add `json:"-"` to the `TriageSymptoms` field on the `Triage` model to prevent raw junction rows from leaking into API responses
- Deduplicate `seedSymptom` / `cleanupTriageSymptoms` test helpers across e2e packages
- Update plan document to reflect implementation deviations (independent expand handling, count-only display, deferred JAQ links)
Review generated by multi-specialist panel (Architecture, Security & Supply Chain, UX/API, Codebase Consistency, QA, Devil's Advocate, Technical Writer + Panel Arbiter)
|
From the panel review, the required requests seem to make sense to me, and the unit test ask seems reasonable if it's testable. @coderabbitai What do you think about the panel review above? ^^ |
|
Tip: For best results, initiate chat on the files or code changes.
🧩 Analysis chain
Let me pull up the actual review comments on this PR to understand what you're referring to.
🏁 Script executed:
Length of output: 67911
Required Action 1 (stale Required Action 2 ( On the unit test ask: Yes, One item from my own review that the panel didn't elevate to "required" but is worth noting: in |
BigQuery doesn't really ever change, though. |
|
|
|
This was designed with the assumption that job runs would not stop showing symptoms unless the symptom itself was deleted. I can't think of any reason why we would want to account for that possibility, and I believe it will make the logic more complex to do so. I will add the FK constraint from |
|
Waiting on #3481 merge to add additional unit testing |
Add FK cascades from triage_symptoms to both job_run_symptoms and test_regressions. Use ON CONFLICT DO UPDATE for single-roundtrip upserts. Slim TriageSymptomSummary to expose only symptom ID and summary. Hide raw junction rows from API via json:"-". Deduplicate e2e test helpers into test/e2e/util.
🧹 Nitpick comments (1)
test/e2e/util/db.go (1)
53-57: ⚡ Quick win
Avoid table-wide cleanup in this shared e2e helper.
`CleanupTriageSymptoms` wipes the entire junction table, so one symptom e2e package can erase another package's fixtures if they share the same `SIPPY_E2E_DSN`. Prefer deleting by the triage or regression IDs created by the current test instead of issuing a global delete here.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/e2e/util/db.go` around lines 53 - 57, CleanupTriageSymptoms currently does a table-wide delete on models.TriageSymptom via dbc.DB.Where("1 = 1").Delete(...), which can wipe other tests' fixtures; change the function signature CleanupTriageSymptoms to accept identifiers from the current test (e.g., triageIDs []int or regressionIDs []int or a filter struct) and replace the global delete with a scoped delete using Where with those IDs (e.g., "triage_id IN (?)" or "regression_id IN (?)") so only rows created by the current test are removed; locate references to CleanupTriageSymptoms and update callers to pass the appropriate IDs created during the test setup.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@test/e2e/util/db.go`:
- Around line 53-57: CleanupTriageSymptoms currently does a table-wide delete on
models.TriageSymptom via dbc.DB.Where("1 = 1").Delete(...), which can wipe other
tests' fixtures; change the function signature CleanupTriageSymptoms to accept
identifiers from the current test (e.g., triageIDs []int or regressionIDs []int
or a filter struct) and replace the global delete with a scoped delete using
Where with those IDs (e.g., "triage_id IN (?)" or "regression_id IN (?)") so
only rows created by the current test are removed; locate references to
CleanupTriageSymptoms and update callers to pass the appropriate IDs created
during the test setup.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: b3d14cb1-aed8-40f1-b31c-5361f60b0dff
📒 Files selected for processing (8)
docs/plans/trt-2633-triage-symptoms.mdpkg/api/componentreadiness/regressiontracker.gopkg/api/componentreadiness/triage.gopkg/db/db.gopkg/db/models/triage.gotest/e2e/componentreadiness/regressiontracker/regressiontracker_test.gotest/e2e/componentreadiness/triage/triageapi_test.gotest/e2e/util/db.go
🚧 Files skipped from review as they are similar to previous changes (1)
- pkg/api/componentreadiness/triage.go
|
Scheduling required tests: |
|
@smg247: all tests passed! Full PR test history. Your PR dashboard.
Details
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
Surfaces job run symptoms on triage detail pages by syncing them into a
triage_symptoms junction table during regression cache loading, then
exposing per-symptom summaries (with regression IDs and job run counts)
via the expand=symptoms query parameter. Adds frontend symptom chips on
regression rows and a filterable symptom summary panel. Includes unit
and e2e tests for symptom syncing, API expansion, and cascade deletion.
Triage Details page with Symptoms:

Filtered:

Audit logging was left out of this effort. In the future, symptoms might be manually added to the triage. At that time, we would want to consider how to handle the audit logging (likely similar to what we do with regressions).
Summary by CodeRabbit
New Features
Chores
Tests