feat: improve admin queue diagnostics and recovery ui #571

Merged
frostebite merged 2 commits into main from fix/image-row-admin-retry
May 15, 2026
@frostebite frostebite commented May 14, 2026

Summary

  • improve the admin queue UI with likely blockers, selected-repo queue health, and stale-work tables
  • expand the copied diagnostics prompt with base/hub status, selected-repo job state, build state, and stale-age signals
  • add admin-only image-row retry and reset controls for failed images without requiring the row to be expanded
  • improve the top-level admin layout and build dashboard so queue problems are visible earlier

Testing

  • yarn typecheck
  • yarn test
  • yarn build

Summary by CodeRabbit

Release Notes

  • New Features

    • Added admin-only actions to reset and retry failed CI builds
    • Dashboard now tracks build duration, highlighting builds running longer than 45 minutes
    • Introduced "Likely Blockers" diagnostics panel to identify queue bottlenecks
  • UI Improvements

    • Reorganized Docker version selector and admin controls into a styled card-based layout
    • Enhanced queue management panel with stale build and job visibility

coderabbitai Bot commented May 14, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 549e2bb4-36d4-4e24-adc5-6a7c5605ceba

📥 Commits

Reviewing files that changed from the base of the PR and between 03cb379 and bd011e1.

📒 Files selected for processing (5)
  • src/components/docs/versions/build-status-dashboard.tsx
  • src/components/docs/versions/image-job-admin-actions.tsx
  • src/components/docs/versions/image-versions.tsx
  • src/components/docs/versions/queue-management-panel.tsx
  • src/components/docs/versions/unity-version.tsx

Walkthrough

Refactors the ImageVersions header into a styled card/toolbar, adds timestamp-to-age helpers and “started” build age metrics to the dashboard, and expands queue/job/build types plus diagnostics to compute age-based stale lists, maxed-out failures, and a “Likely Blockers” summary in the queue panel.

Changes

Version UI and Queue Diagnostics

| Layer / File(s) | Summary |
| --- | --- |
| ImageVersions header & toolbar (src/components/docs/versions/image-versions.tsx) | Adds headerCardStyle and toolbarStyle, refactors the header to wrap the version selector and admin controls in a styled card/toolbar, and adds a short explanatory paragraph under the toolbar. |
| Build-status dashboard age helpers and metrics (src/components/docs/versions/build-status-dashboard.tsx) | Adds minutesSinceTimestamp and formatAgeMinutes helpers, computes per-build started ages, counts started builds ≥45m, computes the oldest started build age, and renders two new stat chips: “Started 45m+” and “Oldest started”. |
| Queue model, diagnostics, and panel UI (src/components/docs/versions/queue-management-panel.tsx) | Extends QueueJob/QueueBuild shapes with timestamp/meta fields, adds minutes/age helpers, extends diagnostics prompt shape and computation with selected-repo metrics (age thresholds, oldest ages, status counts, maxed-out failures), builds likelyBlockers, and updates the rendered panel and tables to show selected-repo summary stats and stale created/started lists. |
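
The age metrics described above could look roughly like the following. This is a minimal sketch, not the merged code: only the helper names minutesSinceTimestamp and formatAgeMinutes and the 45-minute threshold come from the summary. The signatures, the TimestampLike and StartedBuild shapes, the startedAt field, the 'started' status value, and summarizeStartedBuilds are assumptions made for illustration.

```typescript
// Hypothetical timestamp shape: a Firestore-style { seconds } object, a Date, or epoch milliseconds.
type TimestampLike = { seconds: number } | Date | number | null | undefined;

// Whole minutes elapsed since `timestamp`, or null when no timestamp is available.
function minutesSinceTimestamp(timestamp: TimestampLike, nowMs: number = Date.now()): number | null {
  if (timestamp === null || timestamp === undefined) return null;
  let ms: number;
  if (typeof timestamp === 'number') ms = timestamp;
  else if (timestamp instanceof Date) ms = timestamp.getTime();
  else ms = timestamp.seconds * 1000;
  // Clamp to zero so clock skew never produces a negative age.
  return Math.max(0, Math.floor((nowMs - ms) / 60000));
}

// Human-readable age such as "44m", "2h", or "1h 30m".
function formatAgeMinutes(minutes: number | null): string {
  if (minutes === null) return 'n/a';
  if (minutes < 60) return `${minutes}m`;
  const hours = Math.floor(minutes / 60);
  const rest = minutes % 60;
  return rest === 0 ? `${hours}h` : `${hours}h ${rest}m`;
}

// Threshold taken from the "Started 45m+" chip in the summary.
const STALE_STARTED_MINUTES = 45;

// Hypothetical build shape; the real QueueBuild fields are not shown in this PR page.
interface StartedBuild {
  buildId: string;
  status: string;
  startedAt?: TimestampLike;
}

// Computes the two values behind the "Started 45m+" and "Oldest started" chips.
function summarizeStartedBuilds(builds: StartedBuild[], nowMs: number = Date.now()) {
  const ages = builds
    .filter((build) => build.status === 'started')
    .map((build) => minutesSinceTimestamp(build.startedAt, nowMs))
    .filter((age): age is number => age !== null);
  return {
    startedOver45: ages.filter((age) => age >= STALE_STARTED_MINUTES).length,
    oldestStarted: ages.length > 0 ? Math.max(...ages) : null,
  };
}
```

Passing nowMs explicitly keeps the helpers deterministic, which makes the threshold logic straightforward to unit test.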

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • game-ci/documentation#551: Related admin operations on ciBuilds and failureCount >= 15 logic; overlaps in reset/retry endpoints and stuck-build handling.
  • game-ci/documentation#570: Related updates to queue-management-panel.tsx and diagnostics for repo-version-scoped queue health metrics.
  • game-ci/documentation#547: Related UI changes to admin action placement within image-versions header and admin button grouping.

Suggested reviewers

  • GabLeRoux
  • webbertakken

Poem

🐇 I hop through headers, timestamps in paw,
I count the minutes each build saw,
A card, two chips, and a diagnostics bell,
I nudge stale jobs so queues run well. ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Description check | ⚠️ Warning | The description covers the main changes and testing performed, but is missing the required checklist items from the template (contribution guide, README, and Tests checkboxes). | Add the required checklist section with checkboxes for contribution guide, README status, and Tests status as specified in the repository template. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Title check | ✅ Passed | The title accurately describes the main change: adding admin controls for queue diagnostics and recovery in the UI, which aligns with the component additions. |

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

ESLint reported the same error for each of the following files:

  • src/components/docs/versions/image-job-admin-actions.tsx
  • src/components/docs/versions/queue-management-panel.tsx
  • src/components/docs/versions/build-status-dashboard.tsx
  • 2 others

Oops! Something went wrong! :(

ESLint: 10.3.0

ESLint couldn't find an eslint.config.(js|mjs|cjs) file.

From ESLint v9.0.0, the default configuration file is now eslint.config.js.
If you are using a .eslintrc.* file, please follow the migration guide
to update your configuration file to the new format:

https://eslint.org/docs/latest/use/configure/migration-guide

If you still have problems after following the migration guide, please stop by
https://eslint.org/chat/help to chat with the team.


github-actions Bot commented May 14, 2026

Visit the preview URL for this PR (updated for commit bd011e1):

https://game-ci-5559f--pr571-fix-image-row-admin-4da2rmrp.web.app

(expires Fri, 22 May 2026 00:25:47 GMT)

🔥 via Firebase Hosting GitHub Action 🌎

Sign: 1f0574f15f83e11bfc148eae8646486a6d0e078b

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/components/docs/versions/unity-version.tsx (1)

61-76: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Nested <button> elements in row header: move admin actions outside the toggle button.

ImageJobAdminActions renders two <button> elements (lines 122 and 137 of image-job-admin-actions.tsx) and is itself rendered as a direct child of the outer <button className={styles.versionButton}> (lines 63–69 of unity-version.tsx). Nesting interactive elements inside a <button> is invalid HTML and makes click handling unpredictable: a click on an inner button can bubble up and also trigger the row toggle, and the invalid nesting can cause hydration warnings.

Refactor to move ImageJobAdminActions out of the toggle button; render both as siblings (e.g., flex wrapper) or replace the row toggle with a non-button clickable element.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/components/docs/versions/unity-version.tsx` around lines 61 - 76, The row
toggle button in unity-version.tsx currently contains the ImageJobAdminActions
component (which itself renders buttons), causing nested interactive elements;
move ImageJobAdminActions out of the toggle button so they are siblings.
Concretely: replace the single <button className={styles.versionButton}
onClick={ToggleEnable}> wrapper with a wrapper element (e.g., a flex <div> or
existing row container) that contains the toggle <button> (keep ToggleEnable and
non-interactive bits like id, icon, DateTime, ShowAndCopyChangeSetHashButton
inside that button) and render <ImageJobAdminActions ciJobId={id}
status={status} /> as a sibling element to the toggle button; adjust CSS
(styles.versionButton / row wrapper) to preserve layout and ensure the toggle
only surrounds non-interactive content so inner admin buttons are not nested.
🧹 Nitpick comments (1)
src/components/docs/versions/image-job-admin-actions.tsx (1)

87-99: ⚡ Quick win

Log swallowed per-build errors so admins can diagnose partial failures.

The catch on line 96 discards the underlying error. When the notification reports Reset 3/5 builds. 2 failed., there's no trail to figure out why — the admin would have to re-run with DevTools open and patch in logging. A single console.error keeps the user-facing UX identical while preserving the failure context.

♻️ Proposed fix
       for (const build of builds) {
         try {
           const payload =
             endpoint === 'retryBuild'
               ? { buildId: build.buildId, relatedJobId: build.relatedJobId }
               : { buildId: build.buildId };
           // Sequential calls avoid hammering the backend for a single image row action.
           await callEndpoint(endpoint, payload);
           succeeded += 1;
-        } catch {
+        } catch (error) {
+          console.error(`Admin action "${action}" failed for build ${build.buildId}:`, error);
           failed += 1;
         }
       }
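
For reference, the loop with logging can be exercised outside the component. The sketch below is a standalone approximation under stated assumptions: runForBuilds, BuildRef, the CallEndpoint type, and the injectable log parameter are hypothetical names introduced here, and only the 'retryBuild' payload shape, the sequential awaits, and the succeeded/failed counters mirror the diff above.

```typescript
// Hypothetical minimal build shape, limited to the two fields the payloads in the diff use.
interface BuildRef {
  buildId: string;
  relatedJobId: string;
}

// Hypothetical signature for the component's callEndpoint helper.
type CallEndpoint = (endpoint: string, payload: Record<string, string>) => Promise<unknown>;

// Runs one admin call per build, sequentially, and logs each failure instead of discarding it.
async function runForBuilds(
  endpoint: string,
  builds: BuildRef[],
  callEndpoint: CallEndpoint,
  log: (message: string, error: unknown) => void = console.error,
): Promise<{ succeeded: number; failed: number }> {
  let succeeded = 0;
  let failed = 0;
  for (const build of builds) {
    try {
      // Only 'retryBuild' sends relatedJobId, matching the payloads in the diff.
      const payload: Record<string, string> = { buildId: build.buildId };
      if (endpoint === 'retryBuild') payload.relatedJobId = build.relatedJobId;
      // Sequential awaits keep a single row action from hammering the backend.
      await callEndpoint(endpoint, payload);
      succeeded += 1;
    } catch (error) {
      // The counters still drive the notification; the log preserves the cause.
      log(`Admin call "${endpoint}" failed for build ${build.buildId}:`, error);
      failed += 1;
    }
  }
  return { succeeded, failed };
}
```

Injecting log is a choice made for this sketch so the behavior can be asserted in a test; in the component a direct console.error call, as in the proposed fix, has the same effect.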
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/components/docs/versions/image-job-admin-actions.tsx` around lines 87 -
99, The catch block inside the for-loop that iterates over builds (around the
try { await callEndpoint(endpoint, payload); } catch { ... }) is swallowing
per-build errors; update the catch to log the thrown error and contextual info
(e.g., the build object or build.buildId and endpoint) via console.error (or
processLogger if available) so admins can diagnose partial failures while
keeping the existing succeeded/failed counting and user-facing notification
behavior intact.
ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 7244d67f-088a-4fe1-a240-8d8168167125

📥 Commits

Reviewing files that changed from the base of the PR and between 8d2c3d4 and 03cb379.

📒 Files selected for processing (2)
  • src/components/docs/versions/image-job-admin-actions.tsx
  • src/components/docs/versions/unity-version.tsx

Comment on lines +41 to +44
const ciBuilds = firestore.collection('ciBuilds').where('relatedJobId', '==', ciJobId);
const { status: buildStatus, data = [] } = useFirestoreCollectionData<BuildRecord>(ciBuilds);

if (status !== 'failed' || buildStatus === 'loading') return null;

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Avoid subscribing to Firestore for every non-failed job row.

useFirestoreCollectionData is invoked unconditionally, so a per-row ciBuilds listener (filtered by relatedJobId) is created for every UnityVersion rendered — including jobs that are completed, scheduled, inProgress, superseded, etc. On a docs page that renders many image versions, this fans out into N live Firestore listeners just to discard the data at the status !== 'failed' gate one line below. Gate the component before the hook runs by splitting the inner logic into a separate component.

♻️ Proposed fix: gate the Firestore subscription on status
-const ImageJobAdminActions = ({ ciJobId, status }: Props) => {
-  const firestore = useFirestore();
-  const { data: user } = useUser();
-  const notify = useNotification();
-  const [runningAction, setRunningAction] = useState<'reset' | 'retry' | null>(null);
-
-  const ciBuilds = firestore.collection('ciBuilds').where('relatedJobId', '==', ciJobId);
-  const { status: buildStatus, data = [] } = useFirestoreCollectionData<BuildRecord>(ciBuilds);
-
-  if (status !== 'failed' || buildStatus === 'loading') return null;
+const ImageJobAdminActions = ({ ciJobId, status }: Props) => {
+  if (status !== 'failed') return null;
+  return <ImageJobAdminActionsInner ciJobId={ciJobId} />;
+};
+
+const ImageJobAdminActionsInner = ({ ciJobId }: { ciJobId: string }) => {
+  const firestore = useFirestore();
+  const { data: user } = useUser();
+  const notify = useNotification();
+  const [runningAction, setRunningAction] = useState<'reset' | 'retry' | null>(null);
+
+  const ciBuilds = firestore.collection('ciBuilds').where('relatedJobId', '==', ciJobId);
+  const { status: buildStatus, data = [] } = useFirestoreCollectionData<BuildRecord>(ciBuilds);
+
+  if (buildStatus === 'loading') return null;
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/components/docs/versions/image-job-admin-actions.tsx` around lines 41 -
44, The problem is that useFirestoreCollectionData is called unconditionally,
creating a Firestore listener per row even when status !== 'failed'; fix by
moving the ciBuilds construction and the useFirestoreCollectionData<BuildRecord>
hook into a new child component (e.g., FailedJobBuilds or ImageJobAdminBuilds)
that accepts ciJobId as a prop and renders the build UI, then in the parent
component do an early return when status !== 'failed' so the child (and thus
ciBuilds/useFirestoreCollectionData) is only mounted for failed jobs; reference
ciBuilds, useFirestoreCollectionData, BuildRecord, ciJobId, status and
buildStatus when updating the code.

Comment on lines +67 to +71
const body = await response.json();
if (!response.ok) {
  const detail = body.error ? `${body.message}: ${body.error}` : body.message;
  throw new Error(detail || `Request failed (${response.status})`);
}

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Parse JSON defensively so non-JSON error bodies don't mask the real failure.

response.json() runs before the response.ok check, so any non-JSON error response (gateway HTML 5xx, empty body 504, auth proxy redirect, etc.) throws SyntaxError: Unexpected token ... and the real status code/message never reaches the admin via the notification. Read the body as text first and parse it safely.

🛡️ Proposed fix
-    const body = await response.json();
-    if (!response.ok) {
-      const detail = body.error ? `${body.message}: ${body.error}` : body.message;
-      throw new Error(detail || `Request failed (${response.status})`);
-    }
-    return body;
+    const rawText = await response.text();
+    let body: { message?: string; error?: string; [key: string]: unknown } = {};
+    if (rawText) {
+      try {
+        body = JSON.parse(rawText);
+      } catch {
+        // Non-JSON response; fall through with empty body so we surface the status code.
+      }
+    }
+    if (!response.ok) {
+      const detail = body.error ? `${body.message}: ${body.error}` : body.message;
+      throw new Error(detail || `Request failed (${response.status})`);
+    }
+    return body;
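
The same idea can be tried in isolation. The following is a hedged sketch rather than the project's code: ResponseLike, ParsedBody, parseBodySafely, describeFailure, and readJsonOrThrow are hypothetical names, and the 200-character cap on the raw-text fallback is an arbitrary choice. Unlike the diff above, it also falls back to the raw response text before the generic status message, and it joins only the message/error fields that are actually present.

```typescript
// Structural subset of the Fetch API Response that this sketch relies on.
interface ResponseLike {
  ok: boolean;
  status: number;
  text(): Promise<string>;
}

interface ParsedBody {
  message?: unknown;
  error?: unknown;
  [key: string]: unknown;
}

// Parses JSON without throwing; non-JSON or non-object bodies yield an empty object.
function parseBodySafely(rawText: string): ParsedBody {
  if (!rawText) return {};
  try {
    const parsed: unknown = JSON.parse(rawText);
    const isPlainObject = typeof parsed === 'object' && parsed !== null && !Array.isArray(parsed);
    return isPlainObject ? (parsed as ParsedBody) : {};
  } catch {
    // Gateway HTML, empty proxies, and similar bodies end up here.
    return {};
  }
}

// Builds the error text: JSON message/error first, then raw text, then the status code.
function describeFailure(status: number, rawText: string): string {
  const body = parseBodySafely(rawText);
  const parts = [body.message, body.error].filter(
    (part): part is string => typeof part === 'string' && part.length > 0,
  );
  const fromJson = parts.join(': ');
  // Cap the raw text so an HTML error page does not flood the notification.
  const fromText = rawText.trim().slice(0, 200);
  return fromJson || fromText || `Request failed (${status})`;
}

// Reads the body once as text, then decides between throwing and returning parsed JSON.
async function readJsonOrThrow(response: ResponseLike): Promise<ParsedBody> {
  const rawText = await response.text();
  if (!response.ok) throw new Error(describeFailure(response.status, rawText));
  return parseBodySafely(rawText);
}
```

A standard fetch Response satisfies ResponseLike structurally, so readJsonOrThrow(await fetch(url)) would type-check as written.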
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/components/docs/versions/image-job-admin-actions.tsx` around lines 67 -
71, The code currently calls response.json() unconditionally which throws on
non-JSON error bodies; change the logic in the block surrounding the response
handling (the const body = await response.json(); and the subsequent if
(!response.ok) branch) to first read the raw text via response.text(), then try
to JSON.parse that text inside a try/catch to produce a parsed body; when
response.ok is false build the detail using parsedBody.message /
parsedBody.error if available, otherwise fall back to the raw text and finally
to `Request failed (${response.status})`, ensuring any JSON parse errors are
swallowed and the real status and text are surfaced in the thrown Error.

@frostebite frostebite requested a review from webbertakken May 14, 2026 15:26
@frostebite frostebite changed the title from "fix: add image-level admin retry controls" to "feat: improve admin queue diagnostics and recovery ui" May 14, 2026
@frostebite frostebite enabled auto-merge (squash) May 14, 2026 22:48
@webbertakken webbertakken force-pushed the fix/image-row-admin-retry branch from 4625bf2 to bd011e1 May 15, 2026 00:22
@frostebite frostebite merged commit b407e0f into main May 15, 2026
7 of 8 checks passed
@frostebite frostebite deleted the fix/image-row-admin-retry branch May 15, 2026 00:23