Skip to content

Dashboard: reduce GPU usage from frequent re-renders#1383

Open
jayy-77 wants to merge 1 commit intoexo-explore:mainfrom
jayy-77:fix-dashboard-gpu-usage-1025
Open

Dashboard: reduce GPU usage from frequent re-renders#1383
jayy-77 wants to merge 1 commit intoexo-explore:mainfrom
jayy-77:fix-dashboard-gpu-usage-1025

Conversation

@jayy-77
Copy link
Copy Markdown

@jayy-77 jayy-77 commented Feb 4, 2026

Fixes #1025

Summary

  • Throttle TopologyGraph SVG rebuilds to structural changes + periodic metric refresh.
  • Update interaction styling (hover/filter/highlight) without tearing down the SVG.
  • Switch /state polling to adaptive timeouts and back off when the tab is hidden.
  • Disable dashed-link animations unless debug mode is enabled.

Test plan

  • cd dashboard && npm ci && npm run check

Throttle TopologyGraph SVG rebuilds and switch state polling to adaptive timeouts
(with backoff when the tab is hidden). Also disable link dash animations unless
debug mode is enabled.
@AlexCheema
Copy link
Copy Markdown
Contributor

Code Review: Dashboard: reduce GPU usage from frequent re-renders

Thanks for tackling this, @jayy-77. The core idea -- throttle full SVG teardowns, separate interaction styling from structural re-renders, and use adaptive polling -- is sound and should noticeably reduce GPU/CPU load. I have a few issues to flag, some of which could introduce bugs.


TopologyGraph.svelte

1. buildStructureKey omits reactive dependencies that trigger full re-renders

The structure key captures isMinimized, debugEnabled, node IDs, and edges. But renderGraph() also reads rdmaCtlData, identitiesData, and tbBridgeData (the Thunderbolt/RDMA/identity debug labels). If those change without the node/edge set changing, the graph will show stale debug info until the next 5-second metrics refresh. These are displayed only in debug mode, so it is arguably low priority, but worth noting. Consider including a hash of those data objects in the key when debugEnabled is true, or at least document the trade-off.

2. dynamic-fill class only applied to the hexagon (default/unknown) shape

The updateInteractionStyles function targets .dynamic-fill for fill color updates, but looking at the diff, only the hexagon polygon gets class="node-outline dynamic-fill" (line 920 in the current code). The Mac Studio rect, Mac Mini rect, and MacBook Pro rect all have class="node-outline" without dynamic-fill. This means updateInteractionStyles will update stroke/stroke-width on all node types, but the fillColor update will only apply to hexagons. For Mac devices, the fill is hardcoded to #1a1a1a (dark background), so this is likely intentional -- but if a user hovers over a Mac Studio node, the outline will change color while the fill stays constant, which may look inconsistent with the hexagon hover behavior. Please confirm this is intended or add dynamic-fill where appropriate.

3. Interaction style colors are duplicated between renderGraph and updateInteractionStyles

The wire/fill color logic (lines ~567-590 in the existing renderGraph) is copy-pasted into updateInteractionStyles. If anyone adjusts hover colors in one place, they must remember to update the other. Consider extracting the color computation into a shared helper:

function computeNodeColors(nodeId: string) {
  const isHighlighted = highlightedNodes.has(nodeId);
  const isInFilter = filteredNodes.size > 0 && filteredNodes.has(nodeId);
  const isFilteredOut = filteredNodes.size > 0 && !filteredNodes.has(nodeId);
  const isHovered = Boolean(onNodeClick) && hoveredNodeId === nodeId && !isInFilter;
  // ... return { wireColor, fillColor, strokeWidth, opacity }
}

4. lastStructureKey and lastFullRenderAt are plain let variables, not $state

This is actually fine for the current usage -- they are only read inside $effect closures and not used reactively. But it is worth noting that these are module-level mutable variables that escape Svelte's reactivity system. A comment explaining why they are intentionally non-reactive would help future readers.

5. The metrics refresh interval (5s) does not account for performance.now() wrapping or tab sleep

When a browser tab is backgrounded, performance.now() continues to advance on most browsers, but the $effect that checks now - lastFullRenderAt >= METRICS_RERENDER_INTERVAL_MS only fires when data changes (i.e., when a new poll response arrives). If polling backs off to 10s when hidden, the metrics will refresh at the poll interval (10s), not the 5s constant. That is fine -- just mentioning it because the 5s constant could be misleading.

6. Flow animation CSS: the animation: none override may conflict with the global app.css rule

In app.css, .graph-link has animation: flowAnimation 1s linear infinite. In the component <style>, :global(.graph-link) now sets animation: none, and :global(svg.animate-links .graph-link) re-enables it. Because app.css is a global stylesheet, specificity could cause the app.css rule to win over the component's :global(.graph-link) rule depending on stylesheet load order. I would recommend either:

  • Removing the .graph-link animation rule from app.css entirely (since the component now controls it), or
  • Using higher specificity in the component (e.g., :global(svg .graph-link) for the animation: none rule).

If both rules end up active, the animation would not actually be disabled in non-debug mode, defeating the purpose of this optimization.


app.svelte.ts

7. scheduleNextPoll(0) on visibility change can stack with an in-flight poll

When the tab becomes visible, handleVisibilityChange calls scheduleNextPoll(0). If a poll is already in flight (from the background 10s timer), pollLoop will see pollInFlight === true and reschedule at 250ms. That is handled correctly. However, consider the sequence:

  1. Poll completes, schedules next at 10s (hidden tab).
  2. Tab becomes visible, scheduleNextPoll(0) clears the 10s timeout and fires immediately -- good.
  3. But if the tab flickers visible/hidden rapidly, multiple scheduleNextPoll(0) calls will fire. Each clears the previous timeout, so only the last one runs. This is fine.

Overall the polling logic looks correct. Nice work avoiding the setInterval stacking problem.

8. computePollIntervalMs accesses this.downloads and this.instances -- verify these are reactive

In the Svelte 5 runes mode, $state fields on a class are reactive. this.downloads and this.instances are indeed $state (confirmed in the store). The method is called from pollLoop (not inside a reactive context), so it will read the current snapshot value. This is correct behavior.

9. Missing cleanup of the visibility event listener if startPolling is called multiple times

The visibilityListenerRegistered flag prevents duplicate listeners, which is good. However, if stopPolling is called and then startPolling is called again, the flag is properly reset in stopPolling and the listener is re-registered. This looks correct.


app.css

10. prefers-reduced-motion block -- good accessibility improvement

The opacity: 0.6 fallback gives users who prefer reduced motion a subtle visual indicator where the shooting star would be, rather than hiding it entirely. The ::before pseudo-element (the trail) is hidden. This is a clean approach.


ModelPickerGroup.svelte

11. SVG <title> child element -- correct accessibility fix

Moving title from an attribute on <svg> to a <title> child element is the correct approach per the SVG spec. The title attribute on <svg> elements is not semantically equivalent to the <title> child element -- screen readers and browsers use the child element for accessible names. Good catch.


Summary

The PR delivers meaningful performance improvements and the approach is well-reasoned. The main concerns are:

  1. CSS specificity conflict between app.css and the component's :global(.graph-link) for the animation override (Fix ChatGPT API endpoint #6) -- this could silently negate the animation optimization.
  2. Duplicated color logic between renderGraph and updateInteractionStyles ([BOUNTY - $100] Add support for LLaVA #3) -- maintenance risk.
  3. dynamic-fill only on hexagons ([BOUNTY - $200] Radio Networking Module #2) -- confirm this is intentional.

Everything else looks solid. The adaptive polling and throttled rendering are the right patterns for this problem.

@AlexCheema
Copy link
Copy Markdown
Contributor

Code Review — PR #1383: Dashboard: reduce GPU usage from frequent re-renders

CI: No checks found | Merge status: CONFLICTING (needs rebase)

Overview

This PR tackles a real performance problem: the dashboard's topology graph was doing full SVG teardown-and-rebuild on every state poll (every 1s), even when only metrics changed — driving high GPU/CPU usage. The fix introduces three optimizations:

  1. Throttled graph rendering (TopologyGraph.svelte): Structural changes (node/edge additions, layout mode) trigger immediate re-renders; metrics-only updates are throttled to 5s intervals. Hover/filter interactions are handled via lightweight D3 style updates instead of full rebuilds.
  2. Adaptive polling (app.svelte.ts): Replaces the fixed 1s setInterval with a setTimeout-based poll loop that adapts: 1s when downloads/instances are active, 3s when idle, 10s when the tab is hidden. Includes visibility-change detection for immediate refresh on tab focus.
  3. Conditional link animation (TopologyGraph.svelte + CSS): Flow animations on edge links are only enabled in debug mode via an animate-links CSS class toggle, reducing GPU compositing work in normal usage.
  4. Minor fixes: SVG <title> child elements replace invalid title attributes in ModelPickerGroup.svelte; prefers-reduced-motion media query added to app.css.

Issues

High Severity

1. Svelte 5 $effect tracks reads inside renderGraph() — throttling has non-deterministic dependency tracking

In Svelte 5, $effect automatically tracks all reactive ($state/$derived) reads that occur during its execution. When shouldRender is true and renderGraph() runs, it reads rdmaCtlData, identitiesData, tbBridgeData, hoveredNodeId, filteredNodes, highlightedNodes, and many other reactive values. These reads are tracked by the effect.

The problem: when any of those tracked values change, the effect will re-run. But on re-run, if shouldRender is false (structure key unchanged, within 5s window), renderGraph() does NOT execute — so those reactive dependencies are NOT tracked in that run. This means the effect's dependency set fluctuates between runs:

  • Run with render: tracks data + isMinimized + debugEnabled + everything inside renderGraph()
  • Run without render: tracks only data + isMinimized + debugEnabled

In practice, a change to hoveredNodeId will trigger the first $effect (because it was tracked on the previous rendering run), but the effect will see shouldRender === false and skip the render — which is correct for the throttle. However, on that skip, hoveredNodeId will NOT be re-tracked, so the NEXT hover change won't trigger this effect at all (until data changes). The second $effect (the interaction one) handles hover, so this is partially mitigated. But the dependency tracking is non-deterministic and could produce surprising behavior if renderGraph() is refactored to read different reactive values.

Recommendation: Wrap renderGraph() calls in untrack() to prevent it from registering additional dependencies beyond the ones explicitly listed at the top of the effect. This makes the dependency set deterministic.

2. CSS specificity conflict: app.css .graph-link vs component :global(.graph-link)

On main, app.css defines:

.graph-link {
    animation: flowAnimation 1s linear infinite;
    /* ... other styles */
}

The PR adds to the component's <style>:

:global(.graph-link) {
    animation: none;
}
:global(svg.animate-links .graph-link) {
    animation: flowAnimation 0.75s linear infinite;
}

Both app.css and the component's :global(.graph-link) have the same specificity (one class selector). The winner depends on stylesheet load order. If app.css loads after the component styles (which is typical in Vite/SvelteKit builds where app.css is imported in the layout), the animation: flowAnimation 1s linear infinite from app.css will override the component's animation: none, meaning the animation is never actually disabled. This would silently negate the GPU optimization for non-debug mode.

Fix: Either remove the .graph-link animation rule from app.css (since the component now controls it), or increase specificity in the component (e.g., :global(svg .graph-link) for the animation: none rule).

Medium Severity

3. buildStructureKey() omits debug-mode reactive data

The structure key includes isMinimized, debugEnabled, node IDs, and edge keys. But renderGraph() also renders debug labels from rdmaCtlData, identitiesData, and tbBridgeData. If these change without the node/edge set changing, the graph will show stale debug labels until the next 5s metrics refresh. Since these only display in debug mode, the impact is limited, but users who enable debug mode presumably care about data accuracy.

Suggestion: When debugEnabled is true, include a hash or version of the debug data objects in the structure key.

4. Duplicated color/styling logic between renderGraph and updateInteractionStyles

The wire color, fill color, and stroke width computation is copy-pasted between renderGraph() (lines ~567-590) and the new updateInteractionStyles(). This is a maintenance hazard — future color changes must be synchronized in both places or the hover/filter styles will drift from the render styles.

Suggestion: Extract a computeNodeColors(nodeId) helper that returns { wireColor, fillColor, strokeWidth, opacity } and call it from both locations.

5. dynamic-fill class only applied to hexagon (default/unknown) shape

updateInteractionStyles updates .dynamic-fill elements' fill color, but only the hexagon polygon gets class="node-outline dynamic-fill". The Mac Studio, Mac Mini, and MacBook Pro shapes all use class="node-outline" without dynamic-fill. This means:

  • Hexagons: hover/filter changes both outline stroke AND fill color
  • Mac devices: hover/filter changes outline stroke only, fill stays at #1a1a1a

This is probably intentional (Mac devices have a specific dark fill with memory-bar overlays), but the asymmetry should be documented with a comment explaining the intent.

Low Severity

6. prefers-reduced-motion addition will conflict with main

The PR adds a prefers-reduced-motion block to app.css. The current main already has a comprehensive prefers-reduced-motion section covering .shooting-star, .graph-link, .status-pulse, .cursor-blink, .onboarding-connection-line, and a universal * selector. The PR's addition is a subset and will cause a merge conflict. After rebasing, this hunk should likely be dropped entirely.

7. lastFullRenderAt set in ResizeObserver callback affects throttle window

In the onMount resize handler, both lastStructureKey = null and lastFullRenderAt = performance.now() are set, then renderGraph() is called directly (bypassing the $effect throttle). Setting lastFullRenderAt prevents the next $effect from re-rendering within 5s. This is correct behavior but worth a comment explaining the intent.

What's good

  • The core architecture is sound: Separating structural re-renders from interaction-style updates is the right approach. D3 select-and-mutate for hover/filter is much cheaper than full SVG teardown.
  • Adaptive polling is well-implemented: The setTimeout chain with pollInFlight guard prevents stacking. The visibility-change handler correctly clears pending timeouts before scheduling. The 1s/3s/10s intervals are reasonable.
  • data-node-id attribute: Clean approach for enabling D3 to target specific node groups without traversing the DOM.
  • SVG <title> fix: Correct per the SVG spec — <title> as a child element is semantically meaningful for accessibility; the title attribute on <svg> is not.
  • Link animation toggle via CSS class: Toggling animate-links on the SVG element is an efficient way to control GPU-intensive CSS animations without touching the DOM.
  • Event listener cleanup: The visibilityListenerRegistered flag and proper cleanup in stopPolling() is correct.
  • handleVisibilityChange as arrow function: Correctly preserves this binding for the event listener callback.

Merge blockers

  1. Merge conflicts — The PR is based on an older version of main (opened Feb 4) and has CONFLICTING merge status. It needs a rebase, particularly for app.css where main now has a comprehensive prefers-reduced-motion block.
  2. CSS specificity conflict (issue [BOUNTY - $200] Radio Networking Module #2) — The animation disable/enable may not work as intended if app.css loads after component styles. This would silently negate the primary GPU optimization.
  3. No CI checks — No status checks are reported. The dashboard build and component rendering should be verified after changes.

Verdict

The PR addresses a genuine performance problem with a well-reasoned multi-pronged approach. The adaptive polling and throttled rendering are the right patterns. However, it needs a rebase onto current main, the CSS specificity conflict for the animation toggle needs resolution (otherwise the primary GPU optimization may be silently ineffective), and the duplicated color logic should be extracted to a shared helper. The Svelte 5 $effect dependency tracking issue (issue #1) is subtle but important — wrapping renderGraph() in untrack() would make the behavior deterministic and prevent future bugs. After those fixes, this would be a solid improvement.

Review only — not a merge approval.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

High GPU usage when EXO Dashboard is open in browser (increases with node count)

2 participants