UI graph view unusable on large state machines: smart-edge pathfinding reruns for every edge on every render

I first hit this on one of my own Burr apps: clicking through steps in the tracking UI would freeze the tab for a second or more per click, and the bigger the state machine got, the worse it was. I dug in and it turns out the graph view re-runs A* pathfinding for every edge on every render.

# Current behavior

Clicking or hovering a step in the step list highlights the matching node and edges in the graph. On a small app that's instant. Past a few dozen actions it blocks the main thread for seconds per click.

I benchmarked it with Playwright against the production UI build, timing from click dispatch to the next paint, median of 12 clicks spread across the step list:

| Actions | Edges | Initial render | Click to paint (median) | Hover to paint (median) |
|---|---|---|---|---|
| 10 | 25 | 0.9s | 32ms | 30ms |
| 75 | 221 | 1.1s | 489ms | 495ms |
| 150 | 449 | 5.8s | 11,194ms (max 23,835ms) | 11,549ms |
| 300 | 899 | never renders, killed after 5 min | n/a | n/a |

Hover is just as expensive as click, and hover fires continuously while the pointer moves down the step list, so on a large app the graph pane is effectively frozen the whole time you're inspecting it.

A V8 profile during the click loop on the 150-action app shows 41% garbage collector, 16% `_buildNodes`, 12% `findPath`, plus `getNeighbors`. All of that is the A* pathfinding stack inside `@tisoap/react-flow-smart-edge` and the allocation churn it causes.

## Root cause

All in `telemetry/ui/src/components/routes/app/GraphView.tsx` (line numbers from current main):

1. `ActionActionEdge` calls `getSmartEdge` in its render body (line 162). That builds a pathfinding grid over the bounding box of all nodes and runs A* for that one edge. Nothing memoizes it, so it re-runs for every edge on every render. Per-edge cost grows with layout area, which grows with node count, so the total is superlinear in graph size.
2. Highlight state reaches nodes and edges through `NodeStateProvider`, and `_Graph` recreates the context value object inline on every render (lines 346-352). So every click and every hover enter/leave re-renders all N nodes and all E edges, and every edge re-runs A*.
3. `ActionActionEdge` also subscribes to the full node array via `useNodes()` (line 152), so any node store change re-renders every edge.
4. With auto-refresh on, the app refetches every 500ms (`REFRESH_INTERVAL`, `AppView.tsx:38`). react-query's structural sharing usually keeps the `data.application` reference stable across polls, but each poll still re-renders the view, and because of item 2 that re-renders every node and edge and re-runs A* for every edge, twice a second, with no interaction at all. Anything that does hand the graph a new object identity with the same structure (switching focus between a parent app and a structurally identical sub-app, for example) re-triggers the layout effect (`GraphView.tsx:332-343`) on top of that: full dagre relayout, new node/edge arrays, and a `fitView` that resets the viewport.

The 300-action hang is just item 1 paid once for all 899 edges over a very large grid at initial render.

## Steps to replicate behavior

1. Run the script below to log synthetic applications into `~/.burr`, then open the UI and open `ui-perf-bench/bench-150n`.
2. Click or hover step rows and feel the freeze. Open `bench-300n`: the graph never renders.

<details>
<summary>gen_large_app.py</summary>

```python
import argparse

from burr.core import ApplicationBuilder, State, action, default, expr


@action(reads=["counter"], writes=["counter"])
def step(state: State) -> State:
    return state.update(counter=state["counter"] + 1)


def build(n_nodes: int, extra_edges: int):
    names = [f"step_{i:03d}" for i in range(n_nodes)]
    transitions = []
    for i in range(n_nodes - 1):
        transitions.append((names[i], names[i + 1], default))
    for i in range(n_nodes):
        for k in range(1, extra_edges + 1):
            j = (i * 7 + k * 13 + 3) % n_nodes
            if j != i and j != i + 1:
                transitions.append((names[i], names[j], expr("counter < 0")))
    return names, transitions


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--nodes", type=int, default=150)
    parser.add_argument("--extra-edges", type=int, default=2)
    args = parser.parse_args()

    names, transitions = build(args.nodes, args.extra_edges)
    app = (
        ApplicationBuilder()
        .with_actions(**{name: step for name in names})
        .with_transitions(*transitions)
        .with_state(counter=0)
        .with_entrypoint(names[0])
        .with_identifiers(app_id=f"bench-{args.nodes}n")
        .with_tracker("local", project="ui-perf-bench")
        .build()
    )
    app.run(halt_after=[names[-1]])
    print(f"done: {args.nodes} nodes, {len(transitions)} edges")


if __name__ == "__main__":
    main()
```

</details>

## Library & System Information

Reproduced on the apache-burr 0.40.2 release build (macOS, Chromium headless via Playwright). The per-edge `getSmartEdge` call in the edge render body is in both 0.40.2 and current main, so both are affected. The file isn't identical between the two though: #586 swapped the layout engine from elkjs to dagre after 0.40.2, and when I re-ran the same benchmark on a main source build it came out much slower than 0.40.2 (9.7s vs 0.5s per click at 75 actions, numbers in the first comment below). That fits the mechanism: the pathfinding grid covers the layout's bounding box, and dagre spreads the same graph over more area, so every edge's A* run costs more.

# Expected behavior

The graph should repaint within a frame or two of a click or hover regardless of graph size, and a 300-action graph should render.

# Additional context

Sketch of a fix, all local to `GraphView.tsx`:

1. Memoize the `getSmartEdge` result per edge on geometry and node positions. Selection and hover don't move nodes, so highlighting stops paying for pathfinding entirely.
2. Above some node count, skip `getSmartEdge` and take the existing `getBezierPath` fallback so big graphs render at all (the code already falls back to bezier when `getSmartEdge` returns null).
3. Memoize the `NodeStateProvider` value in `_Graph`.
4. Key the relayout effect on graph structure (action names and transitions) instead of object identity, so identity churn without a structural change (focus switches, responses that aren't reference-stable) stops triggering full relayouts.

Happy to submit a PR.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

UI graph view unusable on large state machines: smart-edge pathfinding reruns for every edge on every render #833

Current behavior

Root cause

Steps to replicate behavior

Library & System Information

Expected behavior

Additional context

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Actions	Edges	Initial render	Click to paint (median)	Hover to paint (median)
10	25	0.9s	32ms	30ms
75	221	1.1s	489ms	495ms
150	449	5.8s	11,194ms (max 23,835ms)	11,549ms
300	899	never renders, killed after 5 min	n/a	n/a

Uh oh!

UI graph view unusable on large state machines: smart-edge pathfinding reruns for every edge on every render #833

Description

Current behavior

Root cause

Steps to replicate behavior

Library & System Information

Expected behavior

Additional context

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions