Feat: Add signal safety as single commit#213

Open
jlukic wants to merge 21 commits into main from feat/signal-safety-bench

Conversation

@jlukic
Member

@jlukic jlukic commented May 18, 2026

Clean bench test for #212

@vercel

vercel Bot commented May 18, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| semantic-next | Ready | Preview, Comment | May 20, 2026 3:24pm |

1 Skipped Deployment

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| mcp | Ignored | Preview | May 20, 2026 3:24pm |

@github-actions github-actions Bot added labels Templating (Modifies templating package), Reactivity (Modifies reactivity package), Tests (Modifies tests), Utils (Modifies utilities package) May 18, 2026
@jlukic jlukic closed this May 18, 2026
@jlukic jlukic reopened this May 18, 2026
@semantic-performance-bot

semantic-performance-bot Bot commented May 18, 2026

🟡 Mixed (mostly faster) for 4968ed3 on Benchmark Suite 📊

Base: main · Action: #26172216232 · Raw: bench-report.json

Feat: Add signal safety as single commit

Warning

This PR improves ✅ 24 tests while regressing on ❌ 3 tests.

✅ 24 faster · ❌ 3 slower · 🔍 20 unsure · ⚪ 24 no change · 🏆 3 new peaks · 📜 8 reopened


✅ Faster (24)

Metrics where this PR confidently improved performance compared to main.

| metric | Improvement |
| --- | --- |
| signal:reactive-set-property-by-id-200 | -99% (202ms) 🏆 |
| signal:reactive-set-index-300 | -99% (113ms) 🏆 |
| signal:reactive-list-filter-1000x300 | -96% (119ms) 🏆 |
| todo:rename-500 | -92% (187ms) 🏆 |
| signal:reactive-push-2000x20 | -83% (188ms) 🏆 |
| todo:remove-middle-100 | -80% (43ms) 🏆 |
| todo:remove-last-100 | -78% (39ms) 🏆 |
| todo:toggle-first-100 | -76% (44ms) 🏆 |
| todo:remove-50-front | -73% (12ms) 🌟 |
| todo:toggle-middle-100 | -73% (32ms) 🌟 |
| todo:toggle-last-100 | -72% (34ms) 🌟 |
| todo:remove-first-100 | -71% (40ms) 🌟 |
| todo:toggle-100 | -69% (29ms) 🌟 |
| todo:remove-50-back | -67% (9ms) 🌟 |
| todo:remove-50-middle | -64% (9ms) 🌟 |
| signal:computed-subscribe-unsubscribe-10k | -27% (3ms) ⭐ |
| signal:computed-unobserved-200x500 | -19% (5ms) ⭐ |
| signal:computed-chain-10x60k | -11% (22ms) |
| todo:add-20 | -11% (1ms) |
| hydrate:helper-100-state-change-1k | -11% (0ms) |
| todo:toggle-all-200 | -9% (87ms) |
| template:subtemplate-helpers-heavy-100x500 | -4% (2ms) |
| signal:reaction-coalesce-400x100 | -4% (2ms) |
| signal:reactive-multi-read-5x160k | -4% (9ms) |

❌ Slower (3)

Metrics where this PR confidently regressed performance compared to main.

| metric | Regression |
| --- | --- |
| todo:edit-cycle-5 | +27% (18ms) ❗ |
| signal:reaction-dep-diff-45k | +7% (2ms) |
| signal:reactive-stable-deps-3reads-5000x100 | +5% (10ms) |

🏆 New peaks (3)

These metrics hit a new best on this PR. The most recent candidate is usually the cause.

| metric | improvement | prior peak | likely candidates |
| --- | --- | --- | --- |
| todo:edit-cycle-5 | 19% | 2683d93 | 7041556, f36d043, 7388c0b (+2 more) |
| todo:remove-first-100 | 6% | 0dd3f4f | 7041556, f36d043, 7388c0b |
| todo:toggle-last-100 | 6% | f36d043 | 7041556 |

📜 Regressions from peak (8)

These metrics were faster on an earlier push to this PR. The most recent candidate is usually where to look.

| metric | regression | prior peak | likely candidates |
| --- | --- | --- | --- |
| signal:reaction-dep-diff-45k | 20% | 2683d93 | 7041556, f36d043, 7388c0b (+2 more) |
| todo:toggle-100 | 9% | 7388c0b | 7041556, f36d043 |
| todo:remove-50-middle | 8% | f36d043 | 7041556 |
| template:each-mount-1000 | 6% | f36d043 | 7041556 |
| todo:remove-50-back | 6% | 0dd3f4f | 7041556, f36d043, 7388c0b |
| todo:remove-last-100 | 3% | 0dd3f4f | 7041556, f36d043, 7388c0b |
| todo:remove-50-front | 3% | f36d043 | 7041556 |
| todo:toggle-middle-100 | 2% | f36d043 | 7041556 |

⚪ No Change (24)

Metrics where this PR measured within ±2% of main — no meaningful performance change detected.

| metric | Change |
| --- | --- |
| compiler-micros:ast-walk-15k | -2.0% – +0.1% |
| renderer-micros:build-html-string-10k | -0.7% – +1.6% |
| todo:bulk-add-500 | -1.1% – +0.6% |
| krausest:create-10k | -0.6% – +0.4% |
| krausest:create-1k | -1.1% – +1.0% |
| hydrate:each-100 | -1.8% – -0.0% |
| hydrate:each-100-mount | -1.7% – -0.2% |
| todo:edit-start-10 | -1.2% – +1.0% |
| renderer-micros:expr-lisp-50k | -1.2% – +0.8% |
| renderer-micros:expr-simple-100k | -0.4% – +1.4% |
| todo:filter-cycle-20 | -0.5% – +0.9% |
| signal:flush-fanout-allocation-1000x500 | -1.1% – +0.6% |
| hydrate:helper-100-mount | -0.7% – +2.0% |
| compiler-micros:parse-cold-complex-200 | -1.2% – +0.5% |
| compiler-micros:parse-cold-normal-500 | -1.9% – +0.9% |
| signal:reaction-flush-noop-5m | -1.1% – +0.0% |
| signal:reactive-fanout-500x1200 | -1.1% – +0.3% |
| signal:reactive-stable-fanout-5000x100 | -0.0% – +1.3% |
| krausest:replace-1k | -1.5% – +1.5% |
| krausest:select-40 | +0.2% – +0.9% |
| compiler-micros:snippet-args-5k | -1.1% – +0.4% |
| template:subtemplate-data-blob-100 | -0.6% – +0.8% |
| template:subtemplate-helpers-light-100x500 | -1.0% – +1.4% |
| template:subtemplate-shorthand-props-100x500 | -1.7% – +1.2% |

🔍 Unsure (20)

Too Fast to Measure Precisely (20)

On benches this short, OS jitter, GC, and JIT pauses drown out anything under 4%. Bigger changes than that still show up.

| metric | Change | Test Time | Expected Noise |
| --- | --- | --- | --- |
| template:active-indicator-200 | -2.4% – +1.4% | ~31ms | ±6% |
| template:active-indicator-nested-200 | -5.1% – +0.1% | ~15ms | ±9% |
| krausest:append-1k | -1.6% – +2.0% | ~87ms | ±4% |
| krausest:clear-10k | -2.0% – +3.2% | ~131ms | ±7% |
| todo:clear-completed-250 | -3.4% – -0.6% | ~40ms | ±4% |
| renderer-micros:dom-walker-1000x15 | -2.7% – -0.4% | ~49ms | ±5% |
| template:each-mount-1000 | +0.7% – +3.6% | ~49ms | ±6% |
| renderer-micros:expr-js-10k | -3.1% – -0.4% | ~20ms | ±6% |
| signal:reactive-list-replace-1000x1000 | -2.0% – +0.8% | ~320ms | ±3% |
| krausest:remove-row-back-100 | -13.6% – +6.3% | ~22ms | ±26% |
| krausest:remove-row-front-20 | -6.6% – +5.4% | ~10ms | ±15% |
| krausest:remove-row-middle-20 | -7.5% – +19.9% | ~7ms | ±32% |
| signal:set-same-10m | -0.3% – +2.5% | ~19ms | ±5% |
| template:snippet-args-per-key-100x500 | -3.2% – +0.5% | ~30ms | ±5% |
| template:snippet-in-subtemplate-100x1k | -2.8% – +1.4% | ~20ms | ±8% |
| template:stable-ref-mutate-500 | -3.6% – +1.7% | ~13ms | ±9% |
| signal:sub-unsub-100k | -0.2% – +3.4% | ~29ms | ±5% |
| template:subtemplate-reactive-data-100x500 | -2.4% – +0.1% | ~42ms | ±4% |
| krausest:swap-rows-20 | -1.4% – +8.7% | ~120ms | ±12% |
| krausest:update-10th-50 | -7.9% – +3.1% | ~25ms | ±12% |

📖 Bench glossary (71 metrics)
metric what it tests
compiler-micros:ast-walk-15k Walks a kitchen-sink AST through optimizeAST 15000 times. Merge, hoist, and recurse pass.
compiler-micros:parse-cold-complex-200 Compiles a feature-dense kitchen-sink template 200 times. Catches parser regressions on uncommon block paths.
compiler-micros:parse-cold-normal-500 Compiles a TodoMVC-style component template 500 times. Headline metric for normal-component compile throughput.
compiler-micros:snippet-args-5k Parses four representative subtemplate-call shapes 5000 times each. Snippet args extraction.
hydrate:each-100 Reassigns the items of a hydrated 1000-item list to a fresh array with the same keys and data.
hydrate:each-100-mount Hydrates a server-rendered 1000-item list and waits for it to become interactive without re-rendering.
hydrate:helper-100-mount Hydrates a 1000-item list where each item calls a helper that reads state shared across the list.
hydrate:helper-100-state-change-1k Walks the shared activeID across every item in a hydrated 1000-item list so two items repaint per cycle.
krausest:append-1k Appends 1000 new rows onto an existing 1000-row table.
krausest:clear-10k Clears a 10000-row table back to empty in a single operation.
krausest:create-10k Renders a fresh 10000-row table into an empty parent at ten times the create-1k scale.
krausest:create-1k Renders a fresh 1000-row table into an empty parent.
krausest:remove-row-back-100 Removes the last row 100 times from a 1000-row table, with no other rows needing to move.
krausest:remove-row-front-20 Removes the first row 20 times from a 1000-row table, with all remaining rows sliding up each time.
krausest:remove-row-middle-20 Removes the middle row 20 times from a 1000-row table, with the rows below it sliding up each time.
krausest:replace-1k Replaces 1000 rows with a fresh 1000-row set, diffing the keyed list against a populated table.
krausest:select-40 Highlights one row at a time across 40 rows so only the previous and newly highlighted rows update.
krausest:swap-rows-20 Swaps the second and second-to-last rows in a 1000-row table, repeated 20 times.
krausest:update-10th-50 Updates the label on every tenth row of a 1000-row table, looped 50 times to lift the work above noise.
renderer-micros:build-html-string-10k Builds the HTML string for a realistic card AST 10000 times. Raw assembly throughput.
renderer-micros:dom-walker-1000x15 Runs bindMarkers across a 1000-node card fragment 15 times. TreeWalker pass and binding dispatch.
renderer-micros:expr-js-10k Evaluates one arithmetic expression and one ternary 10000 times each. JS-eval hot path.
renderer-micros:expr-lisp-50k Evaluates one Lisp-style helper call 50000 times. Parse-cache lookup and helper dispatch.
renderer-micros:expr-simple-100k Evaluates one simple identifier and one dotted path 100000 times each. Property-lookup hot path.
signal:computed-chain-10x60k Propagates a value change from root to leaf through a 10-deep chain of derived signals 60000 times.
signal:computed-subscribe-unsubscribe-10k 10000 create-computed + attach-observer + detach cycles. Lifecycle cost the refcount path must keep acceptable.
signal:computed-unobserved-200x500 200 unobserved computed signals, root updated 500 times. Measures the eager-recompute cost the refcount removes.
signal:flush-fanout-allocation-1000x500 500 subscribers fanout across 1000 flush cycles. Each flush spreads pendingReactions; tests per-flush allocation churn.
signal:reaction-coalesce-400x100 Sets one signal 100 times then flushes once across 400 bursts so 100 subscribers wake one time per burst.
signal:reaction-dep-diff-45k Toggles which of two signals a subscriber reads across 45000 cycles. Per-run dep-set diffing.
signal:reaction-flush-noop-5m Calls Reaction.flush() 5000000 times with no pending work. Scheduler dispatch overhead.
signal:reactive-fanout-500x1200 Fans out one signal's value change to 500 subscribers across 1200 successive updates.
signal:reactive-list-filter-1000x300 Changes a search-term signal 300 times, re-scanning a 1000-item list on each change.
signal:reactive-list-replace-1000x1000 Replaces a 1000-item list signal with a fresh 1000-item array and rescans it 1000 times.
signal:reactive-multi-read-5x160k Changes five signals in turn for 32000 rounds with one subscriber reading all five.
signal:reactive-push-2000x20 Appends 20 items onto an empty list signal with a subscriber, across 2000 reset cycles.
signal:reactive-set-index-300 Replaces one item by index in a 1000-item list signal across 300 updates, with a subscriber.
signal:reactive-set-property-by-id-200 Finds an item by id and updates one field in a 1000-item list signal across 200 alternating updates.
signal:reactive-stable-deps-3reads-5000x100 5000 reactions × 3 signals × 100 cycles. Each run clears + re-adds 3 stable dep edges.
signal:reactive-stable-fanout-5000x100 5000 reactions × 1 signal × 100 invalidations. Per-run Set.delete + add on a stable dep edge.
signal:set-same-10m Sets a signal to its current value 10000000 times. Exercises the no-op fast path when nothing changes.
signal:sub-unsub-100k Creates and tears down a subscriber on one signal across 100000 cycles. Subscription churn cost.
template:active-indicator-200 Cycles selectedId across 200 list items. Only the previously and newly active items update their class.
template:active-indicator-nested-200 Cycles currentUrl through 50 leaf urls in a 5×10×4 nav. Only the previously and newly active leaves should update their…
template:each-mount-1000 Mounts a fresh 1000-item each block with five-field items so per-record allocation cost dominates the wall clock.
template:snippet-args-per-key-100x500 Mutates one snippet arg's source across 100 invocations, 500 cycles. Adjacent no-signal expressions stay quiet.
template:snippet-in-subtemplate-100x1k Mutates one subtemplate prop's source across 25 cards each invoking 4 inner snippets, 1000 cycles. Snippet bodies shoul…
template:stable-ref-mutate-500 Replaces one item by index in a 500-item list across 100 cycles. Only that item's expressions re-render.
template:subtemplate-data-blob-100 Mutates one field inside data=expression on 100 children. Every child re-renders by design.
template:subtemplate-helpers-heavy-100x500 100 subtemplates, 4 inner bindings where three call helpers shaped like userland reality — Intl.NumberFormat, Array.fin…
template:subtemplate-helpers-light-100x500 100 subtemplates, 4 inner bindings each calling formatDate / classIf / capitalize, 500 cycles. Mutates one source signa…
template:subtemplate-reactive-data-100x500 Mutates one verbose reactiveData field across 100 child subtemplates, 500 cycles. Only the changed field re-evaluates.
template:subtemplate-shorthand-props-100x500 Mutates one shorthand prop's source across 100 child subtemplates, 500 cycles. Only that prop re-evaluates.
todo:add-20 Appends 20 todo items one at a time, like a user typing entries in a row.
todo:bulk-add-500 Renders 500 todo items added at once from a single data load.
todo:clear-completed-250 Clears 250 completed items from a 500-item list in one action, like clicking clear completed.
todo:edit-cycle-5 Runs 5 full edit-then-save cycles on different items, like editing a row and saving it.
todo:edit-start-10 Enters edit mode on 10 different items in a row, like double-clicking each one.
todo:filter-cycle-20 Cycles through active, completed, and all filters 20 times on a 100-item list.
todo:remove-50-back Deletes 50 items from the end of a 100-item list, one click at a time.
todo:remove-50-front Deletes 50 items from the front of a 100-item list, one click at a time.
todo:remove-50-middle Deletes 50 items from the middle of a 100-item list, one click at a time.
todo:remove-first-100 Deletes the first item 100 times from a 200-item list, with remaining items moving up each time.
todo:remove-last-100 Deletes the last item 100 times from a 200-item list, with no other items needing to move.
todo:remove-middle-100 Deletes the middle item 100 times from a 200-item list, walking halfway through to find each target.
todo:rename-500 Renames items in a 100-item list 500 times via single-field setProperty without editingId co-fires.
todo:toggle-100 Cycles through the first 10 items 10 times each, like a user toggling items repeatedly down a list.
todo:toggle-all-200 Toggles all 100 items completed and back across 200 cycles via the master checkbox.
todo:toggle-first-100 Toggles the first item in a 100-item list 100 times, alternating completed on and off.
todo:toggle-last-100 Toggles the last item in a 100-item list 100 times, alternating completed on and off.
todo:toggle-middle-100 Toggles a middle item in a 100-item list 100 times, alternating completed on and off.

Sample size: 80 floor / 280 max · Noise floor: ±2% · Timeout: 3min · Wall-clock: 12m51s

The refChanged branch in each.js Phase 3 called refreshSnapshotAndDetect
(which updates snapshot in place) and then unconditionally overwrote it
with a fresh createSnapshot allocation. Reference mode hits this branch
on position-change reconciles (e.g. filter cycling, list reordering);
clone mode never enters it because identity churn routes through the
sibling isArrayAsMode branch which doesn't have the double-snapshot.

Match the sibling branch's pattern: only createSnapshot when no prior
object snapshot existed.
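
To make the double-snapshot concrete, here is a minimal sketch of the before and after shapes. The function names `refreshSnapshotAndDetect` and `createSnapshot` come from the commit message, but their bodies below are hypothetical stand-ins, as are `reconcileBefore`, `reconcileAfter`, `countAllocations`, and the `record` shape; the real each.js code is not shown on this page.

```javascript
// Hypothetical stand-ins for the helpers named in the commit message.
let allocations = 0;

function createSnapshot(item) {
  allocations += 1; // count every fresh snapshot object
  return { ...item };
}

// Updates `snapshot` in place and reports whether any field changed.
// (Sketch only: keys removed from `item` are not handled.)
function refreshSnapshotAndDetect(snapshot, item) {
  let changed = false;
  for (const key of Object.keys(item)) {
    if (snapshot[key] !== item[key]) {
      snapshot[key] = item[key];
      changed = true;
    }
  }
  return changed;
}

// Before: refresh in place, then overwrite with a fresh allocation anyway.
function reconcileBefore(record, item) {
  const changed = record.snapshot
    ? refreshSnapshotAndDetect(record.snapshot, item)
    : true;
  record.snapshot = createSnapshot(item);
  return changed;
}

// After: allocate only when no prior object snapshot existed.
function reconcileAfter(record, item) {
  if (record.snapshot) {
    return refreshSnapshotAndDetect(record.snapshot, item);
  }
  record.snapshot = createSnapshot(item);
  return true;
}

// Run `passes` position-change reconciles against one record.
function countAllocations(reconcile, passes) {
  allocations = 0;
  const record = { snapshot: null };
  for (let i = 0; i < passes; i++) {
    reconcile(record, { id: 1, position: i });
  }
  return allocations;
}

console.log(countAllocations(reconcileBefore, 100)); // 100
console.log(countAllocations(reconcileAfter, 100)); // 1
```

In this toy version the fixed path allocates once per record instead of once per reconcile, which is the saving the commit describes for reference mode.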

The fresh-take agents (Challenge + Survey + Neutral) all flagged that
set-same-10m and sub-unsub-100k regressing under reference safety is
inexplicable from the primitive-Signal hot path itself (protect()
early-returns before reading this.safety; bytecode is identical to
clone mode).

The convergent hypothesis is that the regression is inherited
cross-bench state: the upstream list-Signal benches do dramatically
different allocation/cloning work in clone vs reference mode, leaving
V8 with different JIT feedback / heap layout / GC pressure when the
primitive benches execute later in the same Chrome session.

Running the primitives first isolates them. If the regressions persist
at the top of the script, the cause is intrinsic to the primitive path.
If they disappear, the cause is cross-bench state — diagnostic, not a
production code change.
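
The reordering and the decision rule above can be sketched as follows. Only the metric names are taken from the report; `benchOrder`, `primitivesFirst`, and `diagnose` are hypothetical names, and the actual bench script layout is not shown on this page.

```javascript
// Hypothetical run order; only the metric names come from the report.
const benchOrder = [
  "signal:reactive-set-index-300",
  "signal:reactive-push-2000x20",
  "signal:set-same-10m",
  "signal:sub-unsub-100k",
];

// The two primitive benches the commit message wants isolated.
const primitives = new Set(["signal:set-same-10m", "signal:sub-unsub-100k"]);

// Move the primitive benches to the front, preserving relative order,
// so they run before the list-Signal benches can shape JIT/heap state.
function primitivesFirst(order, primitiveNames) {
  const first = order.filter((name) => primitiveNames.has(name));
  const rest = order.filter((name) => !primitiveNames.has(name));
  return [...first, ...rest];
}

// Decision rule stated in the commit message.
function diagnose(regressesWhenRunFirst) {
  return regressesWhenRunFirst
    ? "intrinsic to the primitive path"
    : "cross-bench state";
}

console.log(primitivesFirst(benchOrder, primitives));
console.log(diagnose(false)); // cross-bench state
```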

Captures the flow that produced real results in the signal-safety
investigation: chrome-devtools MCP traces, counter instrumentation
of bundled framework, local tachometer with custom-built bundles,
fresh-take subagents when reasoning loops. Documents the dead ends
that didn't work — static reading, V8 hypothesis without skill
citation, pushing bench-file changes that get overlaid away.

…nd evidence integrity

Adds a fixed bench-weight heuristic (krausest 5x / todo·template·hydrate 2x /
synthetics 0.25x) with a gate that the heaviest real-workload regressor must be
measured, not reasoned about by analogy. Splits the instrument flow into gather
(trace, baseline diff — no hypothesis) and steelman (counters/Playwright —
hypothesis required), with the hypothesis born from a measurement rather than a
read. Adds an orientation step: read the bench and learn how the component works
user-side via the authoring curriculum, since the renderer is a separate package
from the component surface. Adds an evidence-integrity rule for handling prior
cause-claims, and keeps the no-bench-editing guardrail.
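
One way to read the weight heuristic is as a ranking over the regressions in the report. The sketch below uses the three weights from the commit message; treating every suite other than krausest, todo, template, and hydrate as a synthetic, and scoring by weight times regression percent, are both assumptions made for illustration, not something the skill is stated to do.

```javascript
// Weights from the commit message (krausest 5x, todo/template/hydrate 2x).
const SUITE_WEIGHTS = { krausest: 5, todo: 2, template: 2, hydrate: 2 };
// Assumption: any other suite (signal, *-micros) counts as a synthetic.
const SYNTHETIC_WEIGHT = 0.25;

function suiteOf(metric) {
  return metric.split(":")[0];
}

function weightOf(metric) {
  return SUITE_WEIGHTS[suiteOf(metric)] ?? SYNTHETIC_WEIGHT;
}

// The three "Slower" entries from the report above.
const regressions = [
  { metric: "todo:edit-cycle-5", percent: 27 },
  { metric: "signal:reaction-dep-diff-45k", percent: 7 },
  { metric: "signal:reactive-stable-deps-3reads-5000x100", percent: 5 },
];

// Assumption: score = suite weight x regression percent.
function rankRegressors(list) {
  return list
    .map((r) => ({ ...r, score: r.percent * weightOf(r.metric) }))
    .sort((a, b) => b.score - a.score);
}

const ranked = rankRegressors(regressions);
console.log(ranked[0].metric); // todo:edit-cycle-5
```

Under these assumptions `todo:edit-cycle-5` scores 54 against 1.75 and 1.25 for the two signal benches, so it is the regressor the gate would require to be measured directly.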

Reframes the weight heuristic as a focus-and-coverage budget (which suite to
dig, what the report owes each regressor) rather than an investigation
stopwatch, and adds the contrast principle: a bench that did not regress is a
control, and the delta between a regressor and a flat near-neighbor localizes
the cause. Names deflection as the anti-pattern to watch for, with the tell that
reasoning toward it sounds composed and rigorous, so the check is direction not
felt-soundness. Keeps firmness for the calibration refusal (a methodology stall
without a demonstration is not grounds to stop) while removing scolding tone.

jlukic added 2 commits May 20, 2026 09:42
…ormance

Makes the skip-prone steps into checkable stops. Orientation gate: write the
expected-reactivity prediction (from the template AST and component model)
before tracing, so the regression reads as the gap between expected and
measured. V8 gate: V8-internals claims cite the current performance-v8 skill,
framed as a recency patch (training knowledge predates the May-2026 skills),
not a competence check. Grounding gate: name the AST node or construct before
asserting a mechanism. Adds a Parse-the-template step (validate_template with
includeAST) and a Step 9 fix-and-confirm. Gates verify the produced artifact
rather than surveilling the agent.

…investigate-performance

From the first real run of the gated skill. Adds: a uniform effect can't
explain a non-uniform profile (a multiplier identical on winners and losers is
shared/inherited state, not a per-bench cause); ablation as the most decisive
confirmation (remove the cause, show the effect vanishes and the win survives);
'it's just a machine difference' named as a dead end when local sign reproduces
but CI magnitude doesn't; ground where the measurement points (plumbing, not
only the user-facing component); and performance as iterative — a named residual
is an honest loop boundary, relabeling it noise is not.

jlukic added 2 commits May 20, 2026 11:08
A plain-object walk copied index keys onto a bare object and dropped the
backing buffer. Delegate binary leaves to structuredClone, which is both
correct and faster here. Adds a clone-vs-structuredClone bench.
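
The failure mode and the fix can be reproduced in a few lines. This is a sketch, not the utilities package's actual clone: `naiveClone`, `clone`, and `isBinaryLeaf` are hypothetical names, and only the use of the standard `structuredClone` global for binary leaves comes from the commit message.

```javascript
// Naive plain-object walk: treats every object as a bag of enumerable keys.
function naiveClone(value) {
  if (value === null || typeof value !== "object") return value;
  if (Array.isArray(value)) return value.map(naiveClone);
  const out = {};
  for (const key of Object.keys(value)) out[key] = naiveClone(value[key]);
  return out;
}

// ArrayBuffers and their views (typed arrays, DataView) are binary leaves.
function isBinaryLeaf(value) {
  return value instanceof ArrayBuffer || ArrayBuffer.isView(value);
}

// Same walk, but binary leaves are delegated to structuredClone.
function clone(value) {
  if (value === null || typeof value !== "object") return value;
  if (isBinaryLeaf(value)) return structuredClone(value);
  if (Array.isArray(value)) return value.map(clone);
  const out = {};
  for (const key of Object.keys(value)) out[key] = clone(value[key]);
  return out;
}

const state = { name: "thumb", bytes: new Uint8Array([1, 2, 3]) };
const broken = naiveClone(state);
const fixed = clone(state);

console.log(broken.bytes instanceof Uint8Array); // false
console.log(fixed.bytes instanceof Uint8Array); // true
console.log(fixed.bytes.buffer !== state.bytes.buffer); // true
```

The naive walk turns the `Uint8Array` into a bare `{ 0: 1, 1: 2, 2: 3 }` object with no backing buffer, while the delegating version returns a real typed array over its own copy of the bytes.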

jlukic added 3 commits May 20, 2026 11:19
…tState

defaultState is the definition's declaration and is correctly shared across
instances (Template.clone manifests, it does not duplicate). The leak was that
in reference mode each instance's Signal aliased that shared default and a
mutation wrote through into the prototype. Isolate at the seam where instance
state is derived from the defaults — clone the seeded object value per Signal —
instead of cloning defaultState at clone(). Restores the shared-defaultState
contract (subtemplate-settings Template.clone tests pass unchanged) and keeps
the per-read reference win.
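
A toy model of the leak and the fix, under stated assumptions: the `Signal` class here is a hypothetical minimal stand-in (the real reactivity package is richer), `createStateAliased` and `createStateIsolated` are illustrative names, and using `structuredClone` for the per-Signal copy is an assumption rather than the PR's confirmed mechanism.

```javascript
// Hypothetical minimal Signal in "reference" mode: reads return the stored
// value without cloning, and setProperty mutates it in place.
class Signal {
  constructor(value) {
    this.value = value;
  }
  get() {
    return this.value;
  }
  setProperty(key, next) {
    this.value[key] = next;
  }
}

// Shared declaration; intentionally not duplicated per instance.
const defaultState = { filter: { term: "" } };

// Leak: each instance's Signal aliases the shared default object.
function createStateAliased(defaults) {
  const state = {};
  for (const [key, value] of Object.entries(defaults)) {
    state[key] = new Signal(value);
  }
  return state;
}

// Fix: clone object values once, at the seam where instance state is
// derived from the defaults. Reads still return references afterwards.
function createStateIsolated(defaults) {
  const state = {};
  for (const [key, value] of Object.entries(defaults)) {
    const seed =
      value !== null && typeof value === "object"
        ? structuredClone(value) // assumption: any deep clone would do here
        : value;
    state[key] = new Signal(seed);
  }
  return state;
}

const leaky = createStateAliased(defaultState);
leaky.filter.setProperty("term", "x");
console.log(defaultState.filter.term); // "x" (mutation wrote through)

defaultState.filter.term = ""; // reset the shared default for the next demo
const isolated = createStateIsolated(defaultState);
isolated.filter.setProperty("term", "y");
console.log(defaultState.filter.term); // "" (default untouched)
```

The clone happens once per Signal at seeding time, so `defaultState` itself stays a single shared object and later reads pay no cloning cost.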

Labels

Docs (Modifies documentation), Reactivity (Modifies reactivity package), Templating (Modifies templating package), Tests (Modifies tests), Utils (Modifies utilities package)
