Fix: clean up "version" fields in L2 swimlane / dep_gen JSON by indigo1973 · Pull Request #856 · hw-native-sys/simpler

indigo1973 · 2026-05-26T01:46:24Z

deps.json and l2_perf_records.json both carried a "version" field
that consumers were getting wrong:

- deps.json bumped v2 → v3 in #808 but swimlane_converter still
  guarded on `version != 2`, silently rejected every fresh capture,
  and fell back to L2PerfRecord::fanout[], losing the race-window
  edges dep_gen replay exists to recover.
- l2_perf_records.json's "version" was never a schema version: the
  producer writes L2PerfLevel (1..4). Misreading it caused two
  consumers to short-circuit on `version != 2` / `< 2`, while phase
  blocks only exist at level >= 3.
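
The first failure mode can be reproduced in a few lines. This is a minimal sketch, not the actual swimlane_converter code: `load_edges_with_guard`, `load_edges`, and the `edges` / `producer` / `consumer` keys are hypothetical names chosen for illustration. Only the `version != 2` comparison is taken from the description above.

```python
from typing import Optional


def load_edges_with_guard(deps: dict) -> Optional[list]:
    """Old behaviour (sketch): reject anything that is not exactly version 2."""
    if deps.get("version") != 2:
        # A v3 capture lands here, so the caller silently falls back
        # to the per-task fanout data instead of the replayed edges.
        return None
    return deps.get("edges", [])


def load_edges(deps: dict) -> list:
    """Guard-free behaviour (sketch): read the edges regardless of any version field."""
    return deps.get("edges", [])


# Hypothetical v3-style capture; the edge layout is illustrative only.
fresh_capture = {"version": 3, "edges": [{"producer": 0, "consumer": 1}]}

guarded = load_edges_with_guard(fresh_capture)
unguarded = load_edges(fresh_capture)
print(guarded)    # None
print(unguarded)  # [{'producer': 0, 'consumer': 1}]
```

The equality check never raises, which is why the rejection went unnoticed: the caller only sees "no edges available" and takes the fallback path.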

Producer side: deps.json drops the field outright; l2_perf_records.json
(a2a3 + a5) renames "version" → "l2_perf_level" so the name matches
its meaning. Consumer side: drop the three now-misaligned guards
(deps_to_graph, swimlane_converter.load_deps_json /
_print_verbose_data_info,
sched_overhead_analysis.parse_scheduler_from_json_phases) plus the
version assertions in test_dep_gen, test_dep_gen_chain, and
_swimlane_validate.
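
To illustrate why the old field name invited the wrong comparison, here is a small sketch of a consumer that reads the renamed field as a profiling level. The helper name `expects_phase_blocks` and the default of 0 for a missing key are assumptions made for this example. The `l2_perf_level` key and the "level >= 3" threshold come from the description above.

```python
def expects_phase_blocks(records: dict) -> bool:
    """Return True when the capture level is high enough to carry phase blocks.

    Assumption: a file without the key is treated as level 0.
    """
    return records.get("l2_perf_level", 0) >= 3


# Level 2: a `version < 2` guard let this through even though
# no phase blocks exist at this level.
level2 = {"l2_perf_level": 2}
# Level 3: a `version != 2` guard short-circuited on this file
# even though it does carry phase blocks.
level3 = {"l2_perf_level": 3}

print(expects_phase_blocks(level2))  # False
print(expects_phase_blocks(level3))  # True
```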

Doc / comment fallout per .claude/rules/doc-consistency.md: retire
"v2 JSON" / "version 2" wording in favour of "l2_perf_level >= N"
across docs/dfx/{dep_gen,l2-swimlane-profiling}.md, profiling_levels.md
(a2a3 + a5), tools/README.md, the 6 scheduler comments (dispatch /
cold_path / types × a2a3, a5), and the tool docstrings. dep_gen.md §4
example + fields table rewritten against the strided-Tensor producer
(buffer_numel / start_offset / strides[] replace raw_shapes /
multi-dim offset[]); strides type corrected to uint32 (Tensor::strides
invariant > 0).
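
As a rough illustration of the strided representation, the sketch below enumerates the flat element offsets a slice touches. It assumes that `start_offset` and `strides` are expressed in elements and that each slice also has a per-dimension extent, here called `shape`. Neither the unit nor the `shape` name is confirmed by this PR, and `slice_element_offsets` is a hypothetical helper.

```python
from itertools import product


def slice_element_offsets(start_offset, shape, strides):
    """Flat element offsets touched by a strided slice, in row-major order."""
    if len(shape) != len(strides):
        raise ValueError("shape and strides must have the same rank")
    if any(s <= 0 for s in strides):
        # Mirrors the "strides > 0" invariant mentioned above.
        raise ValueError("strides must be > 0")
    return [
        start_offset + sum(i * s for i, s in zip(index, strides))
        for index in product(*(range(n) for n in shape))
    ]


# Hypothetical slice: a 2x3 window inside a 24-element buffer laid out as 4x6.
buffer_numel = 24
offsets = slice_element_offsets(start_offset=7, shape=(2, 3), strides=(6, 1))
print(offsets)  # [7, 8, 9, 13, 14, 15]
assert max(offsets) < buffer_numel  # the slice stays inside the buffer
```

Because strides are strictly positive, the largest offset is always the last one enumerated, so a bounds check against `buffer_numel` only needs that single value.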

gemini-code-assist

Code Review

This pull request updates the dependency graph generation schema from version 2 (v2) to version 3 (v3), introducing a strided tensor representation. Key changes include adding an args array with detailed tensor slice geometry to tasks, replacing raw_shapes with buffer_numel in the tensors schema, and replacing simple offsets with explicit start offsets and strides for both consumers and producers in the edges schema. Downstream tools, documentation, and tests have been updated to support and validate the new v3 schema. There are no review comments, so I have no feedback to provide.

The L2 swimlane per-task commit on AICPU was copying up to 128*8B = 1 KB of fanout edges plus walking the producer's fanout linked list, every task, on the scheduler completion critical path. The fanout edges are already the static DAG and are reconstructed offline by dep_gen replay into deps.json, so the device-side hot path was paying GM-bandwidth and cache-miss cost to duplicate information host tooling already has. Scope is a2a3 only; a5 is untouched.

Device side:
- L2PerfRecord drops fanout[128] / fanout_count (~1088 B -> 64 B per record).
- l2_perf_aicpu_complete_record drops the trailing fanout / fanout_count parameters; the impl no longer touches them.
- scheduler_completion drops the fanout_arr build + linked-list walk; host_build_graph/aicpu_executor drops the same pattern at all four call sites.

Host side:
- l2_perf_collector::export_swimlane_json emits "fanout": [] and "fanout_count": 0 per task to keep the JSON schema shape stable, and drops the top-level "version" field, which had drifted into a duplicate of L2PerfLevel (see in-flight PR hw-native-sys#856 for the misaligned guard cleanup on the consumer side).

Downstream tools:
- swimlane_converter already preferred deps.json over task["fanout"]; it now reads the version-free schema and treats empty fanout as the expected steady state.
- sched_overhead_analysis no longer gates phase parsing on the dropped "version" field; it gates on presence of aicpu_scheduler_phases, which is the right key.

Tests and comments:
- dep_gen tests drop the now-vacuous "fanout subset-of deps" gate and the auto-add of --enable-l2-swimlane that only existed to feed that gate.
- _swimlane_validate drops the version assertion.
- profiling_levels.md, dep_gen.h, dep_gen_replay.h, pto_orchestrator comments updated to reflect deps.json as the sole source of truth for fanout.

Verified on a2a3sim: test_l2_swimlane, test_l2_swimlane_mixed, test_dep_gen, test_dep_gen_chain all pass with --enable-l2-swimlane --enable-dep-gen.
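
The record-size figures quoted for the device side are mutually consistent, as a quick arithmetic check shows. The constants below are taken directly from the text; the variable names are illustrative only, and the old record size is the approximate figure given ("~1088 B").

```python
FANOUT_SLOTS = 128       # fanout[128]
BYTES_PER_SLOT = 8       # "128*8B"

fanout_bytes = FANOUT_SLOTS * BYTES_PER_SLOT
old_record_bytes = 1088  # approximate size quoted before the change
new_record_bytes = 64    # size quoted after the change

print(fanout_bytes)                         # 1024
print(old_record_bytes - new_record_bytes)  # 1024
print(old_record_bytes / new_record_bytes)  # 17.0
```

The saving per record equals the size of the fanout array, and each record shrinks by a factor of about 17.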

indigo1973 · 2026-05-27T06:05:23Z

/gemini review

gemini-code-assist · 2026-05-27T06:05:26Z

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!



gemini-code-assist Bot reviewed May 26, 2026


indigo1973 force-pushed the dep_0525 branch from ca7881b to d59fa27 · May 26, 2026 06:56

indigo1973 changed the title ~~Fix: complete deps.json v3 rollout missed by PR #808~~ Fix: drop misaligned version guards in the L2 swimlane pipeline May 26, 2026

ChaoWao mentioned this pull request May 27, 2026

Refactor: drop fanout from L2PerfRecord hot path #863

Merged

6 tasks

indigo1973 force-pushed the dep_0525 branch from d59fa27 to a1ba56c · May 27, 2026 06:03

indigo1973 changed the title ~~Fix: drop misaligned version guards in the L2 swimlane pipeline~~ Fix: clean up "version" fields in L2 swimlane / dep_gen JSON May 27, 2026

indigo1973 force-pushed the dep_0525 branch from a1ba56c to e4ed7d7 · May 27, 2026 06:36

ChaoWao approved these changes May 27, 2026


ChaoWao merged commit 352c3f8 into hw-native-sys:main May 27, 2026
15 checks passed

ChaoWao merged 1 commit into hw-native-sys:main from indigo1973:dep_0525
