feat: allow mixed-K fusion (remove K_full consistency constraint) by melroyanthony · Pull Request #23 · melroyanthony/MLSys

melroyanthony · 2026-03-15T21:53:20Z

Summary

Removes the K_full consistency constraint that prevented fusing MatMuls with different reduction dimensions. The problem statement allows per-op k-steps — each MatMul does ceil(its_K_full / k) steps independently.

Closes #22

Changes

compute_num_k_steps: max(K_full) instead of min(K_full)
Removed K_full consistency check from fusion (Rust + Python)
Evaluator validates k <= max(K_full) instead of K_full equality
Mixed-K phase-based latency for split-K (per-op active step counts)
Uniform-K degenerate case uses existing formula (no regression)
ADR-006 documents the decision and execution model

Results

Benchmark latencies are unchanged — the cost-based fusion correctly determines that mixed-K fused latency is not beneficial for these benchmarks' tensor dimensions. The mixed-K model is now correct and ready for future benchmarks where it would help.

All 16 unit tests pass. All 5 benchmarks complete in < 1s.

Test plan

cargo test --skip test_benchmark — 16/16 passing
All 5 benchmarks produce valid solutions
PROBLEM.md examples (uniform-K) produce identical results
Track B Python evaluator aligned with Rust

Architecture: - system-design.md: mixed-K execution model with per-op active steps, updated compute/memory formulas, K_max replaces K_cap - ADR-006: documents why K_full consistency was overly restrictive, the correct per-op k-step model, and 30% expected improvement Requirements: - F-05: allows mixed-K fusion with num_k_steps = ceil(max(K_full)/k) - F-06: K_max for k search range - New acceptance criteria for mixed-K support - Risk register: mixed-K latency model correctness Refs #22

Rust + Python changes: - compute_num_k_steps: max(K_full) instead of min(K_full) - find_k_full/find_split_k: max(K_full) for search range - Removed K_full consistency check from fusion (both tracks) - Evaluator: k <= max(K_full) instead of K_full consistency - Latency model: mixed-K phase-based computation for split-K (each MatMul active for ceil(its_K_full/k) steps only) - Uniform-K case uses existing formula (no regression) 16/16 tests pass. Benchmark latencies unchanged because cost-based fusion correctly determines that mixed-K fused latency is not better than split for these benchmarks' tensor dimensions. Closes #22

Copilot

Pull request overview

This PR removes the K_full consistency constraint that prevented fusing MatMul ops with different reduction dimensions into the same subgraph. The change spans both the Rust backend and Python evaluator, updating the latency model, fusion logic, granularity search, validation, and documentation.

Changes:

Replaced min(K_full) with max(K_full) across all k-step computations (Rust latency, memory, granularity; Python evaluator), and removed the K_full consistency check from fusion and validation
Added a mixed-K phased latency computation path in both Rust (latency.rs) and Python (evaluator.py) that computes per-phase costs when MatMuls have different K_full values
Added ADR-006 and updated system-design docs and MVP scope to document the mixed-K execution model

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 5 comments.

Show a summary per file

File	Description
solution/backend/rust/src/latency.rs	Changed `min` → `max` for K_full; added mixed-K phased latency path
solution/backend/rust/src/evaluate.rs	Removed K_full consistency validation; k <= max(K_full) check
solution/backend/rust/src/optimizer/fusion.rs	Removed K_full consistency check from merge eligibility
solution/backend/rust/src/optimizer/granularity.rs	Changed `find_k_full` to use `max()` instead of `min()`
solution/backend/rust/src/memory.rs	Changed `find_split_k` to search from max K_full downward
solution/agent/evaluator.py	Changed `min` → `max` for K_full; added mixed-K phased latency path; updated validation
solution/agent/scheduler.py	Removed K_full consistency check from fusion
solution/docs/decisions/ADR-006-mixed-k-fusion.md	New ADR documenting the mixed-K fusion decision
solution/docs/architecture/system-design.md	Updated system design docs for mixed-K model
solution/requirements/mvp-scope.md	Updated requirements and acceptance criteria for mixed-K

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

solution/agent/evaluator.py

solution/backend/rust/src/latency.rs

solution/agent/evaluator.py

solution/backend/rust/src/latency.rs

solution/agent/evaluator.py

Rust + Python: - Per-op k_strip: track actual tensor-dimension-based k_strip per MatMul instead of compute-ratio proxy (correct for mixed input dimensions) - O(1) per phase: replaced step loop with first/interior/last arithmetic matching documented O(distinct_K) complexity - Simulation path: skip k_strip loads for inactive MatMuls (finished their reduction) using tensor-to-active-steps mapping Refs #22

Copilot

Pull request overview

This PR removes the K_full consistency constraint that prevented fusing MatMul ops with different reduction dimensions into the same subgraph. The change touches both the Rust optimizer and the Python evaluator to support "mixed-K" execution where each MatMul independently runs ceil(K_full_op / k) k-steps, with the subgraph running for ceil(max(K_full) / k) total steps. A phased latency computation handles the varying active-op sets efficiently in O(distinct_K) per candidate.

Changes:

Changed min(K_full) to max(K_full) for num_k_steps, k search range, and validation across Rust and Python
Added mixed-K phased latency computation in both latency.rs and evaluator.py with O(1)-per-phase closed-form arithmetic, while retaining the uniform-K fast path
Removed K_full consistency checks from fusion (Rust + Python) and evaluator validation, and added ADR-006 documenting the decision

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 3 comments.

Show a summary per file

File	Description
solution/backend/rust/src/latency.rs	Changed num_k_steps from min→max K_full; added mixed-K phased latency with per-op k_strip tracking
solution/backend/rust/src/evaluate.rs	Relaxed validation to check k <= max(K_full) instead of K_full equality
solution/backend/rust/src/optimizer/fusion.rs	Removed K_full consistency check from merge eligibility
solution/backend/rust/src/optimizer/granularity.rs	Changed find_k_full from min to max
solution/backend/rust/src/memory.rs	Changed split-K search bound from min to max K_full
solution/agent/evaluator.py	Python evaluator: min→max K_full, mixed-K phased latency, simulation path active-op filtering
solution/agent/scheduler.py	Removed K_full consistency check from Python fusion
solution/docs/decisions/ADR-006-mixed-k-fusion.md	New ADR documenting mixed-K fusion decision
solution/docs/architecture/system-design.md	Updated design docs with mixed-K execution model
solution/requirements/mvp-scope.md	Updated requirements and acceptance criteria for mixed-K

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

solution/backend/rust/src/latency.rs

solution/agent/evaluator.py

- Rust + Python: compute uses min(k, K_full_op) / K_full_op to prevent base_cost scaling above 1.0 when k > K_full_op in mixed-K subgraphs - Applies to matmul_compute_per_step, mixed-K active_compute, and simulation path active_matmul_compute - Removed dead total_k_steps variable from Python mixed-K path Refs #22

Copilot

Pull request overview

This PR removes the K_full consistency constraint from the fusion stage, allowing MatMul ops with different reduction dimensions (K_full values) to coexist in the same subgraph. The subgraph now runs for ceil(max(K_full) / k) total k-steps, with each MatMul active only for its own ceil(K_full_op / k) steps. The change is applied consistently across the Rust optimizer, Python evaluator, and documentation.

Changes:

Replaced min(K_full) with max(K_full) for num_k_steps computation, granularity search bounds, and split-K search in both Rust and Python
Removed K_full consistency checks from fusion (Rust + Python) and evaluator validation; validation now only checks k <= max(K_full)
Added mixed-K phase-based latency computation with O(distinct_K) closed-form evaluation per candidate, with a fast path for uniform-K subgraphs

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 1 comment.

Show a summary per file

File	Description
solution/backend/rust/src/latency.rs	Core mixed-K latency model: phase-based computation, min(k, K_full_op) clamping, max-based num_k_steps
solution/backend/rust/src/evaluate.rs	Removed K_full consistency validation; now only checks k <= max(K_full)
solution/backend/rust/src/optimizer/fusion.rs	Removed K_full consistency check from fusion eligibility
solution/backend/rust/src/optimizer/granularity.rs	Changed find_k_full from min to max
solution/backend/rust/src/memory.rs	Changed split-K search bound from min to max K_full
solution/agent/evaluator.py	Python mirror: mixed-K latency, phase-based evaluation, validation relaxation
solution/agent/scheduler.py	Removed K_full consistency check from Python fusion
solution/docs/decisions/ADR-006-mixed-k-fusion.md	ADR documenting the mixed-K decision and execution model
solution/docs/architecture/system-design.md	Updated latency model docs for mixed-K semantics
solution/requirements/mvp-scope.md	Updated acceptance criteria and feature descriptions for mixed-K

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

solution/agent/evaluator.py

Simulation path k_strip loads now use per-tensor k_eff = min(k, K_full_op) via k_strip_tensor_k_eff mapping. Prevents loading more elements than the tensor's actual reduction dimension when k > K_full_op. Refs #22

Copilot

Pull request overview

This PR removes the K_full consistency constraint from the fusion stage, allowing MatMul ops with different reduction dimensions (K_full values) to be fused into the same subgraph. The change implements a mixed-K execution model where each MatMul runs for ceil(K_full_op / k) steps and the subgraph runs for ceil(max(K_full) / k) total steps, with inactive MatMuls contributing zero compute and memory on later steps.

Changes:

Changed num_k_steps from ceil(min(K_full) / k) to ceil(max(K_full) / k) and removed K_full consistency checks from fusion (Rust + Python)
Added phase-based closed-form latency computation for mixed-K subgraphs in both Rust and Python, with a uniform-K fast path for regression safety
Added ADR-006 and updated system design documentation to describe the mixed-K execution model

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 3 comments.

Show a summary per file

File	Description
solution/backend/rust/src/latency.rs	Changed `min` → `max` for K_full, added `min(k, K_full_op)` clamping to compute, and implemented phase-based mixed-K latency path
solution/backend/rust/src/optimizer/fusion.rs	Removed K_full consistency check from merge eligibility
solution/backend/rust/src/optimizer/granularity.rs	Changed `find_k_full` from `min` to `max`
solution/backend/rust/src/memory.rs	Changed `find_split_k` to use `max` K_full as upper search bound
solution/backend/rust/src/evaluate.rs	Relaxed validation: k <= max(K_full) instead of K_full equality
solution/agent/scheduler.py	Removed K_full consistency check from Python fusion
solution/agent/evaluator.py	Changed to max K_full, added mixed-K phase-based latency path with k_eff clamping, updated validation
solution/docs/decisions/ADR-006-mixed-k-fusion.md	New ADR documenting the mixed-K fusion decision
solution/docs/architecture/system-design.md	Updated latency model, granularity search, and performance docs for mixed-K
solution/requirements/mvp-scope.md	Updated acceptance criteria and feature descriptions for mixed-K

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

solution/backend/rust/src/latency.rs

solution/agent/evaluator.py

solution/backend/rust/src/latency.rs

…-K test - Rust build_memory_plan: k_strip sizes use min(k, K_full_op) via k_eff - Python comments: updated to show k_eff instead of raw k - Added test_mixed_k_two_matmuls: verifies phase-based latency with K_full=64 and K_full=128 at various k values, checks compute_num_k_steps 17 tests passing. Refs #22

Copilot

Pull request overview

This PR removes the K_full consistency constraint that prevented fusing MatMuls with different reduction dimensions into the same subgraph. The problem statement allows per-op k-steps, so this constraint was overly restrictive. The cost-based fusion criterion remains as the gatekeeper for merge decisions.

Changes:

Changed min(K_full) to max(K_full) across all components (granularity search, split-K search, num_k_steps, evaluator validation) and removed K_full equality checks from fusion (Rust + Python)
Added mixed-K phased latency computation in both Rust and Python: when MatMuls have different K_full values, steps are grouped into phases by active op set, with O(1) computation per phase
Added test_mixed_k_two_matmuls unit test and comprehensive documentation (ADR-006, system-design.md updates, mvp-scope.md updates)

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated no comments.

Show a summary per file

File	Description
solution/backend/rust/src/latency.rs	Core change: `min→max` for num_k_steps, `k_eff = min(k, K_full_op)` clamping in build_memory_plan and compute, mixed-K phased latency path
solution/backend/rust/src/optimizer/fusion.rs	Removed K_full consistency check from merge eligibility
solution/backend/rust/src/optimizer/granularity.rs	`min→max` for K_full search bound
solution/backend/rust/src/memory.rs	`min→max` for split-K search upper bound
solution/backend/rust/src/evaluate.rs	Replaced K_full consistency validation with `k <= max(K_full)` check
solution/backend/rust/src/main.rs	Added `test_mixed_k_two_matmuls` test with K_full=64/128
solution/agent/evaluator.py	Python mirror of all Rust changes: max K_full, mixed-K phased latency, k_eff clamping, per-op k_strip tracking
solution/agent/scheduler.py	Removed K_full consistency check from Python fusion
solution/docs/decisions/ADR-006-mixed-k-fusion.md	ADR documenting the decision and execution model
solution/docs/architecture/system-design.md	Updated latency model docs for mixed-K semantics
solution/requirements/mvp-scope.md	Updated acceptance criteria and feature descriptions

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

melroyanthony added 2 commits March 15, 2026 21:39

Copilot AI review requested due to automatic review settings March 15, 2026 21:53

Copilot started reviewing on behalf of melroyanthony March 15, 2026 21:53 View session

melroyanthony self-assigned this Mar 15, 2026

melroyanthony added the enhancement New feature or request label Mar 15, 2026

Copilot AI reviewed Mar 15, 2026

View reviewed changes

melroyanthony requested a review from Copilot March 15, 2026 22:20

Copilot started reviewing on behalf of melroyanthony March 15, 2026 22:20 View session

Copilot AI reviewed Mar 15, 2026

View reviewed changes

solution/backend/rust/src/latency.rs Show resolved Hide resolved

solution/agent/evaluator.py Outdated Show resolved Hide resolved

solution/agent/evaluator.py Outdated Show resolved Hide resolved

melroyanthony requested a review from Copilot March 15, 2026 22:33

Copilot started reviewing on behalf of melroyanthony March 15, 2026 22:33 View session

Copilot AI reviewed Mar 15, 2026

View reviewed changes

solution/agent/evaluator.py Outdated Show resolved Hide resolved

fix: clamp k_strip sizes to min(k, K_full_op) in Python evaluator

9f8fe02

Simulation path k_strip loads now use per-tensor k_eff = min(k, K_full_op) via k_strip_tensor_k_eff mapping. Prevents loading more elements than the tensor's actual reduction dimension when k > K_full_op. Refs #22

melroyanthony requested a review from Copilot March 15, 2026 22:42

Copilot started reviewing on behalf of melroyanthony March 15, 2026 22:43 View session

Copilot AI reviewed Mar 15, 2026

View reviewed changes

solution/backend/rust/src/latency.rs Show resolved Hide resolved

solution/agent/evaluator.py Outdated Show resolved Hide resolved

solution/backend/rust/src/latency.rs Show resolved Hide resolved

melroyanthony requested a review from Copilot March 15, 2026 22:54

Copilot started reviewing on behalf of melroyanthony March 15, 2026 22:55 View session

Copilot AI reviewed Mar 15, 2026

View reviewed changes

melroyanthony merged commit 39c896e into main Mar 15, 2026
10 checks passed

melroyanthony deleted the feat/issue-22-mixed-k-fusion branch March 15, 2026 23:00

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: allow mixed-K fusion (remove K_full consistency constraint)#23

feat: allow mixed-K fusion (remove K_full consistency constraint)#23
melroyanthony merged 6 commits intomainfrom
feat/issue-22-mixed-k-fusion

melroyanthony commented Mar 15, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

melroyanthony commented Mar 15, 2026

Summary

Changes

Results

Test plan

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants