Skip to content

feat: allow mixed-K fusion (remove K_full consistency constraint)#23

Merged
melroyanthony merged 6 commits intomainfrom
feat/issue-22-mixed-k-fusion
Mar 15, 2026
Merged

feat: allow mixed-K fusion (remove K_full consistency constraint)#23
melroyanthony merged 6 commits intomainfrom
feat/issue-22-mixed-k-fusion

Conversation

@melroyanthony
Copy link
Owner

Summary

Removes the K_full consistency constraint that prevented fusing MatMuls with different reduction dimensions. The problem statement allows per-op k-steps — each MatMul does ceil(its_K_full / k) steps independently.

Closes #22

Changes

  • compute_num_k_steps: max(K_full) instead of min(K_full)
  • Removed K_full consistency check from fusion (Rust + Python)
  • Evaluator validates k <= max(K_full) instead of K_full equality
  • Mixed-K phase-based latency for split-K (per-op active step counts)
  • Uniform-K degenerate case uses existing formula (no regression)
  • ADR-006 documents the decision and execution model

Results

Benchmark latencies are unchanged — the cost-based fusion correctly determines that mixed-K fused latency is not beneficial for these benchmarks' tensor dimensions. The mixed-K model is now correct and ready for future benchmarks where it would help.

All 16 unit tests pass. All 5 benchmarks complete in < 1s.

Test plan

  • cargo test --skip test_benchmark — 16/16 passing
  • All 5 benchmarks produce valid solutions
  • PROBLEM.md examples (uniform-K) produce identical results
  • Track B Python evaluator aligned with Rust

Architecture:
- system-design.md: mixed-K execution model with per-op active steps,
  updated compute/memory formulas, K_max replaces K_cap
- ADR-006: documents why K_full consistency was overly restrictive,
  the correct per-op k-step model, and 30% expected improvement

Requirements:
- F-05: allows mixed-K fusion with num_k_steps = ceil(max(K_full)/k)
- F-06: K_max for k search range
- New acceptance criteria for mixed-K support
- Risk register: mixed-K latency model correctness

Refs #22
Rust + Python changes:
- compute_num_k_steps: max(K_full) instead of min(K_full)
- find_k_full/find_split_k: max(K_full) for search range
- Removed K_full consistency check from fusion (both tracks)
- Evaluator: k <= max(K_full) instead of K_full consistency
- Latency model: mixed-K phase-based computation for split-K
  (each MatMul active for ceil(its_K_full/k) steps only)
- Uniform-K case uses existing formula (no regression)

16/16 tests pass. Benchmark latencies unchanged because cost-based
fusion correctly determines that mixed-K fused latency is not better
than split for these benchmarks' tensor dimensions.

Closes #22
Copilot AI review requested due to automatic review settings March 15, 2026 21:53
@melroyanthony melroyanthony self-assigned this Mar 15, 2026
@melroyanthony melroyanthony added the enhancement New feature or request label Mar 15, 2026
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR removes the K_full consistency constraint that prevented fusing MatMul ops with different reduction dimensions into the same subgraph. The change spans both the Rust backend and Python evaluator, updating the latency model, fusion logic, granularity search, validation, and documentation.

Changes:

  • Replaced min(K_full) with max(K_full) across all k-step computations (Rust latency, memory, granularity; Python evaluator), and removed the K_full consistency check from fusion and validation
  • Added a mixed-K phased latency computation path in both Rust (latency.rs) and Python (evaluator.py) that computes per-phase costs when MatMuls have different K_full values
  • Added ADR-006 and updated system-design docs and MVP scope to document the mixed-K execution model

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
solution/backend/rust/src/latency.rs Changed minmax for K_full; added mixed-K phased latency path
solution/backend/rust/src/evaluate.rs Removed K_full consistency validation; k <= max(K_full) check
solution/backend/rust/src/optimizer/fusion.rs Removed K_full consistency check from merge eligibility
solution/backend/rust/src/optimizer/granularity.rs Changed find_k_full to use max() instead of min()
solution/backend/rust/src/memory.rs Changed find_split_k to search from max K_full downward
solution/agent/evaluator.py Changed minmax for K_full; added mixed-K phased latency path; updated validation
solution/agent/scheduler.py Removed K_full consistency check from fusion
solution/docs/decisions/ADR-006-mixed-k-fusion.md New ADR documenting the mixed-K fusion decision
solution/docs/architecture/system-design.md Updated system design docs for mixed-K model
solution/requirements/mvp-scope.md Updated requirements and acceptance criteria for mixed-K

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

Rust + Python:
- Per-op k_strip: track actual tensor-dimension-based k_strip per MatMul
  instead of compute-ratio proxy (correct for mixed input dimensions)
- O(1) per phase: replaced step loop with first/interior/last arithmetic
  matching documented O(distinct_K) complexity
- Simulation path: skip k_strip loads for inactive MatMuls (finished
  their reduction) using tensor-to-active-steps mapping

Refs #22
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR removes the K_full consistency constraint that prevented fusing MatMul ops with different reduction dimensions into the same subgraph. The change touches both the Rust optimizer and the Python evaluator to support "mixed-K" execution where each MatMul independently runs ceil(K_full_op / k) k-steps, with the subgraph running for ceil(max(K_full) / k) total steps. A phased latency computation handles the varying active-op sets efficiently in O(distinct_K) per candidate.

Changes:

  • Changed min(K_full) to max(K_full) for num_k_steps, k search range, and validation across Rust and Python
  • Added mixed-K phased latency computation in both latency.rs and evaluator.py with O(1)-per-phase closed-form arithmetic, while retaining the uniform-K fast path
  • Removed K_full consistency checks from fusion (Rust + Python) and evaluator validation, and added ADR-006 documenting the decision

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
solution/backend/rust/src/latency.rs Changed num_k_steps from min→max K_full; added mixed-K phased latency with per-op k_strip tracking
solution/backend/rust/src/evaluate.rs Relaxed validation to check k <= max(K_full) instead of K_full equality
solution/backend/rust/src/optimizer/fusion.rs Removed K_full consistency check from merge eligibility
solution/backend/rust/src/optimizer/granularity.rs Changed find_k_full from min to max
solution/backend/rust/src/memory.rs Changed split-K search bound from min to max K_full
solution/agent/evaluator.py Python evaluator: min→max K_full, mixed-K phased latency, simulation path active-op filtering
solution/agent/scheduler.py Removed K_full consistency check from Python fusion
solution/docs/decisions/ADR-006-mixed-k-fusion.md New ADR documenting mixed-K fusion decision
solution/docs/architecture/system-design.md Updated design docs with mixed-K execution model
solution/requirements/mvp-scope.md Updated requirements and acceptance criteria for mixed-K

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

- Rust + Python: compute uses min(k, K_full_op) / K_full_op to prevent
  base_cost scaling above 1.0 when k > K_full_op in mixed-K subgraphs
- Applies to matmul_compute_per_step, mixed-K active_compute, and
  simulation path active_matmul_compute
- Removed dead total_k_steps variable from Python mixed-K path

Refs #22
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR removes the K_full consistency constraint from the fusion stage, allowing MatMul ops with different reduction dimensions (K_full values) to coexist in the same subgraph. The subgraph now runs for ceil(max(K_full) / k) total k-steps, with each MatMul active only for its own ceil(K_full_op / k) steps. The change is applied consistently across the Rust optimizer, Python evaluator, and documentation.

Changes:

  • Replaced min(K_full) with max(K_full) for num_k_steps computation, granularity search bounds, and split-K search in both Rust and Python
  • Removed K_full consistency checks from fusion (Rust + Python) and evaluator validation; validation now only checks k <= max(K_full)
  • Added mixed-K phase-based latency computation with O(distinct_K) closed-form evaluation per candidate, with a fast path for uniform-K subgraphs

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
solution/backend/rust/src/latency.rs Core mixed-K latency model: phase-based computation, min(k, K_full_op) clamping, max-based num_k_steps
solution/backend/rust/src/evaluate.rs Removed K_full consistency validation; now only checks k <= max(K_full)
solution/backend/rust/src/optimizer/fusion.rs Removed K_full consistency check from fusion eligibility
solution/backend/rust/src/optimizer/granularity.rs Changed find_k_full from min to max
solution/backend/rust/src/memory.rs Changed split-K search bound from min to max K_full
solution/agent/evaluator.py Python mirror: mixed-K latency, phase-based evaluation, validation relaxation
solution/agent/scheduler.py Removed K_full consistency check from Python fusion
solution/docs/decisions/ADR-006-mixed-k-fusion.md ADR documenting the mixed-K decision and execution model
solution/docs/architecture/system-design.md Updated latency model docs for mixed-K semantics
solution/requirements/mvp-scope.md Updated acceptance criteria and feature descriptions for mixed-K

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

Simulation path k_strip loads now use per-tensor k_eff = min(k, K_full_op)
via k_strip_tensor_k_eff mapping. Prevents loading more elements than
the tensor's actual reduction dimension when k > K_full_op.

Refs #22
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR removes the K_full consistency constraint from the fusion stage, allowing MatMul ops with different reduction dimensions (K_full values) to be fused into the same subgraph. The change implements a mixed-K execution model where each MatMul runs for ceil(K_full_op / k) steps and the subgraph runs for ceil(max(K_full) / k) total steps, with inactive MatMuls contributing zero compute and memory on later steps.

Changes:

  • Changed num_k_steps from ceil(min(K_full) / k) to ceil(max(K_full) / k) and removed K_full consistency checks from fusion (Rust + Python)
  • Added phase-based closed-form latency computation for mixed-K subgraphs in both Rust and Python, with a uniform-K fast path for regression safety
  • Added ADR-006 and updated system design documentation to describe the mixed-K execution model

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
solution/backend/rust/src/latency.rs Changed minmax for K_full, added min(k, K_full_op) clamping to compute, and implemented phase-based mixed-K latency path
solution/backend/rust/src/optimizer/fusion.rs Removed K_full consistency check from merge eligibility
solution/backend/rust/src/optimizer/granularity.rs Changed find_k_full from min to max
solution/backend/rust/src/memory.rs Changed find_split_k to use max K_full as upper search bound
solution/backend/rust/src/evaluate.rs Relaxed validation: k <= max(K_full) instead of K_full equality
solution/agent/scheduler.py Removed K_full consistency check from Python fusion
solution/agent/evaluator.py Changed to max K_full, added mixed-K phase-based latency path with k_eff clamping, updated validation
solution/docs/decisions/ADR-006-mixed-k-fusion.md New ADR documenting the mixed-K fusion decision
solution/docs/architecture/system-design.md Updated latency model, granularity search, and performance docs for mixed-K
solution/requirements/mvp-scope.md Updated acceptance criteria and feature descriptions for mixed-K

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

…-K test

- Rust build_memory_plan: k_strip sizes use min(k, K_full_op) via k_eff
- Python comments: updated to show k_eff instead of raw k
- Added test_mixed_k_two_matmuls: verifies phase-based latency with
  K_full=64 and K_full=128 at various k values, checks compute_num_k_steps

17 tests passing.

Refs #22
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR removes the K_full consistency constraint that prevented fusing MatMuls with different reduction dimensions into the same subgraph. The problem statement allows per-op k-steps, so this constraint was overly restrictive. The cost-based fusion criterion remains as the gatekeeper for merge decisions.

Changes:

  • Changed min(K_full) to max(K_full) across all components (granularity search, split-K search, num_k_steps, evaluator validation) and removed K_full equality checks from fusion (Rust + Python)
  • Added mixed-K phased latency computation in both Rust and Python: when MatMuls have different K_full values, steps are grouped into phases by active op set, with O(1) computation per phase
  • Added test_mixed_k_two_matmuls unit test and comprehensive documentation (ADR-006, system-design.md updates, mvp-scope.md updates)

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated no comments.

Show a summary per file
File Description
solution/backend/rust/src/latency.rs Core change: min→max for num_k_steps, k_eff = min(k, K_full_op) clamping in build_memory_plan and compute, mixed-K phased latency path
solution/backend/rust/src/optimizer/fusion.rs Removed K_full consistency check from merge eligibility
solution/backend/rust/src/optimizer/granularity.rs min→max for K_full search bound
solution/backend/rust/src/memory.rs min→max for split-K search upper bound
solution/backend/rust/src/evaluate.rs Replaced K_full consistency validation with k <= max(K_full) check
solution/backend/rust/src/main.rs Added test_mixed_k_two_matmuls test with K_full=64/128
solution/agent/evaluator.py Python mirror of all Rust changes: max K_full, mixed-K phased latency, k_eff clamping, per-op k_strip tracking
solution/agent/scheduler.py Removed K_full consistency check from Python fusion
solution/docs/decisions/ADR-006-mixed-k-fusion.md ADR documenting the decision and execution model
solution/docs/architecture/system-design.md Updated latency model docs for mixed-K semantics
solution/requirements/mvp-scope.md Updated acceptance criteria and feature descriptions

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

@melroyanthony melroyanthony merged commit 39c896e into main Mar 15, 2026
10 checks passed
@melroyanthony melroyanthony deleted the feat/issue-22-mixed-k-fusion branch March 15, 2026 23:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

perf: relax K_full consistency to allow mixed-K fusion

2 participants