[Feature] Scheduler: avoid dispatching MIX task to a core group whose AIV is already occupied #851

@zhangqi-chen

Summary

In the a2a3 tensormap_and_ringbuffer runtime, when an AIV core in a 1C2V cluster is already executing a task, the scheduler should not dispatch a MIX task to that AIV's core group. A MIX task requires the group's AIC and both of its AIVs, so it would conflict with the in-flight AIV task.

Motivation / Use Case

Observed while iterating on models/deepseek/v4/moe.py. A MIX task was dispatched to a core group whose AIV slot was already occupied by an earlier AIV-only task, producing a conflict (screenshot to be attached).

The current scheduler appears to admit a MIX task into a group based on AIC availability without checking that all AIVs in the group are free. Because MIX requires the entire 1C2V cluster, partial-occupancy AIV state must be considered before MIX admission.

Without this guard, mixed AIV-only + MIX workloads on the same cluster (e.g. MoE pipelines that interleave vector-only and mix kernels) can hit hard-to-diagnose dispatch conflicts.

Proposed API / Behavior

The MIX-queue admission check should be tightened from "AIC group free" to "AIC free AND every AIV in the same group free." Concretely, when picking a target group for a MIX task, the scheduler must consult the per-AIV occupancy state in addition to the group/AIC state, and skip groups where any AIV is currently running an AIV-only task.
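A minimal sketch of the tightened check, written in Python for readability rather than in the runtime's own language. The names `CoreGroup`, `can_admit_mix`, and `pick_mix_group` are hypothetical and do not come from the runtime; the sketch only assumes that each 1C2V group tracks one AIC flag and two AIV flags.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CoreGroup:
    """Hypothetical occupancy record for one 1C2V cluster (1 AIC + 2 AIVs)."""
    aic_busy: bool = False
    aiv_busy: List[bool] = field(default_factory=lambda: [False, False])


def can_admit_mix(group: CoreGroup) -> bool:
    # MIX needs the whole cluster: the AIC and every AIV must be free.
    return not group.aic_busy and not any(group.aiv_busy)


def pick_mix_group(groups: List[CoreGroup]) -> Optional[int]:
    # Return the index of the first fully free group, or None if there is none.
    for idx, group in enumerate(groups):
        if can_admit_mix(group):
            return idx
    return None


groups = [
    CoreGroup(aiv_busy=[True, False]),  # AIC free, but AIV0 runs an AIV-only task
    CoreGroup(aic_busy=True),           # AIC busy
    CoreGroup(),                        # fully free
]
chosen = pick_mix_group(groups)  # skips groups 0 and 1, selects index 2
```

A check that looked only at `aic_busy` would select group 0 here, which is the conflict described in the motivation.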

Equivalently, AIV-only dispatch should reserve the AIV slot in a way that MIX admission can observe before it reserves the cluster.
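One way to make the AIV reservation observable is a single per-group occupancy mask that both dispatch paths update through the same check-and-reserve step. The sketch below is illustrative only: `GroupReservation`, the bit layout, and the use of a lock are assumptions, not the runtime's actual data structures.

```python
import threading

# Hypothetical bit layout for one 1C2V group.
AIC_BIT = 0b001
AIV0_BIT = 0b010
AIV1_BIT = 0b100
FULL_CLUSTER = AIC_BIT | AIV0_BIT | AIV1_BIT


class GroupReservation:
    """Occupancy mask shared by AIV-only and MIX dispatch (illustrative)."""

    def __init__(self) -> None:
        self._mask = 0
        self._lock = threading.Lock()

    def try_reserve(self, bits: int) -> bool:
        # Check and reserve in one critical section so a MIX reservation
        # cannot slip in between an AIV-only check and its reservation.
        with self._lock:
            if self._mask & bits:
                return False
            self._mask |= bits
            return True

    def release(self, bits: int) -> None:
        with self._lock:
            self._mask &= ~bits


group = GroupReservation()
aiv0_ok = group.try_reserve(AIV0_BIT)          # AIV-only task takes AIV0
aiv1_ok = group.try_reserve(AIV1_BIT)          # a second AIV-only task may still take AIV1
mix_blocked = group.try_reserve(FULL_CLUSTER)  # MIX is refused while any AIV is held
group.release(AIV0_BIT)
group.release(AIV1_BIT)
mix_ok = group.try_reserve(FULL_CLUSTER)       # MIX succeeds once both AIVs drain
```

Because AIV-only tasks reserve only their own bit, two vector-only tasks can still share a group, while MIX sees the same mask and waits or picks another group.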

Alternatives Considered

  • User-side serialization (insert a barrier so AIV-only tasks drain before MIX): pushes scheduler responsibility onto kernel authors and gives up overlap that is otherwise legal.
  • Always reserve full cluster for AIV-only tasks: loses AIV parallelism for vector-only workloads.

Additional Context
