Wrong instruction dependency #69

@jinhongyii

Describe the bug

In this NKI kernel (https://gist.github.com/jinhongyii/492d7079efdf186b1cbb6b2f0c0478be), line 158 is:

        p_transposed[p_loop_29[:, None, ], ((b_loop_2 * 128) + f_loop_22[None, :, ])] = nisa.tensor_copy(acc_psum_2[(block_kv * 126 + block_q) * 64 + b_loop_2, b_loop_2 % 8, p_loop_29[:, None, ], f_loop_22[None, :, ]])

This tensor_copy accesses acc_psum_2 in bank b_loop_2 % 8, so the instruction in iteration b_loop_2 should have an anti-dependency only on iteration b_loop_2 - 8.
However, the profile shows this instruction depending on the immediately preceding iteration (b_loop_2 - 1). I believe this is a compiler bug, and it hurts performance.
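
To make the expected dependency distance concrete, here is a minimal plain-Python sketch of the reasoning above. It is not NKI code and does not model the compiler. It only assumes that the bank is selected by `b_loop_2 % 8`, as in the kernel line. The helper names and the trip count of 16 are hypothetical and chosen purely for illustration.

```python
NUM_BANKS = 8  # taken from the `b_loop_2 % 8` bank index in the kernel line


def previous_same_bank_iteration(b_loop_2, num_banks=NUM_BANKS):
    """Return the most recent earlier iteration that used the same bank.

    Returns None when no earlier iteration touched that bank.
    """
    prev = b_loop_2 - num_banks
    return prev if prev >= 0 else None


def expected_dependencies(num_iterations, num_banks=NUM_BANKS):
    """Map each iteration to the iteration it should wait on (or None)."""
    return {
        i: previous_same_bank_iteration(i, num_banks)
        for i in range(num_iterations)
    }


# 16 is a hypothetical trip count, not taken from the kernel.
deps = expected_dependencies(16)
for iteration, dep in deps.items():
    print(f"iteration {iteration}: bank {iteration % NUM_BANKS}, waits on {dep}")
```

Under this model no iteration waits on its immediate predecessor, which is why the per-iteration event semaphore in the profile looks unnecessary.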

Image

Expected Behavior

The copy instruction should not depend on its previous iteration. The event semaphore in the screenshot should not exist.

Current Behavior

An event semaphore appears between consecutive copy instructions in the b_loop_2 loop.

Reproduction Steps

Run the NKI kernel on trn1. You can reproduce all the logs and profiles.

Regression Issue

  • Select this option if this issue appears to be a regression.

Possible Solution

Fix the dependency building pass.

Additional Information/Context

No response

neuronx-cc version used

2.17.194

Framework(s) and their versions used (JAX, PyTorch, etc..)

No response

Labels

bug
