Enable TensorIndexer with the stream tests #5726
base: main
Conversation
Loop index fix
!test --diff
Description

| PR Type | Relevant files |
|---|---|
| Bug fix | |
| Enhancement | |
PR Reviewer Guide
Here are some key observations to aid the review process:
- 🧪 PR contains tests
- ⚡ Recommended focus areas for review
Stream parallel loop index handling

The change handles `ParallelType::Stream` alongside the thread parallel types when using `NamedScalar::getParallelIndex()`. This fixes the loop index setting for stream parallel loops, as mentioned in the PR description. The logic flow appears correct: device dimensions and zero-index cases use zero, while thread and stream parallel types use the parallel index.
Greptile Summary

This PR fixes loop index allocation for stream-parallel loops and enables TensorIndexer in stream tests.

Key Changes:
The fix is necessary because when TensorIndexer is enabled via `EnableOption::IdModel`, loop indices for stream-parallel loops must be allocated through `NamedScalar::getParallelIndex()` rather than as ordinary serial loop variables.

Confidence Score: 5/5
Important Files Changed
Sequence Diagram

```mermaid
sequenceDiagram
    participant Test as StreamTest
    participant Exec as KernelExecutor
    participant Lower as GpuLower
    participant IdModel as IdModel
    participant NamedScalar as NamedScalar
    Note over Test: EnableOption::IdModel {"all"} enabled
    Test->>Exec: compile(&fusion)
    Exec->>Lower: lower2device()
    Lower->>IdModel: allocateLoopIndexVariables()
    Note over IdModel: For each loop group
    IdModel->>IdModel: getParallelType(loop_group)
    alt ptype == Stream
        IdModel->>NamedScalar: getParallelIndex(ParallelType::Stream)
        NamedScalar-->>IdModel: streamIdx variable
        IdModel->>IdModel: loop_index_variable_map_[loop_group] = streamIdx
    else ptype is Thread type
        IdModel->>NamedScalar: getParallelIndex(ptype)
        NamedScalar-->>IdModel: thread index variable
    end
    IdModel-->>Lower: Loop indices allocated
    Lower->>Lower: Build TensorIndexer with IdModel
    Lower-->>Exec: Lowered kernel ready
    Exec->>Test: Kernel compiled
    Test->>Exec: run({in_tensor, kStreamIndex}, {out_tensor})
    Exec-->>Test: Stream-parallel execution complete
```
Also fixes the loop index setting for stream parallel loops.