How to understand "triton_gpu.slice"?

I cannot understand the meaning of triton_gpu.slice from the description of "SliceEncodingAttr" alone. The description is as follows:
```
  let description = [{
    TODO: improve docs

    A = [x  x  x  x  x  x  x  x]

    parent = [0  1  2  3 ]
             [4  5  6  7 ]
             [8  9  10 11]
             [12 13 14 15]
    dim = 0

    Then the data of A would be distributed as follow between the 16 CUDA threads:
    L(A) = [ {0,4,8,12} , {1,5,9,13} , ... {3,7,11,15}, {0,4,8,12} , ..., {3,7,11,15} ]

    This is useful for constructing the inverse layout of an expand_dims operation during some optimization passes.

  }];
```
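To check my reading of this description, here is a small Python sketch that reproduces the `L(A)` from the example. The function name `slice_layout` and the wrap-around rule (element `i` maps to column `i % 4` when `dim = 0`) are my own assumptions, inferred only from the example above, not taken from the Triton sources.

```python
from typing import List, Set


def slice_layout(parent: List[List[int]], dim: int, length: int) -> List[Set[int]]:
    """Return, for each element of a 1-D tensor, the set of parent thread ids
    that would hold it when `dim` of the 2-D parent layout is sliced away.

    Hypothetical helper: this models the docstring example only.
    """
    rows, cols = len(parent), len(parent[0])
    owners: List[Set[int]] = []
    for i in range(length):
        if dim == 0:
            # Dimension 0 is removed, so element i lives in column i
            # (wrapping around) and is shared by every row of that column.
            col = i % cols
            owners.append({parent[r][col] for r in range(rows)})
        else:
            # Dimension 1 is removed, so element i lives in row i
            # (wrapping around) and is shared by every column of that row.
            row = i % rows
            owners.append(set(parent[row]))
    return owners


parent = [
    [0, 1, 2, 3],
    [4, 5, 6, 7],
    [8, 9, 10, 11],
    [12, 13, 14, 15],
]

# A has 8 elements and the slice removes dim 0 of the parent.
layout = slice_layout(parent, dim=0, length=8)
for i, threads in enumerate(layout):
    print(i, sorted(threads))
```

If this reading is right, every element of `A` is replicated across all threads that differ only in the sliced dimension, which matches the `{0,4,8,12}, {1,5,9,13}, ...` pattern in the description.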

Now, here is some code from an .mlir file that uses triton_gpu.slice:
```
#blocked = #triton_gpu.blocked<{sizePerThread = [4, 1], threadsPerWarp = [16, 2], warpsPerCTA = [1, 4], order = [0, 1]}>
#blocked1 = #triton_gpu.blocked<{sizePerThread = [1, 4], threadsPerWarp = [2, 16], warpsPerCTA = [4, 1], order = [1, 0]}>
module attributes {"triton_gpu.num-ctas" = 1 : i32, "triton_gpu.num-warps" = 4 : i32} {
  tt.func @transpose(%arg0: !tt.ptr<f32, 1> {tt.divisibility = 16 : i32},
                     %arg1: i32 {tt.divisibility = 16 : i32},
                     %arg2: !tt.ptr<f32, 1> {tt.divisibility = 16 : i32},
                     %arg3: i32 {tt.divisibility = 16 : i32}) {
    %cst = arith.constant dense<true> : tensor<64x64xi1, #blocked>
    %cst_0 = arith.constant dense<0.000000e+00> : tensor<64x64xf32, #blocked1>
    %cst_1 = arith.constant dense<true> : tensor<64x64xi1, #blocked1>
    %0 = tt.make_range {end = 64 : i32, start = 0 : i32} : tensor<64xi32, #triton_gpu.slice<{dim = 1, parent = #blocked1}>>
    %1 = tt.make_range {end = 64 : i32, start = 0 : i32} : tensor<64xi32, #triton_gpu.slice<{dim = 1, parent = #blocked}>>
    %2 = tt.make_range {end = 64 : i32, start = 0 : i32} : tensor<64xi32, #triton_gpu.slice<{dim = 0, parent = #blocked1}>>
    %3 = tt.make_range {end = 64 : i32, start = 0 : i32} : tensor<64xi32, #triton_gpu.slice<{dim = 0, parent = #blocked}>>
......
......
......
}
```

I cannot figure out the layouts of "%0", "%1", "%2", and "%3".

Could anyone explain how the elements of these tensors are distributed across threads?
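
To make the question concrete, here is a Python sketch of my current guess, which is exactly what I would like confirmed or corrected. It assumes that in a blocked layout the element at index `i` along a dimension belongs to lane coordinate `(i // sizePerThread) % threadsPerWarp` and warp coordinate `(i // (sizePerThread * threadsPerWarp)) % warpsPerCTA`, that `order[0]` is the fastest-varying dimension when thread ids are split into coordinates, and that a slice keeps these rules for the remaining dimension while replicating along the removed one. The helpers `delinearize` and `slice_owners` are hypothetical names of mine, not Triton APIs.

```python
from math import prod
from typing import Dict, List

# The two parent layouts from the MLIR above, copied as plain dictionaries.
BLOCKED = {"sizePerThread": [4, 1], "threadsPerWarp": [16, 2],
           "warpsPerCTA": [1, 4], "order": [0, 1]}
BLOCKED1 = {"sizePerThread": [1, 4], "threadsPerWarp": [2, 16],
            "warpsPerCTA": [4, 1], "order": [1, 0]}


def delinearize(index: int, shape: List[int], order: List[int]) -> List[int]:
    """Split a linear id into per-dimension coordinates (order[0] is fastest)."""
    coords = [0] * len(shape)
    for d in order:
        coords[d] = index % shape[d]
        index //= shape[d]
    return coords


def slice_owners(parent: Dict[str, List[int]], dim: int, length: int) -> List[List[int]]:
    """For each element of the 1-D sliced tensor, list the thread ids assumed to hold it."""
    kept = 1 - dim  # the parent dimension that survives the slice
    spt = parent["sizePerThread"][kept]
    tpw = parent["threadsPerWarp"][kept]
    wpc = parent["warpsPerCTA"][kept]
    lanes_per_warp = prod(parent["threadsPerWarp"])
    num_warps = prod(parent["warpsPerCTA"])

    owners: List[List[int]] = [[] for _ in range(length)]
    for tid in range(lanes_per_warp * num_warps):
        lane = delinearize(tid % lanes_per_warp, parent["threadsPerWarp"], parent["order"])
        warp = delinearize(tid // lanes_per_warp, parent["warpsPerCTA"], parent["order"])
        for i in range(length):
            lane_match = (i // spt) % tpw == lane[kept]
            warp_match = (i // (spt * tpw)) % wpc == warp[kept]
            if lane_match and warp_match:
                owners[i].append(tid)
    return owners


owners_0 = slice_owners(BLOCKED1, dim=1, length=64)  # %0
owners_1 = slice_owners(BLOCKED, dim=1, length=64)   # %1
owners_2 = slice_owners(BLOCKED1, dim=0, length=64)  # %2
owners_3 = slice_owners(BLOCKED, dim=0, length=64)   # %3

for name, owners in [("%0", owners_0), ("%1", owners_1),
                     ("%2", owners_2), ("%3", owners_3)]:
    print(name, "element 0 ->", owners[0])
```

Under these assumptions, element 0 of "%0" and "%3" would be held by threads 0 to 15, and element 0 of "%1" and "%2" by threads 0, 16, 32, ..., 112. Is that the right way to read the slice layouts?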

