From f82d8467ac02ba32724a2df8cda5b025d9d49384 Mon Sep 17 00:00:00 2001 From: zhangstevenunity <128771452+zhangstevenunity@users.noreply.github.com> Date: Wed, 27 May 2026 16:37:18 +0800 Subject: [PATCH 1/5] Expose configurable frontend pipe slot_num --- docs/PTO_IR_manual.md | 23 ++++++--- docs/designs/ptoas-tpush-tpop-design.md | 25 +++++----- include/PTO/IR/PTOOps.td | 2 + lib/PTO/IR/PTO.cpp | 32 +++++++++--- .../PTOLowerFrontendPipeOpsPass.cpp | 18 +++++-- ...h_tpop_frontend_local_slot_num_invalid.pto | 2 +- .../pto/tpush_tpop_frontend_slot_num_a3.pto | 49 +++++++++++++++++++ .../tpush_tpop_frontend_slot_num_invalid.pto | 15 ++++++ ...h_tpop_frontend_slot_num_local_invalid.pto | 15 ++++++ .../pto/tpush_tpop_internal_slot_num_a3.pto | 20 ++++++++ 10 files changed, 169 insertions(+), 32 deletions(-) create mode 100644 test/lit/pto/tpush_tpop_frontend_slot_num_a3.pto create mode 100644 test/lit/pto/tpush_tpop_frontend_slot_num_invalid.pto create mode 100644 test/lit/pto/tpush_tpop_frontend_slot_num_local_invalid.pto create mode 100644 test/lit/pto/tpush_tpop_internal_slot_num_a3.pto diff --git a/docs/PTO_IR_manual.md b/docs/PTO_IR_manual.md index a32de72f7..27031046c 100644 --- a/docs/PTO_IR_manual.md +++ b/docs/PTO_IR_manual.md @@ -8346,6 +8346,10 @@ frontend/framework generated IR. The detailed design document is: function. - `slot_size` is expressed in bytes and uses the pre-split logical pipe-entry size. +- `slot_num` is an optional compile-time integer attribute on + `pto.aic_initialize_pipe` / `pto.aiv_initialize_pipe`. It controls the GM + ring FIFO depth and defaults to `8` for `dir_mask = 1/2` or `4` for + `dir_mask = 3`. - `local_slot_num` is an optional compile-time integer attribute on `pto.aic_initialize_pipe` / `pto.aiv_initialize_pipe`. On A2/A3 it overrides the default consumer-side local FIFO slot count only @@ -8369,9 +8373,10 @@ frontend/framework generated IR. The detailed design document is: (`pto.initialize_l2g2l_pipe`). 
It does not implicitly execute `pto.tstore` or `pto.tload`; callers move data explicitly before `tpush` or after `tpop`. - When every transfer op bound to one pipe id uses a global entry, the pipe is - a global-only GM FIFO. Its frontend initialize op carries only - `gm_slot_tensor`; `gm_slot_buffer`, `c2v_consumer_buf`, `v2c_consumer_buf`, `local_slot_num`, - `pto.reserve_buffer`, and `pto.import_reserved_buffer` are not used. + a global-only GM FIFO. Its frontend initialize op carries `gm_slot_tensor` + and may carry `slot_num`; `gm_slot_buffer`, `c2v_consumer_buf`, + `v2c_consumer_buf`, `local_slot_num`, `pto.reserve_buffer`, and + `pto.import_reserved_buffer` are not used. - For global entries, the matched initialize op's `gm_slot_tensor` describes one FIFO slot entry, not the full multi-slot FIFO buffer. Its dtype, shape, stride, and layout must match the `tensor_view` returned by `talloc` / @@ -8505,7 +8510,7 @@ this op. ```mlir // A2/A3 (with GM slot buffer): -pto.aic_initialize_pipe {id = 0, dir_mask = 1, slot_size = 1024, local_slot_num = 1} +pto.aic_initialize_pipe {id = 0, dir_mask = 1, slot_size = 1024, slot_num = 2, local_slot_num = 1} (gm_slot_buffer = %gm_buf : !pto.ptr, c2v_consumer_buf = %c2v_import : i32, v2c_consumer_buf = %c0_i32 : i32) @@ -8529,6 +8534,8 @@ pto.aic_initialize_pipe {id = 0, dir_mask = 1, slot_size = 1024, nosplit = true} the same function - `dir_mask`: communication direction encoding - `slot_size`: logical slot size in bytes +- `slot_num`: optional GM ring FIFO slot count; omitted defaults to `8` for + `dir_mask = 1/2` or `4` for `dir_mask = 3` - `local_slot_num`: optional A2/A3-only local FIFO slot count override for the lowered `pto.initialize_l2g2l_pipe`; omitted for global-only GM FIFO - `nosplit`: optional compile-time boolean controlling no-split pipe mode @@ -8551,12 +8558,12 @@ pto.aic_initialize_pipe {id = 0, dir_mask = 1, slot_size = 1024, nosplit = true} - Must appear in Cube kernels - Multiple 
`pto.aic_initialize_pipe` ops are allowed in one Cube function, but `id` must be unique among frontend initialize ops in that function +- If `slot_num` is present, it must be greater than `0` - If `local_slot_num` is present, it must be greater than `0` and no greater - than the legacy slot count implied by `dir_mask` - (`8` for `dir_mask = 1/2`, `4` for `dir_mask = 3`) + than the effective `slot_num` - A global-only GM FIFO initialize carries only `gm_slot_tensor`; it must not carry `gm_slot_buffer`, `local_slot_num`, `c2v_consumer_buf`, or - `v2c_consumer_buf` + `v2c_consumer_buf`; it may carry `slot_num` - For global-only GM FIFO, `slot_size` must match the byte size of `gm_slot_tensor` - Global-entry `talloc` / `tpush` / `tpop` / `tfree` entry types must match the @@ -8576,7 +8583,7 @@ pto.aic_initialize_pipe {id = 0, dir_mask = 1, slot_size = 1024, nosplit = true} ```mlir // A2/A3 (with GM slot buffer): -pto.aiv_initialize_pipe {id = 0, dir_mask = 1, slot_size = 1024, local_slot_num = 1} +pto.aiv_initialize_pipe {id = 0, dir_mask = 1, slot_size = 1024, slot_num = 2, local_slot_num = 1} (gm_slot_buffer = %gm_buf : !pto.ptr, c2v_consumer_buf = %c2v_local : i32, v2c_consumer_buf = %c0_i32 : i32) diff --git a/docs/designs/ptoas-tpush-tpop-design.md b/docs/designs/ptoas-tpush-tpop-design.md index ffb7eb623..776a5bc88 100644 --- a/docs/designs/ptoas-tpush-tpop-design.md +++ b/docs/designs/ptoas-tpush-tpop-design.md @@ -390,7 +390,7 @@ func.func @vector_kernel(%gm_slot_buffer : !pto.ptr, - 启用 local address planning 的编译流程:`reserve_buffer` 只允许 `auto = true` - 跳过 local address planning 的编译流程:`reserve_buffer` 只允许 `auto = false` 且显式提供 `base` - `import_reserved_buffer` 必须能在 `peer_func` 中找到同名 `reserve_buffer` -- global-only GM FIFO 的 initialize 只提供 `gm_slot_tensor`,不提供 `gm_slot_buffer`、`local_slot_num`、`c2v_consumer_buf`、`v2c_consumer_buf`,且不要求成对的 `reserve_buffer` / `import_reserved_buffer` +- global-only GM FIFO 的 initialize 只提供 `gm_slot_tensor`(可附带 `slot_num`),不提供 
`gm_slot_buffer`、`local_slot_num`、`c2v_consumer_buf`、`v2c_consumer_buf`,且不要求成对的 `reserve_buffer` / `import_reserved_buffer` ## 4. 核心约定 @@ -516,7 +516,7 @@ DIR_BOTH 示例: - 表示 GM 路径下 consumer 侧 local slot buffer 的槽数,仅在存在 local FIFO buffer 的 tile-entry 路径有意义 - 仅在通过 GM 传递时对底层 `TPipe` 模板参数有意义,不改变 GM FIFO 的 `slot_num` - 存在 local FIFO buffer 且缺省时,默认值等于该内部 pipe 的 `slot_num` - - 因此当前固定规则下: + - 因此前端未显式指定 `slot_num` 时: - `DIR_MASK=1/2` 直接 lowering 时,`local_slot_num = 8` - `DIR_MASK=3` 单条 DIR_BOTH pipe,`local_slot_num = 4` - global-only GM FIFO 不携带 `local_slot_num` @@ -658,10 +658,10 @@ pto.tfree(%entry, %pipe : !pto.tensor_view<128x512xf32>, !pto.pipe) {split = 0} #### A2/A3 - `pto.aic_initialize_pipe` 和 `pto.aiv_initialize_pipe` lower 为 `pto.initialize_l2g2l_pipe` -- 若前端 init 只提供 `gm_slot_tensor`,则 lower 为只携带 `gm_slot_tensor` 的 global-only GM FIFO;不补 `local_slot_num`,不生成 local consumer address operand,也不依赖 `reserve_buffer` / `import_reserved_buffer` +- 若前端 init 只提供 `gm_slot_tensor`(可附带 `slot_num`),则 lower 为只携带 `gm_slot_tensor` 的 global-only GM FIFO;不补 `local_slot_num`,不生成 local consumer address operand,也不依赖 `reserve_buffer` / `import_reserved_buffer` - 若前端提供了 consumer 侧 local FIFO buffer,且提供了 `local_slot_num`,则直接转发到 lowered `pto.initialize_l2g2l_pipe` -- 若前端提供了 consumer 侧 local FIFO buffer 但未提供更具体信息,lowering 默认补上 `local_slot_num = slot_num` +- 若前端提供了 consumer 侧 local FIFO buffer 但未提供 `local_slot_num`,lowering 默认补上 `local_slot_num = slot_num` #### A5 @@ -670,8 +670,8 @@ pto.tfree(%entry, %pipe : !pto.tensor_view<128x512xf32>, !pto.pipe) {split = 0} ### 6.2 `DIR_MASK=1/2` - 只生成一条内部 pipe -- `slot_num = 8` -- 对带 consumer 侧 local FIFO buffer 的 `initialize_l2g2l_pipe`,默认 `local_slot_num = 8` +- `slot_num` 缺省为 `8`,也可由前端显式指定 +- 对带 consumer 侧 local FIFO buffer 的 `initialize_l2g2l_pipe`,默认 `local_slot_num = slot_num` - 若前端显式提供 `local_slot_num`,则使用显式值 - global-only GM FIFO 不携带 `local_slot_num`,地址/descriptor 操作数只有 `gm_slot_tensor` @@ -679,8 +679,8 @@ pto.tfree(%entry, %pipe : 
!pto.tensor_view<128x512xf32>, !pto.pipe) {split = 0} 前端一个 init op 生成**单条** DIR_BOTH 内部 pipe: -- `%pipe`:`dir_mask = 3`,`slot_num = 4` -- 若 lowering 为带 consumer 侧 local FIFO buffer 的 `initialize_l2g2l_pipe`,默认 `local_slot_num = 4` +- `%pipe`:`dir_mask = 3`,`slot_num` 缺省为 `4`,也可由前端显式指定 +- 若 lowering 为带 consumer 侧 local FIFO buffer 的 `initialize_l2g2l_pipe`,默认 `local_slot_num = slot_num` - 若前端显式提供 `local_slot_num`,则使用显式值 地址选择规则: @@ -995,7 +995,7 @@ pass 在模块级按两步执行: - 方向相关 op 只能出现在合法 kernel 中 - 前端数据传输 op 的 `split` 必须是合法的编译期常量属性 - `global` entry 形式的 `talloc_to_*` / `tpush_to_*` / `tpop_from_*` / `tfree_from_*` 只能绑定到 GM FIFO pipe(A2/A3 `initialize_l2g2l_pipe` 路径) -- 绑定到 global-only GM FIFO 的 initialize 只允许携带 `gm_slot_tensor`,不得携带 `gm_slot_buffer`、`local_slot_num`、`c2v_consumer_buf`、`v2c_consumer_buf`;该路径不要求 `reserve_buffer` / `import_reserved_buffer` +- 绑定到 global-only GM FIFO 的 initialize 只允许携带 `gm_slot_tensor`(可附带 `slot_num`),不得携带 `gm_slot_buffer`、`local_slot_num`、`c2v_consumer_buf`、`v2c_consumer_buf`;该路径不要求 `reserve_buffer` / `import_reserved_buffer` - `gm_slot_tensor` 本身描述单个 slot entry;其字节数必须匹配 `slot_size` - `talloc_to_*` / `tpop_from_*` 返回的 `tensor_view` 类型必须匹配 `gm_slot_tensor` - `global` entry 的 dtype、shape 与 stride/layout 必须足以生成底层 `GlobalTensor` 类型 @@ -1008,11 +1008,12 @@ pass 在模块级按两步执行: 内部 verifier 负责检查: - `slot_size > 0` -- `slot_num` 只允许 `8` 或 `4` -- `DIR_MASK=1/2` 时,`slot_num` 必须与单向/双向 lowering 规则一致 +- `slot_num >= 1` +- legacy 前端 `pto.aic_initialize_pipe` / `pto.aiv_initialize_pipe` 可显式提供 + `slot_num`;缺省时 `DIR_MASK=1/2` 使用 `8`,`DIR_MASK=3` 使用 `4` - `local_slot_num` 若出现,可出现在 `pto.initialize_l2g2l_pipe` 或 legacy 前端 `pto.aic_initialize_pipe` / `pto.aiv_initialize_pipe` 上,且必须大于 `0` - 且不大于其对应 lowering 规则下的 `slot_num`;global-only GM FIFO 不携带 `local_slot_num` + 且不大于其有效 `slot_num`;global-only GM FIFO 不携带 `local_slot_num` - `flag_base` 若出现,必须满足基本合法性;是否已填写以及具体分配值由 flag 分配保证 - `pto.initialize_l2g2l_pipe` 必须提供 `gm_addr` 或 `gm_slot_tensor`;只有存在 consumer 侧 local FIFO 
buffer 时才提供 `local_addr` / `peer_local_addr` - `pto.initialize_l2l_pipe` 必须提供 `local_addr` diff --git a/include/PTO/IR/PTOOps.td b/include/PTO/IR/PTOOps.td index 344b399a6..5a7dd18f5 100644 --- a/include/PTO/IR/PTOOps.td +++ b/include/PTO/IR/PTOOps.td @@ -1505,6 +1505,7 @@ def AicInitializePipeOp : PTO_Op<"aic_initialize_pipe", DefaultValuedOptionalAttr:$id, I8Attr:$dir_mask, I32Attr:$slot_size, + OptionalAttr:$slot_num, OptionalAttr:$local_slot_num, OptionalAttr:$nosplit, Optional:$gm_slot_buffer, @@ -1526,6 +1527,7 @@ def AivInitializePipeOp : PTO_Op<"aiv_initialize_pipe", DefaultValuedOptionalAttr:$id, I8Attr:$dir_mask, I32Attr:$slot_size, + OptionalAttr:$slot_num, OptionalAttr:$local_slot_num, OptionalAttr:$nosplit, Optional:$gm_slot_buffer, diff --git a/lib/PTO/IR/PTO.cpp b/lib/PTO/IR/PTO.cpp index 5427ac36e..df2ba8fb5 100644 --- a/lib/PTO/IR/PTO.cpp +++ b/lib/PTO/IR/PTO.cpp @@ -11457,6 +11457,7 @@ static ParseResult parseFrontendInitializePipeOp(OpAsmParser &parser, bool sawId = false; bool sawDirMask = false; bool sawSlotSize = false; + bool sawSlotNum = false; bool sawLocalSlotNum = false; bool sawNoSplit = false; @@ -11495,6 +11496,15 @@ static ParseResult parseFrontendInitializePipeOp(OpAsmParser &parser, "slot_size", attrs)) return failure(); sawSlotSize = true; + } else if (keyword == "slot_num") { + if (sawSlotNum) + return parser.emitError(parser.getCurrentLocation(), + "duplicate 'slot_num' clause"); + IntegerAttr slotNumAttr; + if (parser.parseAttribute(slotNumAttr, parser.getBuilder().getI32Type(), + "slot_num", attrs)) + return failure(); + sawSlotNum = true; } else if (keyword == "local_slot_num") { if (sawLocalSlotNum) return parser.emitError(parser.getCurrentLocation(), @@ -11632,6 +11642,8 @@ static void printFrontendInitializePipeOp(InitOpT op, OpAsmPrinter &p) { printClause("id", op.getId()); printClause("dir_mask", static_cast(op.getDirMask())); printClause("slot_size", op.getSlotSize()); + if (auto slotNumAttr = op.getSlotNumAttr()) + 
printClause("slot_num", slotNumAttr.getInt()); if (auto localSlotNumAttr = op.getLocalSlotNumAttr()) printClause("local_slot_num", localSlotNumAttr.getInt()); if (auto noSplitAttr = op.getNosplitAttr()) @@ -11658,7 +11670,8 @@ static void printFrontendInitializePipeOp(InitOpT op, OpAsmPrinter &p) { p << ")"; p.printOptionalAttrDict( op->getAttrs(), - /*elidedAttrs=*/{"id", "dir_mask", "slot_size", "local_slot_num", + /*elidedAttrs=*/{"id", "dir_mask", "slot_size", "slot_num", + "local_slot_num", "nosplit", "operandSegmentSizes"}); } @@ -11744,6 +11757,12 @@ static LogicalResult verifyFrontendInitCommon(InitOpT op, return op.emitOpError("expects 'dir_mask' to be 1, 2, or 3"); if (op.getSlotSize() <= 0) return op.emitOpError("expects 'slot_size' to be greater than 0"); + int32_t slotNum = dirMask == 3 ? 4 : 8; + if (auto slotNumAttr = op.getSlotNumAttr()) { + slotNum = slotNumAttr.getInt(); + if (slotNum <= 0) + return op.emitOpError("expects 'slot_num' to be greater than 0"); + } bool hasGlobalSlotTensor = static_cast<bool>(op.getGmSlotTensor()); bool hasC2vConsumerBuf = static_cast<bool>(op.getC2vConsumerBuf()); @@ -11779,11 +11798,10 @@ static LogicalResult verifyFrontendInitCommon(InitOpT op, int32_t localSlotNum = localSlotNumAttr.getInt(); if (localSlotNum <= 0) return op.emitOpError("expects 'local_slot_num' to be greater than 0"); - int32_t loweredSlotNum = dirMask == 3 ? 
4 : 8; - if (localSlotNum > loweredSlotNum) { + if (localSlotNum > slotNum) { return op.emitOpError() - << "expects 'local_slot_num' to be less than or equal to " - << loweredSlotNum << " for dir_mask = " << static_cast(dirMask); + << "expects 'local_slot_num' to be less than or equal to slot_num (" + << slotNum << ") for dir_mask = " << static_cast(dirMask); } } @@ -12060,8 +12078,8 @@ static LogicalResult verifyPipeShape(Operation *op, int8_t dirMask, int32_t slot return op->emitOpError("expects 'dir_mask' to be 1, 2, or 3"); if (slotSize <= 0) return op->emitOpError("expects 'slot_size' to be greater than 0"); - if (slotNum != 4 && slotNum != 8) - return op->emitOpError("expects 'slot_num' to be 4 or 8"); + if (slotNum <= 0) + return op->emitOpError("expects 'slot_num' to be greater than 0"); if (flagBase && *flagBase < 0) return op->emitOpError("expects 'flag_base' to be non-negative when present"); if (flagBase) { diff --git a/lib/PTO/Transforms/PTOLowerFrontendPipeOpsPass.cpp b/lib/PTO/Transforms/PTOLowerFrontendPipeOpsPass.cpp index 162e7e9b5..e54047b09 100644 --- a/lib/PTO/Transforms/PTOLowerFrontendPipeOpsPass.cpp +++ b/lib/PTO/Transforms/PTOLowerFrontendPipeOpsPass.cpp @@ -65,6 +65,15 @@ static void propagateFrontendIdAttr(InitOpT initOp, Operation *pipeOp, rewriter.getI32IntegerAttr(initOp.getId())); } +template +static int32_t getFrontendSlotNum(InitOpT initOp) { + if (auto slotNumAttr = initOp.getSlotNumAttr()) + return slotNumAttr.getInt(); + return initOp.getDirMask() == kBidirectionalDirMask + ? 
kBidirectionalSlotNum + : kSingleDirectionSlotNum; +} + static std::optional getStaticIndexLikeValue(Value value) { if (auto cst = value.getDefiningOp()) return cst.value(); @@ -166,9 +175,10 @@ static FailureOr lowerSingleDirectionFrontendInit(InitOpT initOp, IRRewriter &rewriter, PTOArch arch, Type pipeTy, int8_t dirMask, Value localAddr) { + int32_t slotNum = getFrontendSlotNum(initOp); auto pipeOr = - createFrontendPipe(initOp, rewriter, arch, pipeTy, dirMask, - kSingleDirectionSlotNum, localAddr); + createFrontendPipe(initOp, rewriter, arch, pipeTy, dirMask, slotNum, + localAddr); if (failed(pipeOr)) return failure(); @@ -190,9 +200,9 @@ template static FailureOr lowerBidirectionalFrontendInit(InitOpT initOp, IRRewriter &rewriter, PTOArch arch, Type pipeTy) { + int32_t slotNum = getFrontendSlotNum(initOp); auto pipeOr = createFrontendPipe(initOp, rewriter, arch, pipeTy, - kBidirectionalDirMask, - kBidirectionalSlotNum, + kBidirectionalDirMask, slotNum, initOp.getC2vConsumerBuf(), initOp.getV2cConsumerBuf()); if (failed(pipeOr)) diff --git a/test/lit/pto/tpush_tpop_frontend_local_slot_num_invalid.pto b/test/lit/pto/tpush_tpop_frontend_local_slot_num_invalid.pto index 67084f2ac..6f0f76cc6 100644 --- a/test/lit/pto/tpush_tpop_frontend_local_slot_num_invalid.pto +++ b/test/lit/pto/tpush_tpop_frontend_local_slot_num_invalid.pto @@ -12,4 +12,4 @@ module { } } -// CHECK: error: 'pto.aic_initialize_pipe' op expects 'local_slot_num' to be less than or equal to 4 for dir_mask = 3 +// CHECK: error: 'pto.aic_initialize_pipe' op expects 'local_slot_num' to be less than or equal to slot_num (4) for dir_mask = 3 diff --git a/test/lit/pto/tpush_tpop_frontend_slot_num_a3.pto b/test/lit/pto/tpush_tpop_frontend_slot_num_a3.pto new file mode 100644 index 000000000..4195e7544 --- /dev/null +++ b/test/lit/pto/tpush_tpop_frontend_slot_num_a3.pto @@ -0,0 +1,49 @@ +// RUN: ptoas --pto-arch=a3 %s 2>&1 | FileCheck %s --check-prefix=A3 + +module { + func.func 
@cube_kernel(%gm_slot_buffer: !pto.ptr) + attributes {pto.kernel_kind = #pto.kernel_kind} { + %c0_i32 = arith.constant 0 : i32 + %v2c_local = pto.reserve_buffer { + name = "v2c_fifo", + size = 2048, + location = #pto.address_space, + auto = true + } -> i32 + pto.aic_initialize_pipe {id = 0, dir_mask = 2, slot_size = 1024, slot_num = 2} + (gm_slot_buffer = %gm_slot_buffer : !pto.ptr, + c2v_consumer_buf = %c0_i32 : i32, + v2c_consumer_buf = %v2c_local : i32) + + %recv_tile = pto.tpop_from_aiv {id = 0, split = 0} + -> !pto.tile_buf + pto.tfree_from_aiv {id = 0, split = 0} + return + } + + func.func @vector_kernel(%gm_slot_buffer: !pto.ptr) + attributes {pto.kernel_kind = #pto.kernel_kind} { + %c0_i32 = arith.constant 0 : i32 + %v2c_import = pto.import_reserved_buffer { + name = "v2c_fifo", + peer_func = @cube_kernel + } -> i32 + pto.aiv_initialize_pipe {id = 0, dir_mask = 2, slot_size = 1024, slot_num = 2} + (gm_slot_buffer = %gm_slot_buffer : !pto.ptr, + c2v_consumer_buf = %c0_i32 : i32, + v2c_consumer_buf = %v2c_import : i32) + + %vec_tile = pto.alloc_tile : !pto.tile_buf + pto.tpush_to_aic(%vec_tile : !pto.tile_buf) {id = 0, split = 0} + return + } +} + +// A3-LABEL: AICORE void cube_kernel(__gm__ float* +// A3: auto {{v[0-9]+}} = TPipe<0, Direction::DIR_V2C, 1024, 2, 2, true>( +// A3: TPOP +// A3: TFREE, TileSplitAxis::TILE_NO_SPLIT>( + +// A3-LABEL: AICORE void vector_kernel(__gm__ float* +// A3: auto {{v[0-9]+}} = TPipe<0, Direction::DIR_V2C, 1024, 2, 2, true>( +// A3: TPUSH diff --git a/test/lit/pto/tpush_tpop_frontend_slot_num_invalid.pto b/test/lit/pto/tpush_tpop_frontend_slot_num_invalid.pto new file mode 100644 index 000000000..7648f291b --- /dev/null +++ b/test/lit/pto/tpush_tpop_frontend_slot_num_invalid.pto @@ -0,0 +1,15 @@ +// RUN: not ptoas --pto-arch=a3 %s 2>&1 | FileCheck %s + +module { + func.func @cube_kernel(%gm_slot_buffer: !pto.ptr) + attributes {pto.kernel_kind = #pto.kernel_kind} { + %c0_i32 = arith.constant 0 : i32 + pto.aic_initialize_pipe 
{id = 0, dir_mask = 1, slot_size = 1024, slot_num = 0} + (gm_slot_buffer = %gm_slot_buffer : !pto.ptr, + c2v_consumer_buf = %c0_i32 : i32, + v2c_consumer_buf = %c0_i32 : i32) + return + } +} + +// CHECK: error: 'pto.aic_initialize_pipe' op expects 'slot_num' to be greater than 0 diff --git a/test/lit/pto/tpush_tpop_frontend_slot_num_local_invalid.pto b/test/lit/pto/tpush_tpop_frontend_slot_num_local_invalid.pto new file mode 100644 index 000000000..3f7a3da25 --- /dev/null +++ b/test/lit/pto/tpush_tpop_frontend_slot_num_local_invalid.pto @@ -0,0 +1,15 @@ +// RUN: not ptoas --pto-arch=a3 %s 2>&1 | FileCheck %s + +module { + func.func @cube_kernel(%gm_slot_buffer: !pto.ptr) + attributes {pto.kernel_kind = #pto.kernel_kind} { + %c0_i32 = arith.constant 0 : i32 + pto.aic_initialize_pipe {id = 0, dir_mask = 2, slot_size = 1024, slot_num = 2, local_slot_num = 3} + (gm_slot_buffer = %gm_slot_buffer : !pto.ptr, + c2v_consumer_buf = %c0_i32 : i32, + v2c_consumer_buf = %c0_i32 : i32) + return + } +} + +// CHECK: error: 'pto.aic_initialize_pipe' op expects 'local_slot_num' to be less than or equal to slot_num (2) for dir_mask = 2 diff --git a/test/lit/pto/tpush_tpop_internal_slot_num_a3.pto b/test/lit/pto/tpush_tpop_internal_slot_num_a3.pto new file mode 100644 index 000000000..2d314ad3f --- /dev/null +++ b/test/lit/pto/tpush_tpop_internal_slot_num_a3.pto @@ -0,0 +1,20 @@ +// RUN: ptoas --pto-arch=a3 %s 2>&1 | FileCheck %s --check-prefix=A3 + +module { + func.func @cube_kernel(%gm_slot_buffer: memref<256xf32, #pto.address_space>, + %c2v_consumer_buf: i32) + attributes {pto.kernel_kind = #pto.kernel_kind} { + %pipe = pto.initialize_l2g2l_pipe { + dir_mask = 1, + slot_size = 1024, + slot_num = 2, + local_slot_num = 1, + flag_base = 0 + }(%gm_slot_buffer : memref<256xf32, #pto.address_space>, + %c2v_consumer_buf : i32) -> !pto.pipe + return + } +} + +// A3-LABEL: AICORE void cube_kernel( +// A3: auto {{v[0-9]+}} = TPipe<0, Direction::DIR_C2V, 1024, 2, 1, false>( From 
046f0db97d0cbd653d578d791edb8883ab477b68 Mon Sep 17 00:00:00 2001 From: zhangstevenunity <128771452+zhangstevenunity@users.noreply.github.com> Date: Thu, 28 May 2026 11:03:24 +0800 Subject: [PATCH 2/5] Reject frontend local_slot_num on A5 --- docs/PTO_IR_manual.md | 3 +++ docs/designs/ptoas-tpush-tpop-design.md | 3 ++- lib/PTO/IR/PTO.cpp | 6 +++++- ...ush_tpop_frontend_local_slot_num_a5_invalid.pto | 14 ++++++++++++++ 4 files changed, 24 insertions(+), 2 deletions(-) create mode 100644 test/lit/pto/tpush_tpop_frontend_local_slot_num_a5_invalid.pto diff --git a/docs/PTO_IR_manual.md b/docs/PTO_IR_manual.md index 27031046c..1175ec639 100644 --- a/docs/PTO_IR_manual.md +++ b/docs/PTO_IR_manual.md @@ -8561,6 +8561,9 @@ pto.aic_initialize_pipe {id = 0, dir_mask = 1, slot_size = 1024, nosplit = true} - If `slot_num` is present, it must be greater than `0` - If `local_slot_num` is present, it must be greater than `0` and no greater than the effective `slot_num` +- On A5, `local_slot_num` must be omitted; A5 frontend pipes lower to + `pto.initialize_l2l_pipe`, which does not use a local FIFO slot-count + template parameter - A global-only GM FIFO initialize carries only `gm_slot_tensor`; it must not carry `gm_slot_buffer`, `local_slot_num`, `c2v_consumer_buf`, or `v2c_consumer_buf`; it may carry `slot_num` diff --git a/docs/designs/ptoas-tpush-tpop-design.md b/docs/designs/ptoas-tpush-tpop-design.md index 776a5bc88..d6ec19509 100644 --- a/docs/designs/ptoas-tpush-tpop-design.md +++ b/docs/designs/ptoas-tpush-tpop-design.md @@ -666,6 +666,7 @@ pto.tfree(%entry, %pipe : !pto.tensor_view<128x512xf32>, !pto.pipe) {split = 0} #### A5 - `pto.aic_initialize_pipe` 和 `pto.aiv_initialize_pipe` lower 为 `pto.initialize_l2l_pipe` +- A5 不支持 `local_slot_num`;前端 init 若显式携带该属性,verifier 会报错 ### 6.2 `DIR_MASK=1/2` @@ -1013,7 +1014,7 @@ pass 在模块级按两步执行: `slot_num`;缺省时 `DIR_MASK=1/2` 使用 `8`,`DIR_MASK=3` 使用 `4` - `local_slot_num` 若出现,可出现在 `pto.initialize_l2g2l_pipe` 或 legacy 前端 
`pto.aic_initialize_pipe` / `pto.aiv_initialize_pipe` 上,且必须大于 `0` - 且不大于其有效 `slot_num`;global-only GM FIFO 不携带 `local_slot_num` + 且不大于其有效 `slot_num`;A5 和 global-only GM FIFO 不携带 `local_slot_num` - `flag_base` 若出现,必须满足基本合法性;是否已填写以及具体分配值由 flag 分配保证 - `pto.initialize_l2g2l_pipe` 必须提供 `gm_addr` 或 `gm_slot_tensor`;只有存在 consumer 侧 local FIFO buffer 时才提供 `local_addr` / `peer_local_addr` - `pto.initialize_l2l_pipe` 必须提供 `local_addr` diff --git a/lib/PTO/IR/PTO.cpp b/lib/PTO/IR/PTO.cpp index df2ba8fb5..37d0dd8e3 100644 --- a/lib/PTO/IR/PTO.cpp +++ b/lib/PTO/IR/PTO.cpp @@ -11763,6 +11763,7 @@ static LogicalResult verifyFrontendInitCommon(InitOpT op, if (slotNum <= 0) return op.emitOpError("expects 'slot_num' to be greater than 0"); } + PTOArch arch = getTargetArch(op.getOperation()); bool hasGlobalSlotTensor = static_cast<bool>(op.getGmSlotTensor()); bool hasC2vConsumerBuf = static_cast<bool>(op.getC2vConsumerBuf()); @@ -11776,7 +11777,7 @@ static LogicalResult verifyFrontendInitCommon(InitOpT op, if (op.getLocalSlotNumAttr()) return op.emitOpError( "globaltensor pipe init does not use 'local_slot_num'"); - if (getTargetArch(op.getOperation()) == PTOArch::A5) { + if (arch == PTOArch::A5) { return op.emitOpError( "globaltensor pipe entries are supported for a2/a3 l2g2l pipes"); } @@ -11795,6 +11796,9 @@ static LogicalResult verifyFrontendInitCommon(InitOpT op, } if (auto localSlotNumAttr = op.getLocalSlotNumAttr()) { + if (arch == PTOArch::A5) + return op.emitOpError( + "'local_slot_num' is only supported for a2/a3 frontend pipe lowering"); int32_t localSlotNum = localSlotNumAttr.getInt(); if (localSlotNum <= 0) return op.emitOpError("expects 'local_slot_num' to be greater than 0"); diff --git a/test/lit/pto/tpush_tpop_frontend_local_slot_num_a5_invalid.pto b/test/lit/pto/tpush_tpop_frontend_local_slot_num_a5_invalid.pto new file mode 100644 index 000000000..06b9d25a3 --- /dev/null +++ b/test/lit/pto/tpush_tpop_frontend_local_slot_num_a5_invalid.pto @@ -0,0 +1,14 @@ +// RUN: not ptoas 
--pto-arch=a5 %s 2>&1 | FileCheck %s + +module { + func.func @cube_kernel() + attributes {pto.kernel_kind = #pto.kernel_kind} { + %c0_i32 = arith.constant 0 : i32 + pto.aic_initialize_pipe {id = 0, dir_mask = 1, slot_size = 1024, local_slot_num = 1} + (c2v_consumer_buf = %c0_i32 : i32, + v2c_consumer_buf = %c0_i32 : i32) + return + } +} + +// CHECK: error: 'pto.aic_initialize_pipe' op 'local_slot_num' is only supported for a2/a3 frontend pipe lowering From 5a8f51272cfe7555d7721e79ac9bfea9c51d756a Mon Sep 17 00:00:00 2001 From: zhangstevenunity <128771452+zhangstevenunity@users.noreply.github.com> Date: Thu, 28 May 2026 11:20:45 +0800 Subject: [PATCH 3/5] Handle top-level backward sync in GSS --- lib/PTO/Transforms/GraphSyncSolver/SyncSolver.cpp | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/lib/PTO/Transforms/GraphSyncSolver/SyncSolver.cpp b/lib/PTO/Transforms/GraphSyncSolver/SyncSolver.cpp index 23a4032a6..e4c9ff8e3 100644 --- a/lib/PTO/Transforms/GraphSyncSolver/SyncSolver.cpp +++ b/lib/PTO/Transforms/GraphSyncSolver/SyncSolver.cpp @@ -128,7 +128,8 @@ bool Solver::checkSkipParallelLoop(Occurrence *occ1, Occurrence *occ2) { auto [parOcc1, parOcc2] = Occurrence::getLCAPair(occ1, occ2); assert(parOcc1 != nullptr && parOcc2 != nullptr); auto *parentLCALoopOcc = Occurrence::getParentloop(parOcc1); - assert(parentLCALoopOcc != nullptr); + if (parentLCALoopOcc == nullptr) + return false; auto *parentLCALoopOp = llvm::cast(parentLCALoopOcc->op); return parentLCALoopOp->isParallel; } From 996ff697f6051f9f3b423b2a73efd72c3e1196fc Mon Sep 17 00:00:00 2001 From: zhangstevenunity <128771452+zhangstevenunity@users.noreply.github.com> Date: Thu, 28 May 2026 12:52:39 +0800 Subject: [PATCH 4/5] Clarify reserve_buffer sizing for pipe slots --- docs/PTO_IR_manual.md | 16 ++++++++++++++-- docs/designs/ptoas-tpush-tpop-design.md | 17 ++++++++++++++--- 2 files changed, 28 insertions(+), 5 deletions(-) diff --git a/docs/PTO_IR_manual.md b/docs/PTO_IR_manual.md 
index 1175ec639..a30ea815a 100644 --- a/docs/PTO_IR_manual.md +++ b/docs/PTO_IR_manual.md @@ -8355,6 +8355,12 @@ frontend/framework generated IR. The detailed design document is: On A2/A3 it overrides the default consumer-side local FIFO slot count only when the pipe uses a local consumer FIFO buffer. Global-only GM FIFO pipes omit it. +- `pto.reserve_buffer.size` is the byte size of the consumer-side local FIFO + buffer. For A2/A3 local FIFO pipes, it should be + `slot_size * effective_local_slot_num`, where `effective_local_slot_num` is + the explicit `local_slot_num` when present or the effective `slot_num` + otherwise. For A5 local FIFO pipes, `local_slot_num` is not configurable and + the reserved byte size should be `slot_size * 4`. - `nosplit` is an optional compile-time boolean attribute on `pto.aic_initialize_pipe` / `pto.aiv_initialize_pipe`. - `split` is a compile-time attribute, not a runtime SSA operand. @@ -8446,7 +8452,9 @@ When the address is already fixed in the input IR: **Arguments:** - `name`: string attribute identifying the logical reserved buffer -- `size`: reserved buffer size in bytes +- `size`: reserved buffer size in bytes. For A2/A3 local FIFO pipes this is + `slot_size * effective_local_slot_num`; for A5 local FIFO pipes this is + `slot_size * 4`. Global-only GM FIFO pipes do not use `pto.reserve_buffer`. 
- `location`: local address-space attribute, typically `vec` or `mat` - `auto`: boolean allocation-mode flag in textual IR - `base`: optional explicit local base address @@ -8457,6 +8465,9 @@ When the address is already fixed in the input IR: - Multiple `pto.reserve_buffer` ops are allowed in one function, but `name` must be unique within that function +- `size` must be greater than `0`; PTOAS allocates exactly the requested byte + size, so it should match the local FIFO sizing rule of the pipe that consumes + this buffer - `location` must be a supported local address space - Op-level verification requires: - `auto = false` must provide `base` @@ -8563,7 +8574,8 @@ pto.aic_initialize_pipe {id = 0, dir_mask = 1, slot_size = 1024, nosplit = true} than the effective `slot_num` - On A5, `local_slot_num` must be omitted; A5 frontend pipes lower to `pto.initialize_l2l_pipe`, which does not use a local FIFO slot-count - template parameter + template parameter. Its consumer-side `pto.reserve_buffer.size` should be + `slot_size * 4` - A global-only GM FIFO initialize carries only `gm_slot_tensor`; it must not carry `gm_slot_buffer`, `local_slot_num`, `c2v_consumer_buf`, or `v2c_consumer_buf`; it may carry `slot_num` diff --git a/docs/designs/ptoas-tpush-tpop-design.md b/docs/designs/ptoas-tpush-tpop-design.md index d6ec19509..6e8fe2d36 100644 --- a/docs/designs/ptoas-tpush-tpop-design.md +++ b/docs/designs/ptoas-tpush-tpop-design.md @@ -380,7 +380,12 @@ func.func @vector_kernel(%gm_slot_buffer : !pto.ptr, - 单函数允许多条 `import_reserved_buffer` - `DIR_MASK` 只允许 `1`、`2`、`3` - `SLOT_SIZE > 0` -- 使用 consumer 侧 local FIFO buffer 时,`reserve_buffer.size == SLOT_SIZE * SLOT_NUM` +- 使用 consumer 侧 local FIFO buffer 时,`reserve_buffer.size` 表示该 + consumer FIFO 实际预留的本地字节数。A2/A3 GM FIFO 路径要求 + `reserve_buffer.size == SLOT_SIZE * EFFECTIVE_LOCAL_SLOT_NUM`,其中 + `EFFECTIVE_LOCAL_SLOT_NUM` 为显式 `local_slot_num`,缺省时为有效 + `slot_num`。A5 L2L 路径不支持 `local_slot_num`,要求 + `reserve_buffer.size == 
SLOT_SIZE * 4`
 - When a consumer-side local FIFO buffer is used, the C2V consumer's `reserve_buffer.location` must be `VEC`
 - When a consumer-side local FIFO buffer is used, the V2C consumer's `reserve_buffer.location` must be `MAT`
 - `reserve_buffer.name` must be unique within the function
@@ -515,6 +520,8 @@ DIR_BOTH example:
   provided by `pto.aic_initialize_pipe` / `pto.aiv_initialize_pipe` and forwarded during A2/A3 lowering
   - Denotes the slot count of the consumer-side local slot buffer on the GM path; meaningful only on tile-entry paths that have a local FIFO buffer
   - Affects the underlying `TPipe` template parameter only when data is transferred through GM; does not change the GM FIFO `slot_num`
+  - On A2/A3, the consumer-side `reserve_buffer.size` should be reserved as
+    `slot_size * effective_local_slot_num`
   - When a local FIFO buffer exists and the attribute is omitted, the default equals the internal pipe's `slot_num`
   - Therefore, when the frontend does not specify `slot_num` explicitly:
     - for direct lowering with `DIR_MASK=1/2`, `local_slot_num = 8`
@@ -667,6 +674,8 @@ pto.tfree(%entry, %pipe : !pto.tensor_view<128x512xf32>, !pto.pipe) {split = 0}
 
 - `pto.aic_initialize_pipe` and `pto.aiv_initialize_pipe` lower to `pto.initialize_l2l_pipe`
 - A5 does not support `local_slot_num`; if the frontend init carries the attribute explicitly, the verifier reports an error
+- On A5, the consumer-side `reserve_buffer.size` is not determined by `local_slot_num`; the current
+  L2L pipe convention reserves the local FIFO buffer as `slot_size * 4`
 
 ### 6.2 `DIR_MASK=1/2`
 
@@ -978,13 +987,15 @@ the pass runs in two steps at module level:
 
 ### 9.1 Frontend verifier
 
-The frontend verifier checks:
+Frontend IR must satisfy the following constraints:
 
 - Whether the number of init ops per function is valid
 - Whether the number of `reserve_buffer` / `import_reserved_buffer` ops per function is valid
 - Whether the `DIR_MASK` value is valid
 - `SLOT_SIZE > 0`
-- When a consumer-side local FIFO buffer is used, `reserve_buffer.size == SLOT_SIZE * SLOT_NUM`
+- When a consumer-side local FIFO buffer is used, `reserve_buffer.size` must match the
+  local FIFO byte size of the corresponding pipe: on the A2/A3 GM FIFO path this is
+  `SLOT_SIZE * EFFECTIVE_LOCAL_SLOT_NUM`, on the A5 L2L path `SLOT_SIZE * 4`
 - When a consumer-side local FIFO buffer is used, `reserve_buffer.location` matches the consumer function type
 - `reserve_buffer.name` is unique within the function
 - The `(name, peer_func)` pair of `import_reserved_buffer` is unique within the function

From e90932529bc3f55ee1f6c299d5febf92a115d765 Mon Sep 17 00:00:00 2001
From: zhangstevenunity <128771452+zhangstevenunity@users.noreply.github.com>
Date: Thu, 28 May 2026 13:09:03 +0800
Subject: [PATCH 5/5] Correct A5 reserve_buffer slot sizing docs

---
 docs/PTO_IR_manual.md                   | 13 +++++++------
 docs/designs/ptoas-tpush-tpop-design.md | 12 ++++++++----
 2 files changed, 15 insertions(+), 10 deletions(-)

diff --git a/docs/PTO_IR_manual.md b/docs/PTO_IR_manual.md
index a30ea815a..1050009dc 100644
--- a/docs/PTO_IR_manual.md
+++ b/docs/PTO_IR_manual.md
@@ -8347,9 +8347,9 @@ frontend/framework generated IR. The detailed design document is:
 - `slot_size` is expressed in bytes and uses the pre-split logical pipe-entry
   size.
 - `slot_num` is an optional compile-time integer attribute on
-  `pto.aic_initialize_pipe` / `pto.aiv_initialize_pipe`. It controls the GM
-  ring FIFO depth and defaults to `8` for `dir_mask = 1/2` or `4` for
-  `dir_mask = 3`.
+  `pto.aic_initialize_pipe` / `pto.aiv_initialize_pipe`. It controls the pipe
+  FIFO depth. The `effective_slot_num` is the explicit value when present, or
+  the default value: `8` for `dir_mask = 1/2` or `4` for `dir_mask = 3`.
 - `local_slot_num` is an optional compile-time integer attribute on
   `pto.aic_initialize_pipe` / `pto.aiv_initialize_pipe`. On A2/A3 it
   overrides the default consumer-side local FIFO slot count only
@@ -8360,7 +8360,7 @@ frontend/framework generated IR. The detailed design document is:
   `slot_size * effective_local_slot_num`, where `effective_local_slot_num` is
   the explicit `local_slot_num` when present or the effective `slot_num`
   otherwise. For A5 local FIFO pipes, `local_slot_num` is not configurable and
-  the reserved byte size should be `slot_size * 4`.
+  the reserved byte size should be `slot_size * effective_slot_num`.
 - `nosplit` is an optional compile-time boolean attribute on
   `pto.aic_initialize_pipe` / `pto.aiv_initialize_pipe`.
 - `split` is a compile-time attribute, not a runtime SSA operand.
@@ -8454,7 +8454,8 @@ When the address is already fixed in the input IR:
 - `name`: string attribute identifying the logical reserved buffer
 - `size`: reserved buffer size in bytes. For A2/A3 local FIFO pipes this is
   `slot_size * effective_local_slot_num`; for A5 local FIFO pipes this is
-  `slot_size * 4`. Global-only GM FIFO pipes do not use `pto.reserve_buffer`.
+  `slot_size * effective_slot_num`. Global-only GM FIFO pipes do not use
+  `pto.reserve_buffer`.
 - `location`: local address-space attribute, typically `vec` or `mat`
 - `auto`: boolean allocation-mode flag in textual IR
 - `base`: optional explicit local base address
@@ -8575,7 +8576,7 @@ pto.aic_initialize_pipe {id = 0, dir_mask = 1, slot_size = 1024, nosplit = true}
 - On A5, `local_slot_num` must be omitted; A5 frontend pipes lower to
   `pto.initialize_l2l_pipe`, which does not use a local FIFO slot-count
   template parameter. Its consumer-side `pto.reserve_buffer.size` should be
-  `slot_size * 4`
+  `slot_size * effective_slot_num`
 - A global-only GM FIFO initialize carries only `gm_slot_tensor`; it must not
   carry `gm_slot_buffer`, `local_slot_num`, `c2v_consumer_buf`, or
   `v2c_consumer_buf`; it may carry `slot_num`
diff --git a/docs/designs/ptoas-tpush-tpop-design.md b/docs/designs/ptoas-tpush-tpop-design.md
index 6e8fe2d36..13526419c 100644
--- a/docs/designs/ptoas-tpush-tpop-design.md
+++ b/docs/designs/ptoas-tpush-tpop-design.md
@@ -385,7 +385,9 @@ func.func @vector_kernel(%gm_slot_buffer : !pto.ptr,
   `reserve_buffer.size == SLOT_SIZE * EFFECTIVE_LOCAL_SLOT_NUM`, where
   `EFFECTIVE_LOCAL_SLOT_NUM` is the explicit `local_slot_num`, or the effective
   `slot_num` when omitted. The A5 L2L path does not support `local_slot_num` and requires
-  `reserve_buffer.size == SLOT_SIZE * 4`
+  `reserve_buffer.size == SLOT_SIZE * EFFECTIVE_SLOT_NUM`. Here
+  `EFFECTIVE_SLOT_NUM` is the explicit `slot_num`; when omitted it is `8` for `DIR_MASK=1/2` and
+  `4` for `DIR_MASK=3`
 - When a consumer-side local FIFO buffer is used, the C2V consumer's `reserve_buffer.location` must be `VEC`
 - When a consumer-side local FIFO buffer is used, the V2C consumer's `reserve_buffer.location` must be `MAT`
 - `reserve_buffer.name` must be unique within the function
@@ -674,8 +676,9 @@ pto.tfree(%entry, %pipe : !pto.tensor_view<128x512xf32>, !pto.pipe) {split = 0}
 
 - `pto.aic_initialize_pipe` and `pto.aiv_initialize_pipe` lower to `pto.initialize_l2l_pipe`
 - A5 does not support `local_slot_num`; if the frontend init carries the attribute explicitly, the verifier reports an error
-- On A5, the consumer-side `reserve_buffer.size` is not determined by `local_slot_num`; the current
-  L2L pipe convention reserves the local FIFO buffer as `slot_size * 4`
+- On A5, the consumer-side `reserve_buffer.size` is not determined by `local_slot_num`; the A5
+  L2L pipe wraps local FIFO addresses modulo `slot_num` and reserves the local FIFO buffer as
+  `slot_size * effective_slot_num`
 
 ### 6.2 `DIR_MASK=1/2`
 
@@ -995,7 +998,8 @@ the pass runs in two steps at module level:
 - `SLOT_SIZE > 0`
 - When a consumer-side local FIFO buffer is used, `reserve_buffer.size` must match the
   local FIFO byte size of the corresponding pipe: on the A2/A3 GM FIFO path this is
-  `SLOT_SIZE * EFFECTIVE_LOCAL_SLOT_NUM`, on the A5 L2L path `SLOT_SIZE * 4`
+  `SLOT_SIZE * EFFECTIVE_LOCAL_SLOT_NUM`, on the A5 L2L path
+  `SLOT_SIZE * EFFECTIVE_SLOT_NUM`
 - When a consumer-side local FIFO buffer is used, `reserve_buffer.location` matches the consumer function type
 - `reserve_buffer.name` is unique within the function
 - The `(name, peer_func)` pair of `import_reserved_buffer` is unique within the function
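
Note on the sizing rule documented by this series: the arithmetic can be sketched in a few lines of Python. This is an illustration only, not code from the PTO repository. The function names `default_slot_num` and `reserve_buffer_size` and the architecture labels `"a2"`, `"a3"`, `"a5"` are hypothetical; only the defaults (`8` for `dir_mask = 1/2`, `4` for `dir_mask = 3`) and the two size formulas come from the documentation text in the patches.

```python
from typing import Optional


def default_slot_num(dir_mask: int) -> int:
    """Documented default FIFO depth: 8 for dir_mask 1/2, 4 for dir_mask 3."""
    if dir_mask in (1, 2):
        return 8
    if dir_mask == 3:
        return 4
    raise ValueError(f"unsupported dir_mask: {dir_mask}")


def reserve_buffer_size(
    arch: str,  # hypothetical label: "a2", "a3", or "a5"
    dir_mask: int,
    slot_size: int,
    slot_num: Optional[int] = None,
    local_slot_num: Optional[int] = None,
) -> int:
    """Bytes a consumer-side reserve_buffer should hold for a local FIFO pipe."""
    if slot_size <= 0:
        raise ValueError("slot_size must be positive")
    # effective_slot_num: explicit value when present, else the dir_mask default.
    effective_slot_num = slot_num if slot_num is not None else default_slot_num(dir_mask)
    if arch == "a5":
        # A5 (L2L path): local_slot_num must be omitted; size follows slot_num.
        if local_slot_num is not None:
            raise ValueError("local_slot_num is not supported on A5")
        return slot_size * effective_slot_num
    if arch in ("a2", "a3"):
        # A2/A3 (GM FIFO path): explicit local_slot_num wins, else effective slot_num.
        effective_local = local_slot_num if local_slot_num is not None else effective_slot_num
        return slot_size * effective_local
    raise ValueError(f"unknown arch: {arch}")
```

For the A2/A3 example in the manual (`slot_size = 1024, slot_num = 2, local_slot_num = 1`), this sketch gives 1024 bytes. On A5 with the same `slot_size` and `slot_num = 2`, it gives 2048 bytes.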