Conversation
  %feature3 = linalg.matmul(%feature2, %foldedWeight2)
  return %feature3
}
compiletime_fold(%weight1: tensor<*xbf16>) -> %foldedWeight0, %foldedWeight1: tensor<*xbf16>, tensor<*xbf16> {
compiletime_fold will be a problem because:
- The kernel binary will contain the whole tensor to be folded, which is too large. If we only want to use runtime_fold from it, the binary size is wasted, and this is not friendly for the kernel cache.
- For a GPU device, we may want to do the folding on the CPU. We shouldn't put three functions in the same module to achieve that.
If we want to support direct compile-time folding, I suggest following the direction of section 2.1 to implement it.
For the compile-time available tensors, the integration can choose to:
- lower them to arith.constant, which is not suggested,
- put them into the arguments list of the module and mark them as compiletime_const_args, or
- put them into the arguments list of the module and mark them as runtime_const_args.
The first two choices will be folded by compiletime_fold, and the third by runtime_fold. There will be no large literal tensors in the kernel for the last two choices.
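A minimal sketch of what the second and third choices might look like, reusing only the names from this discussion (compiletime_const_args, runtime_const_args, compute); the attribute placement, argument indices, and exact syntax are illustrative, not part of the RFC:

// Hypothetical module: weight0 is marked as a compile-time constant argument,
// weight1 as a runtime constant argument; only the features are truly dynamic.
module attributes {compiletime_const_args = [0], runtime_const_args = [1]} {
  compute(%weight0: tensor<*xbf16>, %weight1: tensor<*xbf16>, %feature0: tensor<*xbf16>) -> tensor<*xbf16> {
    %feature1 = linalg.matmul(%feature0, %weight0)
    %feature2 = linalg.matmul(%feature1, %weight1)
    return %feature2
  }
}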
> We shouldn't put three functions in the same module to achieve that.
I'm not clear if we can generate a new module in the pass pipeline. If so, shall we put the compiletime_fold in one module, and runtime_fold and compute in another module?
> I'm not clear if we can generate a new module in the pass pipeline. If so, shall we put the compiletime_fold in one module, and runtime_fold and compute in another module?
I think so. But my current thinking is that we can support compiletime_fold in the future, when there is a demand for it.
OK, I will put all folding operations into runtime_fold.
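Under that decision, the module would contain only runtime_fold and compute. A rough sketch in the same style as the snippet quoted above; all argument names, shapes, and the body are illustrative, not the actual RFC text:

// Hypothetical result: every constant weight goes through runtime_fold,
// and compute consumes the folded weights; no compiletime_fold is emitted.
runtime_fold(%weight0: tensor<*xbf16>, %weight1: tensor<*xbf16>) -> %foldedWeight0, %foldedWeight1: tensor<*xbf16>, tensor<*xbf16> {
  // ... folding computation over the constant weights ...
}
compute(%foldedWeight0: tensor<*xbf16>, %foldedWeight1: tensor<*xbf16>, %feature0: tensor<*xbf16>) -> tensor<*xbf16> {
  %feature1 = linalg.matmul(%feature0, %foldedWeight0)
  %feature2 = linalg.matmul(%feature1, %foldedWeight1)
  return %feature2
}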
Waiting on the OV integration to see how they handle const weights.
lmontigny left a comment:
RFC LGTM
Do we have an idea of the performance impact of this pass on some example?
Ideally, the execution time of operations on the weights of matmuls, such as ...
Related to issue #146.