[AIEX] enable VextrBcstShfl with copy-propagate G_IMPLICIT_DEF#802
F-Stuckmann merged 3 commits into aie-public
6b26173 to c324e22
    const MachineInstr *SrcDef = MRI.getVRegDef(SrcReg);
    if (!SrcDef || SrcDef->getOpcode() != TargetOpcode::G_IMPLICIT_DEF)
      return false;
    return MRI.getType(DstReg) == MRI.getType(SrcReg);
nit: G_IMPLICIT_DEF doesn't have type-specific semantics; we can create it in the required type, so we don't need this condition.
    const RegisterBank *DstRB = MRI.getRegBankOrNull(DstReg);
    const RegisterBank *SrcRB = MRI.getRegBankOrNull(SrcReg);
    if (DstRB == SrcRB) {
      MRI.replaceRegWith(DstReg, SrcReg);
nit: I don't think there's merit in reusing IMPLICIT_DEF. I'd just always create a new one. It will blindly fulfill single-use constraints etc.
martien-de-jong left a comment
Please adjust the title to something like copy-propagate G_IMPLICIT_DEF.
I would simplify it to always create a pristine G_IMPLICIT_DEF with the proper type and regbank. Fewer conditions, less required testing, no unnecessary multi-use nodes.
6f2f7bf to 236ebed
@martien-de-jong note that the simplification breaks a CSE in the fixup commits. Before, we had a common implicit def and CSE could delete common code; now, with a fresh implicit def, we have to reproduce exactly the same behaviour twice.
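The effect described in this comment can be illustrated with a toy model. The sketch below is not LLVM code: `Instr`, `runToyCSE`, and the opcode strings are hypothetical names, and it assumes a CSE that never merges two implicit defs (two undef definitions are not treated as the same value). Under that assumption, two users that share one implicit def have identical operand lists and can be merged, while users that each get a fresh implicit def cannot.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy instruction: an opcode name, the register it defines,
// and the registers it reads. This is not an LLVM data structure.
struct Instr {
  std::string Op;
  int Dst;
  std::vector<int> Uses;
};

// Single forward pass of a toy CSE. Two instructions are merged when they
// have the same opcode and the same operand registers. Assumption for this
// sketch: IMPLICIT_DEF is never merged. Returns the number of removed
// instructions.
int runToyCSE(std::vector<Instr> &F) {
  std::map<std::pair<std::string, std::vector<int>>, int> Seen;
  std::map<int, int> Replacement; // removed register -> surviving register
  std::vector<Instr> Kept;
  int Removed = 0;
  for (Instr I : F) {
    // Rewrite operands that refer to already-removed instructions.
    for (int &U : I.Uses) {
      auto It = Replacement.find(U);
      if (It != Replacement.end())
        U = It->second;
    }
    if (I.Op != "IMPLICIT_DEF") {
      auto Key = std::make_pair(I.Op, I.Uses);
      auto Res = Seen.emplace(Key, I.Dst);
      if (!Res.second) {
        // An identical instruction already exists: reuse its result.
        Replacement[I.Dst] = Res.first->second;
        ++Removed;
        continue;
      }
    }
    Kept.push_back(I);
  }
  F = Kept;
  return Removed;
}
```

With one shared implicit def (register 1) feeding two identical `SHUFFLE` users, the pass removes one of them. With a fresh implicit def per user (registers 1 and 5), the operand lists differ and nothing is removed.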
236ebed to 45e0971
Removed the simplification because touching CSE is outside of the scope of this PR.
Add test cases to inst-select-vextbcstshfl.mir that exercise the VEXTBCSTSHFL pattern when the second vshuffle operand is a COPY of G_IMPLICIT_DEF through fiforegbank rather than a direct G_IMPLICIT_DEF. The cross-bank copy prevents ISel from recognizing the operand as undef, resulting in separate VEXTRACT + VBCST + VSHUFFLE instructions instead of a single VEXTBCSTSHFL.

New test variants:
- 64/32/16-bit extract+broadcast+shuffle with cross-bank COPY of undef
- Multi-use: broadcast feeding two shuffles with different modes
After register bank selection, cross-bank copies of G_IMPLICIT_DEF prevent ISel patterns from recognizing undef operands. For example, the VEXTBCSTSHFL pattern requires its second vshuffle operand to be undef, but a COPY from fiforegbank to vregbank makes it invisible to the pattern matcher.

Add a new Pre-ISel combiner pass that runs between RegBankSelect and InstructionSelect. It replaces COPY of G_IMPLICIT_DEF with a new G_IMPLICIT_DEF in the destination register bank. For same-bank copies the destination register is replaced directly with the source. For cross-bank copies a new G_IMPLICIT_DEF is created with the correct type and bank.

This enables 16 vextbcstshfl.64 instructions in Conv2D bf16 kernels that were previously emitted as separate vbcst.64 + vshuffle pairs.
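The two cases in this commit message can be sketched with a small standalone model. This is an illustration under stated assumptions, not the pass added by the PR: `Instr`, `Function`, `findDef`, and `propagateImplicitDefCopies` are hypothetical names, register banks are plain strings, and types are left out entirely.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical toy IR, not LLVM's MachineInstr.
enum class Opcode { ImplicitDef, Copy, Shuffle };

struct Instr {
  Opcode Op;
  int Dst;               // register defined by this instruction
  std::vector<int> Uses; // registers read by this instruction
  std::string Bank;      // register bank assigned to Dst
};

using Function = std::vector<Instr>;

// Returns the instruction defining Reg, or nullptr if there is none.
const Instr *findDef(const Function &F, int Reg) {
  for (const Instr &I : F)
    if (I.Dst == Reg)
      return &I;
  return nullptr;
}

// Rewrites every COPY whose source is defined by an implicit def.
// Same bank: users of the copy read the source directly and the copy is
// erased. Different bank: the copy becomes a fresh implicit def in the
// destination bank. Returns the number of copies rewritten.
int propagateImplicitDefCopies(Function &F) {
  int Rewritten = 0;
  std::vector<int> DeadCopies;
  for (Instr &I : F) {
    if (I.Op != Opcode::Copy || I.Uses.size() != 1)
      continue;
    const int Src = I.Uses[0];
    const Instr *SrcDef = findDef(F, Src);
    if (!SrcDef || SrcDef->Op != Opcode::ImplicitDef)
      continue;
    if (SrcDef->Bank == I.Bank) {
      // Same bank: forward the source register to all users of the copy.
      for (Instr &User : F)
        std::replace(User.Uses.begin(), User.Uses.end(), I.Dst, Src);
      DeadCopies.push_back(I.Dst);
    } else {
      // Cross bank: turn the copy itself into an implicit def.
      I.Op = Opcode::ImplicitDef;
      I.Uses.clear();
    }
    ++Rewritten;
  }
  // Erase the same-bank copies that no longer have users.
  F.erase(std::remove_if(F.begin(), F.end(),
                         [&](const Instr &I) {
                           return I.Op == Opcode::Copy &&
                                  std::find(DeadCopies.begin(),
                                            DeadCopies.end(),
                                            I.Dst) != DeadCopies.end();
                         }),
          F.end());
  return Rewritten;
}
```

In this model, a shuffle whose second operand was a cross-bank copy ends up reading an implicit def in its own bank, which is the shape the commit message says the VEXTBCSTSHFL pattern needs.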
Add pipeline test entries and RUN lines for aie2ps target.
45e0971 to 69f8255
Instruction selection patterns cannot look through a COPY of G_IMPLICIT_DEF when they require an undef operand.
This can happen when the implicit def is assigned to a register bank (i.e. FIFOreg) other than the one the VextrBcstShfl instruction matcher expects.
This PR replaces these copies with a G_IMPLICIT_DEF in the destination register bank.
Note: the big gains for this PR come from removing copies when source and destination have the same regbank.