[AIEX][AIE2P] Several new combiners to simplify vector operations#772
[AIEX][AIE2P] Several new combiners to simplify vector operations#772andcarminati merged 16 commits intoaie-publicfrom
Conversation
3b74690 to
2fbd949
Compare
| const uint64_t Mask = MaskVal->getZExtValue(); | ||
| if (Mask != 0) { | ||
| const unsigned HighestBit = 63 - llvm::countl_zero(Mask); | ||
| MaxElement = std::max(MaxElement, HighestBit); |
There was a problem hiding this comment.
do we expect multiple vsels in the register users?
There was a problem hiding this comment.
We look to all the users, it is too restrictive to assume just one user.
2fbd949 to
0780d74
Compare
| const Register IdxReg = User.getOperand(2).getReg(); | ||
| const auto Idx = getIConstantVRegVal(IdxReg, MRI); | ||
| if (Idx) { | ||
| MaxElement = |
There was a problem hiding this comment.
why don't we return here? why should we have multiple G_EXTRACT_VECTOR_ELT s coupled with G_UNMERGE_VALUES and return the maximum? aren't we then confusing combine patterns?
There was a problem hiding this comment.
You can see the name of the function, this is just a helper function that we use to try prove that the upper part of a vector is not used, so we cannot just return max here because we may have other users accessing even higher elements. But, maybe you are talking about an early return in case if !Idx - this I agree, it will be part of the next version.
0780d74 to
c344bcb
Compare
| /// %zero_broadcast = G_AIE_BROADCAST_VECTOR %zero_scalar | ||
| /// %zero_lo, %zero_hi = G_UNMERGE_VALUES %zero_broadcast | ||
| /// %result = G_CONCAT_VECTORS %src, %zero_hi | ||
| bool llvm::matchVShiftChainToZeroPad(MachineInstr &MI, MachineRegisterInfo &MRI, |
There was a problem hiding this comment.
wouldn't it be more generic to first combine two vshifts into a single one and in the second stop combine shift to a unmerge?
| // First shift: adds padding (quarter of source size) | ||
| const unsigned ExpectedShift1 = SrcSizeInBytes / 4; | ||
| // Second shift: brings in zeros (1.5x source size) | ||
| const unsigned ExpectedShift2 = (SrcSizeInBytes + SrcSizeInBytes / 2) / 2; |
There was a problem hiding this comment.
nit: I think it is sufficient to check the total shift size, not some predetermined shift amounts.
There was a problem hiding this comment.
I am taking a look.
| ; SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception | ||
| ; | ||
| ; (c) Copyright 2025 Advanced Micro Devices, Inc. or its affiliates | ||
| ; (c) Copyright 2025-2025 Advanced Micro Devices, Inc. or its affiliates |
There was a problem hiding this comment.
nit: this line can remain unchanged for now
| # RUN: -verify-machineinstrs -o - | FileCheck %s | ||
|
|
||
| # Tests for VSHIFT chain combiner that optimizes VSHIFT sequences into | ||
| # CONCAT operations with usage-aware padding (UNDEF vs zeros). |
There was a problem hiding this comment.
nit: also add a unit-test with different shift constants
e4f21e8 to
0619d47
Compare
e9cecaa to
681e6e2
Compare
681e6e2 to
686e173
Compare
| @@ -419,6 +419,96 @@ bool isUseOf(const MachineInstr &MI, const MachineInstr &Def) { | |||
| return false; | |||
| } | |||
|
|
|||
| bool llvm::matchUnpadUnmerge(MachineInstr &UnpadMI, MachineRegisterInfo &MRI, | |||
There was a problem hiding this comment.
Can I have a short description of what we're trying to match?
| } | ||
|
|
||
| // 4. Check dominance (one must dominate the other) | ||
| const bool UnpadDominates = Helper.dominates(UnpadMI, *UnmergeMI); |
There was a problem hiding this comment.
I suggest to setup a dedicated transform lambda and return if it dominates. Otherwise, check the reverse dominance and setup the reverse transform lambda. We don't need to check things we don't need, and we don't need to capture unnecessary data in the lambda.
There was a problem hiding this comment.
I will keep as is this part.
| // 3 elements, then 8 % 3 = 2 (NOT aligned), and we can't | ||
| // cleanly extract 8 elements using 3-element chunks. | ||
| if (UnpadElements % ConcatOpElements != 0) | ||
| return false; |
There was a problem hiding this comment.
Do we have automatic guarantees about matching element types?
There was a problem hiding this comment.
I guess yes, but I prefer to be defensive here.
|
|
||
| if (OpMI->getOpcode() == TargetOpcode::G_CONCAT_VECTORS) { | ||
| // Nested CONCAT - flatten by extracting all its operands | ||
| for (unsigned J = 1; J < OpMI->getNumOperands(); ++J) { |
There was a problem hiding this comment.
nit: I think we could match recursively, but there's probably very little return on investment.
| const unsigned PaddingSubVecs = NumSubVecs - 1; | ||
|
|
||
| for (unsigned K = 0; K < PaddingSubVecs; ++K) { | ||
| FlattenedOps.push_back( |
There was a problem hiding this comment.
FlattenedOps.resize(size+PaddingSubVecs?)
There was a problem hiding this comment.
I prefer to keep it explicitly to self document what we are doing here.
| // Verify all valid operands in FlattenedOps have the same type | ||
| // and ensure we have at least one valid operand | ||
| LLT SubVecTy; | ||
| bool FoundValidOp = false; |
There was a problem hiding this comment.
optional<LLT> ?
There was a problem hiding this comment.
Nice suggestion.
| AllMatch = false; | ||
| break; | ||
| } | ||
| } |
There was a problem hiding this comment.
If you move the opcode check down, you could bundle this in a nice generic isEquivalentMI(I1, I2) helper.
martien-de-jong
left a comment
There was a problem hiding this comment.
I'm getting into real nitpicking territory. I see nothing really blocking, it's time it went in.
686e173 to
4872cd1
Compare
4872cd1 to
15ee34d
Compare
15ee34d to
4dad714
Compare

The combiner eliminates a VSHIFT chain and replaces it with a simpler G_CONCAT_VECTORS operation with zero-padded upper half. The transformation is generalized to work with any vector size by calculating expected shift amounts based on source and destination types (e.g., for 512 bit expansion: shift1=16 bytes, shift2=48 bytes). The combiner includes usage-aware optimization: it analyzes how the result vector is actually used (via a new getMaxUsedVectorElement utility) and chooses between padding with UNDEF or zeros.
This PR also includes two redundant pad/unpad removal combiners.