[AIEX][AIE2P] Several new combiners to simplify vector operations by andcarminati · Pull Request #772 · Xilinx/llvm-aie

andcarminati · 2026-02-02T16:47:25Z

The combiner eliminates a VSHIFT chain and replaces it with a simpler G_CONCAT_VECTORS operation whose upper half is zero-padded. The transformation is generalized to any vector size by computing the expected shift amounts from the source and destination types (e.g., for a 512-bit expansion: shift1 = 16 bytes, shift2 = 48 bytes). The combiner is also usage-aware: it analyzes how the result vector is actually used (via a new getMaxUsedVectorElement utility) and chooses between padding with UNDEF or with zeros.
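As a rough illustration of the shift arithmetic, the sketch below derives the two expected shift amounts from a size in bytes, using the same expressions as the diff hunk quoted further down in this review. The struct and function names are hypothetical, and it assumes the size is a multiple of 4 bytes. For 64 bytes (512 bits) it yields 16 and 48, the figures given in the description.

```cpp
#include <cassert>

// Hypothetical helper: expected byte shifts for the two-VSHIFT chain,
// computed with the same expressions as the diff quoted in this review.
struct ExpectedShifts {
  unsigned Shift1;
  unsigned Shift2;
};

inline ExpectedShifts computeExpectedShifts(unsigned SrcSizeInBytes) {
  // Assumption: the size is a multiple of 4 bytes, so the divisions are exact.
  assert(SrcSizeInBytes % 4 == 0 && "size must be a multiple of 4 bytes");
  const unsigned Shift1 = SrcSizeInBytes / 4;
  const unsigned Shift2 = (SrcSizeInBytes + SrcSizeInBytes / 2) / 2;
  return {Shift1, Shift2};
}
```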

This PR also includes two redundant pad/unpad removal combiners.

andcarminati · 2026-02-04T14:38:00Z

QoR:

F-Stuckmann · 2026-02-05T14:33:54Z

+        const uint64_t Mask = MaskVal->getZExtValue();
+        if (Mask != 0) {
+          const unsigned HighestBit = 63 - llvm::countl_zero(Mask);
+          MaxElement = std::max(MaxElement, HighestBit);


do we expect multiple vsels in the register users?

We look at all the users; assuming just one user would be too restrictive.
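To make the "all users" point concrete, here is a minimal standalone sketch of the accumulation in the hunk above. It models each user as one constant 64-bit select mask, which is a simplification of the real MachineInstr users, and the function names are hypothetical. The highest set bit is found with a plain loop rather than llvm::countl_zero so the snippet has no LLVM dependency.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Index of the highest set bit of a non-zero mask (bit 0 is element 0).
// Equivalent to 63 - countl_zero(Mask) for a 64-bit mask.
inline unsigned highestSetBit(std::uint64_t Mask) {
  unsigned Bit = 0;
  while (Mask >>= 1)
    ++Bit;
  return Bit;
}

// Hypothetical model: every user contributes one constant select mask.
// The result is the highest element index referenced by any of them.
inline unsigned maxElementOverMasks(const std::vector<std::uint64_t> &Masks) {
  unsigned MaxElement = 0;
  for (std::uint64_t Mask : Masks) {
    if (Mask == 0)
      continue; // Zero masks are skipped, as in the quoted diff.
    MaxElement = std::max(MaxElement, highestSetBit(Mask));
  }
  return MaxElement;
}
```

With several users, a single early return would report only the first mask's highest bit; taking the maximum over all masks gives a bound that holds for every user.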

F-Stuckmann · 2026-02-05T16:22:30Z

+      const Register IdxReg = User.getOperand(2).getReg();
+      const auto Idx = getIConstantVRegVal(IdxReg, MRI);
+      if (Idx) {
+        MaxElement =


Why don't we return here? Why should we have multiple G_EXTRACT_VECTOR_ELTs coupled with G_UNMERGE_VALUES and return the maximum? Aren't we then confusing combine patterns?

As the function name suggests, this is just a helper that we use to try to prove that the upper part of a vector is not used, so we cannot simply return the maximum here: other users may access even higher elements. But maybe you are talking about an early return in the !Idx case. With that I agree, and it will be part of the next version.
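The behaviour discussed in this reply can be sketched as follows: take the maximum over all users, but give up as soon as one user has a non-constant index. This is a simplified model, not the PR's code; UserIndex and maxUsedElement are hypothetical names, and each user is reduced to an optional constant index.

```cpp
#include <algorithm>
#include <optional>
#include <vector>

// Hypothetical model of one user: the element index it reads, or
// std::nullopt when that index is not a known constant.
using UserIndex = std::optional<unsigned>;

// Returns the highest element index read by any user, or std::nullopt
// if some user's index is unknown and no bound can be proven.
inline std::optional<unsigned>
maxUsedElement(const std::vector<UserIndex> &Users) {
  unsigned MaxElement = 0;
  for (const UserIndex &Idx : Users) {
    if (!Idx)
      return std::nullopt; // Early return: unknown index, cannot bound usage.
    MaxElement = std::max(MaxElement, *Idx);
  }
  return MaxElement;
}
```

Note that this sketch returns 0 for an empty user list; how the real helper treats that case is not shown in the excerpt.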

F-Stuckmann · 2026-02-12T14:00:22Z

+///   %zero_broadcast = G_AIE_BROADCAST_VECTOR %zero_scalar
+///   %zero_lo, %zero_hi = G_UNMERGE_VALUES %zero_broadcast
+///   %result = G_CONCAT_VECTORS %src, %zero_hi
+bool llvm::matchVShiftChainToZeroPad(MachineInstr &MI, MachineRegisterInfo &MRI,


Wouldn't it be more generic to first combine the two vshifts into a single one and, in a second step, combine that shift into an unmerge?

F-Stuckmann · 2026-02-12T14:04:47Z

+  // First shift: adds padding (quarter of source size)
+  const unsigned ExpectedShift1 = SrcSizeInBytes / 4;
+  // Second shift: brings in zeros (1.5x source size)
+  const unsigned ExpectedShift2 = (SrcSizeInBytes + SrcSizeInBytes / 2) / 2;


nit: I think it is sufficient to check the total shift size, not some predetermined shift amounts.

I am taking a look.
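One possible reading of the nit above is to compare only the sum of the two shift amounts against an expected total, instead of checking each shift against a predetermined constant. The sketch below assumes the expected total is the sum of the two expressions in the quoted hunk; the function name is hypothetical, and whether the sum alone is a sufficient condition for the real combiner is not established here.

```cpp
// Hypothetical check: accept any split of the two shifts as long as the
// total matches the sum of the two expected amounts from the quoted diff.
inline bool matchesTotalShift(unsigned Shift1, unsigned Shift2,
                              unsigned SrcSizeInBytes) {
  const unsigned ExpectedTotal =
      SrcSizeInBytes / 4 + (SrcSizeInBytes + SrcSizeInBytes / 2) / 2;
  return Shift1 + Shift2 == ExpectedTotal;
}
```

For a size of 64 bytes this accepts 16 + 48 as well as other splits such as 32 + 32, which is the extra generality the comment asks about.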

F-Stuckmann · 2026-02-12T15:34:41Z

 ; SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
 ;
-; (c) Copyright 2025 Advanced Micro Devices, Inc. or its affiliates
+; (c) Copyright 2025-2025 Advanced Micro Devices, Inc. or its affiliates


nit: this line can remain unchanged for now

F-Stuckmann · 2026-02-17T15:34:53Z

+# RUN:         -verify-machineinstrs -o - | FileCheck %s
+
+# Tests for VSHIFT chain combiner that optimizes VSHIFT sequences into
+# CONCAT operations with usage-aware padding (UNDEF vs zeros).


nit: also add a unit-test with different shift constants

martien-de-jong · 2026-03-10T09:44:52Z

@@ -419,6 +419,96 @@ bool isUseOf(const MachineInstr &MI, const MachineInstr &Def) {
  return false;
 }

+bool llvm::matchUnpadUnmerge(MachineInstr &UnpadMI, MachineRegisterInfo &MRI,


Can I have a short description of what we're trying to match?

martien-de-jong · 2026-03-10T09:55:18Z

+  }
+
+  // 4. Check dominance (one must dominate the other)
+  const bool UnpadDominates = Helper.dominates(UnpadMI, *UnmergeMI);


I suggest setting up a dedicated transform lambda and returning if it dominates. Otherwise, check the reverse dominance and set up the reverse transform lambda. We don't need to check things we don't need, and we don't need to capture unnecessary data in the lambda.

I will keep this part as is.
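For readers following the suggestion, the control flow it describes might look roughly like this. It is a standalone model, not the PR's code: Instr, dominates and selectTransform are hypothetical, dominance is reduced to ordering within one straight-line block, and the lambdas only return a label instead of rewriting instructions.

```cpp
#include <functional>
#include <optional>
#include <string>

// Hypothetical stand-in for an instruction: its position in one block.
struct Instr {
  unsigned Position;
};

// Simplified dominance for straight-line code: earlier dominates later.
inline bool dominates(const Instr &A, const Instr &B) {
  return A.Position < B.Position;
}

using Transform = std::function<std::string()>;

// Check one direction, return its dedicated lambda right away, and only
// then consider the reverse direction. Each lambda captures nothing extra.
inline std::optional<Transform> selectTransform(const Instr &Unpad,
                                                const Instr &Unmerge) {
  if (dominates(Unpad, Unmerge))
    return Transform([] { return std::string("unpad-first"); });
  if (dominates(Unmerge, Unpad))
    return Transform([] { return std::string("unmerge-first"); });
  return std::nullopt; // Neither dominates: no match.
}
```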

martien-de-jong · 2026-03-10T11:23:46Z

+  //                  3 elements, then 8 % 3 = 2 (NOT aligned), and we can't
+  //                  cleanly extract 8 elements using 3-element chunks.
+  if (UnpadElements % ConcatOpElements != 0)
+    return false;


Do we have automatic guarantees about matching element types?

I guess yes, but I prefer to be defensive here.
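A minimal sketch of the alignment check together with a defensive element-type comparison, under the assumption that a vector type can be modelled as an element kind plus an element count. VecType and canCoverWithWholeOperands are hypothetical names, not the types used in the PR.

```cpp
#include <string>

// Hypothetical, simplified vector type: element kind plus element count.
struct VecType {
  std::string ElementKind;
  unsigned NumElements;
};

// True when the unpadded vector can be built from whole concat operands.
inline bool canCoverWithWholeOperands(const VecType &UnpadTy,
                                      const VecType &ConcatOpTy) {
  if (ConcatOpTy.NumElements == 0)
    return false; // Avoid a modulo by zero.
  if (UnpadTy.ElementKind != ConcatOpTy.ElementKind)
    return false; // Defensive: element types must agree.
  // E.g. 8 % 3 = 2, so 3-element chunks cannot cleanly yield 8 elements.
  return UnpadTy.NumElements % ConcatOpTy.NumElements == 0;
}
```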

martien-de-jong · 2026-03-24T09:39:41Z

+
+    if (OpMI->getOpcode() == TargetOpcode::G_CONCAT_VECTORS) {
+      // Nested CONCAT - flatten by extracting all its operands
+      for (unsigned J = 1; J < OpMI->getNumOperands(); ++J) {


nit: I think we could match recursively, but there's probably very little return on investment.
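The recursive variant mentioned in this nit could be sketched as below, on a toy tree rather than on MachineInstrs. Node and flattenConcat are hypothetical; a node with operands stands for a concat, a node without operands for a leaf value.

```cpp
#include <string>
#include <vector>

// Hypothetical operand tree: a node with operands models a concat,
// a node without operands models a leaf value identified by Name.
struct Node {
  std::string Name;
  std::vector<Node> Operands;
};

// Depth-first flattening: nested concats at any depth contribute their
// leaves in order, instead of handling only one level of nesting.
inline void flattenConcat(const Node &N, std::vector<std::string> &Out) {
  if (N.Operands.empty()) {
    Out.push_back(N.Name);
    return;
  }
  for (const Node &Op : N.Operands)
    flattenConcat(Op, Out);
}
```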

martien-de-jong · 2026-03-24T09:43:07Z

+      const unsigned PaddingSubVecs = NumSubVecs - 1;
+
+      for (unsigned K = 0; K < PaddingSubVecs; ++K) {
+        FlattenedOps.push_back(


FlattenedOps.resize(size+PaddingSubVecs?)

I prefer to keep it explicit, so that the code self-documents what we are doing here.
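For comparison, the two styles under discussion are shown side by side in a standalone form. The operand type and the padding value are placeholders, since the argument of the push_back call is not visible in the excerpt. Both helpers append the same N copies of the padding value.

```cpp
#include <vector>

// Hypothetical operand id; the real padding operand is not shown above.
using Operand = int;

// Explicit loop, as in the PR: one push_back per padding sub-vector.
inline void appendPaddingLoop(std::vector<Operand> &Ops, unsigned N,
                              Operand Pad) {
  for (unsigned K = 0; K < N; ++K)
    Ops.push_back(Pad);
}

// Compact alternative along the lines of the reviewer's suggestion:
// grow the vector once, filling new slots with the padding value.
inline void appendPaddingResize(std::vector<Operand> &Ops, unsigned N,
                                Operand Pad) {
  Ops.resize(Ops.size() + N, Pad);
}
```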

martien-de-jong · 2026-03-24T09:43:55Z

+  // Verify all valid operands in FlattenedOps have the same type
+  // and ensure we have at least one valid operand
+  LLT SubVecTy;
+  bool FoundValidOp = false;


optional<LLT> ?

Nice suggestion.
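A small sketch of how an optional can replace the type-plus-flag pair from the hunk above. Type is a hypothetical stand-in for LLT (a plain string here), and invalid operands are modelled as empty optionals. In this simplified form, "no valid operand" and "types differ" both come back as std::nullopt.

```cpp
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for LLT.
using Type = std::string;

// Returns the common type of all valid operands, or std::nullopt when
// there is no valid operand or two valid operands disagree.
inline std::optional<Type>
commonOperandType(const std::vector<std::optional<Type>> &Ops) {
  std::optional<Type> SubVecTy; // Empty until the first valid operand.
  for (const std::optional<Type> &OpTy : Ops) {
    if (!OpTy)
      continue; // Skip invalid operands.
    if (!SubVecTy)
      SubVecTy = *OpTy;
    else if (*SubVecTy != *OpTy)
      return std::nullopt; // Mismatching types.
  }
  return SubVecTy;
}
```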

martien-de-jong · 2026-03-24T10:01:10Z

+        AllMatch = false;
+        break;
+      }
+    }


If you move the opcode check down, you could bundle this in a nice generic isEquivalentMI(I1, I2) helper.
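The helper proposed here might take a shape like the following. This is only a guess at the idea: the excerpt does not show what the surrounding loop compares, so the instruction model (an opcode plus a list of operand ids) and the comparison rule are assumptions, and only the name isEquivalentMI comes from the comment.

```cpp
#include <vector>

// Hypothetical, simplified instruction: an opcode and its operand ids.
struct SimpleInstr {
  unsigned Opcode;
  std::vector<int> Operands;
};

// Bundles the opcode check and the operand-by-operand comparison into
// one predicate, so callers do not need their own AllMatch loop.
inline bool isEquivalentMI(const SimpleInstr &I1, const SimpleInstr &I2) {
  if (I1.Opcode != I2.Opcode)
    return false;
  return I1.Operands == I2.Operands; // Same count and same ids, in order.
}
```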

martien-de-jong

I'm getting into real nitpicking territory. I see nothing really blocking, it's time it went in.

andcarminati requested review from F-Stuckmann, SagarMaheshwari99, abhinay-anubola, abnikant, katerynamuts, khallouh, konstantinschwarz, martien-de-jong, mludevid, niwinanto and stephenneuendorffer as code owners February 2, 2026 16:47

andcarminati force-pushed the andreu.shift.combiner branch 4 times, most recently from 3b74690 to 2fbd949 Compare February 4, 2026 13:23