[AIEX] Improve prologue scheduling by using AIERegMemEventTracker #765

Open
andcarminati wants to merge 3 commits into aie-public from andreu.prologue.improve

Conversation

@andcarminati
Collaborator

The goal is to reduce pessimism through precise register event tracking. Locks and events are handled in the same way as in the epilogue.

; CHECK-LABEL: zol_elongated_preheader_plus_NumOfLoopBundles_ge112bytes:
; CHECK: .p2align 4
; CHECK-NEXT: // %bb.0: // %entry
; CHECK-NEXT: vldb x6, [p0], #0
Collaborator Author

This is relevant. The old approach detected a dependency through p0, but by using maxLatency we extended the distance to 7 (the load latency), which is quite pessimistic.

@andcarminati
Collaborator Author

The next step is to refactor the code a bit.

@andcarminati force-pushed the andreu.prologue.improve branch from eced1ea to d3620c9 on January 30, 2026 13:05
/*InSeparateRegion=*/false);

// Find the loop successor - must exist if we have bot-fixed bundles in a
// prologue
Collaborator

What is the difference between regular interblock scheduling and the pipelining interblock scheduling?

Collaborator Author

They are different in the sense that without pipelining:

  • We push a safety margin onto the epilogue block based on what we see in the loop.
  • We initialize the bot scoreboard of the prologue based on the loop state.

With a pipelined loop, we have the top and bot insert regions that are already scheduled. We carefully dump those regions to a place where the normal DAG construction will not see them (a sequence of bundles), because we need full control to avoid rearranging them or changing their timing properties. After this timing-region placement, we carefully create DAG nodes for the bundles (normal DAG construction is also not prepared for bundles, but that is not the reason), tying them together into a proper in-sequence schedule. Afterwards, we construct a timeline of events relative to the EntrySU and ExitSU, so we can create dependencies for the "free" instructions. Register dependencies are the easy part of this timeline construction; the problem arises when we also consider memory cycles and alias relations (most of the code). As a result, for each "free" instruction, we have a single edge to Entry/Exit modeling all memory and register latencies. For example, a memory instruction in the prologue can execute a register write inside the loop, and this must also be tracked; we do this here using events with negative cycles.

Also, this is the starting point for prologue/epilogue merging as a whole; the most complicated part is done by the scheduler: releasing each fixed node in sequence before any other node. This involves, for example, switching from top-down to bottom-up scheduling and delta cycles in both directions.
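A much-simplified sketch of the single-edge idea (the names and types below are illustrative, not the actual AIERegMemEventTracker API):

```cpp
#include <algorithm>
#include <string>
#include <vector>

// A register event in the already-scheduled region, at a cycle measured as
// a distance from ExitSU. Negative cycles model effects that only land
// inside the loop (e.g. a late register write of a prologue load).
struct RegEvent {
  std::string Reg; // register written by the fixed instruction
  int Cycle;       // distance from ExitSU; may be negative
  int Latency;     // operand latency of the producer
};

// Fold every event a "free" instruction depends on into one edge latency
// towards ExitSU: the instruction must issue at least Latency cycles before
// the event, and the event is already Cycle cycles before ExitSU.
int singleEdgeLatency(const std::vector<RegEvent> &Events,
                      const std::vector<std::string> &UsedRegs) {
  int Edge = 0;
  for (const RegEvent &E : Events)
    for (const std::string &R : UsedRegs)
      if (E.Reg == R)
        Edge = std::max(Edge, E.Latency - E.Cycle);
  return Edge;
}
```

For instance, a load of latency 7 whose register def sits 2 cycles before ExitSU gives a free user of that register a single edge of latency 5, rather than the full maxLatency of 7.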

const BlockState &LoopBS =
Scheduler->getInterBlock().getBlockState(LoopSucc);
ArrayRef<AIE::MachineBundle> LoopTimedBundles = LoopBS.getTop().Bundles;
BackwardRET.computeUseDefBackward(LoopTimedBundles,
Collaborator

How do we handle the wrap-around nature of the loops?

Collaborator Author

Once we execute the first iteration of the loop (all instructions will be there), this first iteration is responsible for the first events in the timeline that may influence the prologue. Once we iterate again, the events are no longer interesting, because we already saw them previously, closer to the prologue. Happily, pipelined loops execute at least once and contain all instructions.
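The wrap-around argument can be illustrated with a toy model (purely illustrative, not tracker code): an event at relative cycle c inside a loop body of length II repeats at distance II*k + c for iteration k, so the occurrence closest to the prologue always comes from the first iteration.

```cpp
#include <algorithm>
#include <climits>

// Distance from the end of the prologue to the closest occurrence of an
// event that fires at relative cycle EventCycle in every iteration of a
// loop with initiation interval II.
int closestEventDistance(int EventCycle, int II, int Iterations) {
  int Closest = INT_MAX;
  for (int K = 0; K < Iterations; ++K)
    Closest = std::min(Closest, II * K + EventCycle); // minimized at K == 0
  return Closest;
}
```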

}

void AIERegMemEventTracker::updateLastStoreCycle(int StoreCycle) {
LastStoreCycle = std::max(LastStoreCycle, StoreCycle);
Collaborator

What is the difference between this and the scoreboard we use in the Postpipeliner? Can't we reuse it here?

Collaborator Author

A scoreboard can track resources (functional units/slots etc.) in the pipeline without any notion of latency. An instruction may reserve/require resources in the scoreboard, but there is no notion of operand latency or data dependency there. Here, we are not interested in resources, only in latencies that can accommodate data dependencies, and also in memory relations between "free" instructions and already-scheduled instructions.
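A toy contrast between the two, under the simplifying assumption of named slots and registers (neither struct reflects the real classes):

```cpp
#include <map>
#include <set>
#include <string>
#include <utility>

// A scoreboard answers "is this slot free at this cycle?": pure resource
// occupancy, with no notion of operand latency.
struct ToyScoreboard {
  std::set<std::pair<int, std::string>> Occupied; // (cycle, slot)
  bool canIssue(int Cycle, const std::string &Slot) const {
    return !Occupied.count({Cycle, Slot});
  }
  void issue(int Cycle, const std::string &Slot) {
    Occupied.insert({Cycle, Slot});
  }
};

// An event tracker instead answers "when is this value available?":
// latencies and data dependencies, not resources.
struct ToyEventTracker {
  std::map<std::string, int> ReadyCycle; // reg -> def cycle + latency
  void recordDef(const std::string &Reg, int Cycle, int Latency) {
    ReadyCycle[Reg] = Cycle + Latency;
  }
  int earliestUse(const std::string &Reg) const {
    auto It = ReadyCycle.find(Reg);
    return It == ReadyCycle.end() ? 0 : It->second;
  }
};
```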

SDep Dep(FixedDepSU, SDep::Artificial);
Dep.setLatency(1);
FreeSU.addPred(Dep, /*Required=*/true);
// Handle bot-fixed instructions (prologue) with backward tracking
Collaborator

Looks really good. I think that we can make this function even more understandable by factoring out the intermediate steps into helper functions. Maybe it's even possible to share some code between the prologue and epilogue handling.

Collaborator Author

Done!

FreeSU.addPred(Dep, /*Required=*/true);
// Handle bot-fixed instructions (prologue) with backward tracking
// Only handle prologues (preheader blocks), not epilogues here
if (!BotFixedBundles.empty() && BS.Kind != BlockType::Epilogue) {
Collaborator

I think this check is redundant. I suggest moving the block type check into an assert (as we do in the prologue case) or removing it altogether.

Collaborator Author

New checks are available. We can have a 1-stage pipelined loop with BotFixedBundles.

Collaborator

Typo: iterate

int Cycle = 0;
const int TotalCycles = Bundles.size();

// Track top-fixed region size (only for the first call, not separate region)
Collaborator

Nit: This comment explains the what and not the why. Could you add an explanation of why this tracking is only needed in the first call? Also, why not track it in both cases and move the check to the place where the total cycles is actually used, i.e. if the size is only needed in a computation in certain scenarios, then I think it makes sense to keep that check closer to the computation rather than here.


// Count progressively from 0 as we iterate backward through bundles
// Cycle represents the distance from ExitSU for each bundle
int Cycle = 0;
Collaborator

Nit: AFAIK we usually call this the height. Please rename for consistency.

Collaborator Author

Definitely not. AIERegMemEventTracker is a register and memory event tracker, so we register events in their respective cycles.

int Cycle = 0;

for (const auto &Bundle : reverse(Bundles)) {
// Cycle is the distance from ExitSU for this bundle
Collaborator

Nit: Repeated comment.

if (getFirstMemoryAccessCycle() > INT_MIN) {
const int CoreResumeCycle = TII->getCoreResumeCycleAfterLock();
const int MIResumeCycle = CoreResumeCycle - 1;
const int MemDep = getFirstMemoryAccessCycle() - MIResumeCycle + 2;
Collaborator

Why +2?

Collaborator Author

I rewrote this part for clarity.

}
// Check stores in top-fixed (RAW for loads, WAR for stores)
const int StoreCycle =
AA ? getMaxAliasingMemCycle(MI, /*IsStore=*/true) : getLastStoreCycle();
Collaborator

Shouldn't this be getLastMemoryAccessCycle, since we are unifying store and load?

Collaborator Author

This seems correct to me, we care about stores here.

Collaborator

Shouldn't we then only consider stores in the if condition?

Collaborator Author

You are right! I extended it to cover the load case in a cleaner way.

@andcarminati
Collaborator Author

andcarminati commented Feb 4, 2026

QoR:
(QoR comparison images omitted)

This helps to improve the outermost loop of GEMM int8.

@andcarminati force-pushed the andreu.prologue.improve branch 4 times, most recently from 94f81d8 to 627374c on February 10, 2026 11:42
@andcarminati marked this pull request as ready for review on February 10, 2026 11:42
const std::map<MCRegister, int> &RegToCycle =
IsDef ? RegisterToCycleDef : RegisterToCycleUse;
int CurrMaxLatency = 0;
for (MCRegAliasIterator Ali(MO.getReg(), TRI, true); Ali.isValid();
Collaborator

we could also directly use RegUnits instead of going through all aliases.

Collaborator Author

In this way it is simple: we just keep one register in the timeline. RegUnits could also work, but we would need to create events for each RegUnit (add all regunits, then check all regunits afterwards), which I see as a complexity increase.
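A sketch of the alias-based lookup (with made-up register names and a hard-coded alias table standing in for MCRegAliasIterator): recording stays a single map entry per written register, and the alias expansion happens only on the query side.

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <vector>

// Hypothetical alias sets: a subregister and the register containing it
// alias each other, as MCRegAliasIterator would report.
static const std::map<std::string, std::vector<std::string>> Aliases = {
    {"r0_lo", {"r0_lo", "r0"}},
    {"r0", {"r0", "r0_lo", "r0_hi"}},
    {"r0_hi", {"r0_hi", "r0"}}};

// One event per register name: the cycle of the last def.
std::map<std::string, int> RegisterToCycleDef;

void recordDef(const std::string &Reg, int Cycle) {
  // Single entry keyed by the register actually written.
  int &C = RegisterToCycleDef[Reg];
  C = std::max(C, Cycle);
}

// The query walks all aliases of Reg, so a write to r0 is seen by a read
// of r0_lo. Returns -1 if no aliasing def was recorded.
int lastDefCycle(const std::string &Reg) {
  int Last = -1;
  for (const std::string &A : Aliases.at(Reg))
    if (auto It = RegisterToCycleDef.find(A); It != RegisterToCycleDef.end())
      Last = std::max(Last, It->second);
  return Last;
}
```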

if (RegCycle != RegToCycle.end()) {
const int RegCycleValue = RegCycle->second;
int ThisOperandLatency = IsBackward
? RegCycleValue + OperandCycle + 1
Collaborator

Could you add a comment explaining why we have +1 here?
Is this the latency after which the register is free to be used again?

Collaborator

nit: we could also call checkMemoryDependency

MO.isDef() ? SafeDistance(MO, /*IsDef*/ false) : 0;

// Only use positive distances
if (DistFromLastWrite > 0)
Collaborator

nit: std::max will filter out the case of DistFromLastWrite < 0

Collaborator Author

Good catch: I replaced it with MaxLatency = std::max({MaxLatency, DistFromLastWrite, DistFromLastRead});

return CurrMaxLatency;
};

const int DistFromLastWrite = SafeDistance(MO, /*IsDef*/ true);
Collaborator

nit: If we unify RegisterToCycleDef and RegisterToCycleUse, we only need to execute the lambda once, since the difference between this call and the one below is just a change in the data structure we use throughout the lambda.

Collaborator
@F-Stuckmann Feb 16, 2026

Can we also refactor this to call checkMemoryDependency?
Can we harmonize the cycles? Why don't we go to the next free cycle, like in checkMemoryDependency?

Collaborator
@F-Stuckmann Feb 16, 2026

If I understand the code correctly, the ints in RegisterToCycleDef and MemoryCycleToStoreInstrs are completely different, right? Could we harmonize this so that a cycle always means the same thing:
a) the cycle before the last time it is blocked (as used here and in MemoryCycleToStoreInstrs/MemoryCycleToLoadInstrs)
b) the cycle in which the instruction's Def/Use is last used (as in updateUseDefMaxCycle(MO.getReg(), EventCycle, IsDef);)

@@ -73,7 +219,7 @@
int MemAccessCycle = Cycle + *OptLastMemCycle - 1;
Collaborator

Up until this point the code is identical to the if (BundledMI->mayStore()) { case. We could harmonize these two conditions.

assert(OptMOCycle);
const int OperandCycle = *OptMOCycle;
if (InSeparateRegion) {
const int EventCycle = Cycle + OperandCycle;
Collaborator
@F-Stuckmann Feb 16, 2026

nit: we can hoist const int EventCycle = Cycle + OperandCycle; and use the variable in the conditional branches

auto OptFirstMemCycle = TII->getFirstMemoryCycle(SchedClass);
assert(OptFirstMemCycle && "Load instruction without MemoryCycles");
const int FirstMemCycle = *OptFirstMemCycle;
int FirstLoadFromEnd = Cycle - FirstMemCycle + 1;
Collaborator
@F-Stuckmann Feb 16, 2026

Why do I need a +1 here?
Shouldn't the current variable be called CycleAfterFirstLoadFromEnd?

Collaborator Author

Discussed offline.

}

// Track stores for AA-based dependencies
if (BundledMI->mayStore()) {
Collaborator

nit: unify with the if (BundledMI->mayLoad()) { block


unsigned
AIERegMemEventTracker::getSafeOperandsDistance(const MachineInstr &MI) const {
void AIERegMemEventTracker::computeUseDefBackward(
Collaborator

I think we could harmonize this method and computeUseDefForward, since they compute similar results (except for a backward forward update of cycles)

Collaborator Author

We have some differences that could lead to hard-to-maintain conditional code.

// For example, if we have LastStoreCycle = 21 (absolute) and this
// instruction FirstMemoryCycle = 5 (relative to the start cycle of this
// instruction), then this instruction should start earliest at cycle 18,
// to be able to start the FirstMemoryCycle (absolute) at cycle 22. In
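The arithmetic in this comment can be sketched as follows, assuming 1-based relative memory cycles and that the access must land strictly after the store (my reading of the example, not stated in the patch):

```cpp
// Earliest absolute start cycle for an instruction whose first memory
// access happens FirstMemoryCycle cycles into its execution (1-based,
// relative), given the last fixed-region store at cycle LastStoreCycle.
// The access lands at start + FirstMemoryCycle - 1, which must be strictly
// greater than LastStoreCycle; the two off-by-ones combine into the +2.
int earliestStartAfterStore(int LastStoreCycle, int FirstMemoryCycle) {
  return LastStoreCycle - FirstMemoryCycle + 2;
}
```

With the values from the comment (21 and 5), the earliest start is cycle 18, placing the first memory access at absolute cycle 22.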
Collaborator

nit: could you keep the comments that explain magic numbers?

}

// Free load/store: check stores in bot-fixed (RAW dependency)
const int LoadStoreCycle = AA ? getMaxAliasingMemCycle(MI, /*IsStore=*/true)
Collaborator
@F-Stuckmann Feb 16, 2026

nit: we could remove the above if (MI.mayStore()) { code if we use const bool IsStore = MI.mayStore() in these method calls

Collaborator Author

But this flag is related to the events, not to MI.

@andcarminati force-pushed the andreu.prologue.improve branch 3 times, most recently from 862fb17 to 4604156 on February 23, 2026 15:16
@andcarminati
Collaborator Author

andcarminati commented Feb 23, 2026

Hi @F-Stuckmann, thank you for the review and for catching some inconsistencies! I tried to improve the code with some refactoring and harmonization, but I kept the backward and forward mechanisms separate.

@andcarminati force-pushed the andreu.prologue.improve branch from 4604156 to 74adf86 on March 4, 2026 10:19