cc8fca8 to 33264be
```diff
@@ -169,7 +169,7 @@ namespace llvm {
   /// Tracks the last instruction(s) in this region defining each virtual
   /// register. There may be multiple current definitions for a register with
   /// disjunct lanemasks.
-  VReg2SUnitMultiMap CurrentVRegDefs;
+  VReg2SUnitOperIdxMultiMap CurrentVRegDefs;
```
This was asymmetric between Uses and Defs. We need the operand index of the outstanding defs to compute operand latencies.
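The following standalone sketch shows why the operand index matters. It does not use the LLVM data structures: `DefEntry`, `VRegDefs`, `LatencyTable` and `defLatencies` are hypothetical. Each outstanding def records its SUnit, the index of the defining operand and the lanes it writes. A use then looks up the per-operand latency of every def whose lanes it reads.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical stand-in for one outstanding def: the SUnit that defines the
// register, the index of the defining operand, and the lanes it writes.
struct DefEntry {
  unsigned SUnitId;
  unsigned OperIdx;
  uint32_t LaneMask;
};

// Keyed by virtual register; several entries per key when the current defs
// cover disjoint lanes.
using VRegDefs = std::multimap<unsigned, DefEntry>;

// Hypothetical per-operand latency table, indexed as Latency[SUnitId][OperIdx].
using LatencyTable = std::vector<std::vector<int>>;

// Return the operand latency of every outstanding def of VReg whose lanes
// intersect UseLanes. Storing OperIdx is what makes this per-operand lookup
// possible; with only the SUnit, an instruction-level latency would have to do.
std::vector<int> defLatencies(const VRegDefs &Defs, const LatencyTable &Latency,
                              unsigned VReg, uint32_t UseLanes) {
  std::vector<int> Result;
  auto Range = Defs.equal_range(VReg);
  for (auto It = Range.first; It != Range.second; ++It) {
    const DefEntry &D = It->second;
    if (D.LaneMask & UseLanes)
      Result.push_back(Latency[D.SUnitId][D.OperIdx]);
  }
  return Result;
}
```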
```cpp
// Use TRI's regsOverlap which handles both physical and virtual registers,
// including subregisters and lane masks
return TRI->regsOverlap(SrcReg, DstReg);
```
I guess this was only needed transiently, but it looks really good.
Nice, they work on RegUnits.
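Below is a minimal model of the RegUnit-based check. Each physical register is described by the set of register units it occupies, and two registers overlap exactly when those sets intersect. The register names and unit numbering are hypothetical. The sketch covers physical registers only; it does not model the virtual-register or lane-mask handling mentioned in the code comment.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Set of register units occupied by one physical register.
using RegUnits = std::set<unsigned>;

// Hypothetical register file: a 64-bit pair covers the units of both halves.
const std::map<std::string, RegUnits> &unitTable() {
  static const std::map<std::string, RegUnits> Table = {
      {"r0", {0}}, {"r1", {1}}, {"r2", {2}},
      {"l0", {0, 1}}, // pair of r0 and r1
  };
  return Table;
}

// Two physical registers overlap iff they share at least one register unit.
bool regsOverlap(const std::string &A, const std::string &B) {
  const RegUnits &UA = unitTable().at(A);
  const RegUnits &UB = unitTable().at(B);
  for (unsigned U : UA)
    if (UB.count(U))
      return true;
  return false;
}
```

Working on units means subregister and super-register aliasing fall out of a plain set intersection, with no per-pair alias tables.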
```diff
-    PostSWP->isPostPipelineCandidate(*TheBlock))
-  staticallyMaterializeMultiSlotInstructions(*TheBlock, HR);
+    PostSWP->isPostPipelineCandidate(*TheBlock)) {
+  staticallyMaterializeMultiSlotInstructions(*TheBlock, HR, MaterializeAll);
```
Would have been nice to be able to skip the scheduler before postpipelining. Sadly, the scheduler sometimes makes better decisions.
```cpp
for (int T = 0; T < II; ++T) {
  LaneBitmask Mask = LanesByOffset[T];
  if (Mask.any()) {
    // Show a simple indicator - could be enhanced to show actual lanes
```
Indeed. Full lanemasks are bulky though.
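A compact middle ground between the current indicator and full lane masks could be a lane count per modulo offset. This is only a sketch. It uses a plain `uint32_t` mask instead of LLVM's `LaneBitmask`, and `renderLaneOccupancy` is a hypothetical helper.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Render one character per modulo offset: '.' when no lane is live, otherwise
// the number of live lanes, with '+' for more than nine lanes.
std::string renderLaneOccupancy(const std::vector<uint32_t> &LanesByOffset) {
  std::string Out;
  for (uint32_t Mask : LanesByOffset) {
    std::size_t N = std::bitset<32>(Mask).count();
    if (N == 0)
      Out += '.';
    else if (N <= 9)
      Out += static_cast<char>('0' + N);
    else
      Out += '+';
  }
  return Out;
}
```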
```cpp
static cl::opt<bool> TestRegDefUseTracker(
    "aie-test-regdefuse-tracker", cl::Hidden, cl::init(false),
    cl::desc("[AIE] TEST MODE: Run RegDefUseTracker analysis on all loops "
             "(for testing only)"));
```
This is accommodating a dump for the early stages of live range analysis.
```cpp
void BlockState::restorePipelining() {
  // Restore to the original allocation of the virtual registers
  RegTracker->restoreOriginalPhysRegs();
```
These registers were used by the scheduler whose result we're going to use as a fallback.
7930abc to dcc908c
```cpp
BS.FixPoint.PipelinerMode = firstPipelinerMode();
if (BS.FixPoint.PipelinerMode != PostPipelinerMode::None) {
  return SchedulingStage::Pipelining;
}
```
This looks a bit weird: we have been pipelining and are trying to restore to the first allowed pipeliner mode for the next II. That mode should be invariant, so I don't think we can get None here. Perhaps assert instead.
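Here is a minimal sketch of the suggested assert, in which the conditional return becomes an assertion of the invariant. `FixPointState`, `SchedulingStage::Scheduling` and `restartPipelining` are hypothetical stand-ins for the real code.

```cpp
#include <cassert>

enum class PostPipelinerMode { None, Physical, Virtual };
// Only Pipelining appears in the excerpt; Scheduling is hypothetical.
enum class SchedulingStage { Scheduling, Pipelining };

// Hypothetical reduced state holding only the field the excerpt touches.
struct FixPointState {
  PostPipelinerMode PipelinerMode = PostPipelinerMode::None;
};

// Restart pipelining for the next II. We only get here after pipelining was
// already attempted, so the first allowed mode cannot be None.
SchedulingStage restartPipelining(FixPointState &FixPoint,
                                  PostPipelinerMode FirstMode) {
  FixPoint.PipelinerMode = FirstMode;
  assert(FixPoint.PipelinerMode != PostPipelinerMode::None &&
         "pipelining was active, so some pipeliner mode must be allowed");
  return SchedulingStage::Pipelining;
}
```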
```cpp
// For virtual mode, re-analyze and virtualize
if (FixPoint.PipelinerMode == PostPipelinerMode::Virtual) {
  // RegTracker might not exist if we have multiple regions
```
Someone missed that we can't use physical mode either when there is more than one region.
I would hope that RegTracker is always there for a SWP candidate.
dcc908c to 133f034
```cpp
// - Merges aliasing register accesses into unified live ranges
// - Filters out unsafe ranges (tied operands, live-in/out, implicit uses)
// - Computes appropriate register classes for each live range
// - Optionally replaces physical registers with virtual registers for testing
```
Check: are we going to use them to pipeline, apart from testing?
Yes, cline comment quirk.
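The alias-merging step from the header comment can be sketched with a union-find over register accesses. Two accesses land in the same group when they share a register unit, directly or transitively. This is a hypothetical standalone illustration (`Units`, `UnionFind`, `mergeAliasingAccesses`). It only models aliasing and ignores the def/use ordering and the filtering of unsafe ranges that the tracker also performs.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <set>
#include <vector>

// Hypothetical access: the set of register units touched by one operand.
using Units = std::set<unsigned>;

// Minimal union-find over access indices.
struct UnionFind {
  std::vector<std::size_t> Parent;
  explicit UnionFind(std::size_t N) : Parent(N) {
    std::iota(Parent.begin(), Parent.end(), std::size_t{0});
  }
  std::size_t find(std::size_t X) {
    while (Parent[X] != X) {
      Parent[X] = Parent[Parent[X]]; // path halving
      X = Parent[X];
    }
    return X;
  }
  void unite(std::size_t A, std::size_t B) { Parent[find(A)] = find(B); }
};

// Group accesses that share a register unit, so that e.g. a pair register
// and its two halves end up in one group. Returns a group id per access.
std::vector<std::size_t> mergeAliasingAccesses(const std::vector<Units> &Accesses) {
  UnionFind UF(Accesses.size());
  for (std::size_t I = 0; I < Accesses.size(); ++I)
    for (std::size_t J = I + 1; J < Accesses.size(); ++J)
      for (unsigned U : Accesses[I])
        if (Accesses[J].count(U)) {
          UF.unite(I, J);
          break;
        }
  std::vector<std::size_t> Group(Accesses.size());
  for (std::size_t I = 0; I < Accesses.size(); ++I)
    Group[I] = UF.find(I);
  return Group;
}
```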
```cpp
using namespace llvm;
```
```cpp
RegLiveRangeTracker::RegLiveRangeTracker(MachineBasicBlock &MBB)
```
Can we pass MF directly here? It is a bit confusing that we pass MBB but don't use it "as is".
```cpp
void RegLiveRangeTracker::computeAliasClosure(MCRegister Reg,
                                              DenseSet<MCRegister> &Out) const {
  Out.insert(Reg);
```
Could we remove this by passing `/*IncludeSelf=*/true` in the first for loop?
Even better: it's removed in my current branch, since it is suboptimal.
```diff
@@ -82,6 +83,13 @@ class InterBlockEdges {
 // handling.
 enum class BlockType { Regular, Loop, Epilogue };

+// PostPipelinerMode determines whether the postpipeliner operates on physical
+// registers or virtualizes them for better scheduling opportunities.
+enum class PostPipelinerMode { None, Physical, Virtual };
```
None is just a default value. It can also be used as a sentinel to indicate end() when iterating through pipeliner modes.
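This sketch shows how None can serve as both the default value and the end() sentinel when stepping through modes. `ModeConfig`, `nextPipelinerMode` and `modesToTry` are hypothetical, and trying Physical before Virtual is an assumption.

```cpp
#include <cassert>
#include <vector>

enum class PostPipelinerMode { None, Physical, Virtual };

// Hypothetical configuration of which modes are enabled.
struct ModeConfig {
  bool AllowPhysical;
  bool AllowVirtual;
};

// Return the first enabled mode after `After`, or None when there is none.
// Passing None asks for the first mode, so None doubles as the default value
// and the end() sentinel. The Physical-before-Virtual order is assumed.
PostPipelinerMode nextPipelinerMode(PostPipelinerMode After,
                                    const ModeConfig &Cfg) {
  if (After == PostPipelinerMode::None && Cfg.AllowPhysical)
    return PostPipelinerMode::Physical;
  if (After != PostPipelinerMode::Virtual && Cfg.AllowVirtual)
    return PostPipelinerMode::Virtual;
  return PostPipelinerMode::None;
}

// Collect the modes that would be tried, stopping at the None sentinel.
std::vector<PostPipelinerMode> modesToTry(const ModeConfig &Cfg) {
  std::vector<PostPipelinerMode> Modes;
  for (PostPipelinerMode M = nextPipelinerMode(PostPipelinerMode::None, Cfg);
       M != PostPipelinerMode::None; M = nextPipelinerMode(M, Cfg))
    Modes.push_back(M);
  return Modes;
}
```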
699934a to 383d92b
383d92b to cf5d8e3
```cpp
// Get the register class constraint for this operand
const TargetRegisterClass *OpRC =
    MI->getRegClassConstraint(OpIdx, TII, TRI);
```
To resolve the bypass issue, we'd need to query if the MI uses itinerary-reg-pairs and get the reg class that was used to select the current itinerary. This would require a reverse look-up of those itinerary-reg-pairs, but that shouldn't be difficult to add to our tablegen backend. Correct?
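If the tablegen backend emitted the itinerary-reg-pairs as a flat table, the reverse lookup could be a simple index from (opcode, itinerary class) to register class. The table layout and all names below are hypothetical. When several register classes select the same itinerary, the sketch keeps only the first, and the real backend would need an explicit decision for that case.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical shape of one emitted itinerary-reg-pair: for an opcode, the
// itinerary class selected when an operand is in a given register class.
struct ItinRegPair {
  unsigned Opcode;
  unsigned RegClassID;
  unsigned ItinClass;
};

// Reverse index: (Opcode, ItinClass) -> register class that selected it.
class ItinRegReverseMap {
  std::map<std::pair<unsigned, unsigned>, unsigned> Index;

public:
  explicit ItinRegReverseMap(const std::vector<ItinRegPair> &Pairs) {
    // emplace keeps the first entry if two classes map to the same itinerary.
    for (const ItinRegPair &P : Pairs)
      Index.emplace(std::make_pair(P.Opcode, P.ItinClass), P.RegClassID);
  }

  std::optional<unsigned> regClassFor(unsigned Opcode, unsigned ItinClass) const {
    auto It = Index.find({Opcode, ItinClass});
    if (It == Index.end())
      return std::nullopt;
    return It->second;
  }
};
```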
cf5d8e3 to 2f5bea7
This abstracts the live ranges to be used by PostRegAlloc.
This module analyses live ranges of physical registers that can be safely reallocated in a basic block. It supplies facilities to rewrite to virtual registers and to restore the original allocation.
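The rewrite/restore facility can be pictured as a small undo log. This is a hypothetical model: `Block`, `OperandRef`, `RegRewriter` and the `VirtBit` encoding are illustrative and are not LLVM's `Register` API. Virtualizing a live range records each original physical register. `restoreOriginalPhysRegs`, named after the call in the PR, replays the log backwards.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical block model: each instruction is a list of register operands.
// Numbers with VirtBit set stand for virtual registers (illustration only).
constexpr unsigned VirtBit = 1u << 31;
using Block = std::vector<std::vector<unsigned>>;

// Position of one register operand in the block.
struct OperandRef {
  std::size_t Instr;
  std::size_t Op;
};

class RegRewriter {
  struct Saved {
    OperandRef Ref;
    unsigned PhysReg;
  };
  std::vector<Saved> History;

public:
  // Rewrite all operands of one live range to virtual register VRegIdx,
  // remembering each original physical register.
  void virtualizeRange(Block &B, const std::vector<OperandRef> &Range,
                       unsigned VRegIdx) {
    for (const OperandRef &R : Range) {
      History.push_back({R, B[R.Instr][R.Op]});
      B[R.Instr][R.Op] = VirtBit | VRegIdx;
    }
  }

  // Undo every rewrite, newest first, restoring the original allocation.
  void restoreOriginalPhysRegs(Block &B) {
    for (auto It = History.rbegin(); It != History.rend(); ++It)
      B[It->Ref.Instr][It->Ref.Op] = It->PhysReg;
    History.clear();
  }
};
```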
This module produces an EventSchedule from the instructions and their issue cycles. The event schedule contains the read and write events of the virtual registers in the instructions, ordered along the processor's pipeline-stage timeline. From the EventSchedule, the modulo live ranges for a particular II can be constructed. These represent the lanes of each register that are live at a particular point.
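The following simplified sketch folds one register's events into modulo slots for a given II. The `Event` layout and `moduloLiveLanes` are hypothetical, and cycles are assumed to be non-negative. A write's lanes are assumed to be live from the write cycle up to, but not including, the last read of those lanes before they are redefined. The real pipeline timing rules may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical event: the pipeline cycle (issue cycle plus stage offset) at
// which some lanes of one register are written or read.
struct Event {
  int Cycle;
  bool IsWrite;
  uint32_t Lanes;
};

// Fold the live intervals of ONE register into II modulo slots.
std::vector<uint32_t> moduloLiveLanes(std::vector<Event> Events, int II) {
  assert(II > 0 && "II must be positive");
  std::stable_sort(Events.begin(), Events.end(),
                   [](const Event &A, const Event &B) { return A.Cycle < B.Cycle; });
  std::vector<uint32_t> LanesByOffset(static_cast<std::size_t>(II), 0u);
  for (std::size_t I = 0; I < Events.size(); ++I) {
    const Event &W = Events[I];
    if (!W.IsWrite)
      continue;
    int LastRead = W.Cycle;
    for (std::size_t J = I + 1; J < Events.size(); ++J) {
      const Event &E = Events[J];
      if (!(E.Lanes & W.Lanes))
        continue; // touches other lanes only
      if (E.IsWrite)
        break; // lanes redefined: this range ends here
      LastRead = E.Cycle;
    }
    for (int C = W.Cycle; C < LastRead; ++C)
      LanesByOffset[static_cast<std::size_t>(C % II)] |= W.Lanes;
  }
  return LanesByOffset;
}
```

A range longer than II wraps around and marks the same offset more than once, which is what makes the modulo view useful for spotting register pressure.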
This is a strategy that prioritizes scheduling of scarce ranges. Scarce ranges are live ranges that compete for a single available register. The live ranges are virtualized, which means we have no serializing WAR deps. However, we need to be careful not to have more than one of them live at a time, which means we want to finish a range before starting a new one. We try all legal permutations of these live ranges. For the current live range, we first prioritize all its ancestors, then the instructions in the range itself. Once we are finished with the range, we simulate the WAR dependences that are necessary to keep the next ranges non-overlapping.
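The permutation search can be sketched in isolation by reducing each scarce range to an earliest start and a length. When the ranges are serialized on one register, each range waits for the previous one to end, which stands in for the simulated WAR dependence. `ScarceRange`, `serializedEnd` and `bestOrder` are hypothetical. Every permutation is treated as legal, and the ancestor prioritization is not modelled.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// Hypothetical summary of a scarce live range: the earliest cycle its def can
// issue (after its ancestors) and how many cycles it stays live.
struct ScarceRange {
  int EarliestStart;
  int Length;
};

// Serialize the ranges in the given order on one register: a range may only
// start once the previous one has ended (the simulated WAR dependence).
// Returns the cycle at which the last range ends.
int serializedEnd(const std::vector<ScarceRange> &Ranges,
                  const std::vector<int> &Order) {
  int RegFree = 0;
  for (int Idx : Order) {
    int Start = std::max(Ranges[Idx].EarliestStart, RegFree);
    RegFree = Start + Ranges[Idx].Length;
  }
  return RegFree;
}

// Try every permutation and keep the one that finishes earliest.
std::vector<int> bestOrder(const std::vector<ScarceRange> &Ranges) {
  std::vector<int> Order(Ranges.size());
  std::iota(Order.begin(), Order.end(), 0);
  std::vector<int> Best = Order;
  int BestEnd = serializedEnd(Ranges, Order);
  while (std::next_permutation(Order.begin(), Order.end())) {
    int End = serializedEnd(Ranges, Order);
    if (End < BestEnd) {
      BestEnd = End;
      Best = Order;
    }
  }
  return Best;
}
```

Exhaustive search is factorial in the number of ranges, so this is only practical because there are few scarce ranges per register.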
More aggressive check for live-outs. Slight logging changes (perhaps to be split into a separate commit).
2f5bea7 to b05c2bd
This is a POC of register allocation during postpipelining.
We add
Status:
It's aggressive enough to reach II=7 on gemm-bfp16-opt0, but sadly, the code it produces is not correct. I'm trying to find out what is causing my diff failure.