-
Notifications
You must be signed in to change notification settings - Fork 15.4k
[GVN] Support rnflow pattern matching and transform #162259
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
You can test this locally with the following command:git-clang-format --diff origin/main HEAD --extensions cpp,h -- llvm/include/llvm/Transforms/Scalar/GVN.h llvm/lib/Transforms/Scalar/GVN.cpp --diff_from_common_commit
View the diff from clang-format here.diff --git a/llvm/include/llvm/Transforms/Scalar/GVN.h b/llvm/include/llvm/Transforms/Scalar/GVN.h
index c35f9a031..9c7585f82 100644
--- a/llvm/include/llvm/Transforms/Scalar/GVN.h
+++ b/llvm/include/llvm/Transforms/Scalar/GVN.h
@@ -19,10 +19,10 @@
#include "llvm/ADT/MapVector.h"
#include "llvm/ADT/SetVector.h"
#include "llvm/ADT/SmallVector.h"
+#include "llvm/Analysis/LoopInfo.h"
#include "llvm/IR/Dominators.h"
#include "llvm/IR/InstrTypes.h"
#include "llvm/IR/PassManager.h"
-#include "llvm/Analysis/LoopInfo.h"
#include "llvm/IR/ValueHandle.h"
#include "llvm/Support/Allocator.h"
#include "llvm/Support/Compiler.h"
|
|
Hi @fhahn and @nikic, The patch is functionally correct as the rnflow application passes. However, compared to trunk this patch regresses the app by ~10-12% as the new select instruction the pass is generating is more constly than hoisting the load. I played around |
In the test you've added the original IR seems to already have a select instruction - do you also see a select in rnflow prior to GVN? |
|
Sorry, please ignore me! I see there is an extra select being generated. Got it. :) |
|
Ping to reviewers! |
And this is also true with GCC (without vectorization), as IIRC it was doing this transform? I feel like I saw a comment somewhere that doing the GVN transform would improve performance slightly, but I have trouble finding it, maybe I imagined it... |
Ah, I think it was this one, on the test case PR: #141556 (comment) I guess that one was an estimated number and not a measurement. |
Yeah, that was an estimate, but unfortunately, in reality, it does not match (at least) on the Grace platform. |
|
@llvm/pr-subscribers-llvm-transforms Author: Madhur Amilkanthwar (madhur13490) ChangesThis is an alternative patch to #144987, adding support in GVN. Full diff: https://github.com/llvm/llvm-project/pull/162259.diff 3 Files Affected:
diff --git a/llvm/include/llvm/Transforms/Scalar/GVN.h b/llvm/include/llvm/Transforms/Scalar/GVN.h
index bc0f108ac8260..b886140348e79 100644
--- a/llvm/include/llvm/Transforms/Scalar/GVN.h
+++ b/llvm/include/llvm/Transforms/Scalar/GVN.h
@@ -22,6 +22,7 @@
#include "llvm/IR/Dominators.h"
#include "llvm/IR/InstrTypes.h"
#include "llvm/IR/PassManager.h"
+#include "llvm/Analysis/LoopInfo.h"
#include "llvm/IR/ValueHandle.h"
#include "llvm/Support/Allocator.h"
#include "llvm/Support/Compiler.h"
@@ -45,6 +46,7 @@ class FunctionPass;
class GetElementPtrInst;
class ImplicitControlFlowTracking;
class LoadInst;
+class SelectInst;
class LoopInfo;
class MemDepResult;
class MemoryAccess;
@@ -405,6 +407,8 @@ class GVNPass : public PassInfoMixin<GVNPass> {
void addDeadBlock(BasicBlock *BB);
void assignValNumForDeadCode();
void assignBlockRPONumber(Function &F);
+
+ bool optimizeMinMaxFindingSelectPattern(SelectInst *Select);
};
/// Create a legacy GVN pass.
diff --git a/llvm/lib/Transforms/Scalar/GVN.cpp b/llvm/lib/Transforms/Scalar/GVN.cpp
index 72e1131a54a86..f817d045862cd 100644
--- a/llvm/lib/Transforms/Scalar/GVN.cpp
+++ b/llvm/lib/Transforms/Scalar/GVN.cpp
@@ -2743,6 +2743,10 @@ bool GVNPass::processInstruction(Instruction *I) {
}
return Changed;
}
+ if (SelectInst *Select = dyn_cast<SelectInst>(I)) {
+ if (optimizeMinMaxFindingSelectPattern(Select))
+ return true;
+ }
// Instructions with void type don't return a value, so there's
// no point in trying to find redundancies in them.
@@ -3330,6 +3334,123 @@ void GVNPass::assignValNumForDeadCode() {
}
}
+bool GVNPass::optimizeMinMaxFindingSelectPattern(SelectInst *Select) {
+ LLVM_DEBUG(
+ dbgs()
+ << "GVN: Analyzing select instruction for minimum finding pattern\n");
+ LLVM_DEBUG(dbgs() << "GVN: Select: " << *Select << "\n");
+ Value *Condition = Select->getCondition();
+ CmpInst *Comparison = dyn_cast<CmpInst>(Condition);
+ if (!Comparison) {
+ LLVM_DEBUG(dbgs() << "GVN: Condition is not a comparison\n");
+ return false;
+ }
+
+ // Check if this is ULT comparison.
+ CmpInst::Predicate Pred = Comparison->getPredicate();
+ if (Pred != CmpInst::ICMP_SLT && Pred != CmpInst::ICMP_ULT &&
+ Pred != CmpInst::FCMP_OLT && Pred != CmpInst::FCMP_ULT) {
+ LLVM_DEBUG(dbgs() << "GVN: Not a less-than comparison, predicate: " << Pred
+ << "\n");
+ return false;
+ }
+
+ // Check that both operands are loads.
+ Value *LHS = Comparison->getOperand(0);
+ Value *RHS = Comparison->getOperand(1);
+ if (!isa<LoadInst>(LHS) || !isa<LoadInst>(RHS)) {
+ LLVM_DEBUG(dbgs() << "GVN: Not both operands are loads\n");
+ return false;
+ }
+
+ LLVM_DEBUG(dbgs() << "GVN: Found minimum finding pattern in Block: "
+ << Select->getParent()->getName() << "\n");
+
+ // Transform the pattern.
+ // Hoist the chain of operations for the second load to preheader.
+ // Get predecessor of the block containing the select instruction.
+ BasicBlock *BB = Select->getParent();
+
+ // Get preheader of the loop.
+ Loop *L = LI->getLoopFor(BB);
+ if (!L) {
+ LLVM_DEBUG(dbgs() << "GVN: Could not find loop\n");
+ return false;
+ }
+ BasicBlock *Preheader = L->getLoopPreheader();
+ if (!Preheader) {
+ LLVM_DEBUG(dbgs() << "GVN: Could not find loop preheader\n");
+ return false;
+ }
+
+ // Hoist the chain of operations for the second load to preheader.
+ // %90 = sext i32 %.05.i to i64
+ // %91 = getelementptr float, ptr %0, i64 %90 ; %0 + (sext i32 %85 to i64)*4
+ // %92 = getelementptr i8, ptr %91, i64 -4 ; %0 + (sext i32 %85 to i64)*4 - 4
+ // %93 = load float, ptr %92, align 4
+
+ Value *BasePtr = nullptr, *IndexVal = nullptr, *OffsetVal = nullptr;
+ IRBuilder<> Builder(Preheader->getTerminator());
+ if (match(RHS,
+ m_Load(m_GEP(m_GEP(m_Value(BasePtr), m_SExt(m_Value(IndexVal))),
+ m_Value(OffsetVal))))) {
+ LLVM_DEBUG(dbgs() << "GVN: Found pattern: " << *RHS << "\n");
+ LLVM_DEBUG(dbgs() << "GVN: Found pattern: " << "\n");
+
+ PHINode *Phi = dyn_cast<PHINode>(IndexVal);
+ if (!Phi) {
+ LLVM_DEBUG(dbgs() << "GVN: IndexVal is not a PHI node\n");
+ return false;
+ }
+ Value *InitialMinIndex = Phi->getIncomingValueForBlock(Preheader);
+
+ // Insert PHI node at the top of this block.
+ PHINode *KnownMinPhi =
+ PHINode::Create(Builder.getFloatTy(), 2, "known_min", BB->begin());
+
+ // Build the GEP chain in the preheader.
+ // 1. hoist_0 = sext i32 to i64
+ Value *HoistedSExt =
+ Builder.CreateSExt(InitialMinIndex, Builder.getInt64Ty(), "hoist_sext");
+
+ // 2. hoist_gep1 = getelementptr float, ptr BasePtr, i64 HoistedSExt
+ Value *HoistedGEP1 = Builder.CreateGEP(Builder.getFloatTy(), BasePtr,
+ HoistedSExt, "hoist_gep1");
+
+ // 3. hoist_gep2 = getelementptr i8, ptr HoistedGEP1, i64 OffsetVal
+ Value *HoistedGEP2 = Builder.CreateGEP(Builder.getInt8Ty(), HoistedGEP1,
+ OffsetVal, "hoist_gep2");
+
+ // 4. hoisted_load = load float, ptr HoistedGEP2
+ LoadInst *NewLoad =
+ Builder.CreateLoad(Builder.getFloatTy(), HoistedGEP2, "hoisted_load");
+
+ // Replace all uses of load with new load.
+ RHS->replaceAllUsesWith(NewLoad);
+ dyn_cast<LoadInst>(RHS)->eraseFromParent();
+
+ // Replace second operand of comparison with KnownMinPhi.
+ Comparison->setOperand(1, KnownMinPhi);
+
+ // Create new select instruction for selecting the minimum value.
+ IRBuilder<> SelectBuilder(BB->getTerminator());
+ SelectInst *CurrentMinSelect =
+ dyn_cast<SelectInst>(SelectBuilder.CreateSelect(
+ Comparison, LHS, KnownMinPhi, "current_min"));
+
+ // Populate PHI node.
+ KnownMinPhi->addIncoming(NewLoad, Preheader);
+ KnownMinPhi->addIncoming(CurrentMinSelect, BB);
+ LLVM_DEBUG(dbgs() << "Transformed the code\n");
+ return true;
+ } else {
+ LLVM_DEBUG(dbgs() << "GVN: Could not find pattern: " << *RHS << "\n");
+ return false;
+ }
+ return false;
+}
+
+
class llvm::gvn::GVNLegacyPass : public FunctionPass {
public:
static char ID; // Pass identification, replacement for typeid.
diff --git a/llvm/test/Transforms/GVN/PRE/rnflow-gvn-pre.ll b/llvm/test/Transforms/GVN/PRE/rnflow-gvn-pre.ll
new file mode 100644
index 0000000000000..6f17d4ab30240
--- /dev/null
+++ b/llvm/test/Transforms/GVN/PRE/rnflow-gvn-pre.ll
@@ -0,0 +1,59 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 6
+; Minimal test case containing only the .lr.ph.i basic block
+; RUN: opt -passes=gvn -S < %s | FileCheck %s
+
+define void @test_lr_ph_i(ptr %0) {
+; CHECK-LABEL: define void @test_lr_ph_i(
+; CHECK-SAME: ptr [[TMP0:%.*]]) {
+; CHECK-NEXT: [[ENTRY:.*]]:
+; CHECK-NEXT: [[HOIST_GEP1:%.*]] = getelementptr float, ptr [[TMP0]], i64 1
+; CHECK-NEXT: [[HOIST_GEP2:%.*]] = getelementptr i8, ptr [[HOIST_GEP1]], i64 -4
+; CHECK-NEXT: [[HOISTED_LOAD:%.*]] = load float, ptr [[HOIST_GEP2]], align 4
+; CHECK-NEXT: br label %[[DOTLR_PH_I:.*]]
+; CHECK: [[_LR_PH_I:.*:]]
+; CHECK-NEXT: [[KNOWN_MIN:%.*]] = phi float [ [[HOISTED_LOAD]], %[[ENTRY]] ], [ [[CURRENT_MIN:%.*]], %[[DOTLR_PH_I]] ]
+; CHECK-NEXT: [[INDVARS_IV_I:%.*]] = phi i64 [ 1, %[[ENTRY]] ], [ [[INDVARS_IV_NEXT_I:%.*]], %[[DOTLR_PH_I]] ]
+; CHECK-NEXT: [[TMP1:%.*]] = phi i64 [ 0, %[[ENTRY]] ], [ [[TMP10:%.*]], %[[DOTLR_PH_I]] ]
+; CHECK-NEXT: [[DOT05_I:%.*]] = phi i32 [ 1, %[[ENTRY]] ], [ [[DOT1_I:%.*]], %[[DOTLR_PH_I]] ]
+; CHECK-NEXT: [[INDVARS_IV_NEXT_I]] = add nsw i64 [[INDVARS_IV_I]], -1
+; CHECK-NEXT: [[TMP2:%.*]] = getelementptr float, ptr [[TMP0]], i64 [[INDVARS_IV_I]]
+; CHECK-NEXT: [[TMP3:%.*]] = getelementptr i8, ptr [[TMP2]], i64 -8
+; CHECK-NEXT: [[TMP4:%.*]] = load float, ptr [[TMP3]], align 4
+; CHECK-NEXT: [[TMP5:%.*]] = sext i32 [[DOT05_I]] to i64
+; CHECK-NEXT: [[TMP6:%.*]] = getelementptr float, ptr [[TMP0]], i64 [[TMP5]]
+; CHECK-NEXT: [[TMP7:%.*]] = getelementptr i8, ptr [[TMP6]], i64 -4
+; CHECK-NEXT: [[TMP8:%.*]] = fcmp contract olt float [[TMP4]], [[KNOWN_MIN]]
+; CHECK-NEXT: [[TMP9:%.*]] = trunc nsw i64 [[INDVARS_IV_NEXT_I]] to i32
+; CHECK-NEXT: [[DOT1_I]] = select i1 [[TMP8]], i32 [[TMP9]], i32 [[DOT05_I]]
+; CHECK-NEXT: [[TMP10]] = add nsw i64 [[TMP1]], -1
+; CHECK-NEXT: [[TMP11:%.*]] = icmp samesign ugt i64 [[TMP1]], 1
+; CHECK-NEXT: [[CURRENT_MIN]] = select i1 [[TMP8]], float [[TMP4]], float [[KNOWN_MIN]]
+; CHECK-NEXT: br i1 [[TMP11]], label %[[DOTLR_PH_I]], label %[[EXIT:.*]]
+; CHECK: [[EXIT]]:
+; CHECK-NEXT: ret void
+;
+entry:
+ br label %.lr.ph.i
+
+.lr.ph.i: ; preds = %.lr.ph.i, %entry
+ %indvars.iv.i = phi i64 [ 1, %entry ], [ %indvars.iv.next.i, %.lr.ph.i ]
+ %86 = phi i64 [ 0, %entry ], [ %96, %.lr.ph.i ]
+ %.05.i = phi i32 [ 1, %entry ], [ %.1.i, %.lr.ph.i ]
+ %indvars.iv.next.i = add nsw i64 %indvars.iv.i, -1
+ %87 = getelementptr float, ptr %0, i64 %indvars.iv.i
+ %88 = getelementptr i8, ptr %87, i64 -8 ; first load : %0 + 4 * 1 - 8
+ %89 = load float, ptr %88, align 4
+ %90 = sext i32 %.05.i to i64
+ %91 = getelementptr float, ptr %0, i64 %90 ; %0 + 4 * 1
+ %92 = getelementptr i8, ptr %91, i64 -4 ; second load : %0 + 4 * 1 - 4
+ %93 = load float, ptr %92, align 4
+ %94 = fcmp contract olt float %89, %93
+ %95 = trunc nsw i64 %indvars.iv.next.i to i32
+ %.1.i = select i1 %94, i32 %95, i32 %.05.i
+ %96 = add nsw i64 %86, -1
+ %97 = icmp samesign ugt i64 %86, 1
+ br i1 %97, label %.lr.ph.i, label %exit
+
+exit:
+ ret void
+}
|
|
Hi @nikic and @david-arm Either this due to recent patches in LLVM or refreshed setup at my end. It is complicated to pinpoint to the patch which might have helped this patch gain the perf. Nevertheless, as it stands today, I see 15% end to end perf gain in Polyhedron's rnflow app on AArch64 Grace platform. My colleague, @ssijaric-nv also checked it on X86, AMD EPYC Milan1 and he sees slightly less gain but indeed there is roughtly 6% gain. I have cleaned up this patch and is ready for review. Can you please have a look? |
|
FYI nikic is currently out of the office and will come back at the start of December. |
|
Hi @nikic, If you're back, can you please have a look? |
| #include "llvm/IR/Dominators.h" | ||
| #include "llvm/IR/InstrTypes.h" | ||
| #include "llvm/IR/PassManager.h" | ||
| #include "llvm/Analysis/LoopInfo.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can forward declare Loop instead?
| Value *BasePtr = nullptr, *IndexVal = nullptr, *OffsetVal = nullptr; | ||
| LLVM_DEBUG( | ||
| dbgs() | ||
| << "GVN: Analyzing select instruction for minimum finding pattern.\n"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd omit this kind of debug print that is going to be emitted for every select instruction.
| // Check if this is less-than comparison. | ||
| CmpInst::Predicate Pred = Comparison->getPredicate(); | ||
| if (Pred != CmpInst::ICMP_SLT && Pred != CmpInst::ICMP_ULT && | ||
| Pred != CmpInst::FCMP_OLT && Pred != CmpInst::FCMP_ULT) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This means that you will fail to handle the pattern where x > y ? b : a is used instead of x < y ? a : b. It's better to use swapped predicate in that case.
| // %min.idx.ext = sext i32 %min.idx to i64 | ||
| // %ptr.float.min = getelementptr float, ptr %0, i64 %min.idx.ext | ||
| // %ptr.second.load = getelementptr i8, ptr %ptr.float.min, i64 -4 | ||
| // %val.current.min = load float, ptr %ptr.second.load, align 4 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be great if there was a comment with the full before + after pattern before the implementation code.
| // Hoist the load and build the necessary operations. | ||
| // 1. hoist_0 = sext i32 to i64 | ||
| Value *HoistedSExt = | ||
| Builder.CreateSExt(InitialMinIndex, Builder.getInt64Ty(), "hoist_sext"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This sext i64 seems to make assumptions about the types that are involved without ever checking them. I expect assertion failures if you use different types. Ideally this should be type independent.
|
|
||
| // 2. hoist_gep1 = getelementptr float, ptr BasePtr, i64 HoistedSExt | ||
| Value *HoistedGEP1 = | ||
| Builder.CreateGEP(LoadType, BasePtr, HoistedSExt, "hoist_gep1"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to make an assumption that the load type and the GEP type are the same, which is not necessarily true.
| OffsetVal, "hoist_gep2"); | ||
|
|
||
| // 4. hoisted_load = load float, ptr HoistedGEP2 | ||
| LoadInst *NewLoad = Builder.CreateLoad(LoadType, HoistedGEP2, "hoisted_load"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In order to hoist loads, shouldn't we be querying MDA about clobbers somewhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And we probably also need to update MSSA.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs a lot more test coverage for the various pre-conditions of the transform.
This is an alternative patch to #144987, adding support in GVN.