Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DXC common factor optimization #6595

Closed
wants to merge 2 commits into from

Conversation

lizhengxing
Copy link
Collaborator

@lizhengxing lizhengxing commented May 7, 2024

This PR pulls the upstream change, Reassociate: add global reassociation algorithm (llvm/llvm-project@b8a330c), into DXC.

For the code below:
foo = (a * b) * c
bar = (a * d) * c

As the upstream change states, it can recognize the a*c is a common factor and redundant.

However, this upstream change might overlook some obvious common factors. One case has been observed is:
%2 = call %dx.types.CBufRet.f32 @dx.op.cbufferLoadLegacy.f32(i32 59, %dx.types.Handle %1, i32 1)
%3 = extractvalue %dx.types.CBufRet.f32 %2, 3
%4 = call %dx.types.CBufRet.f32 @dx.op.cbufferLoadLegacy.f32(i32 59, %dx.types.Handle %1, i32 0)
%5 = extractvalue %dx.types.CBufRet.f32 %4, 1
%6 = call %dx.types.CBufRet.f32 @dx.op.cbufferLoadLegacy.f32(i32 59, %dx.types.Handle %1, i32 1)
%7 = extractvalue %dx.types.CBufRet.f32 %6, 3
%8 = call %dx.types.CBufRet.f32 @dx.op.cbufferLoadLegacy.f32(i32 59, %dx.types.Handle %1, i32 0)
%9 = extractvalue %dx.types.CBufRet.f32 %8, 1

%11 = fmul fast float %3, %10
%12 = fmul fast float %11, %5

%14 = fmul fast float %7, %13
%15 = fmul fast float %14, %9 ---> %3*%5 == %7*%9 --> they should be reassociated to a common factor

The upstream change can't identify this common factor because DXC don't know (%3, %7) and (%7, %9) are redundant when running Reassociate pass. The redundancy of (%3, %7) and (%7, %9) will be eliminated after GVN pass.

For DXC can identify more common factors, this PR will aggressively run Reassociate pass again after GVN pass and it will run GVN pass after it to remove the redundancy generared in the second run of Reassociate pass.

One issue of this PR is changing the order of floating point operations will cause the precision issue and very different result of some edge cases.

So this PR adds the disable-aggressive-reassociation flag (it's false by default) to the non-upstream change. It can be used to mitigate and debug the issue caused by this PR.

This PR pulls the upstream change, Reassociate: add global reassociation algorithm (llvm/llvm-project@b8a330c), into DXC.

For the code below:
  foo = (a * b) * c
  bar = (a * d) * c

As the upstream change states, it can recognize the a*c is a common factor and redundant.

However, this upstream change might overlook some obvious common factors. One case has been observed is:
  %2 = call %dx.types.CBufRet.f32 @dx.op.cbufferLoadLegacy.f32(i32 59, %dx.types.Handle %1, i32 1)
  %3 = extractvalue %dx.types.CBufRet.f32 %2, 3
  %4 = call %dx.types.CBufRet.f32 @dx.op.cbufferLoadLegacy.f32(i32 59, %dx.types.Handle %1, i32 0)
  %5 = extractvalue %dx.types.CBufRet.f32 %4, 1
  %6 = call %dx.types.CBufRet.f32 @dx.op.cbufferLoadLegacy.f32(i32 59, %dx.types.Handle %1, i32 1)
  %7 = extractvalue %dx.types.CBufRet.f32 %6, 3
  %8 = call %dx.types.CBufRet.f32 @dx.op.cbufferLoadLegacy.f32(i32 59, %dx.types.Handle %1, i32 0)
  %9 = extractvalue %dx.types.CBufRet.f32 %8, 1

  %11 = fmul fast float %3, %10
  %12 = fmul fast float %11, %5

  %14 = fmul fast float %7, %13
  %15 = fmul fast float %14, %9   ---> %3*%5 == %7*%9 --> they should be reassociated to a common factor

The upstream change can't identify this common factor because DXC don't know (%3, %7) and (%7, %9) are redundant when running Reassociate pass. The redundancy of (%3, %7) and (%7, %9) will be eliminated after GVN pass.

For DXC can identify more common factors, this PR will aggressively run Reassociate pass again after GVN pass and it will run GVN pass after it to remove the redundancy generared in the second run of Reassociate pass.

One issue of this PR is changing the order of floating point operations will cause the precision issue and very different result of some edge cases.

So this PR adds the disable-aggressive-reassociation flag (it's false by default) to the non-upstream change. It can be used to mitigate and debug the issue caused by this PR.

Fixes #6593.
@lizhengxing lizhengxing requested a review from a team as a code owner May 7, 2024 17:44
@lizhengxing lizhengxing requested a review from tex3d May 7, 2024 17:45
Copy link
Contributor

github-actions bot commented May 7, 2024

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:
git-clang-format --diff 14c440712d7411f184bd29a70718e3b8568dcfe7 6e0baa18fcad6ede2e2d2cda775fbb51137d0e69 -- include/dxc/Support/HLSLOptions.h include/llvm/Transforms/IPO/PassManagerBuilder.h lib/DxcSupport/HLSLOptions.cpp lib/Transforms/IPO/PassManagerBuilder.cpp lib/Transforms/Scalar/Reassociate.cpp tools/clang/include/clang/Frontend/CodeGenOptions.h tools/clang/lib/CodeGen/BackendUtil.cpp tools/clang/tools/dxcompiler/dxcompilerobj.cpp
View the diff from clang-format here.
diff --git a/include/dxc/Support/HLSLOptions.h b/include/dxc/Support/HLSLOptions.h
index 75d096d3..1690986e 100644
--- a/include/dxc/Support/HLSLOptions.h
+++ b/include/dxc/Support/HLSLOptions.h
@@ -236,7 +236,8 @@ public:
   std::string TimeTrace = "";           // OPT_ftime_trace[EQ]
   unsigned TimeTraceGranularity = 500;  // OPT_ftime_trace_granularity_EQ
   bool VerifyDiagnostics = false;       // OPT_verify
-  bool EnableAggressiveReassociation = true; // OPT_disable_aggressive_reassociation
+  bool EnableAggressiveReassociation =
+      true; // OPT_disable_aggressive_reassociation
 
   // Optimization pass enables, disables and selects
   OptimizationToggles
diff --git a/include/llvm/Transforms/IPO/PassManagerBuilder.h b/include/llvm/Transforms/IPO/PassManagerBuilder.h
index 0b3837a5..8e813702 100644
--- a/include/llvm/Transforms/IPO/PassManagerBuilder.h
+++ b/include/llvm/Transforms/IPO/PassManagerBuilder.h
@@ -139,7 +139,7 @@ public:
   bool HLSLEnableDebugNops = false; // HLSL Change
   bool HLSLEarlyInlining = true; // HLSL Change
   bool HLSLNoSink = false; // HLSL Change
-  bool HLSLEnableAggressiveReassociation = true; // HLSL Change
+  bool HLSLEnableAggressiveReassociation = true;    // HLSL Change
   void addHLSLPasses(legacy::PassManagerBase &MPM); // HLSL Change
 
 private:
diff --git a/lib/Transforms/IPO/PassManagerBuilder.cpp b/lib/Transforms/IPO/PassManagerBuilder.cpp
index fa3178eb..e64d0c49 100644
--- a/lib/Transforms/IPO/PassManagerBuilder.cpp
+++ b/lib/Transforms/IPO/PassManagerBuilder.cpp
@@ -504,8 +504,8 @@ void PassManagerBuilder::populateModulePassManager(
 
   // HLSL Change Begins.
   {
-    // Run reassociate pass again after GVN since GVN will expose more opportunities
-    // for reassociation.
+    // Run reassociate pass again after GVN since GVN will expose more
+    // opportunities for reassociation.
     if (HLSLEnableAggressiveReassociation) {
       MPM.add(createReassociatePass()); // Reassociate expressions
       if (EnableGVN) {
diff --git a/lib/Transforms/Scalar/Reassociate.cpp b/lib/Transforms/Scalar/Reassociate.cpp
index cb67beef..b5b0f7fa 100644
--- a/lib/Transforms/Scalar/Reassociate.cpp
+++ b/lib/Transforms/Scalar/Reassociate.cpp
@@ -20,7 +20,6 @@
 //
 //===----------------------------------------------------------------------===//
 
-#include "llvm/Transforms/Scalar.h"
 #include "llvm/ADT/DenseMap.h"
 #include "llvm/ADT/PostOrderIterator.h"
 #include "llvm/ADT/STLExtras.h"
@@ -38,6 +37,7 @@
 #include "llvm/Pass.h"
 #include "llvm/Support/Debug.h"
 #include "llvm/Support/raw_ostream.h"
+#include "llvm/Transforms/Scalar.h"
 #include "llvm/Transforms/Utils/Local.h"
 #include <algorithm>
 using namespace llvm;
@@ -2286,8 +2286,7 @@ void Reassociate::ReassociateExpression(BinaryOperator *I) {
   RewriteExprTree(I, Ops);
 }
 
-void Reassociate::BuildPairMap(
-    ReversePostOrderTraversal<Function *> &RPOT) {
+void Reassociate::BuildPairMap(ReversePostOrderTraversal<Function *> &RPOT) {
   // Make a "pairmap" of how often each operand pair occurs.
   for (BasicBlock *BI : RPOT) {
     for (Instruction &I : *BI) {
@@ -2357,11 +2356,11 @@ bool Reassociate::runOnFunction(Function &F) {
   // pass pipeline for further potential gains.
   // It might also be possible to update the pair map during runtime, but the
   // overhead of that may be large if there's many reassociable chains.
-// TODO: RPOT
-// Get the functions basic blocks in Reverse Post Order. This order is used by
-// BuildRankMap to pre calculate ranks correctly. It also excludes dead basic
-// blocks (it has been seen that the analysis in this pass could hang when
-// analysing dead basic blocks).
+  // TODO: RPOT
+  // Get the functions basic blocks in Reverse Post Order. This order is used by
+  // BuildRankMap to pre calculate ranks correctly. It also excludes dead basic
+  // blocks (it has been seen that the analysis in this pass could hang when
+  // analysing dead basic blocks).
   ReversePostOrderTraversal<Function *> RPOT(&F);
   BuildPairMap(RPOT);
 
  • Check this box to apply formatting changes to this branch.

Copy link
Contributor

@dmpots dmpots left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be good to split this into two changes since we are really doing two separate things here.

  1. Port the upstream fix back to the reassociation pass.
  2. Add the extra reassocation+gvn run with the new -aggressive-reassociate flag.

Both of these would be useful bisect points if we need to chase down a regression.

@lizhengxing
Copy link
Collaborator Author

I think it would be good to split this into two changes since we are really doing two separate things here.

  1. Port the upstream fix back to the reassociation pass.
  2. Add the extra reassocation+gvn run with the new -aggressive-reassociate flag.

Both of these would be useful bisect points if we need to chase down a regression.

Yeah, I thought about it after I found it's hard to put the upstream change under the new -aggressive-reassociate flag. Let me refactor this PR after I got more reviews.

@damyanp
Copy link
Member

damyanp commented May 7, 2024

Yeah, I thought about it after I found it's hard to put the upstream change under the new -aggressive-reassociate flag. Let me refactor this PR after I got more reviews.

Probably best to get the refactored PR out so people can review the thing that you're planning to commit.

@lizhengxing
Copy link
Collaborator Author

lizhengxing commented May 8, 2024

I will split this PR into 3 separate PRs based on the reviews. This PR is no longer needed. So close and delete it.

@lizhengxing lizhengxing closed this May 8, 2024
@lizhengxing lizhengxing deleted the zhengxingli/common-factor-optimization branch May 16, 2024 16:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

3 participants