More aggressive reassociations #6626

lizhengxing · 2024-05-15T18:21:50Z

Although DXC applied the upstream change, Reassociate: add global reassociation algorithm (llvm/llvm-project@b8a330c) in this PR (#6598), it still might overlook some obvious common factors.

One case has been observed is:

  %Float4_0   = call %dx.types.CBufRet.f32 @dx.op.cbufferLoadLegacy.f32(i32 59, %dx.types.Handle %1, i32 1)
  %Float4_0.w = extractvalue %dx.types.CBufRet.f32 %Float4_0, 3
  %Float2_0   = call %dx.types.CBufRet.f32 @dx.op.cbufferLoadLegacy.f32(i32 59, %dx.types.Handle %1, i32 0)
  %Float2_0.y = extractvalue %dx.types.CBufRet.f32 %Float2_0, 1

  /* %Float4_1 is redundant with %Float4_0 since they invokes cbufferLoadLegacy with same parameters */
  %Float4_1   = call %dx.types.CBufRet.f32 @dx.op.cbufferLoadLegacy.f32(i32 59, %dx.types.Handle %1, i32 1)

  /* %Float4_1.w is redundant with %Float4_0.w */
  %Float4_1.w = extractvalue %dx.types.CBufRet.f32 %Float4_1, 3 

  /* %Float2_1 is redundant with %Float2_0 since they invokes cbufferLoadLegacy with same parameters */
  %Float2_1   = call %dx.types.CBufRet.f32 @dx.op.cbufferLoadLegacy.f32(i32 59, %dx.types.Handle %1, i32 0)

  /* %Float2_1.y is redundant with %Float2_0.y */
  %Float2_1.y = extractvalue %dx.types.CBufRet.f32 %Float2_1, 1

  ....
  %11 = fmul fast float %Float4_0.w, %10
  %12 = fmul fast float %11, %Float2_0.y
  ....
  %14 = fmul fast float %Float4_1.w, %13
  %15 = fmul fast float %14, %Float2_1.y 

(%Float4_0.w * %Float2_0.y) equals to (%Float4_1.w * %Float2_1.y), they should be reassociated to a common factor

The upstream change can't identify this common factor because DXC doesn't know (%Float4_0.w, %Float4_1.w) and (%Float2_0.y, %Float2_1.y) are redundant when running Reassociate pass. Those redundancies will be eliminated in GVN pass.

For DXC can identify more common factors, this PR will aggressively run Reassociate pass again after GVN pass and then run GVN pass again to remove the redundancies generared in this run of Reassociate pass.

Changing the order of floating point operations causes the precision issue. In case some shaders get unexpected results due to this PR, use "-opt-disable aggressive-reassociation" to disable this PR and roll back.

This is part 3 of the fix for #6593.

dmpots · 2024-05-15T19:07:11Z

In your message you say

DXC doesn't know (%3, %7) and (%7, %9) are redundant

How are %7 and %9 redundant here? Should it be %5 and %9?

It would be easier to read this if you gave them names I think.

dmpots

We should look at the performance impact of this change before committing it.

lizhengxing · 2024-05-16T16:06:43Z

In your message you say

DXC doesn't know (%3, %7) and (%7, %9) are redundant

How are %7 and %9 redundant here? Should it be %5 and %9?

It would be easier to read this if you gave them names I think.

Yes, %5 and %9 are redundant. I simplified the dxils and renamed the virtual register number. I'll update the description.

lizhengxing · 2024-05-16T22:10:34Z

In your message you say

DXC doesn't know (%3, %7) and (%7, %9) are redundant

How are %7 and %9 redundant here? Should it be %5 and %9?

It would be easier to read this if you gave them names I think.

Done.

lib/Transforms/IPO/PassManagerBuilder.cpp

lizhengxing · 2024-05-21T17:44:54Z

We should look at the performance impact of this change before committing it.

Overall, compared to the upstream change (#6598), this change can reduce ALU in ~37% of shaders. The occupancy of a small number of shaders are improved (~1%). There were an equal number of regressions and improvements.

Overall, this change looks to be positive.

dmpots

LGTM

Although DXC applied the upstream change, Reassociate: add global reassociation algorithm (llvm/llvm-project@b8a330c) in this PR (#6598), it still might overlook some obvious common factors. One case has been observed is: %Float4_0 = call %dx.types.CBufRet.f32 @dx.op.cbufferLoadLegacy.f32(i32 59, %dx.types.Handle %1, i32 1) %Float4_0.w = extractvalue %dx.types.CBufRet.f32 %Float4_0, 3 %Float2_0 = call %dx.types.CBufRet.f32 @dx.op.cbufferLoadLegacy.f32(i32 59, %dx.types.Handle %1, i32 0) %Float2_0.y = extractvalue %dx.types.CBufRet.f32 %Float2_0, 1 /* %Float4_1 is redundant with %Float4_0 since they invokes cbufferLoadLegacy with same parameters */ %Float4_1 = call %dx.types.CBufRet.f32 @dx.op.cbufferLoadLegacy.f32(i32 59, %dx.types.Handle %1, i32 1) /* %Float4_1.w is redundant with %Float4_0.w */ %Float4_1.w = extractvalue %dx.types.CBufRet.f32 %Float4_1, 3 /* %Float2_1 is redundant with %Float2_0 since they invokes cbufferLoadLegacy with same parameters */ %Float2_1 = call %dx.types.CBufRet.f32 @dx.op.cbufferLoadLegacy.f32(i32 59, %dx.types.Handle %1, i32 0) /* %Float2_1.y is redundant with %Float2_0.y */ %Float2_1.y = extractvalue %dx.types.CBufRet.f32 %Float2_1, 1 .... %11 = fmul fast float %Float4_0.w, %10 %12 = fmul fast float %11, %Float2_0.y .... %14 = fmul fast float %Float4_1.w, %13 %15 = fmul fast float %14, %Float2_1.y (%Float4_0.w * %Float2_0.y) equals to (%Float4_1.w * %Float2_1.y), they should be reassociated to a common factor The upstream change can't identify this common factor because DXC doesn't know (%Float4_0.w, %Float4_1.w) and (%Float2_0.y, %Float2_1.y) are redundant when running Reassociate pass. Those redundancies will be eliminated in GVN pass. For DXC can identify more common factors, this PR will aggressively run Reassociate pass again after GVN pass and then run GVN pass again to remove the redundancies generared in this run of Reassociate pass. Changing the order of floating point operations causes the precision issue. In case some shaders get unexpected results due to this PR, use "-opt-disable aggressive-reassociation" to disable this PR and roll back. This is part 3 of the fix for #6593.

lizhengxing requested a review from a team as a code owner May 15, 2024 18:21

dmpots suggested changes May 15, 2024

View reviewed changes

lizhengxing force-pushed the zhengxingli/common-factor-optimization-upstream-with-flag branch from 83b7b96 to 6623591 Compare May 16, 2024 20:13

lizhengxing force-pushed the zhengxingli/common-factor-optimization-more-reassocations branch 2 times, most recently from c72d018 to 8af0fbe Compare May 16, 2024 22:04

llvm-beanz reviewed May 17, 2024

View reviewed changes

lib/Transforms/IPO/PassManagerBuilder.cpp Outdated Show resolved Hide resolved

lizhengxing force-pushed the zhengxingli/common-factor-optimization-upstream-with-flag branch 2 times, most recently from aa61d01 to d4efcf2 Compare May 17, 2024 19:23

lizhengxing force-pushed the zhengxingli/common-factor-optimization-more-reassocations branch from 8af0fbe to 06c4b11 Compare May 17, 2024 21:05

tex3d approved these changes May 21, 2024

View reviewed changes

dmpots approved these changes May 21, 2024

View reviewed changes

lizhengxing force-pushed the zhengxingli/common-factor-optimization-upstream-with-flag branch from d4efcf2 to 4a2d290 Compare May 21, 2024 19:10

lizhengxing changed the base branch from zhengxingli/common-factor-optimization-upstream-with-flag to main May 21, 2024 20:45

lizhengxing force-pushed the zhengxingli/common-factor-optimization-more-reassocations branch from 06c4b11 to b052e1b Compare May 21, 2024 20:48

lizhengxing merged commit 9ee3f23 into main May 21, 2024
13 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

More aggressive reassociations #6626

More aggressive reassociations #6626

lizhengxing commented May 15, 2024 •

edited

Loading

dmpots commented May 15, 2024

dmpots left a comment

lizhengxing commented May 16, 2024

lizhengxing commented May 16, 2024

lizhengxing commented May 21, 2024

dmpots left a comment

More aggressive reassociations #6626

More aggressive reassociations #6626

Conversation

lizhengxing commented May 15, 2024 • edited Loading

dmpots commented May 15, 2024

dmpots left a comment

Choose a reason for hiding this comment

lizhengxing commented May 16, 2024

lizhengxing commented May 16, 2024

lizhengxing commented May 21, 2024

dmpots left a comment

Choose a reason for hiding this comment

lizhengxing commented May 15, 2024 •

edited

Loading