Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More aggressive reassociations #6626

Merged

Conversation

lizhengxing
Copy link
Collaborator

@lizhengxing lizhengxing commented May 15, 2024

Although DXC applied the upstream change, Reassociate: add global reassociation algorithm (llvm/llvm-project@b8a330c) in this PR (#6598), it still might overlook some obvious common factors.

One case has been observed is:

  %Float4_0   = call %dx.types.CBufRet.f32 @dx.op.cbufferLoadLegacy.f32(i32 59, %dx.types.Handle %1, i32 1)
  %Float4_0.w = extractvalue %dx.types.CBufRet.f32 %Float4_0, 3
  %Float2_0   = call %dx.types.CBufRet.f32 @dx.op.cbufferLoadLegacy.f32(i32 59, %dx.types.Handle %1, i32 0)
  %Float2_0.y = extractvalue %dx.types.CBufRet.f32 %Float2_0, 1

  /* %Float4_1 is redundant with %Float4_0 since they invokes cbufferLoadLegacy with same parameters */
  %Float4_1   = call %dx.types.CBufRet.f32 @dx.op.cbufferLoadLegacy.f32(i32 59, %dx.types.Handle %1, i32 1)

  /* %Float4_1.w is redundant with %Float4_0.w */
  %Float4_1.w = extractvalue %dx.types.CBufRet.f32 %Float4_1, 3 

  /* %Float2_1 is redundant with %Float2_0 since they invokes cbufferLoadLegacy with same parameters */
  %Float2_1   = call %dx.types.CBufRet.f32 @dx.op.cbufferLoadLegacy.f32(i32 59, %dx.types.Handle %1, i32 0)

  /* %Float2_1.y is redundant with %Float2_0.y */
  %Float2_1.y = extractvalue %dx.types.CBufRet.f32 %Float2_1, 1

  ....
  %11 = fmul fast float %Float4_0.w, %10
  %12 = fmul fast float %11, %Float2_0.y
  ....
  %14 = fmul fast float %Float4_1.w, %13
  %15 = fmul fast float %14, %Float2_1.y 

(%Float4_0.w * %Float2_0.y) equals to (%Float4_1.w * %Float2_1.y), they should be reassociated to a common factor

The upstream change can't identify this common factor because DXC doesn't know (%Float4_0.w, %Float4_1.w) and (%Float2_0.y, %Float2_1.y) are redundant when running Reassociate pass. Those redundancies will be eliminated in GVN pass.

For DXC can identify more common factors, this PR will aggressively run Reassociate pass again after GVN pass and then run GVN pass again to remove the redundancies generared in this run of Reassociate pass.

Changing the order of floating point operations causes the precision issue. In case some shaders get unexpected results due to this PR, use "-opt-disable aggressive-reassociation" to disable this PR and roll back.

This is part 3 of the fix for #6593.

@lizhengxing lizhengxing requested a review from a team as a code owner May 15, 2024 18:21
@dmpots
Copy link
Contributor

dmpots commented May 15, 2024

In your message you say

DXC doesn't know (%3, %7) and (%7, %9) are redundant

How are %7 and %9 redundant here? Should it be %5 and %9?

It would be easier to read this if you gave them names I think.

Copy link
Contributor

@dmpots dmpots left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should look at the performance impact of this change before committing it.

@lizhengxing
Copy link
Collaborator Author

In your message you say

DXC doesn't know (%3, %7) and (%7, %9) are redundant

How are %7 and %9 redundant here? Should it be %5 and %9?

It would be easier to read this if you gave them names I think.

Yes, %5 and %9 are redundant. I simplified the dxils and renamed the virtual register number. I'll update the description.

@lizhengxing lizhengxing force-pushed the zhengxingli/common-factor-optimization-upstream-with-flag branch from 83b7b96 to 6623591 Compare May 16, 2024 20:13
@lizhengxing lizhengxing force-pushed the zhengxingli/common-factor-optimization-more-reassocations branch 2 times, most recently from c72d018 to 8af0fbe Compare May 16, 2024 22:04
@lizhengxing
Copy link
Collaborator Author

In your message you say

DXC doesn't know (%3, %7) and (%7, %9) are redundant

How are %7 and %9 redundant here? Should it be %5 and %9?

It would be easier to read this if you gave them names I think.

Done.

@lizhengxing lizhengxing force-pushed the zhengxingli/common-factor-optimization-upstream-with-flag branch 2 times, most recently from aa61d01 to d4efcf2 Compare May 17, 2024 19:23
@lizhengxing lizhengxing force-pushed the zhengxingli/common-factor-optimization-more-reassocations branch from 8af0fbe to 06c4b11 Compare May 17, 2024 21:05
@lizhengxing
Copy link
Collaborator Author

We should look at the performance impact of this change before committing it.

Overall, compared to the upstream change (#6598), this change can reduce ALU in ~37% of shaders. The occupancy of a small number of shaders are improved (~1%). There were an equal number of regressions and improvements.

Overall, this change looks to be positive.

Copy link
Contributor

@dmpots dmpots left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@lizhengxing lizhengxing force-pushed the zhengxingli/common-factor-optimization-upstream-with-flag branch from d4efcf2 to 4a2d290 Compare May 21, 2024 19:10
@lizhengxing lizhengxing changed the base branch from zhengxingli/common-factor-optimization-upstream-with-flag to main May 21, 2024 20:45
Although DXC applied the upstream change, Reassociate: add global reassociation algorithm (llvm/llvm-project@b8a330c) in this PR (#6598), it still might overlook some obvious common factors.

One case has been observed is:
  %Float4_0   = call %dx.types.CBufRet.f32 @dx.op.cbufferLoadLegacy.f32(i32 59, %dx.types.Handle %1, i32 1)
  %Float4_0.w = extractvalue %dx.types.CBufRet.f32 %Float4_0, 3
  %Float2_0   = call %dx.types.CBufRet.f32 @dx.op.cbufferLoadLegacy.f32(i32 59, %dx.types.Handle %1, i32 0)
  %Float2_0.y = extractvalue %dx.types.CBufRet.f32 %Float2_0, 1

  /* %Float4_1 is redundant with %Float4_0 since they invokes cbufferLoadLegacy with same parameters */
  %Float4_1   = call %dx.types.CBufRet.f32 @dx.op.cbufferLoadLegacy.f32(i32 59, %dx.types.Handle %1, i32 1)

  /* %Float4_1.w is redundant with %Float4_0.w */
  %Float4_1.w = extractvalue %dx.types.CBufRet.f32 %Float4_1, 3

  /* %Float2_1 is redundant with %Float2_0 since they invokes cbufferLoadLegacy with same parameters */
  %Float2_1   = call %dx.types.CBufRet.f32 @dx.op.cbufferLoadLegacy.f32(i32 59, %dx.types.Handle %1, i32 0)

  /* %Float2_1.y is redundant with %Float2_0.y */
  %Float2_1.y = extractvalue %dx.types.CBufRet.f32 %Float2_1, 1

  ....
  %11 = fmul fast float %Float4_0.w, %10
  %12 = fmul fast float %11, %Float2_0.y
  ....
  %14 = fmul fast float %Float4_1.w, %13
  %15 = fmul fast float %14, %Float2_1.y

(%Float4_0.w * %Float2_0.y) equals to (%Float4_1.w * %Float2_1.y), they should be reassociated to a common factor

The upstream change can't identify this common factor because DXC doesn't know (%Float4_0.w, %Float4_1.w) and (%Float2_0.y, %Float2_1.y) are redundant when running Reassociate pass. Those redundancies will be eliminated in GVN pass.

For DXC can identify more common factors, this PR will aggressively run Reassociate pass again after GVN pass and then run GVN pass again to remove the redundancies generared in this run of Reassociate pass.

Changing the order of floating point operations causes the precision issue. In case some shaders get unexpected results due to this PR, use "-opt-disable aggressive-reassociation" to disable this PR and roll back.

This is part 3 of the fix for #6593.
@lizhengxing lizhengxing force-pushed the zhengxingli/common-factor-optimization-more-reassocations branch from 06c4b11 to b052e1b Compare May 21, 2024 20:48
@lizhengxing lizhengxing merged commit 9ee3f23 into main May 21, 2024
13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

4 participants