
Vectorization of RANS2P2D #1232

Open · wants to merge 4 commits into base: main

Conversation

@JohanMabille (Member) commented Sep 24, 2020

Mandatory Checklist

Please ensure that the following criteria are met:

  • Title of pull request describes the changes/features
  • Request at least 2 reviewers
  • If new files are being added, the files are no larger than 100kB. Post the file sizes.
  • Code coverage did not decrease. If this is a bug fix, a test should cover that bug fix. If a new feature is added, a test should be made to cover that feature.
  • New features have appropriate documentation strings (readable by sphinx)
  • Contributor has read and agreed with CONTRIBUTING.md and has added themselves to CONTRIBUTORS.md

As a general rule of thumb, try to follow PEP8 guidelines.

Description

@codecov (bot) commented Sep 28, 2020

Codecov Report

Merging #1232 (b04f9c3) into main (7f4f32b) will increase coverage by 5.18%.
The diff coverage is n/a.

❗ Current head b04f9c3 differs from the pull request's most recent head 6dbd9e2. Consider uploading reports for commit 6dbd9e2 to get more accurate results.

@@            Coverage Diff             @@
##             main    #1232      +/-   ##
==========================================
+ Coverage   47.56%   52.74%   +5.18%     
==========================================
  Files          90      531     +441     
  Lines       71776   109533   +37757     
==========================================
+ Hits        34140    57777   +23637     
- Misses      37636    51756   +14120     
Impacted Files Coverage Δ
proteus/NumericalSolution.py 70.73% <0.00%> (-7.41%) ⬇️
proteus/mprans/RDLS.py 66.98% <0.00%> (-7.40%) ⬇️
proteus/Archiver.py 31.64% <0.00%> (-4.55%) ⬇️
proteus/TwoPhaseFlow/TwoPhaseFlowProblem.py 92.96% <0.00%> (-2.84%) ⬇️
proteus/Gauges.py 93.58% <0.00%> (-1.19%) ⬇️
proteus/mprans/BodyDynamics.py 85.73% <0.00%> (-0.74%) ⬇️
proteus/iproteus.py 24.53% <0.00%> (-0.63%) ⬇️
proteus/default_so.py 90.90% <0.00%> (-0.40%) ⬇️
proteus/LinearSolvers.py 57.83% <0.00%> (-0.30%) ⬇️
proteus/Profiling.py 47.15% <0.00%> (-0.28%) ⬇️
... and 462 more


Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update bf2bf66...6dbd9e2.

@JohanMabille marked this pull request as ready for review on September 29, 2020 at 13:38.
@JohanMabille (Member, Author)

@cekees @zhang-alvin @tridelat This is ready for review. I haven't replaced the data()[...] accesses in the calls to CompKernel methods, since I will add a new class that accepts xtensor objects (but I will do that in a dedicated PR).

Can you confirm that this does not hurt performance?
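
For readers skimming the diff, here is a minimal before/after sketch of the indexing change and of why the CompKernel call sites still go through data(). The function and dimension names are hypothetical and the arrays are assumed to be exposed as xtensor-python pyarray objects; this is an illustration, not an excerpt of the actual kernel.

```cpp
#include <cstddef>
#include <xtensor-python/pyarray.hpp>

// Before: raw pointers, with flat offsets computed by hand.
void setVelocity_flat(double* q_velocity,
                      int nElements_global,
                      int nQuadraturePoints_element,
                      int nSpace)
{
  for (int eN = 0; eN < nElements_global; eN++)
    for (int k = 0; k < nQuadraturePoints_element; k++)
      for (int I = 0; I < nSpace; I++)
        q_velocity[eN*nQuadraturePoints_element*nSpace + k*nSpace + I] = 0.0;
}

// After: the array carries its own shape, so indexing is multi-dimensional
// and the hand-rolled offset arithmetic (a classic source of bugs) goes away.
void setVelocity_xt(xt::pyarray<double>& q_velocity)
{
  const auto& sh = q_velocity.shape();
  for (std::size_t eN = 0; eN < sh[0]; eN++)
    for (std::size_t k = 0; k < sh[1]; k++)
      for (std::size_t I = 0; I < sh[2]; I++)
        q_velocity(eN, k, I) = 0.0;
}

// CompKernel methods still take raw double*, so those call sites keep
// passing q_velocity.data() until a CompKernel variant that accepts
// xtensor objects lands in the dedicated follow-up PR mentioned above.
```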

@zhang-alvin (Contributor)

I ran a CutFEM-based 2D case using this branch and the master branch. There was no major difference in the running times.

@cekees (Member) commented Sep 29, 2020

> @cekees @zhang-alvin @tridelat This is ready for review. I haven't replaced the data()[...] accesses in the calls to CompKernel methods, since I will add a new class that accepts xtensor objects (but I will do that in a dedicated PR).
>
> Can you confirm that this does not hurt performance?

Nice work! @jhcollins, you might have some parallel jobs set up where you could do a timing comparison as well. My allocations on HPC are not ready yet, but I'll test some compute-intensive jobs on macOS and Linux.

@JohanMabille did you make this conversion by hand, or did you write a Python script? If via a script, it would be nice if you could add it to the scripts directory for future use.

@JohanMabille (Member, Author)

I did this one by hand because I wanted to see if I could add other simplifications (like replacing initialization loops). I can work on a Python script for the other files.
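
As an illustration of the kind of simplification meant here (again with hypothetical names, assuming pyarray arguments), an element-by-element initialization loop over a raw buffer collapses to a single call once the container knows its own size:

```cpp
#include <xtensor-python/pyarray.hpp>

// Before: zero an element residual buffer through a raw pointer.
void zeroElementResidual_flat(double* elementResidual,
                              int nElements_global,
                              int nDOF_test_element)
{
  for (int eN = 0; eN < nElements_global; eN++)
    for (int i = 0; i < nDOF_test_element; i++)
      elementResidual[eN*nDOF_test_element + i] = 0.0;
}

// After: the initialization loop disappears entirely.
void zeroElementResidual_xt(xt::pyarray<double>& elementResidual)
{
  elementResidual.fill(0.0);
}
```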

@jhcollins (Contributor)

@cekees do you want the parallel timing comparison using a mesh-conforming setup, or CutFEM like Alvin's?

@cekees (Member) commented Sep 30, 2020

> @cekees do you want the parallel timing comparison using a mesh-conforming setup, or CutFEM like Alvin's?

Sorry, just saw this. I think we need to verify the performance on something we've run a lot, with the cores loaded up with mesh nodes. Maybe a dambreak or one of your wave flume simulations, run at 2 or 3 core counts so you get roughly 1000, 2000, and 4000 vertices per core; in 2D you can likely push that closer to 20,000 vertices per core. If you run with --profiling you should get a list of the top 20 functions. Typically the residual and Jacobian for RANS2P make it onto that list: the PETSc solve and preconditioner setup are the top costs, in the 80-90% range, and below those we should see the calculateResidual and calculateJacobian functions. If you have a go-to FSI simulation, like a floating caisson with ALE, that would be handy because it exercises more of the functionality.

@cekees (Member) commented Oct 2, 2020

My timings are looking great, @JohanMabille. I'll merge this tomorrow once a few large jobs have run on HPC platforms from Cray and SGI and I've confirmed that the results are identical and the timings equivalent. So far I see some cases where the new implementation appears faster, but that may just be load fluctuation (though these tests are run on dedicated nodes).

@cekees (Member) commented Oct 6, 2020

@JohanMabille and @jhcollins I verified that the numerical results are essentially identical on a 2D dambreak (two-phase) and 2D FSI (two-phase with mesh deformation/ALE). There are some differences on the order of 1e-22, which I suspect have to do with the compiler taking different paths at the aggressive -O3 optimization level. For both a "standard load" of 2000 vertices per core and a heavier load of about 10,000 vertices per core, the new indexing is actually slightly faster. @jhcollins let me know if you are able to identify the issue where you found non-trivial differences in the computed solutions. I tested on a Cray XC40 with gnu 7.3.0 compilers.

@cekees (Member) left a comment

Looks great! 👍

@cekees added the nomerge label on Oct 8, 2020.
Base automatically changed from master to main on February 10, 2021 at 16:25.