
Vectorization of RANS2P2D #1232

Open · wants to merge 4 commits into base: main

Conversation

@JohanMabille (Member) commented Sep 24, 2020

Mandatory Checklist

Please ensure that the following criteria are met:

  • Title of pull request describes the changes/features
  • Request at least 2 reviewers
  • If new files are being added, the files are no larger than 100kB. Post the file sizes.
  • Code coverage did not decrease. If this is a bug fix, a test should cover that bug fix. If a new feature is added, a test should be made to cover that feature.
  • New features have appropriate documentation strings (readable by sphinx)
  • Contributor has read and agreed with CONTRIBUTING.md and has added themselves to CONTRIBUTORS.md

As a general rule of thumb, try to follow PEP8 guidelines.

Description

@codecov (bot) commented Sep 28, 2020

Codecov Report

Merging #1232 (b04f9c3) into main (7f4f32b) will increase coverage by 5.18%.
The diff coverage is n/a.

❗ Current head b04f9c3 differs from the pull request's most recent head 6dbd9e2. Consider uploading reports for commit 6dbd9e2 to get more accurate results.

@@            Coverage Diff             @@
##             main    #1232      +/-   ##
==========================================
+ Coverage   47.56%   52.74%   +5.18%     
==========================================
  Files          90      531     +441     
  Lines       71776   109533   +37757     
==========================================
+ Hits        34140    57777   +23637     
- Misses      37636    51756   +14120     
Impacted Files Coverage Δ
proteus/NumericalSolution.py 70.73% <0.00%> (-7.41%) ⬇️
proteus/mprans/RDLS.py 66.98% <0.00%> (-7.40%) ⬇️
proteus/Archiver.py 31.64% <0.00%> (-4.55%) ⬇️
proteus/TwoPhaseFlow/TwoPhaseFlowProblem.py 92.96% <0.00%> (-2.84%) ⬇️
proteus/Gauges.py 93.58% <0.00%> (-1.19%) ⬇️
proteus/mprans/BodyDynamics.py 85.73% <0.00%> (-0.74%) ⬇️
proteus/iproteus.py 24.53% <0.00%> (-0.63%) ⬇️
proteus/default_so.py 90.90% <0.00%> (-0.40%) ⬇️
proteus/LinearSolvers.py 57.83% <0.00%> (-0.30%) ⬇️
proteus/Profiling.py 47.15% <0.00%> (-0.28%) ⬇️
... and 462 more


Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update bf2bf66...6dbd9e2.

@JohanMabille marked this pull request as ready for review on September 29, 2020 at 13:38.
@JohanMabille (Member, Author)

@cekees @zhang-alvin @tridelat This is ready for review. I haven't replaced the data()[...] accesses in the calls to CompKernel methods, since I will add a new class that accepts xtensor objects (but I will do that in a dedicated PR).

Can you confirm that this does not hurt performance?
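
For readers skimming the diff, here is a minimal before/after sketch of the indexing change and of why the CompKernel call sites still go through data(). The function and dimension names are hypothetical and the arrays are assumed to be exposed as xtensor-python pyarray objects; this is an illustration, not an excerpt of the actual kernel.

```cpp
#include <cstddef>
#include <xtensor-python/pyarray.hpp>

// Before: raw pointers, with flat offsets computed by hand.
void setVelocity_flat(double* q_velocity,
                      int nElements_global,
                      int nQuadraturePoints_element,
                      int nSpace)
{
  for (int eN = 0; eN < nElements_global; eN++)
    for (int k = 0; k < nQuadraturePoints_element; k++)
      for (int I = 0; I < nSpace; I++)
        q_velocity[eN*nQuadraturePoints_element*nSpace + k*nSpace + I] = 0.0;
}

// After: the array carries its own shape, so indexing is multi-dimensional
// and the hand-rolled offset arithmetic (a classic source of bugs) goes away.
void setVelocity_xt(xt::pyarray<double>& q_velocity)
{
  const auto& sh = q_velocity.shape();
  for (std::size_t eN = 0; eN < sh[0]; eN++)
    for (std::size_t k = 0; k < sh[1]; k++)
      for (std::size_t I = 0; I < sh[2]; I++)
        q_velocity(eN, k, I) = 0.0;
}

// CompKernel methods still take raw double*, so those call sites keep
// passing q_velocity.data() until a CompKernel variant that accepts
// xtensor objects lands in the dedicated follow-up PR mentioned above.
```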

@zhang-alvin (Contributor)

I ran a CutFEM-based 2D case using this branch and the master branch. There was no major difference in the running times.

@cekees (Member) commented Sep 29, 2020

> @cekees @zhang-alvin @tridelat This is ready for review. I haven't replaced the data()[...] accesses in the calls to CompKernel methods, since I will add a new class that accepts xtensor objects (but I will do that in a dedicated PR).
>
> Can you confirm that this does not hurt performance?

Nice work! @jhcollins, you might have some parallel jobs set up where you could do a timing comparison as well. My allocations on HPC are not ready yet, but I'll test some compute-intensive jobs on macOS and Linux.

@JohanMabille did you make this conversion by hand, or did you write a Python script? If via a script, it would be nice if you could add it to the scripts directory for future use.

@JohanMabille (Member, Author)

I did this one by hand because I wanted to see if I could add other simplifications (like replacing initialization loops). I can work on a Python script for the other files.
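
As an illustration of the kind of simplification meant here (again with hypothetical names, assuming pyarray arguments), an element-by-element initialization loop over a raw buffer collapses to a single call once the container knows its own size:

```cpp
#include <xtensor-python/pyarray.hpp>

// Before: zero an element residual buffer through a raw pointer.
void zeroElementResidual_flat(double* elementResidual,
                              int nElements_global,
                              int nDOF_test_element)
{
  for (int eN = 0; eN < nElements_global; eN++)
    for (int i = 0; i < nDOF_test_element; i++)
      elementResidual[eN*nDOF_test_element + i] = 0.0;
}

// After: the initialization loop disappears entirely.
void zeroElementResidual_xt(xt::pyarray<double>& elementResidual)
{
  elementResidual.fill(0.0);
}
```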

@jhcollins (Contributor)

@cekees do you want the parallel timing comparison using a mesh-conforming setup, or CutFEM like Alvin's?

@cekees (Member) commented Sep 30, 2020

> @cekees do you want the parallel timing comparison using a mesh-conforming setup, or CutFEM like Alvin's?

Sorry, just saw this. I think we need to verify the performance on something we've run a lot, with the cores loaded up with mesh nodes. Maybe a dambreak or one of your wave flume simulations, run at 2 or 3 core counts so you get roughly 1000, 2000, and 4000 vertices per core; in 2D you can likely push that closer to 20,000 vertices per core. If you run with --profiling you should get a list of the top 20 functions. Typically the residual and Jacobian for RANS2P make it onto that list: the PETSc solve and preconditioner setup are the top costs, in the 80-90% range, and below those we should see the calculateResidual and calculateJacobian functions. If you have a go-to FSI simulation, like a floating caisson with ALE, that would be handy because it exercises more of the functionality.

@cekees (Member) commented Oct 2, 2020

My timings are looking great, @JohanMabille. I'll merge this tomorrow once a few large jobs have run on HPC platforms from Cray and SGI and I've confirmed that the results are identical and the timings equivalent. So far I see some cases where the new implementation appears faster, but that may just be load fluctuation (though these tests are run on dedicated nodes).

@cekees (Member) commented Oct 6, 2020

@JohanMabille and @jhcollins I verified that the numerical results are essentially identical on a 2D dambreak (two-phase) and 2D FSI (two-phase with mesh deformation/ALE). There are some differences on the order of 1e-22, which I suspect have to do with the compiler taking different paths at the aggressive -O3 optimization level. For both a "standard load" of 2000 vertices per core and a heavier load of about 10,000 vertices per core, the new indexing is actually slightly faster. @jhcollins let me know if you are able to identify the issue where you found non-trivial differences in the computed solutions. I tested on a Cray XC40 with gnu 7.3.0 compilers.

@cekees (Member) left a comment

Looks great! 👍

@cekees added the nomerge label on Oct 8, 2020.
Base automatically changed from master to main on February 10, 2021 at 16:25.