Benchmarking / performance measurement of LLVM backend on Intel CPUs : Part I #613
Comments
Just to update here @georgemitenkov : I have tested a few small examples and SSE vs AVX-2 examples locally. But for a detailed analysis, I will wait for #611 ( / #612) so that the assembly and performance metrics can be analysed in detail. |
Great! I had an exam yesterday so Monday/Tuesday were a bit out for me. I have started looking at the debug info, so hopefully this one should be ready soonish (~Thursday). Regarding assembly verification: ideally, do we want to dump it to the log file, so that the structure is:
====== start
====== JIT part
====== end
What do you think? @pramodk |
Oh ok! Np!
Yup, above part LGTM! |
Just as an initial reference, below is a summary of the current timings on x86_64.
--fmf nnan contract afn --vector-width 8 --veclib SVML benchmark \
--opt-level-ir 3 --opt-level-codegen 3 --run --instance-size 100000000 \
--repeat 10
compute-bound_clang_-O3-march=skylake-avx512-ffast-math-fopenmp-fveclib=SVML.log:[NMODL] [info] :: Average compute time = 0.322915
compute-bound_clang_-O3-march=skylake-avx512-ffast-math-fopenmp-fveclib=SVML.log:[NMODL] [info] :: Average compute time = 0.419407
compute-bound_clang_-O3-mavx2-ffast-math-fopenmp-fveclib=SVML.log:[NMODL] [info] :: Average compute time = 0.344690
compute-bound_clang_-O3-mavx2-ffast-math-fopenmp-fveclib=SVML.log:[NMODL] [info] :: Average compute time = 0.423696
compute-bound_clang_-O3-mavx512f-ffast-math-fopenmp-fveclib=SVML.log:[NMODL] [info] :: Average compute time = 0.350585
compute-bound_clang_-O3-mavx512f-ffast-math-fopenmp-fveclib=SVML.log:[NMODL] [info] :: Average compute time = 0.319667
compute-bound_clang_-O3-mavx512f-ffast-math-fveclib=SVML.log:[NMODL] [info] :: Average compute time = 0.347119
compute-bound_clang_-O3-mavx512f-ffast-math-fveclib=SVML.log:[NMODL] [info] :: Average compute time = 0.320830
compute-bound_clang_-O3-mavx512f-fveclib=SVML.log:[NMODL] [info] :: Average compute time = 0.323365
compute-bound_clang_-O3-mavx512f-fveclib=SVML.log:[NMODL] [info] :: Average compute time = 0.317312
compute-bound_clang_-O3-msse2-ffast-math-fopenmp-fveclib=SVML.log:[NMODL] [info] :: Average compute time = 0.347382
compute-bound_clang_-O3-msse2-ffast-math-fopenmp-fveclib=SVML.log:[NMODL] [info] :: Average compute time = 0.629991
hh_clang_-O3-march=skylake-avx512-ffast-math-fveclib=SVML.log:[NMODL] [info] :: Average compute time = 1.659959
hh_clang_-O3-march=skylake-avx512-ffast-math-fveclib=SVML.log:[NMODL] [info] :: Average compute time = 10.597442
hh_clang_-O3-mavx2-ffast-math-fopenmp-fveclib=SVML.log:[NMODL] [info] :: Average compute time = 1.639105
hh_clang_-O3-mavx2-ffast-math-fopenmp-fveclib=SVML.log:[NMODL] [info] :: Average compute time = 2.132582
hh_clang_-O3-mavx512f-ffast-math-fopenmp-fveclib=SVML.log:[NMODL] [info] :: Average compute time = 1.635455
hh_clang_-O3-mavx512f-ffast-math-fopenmp-fveclib=SVML.log:[NMODL] [info] :: Average compute time = 1.510965
hh_clang_-O3-mavx512f-ffast-math-fveclib=SVML.log:[NMODL] [info] :: Average compute time = 1.634934
hh_clang_-O3-mavx512f-ffast-math-fveclib=SVML.log:[NMODL] [info] :: Average compute time = 10.587418
hh_clang_-O3-mavx512f-fveclib=SVML.log:[NMODL] [info] :: Average compute time = 1.610168
hh_clang_-O3-mavx512f-fveclib=SVML.log:[NMODL] [info] :: Average compute time = 12.130137
hh_clang_-O3-msse2-ffast-math-fopenmp-fveclib=SVML.log:[NMODL] [info] :: Average compute time = 1.634898
hh_clang_-O3-msse2-ffast-math-fopenmp-fveclib=SVML.log:[NMODL] [info] :: Average compute time = 3.086421
hh_gcc_-O3-mavx2-ffast-math-ftree-vectorize-mveclibabi=svml.log:[NMODL] [info] :: Average compute time = 1.610445
hh_gcc_-O3-mavx2-ffast-math-ftree-vectorize-mveclibabi=svml.log:[NMODL] [info] :: Average compute time = 10.701414
hh_gcc_-O3-mavx512f-ffast-math-ftree-vectorize-mveclibabi=svml.log:[NMODL] [info] :: Average compute time = 1.614212
hh_gcc_-O3-mavx512f-ffast-math-ftree-vectorize-mveclibabi=svml.log:[NMODL] [info] :: Average compute time = 10.897828
hh_gcc_-O3-msse2-ffast-math-ftree-vectorize-mveclibabi=svml.log:[NMODL] [info] :: Average compute time = 1.611068
hh_gcc_-O3-msse2-ffast-math-ftree-vectorize-mveclibabi=svml.log:[NMODL] [info] :: Average compute time = 11.025482
hh_icpc_-O2-march=skylake-avx512-mtune=skylake-avx512-prec-div-fimf-use-svml.log:[NMODL] [info] :: Average compute time = 1.622493
hh_icpc_-O2-march=skylake-avx512-mtune=skylake-avx512-prec-div-fimf-use-svml.log:[NMODL] [info] :: Average compute time = 1.913908
hh_icpc_-O2-mavx2-prec-div-fimf-use-svml.log:[NMODL] [info] :: Average compute time = 1.792381
hh_icpc_-O2-mavx2-prec-div-fimf-use-svml.log:[NMODL] [info] :: Average compute time = 1.908091
hh_icpc_-O2-mavx512f-prec-div-fimf-use-svml.log:[NMODL] [info] :: Average compute time = 1.794239
hh_icpc_-O2-mavx512f-prec-div-fimf-use-svml.log:[NMODL] [info] :: Average compute time = 1.576430
hh_icpc_-O2-msse2-prec-div-fimf-use-svml.log:[NMODL] [info] :: Average compute time = 1.792621
hh_icpc_-O2-msse2-prec-div-fimf-use-svml.log:[NMODL] [info] :: Average compute time = 3.003994
hh_icpc_-O2-qopt-zmm-usage=high-xCORE-AVX512-prec-div-fimf-use-svml.log:[NMODL] [info] :: Average compute time = 1.612436
hh_icpc_-O2-qopt-zmm-usage=high-xCORE-AVX512-prec-div-fimf-use-svml.log:[NMODL] [info] :: Average compute time = 1.750384
memory-bound_clang_-O3-march=skylake-avx512-ffast-math-fopenmp-fveclib=SVML.log:[NMODL] [info] :: Average compute time = 0.402982
memory-bound_clang_-O3-march=skylake-avx512-ffast-math-fopenmp-fveclib=SVML.log:[NMODL] [info] :: Average compute time = 0.404010
memory-bound_clang_-O3-mavx2-ffast-math-fopenmp-fveclib=SVML.log:[NMODL] [info] :: Average compute time = 0.402691
memory-bound_clang_-O3-mavx2-ffast-math-fopenmp-fveclib=SVML.log:[NMODL] [info] :: Average compute time = 0.403016
memory-bound_clang_-O3-mavx512f-ffast-math-fopenmp-fveclib=SVML.log:[NMODL] [info] :: Average compute time = 0.402822
memory-bound_clang_-O3-mavx512f-ffast-math-fopenmp-fveclib=SVML.log:[NMODL] [info] :: Average compute time = 0.403130
memory-bound_clang_-O3-mavx512f-ffast-math-fveclib=SVML.log:[NMODL] [info] :: Average compute time = 0.402736
memory-bound_clang_-O3-mavx512f-ffast-math-fveclib=SVML.log:[NMODL] [info] :: Average compute time = 0.403115
memory-bound_clang_-O3-mavx512f-fveclib=SVML.log:[NMODL] [info] :: Average compute time = 0.405940
memory-bound_clang_-O3-mavx512f-fveclib=SVML.log:[NMODL] [info] :: Average compute time = 0.406087
memory-bound_clang_-O3-msse2-ffast-math-fopenmp-fveclib=SVML.log:[NMODL] [info] :: Average compute time = 0.403234
memory-bound_clang_-O3-msse2-ffast-math-fopenmp-fveclib=SVML.log:[NMODL] [info] :: Average compute time = 0.404857
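To compare these numbers more easily, the grep-style lines above could be grouped per log file with a small script. This is only a sketch: the function name `parse_timings` and the split of the file name into kernel, compiler, and flags are assumptions made for illustration, not part of NMODL. The thread does not say what distinguishes the two entries reported per log file, so the sketch simply keeps them in their original order without labeling them.

```python
import re
from collections import defaultdict

# Matches lines such as:
#   hh_clang_-O3-mavx2-ffast-math-fopenmp-fveclib=SVML.log:[NMODL] [info] :: Average compute time = 1.639105
# Assumption: file names follow <kernel>_<compiler>_<flags>.log, as in the listing above.
LINE_RE = re.compile(
    r"^(?P<kernel>[^_]+)_(?P<compiler>[^_]+)_(?P<flags>.+)\.log:"
    r"\[NMODL\] \[info\] :: Average compute time = (?P<time>[0-9.]+)\s*$"
)


def parse_timings(lines):
    """Group timings by (kernel, compiler, flags), keeping the order found in the input."""
    timings = defaultdict(list)
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:  # silently skip anything that is not a timing line
            key = (match["kernel"], match["compiler"], match["flags"])
            timings[key].append(float(match["time"]))
    return dict(timings)


sample = [
    "hh_clang_-O3-mavx2-ffast-math-fopenmp-fveclib=SVML.log:[NMODL] [info] :: Average compute time = 1.639105",
    "hh_clang_-O3-mavx2-ffast-math-fopenmp-fveclib=SVML.log:[NMODL] [info] :: Average compute time = 2.132582",
]

for (kernel, compiler, flags), values in parse_timings(sample).items():
    print(kernel, compiler, flags, values)
```

Feeding the full listing into `parse_timings` gives one entry per kernel/compiler/flag combination, which makes outliers such as the ~10 s `hh` timings easier to spot.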
|
Thanks @castigli ! Any specific reason we use only |
no, except that I forgot to add it! I will re-run the test. |
@pramodk @castigli @iomaganaris Current configurations would be, with [..] indicating a test parameter
For CPU names, we can use any that Clang supports. We also want to see the effect of aliasing, and to see how performance differs for floats (float => 32 bits => vector width in elements is greater => maybe more scatter/gather overhead) |
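To make the float remark concrete, here is a rough sketch of how many elements fit into one vector register for the instruction sets used in the timings above. The register widths (128, 256, 512 bits) are the standard ones for SSE2, AVX2, and AVX-512; the helper name `lanes` is just an illustration.

```python
# Vector register widths in bits for the instruction sets benchmarked in this thread.
REGISTER_BITS = {"sse2": 128, "avx2": 256, "avx512f": 512}
# Element sizes in bits.
ELEMENT_BITS = {"float": 32, "double": 64}


def lanes(isa, dtype):
    """Number of elements of `dtype` that fit in one vector register of `isa`."""
    return REGISTER_BITS[isa] // ELEMENT_BITS[dtype]


for isa in REGISTER_BITS:
    print(isa, {dtype: lanes(isa, dtype) for dtype in ELEMENT_BITS})
```

With floats, each register holds twice as many elements as with doubles, so an indirect access pattern needs twice as many gathered or scattered elements per vector instruction.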
As part of this ticket, we are going to benchmark the LLVM code generation backend with different configurations. Here are some practical considerations:
@georgemitenkov : I have assigned this to myself temporarily, as I am going to do simple cross-checks of the performance numbers with the recently added --veclib SVML option.