Skip to content

fb_smc_speedup: Performance optimize on SMC grid#1

Open
Yun1Liu wants to merge 1 commit intodevelopfrom
fb_smc_speedup
Open

fb_smc_speedup: Performance optimize on SMC grid#1
Yun1Liu wants to merge 1 commit intodevelopfrom
fb_smc_speedup

Conversation

@Yun1Liu
Copy link
Copy Markdown
Owner

@Yun1Liu Yun1Liu commented Jun 3, 2021

Pull Request Summary

Added extra 1-dimension arrays with duplicated data for IJKUFc, IJKVFc, IJKCel, to make the data more contiguous, which can save memory bandwidth, achieve better performance.

Description

The performance of SMC grid is limited by the memory bandwidth, because it needs to read
the 2-dimension arrays IJKUFc, IJKVFc, IJKCel many times.
In the loop, only 2 cells of the column (such as IJKCel(3,j), IJKCel(4,j)) are used,
the pre-fetch on the other cells is wasted.
So these arrays are kind of "AOS" (Array of Structure), I built some extra 1-dimension arrays
like "SOA" to make the data more contiguous, achieve better performance.

On the PRC NMEFC workloads, these changes can achieve 17% performance increasement.

Issue(s) addressed

  • Is there an issue associated with this development (bug fix, enhancement, new feature)?
    This is an enhancement, I didn't open an issue before.
  • fixes #<issue_number>
  • fixes noaa-emc/ww3/issues/<issue_number>

Check list

Testing

  • How were these changes tested?
    This has been tested on NMEFC's workload, and regtests/ww3_tp2.16, on Intel Icelake and Cascadelake platforms.

  • Are the changes covered by regression tests? (If not, why? Do new tests need to be added?)
    tested on ww3_tp2.16.

  • If a new feature was added, was a new regression test added?

  • Have regression tests been run?
    Yes

  • Which compiler / HPC you used to run the regression tests in the PR?
    On Intel ifort & mpiifort.

  • Please provide the summary output of matrix.comp (matrix.Diff.out, matrixCompFull.out and matrixCompSummary.out):

Summary:
******************************************************
*** compare WAVEWATCH III matrix of regression tests ***
******************************************************

base directory : /home/liuyun/WW3/regtests/
comp directory : /home/liuyun/Yun1Liu/WW3/regtests/
test(s) :
ww3_tp2.16


********************* non-identical cases ****************************



************************ identical cases *****************************


ww3_tp2.16/./work


******************** summary of comparison ***************************
********** only results of non-identical cases are listed ************
****** if less than 10 files differ for a case, they are listed ******


Full:

   ******************************************************
 ***  compare WAVEWATCH III matrix of regression tests  ***
   ******************************************************

base directory : /home/liuyun/WW3/regtests/
comp directory : /home/liuyun/Yun1Liu/WW3/regtests/
test(s) :
ww3_tp2.16


********************* non-identical cases ****************************



************************ identical cases *****************************


ww3_tp2.16/./work


******************* full output of comparison ************************


  • test case: ww3_tp2.16; test run: ./work

found 36 files in base directory
found 36 files in compare directory
restart.ww3 are identical (binary)
ww3.68060600.wnd are identical
ww3.68060600.hs are identical
ww3.68060600.t01 are identical
ww3.68060603.wnd are identical
ww3.68060603.hs are identical
ww3.68060603.t01 are identical
ww3.68060606.wnd are identical
ww3.68060606.hs are identical
ww3.68060606.t01 are identical
ww3.68060609.wnd are identical
ww3.68060609.hs are identical
ww3.68060609.t01 are identical
ww3.68060612.wnd are identical
ww3.68060612.hs are identical
ww3.68060612.t01 are identical
ww3.68060615.wnd are identical
ww3.68060615.hs are identical
ww3.68060615.t01 are identical
ww3.68060618.wnd are identical
ww3.68060618.hs are identical
ww3.68060618.t01 are identical
ww3.68060621.wnd are identical
ww3.68060621.hs are identical
ww3.68060621.t01 are identical
ww3.68060700.wnd are identical
ww3.68060700.hs are identical
ww3.68060700.t01 are identical
ww3_grid.out are identical
mod_def.ww3 are identical (binary)
ww3_strt.out are identical
skipped ww3_shel.out
log.ww3 are identical (filtered)
out_grd.ww3 are identical (binary)
out_pnt.ww3 are identical (binary)
ww3_outf.out are identical

Please indicate the expected changes in the outputs (excluding the known list of non-identical tests).

… IJKUFc, IJKVFc, IJKCel, to make the data more contiguous, which can save memory bandwidth, achieve better performance.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants