Description
For a year or two, Flowthrough_damr has been yielding relative diffs of order 1e-6 in testpackage runs, even when the branch being tested only changes things that do not directly affect the computations; #1099 is a recent example for me. As discussed in #1099 and #1161, I dug deeper. This issue records what I observed and how to reproduce it for further investigation.
The key new observation is that I get diffs even when running Flowthrough_damr serially. Our previous interpretation, that these diffs result from threading/summation differences, especially at AMR and MPI domain boundaries, is therefore at best incomplete. Threading etc. may well amplify the diffs, but the setup documented here yields them in serial runs too.
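For context on why summation order was the earlier suspect, here is a generic sketch (plain Python, not Vlasiator code) of how regrouping a floating-point sum changes the result. The helper `relative_diff` is a hypothetical name introduced for this illustration. A single regrouping perturbs a double-precision result at roughly the 1e-16 relative level, so on its own it is far from the 1e-6 reported here.

```python
# Floating-point addition is not associative: regrouping changes the last bits.
left_to_right = (0.1 + 0.2) + 0.3
right_to_left = 0.1 + (0.2 + 0.3)


def relative_diff(a, b):
    """Relative difference of two scalars, scaled by the larger magnitude."""
    scale = max(abs(a), abs(b))
    return abs(a - b) / scale if scale > 0.0 else 0.0


reorder_diff = relative_diff(left_to_right, right_to_left)
print(left_to_right == right_to_left)  # the two groupings differ
print(reorder_diff)                    # on the order of machine epsilon
```

A serial run with a fixed operation order removes this kind of reordering between threads or ranks, which is why serial diffs point to something else as well.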
Another key observation is that the diffs jump up significantly (sometimes by more than a factor of 10) when AMR happens, but not at plain LB. I confirmed this by setting LB to occur every 5 steps and AMR every 20 steps (so more frequent LB but the same AMR intervals as the default Flowthrough_amr). I did not see any strong jumps in diffs other than just after an AMR.
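A check of this kind could be scripted roughly as follows. This is a minimal sketch: the function names, the factor-of-10 threshold, and the numbers in `diffs` are placeholders made up for illustration, not measurements from the runs described here. It flags outputs where the diff grew by at least a given factor and tests whether an AMR step fell between that output and the previous one.

```python
def find_jumps(diffs, factor=10.0):
    """Return indices i where diffs[i] is at least `factor` times diffs[i-1]."""
    jumps = []
    for i in range(1, len(diffs)):
        prev, cur = diffs[i - 1], diffs[i]
        if prev > 0.0 and cur / prev >= factor:
            jumps.append(i)
    return jumps


def amr_in_interval(step_prev, step_cur, amr_interval):
    """True if a multiple of amr_interval lies in (step_prev, step_cur]."""
    return step_cur // amr_interval > step_prev // amr_interval


# Placeholder data: one diff value per output file, one file every 10 steps.
steps = [0, 10, 20, 30, 40, 50]
diffs = [1e-12, 2e-12, 5e-11, 6e-11, 9e-10, 1e-9]  # synthetic, not measured

jumps = find_jumps(diffs)
jumps_after_amr = [
    amr_in_interval(steps[i - 1], steps[i], amr_interval=20) for i in jumps
]
print([steps[i] for i in jumps], jumps_after_amr)
```

Running the same test with `amr_interval=5` as the LB cadence would be true for every output, so it only discriminates when the two cadences differ, as in the LB-every-5 / AMR-every-20 setup above.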
Things that had little or no impact on the overall level of diffs include:
- changing the LB algorithm (not relevant given the serial diffs observation)
- compiling with O0
- disabling jemalloc
- using short pencils
- using fallback vectorisation
- disabling the randomised order of acceleration directions
- disabling fsgrid filtering
- 1st order space and time
Steps to reproduce and see the diffs:
- build dev and a branch, e.g. the current state of FsGrid & fieldsolver refactoring #1099
- run Flowthrough_damr (increase IO cadence to write a file every 0.8 s, i.e. every 10 steps)
- run vlsvdiff_DP for proton/vg_rho
- the diffs go from relative 1e-12 at the beginning to 1e-6 about half-way through the test.
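To make "relative 1e-12" versus "relative 1e-6" concrete, here is a minimal sketch of one possible relative-diff measure between the same variable from two runs. The definition used (largest absolute difference divided by the largest reference magnitude) is an assumption for illustration; vlsvdiff_DP may report different norms. The function name and the sample values are likewise hypothetical.

```python
def max_relative_diff(reference, candidate):
    """Largest absolute difference, scaled by the largest reference magnitude."""
    if len(reference) != len(candidate):
        raise ValueError("arrays must have the same length")
    max_abs_diff = max(abs(a - b) for a, b in zip(reference, candidate))
    scale = max(abs(a) for a in reference)
    # Fall back to the absolute difference if the reference is identically zero.
    return max_abs_diff / scale if scale > 0.0 else max_abs_diff


# Synthetic example values, standing in for e.g. proton/vg_rho from two builds.
rho_dev = [1.0, 2.0, 4.0]
rho_branch = [1.0, 2.0, 4.0 + 4e-6]

rel = max_relative_diff(rho_dev, rho_branch)
print(rel)  # close to 1e-6 for these made-up values
```

Evaluating such a measure for every output file gives the time series in which the jumps show up.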
Changing some of the parameters I tested above may change, e.g., how early the first big jump in diffs occurs, but none of them got rid of that jump or the subsequent increase in diffs. What I would consider acceptable is relative diffs of ~1e-12 throughout, without such massive jumps.
It would seem that short pencils "help" in bringing the abrupt jumps to the fore.