Conversation

@ykempf (Contributor) commented Aug 18, 2025

During investigations of diffs in #1099 I was baffled again by the diffs in the testpackage Flowthrough_amr test. What struck me today is that the diffs jump up by factors of 2 to 6 at every change in dt, whereas the "normal" behaviour is diffs growing by about 10% per step or thereabouts.
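
To put the two regimes in perspective, here is a back-of-the-envelope sketch (Python, illustrative numbers only, not Vlasiator code): after 20 steps, 10% growth per step compounds to roughly 7x, while a handful of 2-6x jumps on top of that dominates completely.

# Illustrative only: the growth rates are the rough figures quoted above.
steps = 20
dt_change_steps = {5, 10, 15}  # hypothetical steps at which dt changes

diff_normal = 1.0  # "normal" regime: grows ~10% per step
diff_jumpy = 1.0   # observed regime: additionally jumps 2-6x at dt changes

for step in range(1, steps + 1):
    diff_normal *= 1.10
    diff_jumpy *= 1.10
    if step in dt_change_steps:
        diff_jumpy *= 4.0  # mid-range of the observed 2-6x jumps
    print(f"step {step:2d}: normal {diff_normal:8.2f}  jumpy {diff_jumpy:10.2f}")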

I tested activating

[vlasovsolver]
accelerateMaxwellianBoundaries = 1

which is well-established in production.

In #1099, at today's state (https://github.com/fmihpc/vlasiator/actions/runs/16967516987/job/48291190426), the diffs in Flowthrough_amr are as follows (Carrington CI); note that this test has been known to be a bit unstable and sometimes shows diffs 100x larger:

Comparing file Flowthrough_amr/bulk.0000001.vlsv against reference

 variable                                       absolute diff   relative diff
 ---------------------------------------------  --------------  -------------
 proton/vg_rho_0                                0.0193          9.01e-09
 proton/vg_v_0                                  0.00259         1.19e-08
 proton/vg_v_1                                  0.0081          8.01e-07
 proton/vg_v_2                                  0.00564         3.89e-07
 fg_b_0                                         1.07e-17        9.11e-09
 fg_b_1                                         2.04e-17        9.92e-09
 fg_b_2                                         9.98e-18        4.66e-09
 fg_e_0                                         2.05e-12        8.67e-08
 fg_e_1                                         4.96e-12        1.14e-08
 fg_e_2                                         7.56e-12        1.83e-08
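
For reference, a minimal sketch of how figures like these could be computed from two datasets. This assumes the "0-distance" reported by vlsvdiff is the maximum (infinity-norm) difference, normalised by the reference maximum for the relative figure — an assumption based on the naming, not a confirmed reading of vlsvdiff.

import numpy as np

def abs_rel_diff(a, b):
    # Assumed convention: absolute 0-distance = max |a - b|,
    # relative = absolute normalised by max |b|. Not confirmed
    # against vlsvdiff's actual implementation.
    absolute = float(np.max(np.abs(a - b)))
    relative = absolute / float(np.max(np.abs(b)))
    return absolute, relative

# Hypothetical usage with stand-in arrays (reading real VLSV files
# would go through analysator, omitted here):
rng = np.random.default_rng(0)
ref = rng.normal(1.0e6, 1.0e4, size=1000)             # stand-in for vg_rho
new = ref * (1.0 + 1.0e-8 * rng.standard_normal(1000))
print(abs_rel_diff(new, ref))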

Running dev vs. the branch manually on LUMI-C, on a single task and a single thread, I get:

proton/vg_rho 0    5.18        2.41e-06
proton/vg_v 0      0.945       4.35e-06
fg_b 0             1.03e-15    8.81e-07
fg_e 0             4.99e-10    2.08e-05

If I activate the Maxwellian boundary acceleration, still single task, single thread, I get

proton/vg_rho 0    0.226       1.05e-07
proton/vg_v 0      0.0522      2.39e-07
fg_b 0             1.08e-16    9.19e-08
fg_e 0             8.76e-11    3.69e-06

If I activate the Maxwellian boundary acceleration on 16 tasks x 16 threads I get

proton/vg_rho 0    0.0174      8.07e-09
proton/vg_v 0      0.00789     3.61e-08
fg_b 0             1.89e-18    1.61e-09
fg_e 0             2.4e-12     1.01e-07

In the latter two cases the diffs no longer increase monotonically for all variables, so this doesn't rid us of all diffs. I'm also a bit surprised the 16 x 16 run diverges less than the serial version. But well.

Note that this changes dt "macroscopically" of course, so new reference data will be needed after this, but it seems to make Flowthrough_amr less diffy.

@ykempf (Contributor, Author) commented Aug 18, 2025

Here are the results for similar tests with Flowthrough_damr.

As reference, the latest state of #1099 linked above gives:

Comparing file Flowthrough_damr/bulk.0000004.vlsv against reference
 proton/vg_rho_0                                8.05            2.7e-06
 proton/vg_v_0                                  0.229           9.79e-07
 proton/vg_v_1                                  0.162           1.31e-05
 proton/vg_v_2                                  0.218           1.51e-05
 vg_amr_alpha1_0                                1.41e-06        1.41e-06
 vg_amr_alpha2_0                                9.87e-07        1.14e-06
 fg_b_0                                         2.13e-15        1.92e-06
 fg_b_1                                         3.5e-15         1.08e-06
 fg_b_2                                         2.06e-15        6.27e-07
 fg_e_0                                         2.15e-10        2.71e-05
 fg_e_1                                         4.5e-10         6.23e-07
 fg_e_2                                         5.36e-10        7.55e-07

The runs below were done on LUMI-C.

On 16 tasks x 16 threads, default case (existing testpackage settings), I get:

variable                    file   absolute    relative
proton/vg_rho               5      18.5        6.21e-06
proton/vg_v                 5      0.868       3.72e-06
--meshname=fsgrid fg_b      5      3.58e-15    3.22e-06
--meshname=fsgrid fg_e      5      4e-10       5.11e-05

On 32 tasks x 1 thread each, Maxwellian boundary acc on, comparing the last file, I get (absolute and relative 0-diff listed)

proton/vg_rho               5      7.99        2.68e-06
proton/vg_v                 5      0.149       6.37e-07
--meshname=fsgrid fg_b      5      5.26e-15    4.65e-06
--meshname=fsgrid fg_e      5      2.53e-10    3.36e-05

On 1 task x 64 threads, Maxwellian boundary acc on:

proton/vg_rho               5      9.16        3.07e-06
proton/vg_v                 5      0.312       1.34e-06
--meshname=fsgrid fg_b      5      2.7e-14     2.44e-05
--meshname=fsgrid fg_e      5      6.04e-10    8.36e-05

On 16 tasks x 16 threads, Maxwellian acc on:

proton/vg_rho               4      17.1        5.93e-06   (ERROR Datasets have different size.)
proton/vg_v                 4      0.436       1.87e-06   (ERROR Datasets have different size.)
--meshname=fsgrid fg_b      5      6.28e-11    0.0565
--meshname=fsgrid fg_e      5      4.09e-06    0.52

So for this test the benefit of accelerating these boundary cells is not obvious; it doesn't fix anything.

@ykempf (Contributor, Author) commented Aug 18, 2025

Sneak peek: Flowthrough_damr run serially has slightly larger diffs with accelerated Maxwellian boundaries than without, so it seems this test's diff source isn't affected. I'll let that run overnight.

@ykempf (Contributor, Author) commented Aug 18, 2025

Final state of Flowthrough_damr run serially, for the base case and the Maxwellian-accelerated case, in dev and in the #1099 branch.

Flowthrough_damr_serial
proton/vg_rho
The absolute 0-distance between both datasets is 8.73
The relative 0-distance between both datasets is 2.92e-06
proton/vg_v
The absolute 0-distance between both datasets is 0.139
The relative 0-distance between both datasets is 5.96e-07
--meshname=fsgrid fg_b
The absolute 0-distance between both datasets is 3.88e-15
The relative 0-distance between both datasets is 3.49e-06
--meshname=fsgrid fg_e
The absolute 0-distance between both datasets is 2.49e-10
The relative 0-distance between both datasets is 2.93e-05


Flowthrough_damr_serial_accelerateMaxwellianBoundaries
proton/vg_rho
The absolute 0-distance between both datasets is 13.1
The relative 0-distance between both datasets is 4.39e-06
proton/vg_v
The absolute 0-distance between both datasets is 0.38
The relative 0-distance between both datasets is 1.63e-06
--meshname=fsgrid fg_b
The absolute 0-distance between both datasets is 2.9e-14
The relative 0-distance between both datasets is 2.61e-05
--meshname=fsgrid fg_e
The absolute 0-distance between both datasets is 6.53e-10
The relative 0-distance between both datasets is 7.68e-05

Which is fascinating, as I had been ascribing at least part of these diffs (compare the 16x16 CI values above) to the "well-known" AMR pencil summation + threading. 🤔
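
For context on why threaded pencil summation is a natural suspect, a generic floating-point illustration (not Vlasiator code): addition is not associative in floating point, so summing the same contributions in a thread- or order-dependent sequence gives slightly different totals.

import random

random.seed(42)
# Values spanning many orders of magnitude exaggerate the effect.
values = [random.uniform(-1.0, 1.0) * 10.0**random.randint(-8, 8)
          for _ in range(100000)]

ordered = sum(values)
shuffled_values = values[:]
random.shuffle(shuffled_values)
shuffled = sum(shuffled_values)

print(f"ordered sum:  {ordered!r}")
print(f"shuffled sum: {shuffled!r}")
print(f"difference:   {ordered - shuffled!r}")  # typically nonzero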

@ykempf (Contributor, Author) commented Aug 19, 2025

Here is the vg_rho diff data for a full run of Flowthrough_damr on a single task and a single thread, with the base case and the accelerated Maxwellian boundary case, diffing the #1099 branch vs dev.

[figure: vg_rho diff data over the full Flowthrough_damr run, base case and accelerated Maxwellian boundary case]

@ykempf (Contributor, Author) commented Aug 19, 2025

Now with the base-case diff run on 16 tasks x 16 threads – the errors are in the same ballpark. Note however that the last 2 files are not available, as they end up with a different AMR grid (a known occasional feature of this test...).

[figure: vg_rho diff data for the base case run on 16 tasks x 16 threads]

@ykempf (Contributor, Author) commented Aug 19, 2025

I'd still advocate activating the Maxwellian boundary acceleration as suggested in this branch, given it alleviates some spurious diffs we can do without when working on a branch.

@ykempf (Contributor, Author) commented Aug 19, 2025

Fell again into that rabbit hole.

  • lowering the phase-space density threshold by 2 orders of magnitude
  • disabling coarsening
  • disabling the randomised order of acceleration
  • not using Agner
  • using short pencils
  • compiling with -O0
  • disabling jemalloc
  • disabling fsgrid filtering

all have little to no effect on the errors/diffs appearing in Flowthrough_damr. Add to that the fact that running serially yields the same diffs, and I am still utterly baffled. I can't help but think there must be some subtle use of uninitialised memory somewhere, or some use of an incorrect pencil neighbour/pointer/VDF that's not quite wrong but not quite right either, and unrelated to all of the parameters tested above, notably OpenMP threading and MPI/ghosts. (A sketch of how such a parameter sweep could be scripted follows below.)
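
For completeness, here is the kind of harness such toggling can be scripted with. A hedged sketch only: the cfg file name, the option names, and the launch command are hypothetical placeholders, not actual Vlasiator cfg keys; the real runs went through the testpackage scripts on LUMI-C.

import itertools
import subprocess
from pathlib import Path

BASE_CFG = Path("Flowthrough_damr.cfg")  # assumed base testpackage config

# Each toggle: (label, cfg override lines). Names are illustrative only,
# NOT actual Vlasiator cfg keys.
toggles = [
    ("coarsening_off", ["[amr]", "coarsening = 0"]),
    ("random_acc_off", ["[vlasovsolver]", "randomizeAccelerationOrder = 0"]),
]

for combo in itertools.product([False, True], repeat=len(toggles)):
    label = "_".join(t[0] for t, on in zip(toggles, combo) if on) or "base"
    lines = BASE_CFG.read_text().splitlines() if BASE_CFG.exists() else []
    for (name, overrides), on in zip(toggles, combo):
        if on:
            lines += overrides
    cfg = Path(f"run_{label}.cfg")
    cfg.write_text("\n".join(lines) + "\n")
    # Placeholder launch command; in practice srun / the testpackage wrapper.
    subprocess.run(["./vlasiator", "--run_config", str(cfg)], check=False)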

@ykempf (Contributor, Author) commented Aug 20, 2025

All right: in my current test setup using Flowthrough_damr I see diffs (as commonly observed over the last couple of years), with none of the above making much difference. What I do observe is that the diffs are the same or very similar no matter which LB algorithm I use, when I change the LB frequency while keeping AMR at the same times, and even when running serially.

I therefore suggest activating this Maxwellian acceleration for Flowthrough_amr, as it seemingly "fixes" some of the diffs we have been disliking so far, but I leave the investigation of the cause of the remaining Flowthrough_damr diffs to a later stage or another investigator.
