Conversation

@giacomomagni (Collaborator)

Address #397 using the FHMRUVV tables.

@giacomomagni added the benchmarks label on Jan 13, 2026
@giacomomagni (Collaborator, Author)

Do you want to also add it to CI by default?

@giacomomagni linked an issue on Jan 13, 2026 that may be closed by this pull request: "Add LHA aN3LO to ekomark"
@felixhekhorn (Contributor)

Do you want to also add it to CI by default?

I'd say yes - let's see how long it takes

@felixhekhorn (Contributor)

I'd say yes - let's see how long it takes

Okay, so we have the Rust run and the Python run, and we can make a number of observations:

  1. The benchmark should add 4 new EKOs (2 FFNS x 2 SV). The Rust time increases from ~7min to ~11min (~1min/EKO) and Python from ~45min to ~2h (~18min/EKO); I would keep both nevertheless just so we know if something is going wrong. The job is only running now and then ...
  2. I assume the FFNS to be the latter half: why are we off by up to 5% in the singlet sector? This sounds fishy ... in the non-singlet sector we agree to better than 1e-3% (sic!). Actually, only in the non-singlet sector can we see that Rust and Python do not agree bit-by-bit but only up to ~1e-7 - in the singlet sector the error is so big that all displayed digits are identical
  3. for VFNS we need to set matching_order = (2,0), right? (see the sketch after this list)
    • on the Rust side this is done implicitly, since the N3LO OMEs are not even translated. Still, we are off by up to 600% in the non-singlet sector and up to 5% in the singlet sector. Surprisingly, here we are clearly better on the singlet side, and, e.g., V and T15 are quite off in the small-x region
    • on the Python side the misconfiguration matters and we are worse in some places - however, not everywhere, and in some cases the bug even seems to work in our favour
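
For concreteness, a minimal sketch of what forcing the matching order explicitly could look like as a theory-card update. The key names follow eko's new-style theory card (`order` and `matching_order` as (QCD, QED) tuples); the surrounding dict and its use in the benchmark runcards are assumptions, not the actual ekomark code:

```python
# A sketch, not the actual ekomark runcard: evolve at aN3LO but keep the
# matching conditions at NNLO, as the LHA tables assume.
theory_update = {
    "order": (4, 0),           # (QCD, QED) perturbative order: aN3LO evolution
    "matching_order": (2, 0),  # (QCD, QED) order of the matching conditions
}
```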

@giacomomagni (Collaborator, Author) commented on Jan 15, 2026

  1. The benchmark should add 4 new EKOs (2 FFNS x 2 SV). The Rust time increases from ~7min to ~11min (~1min/EKO) and Python from ~45min to ~2h (~18min/EKO); I would keep both nevertheless just so we know if something is going wrong. The job is only running now and then ...

Okay, looks good.

  2. I assume the FFNS to be the latter half: why are we off by up to 5% in the singlet sector? This sounds fishy ... in the non-singlet sector we agree to better than 1e-3% (sic!). Actually, only in the non-singlet sector can we see that Rust and Python do not agree bit-by-bit but only up to ~1e-7 - in the singlet sector the error is so big that all displayed digits are identical

I think this is related to these motivations: #484 (comment)

  3. for VFNS we need to set matching_order = (2,0), right?

    • on the Rust side this is done implicitly, since the N3LO OMEs are not even translated. Still, we are off by up to 600% in the non-singlet sector and up to 5% in the singlet sector. Surprisingly, here we are clearly better on the singlet side, and, e.g., V and T15 are quite off in the small-x region

This looks more like a bug, as for Rust the disagreement should be of the same order as for FFNS. Maybe I mis-copied the table; let me check.

  • on the Python side the misconfiguration matters and we are worse in some places - however, not everywhere, and in some cases the bug even seems to work in our favour

Yes, I should set matching_order = (2,0) explicitly.

EDIT:
The tables seem to be good. The value which looks odd is the 600% difference for the valence in VFNS.

@felixhekhorn (Contributor)

We need NNPDF/banana#79 to be merged (+tagged+released)

@felixhekhorn (Contributor)

Okay, benchmarks seem to be back running. The mystery of why they don't match remains ... However, it also seems to be worse with SV than without - at least for some distributions:

$ poetry poe lha -m "n3lo and not sv and vfns"
[...]
─── 
  V  
 ─── 
               x       Q2       eko     eko_error       LHA  percent_error
0   1.000000e-07  10000.0  0.000085  6.211928e-10  0.000151     -43.569049
1   1.000000e-06  10000.0  0.000809  4.122221e-09  0.000910     -11.072806
2   1.000000e-05  10000.0  0.004748  2.396675e-08  0.004734       0.285170
3   1.000000e-04  10000.0  0.022717  4.002072e-08  0.022189       2.378366
4   1.000000e-03  10000.0  0.096559  2.837288e-07  0.095632       0.968525
[...]
 ───── 
  T15  
 ───── 
               x       Q2       eko     eko_error       LHA  percent_error
0   1.000000e-07  10000.0  6.902032  1.431165e-05  6.901966   9.586976e-04
1   1.000000e-06  10000.0  5.174521  1.647792e-05  5.174487   6.659105e-04
2   1.000000e-05  10000.0  3.808481  2.365504e-05  3.808474   1.934408e-04
3   1.000000e-04  10000.0  2.741352  5.529015e-06  2.741350   4.408902e-05
[...]
 ───── 
  T24  
 ───── 
               x       Q2        eko     eko_error        LHA  percent_error
0   1.000000e-07  10000.0  59.510470  2.773462e-04  57.608373       3.301772
1   1.000000e-06  10000.0  31.573497  6.014980e-04  30.965502       1.963459
2   1.000000e-05  10000.0  16.751943  1.344763e-04  16.606857       0.873650
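
As a sanity check on how to read these tables: percent_error is presumably the relative deviation of eko from LHA, i.e. (eko - LHA) / LHA * 100 (this formula is an assumption, not taken from the benchmark code). Recomputing the first V row from the rounded values above is consistent with that:

```python
# First V row copied from the output above, rounded to the printed digits.
eko, lha = 0.000085, 0.000151
percent_error = (eko - lha) / lha * 100
print(percent_error)  # ~ -43.7, matching the displayed -43.57 up to input rounding
```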

Stupid question: are we comparing the right things? I.e., can T24|eko be computed from the table, and is the rotation the right one?
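
For reference, these are the standard evolution-basis combinations such a rotation would use (with q± = q ± q̄); a sketch for cross-checking by hand, not a claim about the exact code path in ekomark:

```python
# Standard non-singlet evolution-basis combinations; inputs are the q+ = q + qbar
# (for the T's) and q- = q - qbar (for V) distributions per flavour.
def T15(u_p, d_p, s_p, c_p):
    return u_p + d_p + s_p - 3 * c_p

def T24(u_p, d_p, s_p, c_p, b_p):
    return u_p + d_p + s_p + c_p - 4 * b_p

def V(u_m, d_m, s_m, c_m, b_m):
    # total valence for nf = 5
    return u_m + d_m + s_m + c_m + b_m
```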

@giacomomagni (Collaborator, Author)

I think T24 looks okay in your output. At low x this is affected by the new splitting-function updates (the ones that came after the benchmark tables). The two surprising points for me are the V ones, but there one might argue the absolute error is small; still, I don't have a good explanation for why they do not match.

@felixhekhorn (Contributor)

How should we proceed? Do we live with it, or should we investigate more? The thing is, for some parts we can replicate the numbers with great precision, so we should understand what happens to the rest - i.e. we are not crazy, there must be a reason, which maybe we can just live with.

@giacomomagni (Collaborator, Author)

I think we can live with this? You might want to keep track of this mismatch for the valence in an issue, maybe...
In any case these N3LO tables will need to be updated, and if you still see a disagreement with the other groups in the valence, then one can decide to investigate more...

@felixhekhorn (Contributor)

This comparison is a mess, but at least it shows that only Pgg and Pgq should have changed - but not Pnsv ... (the reference is the last commit in #354) - see also here

  In any case these N3LO tables will need to be updated, and if you still see a disagreement with the other groups in the valence, then one can decide to investigate more...

You mean change to the numbers we generate now? But then we know we will disagree with the others, as I think you have the "official" set at the moment and we know they are fine ... or do you mean the new ones might work just as well, when rotated to the LHA basis?

@giacomomagni (Collaborator, Author)

I meant that these LHA numbers from the paper are no longer the most updated version, so you might expect some discrepancies with the current eko. Now, we don't know why there is a discrepancy in the valence (as there was no update there), but in any case, when one redoes the benchmark exercise, one could check whether these valence numbers are also up to date.
