volyrique
Contributor

HPL spends the majority of its execution time inside the BLAS implementation, so these changes affect mainly slower processors such as in-order ones.

Note that link-time optimization (i.e. the -flto option) noticeably increases both build time and memory consumption during linking.

As a reminder, the -march=native parameter behaves differently on AArch64 and x86-64, for example, especially with older compilers such as GCC up to and including version 14, so ideally we would combine it with -mtune=native as futureproofing. However, in my experiments adding -mtune=native didn't lead to any further significant performance difference, while all 20 runs that I tried failed the residual check, so I decided to omit it.

I benchmarked my changes on a Radxa Orion O6 board by doing 20 runs with the Qs parameter set to 12 and blis_configure_options set to cortexa57. Here are my results:

| Revision | Successful runs | Median Gflops | Standard error | Minimum Gflops | Maximum Gflops |
|---|---|---|---|---|---|
| 2c2d455 | 7 | 88.03 | 0.22 | 87.15 | 89.02 |
| My changes | 10 | 89.52 | 0.20 | 88.46 | 90.49 |

In other words, an improvement of approximately 1.69% in the median Gflops. For comparison, on an AMD Ryzen 9 5900X-based machine with 64 GiB RAM there was no significant difference.
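For anyone who wants to check the arithmetic, the quoted figure can be reproduced from the two median values in the table. This is only a minimal sketch: the function name `percent_improvement` is made up for illustration and is not part of the playbook, and it uses just the medians, since the per-run data isn't included here.

```python
def percent_improvement(baseline: float, changed: float) -> float:
    """Relative change of `changed` over `baseline`, in percent."""
    return (changed - baseline) / baseline * 100.0


# Median Gflops taken from the results table in this PR.
baseline_median = 88.03  # revision 2c2d455
changed_median = 89.52   # with the changes from this PR

improvement = percent_improvement(baseline_median, changed_median)
print(f"{improvement:.2f}%")
```

Given that the standard errors are around 0.2 Gflops and the difference in medians is about 1.5 Gflops, the gap looks larger than the run-to-run noise, though the number of successful runs is small.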

Signed-off-by: Anton Kirilov <[email protected]>
@volyrique
Contributor Author

The HPL host seems to be intermittently inaccessible (I had the same issue locally), so the CI check failed, and I can't retrigger it.

@geerlingguy geerlingguy merged commit 41fce33 into geerlingguy:master Sep 17, 2025
3 of 4 checks passed
@geerlingguy
Owner

@volyrique - Thanks! Looks like the server is stable now, at least. Merged the changes and please feel free to make any other suggestions, as I'm far from an expert on these clustering tools!
