Epicure bsc gpu by loganoz · Pull Request #8 · loganoz/horses3d-gpu

loganoz · 2025-10-30T16:34:11Z

After today's meeting, I thought that this might be a good idea. I took epicure_bsc_gpu branch directly from their fork and brought it here, without accepting the pull request to Gerasimos_gpu branch in the main folder.

Jerryntk · 2025-10-30T16:43:18Z

In the split form templates please check the consistency of the signs and the initialisation of Qdot = 0 compared to the standard code. To make everything compatible for the mu solver, we had to adjust some integral operations. This is really important. The changes made to integrate the mu solver lead to the aforementioned changes in the ns solver.

loganoz · 2025-10-30T16:46:17Z

I think we should have splitforms test cases working before accepting this. Currently only standard form is checked.

Jerryntk · 2025-10-30T16:48:37Z

I would suggest to merge first the mesh related changes and on a second step the split form changes. I was holding them off because of the mu integration.

loganoz · 2025-10-30T16:49:03Z

You think mesh related are safe then?

Jerryntk · 2025-10-30T16:49:56Z

I havent checked them all. We need to check it and understand the changes, aside from verification through testing.

loganoz · 2025-10-30T16:51:43Z

100% agree

Jerryntk · 2025-10-30T16:52:11Z

I propose that at least one from the team should check it thoroughly to be aware exactly of what the changed code does.

Copilot

Pull Request Overview

This PR implements performance optimizations across several computational mesh processing routines, introducing algorithmic improvements, refactoring duplicated code through templates, and adding GPU acceleration support via OpenACC directives.

Optimizes node lookup in HOPR mesh reading by replacing O(n) linear search with O(1) hash map lookup
Refactors mesh partitioning to use mask-based node uniqueness detection instead of nested loops
Consolidates duplicated split volumetric contribution functions into a single template with preprocessor macros
Optimizes metric computation by reducing array dimensions and introducing local accumulator variables
Adds OpenACC directives for GPU acceleration of wall distance computations

Reviewed Changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.

Show a summary per file

File	Description
RiemannSolvers_NS.f90	Exports two-point flux functions for use in split form template
Read_HDF5Mesh_HOPR.f90	Introduces reverse lookup map for O(1) node deduplication
MeshPartitioning.f90	Replaces nested loop node search with mask-based approach
MappedGeometry.f90	Refactors metric computation with optimized array dimensions and local accumulators
HexMesh.f90	Adds OpenACC acceleration for wall distance calculations
split_template.f90	New template file for generating specialized split form subroutines
SpatialDiscretization.f90	Replaces monolithic function with template-based approach using preprocessor directives

Comments suppressed due to low confidence (1)

Solver/src/libs/mesh/Read_HDF5Mesh_HOPR.f90:1238

The signature of 'InitNodeMap' is missing the 'NodeToHOPRMap' parameter that is being allocated within the subroutine body at line 1243. This parameter must be added to the subroutine signature with intent(inout) to match the usage in 'AddToNodeMap' and 'FinishNodeMap'.

   subroutine InitNodeMap (TempNodes , HOPRNodeMap, nUniqueNodes)
      implicit none
      !--------------------------------------------
      real(kind=RP), allocatable, intent(inout) :: TempNodes(:,:)
      integer      , allocatable, intent(inout) :: HOPRNodeMap(:)
      integer                   , intent(in)    :: nUniqueNodes

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2025-11-03T07:19:45Z

Solver/src/libs/mesh/Read_HDF5Mesh_HOPR.f90

      integer, allocatable       :: HOPRNodeMap(:)       ! Map from the global node index of HORSES3D to the global node index of HOPR
      real(kind=RP), allocatable :: TempNodes(:,:)       ! Nodes read from file to be exported to self % nodes
      logical                    :: CurveCondition
+      integer, allocatable       :: NodeToHOPRMap(:)      


The variable 'NodeToHOPRMap' is declared locally in 'ConstructMesh_FromHDF5File_' but is used in module-level subroutines 'InitNodeMap', 'AddToNodeMap', and 'FinishNodeMap'. It should either be declared as a module-level variable or passed explicitly as a parameter to these subroutines. The current implementation will not compile because the variable is out of scope.

Copilot · 2025-11-03T07:19:45Z

Solver/src/libs/mesh/HexMesh.f90

+                        self % faces(eID) % geom % dWall(i, j) = sqrt(minimumDistance)
+                     end do
+                     !$acc end parallel loop
+                     !$acc update self(self % faces(eID) % geom % dWall(:j)) async(j)


Syntax error in OpenACC update directive: ':j' should be ':,j' to specify the array slice correctly. The current syntax is missing the comma separator for the array dimensions.

Suggested change

!$acc update self(self % faces(eID) % geom % dWall(:j)) async(j)

!$acc update self(self % faces(eID) % geom % dWall(:,j)) async(j)

Copilot · 2025-11-03T07:19:46Z

Solver/src/libs/mesh/MappedGeometry.f90

         real(kind=RP)  :: x(3)
+
+         real(kind=RP)  :: jGradXi_local(NDIM), jGradEta_local(NDIM), jGradZeta_local(NDIM), jacobian_local
+         real(kind=RP)  :: auxgrad_l1(NDIM), auxgrad_l2(NDIM), auxgrad_l3(NDIM)


The variables 'auxgrad_l1', 'auxgrad_l2', and 'auxgrad_l3' are declared but never used in the subroutine. These should be removed to avoid confusion and reduce memory allocation.

Suggested change

real(kind=RP) :: auxgrad_l1(NDIM), auxgrad_l2(NDIM), auxgrad_l3(NDIM)

Copilot · 2025-11-03T07:19:46Z

Solver/src/NavierStokesSolver/split_template.f90

+                      Q_acc = Q_acc + NodalStorage(mesh % elements(eID) % Nxyz(1)) % hatD(i,l) * mesh % elements(eID) % storage % contravariantFlux(eq,l,j,k,IX) + &
+                                      NodalStorage(mesh % elements(eID) % Nxyz(2)) % hatD(j,l) * mesh % elements(eID) % storage % contravariantFlux(eq,i,l,k,IY) + &
+                                      NodalStorage(mesh % elements(eID) % Nxyz(3)) % hatD(k,l) * mesh % elements(eID) % storage % contravariantFlux(eq,i,j,l,IZ)
+                   end do


Loop index mismatch: the loop variable 'l' iterates only up to 'mesh % elements(eID) % Nxyz(1)', but it's used to index contributions from all three dimensions (hatD for Ny and Nz as well). The contributions for IY and IZ directions should use separate loops with upper bounds Nxyz(2) and Nxyz(3) respectively, or the loop bound should be max(Nxyz(1), Nxyz(2), Nxyz(3)).

Suggested change

Q_acc = Q_acc + NodalStorage(mesh % elements(eID) % Nxyz(1)) % hatD(i,l) * mesh % elements(eID) % storage % contravariantFlux(eq,l,j,k,IX) + &

NodalStorage(mesh % elements(eID) % Nxyz(2)) % hatD(j,l) * mesh % elements(eID) % storage % contravariantFlux(eq,i,l,k,IY) + &

NodalStorage(mesh % elements(eID) % Nxyz(3)) % hatD(k,l) * mesh % elements(eID) % storage % contravariantFlux(eq,i,j,l,IZ)

end do

Q_acc = Q_acc + NodalStorage(mesh % elements(eID) % Nxyz(1)) % hatD(i,l) * mesh % elements(eID) % storage % contravariantFlux(eq,l,j,k,IX)

end do

!$acc loop seq

do l = 0, mesh % elements(eID) % Nxyz(2)

Q_acc = Q_acc + NodalStorage(mesh % elements(eID) % Nxyz(2)) % hatD(j,l) * mesh % elements(eID) % storage % contravariantFlux(eq,i,l,k,IY)

end do

!$acc loop seq

do l = 0, mesh % elements(eID) % Nxyz(3)

Q_acc = Q_acc + NodalStorage(mesh % elements(eID) % Nxyz(3)) % hatD(k,l) * mesh % elements(eID) % storage % contravariantFlux(eq,i,j,l,IZ)

end do

end do

…HOPR hdf5 meshes Co-authored-by: zalbanob <171132623+zalbanob@users.noreply.github.com> Co-authored-by: Gonzalo Rubio <g.rubio@upm.es>

zalbanob and others added 2 commits April 20, 2025 02:47

Optimizations

5057d1e

Added HOPR Optimization

dc8c4fe

loganoz requested review from AbbBallout, Dhueper, Jerryntk, hmarbona and oscarmarino October 30, 2025 16:34

loganoz requested a review from Copilot November 3, 2025 07:17

Copilot AI reviewed Nov 3, 2025

View reviewed changes

amrueda added a commit that referenced this pull request Nov 6, 2025

Cherry-picked changes made by Epicure BSC (#8) that are relevant for …

9e360c3

…HOPR hdf5 meshes Co-authored-by: zalbanob <171132623+zalbanob@users.noreply.github.com> Co-authored-by: Gonzalo Rubio <g.rubio@upm.es>

amrueda added a commit that referenced this pull request Nov 6, 2025

Attempt 2 to fix NodeToHOPRMap problem introduced in #8

adb05a8

amrueda added a commit that referenced this pull request Nov 6, 2025

Attempt 3 to fix NodeToHOPRMap problem introduced in #8

ba71f90

amrueda mentioned this pull request Nov 6, 2025

Epicure optimizations for mesh read-in #22

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Epicure bsc gpu#8

Epicure bsc gpu#8
loganoz wants to merge 2 commits intomainfrom
epicure_bsc_gpu

loganoz commented Oct 30, 2025

Uh oh!

Jerryntk commented Oct 30, 2025

Uh oh!

loganoz commented Oct 30, 2025

Uh oh!

Jerryntk commented Oct 30, 2025

Uh oh!

loganoz commented Oct 30, 2025

Uh oh!

Jerryntk commented Oct 30, 2025

Uh oh!

loganoz commented Oct 30, 2025

Uh oh!

Jerryntk commented Oct 30, 2025

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Nov 3, 2025

Uh oh!

Copilot AI Nov 3, 2025

Uh oh!

Copilot AI Nov 3, 2025

Uh oh!

Copilot AI Nov 3, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

	!$acc update self(self % faces(eID) % geom % dWall(:j)) async(j)
	!$acc update self(self % faces(eID) % geom % dWall(:,j)) async(j)

-                      Q_acc = Q_acc + NodalStorage(mesh % elements(eID) % Nxyz(1)) % hatD(i,l) * mesh % elements(eID) % storage % contravariantFlux(eq,l,j,k,IX) + &
-                                      NodalStorage(mesh % elements(eID) % Nxyz(2)) % hatD(j,l) * mesh % elements(eID) % storage % contravariantFlux(eq,i,l,k,IY) + &
-                                      NodalStorage(mesh % elements(eID) % Nxyz(3)) % hatD(k,l) * mesh % elements(eID) % storage % contravariantFlux(eq,i,j,l,IZ)
-                   end do
+                      Q_acc = Q_acc + NodalStorage(mesh % elements(eID) % Nxyz(1)) % hatD(i,l) * mesh % elements(eID) % storage % contravariantFlux(eq,l,j,k,IX)
+                   end do
+                   !$acc loop seq
+                   do l = 0, mesh % elements(eID) % Nxyz(2)
+                      Q_acc = Q_acc + NodalStorage(mesh % elements(eID) % Nxyz(2)) % hatD(j,l) * mesh % elements(eID) % storage % contravariantFlux(eq,i,l,k,IY)
+                   end do
+                   !$acc loop seq
+                   do l = 0, mesh % elements(eID) % Nxyz(3)
+                      Q_acc = Q_acc + NodalStorage(mesh % elements(eID) % Nxyz(3)) % hatD(k,l) * mesh % elements(eID) % storage % contravariantFlux(eq,i,j,l,IZ)
+                   end do
+                   end do

Conversation

loganoz commented Oct 30, 2025

Uh oh!

Jerryntk commented Oct 30, 2025

Uh oh!

loganoz commented Oct 30, 2025

Uh oh!

Jerryntk commented Oct 30, 2025

Uh oh!

loganoz commented Oct 30, 2025

Uh oh!

Jerryntk commented Oct 30, 2025

Uh oh!

loganoz commented Oct 30, 2025

Uh oh!

Jerryntk commented Oct 30, 2025

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull Request Overview

Reviewed Changes

Uh oh!

Copilot AI Nov 3, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Nov 3, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Nov 3, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Nov 3, 2025

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants