Conversation
|
In the split form templates please check the consistency of the signs and the initialisation of Qdot = 0 compared to the standard code. To make everything compatible for the mu solver, we had to adjust some integral operations. This is really important. The changes made to integrate the mu solver lead to the aforementioned changes in the ns solver. |
|
I think we should have splitforms test cases working before accepting this. Currently only standard form is checked. |
|
I would suggest to merge first the mesh related changes and on a second step the split form changes. I was holding them off because of the mu integration. |
|
You think mesh related are safe then? |
|
I havent checked them all. We need to check it and understand the changes, aside from verification through testing. |
|
100% agree |
|
I propose that at least one from the team should check it thoroughly to be aware exactly of what the changed code does. |
There was a problem hiding this comment.
Pull Request Overview
This PR implements performance optimizations across several computational mesh processing routines, introducing algorithmic improvements, refactoring duplicated code through templates, and adding GPU acceleration support via OpenACC directives.
- Optimizes node lookup in HOPR mesh reading by replacing O(n) linear search with O(1) hash map lookup
- Refactors mesh partitioning to use mask-based node uniqueness detection instead of nested loops
- Consolidates duplicated split volumetric contribution functions into a single template with preprocessor macros
- Optimizes metric computation by reducing array dimensions and introducing local accumulator variables
- Adds OpenACC directives for GPU acceleration of wall distance computations
Reviewed Changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| RiemannSolvers_NS.f90 | Exports two-point flux functions for use in split form template |
| Read_HDF5Mesh_HOPR.f90 | Introduces reverse lookup map for O(1) node deduplication |
| MeshPartitioning.f90 | Replaces nested loop node search with mask-based approach |
| MappedGeometry.f90 | Refactors metric computation with optimized array dimensions and local accumulators |
| HexMesh.f90 | Adds OpenACC acceleration for wall distance calculations |
| split_template.f90 | New template file for generating specialized split form subroutines |
| SpatialDiscretization.f90 | Replaces monolithic function with template-based approach using preprocessor directives |
Comments suppressed due to low confidence (1)
Solver/src/libs/mesh/Read_HDF5Mesh_HOPR.f90:1238
- The signature of 'InitNodeMap' is missing the 'NodeToHOPRMap' parameter that is being allocated within the subroutine body at line 1243. This parameter must be added to the subroutine signature with intent(inout) to match the usage in 'AddToNodeMap' and 'FinishNodeMap'.
subroutine InitNodeMap (TempNodes , HOPRNodeMap, nUniqueNodes)
implicit none
!--------------------------------------------
real(kind=RP), allocatable, intent(inout) :: TempNodes(:,:)
integer , allocatable, intent(inout) :: HOPRNodeMap(:)
integer , intent(in) :: nUniqueNodes
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| integer, allocatable :: HOPRNodeMap(:) ! Map from the global node index of HORSES3D to the global node index of HOPR | ||
| real(kind=RP), allocatable :: TempNodes(:,:) ! Nodes read from file to be exported to self % nodes | ||
| logical :: CurveCondition | ||
| integer, allocatable :: NodeToHOPRMap(:) |
There was a problem hiding this comment.
The variable 'NodeToHOPRMap' is declared locally in 'ConstructMesh_FromHDF5File_' but is used in module-level subroutines 'InitNodeMap', 'AddToNodeMap', and 'FinishNodeMap'. It should either be declared as a module-level variable or passed explicitly as a parameter to these subroutines. The current implementation will not compile because the variable is out of scope.
| self % faces(eID) % geom % dWall(i, j) = sqrt(minimumDistance) | ||
| end do | ||
| !$acc end parallel loop | ||
| !$acc update self(self % faces(eID) % geom % dWall(:j)) async(j) |
There was a problem hiding this comment.
Syntax error in OpenACC update directive: ':j' should be ':,j' to specify the array slice correctly. The current syntax is missing the comma separator for the array dimensions.
| !$acc update self(self % faces(eID) % geom % dWall(:j)) async(j) | |
| !$acc update self(self % faces(eID) % geom % dWall(:,j)) async(j) |
| real(kind=RP) :: x(3) | ||
|
|
||
| real(kind=RP) :: jGradXi_local(NDIM), jGradEta_local(NDIM), jGradZeta_local(NDIM), jacobian_local | ||
| real(kind=RP) :: auxgrad_l1(NDIM), auxgrad_l2(NDIM), auxgrad_l3(NDIM) |
There was a problem hiding this comment.
The variables 'auxgrad_l1', 'auxgrad_l2', and 'auxgrad_l3' are declared but never used in the subroutine. These should be removed to avoid confusion and reduce memory allocation.
| real(kind=RP) :: auxgrad_l1(NDIM), auxgrad_l2(NDIM), auxgrad_l3(NDIM) |
| Q_acc = Q_acc + NodalStorage(mesh % elements(eID) % Nxyz(1)) % hatD(i,l) * mesh % elements(eID) % storage % contravariantFlux(eq,l,j,k,IX) + & | ||
| NodalStorage(mesh % elements(eID) % Nxyz(2)) % hatD(j,l) * mesh % elements(eID) % storage % contravariantFlux(eq,i,l,k,IY) + & | ||
| NodalStorage(mesh % elements(eID) % Nxyz(3)) % hatD(k,l) * mesh % elements(eID) % storage % contravariantFlux(eq,i,j,l,IZ) | ||
| end do |
There was a problem hiding this comment.
Loop index mismatch: the loop variable 'l' iterates only up to 'mesh % elements(eID) % Nxyz(1)', but it's used to index contributions from all three dimensions (hatD for Ny and Nz as well). The contributions for IY and IZ directions should use separate loops with upper bounds Nxyz(2) and Nxyz(3) respectively, or the loop bound should be max(Nxyz(1), Nxyz(2), Nxyz(3)).
| Q_acc = Q_acc + NodalStorage(mesh % elements(eID) % Nxyz(1)) % hatD(i,l) * mesh % elements(eID) % storage % contravariantFlux(eq,l,j,k,IX) + & | |
| NodalStorage(mesh % elements(eID) % Nxyz(2)) % hatD(j,l) * mesh % elements(eID) % storage % contravariantFlux(eq,i,l,k,IY) + & | |
| NodalStorage(mesh % elements(eID) % Nxyz(3)) % hatD(k,l) * mesh % elements(eID) % storage % contravariantFlux(eq,i,j,l,IZ) | |
| end do | |
| Q_acc = Q_acc + NodalStorage(mesh % elements(eID) % Nxyz(1)) % hatD(i,l) * mesh % elements(eID) % storage % contravariantFlux(eq,l,j,k,IX) | |
| end do | |
| !$acc loop seq | |
| do l = 0, mesh % elements(eID) % Nxyz(2) | |
| Q_acc = Q_acc + NodalStorage(mesh % elements(eID) % Nxyz(2)) % hatD(j,l) * mesh % elements(eID) % storage % contravariantFlux(eq,i,l,k,IY) | |
| end do | |
| !$acc loop seq | |
| do l = 0, mesh % elements(eID) % Nxyz(3) | |
| Q_acc = Q_acc + NodalStorage(mesh % elements(eID) % Nxyz(3)) % hatD(k,l) * mesh % elements(eID) % storage % contravariantFlux(eq,i,j,l,IZ) | |
| end do | |
| end do |
…HOPR hdf5 meshes Co-authored-by: zalbanob <171132623+zalbanob@users.noreply.github.com> Co-authored-by: Gonzalo Rubio <g.rubio@upm.es>
After today's meeting, I thought that this might be a good idea. I took epicure_bsc_gpu branch directly from their fork and brought it here, without accepting the pull request to Gerasimos_gpu branch in the main folder.