Fix/geometry scheme proc boundary by HendriceH · Pull Request #528 · exasim-project/NeoN

HendriceH · 2026-05-29T20:23:14Z

Motivation

WHAT is it, WHY it is needed. 😘

github-actions · 2026-05-29T20:25:23Z

Thank you for your PR, here are some useful tips:

If this is your first PR, please add yourself to the AUTHORS.md file
Use labels to change the workflow behavior. Set auto-fix to automatically apply fixes, full-ci for a comprehensive set of tests and sanitizer runs, ready-for-review to show that you're done and need a review.
I deployed a test documentation to https://exasim-project.com/NeoN/Build_PR_528
For full-ci builds benchmark results are available at https://github.com/exasim-project/NeoFOAM-BenchmarkData/528
Also check if the GPU pipelines on https://gitlab-ce.lrz.de/greole/neon/-/jobs are successful.

Copilot

Pull request overview

This PR fixes and expands finite-volume geometry scheme handling around processor boundaries, especially for distributed meshes and face-normal-gradient consumers.

Changes:

Adds BasicGeometryScheme regression tests, including MPI processor-boundary coverage.
Updates geometry scheme calculations for deltaCoeffs, nonOrthDeltaCoeffs, and correction vectors, including MPI neighbour exchanges.
Aligns uncorrected face-normal gradients with orthogonal deltaCoeffs and adjusts distributed tests to CPU-only execution.

Reviewed changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated 4 comments.

Show a summary per file

File	Description
`test/finiteVolume/CMakeLists.txt`	Adds stencil tests to the finite-volume test suite.
`test/finiteVolume/cellCentred/stencil/CMakeLists.txt`	Defines serial and MPI geometry-scheme tests.
`test/finiteVolume/cellCentred/stencil/basicGeometryScheme.cpp`	Adds analytical serial geometry-scheme regression tests.
`test/finiteVolume/cellCentred/stencil/basicGeometryScheme_distributed.cpp`	Adds MPI processor-boundary geometry checks.
`test/finiteVolume/cellCentred/auxiliary/coNum_distributed.cpp`	Restricts distributed Courant test to CPU executor.
`test/distributed/partitioning.cpp`	Restricts distributed partitioning test to CPU executor.
`test/distributed/operator.cpp`	Restricts distributed operator test to CPU executor.
`src/mesh/unstructured/distributedUnstructuredMesh.cpp`	Places partitioned 1D mesh slabs in global coordinates.
`src/finiteVolume/cellCentred/stencil/geometryScheme.cpp`	Passes precomputed non-orthogonal coefficients into correction-vector updates.
`src/finiteVolume/cellCentred/stencil/basicGeometryScheme.cpp`	Updates geometry coefficient and correction-vector producer logic, including MPI exchanges.
`src/finiteVolume/cellCentred/faceNormalGradient/uncorrected.cpp`	Switches uncorrected gradients to use orthogonal `deltaCoeffs`.
`src/finiteVolume/cellCentred/faceNormalGradient/limitedCorrected.cpp`	Adds Vec3 processor-boundary parity and zeros correction boundary data.
`src/finiteVolume/cellCentred/faceNormalGradient/corrected.cpp`	Adds Vec3 processor-boundary parity and zeros correction boundary data.
`include/NeoN/finiteVolume/cellCentred/stencil/geometryScheme.hpp`	Updates geometry scheme factory API and coefficient documentation.
`include/NeoN/finiteVolume/cellCentred/stencil/basicGeometryScheme.hpp`	Documents BasicGeometryScheme assumptions and updated API.
`include/NeoN/finiteVolume/cellCentred/faceNormalGradient/uncorrected.hpp`	Exposes orthogonal `deltaCoeffs` from uncorrected gradients.
`include/NeoN/finiteVolume/cellCentred/faceNormalGradient/faceNormalGradient.hpp`	Documents processor-boundary face-normal-gradient semantics.

Comments suppressed due to low confidence (1)

src/finiteVolume/cellCentred/stencil/basicGeometryScheme.cpp:431

Processor-boundary nonOrthDeltaCoeffs no longer applies the same over-relaxed clamp used for internal and physical-boundary faces. On a highly skewed processor face this can divide by a very small projected distance even though nonOrthDeltaClamp * |Cnei - Cown| should bound the denominator, so the MPI path can produce much larger coefficients than the serial/internal path for the same geometry.

        const auto dNei = exchangeProcOwnerDistance(exec, mesh_);
        const auto dNeiView = dNei.view();
        parallelFor(
            exec,
            {0, nProcBoundaryFaces},
            NEON_LAMBDA(const localIdx procFacei) {
                const localIdx bfi = nBoundaryFaces + procFacei;
                const Vec3 n = (1.0 / bFaceAreas[bfi]) * bFaceNormals[bfi];
                const Vec3 co = cellCenters[surfFaceCells[bfi]];
                const scalar dOwn = std::abs(n & (bFaceCenters[bfi] - co));
                nonOrthDeltaCoeffB[bfi] =
                    1.0 / std::max(dOwn + dNeiView[procFacei], scalar(ROOTVSMALL));

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

+ *   - processor-boundary deltaCoeffs/nonOrthDeltaCoeffs use the face-normal-projected
+ *     owner+neighbour distance (exact on orthogonal proc faces); the non-orthogonal correction
+ *     is not applied at processor faces (see FaceNormalGradient docs).


+    // Processor-boundary semantics (review N3/N4): at processor faces every scheme
+    // (uncorrected, corrected, limitedCorrected) applies the orthogonal part only,
+    //   snGrad = deltaCoeffs * (phiNei - phiOwn),
+    // and the scalar and Vec3 code paths are kept in parity. The non-orthogonal
+    // correction across rank boundaries is NOT applied: the producer
+    // (BasicGeometryScheme) leaves processor-face correction vectors at zero, so
+    // producer and consumer are self-consistent. Recovering the proc non-orth
+    // correction needs the neighbour cell centre exchanged and is deferred.


+        // Processor-boundary parity with the scalar path (review N3). Applies the
+        // orthogonal part only — the non-orthogonal correction at proc faces is
+        // deferred (review N4): the producer leaves proc-corrVec at zero, so this
+        // is self-consistent. See FaceNormalGradient header for the contract.


+        // Processor-boundary parity with the scalar path (review N3). Orthogonal
+        // part only; non-orth proc correction deferred (review N4). The limiter
+        // applies only to the internal correction, so proc faces reduce to the
+        // uncorrected form exactly as in the scalar path.


Adds a 3D-uniform-mesh analytical regression test for BasicGeometryScheme, the first direct stencil test on a non-1D mesh. Asserts weights (0.5 internal, 1.0 boundary), deltaCoeffs (1/h internal, 2/h boundary), nonOrthDeltaCoeffs (== deltaCoeffs on an orthogonal mesh), and zero non-orth correction. Closes the highest-leverage test gap from REVIEW_basicGeometrySchemes.md (T1 / H5): every prior stencil test ran on create1DUniformMesh, the unique mesh on which geometric-scheme bugs are invisible. Refs: T1, H5

…ed, safety + dedup Producer-side remediation from REVIEW_basicGeometrySchemes.md: - N1/N2: uncorrected snGrad now uses the orthogonal deltaCoeffs (1/|d|) for both its value (computeFaceNormalGrad) and its deltaCoeffs() accessor, matching OpenFOAM's uncorrectedSnGrad. This wires the previously-dead GeometryScheme::deltaCoeffs() to a real consumer. No-op on orthogonal/1D meshes. - GEOM-03 (H1-remainder): updateDeltaCoeffs now writes processor-boundary entries (projected dOwn+dNei; exact on orthogonal proc faces, documented limitation on skewed). Previously left at zero -> zero diffusive flux across ranks. - H2: corrected the nonOrthDeltaCoeff doxygen formula to 1/(faceNormal . d). - H3: ROOTVSMALL floor added to the non-orth std::max(orthoDist, 0.05|d|) guards. - M3: updateNonOrthCorrectionVec3s reads the precomputed nonOrthDeltaCoeffs field. - M4: removed the dead fill(nonOrthDeltaCoeffs, 0) before a full overwrite. - M2: removed [[maybe_unused]] from parameters that are used. Refs: N1, N2, H1, H2, H3, M2, M3, M4

…cit contracts snGrad consumer remediation from REVIEW_basicGeometrySchemes.md: - N3: the Vec3 paths of computeCorrectedFaceNormalGrad and computeLimitedCorrectedFaceNormalGrad now contain the processor-boundary loop their scalar siblings already had. snGrad of a VolumeField<Vec3> (e.g. U) now writes proc-face entries instead of leaving them zero under MPI. - N4: documented the processor-boundary contract in the FaceNormalGradient interface — all schemes apply the orthogonal part only at proc faces; the non-orth proc correction is deferred and the producer leaves proc-corrVec zero, so producer and consumer are self-consistent. - N5: computeCorrectionTerm / computeLimitedCorrectionTerm now explicitly zero corrField.boundaryData() (both scalar and Vec3) rather than relying on the field's zero-init, so a future Laplacian RHS change cannot read undefined data. - N6: the producer zeroes nonOrthCorrectionVec3s over the proc range too, making the 'corrVec == 0 on all boundary faces' contract explicit. Refs: N3, N4, N5, N6

…U-only MPI - T2: basicGeometryScheme_distributed (MPI_SIZE 3) asserts processor-boundary weights (0.5), deltaCoeffs (1/h), nonOrthDeltaCoeffs (1/h) and zero corrVec on a partitioned 1D mesh. Fails against the pre-fix code (proc deltaCoeffs == 0) and guards the GEOM-03 / H1 remainder. - T3: sheared (non-orthogonal) cube test asserting deltaCoeffs and nonOrthDeltaCoeffs genuinely diverge and the non-orth correction is non-zero (review N2). - T5: identity guard that 'uncorrected' exposes the orthogonal deltaCoeffs, not nonOrthDeltaCoeffs (pins the N1/N2 decision). - One GPU on this machine: every distributed test now uses CPUExecutor only. Retrofitted coNum_distributed, distributed/operator and distributed/partitioning off allAvailableExecutor (which included the GPU). Refs: T2, T3, T5, H1

Cleanup tier of REVIEW_basicGeometrySchemes.md (+ predecessor M/L items): - H4: exchangeProcOwnerDistance computes the owner projected distance on the device and copies only the nProcFaces exchanged scalars to the host, instead of a full mesh.cellCenters() D->H copy (GB-scale on industrial meshes) plus four boundary- array host copies. - M5: typed mpi::isend<scalar>/irecv<scalar> with a dedicated tag (not raw-byte char sends on tag 0). - M6: the exchange result is allocated directly on the executor (no SerialExecutor round-trip); L7: early-return when there are no processor faces. - L1: the 0.05 over-relaxed clamp is now the named constant nonOrthDeltaClamp. - L2: updateNonOrthDeltaCoeffs boundary loop indexes the compressed boundary mesh, matching updateDeltaCoeffs/updateWeights (avoids OF-full-vs-compressed mismatch). - M7/L6: dropped the dead UnstructuredMesh& parameter from GeometrySchemeFactory. - M8: justified the std::abs on projections in updateWeights. - L4: BasicGeometryScheme + overrides marked final; L8: class-level doxygen. - M1/L3: GeometryScheme::reset() no longer logs a warning every update. The resize(0) of face/cell centres is RETAINED — it is an intentional device-memory optimisation (free geometry the scheme no longer needs), now documented, with the const_cast justified. Refs: H4, M1, M5, M6, M7, M8, L1-L8

…l-centre exchange (v2a) Producer side of the v2 proc-boundary work: - Add exchangeProcNeighbourCellCentre: halo-exchanges the owner cell centre across processor boundaries so each rank knows the neighbouring cell centre Cnei. - updateDeltaCoeffs proc faces now use the exact euclidean 1/|Cnei - Cown| (OpenFOAM's coupled deltaCoeffs) instead of the face-normal-projected distance — exact on non-orthogonal processor faces. Makes uncorrected snGrad / the Laplacian OF-consistent across rank boundaries on skewed meshes. - updateNonOrthCorrectionVec3s now computes the processor-face correction vector (n - delta*nonOrthDeltaCoeff) instead of zeroing it — the producer half of the non-orthogonal proc correction (N4). Non-zero only on non-orthogonal proc faces. - create1DUniformMeshPart now places each rank's slab at its true global position (xOffset = rank*rightBoundary). A faithful decomposition shares one global coordinate frame (as decomposePar produces), which the cross-rank coordinate exchange requires; previously every rank sat at [0, rightBoundary]. Only positions shift, so fluxes/volumes/Courant/gradients/projected distances are unchanged. T2 (now global-coord) asserts the exact proc deltaCoeffs; coNum/operator/partitioning unaffected. Consumer side (applying the proc correction in corrected/limitedCorrected) follows in v2b. Refs: v2a, N4 (producer)

…revert N2) OpenFOAM's uncorrectedSnGrad returns mesh().nonOrthDeltaCoeffs() (1/(n.d)), not the orthogonal deltaCoeffs (1/|d|). The review's N2 premise -- that OF uses 1/|d| -- was verified false against the OF source (uncorrectedSnGrad.H), so the Phase 4 N2 change that rewired 'uncorrected' to deltaCoeffs was a regression on non-orthogonal meshes. This restores NeoN's original OF-faithful behaviour in both the accessor and the kernel. Updates test T5 to assert the corrected wiring: uncorrected.deltaCoeffs() == GeometryScheme::nonOrthDeltaCoeffs() (and distinct from the orthogonal deltaCoeffs). Refs: N2, T5

…mer (N1) After the N2 revert every snGrad scheme (uncorrected/corrected/limitedCorrected) uses nonOrthDeltaCoeffs, leaving GeometryScheme::deltaCoeffs() (orthogonal 1/|d|) without a production consumer. Per the N1 decision it stays computed and exposed for API completeness and potential future consumers; document that explicitly so the dead-looking accessor is not mistaken for a wiring bug. Refs: N1

…rected/limited snGrad (v2b) corrected and limitedCorrected snGrad now apply the full non-orthogonal correction (corrVec . interpolate(grad)) at processor faces, matching OpenFOAM, instead of the orthogonal-only proc treatment. The interpolated cell gradient needs the neighbour cell gradient across the rank boundary, which is halo-exchanged via the new templated processorGradientHalo.hpp (Vec3 + Tensor). Guarded by NF_WITH_MPI_SUPPORT. Serial + NeoN MPI regression green; matches OpenFOAM on serial non-orthogonal meshes. Distributed OF-validation on a non-orthogonal decomposed mesh is pending a pre-existing distributed proc-halo (comm-pattern) fix tracked separately; the NeoFOAM snGrad_distributed test documents and skips that case for now. Refs: N4, v2b

greole · 2026-05-30T08:46:54Z

+        );
+    }
+    if (!requests.empty())
+        MPI_Waitall(static_cast<int>(requests.size()), requests.data(), MPI_STATUSES_IGNORE);


No raw MPI calls here, they should use the corresponding MPI/operator implementation

…cotch) Three compounding defects broke the distributed VolumeField proc halo under scotch / non-X-split decompositions where a rank owns more than one proc patch: - BoundaryData::value() calls waitAll(); setProcBoundaryValue seeded the owner cell through value(), so seeding proc patch N+1 eagerly drained and cleared patch N's still-in-flight exchange -> the 2nd proc patch on each rank never received (halo == owner). Add a non-draining valueNoWait() seed accessor and a single post-loop waitAll() drain in correctBoundaryConditions (batch post-then-wait). A deterministic per-patch MPI tag is added as hardening. - gaussGreenGrad::computeGradTensor (Vec3/Tensor path) was missing the processor-face flux loop the scalar path has -> corrected/limited Vec3 snGrad was wrong at proc-adjacent cells. Add the proc-face contribution. - distributedUnstructuredMesh::computeCommunicationPattern: accumulate sendCounts[targetRank] += / displacements (was assignment) so multiple patches to the same neighbour rank no longer collide. Validated via test_snGrad_distributed on the sheared scotch mesh, 3 ranks, CPU.

…nd docs - Add mpi::waitAll (pointer+count and std::vector overloads) to core/mpi and replace the raw MPI_Waitall calls in processorGradientHalo and basicGeometryScheme with it, so consumer code goes through the NeoN MPI operator wrappers (MPI_Waitall now appears only inside the wrapper itself). - Update the BasicGeometryScheme class documentation to match current behaviour: processor deltaCoeffs are 1/|Cnei-Cown| via the halo-exchanged neighbour cell centre (exact on non-orthogonal processor faces), and the non-orthogonal correction is applied at processor faces. - Reword comments to be framework-neutral: drop OpenFOAM / fvPatch references in favour of general finite-volume descriptions. The NeoN backend does not depend on OpenFOAM. No behaviour change; mpi::waitAll wraps the identical MPI_Waitall. Tests: snGrad_distributed (scotch, 3 ranks) plus serial snGrad/stencils/gaussGreenDiv all pass.

BDF1 and BDF2 boundary kernels now compute the ddt (Rhie-Chow temporal) flux correction on processor faces. - Use compressed mesh.boundaryMesh().faceNormals() indexed by bfi, not the OF-full mesh.faceNormals()[nInternalFaces+bfi] (wrong face when empty/wedge patches are present). - Loop over {0, nBoundaryFaces + nProcBoundaryFaces} to include the processor tail of boundaryData().value(), which was previously left unwritten -> garbage proc-face fluxCorr fed phiHbyA and caused an unbounded Courant runaway on distributed runs. fluxCorr processor patch is a 'calculated' BC (no-op correctBoundaryCondition), so the locally computed proc-tail value survives correctBoundaryConditions(). Adapted for the split internal/boundary SurfaceField storage (post fa4fdf1); not the outdated full-storage form on feat/gpu-distributed. Verified on tutorials/tiltedCube (snappyHexMesh tilted cube + layers, scotch-4): removes the parallel-vs-serial Courant runaway.

setProcBoundaryValue for a SurfaceField wrongly used volume semantics: value[i] = internalVector[faceCells[i]]; internalVector() of a SurfaceField is FACE data (size nInternalFaces) while faceCells[i] is an owner-CELL id, so this read an unrelated internal face's flux and shipped it to the neighbour, corrupting the exchanged proc-tail on any rank with >= 2 processor patches (introduced by 4389a5c as a verbatim copy of the volume processor BC). The surface proc-tail already holds the correct local face value (written by flux()/updateFaceVelocity()/interpolation/constructFrom), so the BC must not touch value_ at all -- only normalise the unused mixed-BC fields (refGrad/valueFraction/refValue). value_ is not accessed, so no pending proc-patch exchange is drained. Verified on tutorials/tiltedCube (serial-CPU vs parallel-CPU oracle, executor drift proven zero): step-1 max Courant now matches serial exactly for scotch and hierarchical cuts (was 1.7-6.2x); np=2 single-patch near-exact. Distributed ctests pass 4/4.

…edBCs rAU/HbyA are built with createExtrapolatedBCs, which assigned an 'extrapolated' BC to every patch including the processor (coupled) patches. An extrapolated processor-tail holds the owner cell value, not the neighbour halo, so linear interpolation of rAU to processor faces degenerated to w*own + (1-w)*own = own. The interpolated rAUf then differed across each shared processor face, making the assembled pressure matrix non-symmetric; CG converged to a wrong pressure with small continuity (distributed neoIcoFoam divergence on cuts with two or more processor patches per rank). createExtrapolatedBCs now assigns the 'processor' BC (neighbour-halo exchange) to the trailing processor patches and 'extrapolated' only to physical patches. No-op in serial. Verified on tiltedCube (np4 scotch): processor-face off-diagonal coupling symmetry 1.14 -> 2e-16; per-cell pressure error versus serial 0.59 -> 0.028 at step 1; serial-vs-parallel trajectory now bounded and decaying. All 10 distributed ctests pass.

greole requested a review from Copilot May 30, 2026 04:35

Copilot started reviewing on behalf of greole May 30, 2026 04:35 View session

Copilot AI reviewed May 30, 2026

View reviewed changes

HendriceH added 9 commits May 30, 2026 09:46

HendriceH force-pushed the fix/geometrySchemeProcBoundary branch from 10b5d5a to 833a8bc Compare May 30, 2026 07:46

greole reviewed May 30, 2026

View reviewed changes

HendriceH added 4 commits May 30, 2026 22:40

HendriceH changed the base branch from enh/distributedGko to develop May 31, 2026 15:13

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix/geometry scheme proc boundary#528

Fix/geometry scheme proc boundary#528
HendriceH wants to merge 14 commits into
developfrom
fix/geometrySchemeProcBoundary

HendriceH commented May 29, 2026

Uh oh!

github-actions Bot commented May 29, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

greole May 30, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

HendriceH commented May 29, 2026

Motivation

Uh oh!

github-actions Bot commented May 29, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

greole May 30, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants