Fix/geometry scheme proc boundary#528
Open
HendriceH wants to merge 14 commits into
Open
Conversation
|
Thank you for your PR, here are some useful tips:
|
Contributor
There was a problem hiding this comment.
Pull request overview
This PR fixes and expands finite-volume geometry scheme handling around processor boundaries, especially for distributed meshes and face-normal-gradient consumers.
Changes:
- Adds BasicGeometryScheme regression tests, including MPI processor-boundary coverage.
- Updates geometry scheme calculations for
deltaCoeffs,nonOrthDeltaCoeffs, and correction vectors, including MPI neighbour exchanges. - Aligns uncorrected face-normal gradients with orthogonal
deltaCoeffsand adjusts distributed tests to CPU-only execution.
Reviewed changes
Copilot reviewed 17 out of 17 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
test/finiteVolume/CMakeLists.txt |
Adds stencil tests to the finite-volume test suite. |
test/finiteVolume/cellCentred/stencil/CMakeLists.txt |
Defines serial and MPI geometry-scheme tests. |
test/finiteVolume/cellCentred/stencil/basicGeometryScheme.cpp |
Adds analytical serial geometry-scheme regression tests. |
test/finiteVolume/cellCentred/stencil/basicGeometryScheme_distributed.cpp |
Adds MPI processor-boundary geometry checks. |
test/finiteVolume/cellCentred/auxiliary/coNum_distributed.cpp |
Restricts distributed Courant test to CPU executor. |
test/distributed/partitioning.cpp |
Restricts distributed partitioning test to CPU executor. |
test/distributed/operator.cpp |
Restricts distributed operator test to CPU executor. |
src/mesh/unstructured/distributedUnstructuredMesh.cpp |
Places partitioned 1D mesh slabs in global coordinates. |
src/finiteVolume/cellCentred/stencil/geometryScheme.cpp |
Passes precomputed non-orthogonal coefficients into correction-vector updates. |
src/finiteVolume/cellCentred/stencil/basicGeometryScheme.cpp |
Updates geometry coefficient and correction-vector producer logic, including MPI exchanges. |
src/finiteVolume/cellCentred/faceNormalGradient/uncorrected.cpp |
Switches uncorrected gradients to use orthogonal deltaCoeffs. |
src/finiteVolume/cellCentred/faceNormalGradient/limitedCorrected.cpp |
Adds Vec3 processor-boundary parity and zeros correction boundary data. |
src/finiteVolume/cellCentred/faceNormalGradient/corrected.cpp |
Adds Vec3 processor-boundary parity and zeros correction boundary data. |
include/NeoN/finiteVolume/cellCentred/stencil/geometryScheme.hpp |
Updates geometry scheme factory API and coefficient documentation. |
include/NeoN/finiteVolume/cellCentred/stencil/basicGeometryScheme.hpp |
Documents BasicGeometryScheme assumptions and updated API. |
include/NeoN/finiteVolume/cellCentred/faceNormalGradient/uncorrected.hpp |
Exposes orthogonal deltaCoeffs from uncorrected gradients. |
include/NeoN/finiteVolume/cellCentred/faceNormalGradient/faceNormalGradient.hpp |
Documents processor-boundary face-normal-gradient semantics. |
Comments suppressed due to low confidence (1)
src/finiteVolume/cellCentred/stencil/basicGeometryScheme.cpp:431
- Processor-boundary
nonOrthDeltaCoeffsno longer applies the same over-relaxed clamp used for internal and physical-boundary faces. On a highly skewed processor face this can divide by a very small projected distance even thoughnonOrthDeltaClamp * |Cnei - Cown|should bound the denominator, so the MPI path can produce much larger coefficients than the serial/internal path for the same geometry.
const auto dNei = exchangeProcOwnerDistance(exec, mesh_);
const auto dNeiView = dNei.view();
parallelFor(
exec,
{0, nProcBoundaryFaces},
NEON_LAMBDA(const localIdx procFacei) {
const localIdx bfi = nBoundaryFaces + procFacei;
const Vec3 n = (1.0 / bFaceAreas[bfi]) * bFaceNormals[bfi];
const Vec3 co = cellCenters[surfFaceCells[bfi]];
const scalar dOwn = std::abs(n & (bFaceCenters[bfi] - co));
nonOrthDeltaCoeffB[bfi] =
1.0 / std::max(dOwn + dNeiView[procFacei], scalar(ROOTVSMALL));
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Comment on lines
+25
to
+27
| * - processor-boundary deltaCoeffs/nonOrthDeltaCoeffs use the face-normal-projected | ||
| * owner+neighbour distance (exact on orthogonal proc faces); the non-orthogonal correction | ||
| * is not applied at processor faces (see FaceNormalGradient docs). |
Comment on lines
+52
to
+59
| // Processor-boundary semantics (review N3/N4): at processor faces every scheme | ||
| // (uncorrected, corrected, limitedCorrected) applies the orthogonal part only, | ||
| // snGrad = deltaCoeffs * (phiNei - phiOwn), | ||
| // and the scalar and Vec3 code paths are kept in parity. The non-orthogonal | ||
| // correction across rank boundaries is NOT applied: the producer | ||
| // (BasicGeometryScheme) leaves processor-face correction vectors at zero, so | ||
| // producer and consumer are self-consistent. Recovering the proc non-orth | ||
| // correction needs the neighbour cell centre exchanged and is deferred. |
Comment on lines
+157
to
+160
| // Processor-boundary parity with the scalar path (review N3). Applies the | ||
| // orthogonal part only — the non-orthogonal correction at proc faces is | ||
| // deferred (review N4): the producer leaves proc-corrVec at zero, so this | ||
| // is self-consistent. See FaceNormalGradient header for the contract. |
Comment on lines
+181
to
+184
| // Processor-boundary parity with the scalar path (review N3). Orthogonal | ||
| // part only; non-orth proc correction deferred (review N4). The limiter | ||
| // applies only to the internal correction, so proc faces reduce to the | ||
| // uncorrected form exactly as in the scalar path. |
Adds a 3D-uniform-mesh analytical regression test for BasicGeometryScheme, the first direct stencil test on a non-1D mesh. Asserts weights (0.5 internal, 1.0 boundary), deltaCoeffs (1/h internal, 2/h boundary), nonOrthDeltaCoeffs (== deltaCoeffs on an orthogonal mesh), and zero non-orth correction. Closes the highest-leverage test gap from REVIEW_basicGeometrySchemes.md (T1 / H5): every prior stencil test ran on create1DUniformMesh, the unique mesh on which geometric-scheme bugs are invisible. Refs: T1, H5
…ed, safety + dedup Producer-side remediation from REVIEW_basicGeometrySchemes.md: - N1/N2: uncorrected snGrad now uses the orthogonal deltaCoeffs (1/|d|) for both its value (computeFaceNormalGrad) and its deltaCoeffs() accessor, matching OpenFOAM's uncorrectedSnGrad. This wires the previously-dead GeometryScheme::deltaCoeffs() to a real consumer. No-op on orthogonal/1D meshes. - GEOM-03 (H1-remainder): updateDeltaCoeffs now writes processor-boundary entries (projected dOwn+dNei; exact on orthogonal proc faces, documented limitation on skewed). Previously left at zero -> zero diffusive flux across ranks. - H2: corrected the nonOrthDeltaCoeff doxygen formula to 1/(faceNormal . d). - H3: ROOTVSMALL floor added to the non-orth std::max(orthoDist, 0.05|d|) guards. - M3: updateNonOrthCorrectionVec3s reads the precomputed nonOrthDeltaCoeffs field. - M4: removed the dead fill(nonOrthDeltaCoeffs, 0) before a full overwrite. - M2: removed [[maybe_unused]] from parameters that are used. Refs: N1, N2, H1, H2, H3, M2, M3, M4
…cit contracts snGrad consumer remediation from REVIEW_basicGeometrySchemes.md: - N3: the Vec3 paths of computeCorrectedFaceNormalGrad and computeLimitedCorrectedFaceNormalGrad now contain the processor-boundary loop their scalar siblings already had. snGrad of a VolumeField<Vec3> (e.g. U) now writes proc-face entries instead of leaving them zero under MPI. - N4: documented the processor-boundary contract in the FaceNormalGradient interface — all schemes apply the orthogonal part only at proc faces; the non-orth proc correction is deferred and the producer leaves proc-corrVec zero, so producer and consumer are self-consistent. - N5: computeCorrectionTerm / computeLimitedCorrectionTerm now explicitly zero corrField.boundaryData() (both scalar and Vec3) rather than relying on the field's zero-init, so a future Laplacian RHS change cannot read undefined data. - N6: the producer zeroes nonOrthCorrectionVec3s over the proc range too, making the 'corrVec == 0 on all boundary faces' contract explicit. Refs: N3, N4, N5, N6
…U-only MPI - T2: basicGeometryScheme_distributed (MPI_SIZE 3) asserts processor-boundary weights (0.5), deltaCoeffs (1/h), nonOrthDeltaCoeffs (1/h) and zero corrVec on a partitioned 1D mesh. Fails against the pre-fix code (proc deltaCoeffs == 0) and guards the GEOM-03 / H1 remainder. - T3: sheared (non-orthogonal) cube test asserting deltaCoeffs and nonOrthDeltaCoeffs genuinely diverge and the non-orth correction is non-zero (review N2). - T5: identity guard that 'uncorrected' exposes the orthogonal deltaCoeffs, not nonOrthDeltaCoeffs (pins the N1/N2 decision). - One GPU on this machine: every distributed test now uses CPUExecutor only. Retrofitted coNum_distributed, distributed/operator and distributed/partitioning off allAvailableExecutor (which included the GPU). Refs: T2, T3, T5, H1
Cleanup tier of REVIEW_basicGeometrySchemes.md (+ predecessor M/L items): - H4: exchangeProcOwnerDistance computes the owner projected distance on the device and copies only the nProcFaces exchanged scalars to the host, instead of a full mesh.cellCenters() D->H copy (GB-scale on industrial meshes) plus four boundary- array host copies. - M5: typed mpi::isend<scalar>/irecv<scalar> with a dedicated tag (not raw-byte char sends on tag 0). - M6: the exchange result is allocated directly on the executor (no SerialExecutor round-trip); L7: early-return when there are no processor faces. - L1: the 0.05 over-relaxed clamp is now the named constant nonOrthDeltaClamp. - L2: updateNonOrthDeltaCoeffs boundary loop indexes the compressed boundary mesh, matching updateDeltaCoeffs/updateWeights (avoids OF-full-vs-compressed mismatch). - M7/L6: dropped the dead UnstructuredMesh& parameter from GeometrySchemeFactory. - M8: justified the std::abs on projections in updateWeights. - L4: BasicGeometryScheme + overrides marked final; L8: class-level doxygen. - M1/L3: GeometryScheme::reset() no longer logs a warning every update. The resize(0) of face/cell centres is RETAINED — it is an intentional device-memory optimisation (free geometry the scheme no longer needs), now documented, with the const_cast justified. Refs: H4, M1, M5, M6, M7, M8, L1-L8
…l-centre exchange (v2a) Producer side of the v2 proc-boundary work: - Add exchangeProcNeighbourCellCentre: halo-exchanges the owner cell centre across processor boundaries so each rank knows the neighbouring cell centre Cnei. - updateDeltaCoeffs proc faces now use the exact euclidean 1/|Cnei - Cown| (OpenFOAM's coupled deltaCoeffs) instead of the face-normal-projected distance — exact on non-orthogonal processor faces. Makes uncorrected snGrad / the Laplacian OF-consistent across rank boundaries on skewed meshes. - updateNonOrthCorrectionVec3s now computes the processor-face correction vector (n - delta*nonOrthDeltaCoeff) instead of zeroing it — the producer half of the non-orthogonal proc correction (N4). Non-zero only on non-orthogonal proc faces. - create1DUniformMeshPart now places each rank's slab at its true global position (xOffset = rank*rightBoundary). A faithful decomposition shares one global coordinate frame (as decomposePar produces), which the cross-rank coordinate exchange requires; previously every rank sat at [0, rightBoundary]. Only positions shift, so fluxes/volumes/Courant/gradients/projected distances are unchanged. T2 (now global-coord) asserts the exact proc deltaCoeffs; coNum/operator/partitioning unaffected. Consumer side (applying the proc correction in corrected/limitedCorrected) follows in v2b. Refs: v2a, N4 (producer)
…revert N2) OpenFOAM's uncorrectedSnGrad returns mesh().nonOrthDeltaCoeffs() (1/(n.d)), not the orthogonal deltaCoeffs (1/|d|). The review's N2 premise -- that OF uses 1/|d| -- was verified false against the OF source (uncorrectedSnGrad.H), so the Phase 4 N2 change that rewired 'uncorrected' to deltaCoeffs was a regression on non-orthogonal meshes. This restores NeoN's original OF-faithful behaviour in both the accessor and the kernel. Updates test T5 to assert the corrected wiring: uncorrected.deltaCoeffs() == GeometryScheme::nonOrthDeltaCoeffs() (and distinct from the orthogonal deltaCoeffs). Refs: N2, T5
…mer (N1) After the N2 revert every snGrad scheme (uncorrected/corrected/limitedCorrected) uses nonOrthDeltaCoeffs, leaving GeometryScheme::deltaCoeffs() (orthogonal 1/|d|) without a production consumer. Per the N1 decision it stays computed and exposed for API completeness and potential future consumers; document that explicitly so the dead-looking accessor is not mistaken for a wiring bug. Refs: N1
…rected/limited snGrad (v2b) corrected and limitedCorrected snGrad now apply the full non-orthogonal correction (corrVec . interpolate(grad)) at processor faces, matching OpenFOAM, instead of the orthogonal-only proc treatment. The interpolated cell gradient needs the neighbour cell gradient across the rank boundary, which is halo-exchanged via the new templated processorGradientHalo.hpp (Vec3 + Tensor). Guarded by NF_WITH_MPI_SUPPORT. Serial + NeoN MPI regression green; matches OpenFOAM on serial non-orthogonal meshes. Distributed OF-validation on a non-orthogonal decomposed mesh is pending a pre-existing distributed proc-halo (comm-pattern) fix tracked separately; the NeoFOAM snGrad_distributed test documents and skips that case for now. Refs: N4, v2b
10b5d5a to
833a8bc
Compare
greole
reviewed
May 30, 2026
| ); | ||
| } | ||
| if (!requests.empty()) | ||
| MPI_Waitall(static_cast<int>(requests.size()), requests.data(), MPI_STATUSES_IGNORE); |
Contributor
There was a problem hiding this comment.
No raw MPI calls here, they should use the corresponding MPI/operator implementation
…cotch) Three compounding defects broke the distributed VolumeField proc halo under scotch / non-X-split decompositions where a rank owns more than one proc patch: - BoundaryData::value() calls waitAll(); setProcBoundaryValue seeded the owner cell through value(), so seeding proc patch N+1 eagerly drained and cleared patch N's still-in-flight exchange -> the 2nd proc patch on each rank never received (halo == owner). Add a non-draining valueNoWait() seed accessor and a single post-loop waitAll() drain in correctBoundaryConditions (batch post-then-wait). A deterministic per-patch MPI tag is added as hardening. - gaussGreenGrad::computeGradTensor (Vec3/Tensor path) was missing the processor-face flux loop the scalar path has -> corrected/limited Vec3 snGrad was wrong at proc-adjacent cells. Add the proc-face contribution. - distributedUnstructuredMesh::computeCommunicationPattern: accumulate sendCounts[targetRank] += / displacements (was assignment) so multiple patches to the same neighbour rank no longer collide. Validated via test_snGrad_distributed on the sheared scotch mesh, 3 ranks, CPU.
…nd docs - Add mpi::waitAll (pointer+count and std::vector overloads) to core/mpi and replace the raw MPI_Waitall calls in processorGradientHalo and basicGeometryScheme with it, so consumer code goes through the NeoN MPI operator wrappers (MPI_Waitall now appears only inside the wrapper itself). - Update the BasicGeometryScheme class documentation to match current behaviour: processor deltaCoeffs are 1/|Cnei-Cown| via the halo-exchanged neighbour cell centre (exact on non-orthogonal processor faces), and the non-orthogonal correction is applied at processor faces. - Reword comments to be framework-neutral: drop OpenFOAM / fvPatch references in favour of general finite-volume descriptions. The NeoN backend does not depend on OpenFOAM. No behaviour change; mpi::waitAll wraps the identical MPI_Waitall. Tests: snGrad_distributed (scotch, 3 ranks) plus serial snGrad/stencils/gaussGreenDiv all pass.
BDF1 and BDF2 boundary kernels now compute the ddt (Rhie-Chow temporal)
flux correction on processor faces.
- Use compressed mesh.boundaryMesh().faceNormals() indexed by bfi, not the
OF-full mesh.faceNormals()[nInternalFaces+bfi] (wrong face when empty/wedge
patches are present).
- Loop over {0, nBoundaryFaces + nProcBoundaryFaces} to include the processor
tail of boundaryData().value(), which was previously left unwritten -> garbage
proc-face fluxCorr fed phiHbyA and caused an unbounded Courant runaway on
distributed runs.
fluxCorr processor patch is a 'calculated' BC (no-op correctBoundaryCondition),
so the locally computed proc-tail value survives correctBoundaryConditions().
Adapted for the split internal/boundary SurfaceField storage (post fa4fdf1);
not the outdated full-storage form on feat/gpu-distributed.
Verified on tutorials/tiltedCube (snappyHexMesh tilted cube + layers, scotch-4):
removes the parallel-vs-serial Courant runaway.
setProcBoundaryValue for a SurfaceField wrongly used volume semantics: value[i] = internalVector[faceCells[i]]; internalVector() of a SurfaceField is FACE data (size nInternalFaces) while faceCells[i] is an owner-CELL id, so this read an unrelated internal face's flux and shipped it to the neighbour, corrupting the exchanged proc-tail on any rank with >= 2 processor patches (introduced by 4389a5c as a verbatim copy of the volume processor BC). The surface proc-tail already holds the correct local face value (written by flux()/updateFaceVelocity()/interpolation/constructFrom), so the BC must not touch value_ at all -- only normalise the unused mixed-BC fields (refGrad/valueFraction/refValue). value_ is not accessed, so no pending proc-patch exchange is drained. Verified on tutorials/tiltedCube (serial-CPU vs parallel-CPU oracle, executor drift proven zero): step-1 max Courant now matches serial exactly for scotch and hierarchical cuts (was 1.7-6.2x); np=2 single-patch near-exact. Distributed ctests pass 4/4.
…edBCs rAU/HbyA are built with createExtrapolatedBCs, which assigned an 'extrapolated' BC to every patch including the processor (coupled) patches. An extrapolated processor-tail holds the owner cell value, not the neighbour halo, so linear interpolation of rAU to processor faces degenerated to w*own + (1-w)*own = own. The interpolated rAUf then differed across each shared processor face, making the assembled pressure matrix non-symmetric; CG converged to a wrong pressure with small continuity (distributed neoIcoFoam divergence on cuts with two or more processor patches per rank). createExtrapolatedBCs now assigns the 'processor' BC (neighbour-halo exchange) to the trailing processor patches and 'extrapolated' only to physical patches. No-op in serial. Verified on tiltedCube (np4 scotch): processor-face off-diagonal coupling symmetry 1.14 -> 2e-16; per-cell pressure error versus serial 0.59 -> 0.028 at step 1; serial-vs-parallel trajectory now bounded and decaying. All 10 distributed ctests pass.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Motivation
WHAT is it, WHY it is needed. 😘