Skip to content

Fix/geometry scheme proc boundary#528

Open
HendriceH wants to merge 14 commits into
developfrom
fix/geometrySchemeProcBoundary
Open

Fix/geometry scheme proc boundary#528
HendriceH wants to merge 14 commits into
developfrom
fix/geometrySchemeProcBoundary

Conversation

@HendriceH
Copy link
Copy Markdown
Collaborator

Motivation

WHAT is it, WHY it is needed. 😘

@github-actions
Copy link
Copy Markdown

Thank you for your PR, here are some useful tips:

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes and expands finite-volume geometry scheme handling around processor boundaries, especially for distributed meshes and face-normal-gradient consumers.

Changes:

  • Adds BasicGeometryScheme regression tests, including MPI processor-boundary coverage.
  • Updates geometry scheme calculations for deltaCoeffs, nonOrthDeltaCoeffs, and correction vectors, including MPI neighbour exchanges.
  • Aligns uncorrected face-normal gradients with orthogonal deltaCoeffs and adjusts distributed tests to CPU-only execution.

Reviewed changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
test/finiteVolume/CMakeLists.txt Adds stencil tests to the finite-volume test suite.
test/finiteVolume/cellCentred/stencil/CMakeLists.txt Defines serial and MPI geometry-scheme tests.
test/finiteVolume/cellCentred/stencil/basicGeometryScheme.cpp Adds analytical serial geometry-scheme regression tests.
test/finiteVolume/cellCentred/stencil/basicGeometryScheme_distributed.cpp Adds MPI processor-boundary geometry checks.
test/finiteVolume/cellCentred/auxiliary/coNum_distributed.cpp Restricts distributed Courant test to CPU executor.
test/distributed/partitioning.cpp Restricts distributed partitioning test to CPU executor.
test/distributed/operator.cpp Restricts distributed operator test to CPU executor.
src/mesh/unstructured/distributedUnstructuredMesh.cpp Places partitioned 1D mesh slabs in global coordinates.
src/finiteVolume/cellCentred/stencil/geometryScheme.cpp Passes precomputed non-orthogonal coefficients into correction-vector updates.
src/finiteVolume/cellCentred/stencil/basicGeometryScheme.cpp Updates geometry coefficient and correction-vector producer logic, including MPI exchanges.
src/finiteVolume/cellCentred/faceNormalGradient/uncorrected.cpp Switches uncorrected gradients to use orthogonal deltaCoeffs.
src/finiteVolume/cellCentred/faceNormalGradient/limitedCorrected.cpp Adds Vec3 processor-boundary parity and zeros correction boundary data.
src/finiteVolume/cellCentred/faceNormalGradient/corrected.cpp Adds Vec3 processor-boundary parity and zeros correction boundary data.
include/NeoN/finiteVolume/cellCentred/stencil/geometryScheme.hpp Updates geometry scheme factory API and coefficient documentation.
include/NeoN/finiteVolume/cellCentred/stencil/basicGeometryScheme.hpp Documents BasicGeometryScheme assumptions and updated API.
include/NeoN/finiteVolume/cellCentred/faceNormalGradient/uncorrected.hpp Exposes orthogonal deltaCoeffs from uncorrected gradients.
include/NeoN/finiteVolume/cellCentred/faceNormalGradient/faceNormalGradient.hpp Documents processor-boundary face-normal-gradient semantics.
Comments suppressed due to low confidence (1)

src/finiteVolume/cellCentred/stencil/basicGeometryScheme.cpp:431

  • Processor-boundary nonOrthDeltaCoeffs no longer applies the same over-relaxed clamp used for internal and physical-boundary faces. On a highly skewed processor face this can divide by a very small projected distance even though nonOrthDeltaClamp * |Cnei - Cown| should bound the denominator, so the MPI path can produce much larger coefficients than the serial/internal path for the same geometry.
        const auto dNei = exchangeProcOwnerDistance(exec, mesh_);
        const auto dNeiView = dNei.view();
        parallelFor(
            exec,
            {0, nProcBoundaryFaces},
            NEON_LAMBDA(const localIdx procFacei) {
                const localIdx bfi = nBoundaryFaces + procFacei;
                const Vec3 n = (1.0 / bFaceAreas[bfi]) * bFaceNormals[bfi];
                const Vec3 co = cellCenters[surfFaceCells[bfi]];
                const scalar dOwn = std::abs(n & (bFaceCenters[bfi] - co));
                nonOrthDeltaCoeffB[bfi] =
                    1.0 / std::max(dOwn + dNeiView[procFacei], scalar(ROOTVSMALL));

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +25 to +27
* - processor-boundary deltaCoeffs/nonOrthDeltaCoeffs use the face-normal-projected
* owner+neighbour distance (exact on orthogonal proc faces); the non-orthogonal correction
* is not applied at processor faces (see FaceNormalGradient docs).
Comment on lines +52 to +59
// Processor-boundary semantics (review N3/N4): at processor faces every scheme
// (uncorrected, corrected, limitedCorrected) applies the orthogonal part only,
// snGrad = deltaCoeffs * (phiNei - phiOwn),
// and the scalar and Vec3 code paths are kept in parity. The non-orthogonal
// correction across rank boundaries is NOT applied: the producer
// (BasicGeometryScheme) leaves processor-face correction vectors at zero, so
// producer and consumer are self-consistent. Recovering the proc non-orth
// correction needs the neighbour cell centre exchanged and is deferred.
Comment on lines +157 to +160
// Processor-boundary parity with the scalar path (review N3). Applies the
// orthogonal part only — the non-orthogonal correction at proc faces is
// deferred (review N4): the producer leaves proc-corrVec at zero, so this
// is self-consistent. See FaceNormalGradient header for the contract.
Comment on lines +181 to +184
// Processor-boundary parity with the scalar path (review N3). Orthogonal
// part only; non-orth proc correction deferred (review N4). The limiter
// applies only to the internal correction, so proc faces reduce to the
// uncorrected form exactly as in the scalar path.
HendriceH added 9 commits May 30, 2026 09:46
Adds a 3D-uniform-mesh analytical regression test for BasicGeometryScheme,
the first direct stencil test on a non-1D mesh. Asserts weights (0.5 internal,
1.0 boundary), deltaCoeffs (1/h internal, 2/h boundary), nonOrthDeltaCoeffs
(== deltaCoeffs on an orthogonal mesh), and zero non-orth correction. Closes
the highest-leverage test gap from REVIEW_basicGeometrySchemes.md (T1 / H5):
every prior stencil test ran on create1DUniformMesh, the unique mesh on which
geometric-scheme bugs are invisible.

Refs: T1, H5
…ed, safety + dedup

Producer-side remediation from REVIEW_basicGeometrySchemes.md:

- N1/N2: uncorrected snGrad now uses the orthogonal deltaCoeffs (1/|d|) for both
  its value (computeFaceNormalGrad) and its deltaCoeffs() accessor, matching
  OpenFOAM's uncorrectedSnGrad. This wires the previously-dead
  GeometryScheme::deltaCoeffs() to a real consumer. No-op on orthogonal/1D meshes.
- GEOM-03 (H1-remainder): updateDeltaCoeffs now writes processor-boundary entries
  (projected dOwn+dNei; exact on orthogonal proc faces, documented limitation on
  skewed). Previously left at zero -> zero diffusive flux across ranks.
- H2: corrected the nonOrthDeltaCoeff doxygen formula to 1/(faceNormal . d).
- H3: ROOTVSMALL floor added to the non-orth std::max(orthoDist, 0.05|d|) guards.
- M3: updateNonOrthCorrectionVec3s reads the precomputed nonOrthDeltaCoeffs field.
- M4: removed the dead fill(nonOrthDeltaCoeffs, 0) before a full overwrite.
- M2: removed [[maybe_unused]] from parameters that are used.

Refs: N1, N2, H1, H2, H3, M2, M3, M4
…cit contracts

snGrad consumer remediation from REVIEW_basicGeometrySchemes.md:

- N3: the Vec3 paths of computeCorrectedFaceNormalGrad and
  computeLimitedCorrectedFaceNormalGrad now contain the processor-boundary loop
  their scalar siblings already had. snGrad of a VolumeField<Vec3> (e.g. U) now
  writes proc-face entries instead of leaving them zero under MPI.
- N4: documented the processor-boundary contract in the FaceNormalGradient
  interface — all schemes apply the orthogonal part only at proc faces; the
  non-orth proc correction is deferred and the producer leaves proc-corrVec zero,
  so producer and consumer are self-consistent.
- N5: computeCorrectionTerm / computeLimitedCorrectionTerm now explicitly zero
  corrField.boundaryData() (both scalar and Vec3) rather than relying on the
  field's zero-init, so a future Laplacian RHS change cannot read undefined data.
- N6: the producer zeroes nonOrthCorrectionVec3s over the proc range too, making
  the 'corrVec == 0 on all boundary faces' contract explicit.

Refs: N3, N4, N5, N6
…U-only MPI

- T2: basicGeometryScheme_distributed (MPI_SIZE 3) asserts processor-boundary
  weights (0.5), deltaCoeffs (1/h), nonOrthDeltaCoeffs (1/h) and zero corrVec on a
  partitioned 1D mesh. Fails against the pre-fix code (proc deltaCoeffs == 0) and
  guards the GEOM-03 / H1 remainder.
- T3: sheared (non-orthogonal) cube test asserting deltaCoeffs and nonOrthDeltaCoeffs
  genuinely diverge and the non-orth correction is non-zero (review N2).
- T5: identity guard that 'uncorrected' exposes the orthogonal deltaCoeffs, not
  nonOrthDeltaCoeffs (pins the N1/N2 decision).
- One GPU on this machine: every distributed test now uses CPUExecutor only.
  Retrofitted coNum_distributed, distributed/operator and distributed/partitioning
  off allAvailableExecutor (which included the GPU).

Refs: T2, T3, T5, H1
Cleanup tier of REVIEW_basicGeometrySchemes.md (+ predecessor M/L items):

- H4: exchangeProcOwnerDistance computes the owner projected distance on the device
  and copies only the nProcFaces exchanged scalars to the host, instead of a full
  mesh.cellCenters() D->H copy (GB-scale on industrial meshes) plus four boundary-
  array host copies.
- M5: typed mpi::isend<scalar>/irecv<scalar> with a dedicated tag (not raw-byte char
  sends on tag 0).
- M6: the exchange result is allocated directly on the executor (no SerialExecutor
  round-trip); L7: early-return when there are no processor faces.
- L1: the 0.05 over-relaxed clamp is now the named constant nonOrthDeltaClamp.
- L2: updateNonOrthDeltaCoeffs boundary loop indexes the compressed boundary mesh,
  matching updateDeltaCoeffs/updateWeights (avoids OF-full-vs-compressed mismatch).
- M7/L6: dropped the dead UnstructuredMesh& parameter from GeometrySchemeFactory.
- M8: justified the std::abs on projections in updateWeights.
- L4: BasicGeometryScheme + overrides marked final; L8: class-level doxygen.
- M1/L3: GeometryScheme::reset() no longer logs a warning every update. The
  resize(0) of face/cell centres is RETAINED — it is an intentional device-memory
  optimisation (free geometry the scheme no longer needs), now documented, with the
  const_cast justified.

Refs: H4, M1, M5, M6, M7, M8, L1-L8
…l-centre exchange (v2a)

Producer side of the v2 proc-boundary work:

- Add exchangeProcNeighbourCellCentre: halo-exchanges the owner cell centre across
  processor boundaries so each rank knows the neighbouring cell centre Cnei.
- updateDeltaCoeffs proc faces now use the exact euclidean 1/|Cnei - Cown|
  (OpenFOAM's coupled deltaCoeffs) instead of the face-normal-projected distance —
  exact on non-orthogonal processor faces. Makes uncorrected snGrad / the Laplacian
  OF-consistent across rank boundaries on skewed meshes.
- updateNonOrthCorrectionVec3s now computes the processor-face correction vector
  (n - delta*nonOrthDeltaCoeff) instead of zeroing it — the producer half of the
  non-orthogonal proc correction (N4). Non-zero only on non-orthogonal proc faces.

- create1DUniformMeshPart now places each rank's slab at its true global position
  (xOffset = rank*rightBoundary). A faithful decomposition shares one global
  coordinate frame (as decomposePar produces), which the cross-rank coordinate
  exchange requires; previously every rank sat at [0, rightBoundary]. Only positions
  shift, so fluxes/volumes/Courant/gradients/projected distances are unchanged.

T2 (now global-coord) asserts the exact proc deltaCoeffs; coNum/operator/partitioning
unaffected. Consumer side (applying the proc correction in corrected/limitedCorrected)
follows in v2b.

Refs: v2a, N4 (producer)
…revert N2)

OpenFOAM's uncorrectedSnGrad returns mesh().nonOrthDeltaCoeffs() (1/(n.d)), not the
orthogonal deltaCoeffs (1/|d|). The review's N2 premise -- that OF uses 1/|d| -- was
verified false against the OF source (uncorrectedSnGrad.H), so the Phase 4 N2 change
that rewired 'uncorrected' to deltaCoeffs was a regression on non-orthogonal meshes.
This restores NeoN's original OF-faithful behaviour in both the accessor and the kernel.

Updates test T5 to assert the corrected wiring: uncorrected.deltaCoeffs() ==
GeometryScheme::nonOrthDeltaCoeffs() (and distinct from the orthogonal deltaCoeffs).

Refs: N2, T5
…mer (N1)

After the N2 revert every snGrad scheme (uncorrected/corrected/limitedCorrected) uses
nonOrthDeltaCoeffs, leaving GeometryScheme::deltaCoeffs() (orthogonal 1/|d|) without a
production consumer. Per the N1 decision it stays computed and exposed for API
completeness and potential future consumers; document that explicitly so the dead-looking
accessor is not mistaken for a wiring bug.

Refs: N1
…rected/limited snGrad (v2b)

corrected and limitedCorrected snGrad now apply the full non-orthogonal correction
(corrVec . interpolate(grad)) at processor faces, matching OpenFOAM, instead of the
orthogonal-only proc treatment. The interpolated cell gradient needs the neighbour cell
gradient across the rank boundary, which is halo-exchanged via the new templated
processorGradientHalo.hpp (Vec3 + Tensor). Guarded by NF_WITH_MPI_SUPPORT.

Serial + NeoN MPI regression green; matches OpenFOAM on serial non-orthogonal meshes.
Distributed OF-validation on a non-orthogonal decomposed mesh is pending a pre-existing
distributed proc-halo (comm-pattern) fix tracked separately; the NeoFOAM snGrad_distributed
test documents and skips that case for now.

Refs: N4, v2b
@HendriceH HendriceH force-pushed the fix/geometrySchemeProcBoundary branch from 10b5d5a to 833a8bc Compare May 30, 2026 07:46
);
}
if (!requests.empty())
MPI_Waitall(static_cast<int>(requests.size()), requests.data(), MPI_STATUSES_IGNORE);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No raw MPI calls here, they should use the corresponding MPI/operator implementation

HendriceH added 4 commits May 30, 2026 22:40
…cotch)

Three compounding defects broke the distributed VolumeField proc halo under
scotch / non-X-split decompositions where a rank owns more than one proc patch:

- BoundaryData::value() calls waitAll(); setProcBoundaryValue seeded the owner
  cell through value(), so seeding proc patch N+1 eagerly drained and cleared
  patch N's still-in-flight exchange -> the 2nd proc patch on each rank never
  received (halo == owner). Add a non-draining valueNoWait() seed accessor and
  a single post-loop waitAll() drain in correctBoundaryConditions (batch
  post-then-wait). A deterministic per-patch MPI tag is added as hardening.
- gaussGreenGrad::computeGradTensor (Vec3/Tensor path) was missing the
  processor-face flux loop the scalar path has -> corrected/limited Vec3 snGrad
  was wrong at proc-adjacent cells. Add the proc-face contribution.
- distributedUnstructuredMesh::computeCommunicationPattern: accumulate
  sendCounts[targetRank] += / displacements (was assignment) so multiple
  patches to the same neighbour rank no longer collide.

Validated via test_snGrad_distributed on the sheared scotch mesh, 3 ranks, CPU.
…nd docs

- Add mpi::waitAll (pointer+count and std::vector overloads) to core/mpi and
  replace the raw MPI_Waitall calls in processorGradientHalo and
  basicGeometryScheme with it, so consumer code goes through the NeoN MPI
  operator wrappers (MPI_Waitall now appears only inside the wrapper itself).
- Update the BasicGeometryScheme class documentation to match current
  behaviour: processor deltaCoeffs are 1/|Cnei-Cown| via the halo-exchanged
  neighbour cell centre (exact on non-orthogonal processor faces), and the
  non-orthogonal correction is applied at processor faces.
- Reword comments to be framework-neutral: drop OpenFOAM / fvPatch references
  in favour of general finite-volume descriptions. The NeoN backend does not
  depend on OpenFOAM.

No behaviour change; mpi::waitAll wraps the identical MPI_Waitall. Tests:
snGrad_distributed (scotch, 3 ranks) plus serial snGrad/stencils/gaussGreenDiv
all pass.
BDF1 and BDF2 boundary kernels now compute the ddt (Rhie-Chow temporal)
flux correction on processor faces.

- Use compressed mesh.boundaryMesh().faceNormals() indexed by bfi, not the
  OF-full mesh.faceNormals()[nInternalFaces+bfi] (wrong face when empty/wedge
  patches are present).
- Loop over {0, nBoundaryFaces + nProcBoundaryFaces} to include the processor
  tail of boundaryData().value(), which was previously left unwritten -> garbage
  proc-face fluxCorr fed phiHbyA and caused an unbounded Courant runaway on
  distributed runs.

fluxCorr processor patch is a 'calculated' BC (no-op correctBoundaryCondition),
so the locally computed proc-tail value survives correctBoundaryConditions().
Adapted for the split internal/boundary SurfaceField storage (post fa4fdf1);
not the outdated full-storage form on feat/gpu-distributed.

Verified on tutorials/tiltedCube (snappyHexMesh tilted cube + layers, scotch-4):
removes the parallel-vs-serial Courant runaway.
setProcBoundaryValue for a SurfaceField wrongly used volume semantics:
  value[i] = internalVector[faceCells[i]];
internalVector() of a SurfaceField is FACE data (size nInternalFaces) while
faceCells[i] is an owner-CELL id, so this read an unrelated internal face's
flux and shipped it to the neighbour, corrupting the exchanged proc-tail on
any rank with >= 2 processor patches (introduced by 4389a5c as a verbatim
copy of the volume processor BC).

The surface proc-tail already holds the correct local face value (written by
flux()/updateFaceVelocity()/interpolation/constructFrom), so the BC must not
touch value_ at all -- only normalise the unused mixed-BC fields
(refGrad/valueFraction/refValue). value_ is not accessed, so no pending
proc-patch exchange is drained.

Verified on tutorials/tiltedCube (serial-CPU vs parallel-CPU oracle, executor
drift proven zero): step-1 max Courant now matches serial exactly for scotch
and hierarchical cuts (was 1.7-6.2x); np=2 single-patch near-exact. Distributed
ctests pass 4/4.
@HendriceH HendriceH changed the base branch from enh/distributedGko to develop May 31, 2026 15:13
…edBCs

rAU/HbyA are built with createExtrapolatedBCs, which assigned an
'extrapolated' BC to every patch including the processor (coupled)
patches. An extrapolated processor-tail holds the owner cell value, not
the neighbour halo, so linear interpolation of rAU to processor faces
degenerated to w*own + (1-w)*own = own. The interpolated rAUf then
differed across each shared processor face, making the assembled
pressure matrix non-symmetric; CG converged to a wrong pressure with
small continuity (distributed neoIcoFoam divergence on cuts with two or
more processor patches per rank).

createExtrapolatedBCs now assigns the 'processor' BC (neighbour-halo
exchange) to the trailing processor patches and 'extrapolated' only to
physical patches. No-op in serial.

Verified on tiltedCube (np4 scotch): processor-face off-diagonal
coupling symmetry 1.14 -> 2e-16; per-cell pressure error versus serial
0.59 -> 0.028 at step 1; serial-vs-parallel trajectory now bounded and
decaying. All 10 distributed ctests pass.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants