-
Notifications
You must be signed in to change notification settings - Fork 607
Ifpack2/Tpetra: fix #14529 and re-do of #14337 #14616
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
CDash for AT1 results [Only accessible from Sandia networks] |
|
Your PR updated files that did not respect package formatting settings. Patchdiff --git a/packages/tpetra/core/src/Tpetra_Map_decl.hpp b/packages/tpetra/core/src/Tpetra_Map_decl.hpp
index 2658c20c..ce9dc24e 100644
--- a/packages/tpetra/core/src/Tpetra_Map_decl.hpp
+++ b/packages/tpetra/core/src/Tpetra_Map_decl.hpp
@@ -1092,123 +1092,125 @@ class Map : public Teuchos::Describable {
const global_ordinal_type indexBase,
const Teuchos::RCP<const Teuchos::Comm<int>>& comm);
- //! Copy the local map from device to host, if it's not on host already
- void lazyPushToHost() const;
-
- //! The communicator over which this Map is distributed.
- Teuchos::RCP<const Teuchos::Comm<int> > comm_;
-
- //! The index base for global indices in this Map.
- global_ordinal_type indexBase_;
-
- /// \brief The total number of global indices in this Map over all
- /// processes in its communicator \c comm (see above).
- global_size_t numGlobalElements_;
-
- //! The number of global indices owned by this process.
- size_t numLocalElements_;
-
- //! The min global index owned by this process.
- global_ordinal_type minMyGID_;
-
- //! The max global index owned by this process.
- global_ordinal_type maxMyGID_;
-
- /// \brief The min global index in this Map over all processes in
- /// its communicator \c comm (see above).
- global_ordinal_type minAllGID_;
-
- /// \brief The max global index in this Map over all processes in
- /// its communicator \c comm (see above).
- global_ordinal_type maxAllGID_;
-
- /// \brief First contiguous GID.
- ///
- /// This is only set if the Map was created using the
- /// noncontiguous constructor. In that case, if the calling
- /// process owns at least one GID, this will always equal that
- /// first GID in the list of GIDs given to the constructor.
- global_ordinal_type firstContiguousGID_;
-
- /// \brief Last contiguous GID.
- ///
- /// This is only set if the Map was created using the
- /// noncontiguous constructor. In that case, if the calling
- /// process owns at least one GID, this will always equal the last
- /// GID (inclusive) that forms an initial sequence of contiguous
- /// GIDs, in the list of GIDs given to the constructor.
- ///
- /// For example, if the list is [42, 43, 44, 45, 100, 1001],
- /// firstContiguousGID_ will be 42 and lastContiguousGID_ will be
- /// 45. If the list is [42, 100, 1001, 1002, 1003],
- /// firstContiguousGID_ will be 42 and lastContiguousGID_ will
- /// also be 42.
- global_ordinal_type lastContiguousGID_;
-
- /// \brief Whether the range of global indices is uniform.
- ///
- /// This is only true if the Map was constructed using the first
- /// (uniform contiguous) constructor or a nonmember constructor
- /// that calls it.
- bool uniform_;
-
- //! Whether the range of global indices are contiguous and ordered.
- bool contiguous_;
-
- /// \brief Whether this map's global indices are distributed
- /// (true), or locally replicated (false), over its communicator
- /// \c comm (see above).
- ///
- /// This is true if the Map is globally distributed, and false
- /// otherwise (if the Map is locally replicated). See the
- /// documentation of isDistributed() for a definition of these two
- /// mutually exclusive terms.
- bool distributed_;
-
- /// \brief A mapping from local IDs to global IDs.
- ///
- /// By definition, this mapping is local; it only contains global
- /// IDs owned by this process. This mapping is created in two
- /// cases:
- ///
- /// <ol>
- /// <li> It is always created for a noncontiguous Map, in the
- /// noncontiguous version of the Map constructor.</li>
- /// <li> In getLocalElementList(), on demand (if it wasn't created
- /// before).</li>
- /// </ol>
- ///
- /// The potential for on-demand creation is why this member datum
- /// is declared "mutable". Note that other methods, such as
- /// describe(), may invoke getLocalElementList().
- ///
- /// To clarify: If this is empty, then it could be either that the
- /// Map is contiguous (meaning that we don't need to store all the
- /// global indices explicitly), or that the Map really does
- /// contain zero indices on the calling process.
- ///
- /// This has LayoutLeft so that we can call Kokkos::deep_copy to
- /// copy this between any two Kokkos Devices. Otherwise, the
- /// Devices might have different default layouts, thus forbidding
- /// a deep_copy. We use LayoutLeft instead of LayoutRight because
- /// LayoutRight is the default on non-CUDA Devices, and we want to
- /// make sure we catch assignment or copying from the default to
- /// the nondefault layout.
- mutable Kokkos::View<const global_ordinal_type*,
- Kokkos::LayoutLeft,
- device_type> lgMap_;
-
- /// \brief Host View of lgMap_.
- ///
- /// This is allocated along with lgMap_, on demand (lazily), by
- /// getLocalElementList() (which see). It is also used by
- /// getGlobalElement() (which is a host method, and therefore
- /// requires a host View) if necessary (only noncontiguous Maps
- /// need this).
+ //! Copy the local map from device to host, if it's not on host already
+ void lazyPushToHost() const;
+
+ //! The communicator over which this Map is distributed.
+ Teuchos::RCP<const Teuchos::Comm<int>> comm_;
+
+ //! The index base for global indices in this Map.
+ global_ordinal_type indexBase_;
+
+ /// \brief The total number of global indices in this Map over all
+ /// processes in its communicator \c comm (see above).
+ global_size_t numGlobalElements_;
+
+ //! The number of global indices owned by this process.
+ size_t numLocalElements_;
+
+ //! The min global index owned by this process.
+ global_ordinal_type minMyGID_;
+
+ //! The max global index owned by this process.
+ global_ordinal_type maxMyGID_;
+
+ /// \brief The min global index in this Map over all processes in
+ /// its communicator \c comm (see above).
+ global_ordinal_type minAllGID_;
+
+ /// \brief The max global index in this Map over all processes in
+ /// its communicator \c comm (see above).
+ global_ordinal_type maxAllGID_;
+
+ /// \brief First contiguous GID.
+ ///
+ /// This is only set if the Map was created using the
+ /// noncontiguous constructor. In that case, if the calling
+ /// process owns at least one GID, this will always equal that
+ /// first GID in the list of GIDs given to the constructor.
+ global_ordinal_type firstContiguousGID_;
+
+ /// \brief Last contiguous GID.
+ ///
+ /// This is only set if the Map was created using the
+ /// noncontiguous constructor. In that case, if the calling
+ /// process owns at least one GID, this will always equal the last
+ /// GID (inclusive) that forms an initial sequence of contiguous
+ /// GIDs, in the list of GIDs given to the constructor.
+ ///
+ /// For example, if the list is [42, 43, 44, 45, 100, 1001],
+ /// firstContiguousGID_ will be 42 and lastContiguousGID_ will be
+ /// 45. If the list is [42, 100, 1001, 1002, 1003],
+ /// firstContiguousGID_ will be 42 and lastContiguousGID_ will
+ /// also be 42.
+ global_ordinal_type lastContiguousGID_;
+
+ /// \brief Whether the range of global indices is uniform.
+ ///
+ /// This is only true if the Map was constructed using the first
+ /// (uniform contiguous) constructor or a nonmember constructor
+ /// that calls it.
+ bool uniform_;
+
+ //! Whether the range of global indices are contiguous and ordered.
+ bool contiguous_;
+
+ /// \brief Whether this map's global indices are distributed
+ /// (true), or locally replicated (false), over its communicator
+ /// \c comm (see above).
+ ///
+ /// This is true if the Map is globally distributed, and false
+ /// otherwise (if the Map is locally replicated). See the
+ /// documentation of isDistributed() for a definition of these two
+ /// mutually exclusive terms.
+ bool distributed_;
+
+ /// \brief A mapping from local IDs to global IDs.
+ ///
+ /// By definition, this mapping is local; it only contains global
+ /// IDs owned by this process. This mapping is created in two
+ /// cases:
+ ///
+ /// <ol>
+ /// <li> It is always created for a noncontiguous Map, in the
+ /// noncontiguous version of the Map constructor.</li>
+ /// <li> In getLocalElementList(), on demand (if it wasn't created
+ /// before).</li>
+ /// </ol>
+ ///
+ /// The potential for on-demand creation is why this member datum
+ /// is declared "mutable". Note that other methods, such as
+ /// describe(), may invoke getLocalElementList().
+ ///
+ /// To clarify: If this is empty, then it could be either that the
+ /// Map is contiguous (meaning that we don't need to store all the
+ /// global indices explicitly), or that the Map really does
+ /// contain zero indices on the calling process.
+ ///
+ /// This has LayoutLeft so that we can call Kokkos::deep_copy to
+ /// copy this between any two Kokkos Devices. Otherwise, the
+ /// Devices might have different default layouts, thus forbidding
+ /// a deep_copy. We use LayoutLeft instead of LayoutRight because
+ /// LayoutRight is the default on non-CUDA Devices, and we want to
+ /// make sure we catch assignment or copying from the default to
+ /// the nondefault layout.
+ mutable Kokkos::View<const global_ordinal_type*,
+ Kokkos::LayoutLeft,
+ device_type>
+ lgMap_;
+
+ /// \brief Host View of lgMap_.
+ ///
+ /// This is allocated along with lgMap_, on demand (lazily), by
+ /// getLocalElementList() (which see). It is also used by
+ /// getGlobalElement() (which is a host method, and therefore
+ /// requires a host View) if necessary (only noncontiguous Maps
+ /// need this).
#ifndef SWIG
- mutable Kokkos::View<const global_ordinal_type*,
- Kokkos::LayoutLeft,
- Kokkos::HostSpace> lgMapHost_;
+ mutable Kokkos::View<const global_ordinal_type*,
+ Kokkos::LayoutLeft,
+ Kokkos::HostSpace>
+ lgMapHost_;
#endif
//! Type of a mapping from global IDs to local IDs.More details about our use of clang-format and other tools can be found in the wiki. |
6e21ba1 to
bc656bf
Compare
|
Status Flag 'Pre-Test Inspection' - Auto Inspected - Inspection is Not Necessary for this Pull Request. |
|
Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects: Pull Request Auto Testing STARTING (click to expand)Build InformationTest Name: PR_gcc-openmpi-openmp
Jenkins Parameters
Build InformationTest Name: PR_CompSim
Jenkins Parameters
Using Repos:
Pull Request Author: brian-kelley |
|
Status Flag 'Pull Request AutoTester' - Jenkins Testing: all Jobs PASSED Pull Request Auto Testing has PASSED (click to expand)Build InformationTest Name: PR_gcc-openmpi-openmp
Jenkins Parameters
Build InformationTest Name: PR_CompSim
Jenkins Parameters
|
|
Status Flag 'Pre-Merge Inspection' - - This Pull Request Requires Inspection... The code must be inspected by a member of the Team before Testing/Merging |
|
All Jobs Finished; status = PASSED, However Inspection must be performed before merge can occur... |
|
Status Flag 'Pull Request AutoTester' - User Requested Retest - Label AT: RETEST will be reset after testing. |
|
Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects: Pull Request Auto Testing STARTING (click to expand)Build InformationTest Name: PR_gcc-openmpi-openmp
Jenkins Parameters
Build InformationTest Name: PR_CompSim
Jenkins Parameters
Using Repos:
Pull Request Author: brian-kelley |
|
Status Flag 'Pull Request AutoTester' - Jenkins Testing: all Jobs PASSED Pull Request Auto Testing has PASSED (click to expand)Build InformationTest Name: PR_gcc-openmpi-openmp
Jenkins Parameters
Build InformationTest Name: PR_CompSim
Jenkins Parameters
|
|
All Jobs Finished; status = PASSED, However Inspection must be performed before merge can occur... |
|
Status Flag 'Pre-Merge Inspection' - SUCCESS: The last commit to this Pull Request has been INSPECTED AND APPROVED by [ csiefer2 ]! |
|
Status Flag 'Pull Request AutoTester' - AutoMerge IS ENABLED, but the Label AT: AUTOMERGE is not set. Either set Label AT: AUTOMERGE or manually merge the PR... |
1 similar comment
|
Status Flag 'Pull Request AutoTester' - AutoMerge IS ENABLED, but the Label AT: AUTOMERGE is not set. Either set Label AT: AUTOMERGE or manually merge the PR... |
|
@trilinos/framework The cuda12 and cuda12-uvm tests keep timing out (6 hour limit) during the build. This is from the last cuda12 run. It looks like the build gets close to the end but then hangs for 4 hours (21:19 to 01:20). |
|
Status Flag 'Pull Request AutoTester' - AutoMerge IS ENABLED, but the Label AT: AUTOMERGE is not set. Either set Label AT: AUTOMERGE or manually merge the PR... |
1 similar comment
|
Status Flag 'Pull Request AutoTester' - AutoMerge IS ENABLED, but the Label AT: AUTOMERGE is not set. Either set Label AT: AUTOMERGE or manually merge the PR... |
|
Status Flag 'Pull Request AutoTester' - AutoMerge IS ENABLED, but the Label AT: AUTOMERGE is not set. Either set Label AT: AUTOMERGE or manually merge the PR... |
|
Status Flag 'Pull Request AutoTester' - AutoMerge IS ENABLED, but the Label AT: AUTOMERGE is not set. Either set Label AT: AUTOMERGE or manually merge the PR... |
|
All Jobs Finished; status = PASSED, However PR is now STALE, and must be retested. Set the AT: RETEST Label to force retest.... |
3 similar comments
|
All Jobs Finished; status = PASSED, However PR is now STALE, and must be retested. Set the AT: RETEST Label to force retest.... |
|
All Jobs Finished; status = PASSED, However PR is now STALE, and must be retested. Set the AT: RETEST Label to force retest.... |
|
All Jobs Finished; status = PASSED, However PR is now STALE, and must be retested. Set the AT: RETEST Label to force retest.... |
That appears to be triggered by this change, I don't see any other timeouts in the recent results. It's generally unlikely for the build to take that long without a compiler hang given the use of |
Work around parallel_scan issue; move BTD symbolic phase to device again. And make Tpetra::Map::lazyPushToHost private again. Signed-off-by: Brian Kelley <[email protected]>
bc656bf to
f149e25
Compare
|
I replaced this with #14665 to see if the hang would go away, but it still happened. |
Tpetra::Map::lazyPushToHostprivate again.@trilinos/ifpack2
@trilinos/tpetra
Motivation
Putting symbolic on device improves performance. This also resolves the failures SPARC saw from #14337 .
Related Issues