Add exclusive SLURM option to select RTs on Ursa // fix per-timestep restarts for ATM #2992 // Fix compilation warnings and update WW3 #3000 #2979
base: develop
Conversation
I still want to do more testing on the MSU machines before this is marked as ready for the queue.
We've had occasional issues on Hercules with long (4 hour+) runtimes for the full regression test suite, which is why I wanted to do more testing. I'm going to leave the settings as is for Hercules. If the long runtimes on Hercules continue, we can turn that option off and turn it on for the specific tests when running on Hercules (like this PR is doing for Ursa). Feel free to comment or make other suggestions.
Sure, I can get both of those into this PR and have it ready by the time Dusan's PR is processed. |
Commit Queue Requirements:
test_changes.list indicates which tests, if any, are changed by this PR. Commit test_changes.list, even if it is empty.
Description:
#2979
This PR adds a configurable option that adds a node-exclusive directive to the PBS or SLURM job card when running regression tests.
The option EXCLUSIVE_NODES can be set to true or false to add the appropriate directive to the PBS or SLURM job card.
For now, Ursa-specific checks will be added to certain tests. Machines that already always have the exclusive option turned on will continue to do so.
This is based on work by Dusan, who found that the conus13km and regional regression tests ran to completion more often when the node running the test was not also running other jobs on its unused cores (i.e., when the exclusive option is used). This did not fully resolve the issue, but it is a step in the right direction.
A SLURM option for consecutive nodes was also added. This was found to speed up some runs in GW (https://github.com/NOAA-EMC/global-workflow/pull/4123/files#diff-f0b943b553ef1734f72f6660f02fcf8d692d324e6cf64ad490f002aa6b9bcd12L12). This won't affect most tests, since they run on a single node and are not resource intensive, but it should have an effect on RTs or machines that check out multiple nodes.
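As a rough illustration of how such a switch could translate into job card directives, here is a minimal bash sketch. It is not the code in this PR: only EXCLUSIVE_NODES comes from the description above, while the function name scheduler_directives, the CONTIGUOUS_NODES variable, and the scheduler argument are hypothetical. It assumes that SLURM's --exclusive and --contiguous flags and PBS Pro's place=excl resource are the directives intended; the actual job card templates may differ.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print extra scheduler directives for a job card.
# Only EXCLUSIVE_NODES is named in the PR description; the function name,
# CONTIGUOUS_NODES, and the "slurm"/"pbs" argument are illustrative assumptions.
scheduler_directives() {
  local scheduler="$1"
  local exclusive="${EXCLUSIVE_NODES:-false}"
  local contiguous="${CONTIGUOUS_NODES:-false}"

  case "${scheduler}" in
    slurm)
      # Request whole nodes so other jobs cannot share unused cores.
      if [[ "${exclusive}" == "true" ]]; then
        echo "#SBATCH --exclusive"
      fi
      # Request consecutive nodes (assumed to map to --contiguous).
      if [[ "${contiguous}" == "true" ]]; then
        echo "#SBATCH --contiguous"
      fi
      ;;
    pbs)
      # PBS Pro placement directive for exclusive node access (assumed).
      if [[ "${exclusive}" == "true" ]]; then
        echo "#PBS -l place=excl"
      fi
      ;;
    *)
      echo "unknown scheduler: ${scheduler}" >&2
      return 1
      ;;
  esac
}
```

With EXCLUSIVE_NODES unset or set to false, the helper prints nothing, so a job card built from its output is unchanged for tests that do not opt in.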
#2992
Re-enables the previously broken per-timestep restarts for ATM.
#3000
This PR addresses multiple compiler warnings and updates WW3 to the current develop branch.
A full set of Intel and GNU UFSWM RT tests was performed. Three changes were observed due to run-time timeouts.
NOTE: These changes do not involve the wave model.
Commit Message:
Priority:
Git Tracking
UFSWM:
Sub component Pull Requests:
UFSWM Blocking Dependencies:
Documentation:
Changes
Regression Test Changes (Please commit test_changes.list):
Input data Changes:
Library Changes/Upgrades:
Testing Log: