@DeniseWorthen
Collaborator

@DeniseWorthen DeniseWorthen commented Nov 3, 2025

Commit Queue Requirements:

  • This PR addresses a relevant WM issue (if not, create an issue).
  • All subcomponent pull requests (if any) have been reviewed by their code managers.
  • Run the full Intel+GNU RT suite (compared to current baselines), preferably on Ursa (Derecho or Hercules are acceptable alternatives). Exceptions: documentation-only PRs, CI-only PRs, etc.
    • Commit log file w/full results from RT suite run (if applicable).
    • Verify that test_changes.list indicates which tests, if any, are changed by this PR. Commit test_changes.list, even if it is empty.
  • Fill out all sections of this template.

Description:

Commit Message:

* UFSWM - add MOM6 configuration variables to attributes in ufs.configure
* UFSWM - use same time-string format for all MOM6 diag fields in diag_tables
* UFSWM - update CDEPS share code for optional arguments
  * MOM6 - add output logging module to MOM6 NUOPC cap

Priority:

  • Normal

Git Tracking

UFSWM:

Sub component Pull Requests:

UFSWM Blocking Dependencies:

  • None

Documentation:

  • Documentation update NOT required.
    • Explanation: added module in MOM6 cap includes doxygenated code

Changes

Regression Test Changes (Please commit test_changes.list):

  • PR Updates/Changes Baselines: cpld_control_gfsv17_iau will change because the MOM6 history filename has changed.

Input data Changes:

  • None.

Library Changes/Upgrades:

  • No Updates

Testing Log:

  • RDHPCS
    • Hera
    • Orion
    • Hercules
    • GaeaC6
    • Derecho
    • Ursa
  • WCOSS2
    • Dogwood/Cactus
    • Acorn
  • CI
  • opnReqTest (complete task if unnecessary)

@DeniseWorthen
Collaborator Author

DeniseWorthen commented Nov 21, 2025

@aerorahul @DavidHuber-NOAA

I have a case run with the C1152 test on WCOSS2. Could you please see whether the logs seem to be reporting the correct info?

/lfs/h2/emc/nems/noscrub/denise.worthen/FV3_RT/rt_3162848/cpld_control_c1152_v17_intel

MOM6 output log files are in YYYYMMDD.HH0000.mom6.06h and restart logs are in YYYYMMDD.HH0000.mom6.res

I had the optional debug_outputlog set on for this case, so what is happening w/ the logging can be seen by grepping either outputlog_run or outputlog_restart.

For example,

173345:MOM_cap:(outputlog_run) fname ./MOM6_OUTPUT/ocn_2025_01_16_09_00.nc  2025-01-16T17:30:00  2025-01-16T18:00:00 checkflag  T use_filesize  T         933377959               1
173346:MOM_cap:(outputlog_run)    ./MOM6_OUTPUT/ocn_2025_01_16_09_00.nc exists 2025-01-16T17:30:00  2025-01-16T18:00:00 not complete, chkflag  T       933377959       933377959
173858:MOM_cap:(outputlog_run)    ./MOM6_OUTPUT/ocn_2025_01_16_09_00.nc exists 2025-01-16T18:00:00  2025-01-16T18:30:00     complete, chkflag  F       933377959      1219551143 

reports that the 16_09_00 file is first created when the alarm rings at 16T18:00:00 (i.e., the ending time of this advance). It has a size of 933377959 and an unlimited dimension size of 1.

Since the unlimited dimension==1 at creation, a second criterion (the file size) is needed to determine file completion, which is noted in the use_filesize T value. The chkflag is T, so the file will be checked for completion on each subsequent advance.

At the model advance from 16T17:30:00->16T18:00:00, the file has the same size as at creation (this is the same advance in which the file is created, so the file should be the same size!), so checking needs to continue (chkflag T).

At the model advance from 16T18:00:00->16T18:30:00, the file shows a filesize of 1219551143, so the file is marked complete (chkflag=F) and no further checking of this particular file will occur.

I'll note that when the DATM configurations are used, the file always has an unlimited dimension size of 0 when created. It is not necessary to use the filesize. I cannot explain the difference in how FMS is opening/creating the files in the two cases.
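
The checking sequence described above can be sketched roughly as follows. This is a minimal Python illustration of the logic as described in this comment, not the Fortran code in the MOM6 NUOPC cap. The names TrackedFile, register_file and check_file are hypothetical, and the rule used when the unlimited dimension is 0 at creation (complete once it becomes non-zero) is an assumption inferred from the description, not something confirmed here.

```python
from dataclasses import dataclass


@dataclass
class TrackedFile:
    """State kept for one history file (hypothetical structure)."""
    name: str
    size_at_creation: int
    use_filesize: bool    # True when the unlimited dimension cannot signal completion
    chkflag: bool = True  # True while the file still needs checking


def register_file(name: str, size: int, unlimited_len: int) -> TrackedFile:
    # If the unlimited dimension is already non-zero at creation (the coupled
    # case in the log above), a second criterion, the file size, is needed.
    return TrackedFile(name, size, use_filesize=unlimited_len > 0)


def check_file(f: TrackedFile, size_now: int, unlimited_len_now: int) -> bool:
    """Return True once the file is considered complete."""
    if not f.chkflag:
        return True  # already marked complete; no further checking
    if f.use_filesize:
        # Same size as at creation means the file is not yet complete.
        complete = size_now != f.size_at_creation
    else:
        # Assumption: with an unlimited dimension of 0 at creation, a
        # non-zero length is taken to mean the record has been written.
        complete = unlimited_len_now > 0
    if complete:
        f.chkflag = False
    return complete


# Replay the sizes reported in the log excerpt above.
f = register_file("./MOM6_OUTPUT/ocn_2025_01_16_09_00.nc", 933377959, 1)
first = check_file(f, 933377959, 1)    # advance ending 16T18:00:00
second = check_file(f, 1219551143, 1)  # advance ending 16T18:30:00
```

With the values from the log, first is False (size unchanged, so checking continues) and second is True (size changed, so chkflag is cleared).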

@gspetro-NOAA
Collaborator

> @gspetro-NOAA I've updated. I made one last change to the sfs diag table a few days ago. Do you want me to run the RTs, or maybe just the SFS RTs?

I think just SFS would be fine.

@DeniseWorthen
Collaborator Author

The 3 SFS tests passed on Hercules.

@FernandoAndrade-NOAA added the jenkins-ort (run ORT testing) and In Testing (The PR that is currently in testing stages) labels Dec 11, 2025
@epic-cicd-jenkins removed the jenkins-ort (run ORT testing) label Dec 12, 2025
@BrianCurtis-NOAA removed the On Deck (This is the next PR in line for testing/merge.) label Dec 12, 2025
@FernandoAndrade-NOAA
Collaborator

Note: GaeaC6 will likely need to be skipped due to the persistent unknown libfabric/1.20.1 error. It is now occurring on nodes that previously ran RTs without issue. Coordination with sysadmins is ongoing.

The control_p8_ugwpv1_tempo_aerosol_hail intel test on Ursa ran into an OOM error in the previous PR and passed on a rerun. The error seems to be more persistent this time around; rerunning one more time.

@FernandoAndrade-NOAA
Collaborator

Ursa is persistently failing with OOM on the control_p8_ugwpv1_tempo_aerosol_hail intel test. @jkbk2004 @gspetro-NOAA fyi

@FernandoAndrade-NOAA
Collaborator

Please refer to /scratch3/NAGAPE/epic/Fernando.Andrade-maldonado/stmp/RT_RUNDIRS/Fernando.Andrade-maldonado/FV3_RT/rt_1805382/control_p8_ugwpv1_tempo_aerosol_hail_intel/ for the most recent run directory on Ursa

@DeniseWorthen
Collaborator Author

DeniseWorthen commented Dec 12, 2025

The tempo_hail test has a reduced TPN. Have you tried upping it?

if [[ $MACHINE_ID = gaeac6 ]] || [[ $MACHINE_ID = ursa ]]; then
   export TPN=144
fi

@jkbk2004
Collaborator

We can start the merging process.

@jiandewang
Collaborator

MOM6 merged, hash # 41f39db

@jkbk2004 removed the In Testing (The PR that is currently in testing stages) label Dec 15, 2025
@jkbk2004 merged commit 4487910 into ufs-community:develop Dec 15, 2025
13 checks passed
@github-project-automation bot moved this from In Progress to Done in UFS model infrastructure Q1FY2026 Dec 15, 2025
Labels

Baseline Updates: Current baselines will be updated.
MOM: There are changes to the MOM6 component repository.

Projects

Archived in project

8 participants