Motivation
Following the successful OpenMP parallelization of the RRTMGP-SW block loop in `GEOS_SolarGridComp.F90` (PR #86, 1.5-1.75x speedup at 2 threads), this issue tracks the same work for the RRTMGP-LW block loop in `GEOS_IrradGridComp.F90`.
The target loop is `do b = 1, nBlocks` in `LW_Driver` (line 2778). Each iteration is fully independent -- it operates on a non-overlapping column slice `colS:colE` of the input/output arrays and uses thread-local scratch objects. RRTMGP is confirmed thread-safe by its developers.
Approach
Mirror the Solar approach exactly:
- Step 1 -- Uniform per-block alloc/dealloc: replace the two-phase partial-block pattern with ceiling division so every iteration is structurally identical.
- Step 2a -- Extract `compute_lw_aer_optics`
- Step 2b -- Extract `compute_lw_cloud_optics_mcica`
- Step 2c -- Extract `compute_lw_gas_optics`
- Step 2d -- Extract `compute_lw_rte`
- Step 3 -- Assemble `PROCESS_RRTMGP_LW_BLOCK` wrapper
- Step 4 -- Add `!$OMP PARALLEL DO SCHEDULE(DYNAMIC)`
Each step is a separate commit. Steps 1-3 are pure refactoring (bit-identical results expected). Step 4 introduces threading (results may differ due to McICA stochasticity).
Key LW differences vs Solar
- Up to 4 `rte_lw` calls per block (clean clear-sky, clean all-sky, dirty clear-sky, dirty all-sky)
- Both `ty_fluxes_broadband` and `ty_fluxes_byband` flux structs used
- Surface Jacobians (`flux_up_Jac`, `dfupdts_*`)
- `dirty_optical_props` is a per-block copy of `clean_optical_props` + aerosol increment
- Planck source function (`ty_source_func_lw sources`) is a per-block object
- PRNG seeding uses direct global grid math (`jBeg`, `iBeg`, `IM_World`) rather than lookup arrays
Branch
`feature/tclune/lw_openmp` off `feature/sdrabenh/gcm_v12`