Merged
31 commits
820c49a
Install snap-run
Simon-van-Diepen Mar 4, 2026
ba1560a
Switch private repo to SSH based resolving
Simon-van-Diepen Mar 4, 2026
8049a3e
Fix path to snappyutil
Simon-van-Diepen Mar 4, 2026
8ba6510
Add array functionality in scheduler, snap_preparation, snap_run jobs…
Simon-van-Diepen Mar 12, 2026
6806ee8
Add SNAP filter, make job definitions uniform, update docs
Simon-van-Diepen Mar 24, 2026
accdc96
Add last docs
Simon-van-Diepen Mar 24, 2026
237beda
Bugfix for nonlinear dictionary key extraction
Simon-van-Diepen Mar 24, 2026
683a572
Fix missing quotes in scheduler hook, make snap default param file fi…
Simon-van-Diepen Mar 25, 2026
a2286d5
Format tracks properly
Simon-van-Diepen Mar 25, 2026
13aee65
Remove debug print, remove dashes from date input
Simon-van-Diepen Mar 25, 2026
3ff3ac7
Fix missing space, filter on directories
Simon-van-Diepen Mar 25, 2026
62d9014
Increase allowed memory for snap prep
Simon-van-Diepen Mar 25, 2026
c14528a
Find the correct start and end date
Simon-van-Diepen Mar 25, 2026
cf493b5
Actually load the start end mother dates
Simon-van-Diepen Mar 25, 2026
e9628ec
Fix if statement, remove unnecessary parameter file requests
Simon-van-Diepen Mar 25, 2026
c22192a
Constrain snap_run to rome cluster, increase memory limit by factor o…
Simon-van-Diepen Mar 25, 2026
8590094
Close if statement
Simon-van-Diepen Mar 25, 2026
548422b
Remove spaces around equals
Simon-van-Diepen Mar 25, 2026
edcd8e2
make cache and VM size dependent on nCPUs, switch to znap output
Simon-van-Diepen Mar 26, 2026
90973b1
Add ZNAP output format, make email work with array jobs
Simon-van-Diepen Mar 26, 2026
f8d8814
Add SNAP_cleanup job to unzip the resulting zipped archives
Simon-van-Diepen Mar 26, 2026
3e0b550
ZNAPs are directories, not files
Simon-van-Diepen Mar 26, 2026
5b99aaa
Update changelog, alphabetize preparation.py
Simon-van-Diepen Mar 26, 2026
1cc5449
Actually remove the tmp file
Simon-van-Diepen Mar 26, 2026
d68a247
Update docs, untrack pptx modifier file, remove timers
Simon-van-Diepen Mar 30, 2026
e4ed9c0
Fix global memory setting to 12GB, add toggle for rome constraint
Simon-van-Diepen Mar 30, 2026
39a3c56
Remove Rome constraint on snap_run
Simon-van-Diepen Mar 30, 2026
2027ab2
Only read WKT if it is not a dryrun (as it may not exist in dryruns)
Simon-van-Diepen Mar 30, 2026
0b50fa7
Switch to SNAP v13, SNAP now auto unzips
Simon-van-Diepen Mar 31, 2026
8c81a71
Remove last references to SNAP 12
Simon-van-Diepen Mar 31, 2026
d9e6f0e
Fix echo statements to reflect correct memory
Simon-van-Diepen Apr 1, 2026
1 change: 1 addition & 0 deletions .gitignore
@@ -41,6 +41,7 @@ TEST*.xml

# Generated by MacOS
.DS_Store
docs/assets/~$*.pptx

# Generated by Windows
Thumbs.db
22 changes: 21 additions & 1 deletion CHANGELOG.md
@@ -25,7 +25,27 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
-

-->
## [v3.0.13](https://github.com/TUDelftGeodesy/caroline/tree/main) (04-Mar-2026, [diff](https://github.com/TUDelftGeodesy/caroline/compare/cecb5d0e408d5dfe79e3f7cfa162a649c5fdda19...main))
## [v3.1.0](https://github.com/TUDelftGeodesy/caroline/tree/main) (04-Mar-2026, [diff](https://github.com/TUDelftGeodesy/caroline/compare/95f66a6c273473ba6e0906925a5eee60d2696484...main))

### Added:
- `snap_preparation`, `snap_run`, and `snap_cleanup` jobs
- Support for jobarray submission, including the determination of how many jobs the jobarray will need
- Jobarray and subjob definitions in [the glossary](docs/glossary.md)
- Jobarray [development documentation](docs/development.md)
- Support for the Rome cluster which has 16GB memory per core instead of 12GB
- Temporary storage directory in the [config](config/spider-config.yaml)
- `general:workflow:filters:s1-coregistration-mode` key, allowing for toggling between `"doris"` and `"snap"`
- In [job-definitions.yaml](config/job-definitions.yaml), the keys `job-array:run-as-array` (`True`/`False`) and `job-array:njobs-in-array-function` (used to determine how many jobs the jobarray will need)
- `snap_toolbox` plugin (clone from [snap-coregistration](https://github.com/TUDelftGeodesy/snap-coregistration))
- [snap-coregistration](https://github.com/TUDelftGeodesy/snap-coregistration) dependency
- `snap/12.0.0` module system requirement
- Clarification on how to detect and debug faulty orbit data being ingested from [step.esa.int](https://step.esa.int/auxdata/orbits/Sentinel-1/RESORB/)

### Fixed:
- `extract_all_values_and_paths_from_dictionary` in [utils.py](caroline/utils.py) no longer returns a faulty path when multiple keys are present at a location which is not the base of the dictionary tree
- General memory limit now set to 12GB per core instead of 8GB

## [v3.0.13](https://github.com/TUDelftGeodesy/caroline/tree/95f66a6c273473ba6e0906925a5eee60d2696484) (04-Mar-2026, [diff](https://github.com/TUDelftGeodesy/caroline/compare/cecb5d0e408d5dfe79e3f7cfa162a649c5fdda19...95f66a6c273473ba6e0906925a5eee60d2696484))

### Fixed:
- `doris_cleanup`, `portal_upload`, `tarball`, and `email` no longer request `--qos=long` from the SLURM system on the `short` partition.
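
The `extract_all_values_and_paths_from_dictionary` fix above concerns paths reported for keys that sit below the base of the dictionary tree. The function itself is not shown in this diff, so the following is only a minimal sketch of the general technique, assuming colon-separated paths as used in the parameter keys; `collect_values_and_paths` is a hypothetical name, not the project's function.

```python
from typing import Any


def collect_values_and_paths(tree: dict, prefix: str = "") -> list[tuple[str, Any]]:
    """Return (path, value) pairs for every leaf, with path segments joined by ':'."""
    pairs = []
    for key, value in tree.items():
        # Build each path from the parent's prefix, never from a sibling's path,
        # so several keys at the same nested level all get the correct path.
        path = f"{prefix}:{key}" if prefix else str(key)
        if isinstance(value, dict):
            pairs.extend(collect_values_and_paths(value, path))
        else:
            pairs.append((path, value))
    return pairs


example = {
    "general": {
        "tracks": {"track": [88], "asc_dsc": ["asc"]},
        "timeframe": {"start": "2020-01-01"},
    }
}
paths = collect_values_and_paths(example)
```

Passing the prefix down as an argument, rather than mutating a shared path variable, is one way to avoid the kind of sibling-key mix-up the changelog entry describes.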
133 changes: 133 additions & 0 deletions caroline/jobarray_preparation.py
@@ -0,0 +1,133 @@
"""All functions in this file aim at figuring out how many jobs are necessary for a job array submission."""

import glob
import os

from caroline.config import get_config
from caroline.io import read_parameter_file, write_run_file
from caroline.utils import format_process_folder

CONFIG_PARAMETERS = get_config()
JOB_DEFINITIONS = get_config(
    f"{CONFIG_PARAMETERS['CAROLINE_INSTALL_DIRECTORY']}/config/job-definitions.yaml", flatten=False
)["jobs"]


def jobarray_preparation_scheduler_hook(parameter_file: str, njobs_function: str) -> int:
    """Allow the scheduler to access all functions in this file via job-definitions.yaml without ruff complaining.

    This function acts as a hook for the scheduler in `scheduler.py`. Its purpose is to be called by the scheduler with
    two arguments: the parameter file that is being scheduled, and the name of one of the functions in this file, which
    it has read from `config/job-definitions.yaml` (the `job-array:njobs-in-array-function` key). This function looks
    up that function by name, calls it with the parameter file argument, and returns the result. This way all
    functions in this file are accessible to the scheduler without having to modify the scheduler when a new job
    requiring array computation is added to Caroline.

    Parameters
    ----------
    parameter_file: str
        Full path to the parameter file
    njobs_function: str
        Name of the function to be called, which needs to exist in this file.

    Returns
    -------
    int
        The number of jobs necessary for the array
    """
    # Look the function up by name instead of using eval, so a path containing a quote cannot break the call
    return globals()[njobs_function](parameter_file)
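
The hook above resolves a function from a name read out of `job-definitions.yaml`. As an illustration only, the same name-based dispatch can be sketched with an explicit registry, which restricts the callable names to functions that opted in. `NJOBS_FUNCTIONS`, `register`, `dispatch`, and `njobs_example` are hypothetical names that do not appear in this PR.

```python
from typing import Callable

# Hypothetical registry mapping a name (as it would appear in a YAML key) to a callable
NJOBS_FUNCTIONS: dict[str, Callable[[str], int]] = {}


def register(function: Callable[[str], int]) -> Callable[[str], int]:
    """Decorator that makes a function available to the dispatcher under its own name."""
    NJOBS_FUNCTIONS[function.__name__] = function
    return function


@register
def njobs_example(parameter_file: str) -> int:
    """Stand-in for a real njobs function; always asks for three jobs."""
    return 3


def dispatch(parameter_file: str, njobs_function: str) -> int:
    """Call the registered function with the given name on the parameter file."""
    try:
        function = NJOBS_FUNCTIONS[njobs_function]
    except KeyError:
        raise ValueError(f"Unknown njobs function {njobs_function!r}") from None
    return function(parameter_file)


njobs = dispatch("/path/to/parameter-file.txt", "njobs_example")
```

A registry costs one decorator per function but yields a clear error for a mistyped key in the job definitions, instead of an error raised from inside the evaluated expression.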


def njobs_snap_run(parameter_file: str) -> int:
    """Figure out how many jobs are necessary in the array to successfully run the job snap_run.

    Parameters
    ----------
    parameter_file: str
        Full path to the parameter file

    Returns
    -------
    int
        The number of jobs necessary for the array
    """
    search_parameters = [
        "general:tracks:track",
        "general:tracks:asc_dsc",
        "general:input-data:sensor",
        "general:timeframe:start",
        "general:timeframe:end",
        "general:timeframe:mother",
    ]
    out_parameters = read_parameter_file(parameter_file, search_parameters)
    if len(out_parameters["general:tracks:track"]) > 1:
        raise ValueError(f"Expected single track, got {out_parameters['general:tracks:track']}!")

    track_fmt = (
        f"{out_parameters['general:input-data:sensor'].lower()}_{out_parameters['general:tracks:asc_dsc'][0]}_"
        f"t{out_parameters['general:tracks:track'][0]:0>3d}"
    )

    snap_directory = format_process_folder(
        parameter_file=parameter_file,
        job_description=JOB_DEFINITIONS["snap_preparation"],
        track=out_parameters["general:tracks:track"][0],
    )

    # dry_run is "1" here: the generated script is only used to count the jobs
    other_parameters = {
        "track": out_parameters["general:tracks:track"][0],
        "snap-output-path": snap_directory,
        "dry_run": "1",
        "track_formatted": track_fmt,
    }

    # and the start, end, and mother dates: the image directories are named by acquisition date (YYYYMMDD)
    images = glob.glob(f"{CONFIG_PARAMETERS['SLC_BASE_DIRECTORY']}/{track_fmt}/IW_SLC__1SDV_VVVH/2*")
    images = [int(image.split("/")[-1]) for image in images]

    start_date = int(out_parameters["general:timeframe:start"].replace("-", ""))
    end_date = int(out_parameters["general:timeframe:end"].replace("-", ""))
    mother_date = int(out_parameters["general:timeframe:mother"].replace("-", ""))

    # then select and format the start, end, and mother dates
    other_parameters["start_date"] = str(min([image for image in images if image >= start_date]))
    other_parameters["start_date"] = (
        f"{other_parameters['start_date'][:4]}-"
        f"{other_parameters['start_date'][4:6]}-"
        f"{other_parameters['start_date'][6:]}"
    )
    other_parameters["end_date"] = str(max([image for image in images if image <= end_date]))
    other_parameters["end_date"] = (
        f"{other_parameters['end_date'][:4]}-"
        f"{other_parameters['end_date'][4:6]}-"
        f"{other_parameters['end_date'][6:]}"
    )
    other_parameters["mother_date"] = str(min([image for image in images if image >= mother_date]))
    other_parameters["mother_date"] = (
        f"{other_parameters['mother_date'][:4]}-"
        f"{other_parameters['mother_date'][4:6]}-"
        f"{other_parameters['mother_date'][6:]}"
    )

    write_run_file(
        save_path=f"{CONFIG_PARAMETERS['TEMPORARY_STORAGE_DIRECTORY']}/njobs_run_snap.sh",
        template_path=f"{CONFIG_PARAMETERS['CAROLINE_INSTALL_DIRECTORY']}/templates/snap/generate-snap-graphs.sh",
        asc_dsc=out_parameters["general:tracks:asc_dsc"][0],
        track=out_parameters["general:tracks:track"][0],
        parameter_file=parameter_file,
        parameter_file_parameters=[
            "snap:general:AoI-name",
        ],
        config_parameters=[
            "caroline_work_directory",
            "caroline_virtual_environment_directory",
            "caroline_install_directory",
            "slc_base_directory",
        ],
        other_parameters=other_parameters,
    )

    # run the generated script, read the job count from its output, then remove the temporary script
    njobs = os.popen(f"cd {CONFIG_PARAMETERS['TEMPORARY_STORAGE_DIRECTORY']}; " "bash njobs_run_snap.sh; ").read()
    os.system(f"rm -rf {CONFIG_PARAMETERS['TEMPORARY_STORAGE_DIRECTORY']}/njobs_run_snap.sh")

    return int(njobs)
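
The date handling in `njobs_snap_run` snaps the requested start, end, and mother dates to acquisitions that actually exist on disk: the first image on or after the start, the last image on or before the end, and the first image on or after the mother date. A minimal standalone sketch of that selection, assuming dates are `YYYYMMDD` integers as in the image directory names; `snap_to_available_dates` and its helpers are hypothetical, and the explicit error for an empty selection is an addition that the code above does not have.

```python
def to_int(date: str) -> int:
    """Convert 'YYYY-MM-DD' (or 'YYYYMMDD') to the integer YYYYMMDD."""
    return int(date.replace("-", ""))


def to_dashed(date: int) -> str:
    """Convert the integer YYYYMMDD to 'YYYY-MM-DD'."""
    text = str(date)
    return f"{text[:4]}-{text[4:6]}-{text[6:]}"


def snap_to_available_dates(images: list[int], start: str, end: str, mother: str) -> dict[str, str]:
    """Snap the requested dates to the nearest available image dates."""
    first = min((d for d in images if d >= to_int(start)), default=None)
    last = max((d for d in images if d <= to_int(end)), default=None)
    reference = min((d for d in images if d >= to_int(mother)), default=None)
    if first is None or last is None or reference is None:
        # Assumption: failing loudly is preferable to min()/max() raising on an empty list
        raise ValueError("No image available for at least one of the requested dates")
    return {
        "start_date": to_dashed(first),
        "end_date": to_dashed(last),
        "mother_date": to_dashed(reference),
    }


available = [20200101, 20200113, 20200125, 20200206]
selected = snap_to_available_dates(available, "2020-01-05", "2020-02-01", "2020-01-20")
```

Factoring the selection into one helper would also remove the duplication between this file and `prepare_snap_preparation`, which repeats the same block.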
218 changes: 218 additions & 0 deletions caroline/preparation.py
@@ -1918,6 +1918,224 @@ def prepare_s1_download(parameter_file: str, do_track: int | list | None = None)
exit(5) # Make the code exit with a non-zero exit code so the next steps won't run


def prepare_snap_permissions(parameter_file: str, do_track: int | list | None = None) -> None:
    """Set the permissions of the coregistered ZNAP directories and their contents to 775.

    Parameters
    ----------
    parameter_file: str
        Absolute path to the parameter file.
    do_track: int | list | None, optional
        Track number, or list of track numbers, of the track(s) to prepare. `None` (default) prepares all tracks in
        the parameter file
    """
    search_parameters = [
        "general:tracks:track",
        "general:tracks:asc_dsc",
    ]
    out_parameters = read_parameter_file(parameter_file, search_parameters)

    tracks = out_parameters["general:tracks:track"]

    for track in range(len(tracks)):
        if isinstance(do_track, int):
            if tracks[track] != do_track:
                continue
        elif isinstance(do_track, list):
            if tracks[track] not in do_track:
                continue

        snap_directory = format_process_folder(
            parameter_file=parameter_file, job_description=JOB_DEFINITIONS["snap_run"], track=tracks[track]
        )

        # ZNAPs are directories, so the permissions are set on the directory and three levels below it
        znaps = glob.glob(f"{snap_directory}/*-coreg.znap")
        for zf in znaps:
            os.system(f"chmod 775 {zf}; chmod 775 {zf}/*; chmod 775 {zf}/*/*; chmod 775 {zf}/*/*/*")
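
The `chmod` call in `prepare_snap_permissions` covers the ZNAP directory and a fixed three levels below it. As a sketch only, and assuming every entry in the tree should receive the same mode, the same effect at arbitrary depth can be had with `os.walk` and `os.chmod` from the standard library; `chmod_tree` is a hypothetical helper, not part of this PR.

```python
import os
import stat
import tempfile


def chmod_tree(root: str, mode: int = 0o775) -> None:
    """Apply `mode` to `root` and to every directory and file below it."""
    os.chmod(root, mode)
    for directory, subdirectories, files in os.walk(root):
        for name in subdirectories + files:
            os.chmod(os.path.join(directory, name), mode)


# Usage on a throwaway tree that is deeper than three levels
with tempfile.TemporaryDirectory() as tmp:
    znap = os.path.join(tmp, "example-coreg.znap")
    deep = os.path.join(znap, "a", "b", "c", "d")
    os.makedirs(deep)
    leaf = os.path.join(deep, "data.bin")
    open(leaf, "w").close()

    chmod_tree(znap)
    leaf_mode = stat.S_IMODE(os.stat(leaf).st_mode)
    deep_mode = stat.S_IMODE(os.stat(deep).st_mode)
```

This avoids spawning a shell and does not depend on how deeply SNAP nests its output, at the cost of visiting every entry individually.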


def prepare_snap_preparation(parameter_file: str, do_track: int | list | None = None) -> None:
    """Set up the directories and run files for SNAP preparation.

    Parameters
    ----------
    parameter_file: str
        Absolute path to the parameter file.
    do_track: int | list | None, optional
        Track number, or list of track numbers, of the track(s) to prepare. `None` (default) prepares all tracks in
        the parameter file
    """
    search_parameters = [
        "snap:general:AoI-name",
        "snap:general:directory",
        "general:tracks:track",
        "general:tracks:asc_dsc",
        "general:input-data:sensor",
        "general:shape-file:aoi-name",
        "general:shape-file:directory",
        "general:timeframe:start",
        "general:timeframe:end",
        "general:timeframe:mother",
    ]
    out_parameters = read_parameter_file(parameter_file, search_parameters)

    tracks = out_parameters["general:tracks:track"]
    asc_dsc = out_parameters["general:tracks:asc_dsc"]

    shapefile_name = (
        f"{out_parameters['general:shape-file:directory']}/{out_parameters['general:shape-file:aoi-name']}_shape.shp"
    )

    for track in range(len(tracks)):
        if isinstance(do_track, int):
            if tracks[track] != do_track:
                continue
        elif isinstance(do_track, list):
            if tracks[track] not in do_track:
                continue

        snap_directory = format_process_folder(
            parameter_file=parameter_file, job_description=JOB_DEFINITIONS["snap_preparation"], track=tracks[track]
        )

        os.makedirs(snap_directory, exist_ok=True)

        track_fmt = f"{out_parameters['general:input-data:sensor'].lower()}_{asc_dsc[track]}_t{tracks[track]:0>3d}"

        other_parameters = {
            "track": tracks[track],
            "snap-output-path": snap_directory,
            "dry_run": "0",
            "track_formatted": track_fmt,
        }

        # and the start, end, and mother dates: the image directories are named by acquisition date (YYYYMMDD)
        images = glob.glob(f"{CONFIG_PARAMETERS['SLC_BASE_DIRECTORY']}/{track_fmt}/IW_SLC__1SDV_VVVH/2*")
        images = [int(image.split("/")[-1]) for image in images]

        start_date = int(out_parameters["general:timeframe:start"].replace("-", ""))
        end_date = int(out_parameters["general:timeframe:end"].replace("-", ""))
        mother_date = int(out_parameters["general:timeframe:mother"].replace("-", ""))

        # then select and format the start, end, and mother dates
        other_parameters["start_date"] = str(min([image for image in images if image >= start_date]))
        other_parameters["start_date"] = (
            f"{other_parameters['start_date'][:4]}-"
            f"{other_parameters['start_date'][4:6]}-"
            f"{other_parameters['start_date'][6:]}"
        )
        other_parameters["end_date"] = str(max([image for image in images if image <= end_date]))
        other_parameters["end_date"] = (
            f"{other_parameters['end_date'][:4]}-"
            f"{other_parameters['end_date'][4:6]}-"
            f"{other_parameters['end_date'][6:]}"
        )
        other_parameters["mother_date"] = str(min([image for image in images if image >= mother_date]))
        other_parameters["mother_date"] = (
            f"{other_parameters['mother_date'][:4]}-"
            f"{other_parameters['mother_date'][4:6]}-"
            f"{other_parameters['mother_date'][6:]}"
        )

        # generate the WKT file
        write_run_file(
            save_path=f"{snap_directory}/aoi.wkt",
            template_path=f"{CONFIG_PARAMETERS['CAROLINE_INSTALL_DIRECTORY']}/templates/snap/aoi.wkt",
            asc_dsc=asc_dsc[track],
            track=tracks[track],
            parameter_file=parameter_file,
            other_parameters={"wkt_string": convert_shp_to_wkt(shapefile_name)},
        )

        # generate generate-snap-graphs.sh
        write_run_file(
            save_path=f"{snap_directory}/generate-snap-graphs.sh",
            template_path=f"{CONFIG_PARAMETERS['CAROLINE_INSTALL_DIRECTORY']}/templates/snap/generate-snap-graphs.sh",
            asc_dsc=asc_dsc[track],
            track=tracks[track],
            parameter_file=parameter_file,
            parameter_file_parameters=[
                "snap:general:AoI-name",
            ],
            config_parameters=[
                "caroline_work_directory",
                "caroline_virtual_environment_directory",
                "caroline_install_directory",
                "slc_base_directory",
            ],
            other_parameters=other_parameters,
        )

        write_directory_contents(
            snap_directory,
            filename=f'dir_contents{JOB_DEFINITIONS["snap_preparation"]["directory-contents-file-appendix"]}.txt',
        )


def prepare_snap_run(parameter_file: str, do_track: int | list | None = None) -> None:
    """Set up the directories and run files for the SNAP run.

    Parameters
    ----------
    parameter_file: str
        Absolute path to the parameter file.
    do_track: int | list | None, optional
        Track number, or list of track numbers, of the track(s) to prepare. `None` (default) prepares all tracks in
        the parameter file
    """
    search_parameters = [
        "snap:general:AoI-name",
        "snap:general:directory",
        "general:tracks:track",
        "general:tracks:asc_dsc",
        "general:input-data:sensor",
    ]
    out_parameters = read_parameter_file(parameter_file, search_parameters)

    tracks = out_parameters["general:tracks:track"]
    asc_dsc = out_parameters["general:tracks:asc_dsc"]

    for track in range(len(tracks)):
        if isinstance(do_track, int):
            if tracks[track] != do_track:
                continue
        elif isinstance(do_track, list):
            if tracks[track] not in do_track:
                continue

        snap_directory = format_process_folder(
            parameter_file=parameter_file, job_description=JOB_DEFINITIONS["snap_run"], track=tracks[track]
        )

        # pass on whether the job definition constrains snap_run to the Rome cluster
        if "--constraint=rome" in JOB_DEFINITIONS["snap_run"]["sbatch-args"]:
            rome_constrained = "1"
        else:
            rome_constrained = "0"

        os.makedirs(snap_directory, exist_ok=True)

        # generate run-snap-graph.sh
        write_run_file(
            save_path=f"{snap_directory}/run-snap-graph.sh",
            template_path=f"{CONFIG_PARAMETERS['CAROLINE_INSTALL_DIRECTORY']}/templates/snap/run-snap-graph.sh",
            asc_dsc=asc_dsc[track],
            track=tracks[track],
            parameter_file=parameter_file,
            parameter_file_parameters=["snap:general:AoI-name"],
            config_parameters=["caroline_work_directory", "caroline_virtual_environment_directory"],
            other_parameters={
                "track": tracks[track],
                "snap-output-path": snap_directory,
                "rome-constrained": rome_constrained,
            },
        )

        write_directory_contents(
            snap_directory,
            filename=f'dir_contents{JOB_DEFINITIONS["snap_run"]["directory-contents-file-appendix"]}.txt',
        )


def prepare_stm_generation(parameter_file: str, do_track: int | list | None = None) -> None:
    """Set up the directories and run files for STM generation.
