From 597ac6f780ff6184be7b327b1f3f540ce7e88b6e Mon Sep 17 00:00:00 2001
From: Gwangmu Lee
Date: Thu, 21 Aug 2025 16:07:56 +0200
Subject: [PATCH 1/4] Add an explicit warning to --environment as /home/gwangmu/.vimclipSBATCH

---
 docs/software/container-engine/known-issue.md | 25 +++++++++++++++++++
 docs/software/container-engine/run.md         |  5 ++--
 2 files changed, 27 insertions(+), 3 deletions(-)

diff --git a/docs/software/container-engine/known-issue.md b/docs/software/container-engine/known-issue.md
index 7569df1b..478084e9 100644
--- a/docs/software/container-engine/known-issue.md
+++ b/docs/software/container-engine/known-issue.md
@@ -52,3 +52,28 @@ Mounting individual home directories (usually located on the `/users` filesystem
 It is generally NOT recommended to mount home folders inside containers, due to the risk of exposing personal data to programs inside the container.
 Defining a mount related to `/users` in the EDF should only be done when there is a specific reason to do so, and the container image being deployed is trusted.
+
+[](){#ref-ce-why-no-sbatch-env}
+## Why `--environment` as `#SBATCH` is discouraged
+
+Due to how Slurm works, when using `--environment` as an `#SBATCH` option, the entire content of the SBATCH script is executed within a container created by the EDF file. This may cause several counterintuitive implications that can lead to subtle and hard-to-diagnose failures. The following are a few known issues associated with `--environment` in SBATCH.
+
+ - **Slurm availability in the container**: In some cases, CE does not inject essential Slurm components in containers, which result in crashes on basic Slurm operations (e.g., `srun`) inside the SBATCH script. Even if they were injected, it's not guaranteed to cover the complete feature set of Slurm.
+
+ - **The execution context is not the host system**: Since the entire SBATCH script runs inside a container (shaped with EDF), all commands in the script are affected by the environment defined by EDF. This primarily includes filesystem mounts, where any directories not explicitly mounted in EDF are invisible to all commands inside the SBATCH script.
+
+ - **Nested use of `--environment`**: `--environment` in the SBATCH script _and_ for a `srun` command results in entering the EDF environment twice, causing unexpected errors due to double-entering containers.
+
+For these reasons, we encourage using `--environment` for each `srun` as shown below.
+
+```bash
+#!/bin/bash
+#SBATCH --cpus-per-task=4
+...
+srun --environment=my_edf echo 'this'
+...
+srun --environment=my_edf echo 'that'
+...
+```
+
+As the use of `--environment` as an `#SBATCH` option is reserved for highly customized workflows, users should have a high level of proficiency and a full understanding of the risk to encounter cryptic behaviors. Should users encounter a problem while using `--environment` as `#SBATCH`, it's recommended to move `--environment` from `#SBATCH` to each `srun` and see if the problem disappears.
diff --git a/docs/software/container-engine/run.md b/docs/software/container-engine/run.md
index 6d399d91..9f3df0df 100644
--- a/docs/software/container-engine/run.md
+++ b/docs/software/container-engine/run.md
@@ -42,9 +42,8 @@ Use `--environment` with the Slurm command (e.g., `srun` or `salloc`):
 Specifying the `--environment` option with an `#SBATCH` option is **experimental**.
 Such usage is discouraged as it may result in unexpected behaviors.
 
-!!! note
-    Specifying `--environment` with `#SBATCH` will put the entire batch script inside the containerized environment, requiring the Slurm hook to use any Slurm commands within the batch script (e.g., `srun` or `scontrol`).
-    The hook is controlled by the `ENROOT_SLURM_HOOK` environment variable and activated by default on most vClusters.
+!!! warning
+    The use of `--environment` as an `#SBATCH` option is reserved for highly customized workflows, and it may result in several **counterintuitive, hard-to-diagnose failures**. See [Why `--environment` as `#SBATCH` is discouraged][ref-ce-why-no-sbatch-env] for details.
 
 [](){#ref-ce-edf-search-path}
 ### EDF search path

From 65236ea55593be83641fc6e5bbbfe6c08257adbd Mon Sep 17 00:00:00 2001
From: gwangmu <43980405+gwangmu@users.noreply.github.com>
Date: Fri, 22 Aug 2025 09:49:24 +0200
Subject: [PATCH 2/4] Update docs/software/container-engine/known-issue.md

Co-authored-by: Ben Cumming
---
 docs/software/container-engine/known-issue.md | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/docs/software/container-engine/known-issue.md b/docs/software/container-engine/known-issue.md
index 478084e9..04314fcb 100644
--- a/docs/software/container-engine/known-issue.md
+++ b/docs/software/container-engine/known-issue.md
@@ -56,7 +56,8 @@ Defining a mount related to `/users` in the EDF should only be done when there i
 [](){#ref-ce-why-no-sbatch-env}
 ## Why `--environment` as `#SBATCH` is discouraged
 
-Due to how Slurm works, when using `--environment` as an `#SBATCH` option, the entire content of the SBATCH script is executed within a container created by the EDF file. This may cause several counterintuitive implications that can lead to subtle and hard-to-diagnose failures. The following are a few known issues associated with `--environment` in SBATCH.
+Due to how Slurm works, when using `--environment` as an `#SBATCH` option, the entire contents of the SBATCH script is executed within a container created by the EDF file.
+This can lead to subtle and hard-to-diagnose failures, some of which are described below.
 
  - **Slurm availability in the container**: In some cases, CE does not inject essential Slurm components in containers, which result in crashes on basic Slurm operations (e.g., `srun`) inside the SBATCH script. Even if they were injected, it's not guaranteed to cover the complete feature set of Slurm.

From b2b55b0575d2180e20f27ab2a51a6f2bfafe9d58 Mon Sep 17 00:00:00 2001
From: Gwangmu Lee
Date: Thu, 28 Aug 2025 17:33:34 +0200
Subject: [PATCH 3/4] Reflect comments

---
 docs/software/container-engine/known-issue.md | 23 ++++---------------
 docs/software/container-engine/run.md         | 18 +++++++++++++--
 2 files changed, 21 insertions(+), 20 deletions(-)

diff --git a/docs/software/container-engine/known-issue.md b/docs/software/container-engine/known-issue.md
index 04314fcb..d1e7038e 100644
--- a/docs/software/container-engine/known-issue.md
+++ b/docs/software/container-engine/known-issue.md
@@ -56,25 +56,12 @@ Defining a mount related to `/users` in the EDF should only be done when there i
 [](){#ref-ce-why-no-sbatch-env}
 ## Why `--environment` as `#SBATCH` is discouraged
 
-Due to how Slurm works, when using `--environment` as an `#SBATCH` option, the entire contents of the SBATCH script is executed within a container created by the EDF file.
-This can lead to subtle and hard-to-diagnose failures, some of which are described below.
+The use of `--environment` as `#SBATCH` is known to cause **unexpected behaviors** and is exclusively reserved for highly customized workflows. This is because `--environment` as `#SBATCH` puts the entire SBATCH script in a container from the EDF file. The following are a few known associated issues.
 
- - **Slurm availability in the container**: In some cases, CE does not inject essential Slurm components in containers, which result in crashes on basic Slurm operations (e.g., `srun`) inside the SBATCH script. Even if they were injected, it's not guaranteed to cover the complete feature set of Slurm.
+ - **Slurm availability in a container**: Slurm components may not be fully injected into the container, or the injected components may not function properly.
 
- - **The execution context is not the host system**: Since the entire SBATCH script runs inside a container (shaped with EDF), all commands in the script are affected by the environment defined by EDF. This primarily includes filesystem mounts, where any directories not explicitly mounted in EDF are invisible to all commands inside the SBATCH script.
+ - **Non-host execution context**: Since the SBATCH script runs inside a container, most host resources are inaccessible by default unless the EDF explicitly exposes them. Affected resources include filesystems, devices, system resources, and container hooks, among others.
 
- - **Nested use of `--environment`**: `--environment` in the SBATCH script _and_ for a `srun` command results in entering the EDF environment twice, causing unexpected errors due to double-entering containers.
+ - **Nested use of `--environment`**: Running `srun --environment` inside a script submitted with `#SBATCH --environment` enters the EDF container twice, causing unexpected errors in the underlying container runtime.
 
-For these reasons, we encourage using `--environment` for each `srun` as shown below.
-
-```bash
-#!/bin/bash
-#SBATCH --cpus-per-task=4
-...
-srun --environment=my_edf echo 'this'
-...
-srun --environment=my_edf echo 'that'
-...
-```
-
-As the use of `--environment` as an `#SBATCH` option is reserved for highly customized workflows, users should have a high level of proficiency and a full understanding of the risk to encounter cryptic behaviors. Should users encounter a problem while using `--environment` as `#SBATCH`, it's recommended to move `--environment` from `#SBATCH` to each `srun` and see if the problem disappears.
+To avoid confusion, users are advised **not** to use `--environment` as `#SBATCH`.
+Users who do encounter a problem with `--environment` as `#SBATCH` should move `--environment` from `#SBATCH` to each `srun` and check whether the problem disappears.
diff --git a/docs/software/container-engine/run.md b/docs/software/container-engine/run.md
index 9f3df0df..090487b4 100644
--- a/docs/software/container-engine/run.md
+++ b/docs/software/container-engine/run.md
@@ -34,11 +34,25 @@ Use `--environment` with the Slurm command (e.g., `srun` or `salloc`):
     #SBATCH --job-name=edf-example
     #SBATCH --time=00:01:00
     ...
-
-    # Run job step
     srun --environment=ubuntu cat /etc/os-release
     ```
 
+Multiple Slurm commands may have different EDF environments; this is useful when a single environment is not feasible due to the compatibility issues between programs.
+
+!!! example "`srun`s with different EDFs"
+    ```bash
+    #!/bin/bash
+    #SBATCH --job-name=edf-example
+    #SBATCH --time=00:01:00
+    ...
+    srun --environment=env1 ... # (1)!
+    ...
+    srun --environment=env2 ... # (2)!
+    ```
+
+    1. Assuming `env1.toml` is at `EDF_PATH`. See [EDF search path][ref-ce-edf-search-path] below.
+    2. Assuming `env2.toml` is at `EDF_PATH`. See [EDF search path][ref-ce-edf-search-path] below.
+
 Specifying the `--environment` option with an `#SBATCH` option is **experimental**.
 Such usage is discouraged as it may result in unexpected behaviors.
 
From 3ffa14d0839fecfc5f2f13720038fb218ef1277a Mon Sep 17 00:00:00 2001
From: Gwangmu Lee
Date: Thu, 28 Aug 2025 17:48:40 +0200
Subject: [PATCH 4/4] Reflect comments

---
 docs/software/container-engine/run.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/software/container-engine/run.md b/docs/software/container-engine/run.md
index 090487b4..4b51af1d 100644
--- a/docs/software/container-engine/run.md
+++ b/docs/software/container-engine/run.md
@@ -37,7 +37,7 @@ Use `--environment` with the Slurm command (e.g., `srun` or `salloc`):
     srun --environment=ubuntu cat /etc/os-release
     ```
 
-Multiple Slurm commands may have different EDF environments; this is useful when a single environment is not feasible due to the compatibility issues between programs.
+Multiple Slurm commands may have different EDF environments; this is useful when a single environment is not feasible due to compatibility issues, or to keep EDF files modular.
 
 !!! example "`srun`s with different EDFs"
     ```bash