Hi,
First, thank you for providing this profile. It's extremely useful.
I'm having trouble launching multiple jobs at the same time. For example, I want to launch 5 jobs at the same time, each with 64 cores.
If I run snakemake --cores 64, I find that the jobs get launched sequentially rather than in parallel. I understand that this is because I requested a "maximum" of 64 cores, and thus if a job takes that up, I can only run one at a time.
Now, I wrote a function that is passed to the rule's threads directive and multiplies workflow.cores by, say, 0.2. So I can pass snakemake --cores 320 and each rule will be allocated 64 cores. However, I am finding that somehow this is getting "squared". What happens is:
- the Snakemake STDOUT (what is shown on the screen) shows the correct number of threads:
rule map_reads:
input: output/mapping/H/catalogue.mmi, output/qc/merged/H_S003_R1.fq.gz, output/qc/merged/H_S003_R2.fq.gz
output: output/mapping/bam/H/H_S003.map.bam
log: output/logs/mapping/map_reads/H-H_S003.log
jobid: 247
benchmark: output/benchmarks/mapping/map_reads/H-H_S003.txt
reason: Missing output files: output/mapping/bam/H/H_S003.map.bam
wildcards: binning_group=H, sample=H_S003
threads: 64
resources: tmpdir=/tmp, mem_mb=149952
minimap2 -t 64 > output/mapping/bam/H/H_S003.map.bam
Submitted job 247 with external jobid '35894421'.
That looks fine. I want this rule to be launched with 64 cores, and when I do this, 5 instances of the rule get launched at the same time.
When I open the job's SLURM log, however, I find that this value of 64 is passed as the "Provided cores" to the job, and thus is multiplied again by 0.2.
Contents of slurm-35894421.out:
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 64
Rules claiming more threads will be scaled down.
Select jobs to execute...
rule map_reads:
input: output/mapping/H/catalogue.mmi, output/qc/merged/H_S003_R1.fq.gz, output/qc/merged/H_S003_R2.fq.gz
output: output/mapping/bam/H/H_S003.map.bam
log: output/logs/mapping/map_reads/H-H_S003.log
jobid: 247
benchmark: output/benchmarks/mapping/map_reads/H-H_S003.txt
reason: Missing output files: output/mapping/bam/H/H_S003.map.bam
wildcards: binning_group=H, sample=H_S003
threads: 13
resources: tmpdir=/tmp, mem_mb=149952
minimap2 -t 13 > output/mapping/bam/H/H_S003.map.bam
Even worse, the job is allocated 64 cores but only uses 13 (64 * 0.2, rounded). It's really odd to me that the Snakemake output shows the "correct" value while the SLURM log shows the "real" value that was used. Why do they differ?
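To make the arithmetic concrete, here is a minimal sketch, assuming the threads function amounts to rounding a fixed fraction of the core count. The name scaled_threads and its signature are hypothetical stand-ins, not the actual function from the workflow, and the second call only mimics the function being re-evaluated inside the job against the "Provided cores: 64" value.

```python
def scaled_threads(cores: int, fraction: float = 0.2) -> int:
    """Hypothetical stand-in for the threads function: a fixed fraction of the cores."""
    return max(1, round(cores * fraction))


# On the submitting side, workflow.cores is the value passed to --cores (320).
submit_threads = scaled_threads(320)

# Inside the SLURM job the log reports "Provided cores: 64", so the same
# scaling is applied a second time, now against 64 instead of 320.
job_threads = scaled_threads(submit_threads)

print(submit_threads, job_threads)  # prints "64 13"
```

This reproduces the two numbers seen above: 64 threads in the main Snakemake output and 13 in the SLURM log.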
I am trying to understand what I am doing wrong. When I set a breakpoint in the function that computes the number of threads, the workflow.cores variable is always what I passed on the command line (320), never what shows in the SLURM log.
I tried adding a nodes: 5 or a jobs: 5 key to the profile's config.yaml, but neither helps. Is there anything I can modify in the profile so that I can launch as many parallel jobs as possible?
Please let me know what other information I can provide. Thank you very much.
Best,
V