Provided cores differ to what's passed in the CLI #97

@vinisalazar

Hi,

First, thank you for providing this profile. It's extremely useful.

I'm having trouble launching multiple jobs at the same time. For example, I want to launch 5 jobs at once, each with 64 cores.

If I run snakemake --cores 64, I find that the jobs get launched sequentially rather than in parallel. I understand that this is because I requested a "maximum" of 64 cores, and thus if a job takes that up, I can only run one at a time.
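To make that arithmetic concrete, here is a minimal sketch. It assumes the scheduler simply caps the summed threads of all running jobs at the value given to --cores, which is my understanding rather than confirmed behaviour. The function name `max_concurrent_jobs` is hypothetical and not part of Snakemake.

```python
def max_concurrent_jobs(provided_cores: int, threads_per_job: int) -> int:
    """Jobs that fit at once if running jobs may not exceed provided_cores in total.

    Assumption: the scheduler only enforces a global core budget.
    """
    if threads_per_job < 1:
        raise ValueError("threads_per_job must be at least 1")
    return provided_cores // threads_per_job


# With --cores 64 and 64 threads per job, only one job fits at a time.
print(max_concurrent_jobs(64, 64))
# With --cores 320 and 64 threads per job, five jobs fit.
print(max_concurrent_jobs(320, 64))
```

Under that assumption, raising --cores to 320 is what allows five 64-thread jobs to run side by side.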

To work around this, I wrote a function that is passed to each rule's threads directive and multiplies workflow.cores by, say, 0.2. That way I can pass snakemake --cores 320 and each rule is allocated 64 cores. However, I am finding that this scaling somehow gets applied twice. What happens is:

  • the Snakemake STDOUT (what is shown on the screen) shows the correct number of threads:
rule map_reads:
    input: output/mapping/H/catalogue.mmi, output/qc/merged/H_S003_R1.fq.gz, output/qc/merged/H_S003_R2.fq.gz
    output: output/mapping/bam/H/H_S003.map.bam
    log: output/logs/mapping/map_reads/H-H_S003.log
    jobid: 247
    benchmark: output/benchmarks/mapping/map_reads/H-H_S003.txt
    reason: Missing output files: output/mapping/bam/H/H_S003.map.bam
    wildcards: binning_group=H, sample=H_S003
    threads: 64
    resources: tmpdir=/tmp, mem_mb=149952


    minimap2 -t 64 > output/mapping/bam/H/H_S003.map.bam

Submitted job 247 with external jobid '35894421'.

That looks fine. I want this rule to be launched with 64 cores, and when I do this, 5 instances of the rule get launched at the same time.

When I open the job's SLURM log, however, I find that this value of 64 is passed as the "Provided cores" to the job, and thus is multiplied again by 0.2.

Contents of slurm-35894421.out:

Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 64
Rules claiming more threads will be scaled down.
Select jobs to execute...

rule map_reads:
    input: output/mapping/H/catalogue.mmi, output/qc/merged/H_S003_R1.fq.gz, output/qc/merged/H_S003_R2.fq.gz
    output: output/mapping/bam/H/H_S003.map.bam
    log: output/logs/mapping/map_reads/H-H_S003.log
    jobid: 247
    benchmark: output/benchmarks/mapping/map_reads/H-H_S003.txt
    reason: Missing output files: output/mapping/bam/H/H_S003.map.bam
    wildcards: binning_group=H, sample=H_S003
    threads: 13
    resources: tmpdir=/tmp, mem_mb=149952


    minimap2 -t 13 > output/mapping/bam/H/H_S003.map.bam

Even worse, my job is allocating 64 cores but only using 13 (64 * 0.2, rounded). It's really weird to me that the Snakemake output shows the "correct" value while the SLURM log shows the "real" value that was used. Why do they differ?
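The two numbers can be reproduced with a small sketch of the scaling. This is not my actual function: `scaled_threads` is a hypothetical stand-in, and the sketch assumes the function is evaluated once at submission time with 320 provided cores and then again inside the SLURM job with 64 provided cores.

```python
def scaled_threads(provided_cores: int, fraction: float = 0.2) -> int:
    """Hypothetical stand-in for a threads function: a fixed fraction of the core budget."""
    return max(1, round(provided_cores * fraction))


# Submission side: --cores 320 was passed on the command line.
submit_threads = scaled_threads(320)

# Job side (assumption): the job sees submit_threads as its "Provided cores"
# and evaluates the same function a second time.
job_threads = scaled_threads(submit_threads)

print(submit_threads)  # 64, as shown in the Snakemake STDOUT
print(job_threads)     # 13, as shown in the SLURM log
```

If that assumption holds, the job effectively runs with 0.2 * 0.2 of the cores given on the command line, while SLURM still reserves the 64 cores computed at submission.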

I am trying to understand what I am doing wrong. When I set a breakpoint in the function that computes the number of threads, workflow.cores is always the value I pass on the command line (320), never the value shown in the SLURM log.

I tried adding a nodes: 5 or a jobs: 5 key to the profile's config.yaml, but neither helped. Is there anything I can modify in the profile to make sure that I can launch as many parallel jobs as possible?

Please let me know what other information I can provide. Thank you very much.

Best,
V
