
Better parallelization for metrics/cms #80

@LuLeom

Description


metrics/cms is a computationally demanding metric which requires parallelization.

As of now (PR #79), I am using the BiocParallel library, which lets me specify the following (line 41 of script.R):

library(BiocParallel)

cms <- cms(
  ...,  # other arguments omitted
  BPPARAM = MulticoreParam(workers = 8)
)

However, this is an arbitrary number of workers that I hardcoded.
I believe there is a better strategy for setting this value dynamically when running the full pipeline on the cloud.
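One way to set the value dynamically, assuming script.R is launched from a Nextflow process, is to let Nextflow pass in the number of CPUs it allotted to the task: a process's `cpus` directive is available in its script block as `${task.cpus}`, e.g. `Rscript script.R ${task.cpus}`. The sketch below reads that value from the command line and falls back to a default; `get_workers` is a hypothetical helper name, not part of BiocParallel.

```r
# Hypothetical helper (not part of BiocParallel): read the number of workers
# from the first trailing command-line argument, e.g. `Rscript script.R 4`.
get_workers <- function(args = commandArgs(trailingOnly = TRUE), default = 1L) {
  # as.integer() yields NA for a missing or non-numeric argument
  n <- suppressWarnings(as.integer(args[1]))
  # Fall back to the default when the argument is missing, invalid or < 1
  if (is.na(n) || n < 1L) default else n
}

# Usage inside script.R (assumes library(BiocParallel) is loaded):
# BPPARAM = MulticoreParam(workers = get_workers())
```

This keeps the core count in one place (the Nextflow `cpus` directive), so the scheduler and R agree on how many cores the task may use.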

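I was thinking of something like:

library(BiocParallel)
# trsh: maximum number of cores to use (value still to be decided)
cores_avail <- max(1L, parallel::detectCores() - 1L, na.rm = TRUE)  # leave one core free, but keep at least one
cores_to_use <- min(trsh, cores_avail)
cms <- cms(
  ...,
  BPPARAM = MulticoreParam(workers = cores_to_use)
)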
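A slightly more defensive version of this proposal, as a sketch: `parallel::detectCores()` returns `NA` when the core count cannot be determined, and on a single-core machine `detectCores() - 1` would be 0, which is not a valid worker count. `choose_workers` is a hypothetical helper name; `cores_avail` is exposed as an argument so the logic can be tested independently of the machine.

```r
# Hypothetical helper: leave one core free, cap at `trsh`, never return < 1.
choose_workers <- function(trsh, cores_avail = parallel::detectCores()) {
  # detectCores() returns NA when the core count cannot be determined
  if (is.na(cores_avail)) cores_avail <- 1L
  # Keep one core free, but always allow at least one worker
  usable <- max(1L, as.integer(cores_avail) - 1L)
  as.integer(min(trsh, usable))
}

# Usage (8 is only an illustrative cap):
# BPPARAM = MulticoreParam(workers = choose_workers(trsh = 8))
```

Note that inside containers or on shared cloud nodes, `detectCores()` may report all of the host's cores rather than the share allotted to the task; the parallelly package's `availableCores()` is designed to take such limits (e.g. cgroups, job schedulers) into account and could be passed as `cores_avail` instead.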

However, I do not know what happens when several Nextflow workflows run at the same time and each one tries to use all available cores but one. We might therefore need a maximum threshold, trsh, on the number of cores to use (?).
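To make the concern concrete: if three workflows each start 15 workers on a 16-core machine, 45 worker processes compete for 16 cores, so the extra workers add no throughput and only add scheduling and memory overhead. A minimal static sketch, assuming you know roughly how many such tasks run at once (`n_tasks`, a hypothetical parameter), is to give each task an equal share of the cores, capped at trsh:

```r
# Hypothetical helper: split `cores_avail` cores evenly among `n_tasks`
# tasks expected to run concurrently, capped at `trsh` and at least 1.
workers_per_task <- function(n_tasks, trsh, cores_avail) {
  stopifnot(n_tasks >= 1, trsh >= 1)
  share <- floor(cores_avail / n_tasks)
  as.integer(max(1, min(trsh, share)))
}

workers_per_task(n_tasks = 3, trsh = 8, cores_avail = 16)  # 5 workers each
```

Within a single Nextflow run, declaring `cpus` on the process is likely the more robust route, since the executor can then account for each task's cores when deciding how many tasks to run at once; separate runs on the same machine do not coordinate with each other, which is where a static cap like this helps.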

Any input is appreciated!

Metadata


Labels

help wanted (Extra attention is needed)
