This tutorial was created for julia v1.6. An R version is
available as well in this github repo; it covers details of Hoffman2
jobs specific to R and runs similar simulations in R.
For basic information on Hoffman2, including submitting jobs, available
resources, transferring files, and other general information, refer to
the general README.md file found at the initial page of the github
repo.
There are several versions of Julia available on Hoffman2. To see the versions currently available, type:
module avail julia
To load a module, say julia v1.6, type:
module load julia/1.6
If you are going to need packages installed for your use on Hoffman2,
load the julia module, start julia, and install the packages from the
REPL. Note: this should be done on a compute node, as compiling julia
packages and libraries can take quite a bit of resources. Therefore,
you should use qrsh, discussed in the general guide. Computing power is
limited on login nodes, so you should not run any analyses on the login
node.
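Once julia is loaded on a compute node, packages can be installed from the REPL. A minimal sketch, assuming you want the packages used by the simulation script later in this tutorial:

```julia
# Run inside a julia session started on a compute node
# (after `module load julia/1.6` and launching `julia`).
using Pkg

# Packages used by runSim.jl below. DelimitedFiles, Random, and
# Statistics are standard libraries in julia v1.6 and need no install.
Pkg.add(["Distributions", "Primes"])

# Verify that the packages load.
using Distributions, Primes
```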
For most analyses/jobs you’d like to run on Hoffman2, you should use the
qsub command, which submits a batch job to the queue (scheduler). The
file you qsub must be a shell script with a specific format.
cat submit.sh
## #!/bin/bash
## #$ -cwd #uses current working directory
## # error = Merged with joblog
## #$ -o joblog.$JOB_ID #creates a file called joblog.jobidnumber to write to.
## #$ -j y
## #$ -l h_rt=0:30:00,h_data=2G #requests 30 minutes, 2GB of data (per core)
## #$ -pe shared 2 #requests 2 cores
## # Email address to notify
## #$ -M $USER@mail #don't change this line, finds your email in the system
## # Notify when
## #$ -m bea #sends you an email (b) when the job begins (e) when job ends (a) when job is aborted (error)
##
## # load the job environment:
## . /u/local/Modules/default/init/modules.sh
## module load julia/1.6 #loads julia/1.6 for use
##
## # run julia code
## echo 'Running runSim.jl for n = 100' #prints this quote to joblog.jobidnumber
## julia runSim.jl 100 100 123 "Normal()" > output.$JOB_ID 2>&1 # n reps seed distribution
## # command-line arguments: number of samples, number of repetitions, seed, command to create n samples
## # outputs any text (stdout and stderr) to output.$JOB_ID
To send this script to the scheduler to run on a compute node, you would simply type:
qsub submit.sh
For some analyses, you may want to work interactively instead of
just submitting jobs. The qrsh command gives you an interactive
session on a compute node.
Typing qrsh on the Hoffman2 login node will submit a request for an
interactive session. By default, the session will run for two hours and
the physical memory allotted will be 1GB.
To request more, you can use the command
qrsh -l h_rt=4:00:00,h_data=4G
This will request a four hour session where the maximum physical memory is 4GB.
If you’d like to use more than one CPU core for your job, add
-pe shared <number of cores> to the end. Note that the amount of memory
requested is per core. For example, to request 4 CPU cores, each
with 2GB of memory for a total of 8GB for 5 hours, run:
qrsh -l h_rt=5:00:00,h_data=2G -pe shared 4
The more time and memory you request, the longer you will have to wait for an interactive compute node to become available to you. It’s normal to wait a few minutes to get an interactive session.
For more advanced options you can use
qrsh -help
Once the interactive session loads, you can use
module load julia/1.6
julia
to load julia v1.6.
The maximum time for a session is 24 hours unless you’re working in a
group that owns its compute nodes, so do not use an h_rt value
greater than h_rt=24:00:00.
Different compute nodes have different amounts of memory. There are fewer nodes with lots of memory, so the larger the amount of memory you’re requesting the longer you will have to wait for the job to start running. If you request too much, the job may never run.
If you request more than 4 cores for an interactive session, the session may take a long time to start.
The script runSim.jl runs a simulation study comparing two
methods for estimating a mean: est_mean_prime and est_mean_avg. In
each replicate, it generates a random vector of sample size n from
distribution dist, with the random seed set to seed. There are reps
replicates. Values of n, dist, seed, and reps are supplied by the
user. Simulation results are written to a comma-delimited text file,
outfile, whose name is built from the parameter values.
cat runSim.jl
## using DelimitedFiles, Distributions, Primes, Random, Statistics
## n = parse(Int, ARGS[1])
## reps = parse(Int, ARGS[2])
## seed = parse(Int, ARGS[3])
## if length(ARGS) < 4
## d = Normal()
## else
## d = eval(Meta.parse(ARGS[4]))
## end
##
## """
## est_mean_prime(x)
##
## Estimate mean by averaging prime-indexed data in `x`.
## """
## function est_mean_prime(x::Vector{<:Real})
## s, c = zero(eltype(x)), 0
## for i in eachindex(x)
## if Primes.isprime(i)
## s += x[i]
## c += 1
## end
## end
## s / c
## end
##
## """
## est_mean_avg(x)
##
## Estimate mean by sample average.
## """
## function est_mean_avg(x::Vector{<:Real})
## mean(x)
## end
##
## function compare_methods(n::Int, d::ContinuousUnivariateDistribution)
## x = rand(d, n)
## est_mean_avg(x), est_mean_prime(x)
## end
##
## # Simulate `reps` replicates of sample size `n` from distribution `d` using seed `s`
## simres = zeros(reps, 2)
## Random.seed!(seed)
## for r in 1:reps
## x = rand(d, n)
## simres[r, 1] = est_mean_avg(x)
## simres[r, 2] = est_mean_prime(x)
## end
##
## outfile = "simresults/n_$(n)_reps_$(reps)_dist_$(d).txt"
## DelimitedFiles.writedlm(outfile, simres, ",")
## # may add an optional zip step here to reduce storage
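Note how the script turns its fourth command-line argument into a distribution object: the string is parsed into an expression with Meta.parse and then evaluated, so any constructor exported by Distributions works. A quick illustration of that mechanism:

```julia
using Distributions

# Same mechanism runSim.jl uses for ARGS[4]:
# parse the string into an expression, then evaluate it.
d = eval(Meta.parse("TDist(5)"))
println(typeof(d))  # TDist{Float64}
println(mean(d))    # 0.0 for a t distribution with 5 degrees of freedom
```

Because eval runs arbitrary code, this pattern is convenient for trusted command-line use but should not be applied to untrusted input.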
To run this simulation from the command line, the user needs to pass
values for n, reps, seed, and dist, in that order. dist is a string
containing a command that initializes a distribution. For example,
module load julia/1.6
julia runSim.jl 100 100 123 "Normal()"
We can see the results have been written to the txt file.
head simresults/n_100_reps_100_dist_Normal\{Float64\}\(μ\=0.0\,\ σ\=1.0\).txt
## 0.036692077201688614,0.06410204168833543
## 0.10929809874083732,0.31435190199240665
## -0.04052745259955622,-0.0837351522409666
## 0.04151985695102858,0.0023132346304908322
## 0.05656696361430833,0.13782916923366778
## -0.061773946470394095,-0.09831073727533834
## -0.0028930093438773374,-0.06760720427643079
## -0.036945842876762294,0.055500952666579166
## 0.16861614873909925,-0.10980532477330993
## -0.012577896614038043,0.1523025560690629
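With the replicates saved, one can read them back to compare the two estimators; both are unbiased, so the interesting comparison is their variance. A sketch, assuming the output file above exists in the working directory:

```julia
using DelimitedFiles, Statistics

# Column 1: est_mean_avg, column 2: est_mean_prime (one row per replicate).
simres = readdlm("simresults/n_100_reps_100_dist_Normal{Float64}(μ=0.0, σ=1.0).txt", ',')

# The prime-indexed estimator averages only the prime-indexed observations,
# so its variance should be noticeably larger than the full sample mean's.
println("var of sample mean:      ", var(simres[:, 1]))
println("var of prime-index mean: ", var(simres[:, 2]))
```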
If you experience an error, take a look at the output.#### file that was generated. This file contains any output (stdout and stderr) from julia.
Alternatively, as mentioned before, you can use a .sh script like
submit.sh and submit the job using
qsub submit.sh
In many projects, we vary the values of different simulation factors, such as sample size, generative model, and so on. We can write a job array script to do this.
On a cluster, each simulation can be submitted as a task in a job
array, and the tasks can be spread across different compute nodes. The
syntax depends on the scheduling system; on UCLA’s Hoffman2 cluster,
qsub is used. In submit_array.sh, the task index supplies the sample
size n (100, 200, …, 500), and the generative model is given as a
command-line argument (standard normal by default; it can be anything
else). We then submit the jobs using qsub.
cat submit_array.sh
## #!/bin/bash
## #$ -cwd #uses current working directory
## # error = Merged with joblog
## #$ -o joblog.$JOB_ID.$TASK_ID #creates a file called joblog.jobidnumber.taskidnumber to write to.
## #$ -j y
## #$ -l h_rt=0:30:00,h_data=2G #requests 30 minutes, 2GB of data (per core)
## #$ -pe shared 2 #requests 2 cores
## # Email address to notify
## #$ -M $USER@mail #don't change this line, finds your email in the system
## # Notify when
## #$ -m bea #sends you an email (b) when the job begins (e) when job ends (a) when job is aborted (error)
## #$ -t 100-500:100 # 100 to 500, with step size of 100
##
## # load the job environment:
## . /u/local/Modules/default/init/modules.sh
## module load julia/1.6 #loads julia/1.6 for use
##
## echo ${SGE_TASK_ID}
## echo $1
## # run julia code
## echo Running runSim.jl for n = ${SGE_TASK_ID} #prints this quote to joblog.jobidnumber
## julia runSim.jl ${SGE_TASK_ID} 100 123 $1 > output.$JOB_ID.${SGE_TASK_ID} 2>&1 # n reps seed distribution
So on the cluster we just need to run the following:
qsub submit_array.sh
One may try different distributions. Examples using t distributions
with 5 and 1 degrees of freedom are given in run_arrays.sh.
cat run_arrays.sh
## qsub submit_array.sh # defaults to Normal()
## qsub submit_array.sh "Normal()"
## qsub submit_array.sh "TDist(5)"
## qsub submit_array.sh "TDist(1)"
You can submit these jobs by just running this script:
bash run_arrays.sh
You can check on the state of your current jobs by running:
myjob
This command tells you the status of your jobs in the queue: qw for
queued and waiting, r for running. It also shows the number of
cores requested for each job, when the job was submitted, and which
nodes the job is running on.
To check the output files generated after the jobs have run:
ls simresults/*.txt
## simresults/n_100_reps_100_dist_Normal{Float64}(μ=0.0, σ=1.0).txt
## simresults/n_100_reps_100_dist_TDist{Float64}(ν=1.0).txt
## simresults/n_100_reps_100_dist_TDist{Float64}(ν=5.0).txt
## simresults/n_10_reps_10_dist_Normal{Float64}(μ=0.0, σ=1.0).txt
## simresults/n_200_reps_100_dist_Normal{Float64}(μ=0.0, σ=1.0).txt
## simresults/n_200_reps_100_dist_TDist{Float64}(ν=1.0).txt
## simresults/n_200_reps_100_dist_TDist{Float64}(ν=5.0).txt
## simresults/n_300_reps_100_dist_Normal{Float64}(μ=0.0, σ=1.0).txt
## simresults/n_300_reps_100_dist_TDist{Float64}(ν=1.0).txt
## simresults/n_300_reps_100_dist_TDist{Float64}(ν=5.0).txt
## simresults/n_400_reps_100_dist_Normal{Float64}(μ=0.0, σ=1.0).txt
## simresults/n_400_reps_100_dist_TDist{Float64}(ν=1.0).txt
## simresults/n_400_reps_100_dist_TDist{Float64}(ν=5.0).txt
## simresults/n_500_reps_100_dist_Normal{Float64}(μ=0.0, σ=1.0).txt
## simresults/n_500_reps_100_dist_TDist{Float64}(ν=1.0).txt
## simresults/n_500_reps_100_dist_TDist{Float64}(ν=5.0).txt
To use Jupyter Notebook interactively on Hoffman2, follow the instructions linked here
Note: to use julia in a Jupyter notebook, you will need to make sure
you have installed the IJulia package in the version of julia that you
would like to use. To use julia v1.6, log in to Hoffman2, use the
qrsh command to get an interactive compute node, load julia/1.6,
then launch julia and install the IJulia package.
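The IJulia installation step, sketched from a julia session on a compute node:

```julia
# From a julia v1.6 session on an interactive compute node:
using Pkg
Pkg.add("IJulia")  # installs the Jupyter kernel for this julia version

# Optional check that the package loads.
using IJulia
```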
Julia jobs may randomly crash on Hoffman2. There are two common reasons.
1. Insufficient virtual memory. Even if your program requires little
physical memory, it may have a larger than expected virtual memory
requirement. See how much virtual memory you
need.
Adding the -l exclusive flag bypasses virtual memory checking.
2. If you are using julia from the system-wide installation via
module load julia (or downloaded the prebuilt julia yourself from
julia’s download web site), it is recommended to run on Intel nodes,
as required by the prebuilt julia. To do so, add the option
-l arch=intel* to qsub, or -l arch=intel\* to qrsh. (The escape
character \ is necessary in qrsh since it is issued from the command
prompt.)
Credit goes to Shiao-Ching Huang (@schuang) for pointing out 2.
Sometimes, you might want to use two-dimensional indexes for running jobs. For example, splitting each of the 22 human chromosomes into 16 chunks allowed the runs for the TrajGWAS project to finish in a reasonable time. Here is an example job script:
cat index2d_example.sh
## #!/bin/bash
## #$ -cwd
## # error = Merged with joblog
## #$ -o joblog.$JOB_ID.$TASK_ID
## #$ -j y
## #$ -pe shared 2
## #$ -l h_rt=3:00:00,h_data=8G,arch=intel*
## # Email address to notify
## ##$ -M $USER@mail
## # Notify when
## #$ -m a
## # Job array indexes
## #$ -t 1-352:1
##
## ## For use on UCLA's Hoffman2 cluster (Job scheduler: Univa Grid Engine)
##
## NCHUNKS=16
## CHUNKIDX=$(( (${SGE_TASK_ID} - 1) % ${NCHUNKS} + 1 ))
## CHR=$(( (${SGE_TASK_ID} - 1) / ${NCHUNKS} + 1))
##
## PROJECTDIR=... # intentionally omitted
## BGENDIR=... # intentionally omitted
## FITTED_NULL=... # intentionally omitted
## PVALFILE=... # intentionally omitted
##
## . /u/local/Modules/default/init/modules.sh
## echo $CHUNKIDX
## echo $CHR
## echo $PVALFILE
## module load julia/1.5.4
## /usr/bin/time -o ${PROJECTDIR}/times/t-${SGE_TASK_ID} julia --project=${PROJECTDIR} ${PROJECTDIR}/scoretest.jl ${BGENDIR} ${CHR} ${FITTED_NULL} ${PVALFILE} ${CHUNKIDX} ${NCHUNKS}
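The script flattens the two-dimensional (chromosome, chunk) grid into a single task index 1–352 and recovers both coordinates with integer arithmetic. A standalone sketch of that mapping (in a real job, SGE_TASK_ID is set by the scheduler rather than a loop):

```shell
# Map a 1-D job-array index SGE_TASK_ID (1..352) to (chromosome, chunk)
# using the same arithmetic as index2d_example.sh.
NCHUNKS=16
for SGE_TASK_ID in 1 16 17 352; do
  CHUNKIDX=$(( (SGE_TASK_ID - 1) % NCHUNKS + 1 ))
  CHR=$(( (SGE_TASK_ID - 1) / NCHUNKS + 1 ))
  echo "task ${SGE_TASK_ID} -> chr ${CHR}, chunk ${CHUNKIDX}"
done
```

Task 1 maps to chromosome 1, chunk 1; task 17 rolls over to chromosome 2, chunk 1; and task 352 maps to chromosome 22, chunk 16.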

