This repository was archived by the owner on Aug 9, 2023. It is now read-only.
This section describes how to build and upload templates and artifacts to use in a customized deployment. Once uploaded, the locations of the templates and artifacts are used when deploying the Nextflow on AWS Batch solution (see [Customized Deployment](custom-deploy.md)).
## Building a Custom Distribution
This step involves building a distribution of templates and artifacts from the solution's source code.
First, create a local clone of the [Genomics Workflows on AWS](https://github.com/aws-samples/aws-genomics-workflows) source code. The code base contains several directories:
* `_scripts/`: Shell scripts for building and uploading the customized distribution of templates and artifacts
* `docs/`: Source code for the documentation, written in [Markdown](https://markdownguide.org) for the [MkDocs](https://mkdocs.org) publishing platform. This documentation may be modified, expanded, and contributed in the same way as source code.
* `src/`: Source code for the components of the solution:
    * `containers/`: CodeBuild buildspec files for building AWS-specific container images and pushing them to ECR
        * `_common/`
            * `build.sh`: A generic build script that first builds a base image for a container, then builds an AWS-specific image
            * `entrypoint.aws.sh`: A generic entrypoint script that wraps a call to a binary tool in the container with handlers for data staging from/to S3
        * `nextflow/`
            * `Dockerfile`
            * `nextflow.aws.sh`: Docker entrypoint script that executes the Nextflow workflow on AWS Batch
    * `ebs-autoscale/`
        * `get-amazon-ebs-autoscale.sh`: Script to retrieve and install [Amazon EBS Autoscale](https://github.com/awslabs/amazon-ebs-autoscale)
    * `ecs-additions/`: Scripts to be installed on ECS host instances to support the distribution
        * `awscli-shim.sh`: Installed as `/opt/aws-cli/bin/aws` and mounted into containers; allows container images without a full glibc to use the AWS CLI v2 through supplied shared libraries (especially libz) and `LD_LIBRARY_PATH`
        * `ecs-additions-common.sh`: Utility script to install `fetch_and_run.sh`, the Nextflow and Cromwell shims, and swap space
        * `ecs-additions-cromwell-linux2-worker.sh`:
        * `ecs-additions-cromwell.sh`:
        * `ecs-additions-nextflow.sh`:
        * `ecs-additions-step-functions.sh`:
        * `fetch_and_run.sh`: Uses the AWS CLI to download and run scripts and zip files from S3
        * `provision.sh`: Appended to the user data in the launch template created by [gwfcore-launch-template](custom-deploy.md): starts the SSM Agent, ECS Agent, and Docker; runs `get-amazon-ebs-autoscale.sh`, `ecs-additions-common.sh`, and the orchestrator-specific `ecs-additions-` scripts
    * `lambda/`: Lambda functions to create, modify, or delete ECR registries or CodeBuild jobs
    * `templates/`: CloudFormation templates for the solution stack, as described in [Customized Deployment](custom-deploy.md)
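The `awscli-shim.sh` mechanism can be illustrated with a minimal sketch. The paths and hand-off command below are illustrative assumptions, not the repository's actual script: the idea is simply to point the dynamic linker at libraries shipped alongside the CLI before invoking it.

```shell
#!/bin/bash
# Hypothetical sketch of the awscli-shim idea: prepend the shared
# libraries bundled with AWS CLI v2 to the dynamic linker search path,
# so a container image without a full glibc can run the CLI mounted in
# from the host.
CLI_LIB_DIR="/opt/aws-cli/lib"   # assumed location of the bundled libraries

# Prepend, preserving any pre-existing LD_LIBRARY_PATH
export LD_LIBRARY_PATH="${CLI_LIB_DIR}${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"
echo "LD_LIBRARY_PATH=${LD_LIBRARY_PATH}"

# A real shim would now hand off to the bundled binary, e.g.:
# exec /opt/aws-cli/dist/aws "$@"
```

Because the shim is mounted at `/opt/aws-cli/bin/aws`, containers can call `aws` without bundling the CLI or its dependencies in their own images.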
## Deploying a Custom Distribution
The script `_scripts/deploy.sh` will create a custom distribution of artifacts and templates from files in the source tree, then upload this distribution to an S3 bucket. It will optionally also build and deploy a static documentation site from the Markdown documentation files. Its usage is:

```
--site-bucket BUCKET      Deploy documentation site to BUCKET
--asset-bucket BUCKET     Deploy assets to BUCKET
--asset-profile PROFILE   Use PROFILE for AWS CLI commands
--deploy-region REGION    Deploy in region REGION
--public                  Deploy to public bucket with '--acl public-read' (Default false)
--verbose                 Display more output
STAGE                     'test' or 'production'
```
When running this script from the command line, use the value `test` for the stage. This will deploy the templates and artifacts into a directory `test` in your deployment bucket:
```
$ aws s3 ls s3://my-deployment-bucket/test/
                           PRE artifacts/
                           PRE templates/
```
Use these values when deploying a customized installation, as described in [Customized Deployment](custom-deploy.md), sections 'Artifacts and Nested Stacks' and 'Nextflow'. In the example above, the artifact bucket name would be `my-deployment-bucket`, the artifact prefix would be `test/artifacts`, and the template root URL would point to `test/templates` in that bucket.
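The mapping from bucket and stage to parameter values can be derived mechanically. A sketch, assuming the example bucket name from above and standard virtual-hosted-style S3 URLs:

```shell
# Sketch: deriving deployment parameter values from the bucket and
# stage used with deploy.sh. The bucket name is the example from above;
# the URL form is an assumption (standard S3 virtual-hosted style).
BUCKET="my-deployment-bucket"
STAGE="test"

ARTIFACT_BUCKET_NAME="${BUCKET}"
ARTIFACT_S3_PREFIX="${STAGE}/artifacts"
TEMPLATE_ROOT_URL="https://${BUCKET}.s3.amazonaws.com/${STAGE}/templates"

echo "Artifact S3 Bucket Name: ${ARTIFACT_BUCKET_NAME}"
echo "Artifact S3 Prefix:      ${ARTIFACT_S3_PREFIX}"
echo "Template Root URL:       ${TEMPLATE_ROOT_URL}"
```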
The use of `production` for stage is reserved for deployments from a Travis CI/CD environment; this usage will deploy into a subdirectory named after the current release tag.
## Customized Deployment

Deployments of the 'Nextflow on AWS Batch' solution are based on nested CloudFormation templates and on artifacts comprising scripts, software packages, and configuration files. The templates and artifacts are stored in S3 buckets, and their S3 URLs are used when launching the top-level template and as parameters to that template's deployment.
## VPC
The quick start link deploys the [AWS VPC Quickstart](https://aws.amazon.com/quickstart/architecture/vpc/), which creates a VPC with up to 4 Availability Zones, each with a public subnet and a private subnet with NAT Gateway access to the Internet.
## Genomics Workflow Core
This quick start link deploys the CloudFormation template `gwfcore-root.template.yaml` for the Genomics Workflow Core (GWFCore) from the [Genomics Workflows on AWS](https://github.com/aws-samples/aws-genomics-workflows) solution. This template launches a number of nested templates, as shown below:
* Root Stack __gwfcore-root__ - Top-level template for the Genomics Workflow Core
    * S3 Stack __gwfcore-s3__ - S3 bucket (new or existing) for storing analysis results
    * IAM Stack __gwfcore-iam__ - Creates IAM roles to use with the AWS Batch scalable genomics workflow environment
    * Code Stack __gwfcore-code__ - Creates AWS CodeCommit repos and CodeBuild projects for Genomics Workflows Core assets and artifacts
    * Launch Template Stack __gwfcore-launch-template__ - Creates an EC2 launch template for AWS Batch based genomics workflows
    * Batch Stack __gwfcore-batch__ - Deploys resources for an AWS Batch environment suitable for genomics, including default and high-priority job queues
### Root Stack
The quick start solution links to the CloudFormation console, where the 'Amazon S3 URL' field is prefilled with the S3 URL of a copy of the root stack template, hosted in the public S3 bucket __aws-genomics-workflows__.
To use a customized root stack, upload your modified stack template to an S3 bucket (see [Building a Custom Distribution](build-custom-distribution.md)), and specify that template's URL in 'Amazon S3 URL'.
### Artifacts and Nested Stacks
The subsequent screen, 'Specify Stack Details', allows for customization of the deployed resources in the 'Distribution Configuration' section.
* __Artifact S3 Bucket Name__ and __Artifact S3 Prefix__ define the location of the artifacts uploaded prior to this deployment. By default, pre-prepared artifacts are stored in the __aws-genomics-workflows__ bucket.
* __Template Root URL__ defines the bucket and prefix used to store the nested templates called by the root template.
To use your own modified artifacts or nested templates, build and upload as described in [Building a Custom Distribution](build-custom-distribution.md), and specify the bucket and prefix in the fields above.
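As an illustration, the fields for a customized deployment might be filled in as follows. The parameter keys below are hypothetical — they mirror the console field names, so verify the exact keys against `gwfcore-root.template.yaml` before use:

```yaml
# Hypothetical values for the 'Distribution Configuration' fields,
# pointing at a custom distribution deployed with _scripts/deploy.sh.
# Console field names are shown in comments; exact template parameter
# keys may differ.
ArtifactBucketName: my-deployment-bucket   # Artifact S3 Bucket Name
ArtifactRootPrefix: test/artifacts         # Artifact S3 Prefix
TemplateRootUrl: https://my-deployment-bucket.s3.amazonaws.com/test/templates  # Template Root URL
```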
## Workflow Orchestrators
### Nextflow
This quick start deploys the Nextflow template `nextflow-resources.template.yaml`, which launches one nested stack:
* Root Stack __nextflow-resources__ - Creates resources specific to running Nextflow on AWS
    * Container Build Stack __container-build__ - Creates resources for building a Docker container image using CodeBuild, storing the image in ECR, and optionally creating a corresponding Batch Job Definition
The Nextflow root stack is specified in the same way as the GWFCore root stack above, and a location for a modified root stack may be specified as with the Core stack.
The subsequent 'Specify Stack Details' screen has fields allowing the customization of the Nextflow deployment.
* __S3NextflowPrefix__, __S3LogsDirPrefix__, and __S3WorkDirPrefix__ specify the paths within the GWFCore bucket in which to store per-run data and log files.
* __TemplateRootUrl__ specifies the path to the nested templates called by the Nextflow root template, as with the GWFCore root stack.
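A hypothetical set of values for these fields might look like the following. The prefixes shown are illustrative assumptions about a bucket layout, not defaults from the solution — verify against `nextflow-resources.template.yaml`:

```yaml
# Hypothetical 'Specify Stack Details' values for the Nextflow stack;
# adjust the prefixes to your own GWFCore bucket layout.
S3NextflowPrefix: nextflow        # base path within the GWFCore bucket
S3LogsDirPrefix: nextflow/logs    # per-run Nextflow session and log files
S3WorkDirPrefix: nextflow/runs    # per-run intermediate (working) data
TemplateRootUrl: https://my-deployment-bucket.s3.amazonaws.com/test/templates
```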
## Setting Up AWS Batch

*(from `docs/core-env/setup-aws-batch.md`)*

A complete AWS Batch environment consists of the following:
1. A Compute Environment that utilizes [EC2 Spot instances](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-spot-instances.html) for cost-effective computing
2. A Compute Environment that utilizes EC2 [On-Demand](https://aws.amazon.com/ec2/pricing/on-demand/) instances for high-priority work that can't risk job interruptions or delays due to insufficient Spot capacity
3. A default Job Queue that utilizes only the Spot compute environment. This is for jobs where timeliness isn't a constraint: they can wait for the right instances to become available and can handle interruption. It also yields the most cost savings.
4. A priority Job Queue that leverages the On-Demand, and optionally Spot, compute environments (in that order) and has a higher priority than the default queue. This is for jobs that cannot handle interruption and need to be executed immediately.
### Automated via CloudFormation
You can create several compute environments to suit your needs. Below we'll create two: one On-Demand and one Spot.
6. In the "Service role" drop down, select the `AWSBatchServiceRole` you created previously
7. In the "Instance role" drop down, select the `ecsInstanceRole` you created previously
8. For "Provisioning model" select "On-Demand"
9. "Allowed instance types" will already be populated with "optimal" - a mixture of M4, C4, and R4 instances. This is sufficient for demonstration purposes. In a production setting, it is recommended to specify the instance families and sizes most appropriate for the jobs the CE will support. For the On-Demand CE, selecting newer instance types is beneficial, as they tend to have better price per performance.
10. "Allocation strategy" will already be set to `BEST_FIT`. This is recommended for on-demand based compute environments as it ensures the most cost efficiency.
11. In the "Launch template" drop down, select the `genomics-workflow-template` you created previously
12. Set Minimum and Desired vCPUs to 0.
Click on "Create"

Next, repeat the process to create a Spot compute environment:
6. In the "Service role" drop down, select the `AWSBatchServiceRole` you created previously
7. In the "Instance role" drop down, select the `ecsInstanceRole` you created previously
8. For "Provisioning model" select "Spot"
9. "Allowed instance types" will already be populated with "optimal" - a mixture of M4, C4, and R4 instances. This is sufficient for demonstration purposes. In a production setting, it is recommended to specify the instance families and sizes most appropriate for the jobs the CE will support. For the Spot CE, a wider diversity of instance types is recommended, to maximize the pools from which capacity can be drawn. Limiting instance size is also recommended, to avoid scheduling too many jobs onto a single Spot instance that could be interrupted.
10. "Allocation strategy" will already be set to `SPOT_CAPACITY_OPTIMIZED`. This is recommended for Spot based compute environments as it ensures the most compute capacity is available for your jobs.
11. In the "Launch template" drop down, select the `genomics-workflow-template` you created previously
12. Set Minimum and Desired vCPUs to 0.
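The console steps above correspond roughly to the following `aws batch create-compute-environment` request skeleton. This is a sketch: the environment name, `maxvCpus` value, instance types, and subnet ID are placeholders, not values prescribed by this guide.

```json
{
  "computeEnvironmentName": "spot",
  "type": "MANAGED",
  "computeResources": {
    "type": "SPOT",
    "allocationStrategy": "SPOT_CAPACITY_OPTIMIZED",
    "minvCpus": 0,
    "desiredvCpus": 0,
    "maxvCpus": 256,
    "instanceTypes": ["m5", "c5", "r5", "m4", "c4", "r4"],
    "subnets": ["subnet-xxxxxxxx"],
    "instanceRole": "ecsInstanceRole",
    "launchTemplate": {"launchTemplateName": "genomics-workflow-template"}
  },
  "serviceRole": "AWSBatchServiceRole"
}
```

Note the diverse `instanceTypes` list, reflecting the recommendation above to widen the Spot capacity pools.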
Job queues can be associated with one or more compute environments in a preferred order.
Below we'll create two job queues:
* A "Default" job queue
* A "Priority" job queue
These job queues will use the compute environments you created previously.
##### Create a "default" job queue
This queue is intended for jobs that do not require urgent completion and can handle potential interruption. It will schedule jobs only to the "spot" compute environment.
!!! note
    It is not recommended to configure a job queue to "spill over" from Spot to On-Demand. Doing so could lead to insufficient-capacity errors, leaving Batch unable to schedule jobs, which then remain stuck in the RUNNABLE state.
Because it leverages Spot instances, it will also be the most cost-effective job queue.
* Go to the AWS Batch Console
* Click on "Job queues"
* Set "Priority" to 1
* Under "Connected compute environments for this queue", using the drop down menu:
    1. Select the "spot" compute environment you created previously
* Click on "Create Job Queue"
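Equivalently, the default queue could be created with `aws batch create-job-queue --cli-input-json`, using a request like this sketch (the compute environment name is assumed to match the one created above):

```json
{
  "jobQueueName": "default",
  "state": "ENABLED",
  "priority": 1,
  "computeEnvironmentOrder": [
    {"order": 1, "computeEnvironment": "spot"}
  ]
}
```

The single-entry `computeEnvironmentOrder` list encodes the "Spot only" design of this queue.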
##### Create a "priority" job queue

This queue is intended for jobs that are urgent and **cannot** handle potential interruption. This queue will schedule jobs to:
1. The "ondemand" compute environment
2. The "spot" compute environment
in that order. In this configuration, Batch schedules jobs to the "ondemand" compute environment first. When that environment's Max vCPUs limit is reached, Batch begins scheduling jobs to the "spot" compute environment. The "spot" compute environment is optional here; it is used to help drain pending jobs from the queue faster.
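The ordering described above maps onto the `computeEnvironmentOrder` list of a hypothetical `create-job-queue` request. This is a sketch: the environment names match the earlier examples, and the priority value is an assumption (any value higher than the default queue's works).

```json
{
  "jobQueueName": "priority",
  "state": "ENABLED",
  "priority": 10,
  "computeEnvironmentOrder": [
    {"order": 1, "computeEnvironment": "ondemand"},
    {"order": 2, "computeEnvironment": "spot"}
  ]
}
```

Batch fills capacity in ascending `order`, so On-Demand capacity is consumed before any jobs land on Spot.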