This repository was archived by the owner on Aug 9, 2023. It is now read-only.

Commit 1676f13 — Merge pull request #159 from aws-samples/master ("April updates")
2 parents: 7b8e77c + 918ed9d


59 files changed: +10628 −25 lines
docs/core-env/build-custom-distribution.md

Lines changed: 68 additions & 0 deletions

# Building Custom Resources

This section describes how to build and upload templates and artifacts to use in a customized deployment. Once uploaded, the locations of the templates and artifacts are used when deploying the Nextflow on AWS Batch solution (see [Customized Deployment](custom-deploy.md)).

## Building a Custom Distribution

This step involves building a distribution of templates and artifacts from the solution's source code.

First, create a local clone of the [Genomics Workflows on AWS](https://github.com/aws-samples/aws-genomics-workflows) source code. The code base contains several directories:

* `_scripts/`: Shell scripts for building and uploading the customized distribution of templates and artifacts
* `docs/`: Source code for the documentation, written in [Markdown](https://markdownguide.org) for the [MkDocs](https://mkdocs.org) publishing platform. This documentation may be modified, expanded, and contributed in the same way as source code.
* `src/`: Source code for the components of the solution:
    * `containers/`: CodeBuild buildspec files for building AWS-specific container images and pushing them to ECR
        * `_common/`
            * `build.sh`: A generic build script that first builds a base image for a container, then builds an AWS-specific image
            * `entrypoint.aws.sh`: A generic entrypoint script that wraps a call to a binary tool in the container with handlers for data staging from/to S3
        * `nextflow/`
            * `Dockerfile`
            * `nextflow.aws.sh`: Docker entrypoint script to execute the Nextflow workflow on AWS Batch
    * `ebs-autoscale/`
        * `get-amazon-ebs-autoscale.sh`: Script to retrieve and install [Amazon EBS Autoscale](https://github.com/awslabs/amazon-ebs-autoscale)
    * `ecs-additions/`: Scripts to be installed on ECS host instances to support the distribution
        * `awscli-shim.sh`: Installed as `/opt/aws-cli/bin/aws` and mounted onto the container; allows container images without a full glibc to use the AWS CLI v2 through supplied shared libraries (especially libz) and `LD_LIBRARY_PATH`.
        * `ecs-additions-common.sh`: Utility script to install `fetch_and_run.sh`, the Nextflow and Cromwell shims, and swap space
        * `ecs-additions-cromwell-linux2-worker.sh`:
        * `ecs-additions-cromwell.sh`:
        * `ecs-additions-nextflow.sh`:
        * `ecs-additions-step-functions.sh`:
        * `fetch_and_run.sh`: Uses the AWS CLI to download and run scripts and zip files from S3
        * `provision.sh`: Appended to the user data in the launch template created by [gwfcore-launch-template](custom-deploy.md): starts the SSM agent, the ECS agent, and Docker; runs `get-amazon-ebs-autoscale.sh`, `ecs-additions-common.sh`, and the orchestrator-specific `ecs-additions-` scripts.
    * `lambda/`: Lambda functions to create, modify, or delete ECR registries or CodeBuild jobs
    * `templates/`: CloudFormation templates for the solution stack, as described in [Customized Deployment](custom-deploy.md)

## Deploying a Custom Distribution

The script `_scripts/deploy.sh` will create a custom distribution of artifacts and templates from files in the source tree, then upload this distribution to an S3 bucket. It will optionally also build and deploy a static documentation site from the Markdown documentation files. Its usage is:

```sh
deploy.sh [--site-bucket BUCKET] [--asset-bucket BUCKET]
          [--asset-profile PROFILE] [--deploy-region REGION]
          [--public] [--verbose]
          STAGE

--site-bucket BUCKET       Deploy documentation site to BUCKET
--asset-bucket BUCKET      Deploy assets to BUCKET
--asset-profile PROFILE    Use PROFILE for AWS CLI commands
--deploy-region REGION     Deploy in region REGION
--public                   Deploy to public bucket with '--acl public-read' (Default false)
--verbose                  Display more output
STAGE                      'test' or 'production'
```

When running this script from the command line, use the value `test` for the stage. This will deploy the templates and artifacts into a directory `test` in your deployment bucket:

```
$ aws s3 ls s3://my-deployment-bucket/test/
                           PRE artifacts/
                           PRE templates/
```

Use these values when deploying a customized installation, as described in [Customized Deployment](custom-deploy.md), sections 'Artifacts and Nested Stacks' and 'Nextflow'. In the example above, the values to use would be:

* Artifact S3 Bucket Name: `my-deployment-bucket`
* Artifact S3 Prefix: `test/artifacts`
* Template Root URL: `https://my-deployment-bucket.s3.amazonaws.com/test/templates`

The use of `production` for STAGE is reserved for deployments from a Travis CI/CD environment; this usage deploys into a subdirectory named after the current release tag.
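The mapping from the deploy bucket and stage to the deployment parameters can be sketched in shell. This is a minimal sketch: the bucket name and region are hypothetical placeholders, and the `deploy.sh` invocation is shown commented out since it requires AWS credentials.

```shell
#!/bin/sh
# Hypothetical values -- substitute your own deployment bucket.
ASSET_BUCKET="my-deployment-bucket"
STAGE="test"

# Build and upload the distribution (sketch; flags per the usage text above):
# _scripts/deploy.sh --asset-bucket "$ASSET_BUCKET" --deploy-region us-east-1 --verbose "$STAGE"

# The customized-deployment parameters follow directly from bucket and stage:
ARTIFACT_PREFIX="${STAGE}/artifacts"
TEMPLATE_ROOT_URL="https://${ASSET_BUCKET}.s3.amazonaws.com/${STAGE}/templates"

echo "Artifact S3 Bucket Name: ${ASSET_BUCKET}"
echo "Artifact S3 Prefix:      ${ARTIFACT_PREFIX}"
echo "Template Root URL:       ${TEMPLATE_ROOT_URL}"
```

These three values are exactly what the 'Distribution Configuration' fields in a customized deployment expect.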

docs/core-env/custom-deploy.md

Lines changed: 58 additions & 0 deletions
# Customized Deployment

Deployments of the 'Nextflow on AWS Batch' solution are based on nested CloudFormation templates, and on artifacts comprising scripts, software packages, and configuration files. The templates and artifacts are stored in S3 buckets, and their S3 URLs are used when launching the top-level template and as parameters to that template's deployment.

## VPC

The quick start link deploys the [AWS VPC Quickstart](https://aws.amazon.com/quickstart/architecture/vpc/), which creates a VPC with up to 4 Availability Zones, each with a public subnet and a private subnet with NAT Gateway access to the Internet.

## Genomics Workflow Core

This quick start link deploys the CloudFormation template `gwfcore-root.template.yaml` for the Genomics Workflow Core (GWFCore) from the [Genomics Workflows on AWS](https://github.com/aws-samples/aws-genomics-workflows) solution. This template launches a number of nested templates, as shown below:

* Root Stack __gwfcore-root__ - Top-level template for the Genomics Workflow Core
    * S3 Stack __gwfcore-s3__ - S3 bucket (new or existing) for storing analysis results
    * IAM Stack __gwfcore-iam__ - Creates IAM roles to use with the AWS Batch scalable genomics workflow environment
    * Code Stack __gwfcore-code__ - Creates AWS CodeCommit repos and CodeBuild projects for Genomics Workflows Core assets and artifacts
    * Launch Template Stack __gwfcore-launch-template__ - Creates an EC2 launch template for AWS Batch based genomics workflows
    * Batch Stack __gwfcore-batch__ - Deploys resources for an AWS Batch environment that is suitable for genomics, including default and high-priority job queues

### Root Stack

The quick start solution links to the CloudFormation console, where the 'Amazon S3 URL' field is prefilled with the S3 URL of a copy of the root stack template, hosted in the public S3 bucket __aws-genomics-workflows__.

<img src="https://dpkk088kye7gn.cloudfront.net/aws-genomics-workflows/docs/images/custom-deploy-0.png"
     alt="custom-deploy-0"
     width="100%" height="100%"
     class="screenshot" />

To use a customized root stack, upload your modified stack template to an S3 bucket (see [Building a Custom Distribution](build-custom-distribution.md)), and specify that template's URL in 'Amazon S3 URL'.

### Artifacts and Nested Stacks

The subsequent screen, 'Specify Stack Details', allows for customization of the deployed resources in the 'Distribution Configuration' section.

<img src="https://dpkk088kye7gn.cloudfront.net/aws-genomics-workflows/docs/images/custom-deploy-1.png"
     alt="custom-deploy-1"
     width="70%" height="70%"
     class="screenshot" />

* __Artifact S3 Bucket Name__ and __Artifact S3 Prefix__ define the location of the artifacts uploaded prior to this deployment. By default, pre-prepared artifacts are stored in the __aws-genomics-workflows__ bucket.
* __Template Root URL__ defines the bucket and prefix used to store the nested templates called by the root template.

To use your own modified artifacts or nested templates, build and upload them as described in [Building a Custom Distribution](build-custom-distribution.md), and specify the bucket and prefix in the fields above.

## Workflow Orchestrators

### Nextflow

This quick start deploys the Nextflow template `nextflow-resources.template.yaml`, which launches one nested stack:

* Root Stack __nextflow-resources__ - Creates resources specific to running Nextflow on AWS
    * Container Build Stack __container-build__ - Creates resources for building a Docker container image using CodeBuild, storing the image in ECR, and optionally creating a corresponding Batch Job Definition

The Nextflow root stack is specified in the same way as the GWFCore root stack above, and a location for a modified root stack may be specified as with the Core stack.

The subsequent 'Specify Stack Details' screen has fields allowing the customization of the Nextflow deployment.

<img src="https://dpkk088kye7gn.cloudfront.net/aws-genomics-workflows/docs/images/nextflow-0.png"
     alt="nextflow-0"
     width="70%" height="70%"
     class="screenshot" />

* __S3NextflowPrefix__, __S3LogsDirPrefix__, and __S3WorkDirPrefix__ specify the paths within the GWFCore bucket in which to store per-run data and log files.
* __TemplateRootUrl__ specifies the path to the nested templates called by the Nextflow root template, as with the GWFCore root stack.
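The same stack parameters can be supplied from the AWS CLI instead of the console. This is a sketch only: the bucket name, prefix value, and template path are hypothetical placeholders, and the `create-stack` call is commented out because it requires AWS credentials and a real template URL.

```shell
#!/bin/sh
# Hypothetical template root -- see 'Building a Custom Distribution'.
TEMPLATE_ROOT="https://my-deployment-bucket.s3.amazonaws.com/test/templates"

# Parameter keys mirror the console fields above; the prefix value is a placeholder.
PARAMS="ParameterKey=S3NextflowPrefix,ParameterValue=_nextflow \
ParameterKey=TemplateRootUrl,ParameterValue=${TEMPLATE_ROOT}"

echo "$PARAMS"

# To actually launch the stack (template filename/path assumed, verify against
# your uploaded distribution; uncomment to run):
# aws cloudformation create-stack \
#   --stack-name nextflow-resources \
#   --template-url "${TEMPLATE_ROOT}/nextflow-resources.template.yaml" \
#   --capabilities CAPABILITY_IAM CAPABILITY_AUTO_EXPAND \
#   --parameters $PARAMS
```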

docs/core-env/setup-aws-batch.md

Lines changed: 11 additions & 14 deletions
```diff
@@ -46,8 +46,8 @@ A complete AWS Batch environment consists of the following:
 1. A Compute Environment that utilizes [EC2 Spot instances](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-spot-instances.html) for cost-effective computing
 2. A Compute Environment that utilizes EC2 on-demand (e.g. [public pricing](https://aws.amazon.com/ec2/pricing/on-demand/)) instances for high-priority work that can't risk job interruptions or delays due to insufficient Spot capacity.
-3. A default Job Queue that utilizes the Spot compute environment first, but spills over to the on-demand compute environment if defined capacity limits (i.e. Max vCPUs) are reached.
-4. A priority Job Queue that leverages the on-demand and Spot CEs (in that order) and has higher priority than the default queue.
+3. A default Job Queue that solely utilizes the Spot compute environment. This is for jobs where timeliness isn't a constraint, which can wait for the right instances to become available and can handle interruption. It also ensures the most cost savings.
+4. A priority Job Queue that leverages the on-demand, and optionally Spot, CEs (in that order) and has higher priority than the default queue. This is for jobs that cannot handle interruption and need to be executed immediately.

 ### Automated via CloudFormation

@@ -81,7 +81,7 @@ You can create several compute environments to suit your needs. Below we'll cre
 6. In the "Service role" drop down, select the `AWSBatchServiceRole` you created previously
 7. In the "Instance role" drop down, select the `ecsInstanceRole` you created previously
 8. For "Provisioning model" select "On-Demand"
-9. "Allowed instance types" will be already populated with "optimal" - which is a mixture of M4, C4, and R4 instances.
+9. "Allowed instance types" will be already populated with "optimal" - which is a mixture of M4, C4, and R4 instances. This should be sufficient for demonstration purposes. In a production setting, it is recommended to specify the instance families and sizes most appropriate for the jobs the CE will support. For the On-Demand CE, selecting newer instance types is beneficial as they tend to have better price per performance.
 10. "Allocation strategy" will already be set to `BEST_FIT`. This is recommended for on-demand based compute environments as it ensures the most cost efficiency.
 11. In the "Launch template" drop down, select the `genomics-workflow-template` you created previously
 12. Set Minimum and Desired vCPUs to 0.

@@ -112,7 +112,7 @@ Click on "Create"
 6. In the "Service role" drop down, select the `AWSBatchServiceRole` you created previously
 7. In the "Instance role" drop down, select the `ecsInstanceRole` you created previously
 8. For "Provisioning model" select "Spot"
-9. "Allowed instance types" will be already populated with "optimal" - which is a mixture of M4, C4, and R4 instances.
+9. "Allowed instance types" will be already populated with "optimal" - which is a mixture of M4, C4, and R4 instances. This should be sufficient for demonstration purposes. In a production setting, it is recommended to specify the instance families and sizes most appropriate for the jobs the CE will support. For the Spot CE, a wider diversity of instance types is recommended to maximize the pools from which capacity can be drawn. Limiting the size of instances is also recommended to avoid scheduling too many jobs on a Spot instance that could be interrupted.
 10. "Allocation strategy" will already be set to `SPOT_CAPACITY_OPTIMIZED`. This is recommended for Spot based compute environments as it ensures the most compute capacity is available for your jobs.
 11. In the "Launch template" drop down, select the `genomics-workflow-template` you created previously
 12. Set Minimum and Desired vCPUs to 0.

@@ -135,20 +135,18 @@ Job queues can be associated with one or more compute environments in a preferre
 Below we'll create two job queues:

 * A "Default" job queue
-* A "High Priority" job queue
+* A "Priority" job queue

 Both job queues will use both compute environments you created previously.

 ##### Create a "default" job queue

-This queue is intended for jobs that do not require urgent completion, and can handle potential interruption. This queue will schedule jobs to:
+This queue is intended for jobs that do not require urgent completion, and can handle potential interruption. This queue will schedule jobs to only the "spot" compute environment.

-1. The "spot" compute environment
-2. The "ondemand" compute environment
+!!! note
+    It is not recommended to configure a job queue to "spillover" from Spot to On-Demand. Doing so could lead to Insufficient Capacity Errors, resulting in Batch being unable to schedule jobs, leaving them stuck in "RUNNABLE".

-in that order.
-
-Because it primarily leverages Spot instances, it will also be the most cost effective job queue.
+Because it leverages Spot instances, it will also be the most cost effective job queue.

 * Go to the AWS Batch Console
 * Click on "Job queues"

@@ -157,8 +155,7 @@ Because it primarily leverages Spot instances, it will also be the most cost eff
 * Set "Priority" to 1
 * Under "Connected compute environments for this queue", using the drop down menu:

-    1. Select the "spot" compute environment you created previously, then
-    2. Select the "ondemand" compute environment you created previously
+    1. Select the "spot" compute environment you created previously

 * Click on "Create Job Queue"

@@ -169,7 +166,7 @@ This queue is intended for jobs that are urgent and **cannot** handle potential
 1. The "ondemand" compute environment
 2. The "spot" compute environment

-in that order.
+in that order. In this queue configuration, Batch will schedule jobs to the "ondemand" compute environment first. When the number of Max vCPUs for that environment is reached, Batch will begin scheduling jobs to the "spot" compute environment. The use of the "spot" compute environment is optional, and is used to help drain pending jobs from the queue faster.

 * Go to the AWS Batch Console
 * Click on "Job queues"
```
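The compute-environment ordering described above can also be expressed through the AWS Batch API. A minimal sketch, where the queue name and the compute environment names ("ondemand", "spot") are placeholders matching the console walkthrough:

```shell
#!/bin/sh
# Sketch: a "priority" job queue preferring the on-demand CE, with the Spot CE
# as an optional second choice. Names are hypothetical placeholders.
QUEUE_JSON=$(cat <<'EOF'
{
  "jobQueueName": "priority",
  "state": "ENABLED",
  "priority": 10,
  "computeEnvironmentOrder": [
    {"order": 1, "computeEnvironment": "ondemand"},
    {"order": 2, "computeEnvironment": "spot"}
  ]
}
EOF
)
echo "$QUEUE_JSON"

# To actually create the queue (requires AWS credentials; uncomment to run):
# aws batch create-job-queue --cli-input-json "$QUEUE_JSON"
```

Batch fills the lower-`order` environment first and only spills to the next when that environment's Max vCPUs is reached, which is the behavior the walkthrough relies on.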

docs/extra.css

Lines changed: 6 additions & 0 deletions
```diff
@@ -57,4 +57,10 @@

 .md-header, .md-footer, .md-footer-nav, .md-footer-meta {
     background-color: #232f3e !important;
+}
+
+.screenshot {
+    float: left;
+    margin: 10px;
+    border: 1px solid lightgrey;
 }
```

docs/images/custom-deploy-0.png (126 KB)

docs/images/custom-deploy-1.png (93.9 KB)

docs/images/nextflow-0.png (121 KB)

mkdocs.yml

Lines changed: 4 additions & 0 deletions
```diff
@@ -9,6 +9,8 @@ nav:
     - Permissions: core-env/create-iam-roles.md
     - Compute Resources: core-env/create-custom-compute-resources.md
     - AWS Batch: core-env/setup-aws-batch.md
+    - Customized Deployment: core-env/custom-deploy.md
+    - Building a Custom Distribution: core-env/build-custom-distribution.md
 #  - Containerized Tooling:
 #    - Introduction: containers/container-introduction.md
 #    - Examples: containers/container-examples.md
@@ -57,3 +59,5 @@ extra:
   s3:
     bucket: docs.opendata.aws
     prefix: genomics-workflows
+
+use_directory_urls: false
```

src/aws-genomics-cdk/.gitignore

Lines changed: 9 additions & 0 deletions
```
*.js
!jest.config.js
*.d.ts
node_modules

# CDK asset staging directory
.cdk.staging
cdk.out
cdk.context.json
```

src/aws-genomics-cdk/.npmignore

Lines changed: 6 additions & 0 deletions
```
*.ts
!*.d.ts

# CDK asset staging directory
.cdk.staging
cdk.out
```
