
Commit 708e176

Update Step Functions Example (#79)
Bug Fixes

* explicitly stop ECS before starting ebs-autoscale on /var/lib/docker
* move the Nextflow additions to before ECS starts
* add missing steps that are documented in https://docs.docker.com/storage/storagedriver/btrfs-driver/

Improvements

* add the SSM agent and permissions to Batch hosts to allow SSM capabilities like Session Manager, facilitating troubleshooting via SSH without needing an EC2 keypair
* refactor containers and job definitions
* use the host's bind-mounted AWS CLI
* use job definition environment variables for execution options
* use a common entrypoint script for all containers
* update the Step Functions example to use dynamic parallelism
* remove unneeded parameters from job definitions
* update the example workflow input
* update build dependencies: explicitly add pip; unpin cfn-lint (we need it to stay up to date)
* use a common build script for tooling containers
* add a container build template
* refactor the Step Functions stack into separate templates: a generic workflow template uses nested templates to build the individual containers and the state machine for the workflow; the workflow definition templates are simplified since container builds and IAM role creation happen in parent templates
* add UpdateReplacePolicy for S3 Buckets

Documentation Updates

* update the Nextflow documentation: fix a couple of inconsistencies, improve flow and clarity, fix typos
* update the Step Functions docs: update images; add more details on the job definition, the Step Functions task, and the example workflow; fix the job output prefix in the example input; update the workflow completion time; add more detailed explanations of important job definition parts and how they translate into Step Functions task code
1 parent 3218203 commit 708e176

31 files changed: +1318 −769 lines changed
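The first two bug-fix bullets describe an ordering change in the Batch instance provisioning script. A minimal sketch of what that ordering looks like, assuming the systemd service names used on the ECS-optimized AMI and an illustrative installer path; this is not the commit's actual code:

```bash
# Hedged sketch: stop container services before re-provisioning /var/lib/docker
systemctl stop ecs
systemctl stop docker

# per https://docs.docker.com/storage/storagedriver/btrfs-driver/,
# preserve existing Docker data while switching the backing storage
cp -au /var/lib/docker /var/lib/docker.bk

# set up auto-expanding scratch space on /var/lib/docker
# (installer path and name are illustrative)
sh /opt/ebs-autoscale/install.sh /var/lib/docker

cp -au /var/lib/docker.bk/* /var/lib/docker/
systemctl start docker
systemctl start ecs
```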

_scripts/test.sh

Lines changed: 1 addition & 0 deletions
```diff
@@ -3,6 +3,7 @@
 set -e

 # check cfn templates for errors
+cfn-lint --version
 cfn-lint src/templates/**/*.template.yaml

 # make sure that site can build
```

docs/orchestration/nextflow/nextflow-overview.md

Lines changed: 21 additions & 7 deletions
```diff
@@ -8,7 +8,8 @@ Nextflow can be run either locally or on a dedicated EC2 instance. The latter i

 ## Full Stack Deployment

-The following CloudFormation template will launch an EC2 instance pre-configured for using Nextflow.
+_For the impatient:_
+The following CloudFormation template will create all the resources you need to run Nextflow using the architecture shown above. It combines the CloudFormation stacks referenced below in the [Requirements](#requirements) section.

 | Name | Description | Source | Launch Stack |
 | -- | -- | :--: | -- |
```
```diff
@@ -20,14 +21,24 @@ When the above stack is complete, you will have a preconfigured Batch Job Defini

 To get started using Nextflow on AWS you'll need the following setup in your AWS account:

-* The core set of resources (S3 Bucket, IAM Roles, AWS Batch) described in the [Getting Started](../../../core-env/introduction) section.
-* A containerized `nextflow` executable that pulls configuration and workflow definitions from S3
+* The core set of resources (S3 Bucket, IAM Roles, AWS Batch) described in the [Core Environment](../../../core-env/introduction) section.
+
+If you are in a hurry, you can create the complete Core Environment using the following CloudFormation template:
+
+| Name | Description | Source | Launch Stack |
+| -- | -- | :--: | :--: |
+{{ cfn_stack_row("GWFCore (Existing VPC)", "GWFCore-Full", "aws-genomics-root-novpc.template.yaml", "Create EC2 Launch Templates, AWS Batch Job Queues and Compute Environments, a secure Amazon S3 bucket, and IAM policies and roles within an **existing** VPC. _NOTE: You must provide VPC ID, and subnet IDs_.") }}
+
+!!! note
+    The CloudFormation template above does **not** create a new VPC, and instead will create associated resources in an existing VPC of your choosing, or your default VPC. To automate creating a new VPC to isolate your resources, you can use the [AWS VPC QuickStart](https://aws.amazon.com/quickstart/architecture/vpc/).
+
+* A containerized `nextflow` executable with a custom entrypoint script that draws configuration information from AWS Batch supplied environment variables
 * The AWS CLI installed in job instances using `conda`
 * A Batch Job Definition that runs a Nextflow head node
-* An IAM Role for the Nextflow head node job that allows it access to AWS Batch
-* (optional) An S3 Bucket to store your Nextflow workflow definitions
+* An IAM Role for the Nextflow head node job that allows it to submit AWS Batch jobs
+* (optional) An S3 Bucket to store your Nextflow session cache

-The last five items above are created by the following CloudFormation template:
+The five items above are created by the following CloudFormation template:

 | Name | Description | Source | Launch Stack |
 | -- | -- | :--: | -- |
```
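The entrypoint bullet added above describes a container that builds Nextflow's configuration from environment variables supplied via the Batch job definition. A minimal sketch of that pattern; the variable names here are illustrative, not necessarily the ones this repo uses:

```bash
#!/bin/bash
# Hedged sketch of a nextflow container entrypoint (names illustrative)
set -e

# configuration values arrive as AWS Batch supplied environment variables
NF_WORKDIR=${NF_WORKDIR:-s3://<bucket-name>/_nextflow/runs}
NF_JOB_QUEUE=${NF_JOB_QUEUE:-default}

# point Nextflow at AWS Batch for task execution
cat << EOF > nextflow.config
workDir = "$NF_WORKDIR"
process.executor = "awsbatch"
process.queue = "$NF_JOB_QUEUE"
EOF

# remaining container arguments select the workflow project to run
nextflow run "$@"
```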
````diff
@@ -181,6 +192,9 @@ chown -R ec2-user:ec2-user $USER/miniconda
 rm Miniconda3-latest-Linux-x86_64.sh
 ```

+!!! note
+    The actual Launch Template used in the [Core Environment](../../core-env/introduction.md) does a couple more things, like installing additional resources for [managing space for the job](../../core-env/create-custom-compute-resources.md).
+
 ### Batch job definition

 An AWS Batch Job Definition for the containerized Nextflow described above is shown below.
````
```diff
@@ -374,7 +388,7 @@ You can customize these job definitions to incorporate additional environment va
 !!! important
     Instances provisioned using the Nextflow specific EC2 Launch Template configure `/var/lib/docker` in the host instance to use automatically [expandable scratch space](../../../core-env/create-custom-compute-resources/), allowing containerized jobs to stage as much data as needed without running into disk space limits.

-### Running the workflow
+### Running workflows

 To run a workflow you submit a `nextflow` Batch job to the appropriate Batch Job Queue via:
```
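The command itself falls outside this hunk; a hedged sketch of what such a submission could look like, with illustrative queue and job names:

```bash
# Hedged sketch only -- the doc's actual command follows this hunk
aws batch submit-job \
    --job-name run-my-workflow \
    --job-queue <nextflow-queue-name> \
    --job-definition nextflow \
    --container-overrides command=<workflow-project-or-s3-uri>
```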

5 binary image files changed under `docs/` (−198 KB, −60.9 KB, 9.87 KB, −17.5 KB, 73 KB).

docs/orchestration/step-functions/step-functions-overview.md

Lines changed: 141 additions & 34 deletions
```diff
@@ -41,6 +41,7 @@ State machines that use AWS Batch for job execution and send events to CloudWatc
     "Version": "2012-10-17",
     "Statement": [
         {
+            "Sid": "enable submitting batch jobs",
             "Effect": "Allow",
             "Action": [
                 "batch:SubmitJob",
```
````diff
@@ -64,9 +65,39 @@ State machines that use AWS Batch for job execution and send events to CloudWatc
     }
 ```

+For more complex workflows that use nested workflows or require more complex input parsing, you need to add additional permissions for executing Step Functions State Machines and invoking Lambda functions:
+
+```json
+{
+    "Version": "2012-10-17",
+    "Statement": [
+        {
+            "Sid": "enable calling lambda functions",
+            "Effect": "Allow",
+            "Action": [
+                "lambda:InvokeFunction"
+            ],
+            "Resource": "*"
+        },
+        {
+            "Sid": "enable calling other step functions",
+            "Effect": "Allow",
+            "Action": [
+                "states:StartExecution"
+            ],
+            "Resource": "*"
+        },
+        ...
+    ]
+}
+```
+
+!!! note
+    All `Resource` values in the policy statements above can be scoped to be more specific if needed.
+
 ## Step Functions State Machine

-Workflows in AWS Step Functions are built using [Amazon States Language](https://docs.aws.amazon.com/step-functions/latest/dg/concepts-amazon-states-language.html) (ASL), a declarative, JSON-based, structured language used to define your state machine, a collection of states that can do work (Task states), determine which states to transition to next (Choice states), stop an execution with an error (Fail states), and so on.
+Workflows in AWS Step Functions are built using [Amazon States Language](https://docs.aws.amazon.com/step-functions/latest/dg/concepts-amazon-states-language.html) (ASL), a declarative, JSON-based, structured language used to define a "state machine". An AWS Step Functions state machine is a collection of states that can do work (Task states), determine which states to transition to next (Choice states), stop an execution with an error (Fail states), and so on.

 ### Building workflows with AWS Step Functions
````
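As an illustration of the scoping mentioned in the note above, the `states:StartExecution` statement could name a specific state machine rather than `*`; the ARN below is illustrative:

```json
"Resource": "arn:aws:states:<region>:<account>:stateMachine:<state-machine-name>"
```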

```diff
@@ -123,9 +154,7 @@ Step Functions [ASL documentation](https://docs.aws.amazon.com/step-functions/la

 ### Batch Job Definitions

-It is recommended to have [Batch Job Definitions](https://docs.aws.amazon.com/batch/latest/userguide/job_definitions.html) created for your tooling prior to building a Step Functions state machine. These can then be referenced in state machine `Task` states by their respective ARNs.
-
-Step Functions will use the Batch Job Definition to define compute resource requirements and parameter defaults for the Batch Job it submits.
+[AWS Batch Job Definitions](https://docs.aws.amazon.com/batch/latest/userguide/job_definitions.html) are used to define compute resource requirements and parameter defaults for an AWS Batch Job. These are then referenced in state machine `Task` states by their respective ARNs.

 An example Job Definition for the `bwa-mem` sequence aligner is shown below:
```
````diff
@@ -134,58 +163,85 @@ An example Job Definition for the `bwa-mem` sequence aligner is shown below:
     "jobDefinitionName": "bwa-mem",
     "type": "container",
     "parameters": {
-        "InputReferenceS3Prefix": "s3://<bucket-name>/reference",
-        "InputFastqS3Path1": "s3://<bucket-name>/<sample-name>/fastq/read1.fastq.gz",
-        "InputFastqS3Path2": "s3://<bucket-name>/<sample-name>/fastq/read2.fastq.gz",
-        "OutputS3Prefix": "s3://<bucket-name>/<sample-name>/aligned"
+        "threads": "8"
     },
     "containerProperties": {
         "image": "<dockerhub-user>/bwa-mem:latest",
         "vcpus": 8,
         "memory": 32000,
         "command": [
-            "Ref::InputReferenceS3Prefix",
-            "Ref::InputFastqS3Path1",
-            "Ref::InputFastqS3Path2",
-            "Ref::OutputS3Prefix",
+            "bwa", "mem",
+            "-t", "Ref::threads",
+            "-p",
+            "reference.fasta",
+            "sample_1.fastq.gz"
         ],
         "volumes": [
             {
                 "host": {
                     "sourcePath": "/scratch"
                 },
                 "name": "scratch"
+            },
+            {
+                "host": {
+                    "sourcePath": "/opt/miniconda"
+                },
+                "name": "aws-cli"
             }
         ],
-        "environment": [],
+        "environment": [
+            {
+                "name": "REFERENCE_URI",
+                "value": "s3://<bucket-name>/reference/*"
+            },
+            {
+                "name": "INPUT_DATA_URI",
+                "value": "s3://<bucket-name>/<sample-name>/fastq/*.fastq.gz"
+            },
+            {
+                "name": "OUTPUT_DATA_URI",
+                "value": "s3://<bucket-name>/<sample-name>/aligned"
+            }
+        ],
         "mountPoints": [
             {
                 "containerPath": "/opt/work",
                 "sourceVolume": "scratch"
+            },
+            {
+                "containerPath": "/opt/miniconda",
+                "sourceVolume": "aws-cli"
             }
         ],
         "ulimits": []
     }
 }
 ```
````

```diff
-!!! note
-    The Job Definition above assumes that `bwa-mem` has been containerized with an
-    `entrypoint` script that handles Amazon S3 URIs for input and output data
-    staging.
-
-Because data staging requirements can be unique to the tooling used, neither AWS Batch nor Step Functions handles this automatically.
+There are three key parts of the above definition to take note of.
+
+* Command and Parameters
+
+    The **command** is a list of strings that will be sent to the container. This is the same as the `...` arguments that you would provide to a `docker run mycontainer ...` command.
+
+    **Parameters** are placeholders that you define, whose values are substituted when a job is submitted. In the case above, a `threads` parameter is defined with a default value of `8`. The job definition's `command` references this parameter with `Ref::threads`.
+
+    !!! note
+        Parameter references in the command list must be separate strings - concatenation with other parameter references or static values is not allowed.
+
+* Environment
+
+    **Environment** defines a set of environment variables that will be available to the container. For example, you can define environment variables used by the container's entrypoint script to identify the data it needs to stage in.
+
+* Volumes and Mount Points
+
+    Together, **volumes** and **mountPoints** define what you would provide using a `-v hostpath:containerpath` option to a `docker run` command. These can be used to map host directories with resources (e.g. data or tools) used by all containers. In the example above, a `scratch` volume is mapped so that the container can utilize a larger disk on the host. Also, a version of the AWS CLI installed with `conda` is mapped into the container, enabling the container to use it (e.g. to transfer data from S3 and back) without explicitly building it in.
-
-!!! note
-    The `volumes` and `mountPoints` specifications allow the job container to
-    use scratch storage space on the instance it is placed on. This is equivalent
-    to the `-v host_path:container_path` option provided to a `docker run` call
-    at the command line.
```
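To see the parameter substitution described above in action, a job based on this definition can be submitted directly with the AWS CLI, overriding the `threads` default. A hedged sketch; the queue and job names are illustrative:

```bash
aws batch submit-job \
    --job-name bwa-mem-test \
    --job-queue <queue-name> \
    --job-definition bwa-mem \
    --parameters threads=4
# AWS Batch substitutes "4" for "Ref::threads" in the command at runtime
```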

```diff

 ### State Machine Batch Job Tasks

-Conveniently for genomics workflows, AWS Step Functions has built-in integration with AWS Batch (and [several other services](https://docs.aws.amazon.com/step-functions/latest/dg/concepts-connectors.html)), and provides snippets of code to make developing your state-machine
-Batch tasks easier.
+AWS Step Functions has built-in integration with AWS Batch (and [several other services](https://docs.aws.amazon.com/step-functions/latest/dg/concepts-connectors.html)), and provides snippets of code to make developing your state-machine tasks easier.

 ![Manage a Batch Job Snippet](images/sfn-batch-job-snippet.png)
```
```diff
@@ -202,7 +258,15 @@ would look like the following:
         "JobDefinition": "arn:aws:batch:<region>:<account>:job-definition/bwa-mem:1",
         "JobName": "bwa-mem",
         "JobQueue": "<queue-arn>",
-        "Parameters.$": "$.bwa-mem.parameters"
+        "Parameters.$": "$.bwa-mem.parameters",
+        "Environment": [
+            {"Name": "REFERENCE_URI",
+             "Value.$": "$.bwa-mem.environment.REFERENCE_URI"},
+            {"Name": "INPUT_DATA_URI",
+             "Value.$": "$.bwa-mem.environment.INPUT_DATA_URI"},
+            {"Name": "OUTPUT_DATA_URI",
+             "Value.$": "$.bwa-mem.environment.OUTPUT_DATA_URI"}
+        ]
     },
     "Next": "NEXT_TASK_NAME"
 }
```
````diff
@@ -214,36 +278,79 @@ Inputs to a state machine that uses the above `BwaMemTask` would look like this:
 {
     "bwa-mem": {
         "parameters": {
-            "InputReferenceS3Prefix": "s3://<bucket-name/><sample-name>/reference",
-            "InputFastqS3Path1": "s3://<bucket-name/><sample-name>/fastq/read1.fastq.gz",
-            "InputFastqS3Path2": "s3://<bucket-name/><sample-name>/fastq/read2.fastq.gz",
-            "OutputS3Prefix": "s3://<bucket-name/><sample-name>/aligned"
+            "threads": 8
+        },
+        "environment": {
+            "REFERENCE_URI": "s3://<bucket-name/><sample-name>/reference/*",
+            "INPUT_DATA_URI": "s3://<bucket-name/><sample-name>/fastq/*.fastq.gz",
+            "OUTPUT_DATA_URI": "s3://<bucket-name/><sample-name>/aligned"
         }
     },
     ...
-}
+}
 ```

 When the Task state completes, Step Functions will add information to a new `status` key under `bwa-mem` in the JSON object. The complete object will be passed on to the next state in the workflow.
````
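The `status` placement suggests the Task state sets its `ResultPath` accordingly; a hedged guess at the corresponding field, not shown in this diff:

```json
"ResultPath": "$.bwa-mem.status"
```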
```diff

 ## Example state machine

-All of the above is created by the following CloudFormation template.
+The following CloudFormation template creates container images, AWS Batch Job Definitions, and an AWS Step Functions State Machine for a simple genomics workflow using bwa, samtools, and bcftools.

 | Name | Description | Source | Launch Stack |
 | -- | -- | :--: | :--: |
-{{ cfn_stack_row("AWS Step Functions Example", "SfnExample", "step-functions/sfn-example.template.yaml", "Create a Step Functions State Machine, Batch Job Definitions, and container images to run an example genomics workflow") }}
+{{ cfn_stack_row("AWS Step Functions Example", "SfnExample", "step-functions/sfn-workflow.template.yaml", "Create a Step Functions State Machine, Batch Job Definitions, and container images to run an example genomics workflow") }}

 !!! note
     The stack above needs to create several IAM Roles. You must have administrative privileges in your AWS Account for this to succeed.
```

```diff
+The example workflow is a simple secondary analysis pipeline that converts raw FASTQ files into VCFs with variants called for a list of chromosomes. It uses the following open-source tools:
+
+* `bwa-mem`: Burrows-Wheeler Aligner for aligning short sequence reads to a reference genome
+* `samtools`: **S**equence **A**lignment **M**apping library for indexing and sorting aligned reads
+* `bcftools`: **B**inary variant **C**all **F**ormat library for determining variants in sample reads relative to a reference genome
+
+Read alignment, sorting, and indexing are performed sequentially by Step Functions Task States. Variant calling for each chromosome occurs in parallel using a Step Functions Map State and sub-Task States therein. All tasks submit AWS Batch Jobs to perform computational work using containerized versions of the tools listed above.
```
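A minimal sketch of what such a Map State looks like in ASL. The state names are illustrative and the Batch job parameters are elided; each iteration receives one element of `chromosomes` as its input:

```json
"CallVariantsByChromosome": {
    "Type": "Map",
    "ItemsPath": "$.chromosomes",
    "Iterator": {
        "StartAt": "BcftoolsCall",
        "States": {
            "BcftoolsCall": {
                "Type": "Task",
                "Resource": "arn:aws:states:::batch:submitJob.sync",
                "Parameters": {},
                "End": true
            }
        }
    },
    "Next": "NEXT_TASK_NAME"
}
```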
```diff
+
+![example genomics workflow state machine](./images/sfn-example-mapping-state-machine.png)
+
+The tooling containers used by the workflow use a [generic entrypoint script]({{ repo_url + "tree/master/src/containers" }}) that wraps the underlying tool and handles S3 data staging. It uses the AWS CLI to transfer objects, and environment variables to identify the data inputs and outputs to stage.
```
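A minimal sketch of the staging pattern such an entrypoint implements; this is not the repo's actual script, and it assumes the environment variables from the job definition shown earlier plus the AWS CLI on the `PATH` (e.g. via the bind-mounted `/opt/miniconda`):

```bash
#!/bin/bash
set -e

cd /opt/work

# stage in: split wildcard URIs like s3://bucket/prefix/*.fastq.gz
# into a prefix and a glob filter for the AWS CLI
for uri in "$REFERENCE_URI" "$INPUT_DATA_URI"; do
    aws s3 cp --recursive \
        --exclude "*" --include "${uri##*/}" \
        "${uri%/*}" .
done

# run the wrapped tool with the arguments from the job's command
# (the output file name is illustrative)
"$@" > output.sam

# stage results back out to S3
aws s3 cp output.sam "$OUTPUT_DATA_URI/output.sam"
```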
````diff

 ### Running the workflow

 When the stack above completes, go to the outputs tab and copy the JSON string provided in `StateMachineInput`.

 ![cloud formation output tab](./images/cfn-stack-outputs-tab.png)
 ![example state-machine input](./images/cfn-stack-outputs-statemachineinput.png)

+The input JSON will look like the following, but with the values for `queue` and `JOB_OUTPUT_PREFIX` prepopulated with resource names specific to the stack created by the CloudFormation template above:
+
+```json
+{
+    "params": {
+        "__comment__": {
+            "replace values for `queue` and `environment.JOB_OUTPUT_PREFIX` with values that match your resources": {
+                "queue": "Name or ARN of the AWS Batch Job Queue the workflow will use by default.",
+                "environment.JOB_OUTPUT_PREFIX": "S3 URI (e.g. s3://bucket/prefix) you are using for workflow inputs and outputs."
+            }
+        },
+        "queue": "default",
+        "environment": {
+            "REFERENCE_NAME": "Homo_sapiens_assembly38",
+            "SAMPLE_ID": "NIST7035",
+            "SOURCE_DATA_PREFIX": "s3://aws-batch-genomics-shared/secondary-analysis/example-files/fastq",
+            "JOB_OUTPUT_PREFIX": "s3://YOUR-BUCKET-NAME/PREFIX",
+            "JOB_AWS_CLI_PATH": "/opt/miniconda/bin"
+        },
+        "chromosomes": [
+            "chr19",
+            "chr20",
+            "chr21",
+            "chr22"
+        ]
+    }
+}
+```
````
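The same input document can also be used to start an execution from the AWS CLI instead of the console; the state machine ARN below is illustrative:

```bash
# save the StateMachineInput JSON to input.json first
aws stepfunctions start-execution \
    --state-machine-arn arn:aws:states:<region>:<account>:stateMachine:<workflow-name> \
    --input file://input.json
```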
```diff

 Next head to the AWS Step Functions console and select the state-machine that was created.

 ![select state-machine](./images/sfn-console-statemachine.png)
```
```diff
@@ -260,4 +367,4 @@ You will then be taken to the execution tracking page where you can monitor the

 ![execution tracking](./images/sfn-console-execution-inprogress.png)

-The workflow takes approximately 5-6hrs to complete on `r4.2xlarge` SPOT instances.
+The example workflow references a small demo dataset and takes approximately 20-30 minutes to complete.
```

environment.yaml

Lines changed: 2 additions & 1 deletion
```diff
@@ -3,8 +3,9 @@ channels:
   - defaults
 dependencies:
   - python=3.6.6
+  - pip
   - pip:
-    - cfn-lint==0.16.0
+    - cfn-lint
     - fontawesome-markdown==0.2.6
     - mkdocs==1.0.4
     - mkdocs-macros-plugin==0.2.4
```

src/containers/_common/README.md

Lines changed: 6 additions & 0 deletions
```diff
@@ -0,0 +1,6 @@
+# Common assets for tooling containers
+
+These are assets that are used to build all tooling containers.
+
+* `build.sh`: a generic build script that first builds a base image for a container, then builds an AWS specific image
+* `entrypoint.aws.sh`: a generic entrypoint script that wraps a call to a binary tool in the container with handlers for data staging from/to S3
```
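A hedged sketch of the two-stage pattern `build.sh` describes; the image tags, paths, and Dockerfile name are illustrative, not the repo's actual values:

```bash
#!/bin/bash
set -e

TOOL=$1  # e.g. "bwa"

# 1) build the tool's base image from its own Dockerfile
docker build -t "${TOOL}:base" "src/containers/${TOOL}"

# 2) layer the AWS-specific bits (entrypoint.aws.sh) on top of the base
docker build -t "${TOOL}:aws" \
    --build-arg BASE_IMAGE="${TOOL}:base" \
    -f src/containers/_common/Dockerfile.aws \
    src/containers/_common
```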
