This repository was archived by the owner on Aug 9, 2023. It is now read-only.

Commit 72adffa

wleepang, itzhapaz, henriqueribeiro, Friedman, and ajfriedman18 authored
v3.1.0 release (#192)
* move ecs configs to ecs-additions-common
* fix nextflow workflow cancel behavior
* increase docker / ecs stop timeout to allow nextflow cleanup activities
* fix error handling in cleanup() in nextflow container entrypoint script
* bump to cdk v1.102.0
* change examples source files
* enable docker authentication on cromwell
* change policy name
* change policy
* fix some typos and add policies
* fix typo on IAM
* add cromwell and gwf-core template
* scope down policies
* fix passing lists to nested stacks
* everything is AL2 compatible now
* change secret name and scope down permissions
* updated efs
* update to gp3
* updated EFS support in Nextflow
* fixed yaml errors
* updated efs deployment
* add message about s3 bucket location
* update nextflow mount script
* support existing EFS
* updated combine nextflow and core
* update readme
* updated with lint
* fix pr comments
* update ecs-config changes
* updated null efs param to none
* updated typos and missed string to number
* add nextflow helper script
* catch error if log not ready
* Update README.md

  Updating README to say MIT-0 rather than Modified MIT now that MIT-0 is a standard SPDX tag.
* Cromwell install docs
* gwf-core auto update code pipeline
* bump mkdocs version
* update dependencies
* updating with changes to integrate use of FSx and also made some changes to other params for more options.
* updated with latest cromwell jar from Henrique
* updated cromwell db timeout from 5 secs to 30 secs due to issues faced. Updated cromwell jar with changes recently made with caching added instance type options to config as per other PR raised last year. Would be helpful to have that option
* Bump minimist from 1.2.5 to 1.2.6 in /src/aws-genomics-cdk

  Bumps [minimist](https://github.com/substack/minimist) from 1.2.5 to 1.2.6.
  - [Release notes](https://github.com/substack/minimist/releases)
  - [Commits](https://github.com/substack/minimist/compare/1.2.5...1.2.6)

  ---
  updated-dependencies:
  - dependency-name: minimist
    dependency-type: indirect
  ...

  Signed-off-by: dependabot[bot] <[email protected]>
* updated the Max VCPU's to 1000 as per Lee's recommendation

Co-authored-by: Itzik Paz <[email protected]>
Co-authored-by: Henrique Silva <[email protected]>
Co-authored-by: Friedman <[email protected]>
Co-authored-by: ajfriedman18 <[email protected]>
Co-authored-by: Henri Yandell <[email protected]>
Co-authored-by: Mark Schreiber <[email protected]>
Co-authored-by: patsarth_gfb <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
1 parent 1676f13 commit 72adffa


49 files changed, with 21092 additions and 4843 deletions

README.md

Lines changed: 10 additions & 1 deletion
@@ -55,6 +55,15 @@ aws cloudformation create-stack \
 
 ```
 
+## Shared File System Support
+
+Amazon EFS is supported out of the box for `GWFCore` and `Nextflow`. You have two options to use EFS.
+
+1. **Create a new EFS File System:** Be sure to have `CreateEFS` set to `Yes` and also include the total number of subnets.
+2. **Use an Existing EFS File System:** Be sure to specify the EFS ID in the `ExistingEFS` parameter. This file system should be accessible from every subnet you specify.
+
+Following successful deployment of `GWFCore`, when creating your Nextflow Resources, set `MountEFS` to `Yes`.
+
 ## Building the documentation
 
 The documentation is built using mkdocs.
@@ -76,4 +85,4 @@ $ mkdocs build
 
 ## License Summary
 
-This sample code is made available under a modified MIT license. See the LICENSE file.
+This library is licensed under the MIT-0 License. See the LICENSE file.
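
The two EFS options added to the README can be expressed as CloudFormation stack parameters. The following is a minimal sketch, not code from this commit: `CreateEFS` and `ExistingEFS` are the parameter names given in the README text, while `NumberOfSubnets` is a hypothetical key standing in for "the total number of subnets", whose real parameter name is not shown in this diff.

```python
def efs_parameters(existing_efs_id=None, number_of_subnets=None):
    """Build CloudFormation-style parameters for one of the two EFS options."""
    if existing_efs_id is not None and number_of_subnets is not None:
        raise ValueError("choose either a new or an existing file system, not both")
    if existing_efs_id is not None:
        # Option 2: reuse a file system by passing its ID.
        pairs = {"ExistingEFS": existing_efs_id}
    elif number_of_subnets is not None:
        # Option 1: create a new file system.
        # "NumberOfSubnets" is an assumed key, not confirmed by the diff.
        pairs = {"CreateEFS": "Yes", "NumberOfSubnets": str(number_of_subnets)}
    else:
        raise ValueError("provide existing_efs_id or number_of_subnets")
    return [{"ParameterKey": k, "ParameterValue": v} for k, v in pairs.items()]
```

The list of `ParameterKey`/`ParameterValue` dictionaries is the shape that `aws cloudformation create-stack --parameters` and boto3's `create_stack` accept, so the result could be passed to either once the real parameter names are confirmed against the template.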

docs/install-cromwell/index.md

Lines changed: 112 additions & 0 deletions
@@ -0,0 +1,112 @@
+# Installing the Genomics Workflow Core and Cromwell
+
+## Summary
+
+The purpose of this document is to demonstrate how an AWS user can use CloudFormation to provision the infrastructure necessary to run Cromwell versions 52 and beyond on AWS Batch, with S3 as the object store. The instructions cover deployment into an existing VPC. There are two main steps: deploying the genomics workflow core infrastructure, which can be used with Cromwell, Nextflow and AWS Step Functions, and deploying the Cromwell server and related artifacts.
+* * *
+
+## Assumptions
+
+1. The instructions assume you have an existing AWS account with sufficient credentials to deploy the infrastructure or that you will use a role with CloudFormation that has sufficient privileges (admin role is recommended).
+2. You have an existing VPC to deploy artifacts into. This VPC should have a minimum of two subnets with routes to the public internet. Private subnet routes may be through a NAT Gateway.
+
+* * *
+
+## Deployment of Genomics Workflow Core into an existing VPC.
+
+Take note of the id of the VPC that you will use and the ids of the subnets of the VPC that you will use for the Batch worker nodes. We recommend using two or more private subnets.
+
+1. Open the CloudFormation console and select “**Create stack**” with new resources. Enter `https://aws-genomics-workflows.s3.amazonaws.com/latest/templates/gwfcore/gwfcore-root.template.yaml` as the Amazon S3 URL.
+
+![](./images/screen1.png)
+
+1. Select appropriate values for your environment including the VPC and subnets you recorded above. It is recommended to leave the Default and High Priority Min vCPU values at 0 so that the AWS Batch cluster will not have any instances running when there are no workflows running. Max vCPU values may be increased if you expect to run large workloads utilizing many CPUs. Leave the Distribution Configuration values with the preset defaults.
+
+![](./images/screen2.png)
+
+1. Optionally add tags and click **Next**
+
+1. Review the parameters, acknowledge the Capabilities notifications and click “**Create Stack**”
+
+![](./images/screen5.png)
+
+The template will now create several nested stacks to deploy the required resources. This step will take approximately 10 minutes to complete. When this is complete you can proceed with the “[Deploy Cromwell Resources](#deploy-cromwell-resources)” section below.
+* * *
+
+## Deploy Cromwell Resources
+
+1. Ensure all steps of the CloudFormation deployment of the Genomics Workflow Core have successfully completed before proceeding any further.
+2. From the CloudFormation console select “**Create Stack**” and if prompted select “**With new resources (Standard)**”
+3. Fill in the Amazon S3 URL with `https://aws-genomics-workflows.s3.amazonaws.com/latest/templates/cromwell/cromwell-resources.template.yaml`
+
+![](./images/screen3.png)
+
+4. Fill in appropriate values for the template. **For `GWFCoreNamespace` use the namespace value you used in the section above.** You should use the same VPC as you used in the previous step above. To secure your Cromwell server you should change the `SSH Address Range` and `HTTP Address Range` to trusted values; these will be used when creating the server's security group.
+5. You may either use the latest version of Cromwell (recommended) or specify a version **52 or greater.**
+6. Select a MySQL compliant `Cromwell Database Password` that will be used for Cromwell’s metadata database. Select “**Next**”.
+
+![](./images/screen4.png)
+
+7. On the remaining two screens keep the defaults, acknowledge the IAM capabilities and then click “**Create Stack**”
+
+Once the stack completes, an EC2 instance will be deployed and it will be running an instance of the Cromwell server. You can now proceed with "[Testing your deployment](#testing-your-deployment)".
+* * *
+
+## Testing your Deployment
+
+The following WDL file is a very simple workflow that can be used to test that all the components of the deployment are working together. Add the code block below to a file named `workflow.wdl`
+
+```
+workflow helloWorld {
+    call sayHello
+}
+
+task sayHello {
+    command {
+        echo "hello world"
+    }
+    output {
+        String out = read_string(stdout())
+    }
+
+    runtime {
+        docker: "ubuntu:latest"
+        memory: "1 GB"
+        cpu: 1
+    }
+}
+```
+
+This task can be submitted to the server's REST endpoint using `curl` either from a client that has access to the server's elastic IP or from within the server itself using `localhost`. The hostname of the server is also emitted as an output from the cromwell-resources CloudFormation template.
+
+```
+curl -X POST "http://localhost:8000/api/workflows/v1" \
+    -H "accept: application/json" \
+
+```
+
+It can take a few minutes for AWS Batch to realize there is a job in the work queue and provision a worker to run it. You can monitor this in the AWS Batch console.
+
+You can also monitor the Cromwell server logs in CloudWatch. There will be a log group called `cromwell-server`. Once the run is completed you will see output similar to:
+
+![](./images/screen5.png)
+
+If the run is successful, subsequent runs will be “call cached”, meaning that the results of the previous run will be copied for all successful steps. If you resubmit the job you will very quickly see the workflow succeed in the server logs and no additional jobs will be seen in the AWS Batch console. You can disable call caching for the job by adding an options file and submitting it with the run. This will cause the workflow to be re-executed in full.
+
+```json5
+{
+    "write_to_cache": false,
+    "read_from_cache": false
+}
+```
+
+```shell
+curl -X POST "http://localhost:8000/api/workflows/v1" \
+    -H "accept: application/json" \
+
+
+```
+
+For a more realistic workflow, a WDL for simple variant calling using bwa-mem, samtools, and bcftools is available [here](https://github.com/wleepang/demo-genomics-workflow-wdl):
+
+Clone the repo, and submit the WDL file to Cromwell. The workflow uses default inputs from public data sources. If you want to override these inputs, modify the `inputs.json` file accordingly and submit it along with the workflow.
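
The `curl` commands in the new document are incomplete as captured here, so the following is a hedged sketch of an equivalent submission in Python rather than a reconstruction of them. It assumes the Cromwell REST API's multipart form fields `workflowSource` and `workflowOptions` and the `localhost:8000` endpoint from the text; the helper names are illustrative. It only builds the request, so nothing is sent until `urlopen` is called.

```python
import json
import urllib.request
import uuid

# Endpoint taken from the document above.
CROMWELL_URL = "http://localhost:8000/api/workflows/v1"


def build_multipart(fields):
    """Encode {field name: (filename, bytes)} as a multipart/form-data body."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, (filename, payload) in fields.items():
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"; filename="{filename}"\r\n'
            "Content-Type: application/octet-stream\r\n\r\n"
        )
        parts.append(header.encode() + payload + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_request(wdl_text, options=None):
    """Build a POST request that submits a WDL and an optional options file."""
    fields = {"workflowSource": ("workflow.wdl", wdl_text.encode())}
    if options is not None:
        fields["workflowOptions"] = ("options.json", json.dumps(options).encode())
    body, content_type = build_multipart(fields)
    return urllib.request.Request(
        CROMWELL_URL,
        data=body,
        method="POST",
        headers={"Content-Type": content_type, "accept": "application/json"},
    )
```

To submit for real, pass the result to `urllib.request.urlopen` from a host that can reach the Cromwell server. Passing `{"write_to_cache": False, "read_from_cache": False}` as `options` corresponds to the call-caching options file shown in the document.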

requirements.txt

Lines changed: 2 additions & 2 deletions
@@ -1,7 +1,7 @@
-mkdocs==1.0.4
+mkdocs==1.2.3
 mkdocs-macros-plugin==0.2.4
 mkdocs-markdownextradata-plugin==0.0.5
 mkdocs-material==3.1.0
 pymdown-extensions==6.0
 fontawesome-markdown==0.2.6
-cfn-lint==0.16.0
+cfn-lint==0.16.0

src/aws-genomics-cdk/examples/batch-bwa-job.json

Lines changed: 4 additions & 3 deletions
@@ -3,14 +3,15 @@
     "jobQueue": "genomics-default-queue",
     "jobDefinition": "bwa:1",
     "containerOverrides": {
-        "command": ["bwa mem -t 8 -p -o ${SAMPLE_ID}.sam ${REFERENCE_NAME}.fasta ${SAMPLE_ID}_*1*.fastq.gz"],
+        "command": ["bwa mem -t 8 -p -o ${SAMPLE_ID}.sam ${REFERENCE_NAME}.fasta ${SAMPLE_ID}_1*.fastq.gz"],
+        "memory": 32000,
         "environment": [{
             "name": "JOB_INPUTS",
-            "value": "s3://aws-batch-genomics-shared/secondary-analysis/example-files/fastq/NIST7035* s3://broad-references/hg38/v0/Homo_sapiens_assembly38.fasta*"
+            "value": "s3://1000genomes/pilot_data/data/NA12878/pilot3_unrecal/SRR014820_*.fastq.gz s3://broad-references/hg38/v0/Homo_sapiens_assembly38.fasta*"
         },
         {
             "name": "SAMPLE_ID",
-            "value": "NIST7035"
+            "value": "SRR014820"
         },
         {
             "name": "REFERENCE_NAME",
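
The hunk above changes the read-file glob in the `bwa mem` command from `${SAMPLE_ID}_*1*.fastq.gz` to `${SAMPLE_ID}_1*.fastq.gz`. A small sketch of how the two patterns differ once `SAMPLE_ID` is substituted, using Python's `string.Template` and `fnmatch` as stand-ins for shell expansion. The file names are hypothetical, chosen only to illustrate the matching behaviour, and the diff itself does not state why the pattern was changed.

```python
from fnmatch import fnmatchcase
from string import Template

# SAMPLE_ID value taken from the updated job file.
env = {"SAMPLE_ID": "SRR014820"}

old_glob = Template("${SAMPLE_ID}_*1*.fastq.gz").substitute(env)
new_glob = Template("${SAMPLE_ID}_1*.fastq.gz").substitute(env)

# Hypothetical file names, not taken from the bucket listing.
files = [
    "SRR014820_1.fastq.gz",
    "SRR014820_2.fastq.gz",
    "SRR014820_2_001.fastq.gz",
]

# The old pattern accepts any name with a "1" somewhere after the underscore;
# the new pattern requires the "1" to follow the underscore directly.
old_matches = [f for f in files if fnmatchcase(f, old_glob)]
new_matches = [f for f in files if fnmatchcase(f, new_glob)]
```

With these names the old pattern also picks up `SRR014820_2_001.fastq.gz`, while the new one selects only `SRR014820_1.fastq.gz`.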

src/aws-genomics-cdk/examples/batch-fastqc-job.json

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -6,7 +6,7 @@
66
"command": ["fastqc *.gz"],
77
"environment": [{
88
"name": "JOB_INPUTS",
9-
"value": "s3://aws-batch-genomics-shared/secondary-analysis/example-files/fastq/NIST7035_R*.fastq.gz"
9+
"value": "s3://1000genomes/pilot_data/data/NA12878/pilot3_unrecal/SRR014820_*.fastq.gz"
1010
},
1111
{
1212
"name": "JOB_OUTPUTS",
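
In both example job files, `JOB_INPUTS` holds one or more S3 URI patterns, separated by spaces in `batch-bwa-job.json`. How the container consumes this variable is not shown in this diff, so the following is only an illustrative sketch of splitting such a value into bucket and key-pattern pairs; `split_job_inputs` is a hypothetical helper name.

```python
from urllib.parse import urlparse


def split_job_inputs(value):
    """Split a space-separated JOB_INPUTS string into (bucket, key pattern) pairs."""
    pairs = []
    for uri in value.split():
        parsed = urlparse(uri)
        if parsed.scheme != "s3":
            raise ValueError(f"not an S3 URI: {uri}")
        # netloc is the bucket; the path keeps any glob characters as-is.
        pairs.append((parsed.netloc, parsed.path.lstrip("/")))
    return pairs
```

Keeping the glob characters in the key pattern leaves the choice of how to expand them (prefix listing, client-side filtering) to whatever performs the download.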
