aws-samples
diff --git a/README.md b/README.md
Lines changed: 10 additions & 1 deletion
diff --git a/docs/install-cromwell/images/screen1.png b/docs/install-cromwell/images/screen1.png
85.6 KB
diff --git a/docs/install-cromwell/images/screen2.png b/docs/install-cromwell/images/screen2.png
249 KB
diff --git a/docs/install-cromwell/images/screen3.png b/docs/install-cromwell/images/screen3.png
93.1 KB
diff --git a/docs/install-cromwell/images/screen4.png b/docs/install-cromwell/images/screen4.png
273 KB
diff --git a/docs/install-cromwell/images/screen5.png b/docs/install-cromwell/images/screen5.png
54.5 KB
diff --git a/docs/install-cromwell/index.md b/docs/install-cromwell/index.md
Lines changed: 112 additions & 0 deletions
diff --git a/requirements.txt b/requirements.txt
Lines changed: 2 additions & 2 deletions
diff --git a/src/aws-genomics-cdk/examples/batch-bwa-job.json b/src/aws-genomics-cdk/examples/batch-bwa-job.json
Lines changed: 4 additions & 3 deletions
diff --git a/src/aws-genomics-cdk/examples/batch-fastqc-job.json b/src/aws-genomics-cdk/examples/batch-fastqc-job.json
Lines changed: 1 addition & 1 deletion
@@ -55,6 +55,15 @@ aws cloudformation create-stack \
 
 ```
 
+## Shared File System Support
+
+Amazon EFS is supported out of the box for `GWFCore` and `Nextflow`. You have two options to use EFS.
+
+1. **Create a new EFS File System:** Be sure to have `CreateEFS` set to `Yes` and also include the total number of subnets.
+2. **Use an Existing EFS File System:** Be sure to specify the EFS ID in the `ExistingEFS` parameter. This file system should be accessible from every subnet you specify.
+
+Following successful deployment of `GWFCore`, when creating your Nextflow Resources, set `MountEFS` to `Yes`.
+
 ## Building the documentation
 
 The documentation is built using mkdocs.
@@ -76,4 +85,4 @@ $ mkdocs build
 
 ## License Summary
 
-This sample code is made available under a modified MIT license. See the LICENSE file.
+This library is licensed under the MIT-0 License. See the LICENSE file.
@@ -0,0 +1,112 @@
+# Installing the Genomics Workflow Core and Cromwell
+
+## Summary
+
+This document shows how to use CloudFormation to provision the infrastructure needed to run Cromwell version 52 or later on AWS Batch, with S3 as the object store. The instructions cover deployment into an existing VPC. There are two main steps: deploying the Genomics Workflow Core infrastructure, which can be used with Cromwell, Nextflow, and AWS Step Functions, and deploying the Cromwell server and related artifacts.
+* * *
+
+## Assumptions
+
+1. The instructions assume you have an existing AWS account with sufficient credentials to deploy the infrastructure or that you will use a role with CloudFormation that has sufficient privileges (admin role is recommended).
+2. You have an existing VPC to deploy artifacts into. This VPC should have a minimum of two subnets with routes to the public internet. Private subnet routes may be through a NAT Gateway.
+
+* * *
+
+## Deployment of Genomics Workflow Core into an existing VPC
+
+Take note of the ID of the VPC that you will use and the IDs of the subnets in that VPC that you will use for the Batch worker nodes. We recommend using two or more private subnets.
+
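If you prefer the AWS CLI to the console for collecting these IDs, a minimal sketch follows. It assumes the AWS CLI is installed and configured for the target account and region; `list_subnets` is a hypothetical helper name and `vpc-0123456789abcdef0` is a placeholder VPC ID.

```shell
# Hypothetical helper: print the ID, Availability Zone, and public-IP flag
# of every subnet in the given VPC, one subnet per line.
list_subnets() {
    aws ec2 describe-subnets \
        --filters "Name=vpc-id,Values=$1" \
        --query 'Subnets[].[SubnetId,AvailabilityZone,MapPublicIpOnLaunch]' \
        --output text
}

# Usage (placeholder VPC ID):
#   list_subnets vpc-0123456789abcdef0
```

A `False` in the last column means the subnet does not auto-assign public IPs, which is common for private subnets. The route table is what actually determines whether a subnet is private, so treat this column as a hint only.
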
+1. Open the CloudFormation console and select “**Create stack**” with new resources. Enter `https://aws-genomics-workflows.s3.amazonaws.com/latest/templates/gwfcore/gwfcore-root.template.yaml` as the Amazon S3 URL.
+
+![](./images/screen1.png)
+
+1. Select appropriate values for your environment including the VPC and subnets you recorded above. It is recommended to leave the Default and High Priority Min vCPU values at 0 so that the AWS Batch cluster will not have any instances running when there are no workflows running. Max vCPU values may be increased if you expect to run large workloads utilizing many CPUs. Leave the Distribution Configuration values with the preset defaults.
+
+![](./images/screen2.png)
+
+1. Optionally add tags and click **Next**.
+
+1. Review the parameters, acknowledge the Capabilities notifications, and click “**Create Stack**”.
+
+![](./images/screen5.png)
+
+The template will now create several nested stacks to deploy the required resources. This step will take approximately 10 minutes to complete. When this is complete you can proceed with the “[Deploy Cromwell Resources](#deploy-cromwell-resources)” section below.
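
The wait can also be scripted. The sketch below assumes the AWS CLI is configured and that the root stack was named `gwfcore`. That name is a placeholder, so substitute the stack name you entered in the console. `wait_for_stack` is a hypothetical helper.

```shell
# Hypothetical helper: block until the stack finishes creating, then print
# its final status. The waiter exits non-zero if creation fails or times out,
# in which case the status query is skipped.
wait_for_stack() {
    aws cloudformation wait stack-create-complete --stack-name "$1" &&
    aws cloudformation describe-stacks \
        --stack-name "$1" \
        --query 'Stacks[0].StackStatus' \
        --output text
}

# Usage (placeholder stack name):
#   wait_for_stack gwfcore
```

Because the nested stacks are children of the root stack, waiting on the root stack alone should be enough.
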
+* * *
+
+## Deploy Cromwell Resources
+
+1. Ensure all steps of the CloudFormation deployment of the Genomics Workflow Core have successfully completed before proceeding any further.
+2. From the CloudFormation console select “**Create Stack**” and if prompted select “**With new resources (Standard)**”
+3. Fill in the Amazon S3 URL with `https://aws-genomics-workflows.s3.amazonaws.com/latest/templates/cromwell/cromwell-resources.template.yaml`
+
+![](./images/screen3.png)
+
+4. Fill in appropriate values for the template. **For `GWFCoreNamespace`, use the namespace value you used in the section above.** Use the same VPC as in the previous section. To secure your Cromwell server, change the `SSH Address Range` and `HTTP Address Range` to trusted values; these will be used when creating the server's security group.
+5. You may either use the latest version of Cromwell (recommended) or specify a version **52 or greater.**
+6. Select a MySQL-compliant `Cromwell Database Password` that will be used for Cromwell’s metadata database. Select “**Next**”.
+
+![](./images/screen4.png)
+
+7. On the remaining two screens keep the defaults, acknowledge the IAM capabilities and then click “**Create Stack**”
+
+Once the stack completes, an EC2 instance will be deployed running the Cromwell server. You can now proceed with “[Testing your Deployment](#testing-your-deployment)”.
+* * *
+
+## Testing your Deployment
+
+The following WDL file is a very simple workflow that can be used to test that all the components of the deployment are working together. Add the code block below to a file named `workflow.wdl`.
+
+```
+workflow helloWorld {
+    call sayHello
+}
+
+task sayHello {
+    command {
+        echo "hello world"
+    }
+    output {
+        String out = read_string(stdout())
+    }
+
+    runtime {
+       docker: "ubuntu:latest"
+       memory: "1 GB"
+       cpu: 1
+    }
+}
+```
+
+This workflow can be submitted to the server's REST endpoint using `curl`, either from a client that has access to the server's Elastic IP or from within the server itself using `localhost`. The hostname of the server is also emitted as an output from the cromwell-resources CloudFormation template.
+
+```
+curl -X POST "http://localhost:8000/api/workflows/v1" \
+     -H "accept: application/json" \
+     -F "workflowSource=@workflow.wdl"
+```
+
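Cromwell replies to a successful submission with a small JSON document containing the workflow `id` and a `status`. A minimal sketch for capturing that ID and querying the workflow's status endpoint follows. `extract_workflow_id` and `workflow_status` are hypothetical helper names, and `CROMWELL_URL` is an assumed variable that defaults to the `localhost` address used above.

```shell
# Assumed variable: base URL of the Cromwell server.
CROMWELL_URL="${CROMWELL_URL:-http://localhost:8000}"

# Hypothetical helper: read a submit response on stdin and print the value
# of its "id" field. Uses sed so that jq is not required.
extract_workflow_id() {
    sed -n 's/.*"id"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Hypothetical helper: query the status endpoint for one workflow ID.
workflow_status() {
    curl -s "${CROMWELL_URL}/api/workflows/v1/$1/status"
}

# Usage:
#   id=$(curl -s -X POST "${CROMWELL_URL}/api/workflows/v1" \
#            -H "accept: application/json" \
#            -F "workflowSource=@workflow.wdl" | extract_workflow_id)
#   workflow_status "$id"
```
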
+It can take a few minutes for AWS Batch to realize there is a job in the work queue and provision a worker to run it. You can monitor this in the AWS Batch console. 
+
+You can also monitor the Cromwell server logs in CloudWatch. There will be a log group called `cromwell-server`. Once the run is completed you will see output similar to:
+
+![](./images/screen5.png)
+
+If the run is successful, subsequent runs will be “call cached”, meaning that the results of the previous run will be copied for all successful steps. If you resubmit the job, you will very quickly see the workflow succeed in the server logs, and no additional jobs will appear in the AWS Batch console. You can disable call caching for the job by adding an options file and submitting it with the run. This will cause the workflow to be re-executed in full.
+
+```json5
+{
+    "write_to_cache": false,
+    "read_from_cache": false
+}
+```
+
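One way to create that file from the shell is a here-document. The file name `options.json` is an assumption; any name works as long as the submission command points at it.

```shell
# Write the call-caching options to options.json (assumed file name).
# Quoting 'EOF' keeps the shell from expanding anything inside the body.
cat > options.json <<'EOF'
{
    "write_to_cache": false,
    "read_from_cache": false
}
EOF
```
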
+```shell
+# Assumes the options above were saved as options.json
+curl -X POST "http://localhost:8000/api/workflows/v1" \
+     -H "accept: application/json" \
+     -F "workflowSource=@workflow.wdl" \
+     -F "workflowOptions=@options.json"
+```
+
+For a more realistic workflow, a WDL for simple variant calling using bwa-mem, samtools, and bcftools is available [here](https://github.com/wleepang/demo-genomics-workflow-wdl).
+
+Clone the repo, and submit the WDL file to Cromwell. The workflow uses default inputs from public data sources. If you want to override these inputs, modify the `inputs.json` file accordingly and submit it along with the workflow.
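
Cromwell's submission endpoint accepts an inputs file through the `workflowInputs` form field. A minimal sketch follows, in which `submit_workflow` is a hypothetical helper, `CROMWELL_URL` is an assumed variable, and `<workflow>.wdl` stands for whichever WDL file you pick from the cloned repo.

```shell
# Assumed variable: base URL of the Cromwell server.
CROMWELL_URL="${CROMWELL_URL:-http://localhost:8000}"

# Hypothetical helper: submit_workflow WDL_FILE [INPUTS_JSON]
# Adds the workflowInputs form field only when an inputs file is given.
submit_workflow() {
    _wdl="$1"
    _inputs="${2:-}"
    # Rebuild the positional parameters as the curl argument list.
    set -- -s -X POST "${CROMWELL_URL}/api/workflows/v1" \
           -H "accept: application/json" \
           -F "workflowSource=@${_wdl}"
    if [ -n "$_inputs" ]; then
        set -- "$@" -F "workflowInputs=@${_inputs}"
    fi
    curl "$@"
}

# Usage:
#   submit_workflow <workflow>.wdl              # default inputs
#   submit_workflow <workflow>.wdl inputs.json  # overridden inputs
```
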
@@ -1,7 +1,7 @@
-mkdocs==1.0.4
+mkdocs==1.2.3
 mkdocs-macros-plugin==0.2.4
 mkdocs-markdownextradata-plugin==0.0.5
 mkdocs-material==3.1.0
 pymdown-extensions==6.0
 fontawesome-markdown==0.2.6
-cfn-lint==0.16.0
+cfn-lint==0.16.0
@@ -3,14 +3,15 @@
     "jobQueue": "genomics-default-queue",
     "jobDefinition": "bwa:1",
     "containerOverrides": {
-        "command": ["bwa mem -t 8 -p -o ${SAMPLE_ID}.sam ${REFERENCE_NAME}.fasta ${SAMPLE_ID}_*1*.fastq.gz"],
+        "command": ["bwa mem -t 8 -p -o ${SAMPLE_ID}.sam ${REFERENCE_NAME}.fasta ${SAMPLE_ID}_1*.fastq.gz"],
+        "memory": 32000,
         "environment": [{
                 "name": "JOB_INPUTS",
-                "value": "s3://aws-batch-genomics-shared/secondary-analysis/example-files/fastq/NIST7035* s3://broad-references/hg38/v0/Homo_sapiens_assembly38.fasta*"
+                "value": "s3://1000genomes/pilot_data/data/NA12878/pilot3_unrecal/SRR014820_*.fastq.gz s3://broad-references/hg38/v0/Homo_sapiens_assembly38.fasta*"
             },
             {
                 "name": "SAMPLE_ID",
-                "value": "NIST7035"
+                "value": "SRR014820"
             },
             {
                 "name": "REFERENCE_NAME",
 
@@ -6,7 +6,7 @@
         "command": ["fastqc *.gz"],
         "environment": [{
                 "name": "JOB_INPUTS",
-                "value": "s3://aws-batch-genomics-shared/secondary-analysis/example-files/fastq/NIST7035_R*.fastq.gz"
+                "value": "s3://1000genomes/pilot_data/data/NA12878/pilot3_unrecal/SRR014820_*.fastq.gz"
             },
             {
                 "name": "JOB_OUTPUTS",