16 changes: 16 additions & 0 deletions mlops-template-terraform/LICENSE.txt
@@ -0,0 +1,16 @@
Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.

SPDX-License-Identifier: MIT-0

Permission is hereby granted, free of charge, to any person obtaining a copy of this
software and associated documentation files (the "Software"), to deal in the Software
without restriction, including without limitation the rights to use, copy, modify,
merge, publish, distribute, sublicense, and/or sell copies of the Software, and to
permit persons to whom the Software is furnished to do so.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED,
INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A
PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
62 changes: 62 additions & 0 deletions mlops-template-terraform/README.md
@@ -0,0 +1,62 @@
## MLOps Terraform Template for SageMaker Projects

An important aspect of Machine Learning (ML) projects is the transition from manual experimentation in
Jupyter notebooks and similar tools to an architecture in which the workflows for building, training, deploying and maintaining ML models
in production are automated and orchestrated. Achieving this requires an operating model between different personas, such as Data Scientists,
Data Engineers, ML Engineers, DevOps Engineers, IT and business stakeholders. The data and
model lifecycle and the underlying workflows also need to be defined, along with the responsibilities of each persona
in these areas. This collection of practices is called Machine Learning Operations (MLOps).

This repository contains a set of baseline infrastructure for an MLOps environment on AWS. It currently targets a single AWS account.
The infrastructure is defined with Terraform and is built around the Amazon SageMaker service.

The three main components of the repository are:

### Component 1: ./mlops_infra

This Terraform project bootstraps an AWS account for ML and includes the following modules:
- modules/networking: Deploys the VPC, subnets & VPC endpoints for SageMaker
- modules/code_commit: Deploys a CodeCommit repository & associates it as a SageMaker repository
- modules/kms: Deploys a KMS key for encryption of SageMaker resources
- modules/iam: Deploys SageMaker roles & policies
- modules/sagemaker_studio: Deploys SageMaker Studio with users and the default JupyterHub app, and enables SageMaker projects

### Component 2: ./mlops_templates

This Terraform project bootstraps AWS Service Catalog with a portfolio and an example Terraform-based SageMaker project.
It allows deploying many different organizational SageMaker project templates.

- modules/sagemaker_project_template: Creates the Service Catalog portfolio & products

### Component 3: ./mlops_templates/templates/mlops_terraform_template/seed_code

These folders contain the "seed code", which is the code that is initialized when a new SageMaker project is created in SageMaker Studio.
The seed code is associated with the corresponding template in the mlops_templates code. The seed code should be 100% generic
and should provide the baseline for new ML projects to build on.

- seed_code/build_app: Example Terraform-based model build application using SageMaker Pipelines, CodeCommit & CodeBuild
- seed_code/deploy_app: Example Terraform-based model deployment application that deploys trained models to SageMaker endpoints

## Prerequisites

- Terraform
- Git
- AWS CLI v2
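
Before starting, it can help to confirm that the prerequisites above are available on your `PATH`. The following is a minimal sketch, not part of the repository; the helper name `missing_tools` is hypothetical.

```shell
# Print the name of each given tool that is not found on PATH.
# Returns non-zero if at least one tool is missing.
missing_tools() {
  status=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
      status=1
    fi
  done
  return "$status"
}
```

For this repository, a call such as `missing_tools terraform git aws` would list whichever of the three tools still needs to be installed.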

## Architecture overview and workflows

![Architecture Diagram](./mlops_templates/diagrams/mlops-terraform-template-overview.png)


## How to use

### Step 1: Deploy mlops_infra into a fresh account

Navigate to the `mlops_infra` directory with `cd mlops_infra` and follow the instructions in
[mlops_infra](mlops_infra/README.md).


### Step 2: Deploy mlops_templates into the same account

Navigate to the `mlops_templates` directory with `cd mlops_templates` and follow the instructions in
[mlops_templates](mlops_templates/README.md).
17 changes: 17 additions & 0 deletions mlops-template-terraform/mlops_infra/Makefile
@@ -0,0 +1,17 @@
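.DEFAULT_GOAL = help
.PHONY: help bootstrap init plan apply destroy

help: ## Show this help
	@grep -E '^[a-zA-Z_-]+:.*?## .*$$' $(MAKEFILE_LIST) | sort | awk 'BEGIN {FS = ":.*?## "}; {printf "\033[36m%-30s\033[0m %s\n", $$1, $$2}'

bootstrap: ## Create the Terraform state bucket and generate terraform/provider.tf
	@./scripts/terraform-account-setup.sh

init: ## Initialize Terraform
	@cd terraform && terraform init

plan: ## Preview the infrastructure changes
	@cd terraform && terraform plan

apply: ## Apply the infrastructure changes
	@cd terraform && terraform apply

destroy: ## Destroy the infrastructure
	@cd terraform && terraform destroy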
53 changes: 53 additions & 0 deletions mlops-template-terraform/mlops_infra/README.md
@@ -0,0 +1,53 @@
# mlops_infra

## Summary

This Terraform project deploys the foundational infrastructure for MLOps on AWS. It includes the following modules:

- `networking`: Sets up a basic VPC, subnets, and the VPC endpoints required to run SageMaker Studio in private subnets
- `iam`: Sets up basic IAM roles and IAM policies
- `kms`: Creates a KMS key and its policies
- `sagemaker_studio`: Deploys and configures Amazon SageMaker Studio and automatically enables Amazon SageMaker projects

## Architecture overview

![Architecture Diagram](./diagrams/SageMaker_dev_env.png)

## Getting started


How to apply the resources:

Expose your AWS credentials via [environment variables](https://registry.terraform.io/providers/hashicorp/aws/latest/docs#environment-variables).
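
As a minimal sketch of a pre-flight check for the credentials step, the helper below reports which environment variables are still unset. The name `require_env` is hypothetical; the variable names in the usage note are the standard ones read by the AWS provider and the AWS CLI.

```shell
# Return 0 only if every named environment variable is set and non-empty.
# Names of missing variables are written to stderr.
require_env() {
  missing=0
  for name in "$@"; do
    # Indirect lookup of the variable called $name (portable across POSIX shells).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

A typical call would be `require_env AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_REGION`. If you authenticate another way, for example with a named profile, adjust the list accordingly.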


Create the state bucket via CloudFormation, generate `terraform/provider.tf`, and initialize Terraform:
```shell
make bootstrap
make init
```
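
To confirm that the bootstrap step created the state bucket, a check along these lines may help. This is a sketch under assumptions: the helper names `state_bucket_name` and `state_bucket_exists` are hypothetical, and the naming pattern is taken from the bootstrap CloudFormation template (`mlops-<account-id>-<region>-tf-state`).

```shell
# Build the state bucket name from the account ID and region,
# following the pattern used in the bootstrap CloudFormation template.
state_bucket_name() {
  account_id=$1
  region=$2
  echo "mlops-${account_id}-${region}-tf-state"
}

# Succeeds when the bucket exists and the current credentials can access it.
state_bucket_exists() {
  aws s3api head-bucket --bucket "$(state_bucket_name "$1" "$2")" --region "$2"
}
```

For example, `state_bucket_exists "$(aws sts get-caller-identity --query Account --output text)" "$AWS_REGION"` exits with a non-zero status if the bucket is missing or not accessible.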

Modify the locals in `terraform/main.tf`. In particular, change the `prefix` to a unique name for your project/use-case. This will allow deploying multiple versions of the infrastructure side by side for each prefix.

## Deploying your infrastructure

```bat
make plan
make apply
```

## Adding new users to Amazon SageMaker Studio Domain

To add users to the Amazon SageMaker Studio domain:

1. Open `terraform/main.tf`
2. Add the new names to the `user_profile_names` list in `locals`
3. Run `make plan && make apply`

## Destroying your infrastructure

```shell
make destroy
```

> Note: SageMaker Studio resources such as user profiles currently have to be deleted manually, because the destroy command fails while a user profile is still in use.
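
The manual cleanup mentioned in the note can be sketched with the AWS CLI as follows. The helper name `delete_studio_user_profiles` is hypothetical, and the sketch assumes that any apps belonging to each user profile have already been deleted, since a profile that still has apps may be rejected.

```shell
# Delete every user profile in the given SageMaker Studio domain.
# Assumption: the apps of each profile were deleted beforehand.
delete_studio_user_profiles() {
  domain_id=$1
  # List the profile names of the domain as whitespace-separated text.
  profiles=$(aws sagemaker list-user-profiles \
    --domain-id-equals "$domain_id" \
    --query 'UserProfiles[].UserProfileName' --output text)
  for profile in $profiles; do
    echo "Deleting user profile: $profile"
    aws sagemaker delete-user-profile \
      --domain-id "$domain_id" --user-profile-name "$profile"
  done
}
```

The domain ID can be looked up with `aws sagemaker list-domains`. Once the profiles are gone, running the destroy command again should be able to remove the remaining resources.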
@@ -0,0 +1,13 @@
Resources:
  S3Bucket:
    Type: 'AWS::S3::Bucket'
    DeletionPolicy: Retain
    Properties:
      BucketName: !Sub 'mlops-${AWS::AccountId}-${AWS::Region}-tf-state'
Outputs:
  BucketName:
    Value: !Ref S3Bucket
    Description: Name of the Amazon S3 bucket that stores the Terraform state.
  BucketRegion:
    Value: !Sub "${AWS::Region}"
    Description: Region of the state bucket.
@@ -0,0 +1,12 @@
provider "aws" {
  region = "$BUCKET_REGION"
}

terraform {
  required_version = ">= 1.0.0"
  backend "s3" {
    bucket = "$BUCKET_NAME"
    key    = "mlops.tfstate"
    region = "$BUCKET_REGION"
  }
}
@@ -0,0 +1,33 @@
#!/bin/bash
STACK_NAME="mlops-tf-bootstrap"
AWS_ACCOUNT=$(aws sts get-caller-identity --query "Account" --output text)
AWS_REGION=${AWS_REGION:-$(aws configure get region)}

bootstrap() {
  echo "---------------------BOOTSTRAPPING---------------------"
  # Deploy the CloudFormation stack that creates the Terraform state bucket
  aws cloudformation deploy --template-file ./scripts/bootstrap/bootstrap_cfn.yaml --stack-name "$STACK_NAME" --region "$AWS_REGION"

  # Read the bucket name and region from the stack outputs (same region as the deployment)
  export BUCKET_NAME=$(aws cloudformation describe-stacks --stack-name "$STACK_NAME" --region "$AWS_REGION" --query "Stacks[0].Outputs[?OutputKey=='BucketName'].OutputValue" --output text)
  export BUCKET_REGION=$(aws cloudformation describe-stacks --stack-name "$STACK_NAME" --region "$AWS_REGION" --query "Stacks[0].Outputs[?OutputKey=='BucketRegion'].OutputValue" --output text)

  # Update provider.tf to use bucket and region
  envsubst < "./scripts/bootstrap/provider.template" > "./terraform/provider.tf"

  # Re-initialize Terraform in the directory that contains the generated provider.tf
  (cd terraform && terraform init -reconfigure) || true
  echo "State Bucket: $BUCKET_NAME"
  echo "State Region: $BUCKET_REGION"
  echo "-----------------------COMPLETE------------------------"
  echo "Set your AWS_REGION environment variable to $AWS_REGION so that Terraform selects the correct region"
}

# Only an explicit "y" or "yes" triggers the bootstrap; anything else skips it
read -r -p "Bootstrap $AWS_ACCOUNT in $AWS_REGION? [y/N] " CONFIRMATION
case "$CONFIRMATION" in
  [yY][eE][sS]|[yY])
    # Deploy bucket for state files
    bootstrap
    ;;
  *)
    echo "Skipping bootstrap"
    ;;
esac

41 changes: 41 additions & 0 deletions mlops-template-terraform/mlops_infra/terraform/main.tf
@@ -0,0 +1,41 @@
locals {
  prefix             = "mlops"
  user_profile_names = ["user1", "user2"]
  domain_name        = "${local.prefix}-studio-domain"
  kms_key_alias      = "${local.prefix}-kms-key"
}

data "aws_caller_identity" "current" {}

data "aws_region" "current" {}

module "networking" {
  source                    = "./modules/networking"
  prefix                    = local.prefix
  vpc_cidr_block            = "10.0.0.0/16"
  private_subnet_cidr_block = "10.0.1.0/24"
  public_subnet_cidr_block  = "10.0.0.0/24"
  availability_zone         = "${data.aws_region.current.name}a" # TODO: remove the hard-coded availability zone suffix
}

module "kms" {
  source        = "./modules/kms"
  kms_key_alias = local.kms_key_alias
}

module "iam" {
  source      = "./modules/iam"
  prefix      = local.prefix
  kms_key_arn = module.kms.kms_key_arn
}

module "sm_studio" {
  source                = "./modules/sagemaker_studio"
  domain_name           = local.domain_name
  vpc_id                = module.networking.vpc_id
  subnet_ids            = [module.networking.private_subnet_ids]
  sm_execution_role_arn = module.iam.sm_execution_role_arn
  kms_key_arn           = module.kms.kms_key_arn
  user_profile_names    = local.user_profile_names
}
