From 02882e8bb11bf6c05fbf8f19e612b292eed3a63e Mon Sep 17 00:00:00 2001 From: Igor Lakhtenkov Date: Tue, 7 Mar 2023 00:20:56 +0100 Subject: [PATCH 1/2] Added CMEK for GCS service. Minor documentation changes. --- .terraform-docs.yml | 5 ++-- README.md | 60 +++++++++++++++++++++++++-------------------- pipeline.tf | 6 +++++ variables.tf | 43 ++++++++++++++++++++------------ 4 files changed, 70 insertions(+), 44 deletions(-) diff --git a/.terraform-docs.yml b/.terraform-docs.yml index 2a956d7..9b0bb24 100644 --- a/.terraform-docs.yml +++ b/.terraform-docs.yml @@ -15,6 +15,7 @@ content: |- settings: indent: 4 escape: false - default: false - required: false + default: true + required: true + html: true type: true diff --git a/README.md b/README.md index 5340cf0..8520385 100644 --- a/README.md +++ b/README.md @@ -16,33 +16,34 @@ These deployment templates are provided as is, without warranty. See [Copyright #### Inputs -| Name | Description | Type | -|------|-------------|------| -| [dataflow_job_name](#input_dataflow_job_name) | Dataflow job name. No spaces | `string` | -| [log_filter](#input_log_filter) | Log filter to use when exporting logs | `string` | -| [network](#input_network) | Network to deploy into | `string` | -| [project](#input_project) | Project ID to deploy resources in | `string` | -| [region](#input_region) | Region to deploy regional-resources into. This must match subnet's region if deploying into existing network (e.g. Shared VPC). See `subnet` parameter below | `string` | -| [splunk_hec_url](#input_splunk_hec_url) | Splunk HEC URL to write data to. Example: https://[MY_SPLUNK_IP_OR_FQDN]:8088 | `string` | -| [create_network](#input_create_network) | Boolean value specifying if a new network needs to be created. 
| `bool` | -| [dataflow_job_batch_count](#input_dataflow_job_batch_count) | (Optional) Batch count of messages in single request to Splunk (default 50) | `number` | -| [dataflow_job_disable_certificate_validation](#input_dataflow_job_disable_certificate_validation) | (Optional) Boolean to disable SSL certificate validation (default `false`) | `bool` | -| [dataflow_job_machine_count](#input_dataflow_job_machine_count) | (Optional) Dataflow job max worker count (default 2) | `number` | -| [dataflow_job_machine_type](#input_dataflow_job_machine_type) | (Optional) Dataflow job worker machine type (default 'n1-standard-4') | `string` | -| [dataflow_job_parallelism](#input_dataflow_job_parallelism) | (Optional) Maximum parallel requests to Splunk (default 8) | `number` | -| [dataflow_job_udf_function_name](#input_dataflow_job_udf_function_name) | (Optional) Name of JavaScript function to be called (default No UDF used) | `string` | -| [dataflow_job_udf_gcs_path](#input_dataflow_job_udf_gcs_path) | (Optional) GCS path for JavaScript file (default No UDF used) | `string` | -| [dataflow_template_version](#input_dataflow_template_version) | (Optional) Dataflow template release version (default 'latest'). Override this for version pinning e.g. '2021-08-02-00_RC00'. Must specify version only since template GCS path will be deduced automatically: 'gs://dataflow-templates/`version`/Cloud_PubSub_to_Splunk' | `string` | -| [dataflow_worker_service_account](#input_dataflow_worker_service_account) | (Optional) Name of Dataflow worker service account to be created and used to execute job operations. In the default case of creating a new service account (`use_externally_managed_dataflow_sa=false`), this parameter must be 6-30 characters long, and match the regular expression [a-z]([-a-z0-9]*[a-z0-9]). If the parameter is empty, worker service account defaults to project's Compute Engine default service account. 
If using external service account (`use_externally_managed_dataflow_sa=true`), this parameter must be the full email address of the external service account. | `string` | -| [deploy_replay_job](#input_deploy_replay_job) | (Optional) Determines if replay pipeline should be deployed or not (default: `false`) | `bool` | -| [primary_subnet_cidr](#input_primary_subnet_cidr) | The CIDR Range of the primary subnet | `string` | -| [scoping_project](#input_scoping_project) | Cloud Monitoring scoping project ID to create dashboard under.
This assumes a pre-existing scoping project whose metrics scope contains the `project` where dataflow job is to be deployed.
See [Cloud Monitoring settings](https://cloud.google.com/monitoring/settings) for more details on scoping project.
If parameter is empty, scoping project defaults to value of `project` parameter above. | `string` | -| [splunk_hec_token](#input_splunk_hec_token) | (Optional) Splunk HEC token. Must be defined if `splunk_hec_token_source` if type of `PLAINTEXT` or `KMS`. | `string` | -| [splunk_hec_token_kms_encryption_key](#input_splunk_hec_token_kms_encryption_key) | (Optional) The Cloud KMS key to decrypt the HEC token string. Required if `splunk_hec_token_source` is type of KMS (default: '') | `string` | -| [splunk_hec_token_secret_id](#input_splunk_hec_token_secret_id) | (Optional) Id of the Secret for Splunk HEC token. Required if `splunk_hec_token_source` is type of SECRET_MANAGER (default: '') | `string` | -| [splunk_hec_token_source](#input_splunk_hec_token_source) | (Optional) Define in which type HEC token is provided. Possible options: [PLAINTEXT, KMS, SECRET_MANAGER]. Default: PLAINTEXT | `string` | -| [subnet](#input_subnet) | Subnet to deploy into. This is required when deploying into existing network (`create_network=false`) (e.g. Shared VPC) | `string` | -| [use_externally_managed_dataflow_sa](#input_use_externally_managed_dataflow_sa) | (Optional) Determines if the worker service account provided by `dataflow_worker_service_account` variable should be created by this module (default) or is managed outside of the module. In the latter case, user is expected to apply and manage the service account IAM permissions over external resources (e.g. Cloud KMS key or Secret version) before running this module. | `bool` | +| Name | Description | Type | Default | Required | +|------|-------------|------|---------|:--------:| +| [dataflow_job_name](#input_dataflow_job_name) | Dataflow job name. 
No spaces | `string` | n/a | yes | +| [log_filter](#input_log_filter) | Log filter to use when exporting logs | `string` | n/a | yes | +| [network](#input_network) | Network to deploy into | `string` | n/a | yes | +| [project](#input_project) | Project ID to deploy resources in | `string` | n/a | yes | +| [region](#input_region) | Region to deploy regional-resources into. This must match subnet's region if deploying into existing network (e.g. Shared VPC). See `subnet` parameter below | `string` | n/a | yes | +| [splunk_hec_url](#input_splunk_hec_url) | Splunk HEC URL to write data to. Example: https://[MY_SPLUNK_IP_OR_FQDN]:8088 | `string` | n/a | yes | +| [create_network](#input_create_network) | Boolean value specifying if a new network needs to be created. | `bool` | `false` | no | +| [dataflow_job_batch_count](#input_dataflow_job_batch_count) | Batch count of messages in single request to Splunk | `number` | `50` | no | +| [dataflow_job_disable_certificate_validation](#input_dataflow_job_disable_certificate_validation) | Boolean to disable SSL certificate validation | `bool` | `false` | no | +| [dataflow_job_machine_count](#input_dataflow_job_machine_count) | Dataflow job max worker count | `number` | `2` | no | +| [dataflow_job_machine_type](#input_dataflow_job_machine_type) | Dataflow job worker machine type | `string` | `"n1-standard-4"` | no | +| [dataflow_job_parallelism](#input_dataflow_job_parallelism) | Maximum parallel requests to Splunk | `number` | `8` | no | +| [dataflow_job_udf_function_name](#input_dataflow_job_udf_function_name) | Name of JavaScript function to be called | `string` | `""` | no | +| [dataflow_job_udf_gcs_path](#input_dataflow_job_udf_gcs_path) | GCS path for JavaScript file | `string` | `""` | no | +| [dataflow_template_version](#input_dataflow_template_version) | Dataflow template release version (default 'latest'). Override this for version pinning e.g. '2021-08-02-00_RC00'. 
Must specify version only since template GCS path will be deduced automatically: 'gs://dataflow-templates/`version`/Cloud_PubSub_to_Splunk' | `string` | `"latest"` | no | +| [dataflow_worker_service_account](#input_dataflow_worker_service_account) | Name of Dataflow worker service account to be created and used to execute job operations. In the default case of creating a new service account (`use_externally_managed_dataflow_sa=false`), this parameter must be 6-30 characters long, and match the regular expression [a-z]([-a-z0-9]*[a-z0-9]). If the parameter is empty, worker service account defaults to project's Compute Engine default service account. If using external service account (`use_externally_managed_dataflow_sa=true`), this parameter must be the full email address of the external service account. | `string` | `""` | no | +| [deploy_replay_job](#input_deploy_replay_job) | Determines if replay pipeline should be deployed or not | `bool` | `false` | no | +| [gcs_kms_key_name](#input_gcs_kms_key_name) | The `id` of a Cloud KMS key that will be used to encrypt objects inserted into temporary bucket.
User must ensure that `roles/cloudkms.cryptoKeyEncrypterDecrypter` is granted on this key to the Cloud Storage service identity. | `string` | `""` | no |
+| [primary_subnet_cidr](#input_primary_subnet_cidr) | The CIDR Range of the primary subnet | `string` | `"10.128.0.0/20"` | no |
+| [scoping_project](#input_scoping_project) | Cloud Monitoring scoping project ID to create dashboard under.<br>
This assumes a pre-existing scoping project whose metrics scope contains the `project` where dataflow job is to be deployed.
See [Cloud Monitoring settings](https://cloud.google.com/monitoring/settings) for more details on scoping project.
If parameter is empty, scoping project defaults to value of `project` parameter above. | `string` | `""` | no |
+| [splunk_hec_token](#input_splunk_hec_token) | Splunk HEC token. Must be defined if `splunk_hec_token_source` if type of `PLAINTEXT` or `KMS`. | `string` | `""` | no |
+| [splunk_hec_token_kms_encryption_key](#input_splunk_hec_token_kms_encryption_key) | The Cloud KMS key to decrypt the HEC token string. Required if `splunk_hec_token_source` is type of KMS | `string` | `""` | no |
+| [splunk_hec_token_secret_id](#input_splunk_hec_token_secret_id) | Id of the Secret for Splunk HEC token. Required if `splunk_hec_token_source` is type of SECRET_MANAGER | `string` | `""` | no |
+| [splunk_hec_token_source](#input_splunk_hec_token_source) | Define in which type HEC token is provided. Possible options: [PLAINTEXT, KMS, SECRET_MANAGER]. | `string` | `"PLAINTEXT"` | no |
+| [subnet](#input_subnet) | Subnet to deploy into. This is required when deploying into existing network (`create_network=false`) (e.g. Shared VPC) | `string` | `""` | no |
+| [use_externally_managed_dataflow_sa](#input_use_externally_managed_dataflow_sa) | Determines if the worker service account provided by `dataflow_worker_service_account` variable should be created by this module (default) or is managed outside of the module. In the latter case, user is expected to apply and manage the service account IAM permissions over external resources (e.g. Cloud KMS key or Secret version) before running this module. | `bool` | `false` | no |
 #### Outputs
 | Name | Description |
@@ -163,6 +164,11 @@ To delete resources created by Terraform, run the following then confirm:
 $ terraform destroy
 ```
+### CMEK
+The project supports providing customer-managed encryption keys (CMEK) for different services.
+CMEK is currently supported for the following services:
+1. GCS: the user is responsible for granting `roles/cloudkms.cryptoKeyEncrypterDecrypter` on the key to the Cloud Storage service identity.
+ ### TODOs * Expose logging level knob diff --git a/pipeline.tf b/pipeline.tf index 389322a..9574df0 100644 --- a/pipeline.tf +++ b/pipeline.tf @@ -38,6 +38,12 @@ resource "google_storage_bucket" "dataflow_job_temp_bucket" { location = var.region storage_class = "REGIONAL" uniform_bucket_level_access = true + dynamic "encryption" { + for_each = (var.gcs_kms_key_name == "") ? [] : [1] + content { + default_kms_key_name = var.gcs_kms_key_name + } + } } resource "google_storage_bucket_object" "dataflow_job_temp_object" { diff --git a/variables.tf b/variables.tf index 44e9371..e3a034d 100644 --- a/variables.tf +++ b/variables.tf @@ -80,7 +80,7 @@ variable "splunk_hec_url" { variable "splunk_hec_token_source" { type = string default = "PLAINTEXT" - description = "(Optional) Define in which type HEC token is provided. Possible options: [PLAINTEXT, KMS, SECRET_MANAGER]. Default: PLAINTEXT" + description = "Define in which type HEC token is provided. Possible options: [PLAINTEXT, KMS, SECRET_MANAGER]." validation { condition = contains(["PLAINTEXT", "KMS", "SECRET_MANAGER"], var.splunk_hec_token_source) @@ -90,14 +90,14 @@ variable "splunk_hec_token_source" { variable "splunk_hec_token" { type = string - description = "(Optional) Splunk HEC token. Must be defined if `splunk_hec_token_source` if type of `PLAINTEXT` or `KMS`." + description = "Splunk HEC token. Must be defined if `splunk_hec_token_source` if type of `PLAINTEXT` or `KMS`." default = "" sensitive = true } variable "splunk_hec_token_kms_encryption_key" { type = string - description = "(Optional) The Cloud KMS key to decrypt the HEC token string. Required if `splunk_hec_token_source` is type of KMS (default: '')" + description = "The Cloud KMS key to decrypt the HEC token string. 
Required if `splunk_hec_token_source` is type of KMS" default = "" validation { condition = can(regex("^projects\\/[^\\n\\r\\/]+\\/locations\\/[^\\n\\r\\/]+\\/keyRings\\/[^\\n\\r\\/]+\\/cryptoKeys\\/[^\\n\\r\\/]+$", var.splunk_hec_token_kms_encryption_key)) || var.splunk_hec_token_kms_encryption_key == "" @@ -108,7 +108,7 @@ variable "splunk_hec_token_kms_encryption_key" { # TODO: Make cross variable validation once https://github.com/hashicorp/terraform/issues/25609 is resolved variable "splunk_hec_token_secret_id" { type = string - description = "(Optional) Id of the Secret for Splunk HEC token. Required if `splunk_hec_token_source` is type of SECRET_MANAGER (default: '')" + description = "Id of the Secret for Splunk HEC token. Required if `splunk_hec_token_source` is type of SECRET_MANAGER" default = "" validation { condition = can(regex("^projects\\/[^\\n\\r\\/]+\\/secrets\\/[^\\n\\r\\/]+\\/versions\\/[^\\n\\r\\/]+$", var.splunk_hec_token_secret_id)) || var.splunk_hec_token_secret_id == "" @@ -120,13 +120,13 @@ variable "splunk_hec_token_secret_id" { variable "dataflow_template_version" { type = string - description = "(Optional) Dataflow template release version (default 'latest'). Override this for version pinning e.g. '2021-08-02-00_RC00'. Must specify version only since template GCS path will be deduced automatically: 'gs://dataflow-templates/`version`/Cloud_PubSub_to_Splunk'" + description = "Dataflow template release version (default 'latest'). Override this for version pinning e.g. '2021-08-02-00_RC00'. Must specify version only since template GCS path will be deduced automatically: 'gs://dataflow-templates/`version`/Cloud_PubSub_to_Splunk'" default = "latest" } variable "dataflow_worker_service_account" { type = string - description = "(Optional) Name of Dataflow worker service account to be created and used to execute job operations. 
In the default case of creating a new service account (`use_externally_managed_dataflow_sa=false`), this parameter must be 6-30 characters long, and match the regular expression [a-z]([-a-z0-9]*[a-z0-9]). If the parameter is empty, worker service account defaults to project's Compute Engine default service account. If using external service account (`use_externally_managed_dataflow_sa=true`), this parameter must be the full email address of the external service account." + description = "Name of Dataflow worker service account to be created and used to execute job operations. In the default case of creating a new service account (`use_externally_managed_dataflow_sa=false`), this parameter must be 6-30 characters long, and match the regular expression [a-z]([-a-z0-9]*[a-z0-9]). If the parameter is empty, worker service account defaults to project's Compute Engine default service account. If using external service account (`use_externally_managed_dataflow_sa=true`), this parameter must be the full email address of the external service account." 
default = "" validation { @@ -145,54 +145,67 @@ variable "dataflow_job_name" { variable "dataflow_job_machine_type" { type = string - description = "(Optional) Dataflow job worker machine type (default 'n1-standard-4')" + description = "Dataflow job worker machine type" default = "n1-standard-4" } variable "dataflow_job_machine_count" { - description = "(Optional) Dataflow job max worker count (default 2)" + description = "Dataflow job max worker count" type = number default = 2 } variable "dataflow_job_parallelism" { - description = "(Optional) Maximum parallel requests to Splunk (default 8)" + description = "Maximum parallel requests to Splunk" type = number default = 8 } variable "dataflow_job_batch_count" { - description = "(Optional) Batch count of messages in single request to Splunk (default 50)" + description = "Batch count of messages in single request to Splunk" type = number default = 50 } variable "dataflow_job_disable_certificate_validation" { - description = "(Optional) Boolean to disable SSL certificate validation (default `false`)" + description = "Boolean to disable SSL certificate validation" type = bool default = false } variable "dataflow_job_udf_gcs_path" { type = string - description = "(Optional) GCS path for JavaScript file (default No UDF used)" + description = "GCS path for JavaScript file" default = "" } variable "dataflow_job_udf_function_name" { type = string - description = "(Optional) Name of JavaScript function to be called (default No UDF used)" + description = "Name of JavaScript function to be called" default = "" } variable "deploy_replay_job" { type = bool - description = "(Optional) Determines if replay pipeline should be deployed or not (default: `false`)" + description = "Determines if replay pipeline should be deployed or not" default = false } variable "use_externally_managed_dataflow_sa" { type = bool default = false - description = "(Optional) Determines if the worker service account provided by 
`dataflow_worker_service_account` variable should be created by this module (default) or is managed outside of the module. In the latter case, user is expected to apply and manage the service account IAM permissions over external resources (e.g. Cloud KMS key or Secret version) before running this module." + description = "Determines if the worker service account provided by `dataflow_worker_service_account` variable should be created by this module (default) or is managed outside of the module. In the latter case, user is expected to apply and manage the service account IAM permissions over external resources (e.g. Cloud KMS key or Secret version) before running this module." +} + +variable "gcs_kms_key_name" { + type = string + description = < Date: Tue, 21 Mar 2023 15:58:23 +0000 Subject: [PATCH 2/2] Update doc for CMEK --- README.md | 17 ++++------------- variables.tf | 26 +++++++++++++------------- 2 files changed, 17 insertions(+), 26 deletions(-) diff --git a/README.md b/README.md index 8520385..eaeb898 100644 --- a/README.md +++ b/README.md @@ -35,7 +35,7 @@ These deployment templates are provided as is, without warranty. See [Copyright | [dataflow_template_version](#input_dataflow_template_version) | Dataflow template release version (default 'latest'). Override this for version pinning e.g. '2021-08-02-00_RC00'. Must specify version only since template GCS path will be deduced automatically: 'gs://dataflow-templates/`version`/Cloud_PubSub_to_Splunk' | `string` | `"latest"` | no | | [dataflow_worker_service_account](#input_dataflow_worker_service_account) | Name of Dataflow worker service account to be created and used to execute job operations. In the default case of creating a new service account (`use_externally_managed_dataflow_sa=false`), this parameter must be 6-30 characters long, and match the regular expression [a-z]([-a-z0-9]*[a-z0-9]). If the parameter is empty, worker service account defaults to project's Compute Engine default service account. 
If using external service account (`use_externally_managed_dataflow_sa=true`), this parameter must be the full email address of the external service account. | `string` | `""` | no | | [deploy_replay_job](#input_deploy_replay_job) | Determines if replay pipeline should be deployed or not | `bool` | `false` | no | -| [gcs_kms_key_name](#input_gcs_kms_key_name) | The `id` of a Cloud KMS key that will be used to encrypt objects inserted into temporary bucket.
User must ensure that `roles/cloudkms.cryptoKeyEncrypterDecrypter` is granted on this key to the Cloud Storage service identity. | `string` | `""` | no |
+| [gcs_kms_key_name](#input_gcs_kms_key_name) | Cloud KMS key resource ID, to be used as default encryption key for the temporary storage bucket used by the Dataflow job.<br>
If set, make sure to pre-authorize Cloud Storage service agent associated with that bucket to use that key for encrypting and decrypting. | `string` | `""` | no | | [primary_subnet_cidr](#input_primary_subnet_cidr) | The CIDR Range of the primary subnet | `string` | `"10.128.0.0/20"` | no | | [scoping_project](#input_scoping_project) | Cloud Monitoring scoping project ID to create dashboard under.
This assumes a pre-existing scoping project whose metrics scope contains the `project` where dataflow job is to be deployed.
See [Cloud Monitoring settings](https://cloud.google.com/monitoring/settings) for more details on scoping project.
If parameter is empty, scoping project defaults to value of `project` parameter above. | `string` | `""` | no | | [splunk_hec_token](#input_splunk_hec_token) | Splunk HEC token. Must be defined if `splunk_hec_token_source` if type of `PLAINTEXT` or `KMS`. | `string` | `""` | no | @@ -164,19 +164,10 @@ To delete resources created by Terraform, run the following then confirm: $ terraform destroy ``` -### CMEK -Project support providing of Customer managed encription keys for different services. -All CMEK configurations are not compartible with -1. GCS User is responsible for granting of `roles/cloudkms.cryptoKeyEncrypterDecrypter` for Cloud Storage service identity. - -### TODOs - -* Expose logging level knob -* ~~Support KMS-encrypted HEC token~~ -* ~~Create replay pipeline~~ -* ~~Create secure network for self-contained setup if existing network is not provided~~ -* ~~Add Cloud Monitoring dashboard~~ +### Using customer-managed encryption keys (CMEK) +For those who require CMEK, this module accepts CMEK keys for the following services: +- Cloud Storage: see `gcs_kms_key_name` input parameter. You are responsible for granting Cloud Storage service agent the role Cloud KMS CryptoKey Encrypter/Decrypter (`roles/cloudkms.cryptoKeyEncrypterDecrypter`) in order to use the provided Cloud KMS key for encrypting and decrypting objects in the temporary storage bucket. The Cloud KMS key must be available in the location that the temporary bucket is created in (specified in `var.region`). For more details, see [Use customer-managed encryption keys](https://cloud.google.com/storage/docs/encryption/using-customer-managed-keys) in Cloud Storage docs. ### Authors diff --git a/variables.tf b/variables.tf index e3a034d..a2a2985 100644 --- a/variables.tf +++ b/variables.tf @@ -116,6 +116,19 @@ variable "splunk_hec_token_secret_id" { } } +variable "gcs_kms_key_name" { + type = string + description = <
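
---

The README text introduced by this patch series asks the caller to pre-authorize the Cloud Storage service agent on the CMEK key before passing it via `gcs_kms_key_name`. A minimal caller-side sketch of that flow (not part of the patch; the module source path, project ID, and key names are placeholders, while the data source and resource types are from the hashicorp/google provider):

```hcl
# Look up the Cloud Storage service agent for the project
# (a Google-managed service account, one per project).
data "google_storage_project_service_account" "gcs_sa" {
  project = "my-project" # placeholder
}

# Pre-authorize that service agent on the CMEK key, as required by the
# README note added in this patch. The key must live in the same
# location as the temporary bucket (var.region).
resource "google_kms_crypto_key_iam_member" "gcs_cmek" {
  crypto_key_id = "projects/my-project/locations/us-central1/keyRings/my-ring/cryptoKeys/my-key" # placeholder
  role          = "roles/cloudkms.cryptoKeyEncrypterDecrypter"
  member        = "serviceAccount:${data.google_storage_project_service_account.gcs_sa.email_address}"
}

module "splunk_export" {
  source           = "./terraform-splunk-log-export" # placeholder path
  project          = "my-project"
  region           = "us-central1"
  gcs_kms_key_name = google_kms_crypto_key_iam_member.gcs_cmek.crypto_key_id
  # ... remaining required inputs (dataflow_job_name, log_filter,
  # network, splunk_hec_url, etc.)
}
```

Passing `crypto_key_id` through the IAM member resource, rather than the raw key string, also gives Terraform an implicit dependency so the grant is applied before the bucket is created with that key.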