Skip to content

Conversation

@whabanks
Copy link
Contributor

Summary | Résumé

This PR tweaks the sns-sms-success-rate-canadian-numbers-us-west-2-critical alarm in the same manner as the warning alarm adding a volume threshold to ensure that the alarm does not trigger due to a reduction in sending volume.

Related Issues | Cartes liées

Before merging this PR

Read code suggestions left by the
cds-ai-codereviewer bot. Address
valid suggestions and shortly write down reasons to not address others. To help
with the classification of the comments, please use these reactions on each of the
comments made by the AI review:

Classification Reaction Emoticon
Useful +1 👍
Noisy eyes 👀
Hallucination confused 😕
Wrong but teachable rocket 🚀
Wrong and incorrect -1 👎

The classifications will be extracted and summarized into an analysis of how helpful
or not the AI code review really is.

Test instructions | Instructions pour tester la modification

TODO: Fill in test instructions for the reviewer.

Release Instructions | Instructions pour le déploiement

None.

Reviewer checklist | Liste de vérification du réviseur

  • This PR does not break existing functionality.
  • This PR does not violate GCNotify's privacy policies.
  • This PR does not raise new security concerns. Refer to our GC Notify Risk Register document on our Google drive.
  • This PR does not significantly alter performance.
  • Additional required documentation resulting of these changes is covered (such as the README, setup instructions, a related ADR or the technical documentation).

⚠ If boxes cannot be checked off before merging the PR, they should be moved to the "Release Instructions" section with appropriate steps required to verify before release. For example, changes to celery code may require tests on staging to verify that performance has not been affected.

- Added the threshold and additional metrics similar to the warning version of this alarm
[review]
@whabanks whabanks requested a review from jimleroyer as a code owner April 23, 2025 16:48
@github-actions
Copy link

Updating alarms ⏰? Great! Please update the Google Sheet and add a 👍 to this message after 🙏

label = "Guarded Alarm Condition"
return_data = true
expression = "IF(successRate < 0.85 AND messagesPublished > 75, 0, 1)"
expression = "IF(successRate < 0.85 AND messagesPublished > 85, 0, 1)"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was a typo on my part

@whabanks whabanks requested a review from P0NDER0SA April 23, 2025 16:52
@whabanks whabanks requested a review from ben851 April 23, 2025 16:52
@github-actions
Copy link

Updating alarms ⏰? Great! Please update the Google Sheet and add a 👍 to this message after 🙏

1 similar comment
@github-actions
Copy link

Updating alarms ⏰? Great! Please update the Google Sheet and add a 👍 to this message after 🙏

Copy link
Contributor

@P0NDER0SA P0NDER0SA left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cool!

Copy link
Member

@andrewleith andrewleith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, although I do find it quite hard to read/validate metrics/alarms/etc. in general

}

metric_query {
id = "alarmCondition"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it not matter that there are two metric_query's with the same id?

Copy link
Contributor Author

@whabanks whabanks Apr 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the metric_query's are scoped to the metric alarms themselves and shouldn't collide. @P0NDER0SA is that correct?

@jimleroyer jimleroyer requested a review from Copilot May 13, 2025 17:40
@github-actions
Copy link

Updating alarms ⏰? Great! Please update the Google Sheet and add a 👍 to this message after 🙏

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR updates the SNS SMS success rate alarm for Canadian numbers in the US West 2 region by tweaking threshold values and switching to metric queries for improved alarm evaluation.

  • Updated the volume threshold in the alarm condition for SNS SMS success rate.
  • Refactored the critical alarm resource to use metric queries instead of direct metric configuration.
Comments suppressed due to low confidence (2)

aws/common/cloudwatch_alarms_sms.tf:143

  • Verify that the updated threshold for messagesPublished (now 85) meets the intended alerting criteria compared to the previous value.
expression  = "IF(successRate < 0.85 AND messagesPublished > 85, 0, 1)"

aws/common/cloudwatch_alarms_sms.tf:177

  • Confirm that setting the threshold as a string '1' is supported and behaves correctly in numeric comparisons within this CloudWatch alarm configuration.
threshold           = "1"

@github-actions
Copy link

github-actions bot commented Jun 2, 2025

staging: common

✅   Terraform Init: success
✅   Terraform Validate: success
✅   Terraform Format: success
✅   Terraform Plan: success
✅   Conftest: success

Plan: 0 to add, 2 to change, 0 to destroy
Show summary
CHANGE NAME
update aws_cloudwatch_metric_alarm.sns-sms-success-rate-canadian-numbers-us-west-2-critical[0]
aws_cloudwatch_metric_alarm.sns-sms-success-rate-canadian-numbers-us-west-2-warning[0]
Show plan
Resource actions are indicated with the following symbols:
  ~ update in-place

Terraform will perform the following actions:

  # aws_cloudwatch_metric_alarm.sns-sms-success-rate-canadian-numbers-us-west-2-critical[0] will be updated in-place
  ~ resource "aws_cloudwatch_metric_alarm" "sns-sms-success-rate-canadian-numbers-us-west-2-critical" {
      ~ dimensions                            = {
          - "Country" = "CA" -> null
          - "SMSType" = "Transactional" -> null
        }
        id                                    = "sns-sms-success-rate-canadian-numbers-us-west-2-critical"
      - metric_name                           = "SMSSuccessRate" -> null
      - namespace                             = "AWS/SNS" -> null
      - period                                = 43200 -> null
      - statistic                             = "Average" -> null
        tags                                  = {}
      ~ threshold                             = 0.75 -> 1
        # (16 unchanged attributes hidden)

      + metric_query {
          + id          = "messagesPublished"
          + label       = "Messages Published"
          + return_data = false
            # (2 unchanged attributes hidden)

          + metric {
              + dimensions  = {
                  + "Country" = "CA"
                  + "SMSType" = "Transactional"
                }
              + metric_name = "NumberOfMessagesPublished"
              + namespace   = "AWS/SNS"
              + period      = 43200
              + stat        = "Sum"
                # (1 unchanged attribute hidden)
            }
        }
      + metric_query {
          + id          = "successRate"
          + label       = "Success Rate"
          + return_data = false
            # (2 unchanged attributes hidden)

          + metric {
              + dimensions  = {
                  + "Country" = "CA"
                  + "SMSType" = "Transactional"
                }
              + metric_name = "SMSSuccessRate"
              + namespace   = "AWS/SNS"
              + period      = 43200
              + stat        = "Average"
                # (1 unchanged attribute hidden)
            }
        }
      + metric_query {
          + expression  = "IF(successRate < 0.75 AND messagesPublished > 75, 0, 1)"
          + id          = "alarmCondition"
          + label       = "Guarded Alarm Condition"
          + return_data = true
            # (1 unchanged attribute hidden)
        }
    }

  # aws_cloudwatch_metric_alarm.sns-sms-success-rate-canadian-numbers-us-west-2-warning[0] will be updated in-place
  ~ resource "aws_cloudwatch_metric_alarm" "sns-sms-success-rate-canadian-numbers-us-west-2-warning" {
        id                                    = "sns-sms-success-rate-canadian-numbers-us-west-2-warning"
        tags                                  = {}
        # (22 unchanged attributes hidden)

      - metric_query {
          - id          = "messagesPublished" -> null
          - label       = "Messages Published" -> null
          - period      = 0 -> null
          - return_data = false -> null
            # (2 unchanged attributes hidden)

          - metric {
              - dimensions  = {
                  - "Country" = "CA"
                  - "SMSType" = "Transactional"
                } -> null
              - metric_name = "NumberOfMessagesPublished" -> null
              - namespace   = "AWS/SNS" -> null
              - period      = 43200 -> null
              - stat        = "Sum" -> null
                # (1 unchanged attribute hidden)
            }
        }
      - metric_query {
          - id          = "successRate" -> null
          - label       = "Success Rate" -> null
          - period      = 0 -> null
          - return_data = false -> null
            # (2 unchanged attributes hidden)

          - metric {
              - dimensions  = {
                  - "Country" = "CA"
                  - "SMSType" = "Transactional"
                } -> null
              - metric_name = "SMSSuccessRate" -> null
              - namespace   = "AWS/SNS" -> null
              - period      = 43200 -> null
              - stat        = "Average" -> null
                # (1 unchanged attribute hidden)
            }
        }
      - metric_query {
          - expression  = "IF(successRate < 0.85 AND messagesPublished > 75, 0, 1)" -> null
          - id          = "alarmCondition" -> null
          - label       = "Guarded Alarm Condition" -> null
          - period      = 0 -> null
          - return_data = true -> null
            # (1 unchanged attribute hidden)
        }
      + metric_query {
          + expression  = "IF(successRate < 0.85 AND messagesPublished > 85, 0, 1)"
          + id          = "alarmCondition"
          + label       = "Guarded Alarm Condition"
          + return_data = true
            # (1 unchanged attribute hidden)
        }
      + metric_query {
          + id          = "messagesPublished"
          + label       = "Messages Published"
          + return_data = false

          + metric {
              + dimensions  = {
                  + "Country" = "CA"
                  + "SMSType" = "Transactional"
                }
              + metric_name = "NumberOfMessagesPublished"
              + namespace   = "AWS/SNS"
              + period      = 43200
              + stat        = "Sum"
            }
        }
      + metric_query {
          + id          = "successRate"
          + label       = "Success Rate"
          + return_data = false

          + metric {
              + dimensions  = {
                  + "Country" = "CA"
                  + "SMSType" = "Transactional"
                }
              + metric_name = "SMSSuccessRate"
              + namespace   = "AWS/SNS"
              + period      = 43200
              + stat        = "Average"
            }
        }
    }

Plan: 0 to add, 2 to change, 0 to destroy.

Warning: Argument is deprecated

  with aws_s3_bucket.csv_bucket,
  on s3.tf line 5, in resource "aws_s3_bucket" "csv_bucket":
   5: resource "aws_s3_bucket" "csv_bucket" {

server_side_encryption_configuration is deprecated. Use the
aws_s3_bucket_server_side_encryption_configuration resource instead.

(and 69 more similar warnings elsewhere)

─────────────────────────────────────────────────────────────────────────────

Saved the plan to: plan.tfplan

To perform exactly these actions, run the following command to apply:
    terraform apply "plan.tfplan"
Show Conftest results
WARN - plan.json - main - Missing Common Tags: ["aws_athena_workgroup.ad_hoc"]
WARN - plan.json - main - Missing Common Tags: ["aws_athena_workgroup.build_tables"]
WARN - plan.json - main - Missing Common Tags: ["aws_athena_workgroup.primary"]
WARN - plan.json - main - Missing Common Tags: ["aws_athena_workgroup.support"]
WARN - plan.json - main - Missing Common Tags: ["aws_budgets_budget.cloudwatch_data_scanned"]
WARN - plan.json - main - Missing Common Tags: ["aws_budgets_budget.notify_global"]
WARN - plan.json - main - Missing Common Tags: ["aws_cloudwatch_event_rule.aws_health[0]"]
WARN - plan.json - main - Missing Common Tags: ["aws_cloudwatch_log_group.route53_resolver_query_log[0]"]
WARN - plan.json - main - Missing Common Tags: ["aws_cloudwatch_log_group.sns_deliveries[0]"]
WARN - plan.json - main - Missing Common Tags: ["aws_cloudwatch_log_group.sns_deliveries_failures[0]"]
WARN - plan.json - main - Missing Common Tags: ["aws_cloudwatch_log_group.sns_deliveries_failures_us_west_2[0]"]
WARN - plan.json - main - Missing Common Tags: ["aws_cloudwatch_log_group.sns_deliveries_us_west_2[0]"]
WARN - plan.json - main - Missing Common Tags: ["aws_cloudwatch_metric_alarm.bulk-bulk-not-being-processed-critical[0]"]
WARN - plan.json - main - Missing Common Tags: ["aws_cloudwatch_metric_alarm.bulk-bulk-not-being-processed-warning[0]"]
WARN - plan.json - main - Missing Common Tags: ["aws_cloudwatch_metric_alarm.bulk-inflights-not-being-processed-critical[0]"]
WARN - plan.json - main - Missing Common Tags: ["aws_cloudwatch_metric_alarm.bulk-inflights-not-being-processed-warning[0]"]
WARN - plan.json - main - Missing Common Tags: ["aws_cloudwatch_metric_alarm.bulk-not-being-processed-critical[0]"]
WARN - plan.json - main - Missing Common Tags: ["aws_cloudwatch_metric_alarm.bulk-not-being-processed-warning[0]"]
WARN - plan.json - main - Missing Common Tags: ["aws_cloudwatch_metric_alarm.contact-3-500-error-15-minutes-critical[0]"]
WARN - plan.json - main - Missing Common...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants