MGDAPI-5690 CRO Redis snapshots tagging logics - avoid replace existing identical tags #888

valerymo · 2025-07-22T12:50:25Z

Overview

Jira: https://issues.redhat.com/browse/MGDAPI-5690

We want to avoid a situation where the CRO applies the same tags to Redis snapshots during every reconciliation cycle.
Instead, we aim to apply only new tags or tags that have changed.
This PR includes changes to the Redis snapshot tagging logic to support this behavior.
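The intended behaviour can be sketched roughly as below. This is a minimal illustration, not the PR's actual code: the `Tag` struct and the `filterTags` helper are hypothetical stand-ins for the SDK tag type and whatever filtering function the change introduces.

```go
package main

import "fmt"

// Tag is a hypothetical stand-in for the AWS SDK's ElastiCache tag type.
type Tag struct {
	Key   string
	Value string
}

// filterTags returns the desired tags that are either missing from
// existing or present with a different value. Tags that only appear
// in existing are left untouched.
func filterTags(existing, desired []Tag) []Tag {
	current := make(map[string]string, len(existing))
	for _, t := range existing {
		current[t.Key] = t.Value
	}
	var changed []Tag
	for _, t := range desired {
		if v, ok := current[t.Key]; !ok || v != t.Value {
			changed = append(changed, t)
		}
	}
	return changed
}

func main() {
	existing := []Tag{{"env", "prod"}, {"owner", "team-a"}}
	desired := []Tag{{"env", "prod"}, {"owner", "team-b"}, {"product", "rhoam"}}
	// Only the changed "owner" tag and the new "product" tag are reported.
	for _, t := range filterTags(existing, desired) {
		fmt.Printf("apply %s=%s\n", t.Key, t.Value)
	}
}
```

When the result is empty, the reconcile loop can skip the tagging call entirely, which is the "no tag changes required" case in the expected logs.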

NOTES
This PR requires adding the following permission to your AWS user policy:
elasticache:ListTagsForResource

However, we are hitting an AWS limitation:
Inline policies for users, groups, or roles are limited to 2,048 characters per policy.
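A rough way to see how close a policy document is to that limit is sketched below. The helper names are hypothetical, and the sketch assumes that whitespace does not count towards the quota, so it simply counts non-whitespace characters; treat the result as an estimate rather than the exact figure AWS computes.

```go
package main

import (
	"fmt"
	"unicode"
)

// maxInlinePolicyChars is the limit quoted in the PR description.
const maxInlinePolicyChars = 2048

// policySize counts the non-whitespace characters in a policy document.
// Assumption: whitespace is not counted towards the quota.
func policySize(policyJSON string) int {
	n := 0
	for _, r := range policyJSON {
		if !unicode.IsSpace(r) {
			n++
		}
	}
	return n
}

// fitsInlinePolicy reports whether the document is within the limit.
func fitsInlinePolicy(policyJSON string) bool {
	return policySize(policyJSON) <= maxInlinePolicyChars
}

func main() {
	// Illustrative policy only; the real CRO policy lists many more actions.
	policy := `{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": ["elasticache:ListTagsForResource"],
    "Resource": "*"
  }]
}`
	fmt.Printf("size=%d fits=%t\n", policySize(policy), fitsInlinePolicy(policy))
}
```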

To address this limitation, we removed two currently unused permissions.
The following permissions can be safely removed (see comments below):

iam:CreateServiceLinkedRole

No direct usage found in any provider code
This permission might have been added for potential future functionality, but it is not currently used
It's only required for CloudWatch alarm operations. However, the RHOAM operator uses CloudWatch only for metrics collection via the GetMetricData API call

cloudwatch:ListMetrics

Although the permission is defined, the application never makes the corresponding AWS API call
The method func (r *RealCloudWatchClient) ListMetrics(...) exists only to satisfy an interface contract and is not invoked anywhere in the application

Verification

Check that your AWS user has the elasticache:ListTagsForResource permission in its permissions policies:

$ oc get credentialsrequest cloud-resources-aws-credentials -n cloud-resource-operator -o yaml | grep -E "user:|policy:"
$ aws iam get-user-policy --user-name <user-name> --policy-name <policy-name> | grep -i listtagsforresource | grep elast

#expected to see:
  "elasticache:ListTagsForResource"

Clone this branch
Run make cluster/prepare
Run make run
Check that tags are created on Redis snapshots
Ensure no redundant tag operations on Redis backups/snapshots.
Tags should be created or updated only if missing or different — e.g., on the first cycle or when tags change.
On later cycles with no config changes, no tag action should occur.
Expected logs:
In case of new or changed tag(s):

INFO[0030] creating or updating tags on elasticache nodes and snapshots
INFO[0031] Successfully applied 1 new/updated tags to cluster arn:aws:elasticache:<redis-cluster-name>

In case of no changes:

INFO[0026] creating or updating tags on elasticache nodes and snapshots 
INFO[0028] Redis cluster arn:aws:elasticache:<redis-cluster-name>: no tag changes required

openshift-ci · 2025-07-22T12:50:38Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign carlkyrillos for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

codecov · 2025-07-23T09:48:52Z

Codecov Report

❌ Patch coverage is 75.75758% with 8 lines in your changes missing coverage. Please review.
✅ Project coverage is 67.88%. Comparing base (5450c62) to head (fd000af).

Files with missing lines	Patch %	Lines
pkg/providers/aws/provider_redis.go	75.75%	7 Missing and 1 partial ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #888      +/-   ##
==========================================
+ Coverage   67.83%   67.88%   +0.05%     
==========================================
  Files          42       42              
  Lines        5326     5350      +24     
==========================================
+ Hits         3613     3632      +19     
- Misses       1350     1354       +4     
- Partials      363      364       +1

Files with missing lines	Coverage Δ
pkg/providers/aws/credentials.go	`90.19% <ø> (ø)`
pkg/providers/aws/provider_redis.go	`60.22% <75.75%> (+0.64%)`	⬆️

austincunningham · 2025-07-28T07:08:02Z

pkg/providers/aws/credentials.go

 				"elasticache:ModifyCacheSubnetGroup",
 				"elasticache:DeleteCacheSubnetGroup",
 				"elasticache:ModifyReplicationGroup",
+				"elasticache:ListTagsForResource",


In the past we reached an AWS limit on the permissions we could add to a single account. Just wondering, are you seeing this limit when you add this permission? I think it was the main reason we didn't proceed with this change in the past.

Yes, that was the reason. I added ListTagsForResource in AWS, but removed something just for testing. I did the same in CRO, and it's working for me now. Logging has also improved.
However, I'm continuing the investigation — even though the CRO logs look good, there are still "AddTag..." events showing up in CloudTrail.
Thank you!

Hey @austincunningham, Just a small fix applied — I had forgotten to add a condition for snapshots:
if len(snapshotList.Snapshots) > 0 && len(filteredTags) > 0.
No more tagging events appearing in CloudTrail.
Thank you
(Remaining: check the unit tests after the latest updates.)

Unit tests should be ok now.
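The guard described above can be illustrated with a small sketch. The `snapshotTagger` interface, the `recordingTagger` fake, and `tagSnapshots` are all hypothetical names used here to show the idea; the real code works against the ElastiCache SDK types.

```go
package main

import "fmt"

// Tag is a hypothetical stand-in for the SDK tag type.
type Tag struct{ Key, Value string }

// snapshotTagger is a hypothetical abstraction over the tagging API call.
type snapshotTagger interface {
	AddTags(snapshotName string, tags []Tag) error
}

// recordingTagger counts calls so the effect of the guard is visible.
type recordingTagger struct{ calls int }

func (r *recordingTagger) AddTags(string, []Tag) error {
	r.calls++
	return nil
}

// tagSnapshots issues tagging calls only when there are snapshots AND
// there is at least one new or changed tag to apply.
func tagSnapshots(t snapshotTagger, snapshots []string, filteredTags []Tag) error {
	if len(snapshots) == 0 || len(filteredTags) == 0 {
		return nil // nothing to do, so no API call and no CloudTrail event
	}
	for _, s := range snapshots {
		if err := t.AddTags(s, filteredTags); err != nil {
			return fmt.Errorf("failed to tag snapshot %s: %w", s, err)
		}
	}
	return nil
}

func main() {
	rec := &recordingTagger{}
	_ = tagSnapshots(rec, []string{"snap-1"}, nil)
	fmt.Println("calls with no changed tags:", rec.calls)
	_ = tagSnapshots(rec, []string{"snap-1"}, []Tag{{"env", "prod"}})
	fmt.Println("calls with one changed tag:", rec.calls)
}
```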

Permissions removed:

iam:CreateServiceLinkedRole

No direct usage found in any provider code

Only referenced in vendor documentation for CloudWatch alarms: "// If you are an IAM user, you must have iam:CreateServiceLinkedRole to create a composite alarm that has Systems Manager OpsItem actions."

This permission might have been added for future functionality but isn't currently used

Seems we can safely remove iam:CreateServiceLinkedRole because:

It's only needed for CloudWatch alarm operations

RHOAM operator only uses CloudWatch for metrics collection (GetMetricData)

cloudwatch:ListMetrics
cloudwatch:ListMetrics can be safely removed because:

The permission covers the AWS API call, but the application never makes that API call

The method func (r *RealCloudWatchClient) ListMetrics(...) exists only to satisfy the interface contract
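To make the "interface contract" point concrete, here is a simplified sketch. The `MetricsClient` interface and `realClient` type are hypothetical and much smaller than the real CloudWatch client interface; they only show why a method can be required at compile time without ever being called at run time.

```go
package main

import (
	"errors"
	"fmt"
)

// MetricsClient is a hypothetical, cut-down interface. The real
// CloudWatch interface has many more methods.
type MetricsClient interface {
	GetMetricData(query string) ([]float64, error)
	ListMetrics(namespace string) ([]string, error)
}

// realClient must implement every method of MetricsClient to compile.
type realClient struct{}

func (realClient) GetMetricData(query string) ([]float64, error) {
	return []float64{1.0}, nil // placeholder data for the sketch
}

// ListMetrics exists only so realClient satisfies MetricsClient.
func (realClient) ListMetrics(namespace string) ([]string, error) {
	return nil, errors.New("not used")
}

// Compile-time check that realClient satisfies the interface.
var _ MetricsClient = realClient{}

// collect is the only caller in this sketch and it never touches
// ListMetrics, so no ListMetrics API request would ever be issued.
func collect(c MetricsClient) (int, error) {
	data, err := c.GetMetricData("cpu")
	if err != nil {
		return 0, fmt.Errorf("failed to get metric data: %w", err)
	}
	return len(data), nil
}

func main() {
	n, err := collect(realClient{})
	fmt.Println(n, err)
}
```

Removing the method would break compilation, but an IAM permission is only exercised when the corresponding request is actually sent, which is why the two can be decided separately.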

austincunningham · 2025-09-26T09:48:15Z

pkg/providers/aws/credentials.go

-				"cloudwatch:ListMetrics",
-				"cloudwatch:GetMetricData",
+				//"iam:CreateServiceLinkedRole", // Only needed for CloudWatch alarms (not used)
+				//"cloudwatch:ListMetrics",	// Only needed for metric discovery (not used)


I would be inclined to leave ListMetrics in as it is referenced in the interface

As the changes are metrics based, think I will test this in rhoam to confirm that the alerting is working.

Something is up with metrics when this is deployed on RHOAM.

steps I took

edited the org in the makefile to my quay.io org

ran make image/push

install the rhoam addon on a ccs cluster with useclusterstorage: 'false'

patched the rhoam cr with the cluster package workaround

oc -n redhat-rhoam-operator patch rhmis.integreatly.org rhoam \
  --type=merge --subresource=status \
  -p '{"status":{ "preflightMessage":"preflight checks passed", "stage":"Preflight Checks", "preflightStatus": "successful", "stages":{} }}'

manually updated the operator images in the rhoam csv and the cro csv to point to the one I just built

port forward the prometheus service to port 9089

oc port-forward services/rhoam-prometheus 9089:9090 -n redhat-rhoam-operator-observability

checked the status targets and found that the serviceMonitor/redhat-rhoam-operator-observability/cloud-resource-operator-metrics/0 was down.

So we are not serving metrics for prometheus to consume.

I changed the image back to the normal one and metrics endpoint was exposed and working again.

Might be worth checking an image built of master in cro to see if that has the same issue.

update - done. Test - TODO. Thank you

austincunningham · 2025-09-26T09:53:47Z

pkg/providers/aws/credentials.go

-				"cloudwatch:ListMetrics",
-				"cloudwatch:GetMetricData",
+				//"iam:CreateServiceLinkedRole", // Only needed for CloudWatch alarms (not used)
+				//"cloudwatch:ListMetrics",	// Only needed for metric discovery (not used)


As the changes are metrics based, think I will test this in rhoam to confirm that the alerting is working.

austincunningham · 2025-09-26T10:24:20Z

pkg/providers/aws/provider_redis.go

 	if err != nil {
-		msg := "failed to add tags to aws elasticache :"
-		return croType.StatusMessage(msg), err
+		msg := "Failed to filter already applied tags"


Suggested change

-		msg := "Failed to filter already applied tags"
+		msg := "failed to filter already applied tags"

Small thing: by convention we always use lower case in error messages, although we don't always appear to follow it.

Done. Thank you
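One practical reason for the lower-case convention is that Go errors are usually wrapped and joined into a single line, so a capitalised message ends up mid-sentence. A small sketch, with hypothetical function names:

```go
package main

import (
	"errors"
	"fmt"
)

// errAccessDenied is a made-up underlying error for the sketch.
var errAccessDenied = errors.New("access denied")

// filterAppliedTags is a hypothetical function that fails and wraps
// the cause with a lower-case message.
func filterAppliedTags() error {
	return fmt.Errorf("failed to filter already applied tags: %w", errAccessDenied)
}

// reconcile wraps the error again, as callers further up typically do.
func reconcile() error {
	if err := filterAppliedTags(); err != nil {
		return fmt.Errorf("redis reconcile: %w", err)
	}
	return nil
}

func main() {
	// Prints: redis reconcile: failed to filter already applied tags: access denied
	fmt.Println(reconcile())
}
```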

austincunningham · 2025-09-26T10:59:03Z

pkg/providers/aws/provider_redis.go

+			msg := "failed to add tags to aws elasticache :"
+			return croType.StatusMessage(msg), err
+		}
+		logrus.Infof("Successfully applied %d new/updated tags to cluster %s", len(filteredTags), arn)


Don't put the ARN in the log messages. It is a potential security hole, as it exposes the account number and region.

maybe use CacheClusterId instead

Done. Thank you
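For illustration, an ElastiCache cluster ARN has the form `arn:aws:elasticache:<region>:<account-id>:cluster:<name>`, so logging it reveals both the region and the account number. The review suggests logging `CacheClusterId` instead; the sketch below shows a different, hypothetical fallback that keeps only the final ARN segment when just the ARN is at hand. The ARN value is made up.

```go
package main

import (
	"fmt"
	"strings"
)

// resourceName returns the last colon-separated segment of an ARN,
// dropping the region and account number. A string without colons is
// returned unchanged.
func resourceName(arn string) string {
	if i := strings.LastIndex(arn, ":"); i >= 0 {
		return arn[i+1:]
	}
	return arn
}

func main() {
	// Made-up ARN used only for the example.
	arn := "arn:aws:elasticache:eu-west-1:123456789012:cluster:example-redis"
	// Prints: Redis cluster example-redis: no tag changes required
	fmt.Printf("Redis cluster %s: no tag changes required\n", resourceName(arn))
}
```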

austincunningham · 2025-09-26T10:59:29Z

pkg/providers/aws/provider_redis.go

+		}
+		logrus.Infof("Successfully applied %d new/updated tags to cluster %s", len(filteredTags), arn)
+	} else {
+		logrus.Infof("Redis cluster %s: no tag changes required", arn)


Same here: don't put the ARN in the log message.

Done, thank you

austincunningham · 2025-09-26T11:00:02Z

pkg/providers/aws/provider_redis.go

+	})
+	if err != nil {
+		// If we can't list tags (permission issue), fall back to applying all tags
+		logrus.Warnf("Could not list existing tags for %s: %v. Will attempt to apply all tags (may result in unnecessary API calls for already-applied tags).", resourceARN, err)


Same here: don't put the ARN in the log message.

Done. Thank you
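The fallback in the hunk above (apply everything if the existing tags cannot be listed) might look roughly like this. `Tag`, `tagLister`, `failingLister`, and `tagsToApply` are hypothetical names for the sketch, and the listing call is abstracted as a plain function rather than the SDK client.

```go
package main

import (
	"errors"
	"fmt"
)

// Tag is a hypothetical stand-in for the SDK tag type.
type Tag struct{ Key, Value string }

// tagLister abstracts the call that lists a resource's existing tags.
type tagLister func() ([]Tag, error)

// failingLister simulates a missing ListTagsForResource permission.
func failingLister() ([]Tag, error) {
	return nil, errors.New("access denied")
}

// tagsToApply returns only new or changed tags when listing works,
// and falls back to all desired tags when it does not.
func tagsToApply(list tagLister, desired []Tag) []Tag {
	existing, err := list()
	if err != nil {
		// Log without any resource identifier, per the review comment.
		fmt.Printf("could not list existing tags: %v; applying all tags\n", err)
		return desired
	}
	current := make(map[string]string, len(existing))
	for _, t := range existing {
		current[t.Key] = t.Value
	}
	var out []Tag
	for _, t := range desired {
		if v, ok := current[t.Key]; !ok || v != t.Value {
			out = append(out, t)
		}
	}
	return out
}

func main() {
	desired := []Tag{{"env", "prod"}, {"product", "rhoam"}}
	fmt.Println(len(tagsToApply(failingLister, desired)))
	working := func() ([]Tag, error) { return []Tag{{"env", "prod"}}, nil }
	fmt.Println(len(tagsToApply(working, desired)))
}
```

The trade-off is that a permissions problem degrades to the old behaviour (redundant tagging calls) instead of failing the reconcile.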

valerymo force-pushed the MGDAPI-5690-1 branch from b490dc5 to 948b52a Compare July 23, 2025 07:55

valerymo force-pushed the MGDAPI-5690-1 branch 3 times, most recently from 467a683 to dc4a74e Compare July 27, 2025 15:27

austincunningham reviewed Jul 28, 2025

valerymo force-pushed the MGDAPI-5690-1 branch 2 times, most recently from e137f15 to d549f6f Compare July 28, 2025 09:04

MGDAPI-5690 CRO Redis snapshots tagging logics update

04ae6cf

valerymo force-pushed the MGDAPI-5690-1 branch from d549f6f to 04ae6cf Compare July 28, 2025 14:06

MGDAPI-5690 CRO Redis snapshots tagging - remove unused AWS permissions

668a6c7

valerymo force-pushed the MGDAPI-5690-1 branch from 67205c7 to 668a6c7 Compare July 29, 2025 07:48

austincunningham reviewed Sep 26, 2025

austincunningham requested changes Sep 26, 2025

valerymo force-pushed the MGDAPI-5690-1 branch 2 times, most recently from df38981 to 294e780 Compare September 28, 2025 09:08

MGDAPI-5690 CRO Redis snapshots tagging - review

fd000af

valerymo force-pushed the MGDAPI-5690-1 branch from 294e780 to fd000af Compare September 28, 2025 12:27
