MGDAPI-5690 CRO Redis snapshot tagging logic: avoid replacing existing identical tags #888
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED.
This pull-request has been approved by:
The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
Codecov Report
❌ Patch coverage is
Additional details and impacted files:
@@ Coverage Diff @@
## master #888 +/- ##
==========================================
+ Coverage 67.83% 67.88% +0.05%
==========================================
Files 42 42
Lines 5326 5350 +24
==========================================
+ Hits 3613 3632 +19
- Misses 1350 1354 +4
- Partials 363 364 +1
467a683 to dc4a74e (Compare)
| "elasticache:ModifyCacheSubnetGroup", | ||
| "elasticache:DeleteCacheSubnetGroup", | ||
| "elasticache:ModifyReplicationGroup", | ||
| "elasticache:ListTagsForResource", |
In the past we reached an AWS limit on the number of permissions we could add to a single account. Just wondering, are you seeing these limits when you add this permission? I think that was the main reason why we didn't proceed with this change in the past.
Yes, that was the reason. I added ListTagsForResource in AWS, but removed something just for testing. I did the same in CRO, and it's working for me now. Logging has also improved.
However, I'm continuing the investigation — even though the CRO logs look good, there are still "AddTag..." events showing up in CloudTrail.
Thank you!
Hey @austincunningham, just a small fix applied: I had forgotten to add a condition for snapshots:
if len(snapshotList.Snapshots) > 0 && len(filteredTags) > 0.
No more tagging events appearing in CloudTrail.
Thank you
(Remaining: check unit tests after the latest updates.)
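For context, a minimal sketch of that guard, assuming the AWS SDK for Go v1 ElastiCache client; the function name and parameters here are hypothetical and not the exact CRO code:

```go
package example

import (
	"fmt"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/service/elasticache"
	"github.com/aws/aws-sdk-go/service/elasticache/elasticacheiface"
)

// tagSnapshotsIfNeeded only calls AddTagsToResource when there are snapshots
// and there are tags left to apply after filtering, so a reconcile with
// nothing to change makes no tagging API call at all.
func tagSnapshotsIfNeeded(cacheSvc elasticacheiface.ElastiCacheAPI, snapshotList *elasticache.DescribeSnapshotsOutput, filteredTags []*elasticache.Tag, snapshotARN string) error {
	if len(snapshotList.Snapshots) > 0 && len(filteredTags) > 0 {
		if _, err := cacheSvc.AddTagsToResource(&elasticache.AddTagsToResourceInput{
			ResourceName: aws.String(snapshotARN),
			Tags:         filteredTags,
		}); err != nil {
			return fmt.Errorf("failed to add tags to aws elasticache snapshot: %w", err)
		}
	}
	return nil
}
```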
Unit tests should be ok now.
Permissions removed:
- iam:CreateServiceLinkedRole
  - No direct usage found in any provider code.
  - Only referenced in vendor documentation for CloudWatch alarms: "// If you are an IAM user, you must have iam:CreateServiceLinkedRole to create a composite alarm that has Systems Manager OpsItem actions."
  - This permission might have been added for future functionality but isn't currently used.
  It seems we can safely remove iam:CreateServiceLinkedRole because:
  - It's only needed for CloudWatch alarm operations.
  - The RHOAM operator only uses CloudWatch for metrics collection (GetMetricData).
- cloudwatch:ListMetrics
  cloudwatch:ListMetrics can be safely removed because:
  - The permission is for the AWS API call, but the application never makes that API call.
  - The method func (r *RealCloudWatchClient) ListMetrics(...) exists only to satisfy the interface contract (see the sketch below).
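To illustrate the interface-contract point, a rough sketch of such a wrapper is below; the interface name, struct fields, and method set are assumptions based on the AWS SDK for Go v1, not copied from CRO:

```go
package example

import (
	"github.com/aws/aws-sdk-go/service/cloudwatch"
)

// cloudWatchClient is a hypothetical interface standing in for the one the
// operator defines so it can mock CloudWatch in tests.
type cloudWatchClient interface {
	GetMetricData(input *cloudwatch.GetMetricDataInput) (*cloudwatch.GetMetricDataOutput, error)
	ListMetrics(input *cloudwatch.ListMetricsInput) (*cloudwatch.ListMetricsOutput, error)
}

// RealCloudWatchClient wraps the real SDK client.
type RealCloudWatchClient struct {
	svc *cloudwatch.CloudWatch
}

// GetMetricData is the call actually made during metrics collection.
func (r *RealCloudWatchClient) GetMetricData(input *cloudwatch.GetMetricDataInput) (*cloudwatch.GetMetricDataOutput, error) {
	return r.svc.GetMetricData(input)
}

// ListMetrics exists only so RealCloudWatchClient satisfies cloudWatchClient;
// nothing in the reconcile path invokes it, which is why the IAM permission
// for it is never exercised.
func (r *RealCloudWatchClient) ListMetrics(input *cloudwatch.ListMetricsInput) (*cloudwatch.ListMetricsOutput, error) {
	return r.svc.ListMetrics(input)
}
```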
e137f15 to d549f6f (Compare)
pkg/providers/aws/credentials.go (Outdated)
| "cloudwatch:ListMetrics", | ||
| "cloudwatch:GetMetricData", | ||
| //"iam:CreateServiceLinkedRole", // Only needed for CloudWatch alarms (not used) | ||
| //"cloudwatch:ListMetrics", // Only needed for metric discovery (not used) |
I would be inclined to leave ListMetrics in as it is referenced in the interface
As the changes are metrics based, I think I will test this in rhoam to confirm that the alerting is working.
Something is up with metrics when deployed on rhoam.
Steps I took:
- edited the org in the makefile to my quay.io org
- ran make image/push
- installed the rhoam addon on a ccs cluster with useclusterstorage: 'false'
- patched the rhoam cr with the cluster package workaround:
  oc -n redhat-rhoam-operator patch rhmis.integreatly.org rhoam \
    --type=merge --subresource=status \
    -p '{"status":{
      "preflightMessage":"preflight checks passed",
      "stage":"Preflight Checks",
      "preflightStatus": "successful",
      "stages":{}
    }}'
- manually updated the operator images in the rhoam csv and the cro csv to point to the one I just built
- port-forwarded the prometheus service to port 9089:
  oc port-forward services/rhoam-prometheus 9089:9090 -n redhat-rhoam-operator-observability
- checked the status targets and found that serviceMonitor/redhat-rhoam-operator-observability/cloud-resource-operator-metrics/0 was down
So we are not serving metrics for prometheus to consume.
I changed the image back to the normal one and the metrics endpoint was exposed and working again.
Might be worth checking an image built from master in CRO to see if that has the same issue.
Update: done. Test: TODO. Thank you.
pkg/providers/aws/provider_redis.go (Outdated)
if err != nil {
	msg := "failed to add tags to aws elasticache :"
	return croType.StatusMessage(msg), err
msg := "Failed to filter already applied tags"
| msg := "Failed to filter already applied tags" | |
| msg := "failed to filter already applied tags" |
Small thing: a convention is that we always use lower case in error messages, although we don't always appear to follow it.
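For reference, this matches the common Go convention that error strings start lower case and carry no trailing punctuation, since they are usually wrapped into longer messages. A tiny illustrative sketch (the variable name and wrapping message are hypothetical):

```go
package example

import (
	"errors"
	"fmt"
)

// errFilterTags follows the convention: lower case, no trailing punctuation,
// so it reads naturally when wrapped by a caller.
var errFilterTags = errors.New("failed to filter already applied tags")

func example() error {
	// Wrapping shows why capitalisation matters: the message appears mid-sentence.
	return fmt.Errorf("reconcile redis snapshot tags: %w", errFilterTags)
}
```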
Done. Thank you
pkg/providers/aws/provider_redis.go (Outdated)
| msg := "failed to add tags to aws elasticache :" | ||
| return croType.StatusMessage(msg), err | ||
| } | ||
| logrus.Infof("Successfully applied %d new/updated tags to cluster %s", len(filteredTags), arn) |
Don't put the arn in the log messages; it's a potential security hole as it exposes the account number and region.
maybe use CacheClusterId instead
Done. Thank you
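A minimal sketch of the suggested change, assuming logrus and the AWS SDK for Go v1 ElastiCache types; the helper name and variables are hypothetical:

```go
package example

import (
	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/service/elasticache"
	"github.com/sirupsen/logrus"
)

// logTagResult logs the tagging outcome using the cache cluster ID rather than
// the ARN, since the ARN embeds the AWS account number and region.
func logTagResult(cluster *elasticache.CacheCluster, appliedTags int) {
	clusterID := aws.StringValue(cluster.CacheClusterId)
	if appliedTags > 0 {
		logrus.Infof("successfully applied %d new/updated tags to cluster %s", appliedTags, clusterID)
		return
	}
	logrus.Infof("redis cluster %s: no tag changes required", clusterID)
}
```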
pkg/providers/aws/provider_redis.go (Outdated)
}
logrus.Infof("Successfully applied %d new/updated tags to cluster %s", len(filteredTags), arn)
} else {
logrus.Infof("Redis cluster %s: no tag changes required", arn)
Same here, don't put the arn in the log message.
Done, thank you
pkg/providers/aws/provider_redis.go (Outdated)
})
if err != nil {
// If we can't list tags (permission issue), fall back to applying all tags
logrus.Warnf("Could not list existing tags for %s: %v. Will attempt to apply all tags (may result in unnecessary API calls for already-applied tags).", resourceARN, err)
Same here, don't put the arn in the log message.
Done. Thank you
df38981 to 294e780 (Compare)
294e780 to fd000af (Compare)
Overview
Jira: https://issues.redhat.com/browse/MGDAPI-5690
We want to avoid a situation where the CRO applies the same tags to Redis snapshots during every reconciliation cycle.
Instead, we aim to apply only new tags or tags that have changed.
This PR includes changes to the Redis snapshot tagging logic to support this behavior.
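As an illustration of the intended flow rather than the exact code in this PR, here is a sketch using the AWS SDK for Go v1; the function name reconcileSnapshotTags and its parameters are hypothetical:

```go
package example

import (
	"fmt"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/service/elasticache"
	"github.com/aws/aws-sdk-go/service/elasticache/elasticacheiface"
	"github.com/sirupsen/logrus"
)

// reconcileSnapshotTags sketches the behaviour described above: list the tags
// already on the snapshot, keep only the desired tags that are missing or
// changed, and call AddTagsToResource only when something is left to apply.
func reconcileSnapshotTags(cacheSvc elasticacheiface.ElastiCacheAPI, snapshotARN string, desiredTags []*elasticache.Tag) error {
	current, err := cacheSvc.ListTagsForResource(&elasticache.ListTagsForResourceInput{
		ResourceName: aws.String(snapshotARN),
	})
	if err != nil {
		// Fall back to applying everything if we cannot read the existing tags
		// (for example, when elasticache:ListTagsForResource is not granted).
		logrus.Warnf("could not list existing tags: %v, will attempt to apply all tags", err)
		current = &elasticache.TagListMessage{}
	}

	existing := map[string]string{}
	for _, t := range current.TagList {
		existing[aws.StringValue(t.Key)] = aws.StringValue(t.Value)
	}

	var filteredTags []*elasticache.Tag
	for _, t := range desiredTags {
		if v, ok := existing[aws.StringValue(t.Key)]; !ok || v != aws.StringValue(t.Value) {
			filteredTags = append(filteredTags, t)
		}
	}

	// Skip the API call entirely when every tag is already present and identical,
	// so steady-state reconciles produce no AddTags events in CloudTrail.
	if len(filteredTags) == 0 {
		return nil
	}

	if _, err := cacheSvc.AddTagsToResource(&elasticache.AddTagsToResourceInput{
		ResourceName: aws.String(snapshotARN),
		Tags:         filteredTags,
	}); err != nil {
		return fmt.Errorf("failed to add tags to aws elasticache snapshot: %w", err)
	}
	return nil
}
```

With this shape, a steady-state reconcile makes at most one ListTagsForResource call and no AddTagsToResource call, which matches the CloudTrail behaviour discussed in the review comments above.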
NOTES
This PR requires adding the following permission to your AWS user policy:
elasticache:ListTagsForResource
However, we are hitting an AWS limitation:
Inline policies for users, groups, or roles are limited to 2,048 characters per policy.
To address this limitation, we removed two currently unused permissions.
The following permissions can be safely removed (see comments below):
- iam:CreateServiceLinkedRole
- cloudwatch:ListMetrics
Verification
make cluster/prepare
make run
Tags should be created or updated only if missing or different, e.g., on the first cycle or when tags change.
On later cycles with no config changes, no tag action should occur.
Expected logs: