fix: reconcile loop to watch migrations if earlier submission had failed #3344
base: master
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: hasethuraman. The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
Hi @hasethuraman. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test. Once the patch is verified, the new status will be reflected by the ok-to-test label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
/ok-to-test
/retest
Force-pushed from 2b8069d to b15d419
What type of PR is this?
/kind bug
/kind feature
What this PR does / why we need it:
If the label was never applied, the migration won't be monitored (this PR does not add retries for the initial start-migration failure caused by a transient Kubernetes issue).
Added label-driven recovery in a loop that re-establishes migration monitoring (emitting SKUMigrationStarted, SKUMigrationProgress, and SKUMigrationCompleted events) for Premium_LRS → PremiumV2_LRS disk migrations after a controller restart or an earlier transient failure that prevented the in-memory monitor from being created.
With this change, a background goroutine (controller instances only, i.e. NodeID == "") sleeps 30s after startup and then calls recoverMigrationMonitorsFromLabels(...) every 10 minutes.
Any PersistentVolume with the label disk.csi.azure.com/migration-in-progress=true whose CSI driver is this driver gets a new in-memory monitoring task (unless one is already active).
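The loop roughly follows the shape sketched below. This is a minimal illustration, not the driver's actual code: the helpers hasActiveMonitor and startMigrationMonitor and the exact list/filter logic are assumptions; only recoverMigrationMonitorsFromLabels, the label key, the 30s startup delay, and the 10-minute interval come from the description above.

```go
package main

import (
	"context"
	"log"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

const migrationLabel = "disk.csi.azure.com/migration-in-progress"

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		log.Fatal(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// Controller instances only (NodeID == "") run the background recovery loop.
	go func() {
		time.Sleep(30 * time.Second) // initial delay after startup
		ticker := time.NewTicker(10 * time.Minute)
		defer ticker.Stop()
		for {
			recoverMigrationMonitorsFromLabels(context.Background(), client)
			<-ticker.C
		}
	}()

	select {} // the real driver serves CSI gRPC here; this sketch just blocks
}

// recoverMigrationMonitorsFromLabels lists PVs carrying the in-progress label
// and re-attaches an in-memory monitoring task for each one that belongs to
// this driver and is not already being watched.
func recoverMigrationMonitorsFromLabels(ctx context.Context, client kubernetes.Interface) {
	pvs, err := client.CoreV1().PersistentVolumes().List(ctx, metav1.ListOptions{
		LabelSelector: migrationLabel + "=true",
	})
	if err != nil {
		log.Printf("recovery scan failed: %v", err)
		return
	}
	for _, pv := range pvs.Items {
		if pv.Spec.CSI == nil || pv.Spec.CSI.Driver != "disk.csi.azure.com" {
			continue
		}
		if hasActiveMonitor(pv.Name) {
			continue
		}
		// Emits SKUMigrationStarted/SKUMigrationProgress/SKUMigrationCompleted events.
		startMigrationMonitor(ctx, pv.Name)
	}
}

// Placeholders for the driver's in-memory monitor bookkeeping.
func hasActiveMonitor(pvName string) bool                       { return false }
func startMigrationMonitor(ctx context.Context, pvName string) {}
```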
Manual recovery path:
If users see no events due to a transient start failure, they can add the label themselves and the next scan will attach monitoring.
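A minimal sketch of that manual step, assuming kubeconfig-based access via client-go; the PV name pvc-1234 is a placeholder. The same effect can be achieved with kubectl label pv <pv-name> disk.csi.azure.com/migration-in-progress=true, after which the next 10-minute scan attaches a monitor.

```go
package main

import (
	"context"
	"log"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		log.Fatal(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// Merge-patch the PV to set the migration-in-progress label.
	patch := []byte(`{"metadata":{"labels":{"disk.csi.azure.com/migration-in-progress":"true"}}}`)
	_, err = client.CoreV1().PersistentVolumes().Patch(
		context.Background(), "pvc-1234", types.MergePatchType, patch, metav1.PatchOptions{})
	if err != nil {
		log.Fatal(err)
	}
}
```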
Which issue(s) this PR fixes:
Fixes #

Migration monitor failed to start due to a transient issue, so events don't show up.
In the test result below, we can see that after 10 minutes the recovery loop found a volume with the label, which I had manually added to test the behaviour after a transient failure.
Requirements:
Special notes for your reviewer:
Release note: