@fabriziopandini
Member

What this PR does / why we need it:
This PR adds the logic to handle in-place updates when performing machine deployment rollouts.

Please note that:

  • Nothing changes when performing a rollout if the in-place feature gate is disabled or the MachineSet cannot update in-place
  • An in-place update is always considered potentially disruptive
    • in-place updates must respect maxUnavailable
    • if maxUnavailable is zero, a new machine must be created first; as soon as there is “buffer” for an in-place update, the in-place update is performed
  • When in-place is possible, the system should try to update as many machines as possible in-place
    • maxSurge is not fully used (it is used only to scale up by one when maxUnavailable = 0)
    • if there is a scale up in the middle of a rollout, the creation of new machines must be limited, taking into account the machines that can be updated in-place
  • The implementation respects the existing set of responsibilities of each controller
    • The MD controller manages MachineSets
      • MD enforces maxUnavailable and maxSurge
      • As a consequence, it decides when to scale up newMS and when to scale down oldMS
      • When there is a decision to scale down, MD should check whether this can be done in-place instead of via delete/recreate. If in-place is possible:
        • oldMS will be informed to move machines to newMS
    • The MS controller manages a subset of Machines
      • When scaling down, if a move is required, oldMS is responsible for moving a Machine to newMS (not included in this PR)
      • newMS will take over the moved Machine and complete the upgrade workflow (not included in this PR)
  • Nothing changes when using the OnDelete rollout strategy
  • With this PR, we are now testing about 1.4k rollout scenarios!
  • The current PR always assumes that the MachineSet cannot update in-place (this will be changed by a follow-up PR)

Which issue(s) this PR fixes:
Part of #12291

/area machinedeployment

@k8s-ci-robot k8s-ci-robot added area/machinedeployment Issues or PRs related to machinedeployments cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Oct 15, 2025
@k8s-ci-robot k8s-ci-robot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Oct 15, 2025
@fabriziopandini
Member Author

/test pull-cluster-api-e2e-main

Contributor

@alexander-demicev alexander-demicev left a comment

thanks a lot for taking care of this

@fabriziopandini
Member Author

/test pull-cluster-api-e2e-main

@fabriziopandini fabriziopandini added the tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges. label Oct 20, 2025
@sbueringer
Member

Great work, thank you!

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 21, 2025
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: 23d69c986166e5b6b5874ee59dd6649ad338a0a9

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: sbueringer

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 21, 2025
@k8s-ci-robot k8s-ci-robot merged commit 8e2b1ba into kubernetes-sigs:main Oct 21, 2025
19 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v1.12 milestone Oct 21, 2025
@fabriziopandini fabriziopandini deleted the add-in-place-to-rollout-planner branch November 11, 2025 16:09