
Conversation

@justinsb
Contributor

@justinsb justinsb commented Aug 16, 2025

  • Initial spike: GCPMachinePool

  • GCPMachinePool: generated code/manifests

This continues the work started by @BrennenMM7 in #901. I also combined in the support from cluster-api-provider-aws to see what we want to borrow from it, and will whittle away the code from cluster-api-provider-aws that we don't need.


Release note: NONE

@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress and do-not-merge/release-note-label-needed labels Aug 16, 2025
@netlify

netlify bot commented Aug 16, 2025

Deploy Preview for kubernetes-sigs-cluster-api-gcp ready!

🔨 Latest commit: 81fd414
🔍 Latest deploy log: https://app.netlify.com/projects/kubernetes-sigs-cluster-api-gcp/deploys/69204aa9dd62b90008c8d931
😎 Deploy Preview: https://deploy-preview-1506--kubernetes-sigs-cluster-api-gcp.netlify.app

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes label Aug 16, 2025
@k8s-ci-robot k8s-ci-robot requested a review from cpanato August 16, 2025 14:33
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: justinsb

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot requested a review from dims August 16, 2025 14:33
@k8s-ci-robot k8s-ci-robot added the approved and size/XXL labels Aug 16, 2025
@justinsb
Contributor Author

This PR is WIP while I whittle down the unneeded code from cluster-api-provider-aws and generally make this reviewable. But I am uploading it now, as this is a checkpoint that works (in a limited way!).

@justinsb justinsb force-pushed the machinepool branch 5 times, most recently from 428790f to 5906e99 on August 16, 2025 19:42
@justinsb justinsb changed the title from "WIP: Minimal MachinePool support" to "Minimal MachinePool support" Aug 23, 2025
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress label Aug 23, 2025
@justinsb
Contributor Author

Removing the WIP. I will still try to whittle down the code by extracting helpers etc., but it's already in the reviewable ballpark!

@justinsb
Contributor Author

So the linter is blowing up on the TODO comments. How do we want to track next steps in code? If we don't want to use // TODO because of golangci-lint, maybe we use // TASK?

@k8s-ci-robot k8s-ci-robot added the release-note-none and needs-rebase labels and removed the do-not-merge/release-note-label-needed label Aug 23, 2025
@k8s-ci-robot k8s-ci-robot removed the needs-rebase label Aug 29, 2025
@k8s-ci-robot k8s-ci-robot added the needs-rebase label Sep 6, 2025
@k8s-ci-robot
Contributor

@damdo: GitHub didn't allow me to assign the following users: barbacbd.

Note that only kubernetes-sigs members with read permissions, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time.
For more information please see the contributor guide

In response to this:

/assign @cpanato @salasberryfin @damdo @barbacbd @theobarberbany

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@justinsb
Contributor Author

Thanks to @salasberryfin for merging the other two PRs, apidiff is now completing. It's actually passing, so it is not testing our public API, but rather our internal API, which is ... not what I assumed.

But in any case, please take a look - would be great to get this in!

@damdo
Member

damdo commented Oct 24, 2025

@justinsb are you expecting this to go in first and the E2Es from @bochengchu to go in after, or something else? What's the best strategy here :) LMK

@justinsb
Contributor Author

Hi! So there are two workstreams: MachinePool and HA Internal Load Balancer.

I'm doing MachinePool and @bochengchu is doing HA Internal Load Balancer.

(Currently) MachinePool is split into two PRs: this one, and the tests in #1539.

It seemed like a good idea at the time to split out the tests to keep the code smaller, though as a reviewer I'm not sure it would actually have made my life easier, so ... sorry :-). LMK if you want me to combine them, but #1539 is passing on top of this PR, so if we can get this one approved I will rebase the tests; it would be great to get this in!

I can look now at @bochengchu's PRs. I still have approval rights in this repo as a cluster-lifecycle lead, so I can approve them if nobody objects; I was waiting for an e2e test before doing so. I think the implementation is in #1533 and the tests are in #1550. In that case I think we are right to split them: I will ok-to-test #1550 now, we expect those tests to fail until #1533 is in, and then we can ask for the tests to be rebased on top of the implementation. That gives us two test runs, one without the fix and one with, and hopefully the one with the fix passes and the other fails :-)

TLDR: For MachinePool, an lgtm/approve would be great here, I can then rebase the MachinePool tests. If you and others don't object, I can review and approve the Load Balancer fixes.

@chrischdi
Member

Hey :-) did a first round of review on the PR

One question: do we need some further validation (via webhook or something) for the spec fields? (Maybe there's prior art for GCPMachine or so?)

I ran KAL to check the API, and it found some issues you may want to fix:

../cluster-api/hack/tools/bin/golangci-lint-kube-api-linter run --new-from-rev 20859ca1 --config ../cluster-api/.golangci-kal.yml

Some of the findings are definitely there to ignore, like the nobools on .status.ready (because the contract requires it that way).

Output:

exp/api/v1beta1/gcpmachinepool_types.go:35:2: commentstart: godoc for field ProviderIDList should start with 'providerIDList ...' (kubeapilinter)
	// ProviderIDList are the identification IDs of machine instances provided by the provider.
	^
exp/api/v1beta1/gcpmachinepool_types.go:38:2: maxlength: field ProviderIDList array element must have a maximum length, add kubebuilder:validation:items:MaxLength marker (kubeapilinter)
	ProviderIDList []string `json:"providerIDList,omitempty"`
	^
exp/api/v1beta1/gcpmachinepool_types.go:40:2: commentstart: godoc for field InstanceType should start with 'instanceType ...' (kubeapilinter)
	// InstanceType is the type of instance to create. Example: n1.standard-2
	^
exp/api/v1beta1/gcpmachinepool_types.go:41:2: maxlength: field InstanceType must have a maximum length, add kubebuilder:validation:MaxLength marker (kubeapilinter)
	InstanceType string `json:"instanceType"`
	^
exp/api/v1beta1/gcpmachinepool_types.go:43:2: commentstart: godoc for field Subnet should start with 'subnet ...' (kubeapilinter)
	// Subnet is a reference to the subnetwork to use for this instance. If not specified,
	^
exp/api/v1beta1/gcpmachinepool_types.go:46:2: maxlength: field Subnet must have a maximum length, add kubebuilder:validation:MaxLength marker (kubeapilinter)
	Subnet *string `json:"subnet,omitempty"`
	^
exp/api/v1beta1/gcpmachinepool_types.go:58:2: commentstart: godoc for field ImageFamily should start with 'imageFamily ...' (kubeapilinter)
	// ImageFamily is the full reference to a valid image family to be used for this machine.
	^
exp/api/v1beta1/gcpmachinepool_types.go:60:2: maxlength: field ImageFamily must have a maximum length, add kubebuilder:validation:MaxLength marker (kubeapilinter)
	ImageFamily *string `json:"imageFamily,omitempty"`
	^
exp/api/v1beta1/gcpmachinepool_types.go:62:2: commentstart: godoc for field Image should start with 'image ...' (kubeapilinter)
	// Image is the full reference to a valid image to be used for this machine.
	^
exp/api/v1beta1/gcpmachinepool_types.go:65:2: maxlength: field Image must have a maximum length, add kubebuilder:validation:MaxLength marker (kubeapilinter)
	Image *string `json:"image,omitempty"`
	^
exp/api/v1beta1/gcpmachinepool_types.go:67:2: commentstart: godoc for field AdditionalLabels should start with 'additionalLabels ...' (kubeapilinter)
	// AdditionalLabels is an optional set of tags to add to an instance, in addition to the ones added by default by the
	^
exp/api/v1beta1/gcpmachinepool_types.go:73:2: commentstart: godoc for field AdditionalMetadata should start with 'additionalMetadata ...' (kubeapilinter)
	// AdditionalMetadata is an optional set of metadata to add to an instance, in addition to the ones added by default by the
	^
exp/api/v1beta1/gcpmachinepool_types.go:78:2: maxlength: field AdditionalMetadata must have a maximum items, add kubebuilder:validation:MaxItems marker (kubeapilinter)
	AdditionalMetadata []capg.MetadataItem `json:"additionalMetadata,omitempty"`
	^
exp/api/v1beta1/gcpmachinepool_types.go:84:2: commentstart: godoc for field PublicIP should start with 'publicIP ...' (kubeapilinter)
	// PublicIP specifies whether the instance should get a public IP.
	^
exp/api/v1beta1/gcpmachinepool_types.go:87:2: nobools: field PublicIP pointer should not use a bool. Use a string type with meaningful constant values as an enum. (kubeapilinter)
	PublicIP *bool `json:"publicIP,omitempty"`
	^
exp/api/v1beta1/gcpmachinepool_types.go:89:2: commentstart: godoc for field AdditionalNetworkTags should start with 'additionalNetworkTags ...' (kubeapilinter)
	// AdditionalNetworkTags is a list of network tags that should be applied to the
	^
exp/api/v1beta1/gcpmachinepool_types.go:93:2: maxlength: field AdditionalNetworkTags array element must have a maximum length, add kubebuilder:validation:items:MaxLength marker (kubeapilinter)
	AdditionalNetworkTags []string `json:"additionalNetworkTags,omitempty"`
	^
exp/api/v1beta1/gcpmachinepool_types.go:95:2: commentstart: godoc for field ResourceManagerTags should start with 'resourceManagerTags ...' (kubeapilinter)
	// ResourceManagerTags is an optional set of tags to apply to GCP resources managed
	^
exp/api/v1beta1/gcpmachinepool_types.go:101:2: commentstart: godoc for field RootDeviceSize should start with 'rootDeviceSize ...' (kubeapilinter)
	// RootDeviceSize is the size of the root volume in GB.
	^
exp/api/v1beta1/gcpmachinepool_types.go:104:2: optionalfields: field RootDeviceSize has a valid zero value (0), but the validation is not complete (e.g. minimum/maximum). The field should be a pointer to allow the zero value to be set. If the zero value is not a valid use case, complete the validation and remove the pointer. (kubeapilinter)
	RootDeviceSize int64 `json:"rootDeviceSize,omitempty"`
	^
exp/api/v1beta1/gcpmachinepool_types.go:106:2: commentstart: godoc for field RootDeviceType should start with 'rootDeviceType ...' (kubeapilinter)
	// RootDeviceType is the type of the root volume.
	^
exp/api/v1beta1/gcpmachinepool_types.go:116:2: commentstart: godoc for field AdditionalDisks should start with 'additionalDisks ...' (kubeapilinter)
	// AdditionalDisks are optional non-boot attached disks.
	^
exp/api/v1beta1/gcpmachinepool_types.go:118:2: maxlength: field AdditionalDisks must have a maximum items, add kubebuilder:validation:MaxItems marker (kubeapilinter)
	AdditionalDisks []capg.AttachedDiskSpec `json:"additionalDisks,omitempty"`
	^
exp/api/v1beta1/gcpmachinepool_types.go:120:2: commentstart: godoc for field ServiceAccount should start with 'serviceAccounts ...' (kubeapilinter)
	// ServiceAccount specifies the service account email and which scopes to assign to the machine.
	^
exp/api/v1beta1/gcpmachinepool_types.go:125:2: commentstart: godoc for field Preemptible should start with 'preemptible ...' (kubeapilinter)
	// Preemptible defines if instance is preemptible
	^
exp/api/v1beta1/gcpmachinepool_types.go:127:2: nobools: field Preemptible should not use a bool. Use a string type with meaningful constant values as an enum. (kubeapilinter)
	Preemptible bool `json:"preemptible,omitempty"`
	^
exp/api/v1beta1/gcpmachinepool_types.go:129:2: commentstart: godoc for field ProvisioningModel should start with 'provisioningModel ...' (kubeapilinter)
	// ProvisioningModel defines if instance is spot.
	^
exp/api/v1beta1/gcpmachinepool_types.go:136:2: commentstart: godoc for field IPForwarding should start with 'ipForwarding ...' (kubeapilinter)
	// IPForwarding Allows this instance to send and receive packets with non-matching destination or source IPs.
	^
exp/api/v1beta1/gcpmachinepool_types.go:141:2: forbiddenmarkers: field IPForwarding has forbidden marker "kubebuilder:default=Enabled" (kubeapilinter)
	IPForwarding *capg.IPForwarding `json:"ipForwarding,omitempty"`
	^
exp/api/v1beta1/gcpmachinepool_types.go:143:2: commentstart: godoc for field ShieldedInstanceConfig should start with 'shieldedInstanceConfig ...' (kubeapilinter)
	// ShieldedInstanceConfig is the Shielded VM configuration for this machine
	^
exp/api/v1beta1/gcpmachinepool_types.go:147:2: commentstart: godoc for field OnHostMaintenance should start with 'onHostMaintenance ...' (kubeapilinter)
	// OnHostMaintenance determines the behavior when a maintenance event occurs that might cause the instance to reboot.
	^
exp/api/v1beta1/gcpmachinepool_types.go:153:2: commentstart: godoc for field ConfidentialCompute should start with 'confidentialCompute ...' (kubeapilinter)
	// ConfidentialCompute Defines whether the instance should have confidential compute enabled or not, and the confidential computing technology of choice.
	^
exp/api/v1beta1/gcpmachinepool_types.go:165:2: commentstart: godoc for field RootDiskEncryptionKey should start with 'rootDiskEncryptionKey ...' (kubeapilinter)
	// RootDiskEncryptionKey defines the KMS key to be used to encrypt the root disk.
	^
exp/api/v1beta1/gcpmachinepool_types.go:169:2: commentstart: godoc for field GuestAccelerators should start with 'guestAccelerators ...' (kubeapilinter)
	// GuestAccelerators is a list of the type and count of accelerator cards
	^
exp/api/v1beta1/gcpmachinepool_types.go:172:2: maxlength: field GuestAccelerators must have a maximum items, add kubebuilder:validation:MaxItems marker (kubeapilinter)
	GuestAccelerators []capg.Accelerator `json:"guestAccelerators,omitempty"`
	^
exp/api/v1beta1/gcpmachinepool_types.go:177:2: commentstart: godoc for field Ready should start with 'ready ...' (kubeapilinter)
	// Ready is true when the provider resource is ready.
	^
exp/api/v1beta1/gcpmachinepool_types.go:179:2: nobools: field Ready should not use a bool. Use a string type with meaningful constant values as an enum. (kubeapilinter)
	Ready bool `json:"ready"`
	^
exp/api/v1beta1/gcpmachinepool_types.go:181:2: commentstart: godoc for field Replicas should start with 'replicas ...' (kubeapilinter)
	// Replicas is the most recently observed number of replicas
	^
exp/api/v1beta1/gcpmachinepool_types.go:183:2: optionalfields: field Replicas has a valid zero value (0), but the validation is not complete (e.g. minimum/maximum). The field should be a pointer to allow the zero value to be set. If the zero value is not a valid use case, complete the validation and remove the pointer. (kubeapilinter)
	Replicas int32 `json:"replicas"`
	^
exp/api/v1beta1/gcpmachinepool_types.go:185:2: commentstart: godoc for field Conditions should start with 'conditions ...' (kubeapilinter)
	// Conditions defines current service state of the GCPMachinePool.
	^
exp/api/v1beta1/gcpmachinepool_types.go:187:2: conditions: Conditions field in GCPMachinePoolStatus must be a slice of metav1.Condition (kubeapilinter)
	Conditions clusterv1.Conditions `json:"conditions,omitempty"`
	^
exp/api/v1beta1/gcpmachinepool_types.go:189:2: commentstart: godoc for field FailureReason should start with 'failureReason ...' (kubeapilinter)
	// FailureReason will be set in the event that there is a terminal problem
	^
exp/api/v1beta1/gcpmachinepool_types.go:206:2: maxlength: field FailureReason must have a maximum length, add kubebuilder:validation:MaxLength marker (kubeapilinter)
	FailureReason *string `json:"failureReason,omitempty"`
	^
exp/api/v1beta1/gcpmachinepool_types.go:208:2: commentstart: godoc for field FailureMessage should start with 'failureMessage ...' (kubeapilinter)
	// FailureMessage will be set in the event that there is a terminal problem
	^
exp/api/v1beta1/gcpmachinepool_types.go:225:2: maxlength: field FailureMessage must have a maximum length, add kubebuilder:validation:MaxLength marker (kubeapilinter)
	FailureMessage *string `json:"failureMessage,omitempty"`
	^
exp/api/v1beta1/gcpmachinepool_types.go:238:2: commentstart: field metav1.ObjectMeta is missing godoc comment (kubeapilinter)
	metav1.ObjectMeta `json:"metadata,omitempty"`
	^
exp/api/v1beta1/gcpmachinepool_types.go:240:2: commentstart: field Spec is missing godoc comment (kubeapilinter)
	Spec   GCPMachinePoolSpec   `json:"spec,omitempty"`
	^
exp/api/v1beta1/gcpmachinepool_types.go:241:2: commentstart: field Status is missing godoc comment (kubeapilinter)
	Status GCPMachinePoolStatus `json:"status,omitempty"`
	^
48 issues:
* kubeapilinter: 48
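
Most of these findings are mechanical to fix. As an illustration only (the length limit below is a placeholder assumption, not a value from the PR), the ProviderIDList findings could be addressed like this:

```go
type GCPMachinePoolSpec struct {
	// providerIDList are the identification IDs of machine instances provided by the provider.
	// +optional
	// +kubebuilder:validation:items:MaxLength=512
	ProviderIDList []string `json:"providerIDList,omitempty"`

	// ... remaining fields elided
}
```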

main.go Outdated

if feature.Gates.Enabled(capifeature.MachinePool) {
setupLog.Info("Enabling MachinePool reconcilers")
gcpMachinePoolConcurrency := gcpMachineConcurrency // FUTURE: Use our own flag while feature-gated?
Member

+1 to use a separate flag :-)

Contributor Author

I suggest we don't do this yet (while the controller is experimental)
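
For when the flag does get split out, a minimal sketch (the flag name, default, and function name here are assumptions, mirroring the style of the existing concurrency flags):

```go
import "github.com/spf13/pflag"

var gcpMachinePoolConcurrency int

// initMachinePoolFlags registers a dedicated concurrency flag for the
// GCPMachinePool reconciler instead of reusing gcpMachineConcurrency.
func initMachinePoolFlags(fs *pflag.FlagSet) {
	fs.IntVar(&gcpMachinePoolConcurrency, "gcpmachinepool-concurrency", 10,
		"Number of GCPMachinePools to process simultaneously")
}
```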

Comment on lines 53 to 57
// Not meaningful for MachinePool
// // ProviderID is the unique identifier as specified by the cloud provider.
// // +optional
// ProviderID *string `json:"providerID,omitempty"`

Member

ProviderIDList is the replacement here :-)

Suggested change
// Not meaningful for MachinePool
// // ProviderID is the unique identifier as specified by the cloud provider.
// // +optional
// ProviderID *string `json:"providerID,omitempty"`

Contributor Author

It's not a function of spec, so I'm not sure that providerIDList belongs in status (expensive to keep up to date, and still never up to date)

It's also interesting that we don't need it!

Contributor Author

Ah I see, we have providerIDList. I will remove providerID - good call.

I'll try to keep my feelings about providerIDList separate 😂

Comment on lines 189 to 225
// FailureReason will be set in the event that there is a terminal problem
// reconciling the MachinePool and will contain a succinct value suitable
// for machine interpretation.
//
// This field should not be set for transitive errors that a controller
// faces that are expected to be fixed automatically over
// time (like service outages), but instead indicate that something is
// fundamentally wrong with the MachinePool's spec or the configuration of
// the controller, and that manual intervention is required. Examples
// of terminal errors would be invalid combinations of settings in the
// spec, values that are unsupported by the controller, or the
// responsible controller itself being critically misconfigured.
//
// Any transient errors that occur during the reconciliation of MachinePools
// can be added as events to the MachinePool object and/or logged in the
// controller's output.
// +optional
FailureReason *string `json:"failureReason,omitempty"`

// FailureMessage will be set in the event that there is a terminal problem
// reconciling the MachinePool and will contain a more verbose string suitable
// for logging and human consumption.
//
// This field should not be set for transitive errors that a controller
// faces that are expected to be fixed automatically over
// time (like service outages), but instead indicate that something is
// fundamentally wrong with the MachinePool's spec or the configuration of
// the controller, and that manual intervention is required. Examples
// of terminal errors would be invalid combinations of settings in the
// spec, values that are unsupported by the controller, or the
// responsible controller itself being critically misconfigured.
//
// Any transient errors that occur during the reconciliation of MachinePools
// can be added as events to the MachinePool object and/or logged in the
// controller's output.
// +optional
FailureMessage *string `json:"failureMessage,omitempty"`
Member

The corresponding fields in CAPI where this bubbles up to are deprecated; should we remove them here too?

Member

Citing from the WIP contract.

The use of failureReason and failureMessage should not be used for new InfraMachinePool implementations. In other areas of the Cluster API, starting from the v1beta2 contract version, there is no more special treatment for provider’s terminal failures within Cluster API.

https://deploy-preview-12971--kubernetes-sigs-cluster-api.netlify.app/developer/providers/contracts/infra-machinepool#inframachinepool-terminal-failures

PR: kubernetes-sigs/cluster-api#12971

Contributor Author

I'll remove - thanks.

Comment on lines 185 to 187
// Conditions defines current service state of the GCPMachinePool.
// +optional
Conditions clusterv1.Conditions `json:"conditions,omitempty"`
Member

Should we think about using metav1.Conditions? (Considering clusterv1.Conditions will be removed, we could start using metav1.Conditions now and avoid having to migrate later.)

It's a question of consistency too; we might want to stick with clusterv1.Conditions for that reason, but I don't see a real reason why we can't use metav1.Conditions already.
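
For reference, a sketch of what the status field could look like with metav1.Conditions (the list markers and MaxItems cap are illustrative assumptions, not values from the PR):

```go
import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

// GCPMachinePoolStatus fragment, sketched with metav1.Condition.
type GCPMachinePoolStatus struct {
	// conditions represents the observations of the GCPMachinePool's current state.
	// +optional
	// +listType=map
	// +listMapKey=type
	// +kubebuilder:validation:MaxItems=32
	Conditions []metav1.Condition `json:"conditions,omitempty"`
}
```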

Contributor Author

I'll give it a go.

Contributor Author

I think it worked well - nice suggestion; it saves us from having to go the v1beta2.conditions route.

Comment on lines 258 to 266
// GetConditions returns the observations of the operational state of the GCPMachinePool resource.
func (r *GCPMachinePool) GetConditions() clusterv1.Conditions {
return r.Status.Conditions
}

// SetConditions sets the underlying service state of the GCPMachinePool to the predescribed clusterv1.Conditions.
func (r *GCPMachinePool) SetConditions(conditions clusterv1.Conditions) {
r.Status.Conditions = conditions
}
Member

If we do metav1, we would have to rewrite these and fulfill the new signature / names / interface
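
Roughly, the rewrite would look like this (a sketch assuming Status.Conditions becomes []metav1.Condition; the exact accessor names the contract expects are not pinned down in this thread):

```go
import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

// GetConditions returns the observations of the operational state of the GCPMachinePool resource.
func (r *GCPMachinePool) GetConditions() []metav1.Condition {
	return r.Status.Conditions
}

// SetConditions sets the conditions on the GCPMachinePool's status.
func (r *GCPMachinePool) SetConditions(conditions []metav1.Condition) {
	r.Status.Conditions = conditions
}

// Controllers would then update individual entries with SetStatusCondition
// from k8s.io/apimachinery/pkg/api/meta, e.g.:
//
//	meta.SetStatusCondition(&pool.Status.Conditions, metav1.Condition{
//		Type:   "Ready",
//		Status: metav1.ConditionTrue,
//		Reason: "InstanceGroupReady",
//	})
```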

"sigs.k8s.io/controller-runtime/pkg/log"
)

// Reconcile reconcile machine instance.
Member

also the comments here are wrong :-)

Contributor Author

Fixed - thanks!

"sigs.k8s.io/controller-runtime/pkg/log"
)

// Reconcile reconcile machine instance.
Member

Also needs proper comments

Contributor Author

Fixed - thanks


namePrefix := baseKey.Name
suffix := hashHex[:16]
name := namePrefix + suffix
Member

Theoretically, what if we hit a collision? It would use the old template, right?

Should we have the full hash on a tag or something so we could compare? Or other ways to detect and react?

Contributor Author

Added the full hash as a label. We can at least detect it. If it ever happens (it seems quite unlikely at 64 bits) we can add stronger comparison logic.
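
A sketch of the resulting scheme (the function name and label key are illustrative, not necessarily what the PR uses; Sum224 is chosen here only because its 56-character hex digest fits the 63-character Kubernetes label-value limit):

```go
import (
	"crypto/sha256"
	"encoding/hex"
)

// templateNameAndLabels derives a deterministic instance-template name from
// the base name plus a 64-bit hash suffix, and records the full digest on a
// label so a (very unlikely) suffix collision can at least be detected.
func templateNameAndLabels(baseName string, spec []byte) (string, map[string]string) {
	digest := sha256.Sum224(spec)
	hashHex := hex.EncodeToString(digest[:])
	name := baseName + hashHex[:16] // 16 hex chars = 64 bits, as discussed above
	labels := map[string]string{
		"gcp.cluster.x-k8s.io/template-hash": hashHex,
	}
	return name, labels
}
```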

@k8s-ci-robot k8s-ci-robot added the needs-rebase label Nov 12, 2025
@justinsb
Contributor Author

One question: do we need some further validation (via webhook or something) for the spec fields? (Maybe there's prior art for GCPMachine or so?)

Let's add it later if we need something. We're adding experimental support, so I think we want small PRs that merge quickly (though this work is probably several years old now)

I ran KAL to check the API, and it found some issues you may want to fix:

Can we add KAL as a test in another PR? Did you run it on our existing resources also? Because IIRC these fields are copied from GCPMachine, so I assume that most of these apply across the whole codebase.
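
If validation does get added later, the usual prior art in controller-runtime-based providers is a webhook.Validator implementation; a minimal hedged sketch (the field and minimum chosen here are purely illustrative, not rules from the PR):

```go
import (
	"k8s.io/apimachinery/pkg/util/validation/field"
	"sigs.k8s.io/controller-runtime/pkg/webhook/admission"
)

// ValidateCreate is a sketch only; real rules would mirror whatever
// GCPMachine enforces, or be expressed as CEL/kubebuilder markers instead.
func (m *GCPMachinePool) ValidateCreate() (admission.Warnings, error) {
	var allErrs field.ErrorList
	if m.Spec.RootDeviceSize != 0 && m.Spec.RootDeviceSize < 10 {
		allErrs = append(allErrs, field.Invalid(
			field.NewPath("spec", "rootDeviceSize"),
			m.Spec.RootDeviceSize, "must be at least 10 GB"))
	}
	return nil, allErrs.ToAggregate()
}
```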

@k8s-ci-robot k8s-ci-robot removed the needs-rebase label Nov 20, 2025
@justinsb justinsb force-pushed the machinepool branch 8 times, most recently from 8785c60 to ff158e4 on November 21, 2025 02:55
@justinsb
Contributor Author

Thanks for the very thorough review @chrischdi - hopefully this will be passing tests in a few more small iterations :-)

I think you've identified that CAPG is not yet where, say, CAPV is, but I don't think we should aim to fix everything all at once. I propose opening issues for the many (valid) things you are highlighting, so that we can work through them and get fixes merged in O(days). At least that's my hope; some of them look pretty fundamental, but MachinePool is experimental, so we won't be breaking anyone as we fix these things.

@justinsb justinsb force-pushed the machinepool branch 2 times, most recently from 3a0a4f2 to 5bf9ae4 on November 21, 2025 04:33
justinsb and others added 4 commits November 21, 2025 11:19
Co-authored-by: Christian Schlotter <[email protected]>
In order for nodes to be associated with the MachinePool, we need to populate the
spec.providerIDList field. This field is known to the MachinePool controller.

Co-authored-by: Christian Schlotter <[email protected]>
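
For the providerIDList commit above, the population amounts to something like this sketch (the helper name and variables are illustrative; gce://<project>/<zone>/<name> is the standard GCE provider-ID convention):

```go
import "fmt"

// providerIDList builds the spec.providerIDList entries for the instances
// backing the MachinePool, using the standard GCE provider-ID format.
func providerIDList(project, zone string, instanceNames []string) []string {
	ids := make([]string, 0, len(instanceNames))
	for _, name := range instanceNames {
		ids = append(ids, fmt.Sprintf("gce://%s/%s/%s", project, zone, name))
	}
	return ids
}
```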
@justinsb
Contributor Author

Looks like it was OOMing; made a change to pick up more memory for the prow jobs (thanks @damdo and @salasberryfin).

@justinsb
Contributor Author

/retest


Labels

approved - Indicates a PR has been approved by an approver from all required OWNERS files.
cncf-cla: yes - Indicates the PR's author has signed the CNCF CLA.
release-note-none - Denotes a PR that doesn't merit a release note.
size/XXL - Denotes a PR that changes 1000+ lines, ignoring generated files.
