Run scheduler predicates in parallel #8729

x13n · 2025-10-31T19:23:38Z

Setting the default to 4 goroutines for now. Raising parallelism above 4 does not seem to help at this point (8 and 16 are marginally slower than 4):

$ go test -bench=BenchmarkRunFiltersUntilPassingNode ./simulator/clustersnapshot/predicate/
(...)
BenchmarkRunFiltersUntilPassingNode/parallelism-1-16                 141           8206978 ns/op
BenchmarkRunFiltersUntilPassingNode/parallelism-2-16                 153           7123724 ns/op
BenchmarkRunFiltersUntilPassingNode/parallelism-4-16                 183           6997209 ns/op
BenchmarkRunFiltersUntilPassingNode/parallelism-8-16                 174           7161056 ns/op
BenchmarkRunFiltersUntilPassingNode/parallelism-16-16                178           7068643 ns/op

This is because the function's run time is currently dominated by ListNodeInfos, which allocates memory frequently in WrapSchedulerNodeInfo calls. Since NodeInfo is now an interface, we should be able to avoid the costly object wrapping when listing, at which point it may make sense to raise this parallelism further.
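A minimal sketch of the general pattern, for illustration only: a bounded number of worker goroutines pull node indices from a shared counter and stop as soon as one of them finds a node that passes the filter. The helper findPassingNode, the string node list, and the filter callback are hypothetical stand-ins, not the cluster-autoscaler API, and this is not necessarily how the PR implements it.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// findPassingNode (hypothetical) runs filter against nodes using up to
// parallelism worker goroutines and returns the index of a node that passes,
// or -1 if none does. Workers stop picking up new nodes once any worker has
// found a passing node. If several nodes pass, which index is returned
// depends on goroutine scheduling.
func findPassingNode(nodes []string, parallelism int, filter func(node string) bool) int {
	if parallelism < 1 {
		parallelism = 1
	}
	var next atomic.Int64  // index of the next node to check
	var found atomic.Int64 // index of a passing node, -1 while none found
	found.Store(-1)

	var wg sync.WaitGroup
	for w := 0; w < parallelism; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for found.Load() < 0 {
				i := int(next.Add(1) - 1)
				if i >= len(nodes) {
					return // no nodes left to check
				}
				if filter(nodes[i]) {
					// Only the first worker to find a passing node records it.
					found.CompareAndSwap(-1, int64(i))
					return
				}
			}
		}()
	}
	wg.Wait()
	return int(found.Load())
}

func main() {
	nodes := []string{"node-a", "node-b", "node-c", "node-d", "node-e"}
	idx := findPassingNode(nodes, 4, func(n string) bool { return n == "node-d" })
	fmt.Println(idx) // prints 3
}
```

The filter is called concurrently from several goroutines, so it must be safe for concurrent use.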

What type of PR is this?

What this PR does / why we need it:

Performance improvement. Until now, all scheduler predicates were executed in a single thread.

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?

New --predicate-parallelism flag allowing CA to use more threads to run scheduler predicates.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

jackfrancis · 2025-10-31T19:28:26Z

cluster-autoscaler/config/flags/flags.go

 	forceDeleteFailedNodes                       = flag.Bool("force-delete-failed-nodes", false, "Whether to enable force deletion of failed nodes, regardless of the min size of the node group the belong to.")
 	enableDynamicResourceAllocation              = flag.Bool("enable-dynamic-resource-allocation", false, "Whether logic for handling DRA (Dynamic Resource Allocation) objects is enabled.")
 	clusterSnapshotParallelism                   = flag.Int("cluster-snapshot-parallelism", 16, "Maximum parallelism of cluster snapshot creation.")
+	predicateParallelism                         = flag.Int("predicate-parallelism", 4, "Maximum parallelism of scheduler predicate checking.")


Should we fail if someone passes in 0? (And/or should we enforce an upper bound?)

x13n · 2025-11-03

Good idea, I added validation for the lower bound. For the upper bound, I don't see the need for an artificial limit, so I'm not checking it.
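A minimal sketch of a lower-bound-only check, assuming validation happens right after flag parsing. validatePredicateParallelism and its error message are hypothetical; the actual cluster-autoscaler validation may live elsewhere and read differently. The flag name, default, and help text match the flags.go diff in this thread.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// validatePredicateParallelism (hypothetical) rejects values below 1.
// No upper bound is enforced, matching the decision in the review.
func validatePredicateParallelism(p int) error {
	if p < 1 {
		return fmt.Errorf("--predicate-parallelism must be at least 1, got %d", p)
	}
	return nil
}

func main() {
	predicateParallelism := flag.Int("predicate-parallelism", 4, "Maximum parallelism of scheduler predicate checking.")
	flag.Parse()
	if err := validatePredicateParallelism(*predicateParallelism); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("predicate parallelism:", *predicateParallelism)
}
```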


jackfrancis · 2025-11-03T21:15:04Z

cluster-autoscaler/simulator/clustersnapshot/predicate/plugin_runner_test.go

+	return NewSchedulerPluginRunner(fwHandle, snapshot, 1), snapshot, nil
+}
+
+func BenchmarkRunFiltersUntilPassingNode(b *testing.B) {


Do we want to add something like go test -run=^$ -bench=. ./... to a make target so we can start getting visibility into benchmark tests?

x13n · 2025-11-04

make benchmark? Yeah, I think that makes sense. I'll follow up with a PR.

jackfrancis

/lgtm
/approve

Ran benchmarks manually

k8s-ci-robot · 2025-11-03T21:15:32Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jackfrancis, x13n

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~cluster-autoscaler/OWNERS~~ [jackfrancis,x13n]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

k8s-ci-robot requested review from feiskyer and vadasambar October 31, 2025 19:23

k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed do-not-merge/needs-area labels Oct 31, 2025

jackfrancis reviewed Oct 31, 2025


x13n force-pushed the master branch from 101479c to 81af5c4 November 3, 2025 10:08

jackfrancis reviewed Nov 3, 2025


jackfrancis approved these changes Nov 3, 2025


k8s-ci-robot assigned jackfrancis Nov 3, 2025

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 3, 2025

k8s-ci-robot merged commit 9e22656 into kubernetes:master Nov 3, 2025
8 checks passed
