@lokielse lokielse commented Nov 12, 2025

Description

This PR tightens the security posture of the KAI Scheduler operator by restricting its RBAC permissions on webhook configurations and Custom Resource Definitions (CRDs) to named resources, using the resourceNames field.

Changes

RBAC Operator Permissions (deployments/kai-scheduler/templates/rbac/operator.yaml):

  1. Webhook Configurations - Added resourceNames restrictions for:

    • mutating-kai-admission
    • validating-kai-admission
    • kai-queue-validation-v2
    • kai-podgroup-validation-v2alpha2
  2. Custom Resource Definitions - Added resourceNames restrictions for:

    • queues.scheduling.run.ai
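
As a rough illustration of the change described above, the restricted rules might look something like the sketch below. This is not the actual content of operator.yaml: the ClusterRole name and the verb lists are assumptions made for the example, and the four webhook names are listed against both webhook resource types rather than guessing which type each one belongs to. Only the resource names themselves come from this PR.

```yaml
# Hypothetical sketch, not the real template from this PR.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kai-operator-example   # assumed name, for illustration only
rules:
  # Webhook configurations: access limited to the four named objects.
  - apiGroups: ["admissionregistration.k8s.io"]
    resources:
      - mutatingwebhookconfigurations
      - validatingwebhookconfigurations
    resourceNames:
      - mutating-kai-admission
      - validating-kai-admission
      - kai-queue-validation-v2
      - kai-podgroup-validation-v2alpha2
    # Assumed verb set. Kubernetes RBAC cannot restrict "create" by
    # resourceNames, so a create permission, if needed, would have to
    # live in a separate rule without resourceNames.
    verbs: ["get", "update", "patch", "delete"]
  # CRDs: access limited to the single named definition.
  - apiGroups: ["apiextensions.k8s.io"]
    resources: ["customresourcedefinitions"]
    resourceNames: ["queues.scheduling.run.ai"]
    verbs: ["get", "update", "patch"]   # assumed verb set
```

With rules shaped like this, a request against any webhook configuration or CRD whose name is not in the list would be denied by RBAC unless another rule or binding grants it.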

Rationale

By adding the resourceNames field to the RBAC ClusterRole, we follow the principle of least privilege by restricting the operator's permissions to only the specific resources it needs to manage, rather than allowing blanket access to all webhook configurations and CRDs in the cluster.

Special thanks to @pabbanihanthkumarpab from the CCOE team for the valuable insights during the installation of KAI Scheduler into our enterprise cluster, which highlighted the importance of this security enhancement.

Related Issues

Fixes #

Checklist

Note: Ensure your PR title follows the Conventional Commits format (e.g., feat(scheduler): add new feature)

  • Self-reviewed
  • Added/updated tests (if needed)
  • Updated CHANGELOG.md (if needed)
  • Updated documentation (if needed)

Breaking Changes

No breaking changes. This is a security enhancement that restricts permissions to be more specific without affecting functionality.

Additional Notes

Security Considerations

This change improves cluster security by:

  • Preventing the operator from modifying unrelated webhook configurations
  • Limiting CRD access to only the resources managed by KAI Scheduler
  • Reducing the attack surface in case of operator compromise

Testing Recommendations

  • Verify that the operator can still create, update, and delete the specified webhook configurations
  • Confirm CRD operations continue to work correctly
  • Validate that the operator cannot access other webhook configurations or CRDs outside the specified list

@lokielse lokielse requested a review from enoodle November 14, 2025 03:53
@lokielse lokielse marked this pull request as draft November 14, 2025 04:22
@lokielse lokielse marked this pull request as ready for review November 14, 2025 06:09
@github-actions

Merging this branch will not change overall coverage

| Impacted Packages | Coverage Δ | 🤖 |
| github.com/NVIDIA/KAI-scheduler/pkg/operator/controller | 0.00% (ø) | |

Coverage by file

Changed files (no unit tests)

| Changed File | Coverage Δ | Total | Covered | Missed | 🤖 |
| github.com/NVIDIA/KAI-scheduler/pkg/operator/controller/config_controller.go | 0.00% (ø) | 35 | 0 | 35 | |

Please note that the "Total", "Covered", and "Missed" counts above refer to code statements instead of lines of code. The value in brackets refers to the test coverage of that file in the old version of the code.

@enoodle enoodle left a comment

Looks good, thank you for all your contributions! Please update the changelog next time. I will do it in another PR soon for this change.

@enoodle enoodle enabled auto-merge (squash) November 18, 2025 09:11
@enoodle enoodle merged commit c498c71 into NVIDIA:main Nov 18, 2025
4 checks passed