@agrawalkhushi18
Contributor

This change updates the resource_policy module so that its output is correctly shimmed for the gke-node-pool module: it maps accelerator_topology from workload_policy to a new tpu_topology field in the placement_policy output. As a result, the gke-node-pool module receives the topology it expects, and its internal tpu-definition module can successfully validate that the node pool is a TPU node pool.
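The shim described above can be modeled in a few lines. This is a minimal Python sketch, not the module's Terraform source: the function name `placement_policy_output` and the `name` field are hypothetical, and only the mapping of `workload_policy.accelerator_topology` to `tpu_topology` comes from this PR.

```python
from typing import Any, Optional


def placement_policy_output(
    name: str, workload_policy: Optional[dict[str, Any]]
) -> dict[str, Any]:
    """Hypothetical model of the resource_policy module's placement_policy output.

    Only the accelerator_topology -> tpu_topology mapping is taken from the PR;
    the "name" field is an assumption added for illustration.
    """
    topology = None
    if workload_policy is not None:
        # Shim: re-expose accelerator_topology under the key gke-node-pool expects.
        topology = workload_policy.get("accelerator_topology")
    return {
        "name": name,  # assumed field, illustration only
        "tpu_topology": topology,
    }
```

With `placement_policy_output("my-policy", {"accelerator_topology": "2x4"})`, the `tpu_topology` entry is `"2x4"`; with no workload policy it is `None`.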

Submission Checklist

NOTE: Community submissions can take up to 2 weeks to be reviewed.

Please take the following actions before submitting this pull request.

  • Fork your PR branch from the Toolkit "develop" branch (not main)
  • Test all changes with pre-commit in a local branch
  • Confirm that "make tests" passes all tests
  • Add or modify unit tests to cover code changes
  • Ensure that unit test coverage remains above 80%
  • Update all applicable documentation
  • Follow Cluster Toolkit Contribution guidelines

@agrawalkhushi18 requested review from a team and samskillman as code owners November 3, 2025 10:01
@agrawalkhushi18 added the release-module-improvements label (added to release notes under the "Module Improvements" heading) Nov 3, 2025
@gemini-code-assist
Contributor

Summary of Changes

Hello @agrawalkhushi18, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances the resource_policy module by exposing the TPU topology as an output. This change is critical for ensuring seamless integration and correct validation of TPU-enabled node pools within the gke-node-pool module, allowing it to properly identify and configure TPU resources based on the provided workload policy.

Highlights

  • New Output Field: Introduced a new tpu_topology field within the placement_policy output of the resource_policy module.
  • GKE Node Pool Compatibility: Mapped the accelerator_topology from the workload_policy variable to the new tpu_topology output, ensuring the gke-node-pool module receives the expected topology information.
  • TPU Node Pool Validation: Enabled the gke-node-pool module's internal tpu-definitions module to successfully validate TPU node pools by providing the necessary topology data.

Contributor

@gemini-code-assist bot left a comment

Code Review

This pull request adds the tpu_topology field to the placement_policy output in the resource-policy module. This change is necessary for the downstream gke-node-pool module to correctly identify TPU node pools. The logic is correct. I've added one suggestion to improve the code's readability and maintainability by reducing code repetition.

@shubpal07 previously approved these changes Nov 3, 2025
Contributor

@shubpal07 left a comment

LGTM

@shubpal07
Contributor

/gemini review

Contributor

@gemini-code-assist bot left a comment

Code Review

This pull request correctly adds tpu_topology to the output of the resource_policy module, allowing the gke-node-pool module to properly configure TPU node pools. A key improvement is the more robust method for identifying TPU node pools by checking the machine type against an authoritative list. The changes are well-aligned with the goal of improving TPU configuration. I have one suggestion to improve code readability in the tpu-definition module.
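Checking the machine type against an authoritative list can be sketched as a plain lookup. The map below is an assumed, illustrative subset of GKE TPU machine types, not the toolkit's actual table, and `is_tpu_node_pool` is a hypothetical name.

```python
# Assumed, illustrative subset of GKE TPU machine types mapped to TPU generation.
# The toolkit keeps its own authoritative map; this one is not complete.
TPU_MACHINE_TYPES: dict[str, str] = {
    "ct4p-hightpu-4t": "v4",
    "ct5lp-hightpu-4t": "v5e",
    "ct5p-hightpu-4t": "v5p",
    "ct6e-standard-4t": "v6e",
}


def is_tpu_node_pool(machine_type: str) -> bool:
    """Return True when machine_type is a known TPU machine type.

    The decision depends only on the node pool's own machine type, not on
    any attribute of an attached placement policy.
    """
    return machine_type in TPU_MACHINE_TYPES
```

A general-purpose type such as `"n2-standard-8"` is not in the map, so it is not treated as a TPU node pool even if a placement policy is attached.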

@agrawalkhushi18 force-pushed the resource_policy_changes branch from fde910a to c864135 November 5, 2025 13:06
@shubpal07
Contributor

/gemini review

Contributor

@gemini-code-assist bot left a comment

Code Review

This pull request introduces several valuable improvements for handling TPU node pools. The changes are well-thought-out and enhance the robustness and maintainability of the modules.

Key improvements include:

  • The resource-policy module now correctly outputs the tpu_topology, which is essential for configuring TPU pod slices in GKE.
  • The gke-node-pool module is updated to conditionally apply the tpu_topology only when a TPU node pool is being created, preventing potential errors with non-TPU node pools.
  • The logic for identifying a TPU node pool in the tpu-definition module has been significantly refactored. It now uses an authoritative map of TPU machine types, which is a much more reliable method than the previous approach of checking for a placement policy attribute. This decoupling improves robustness.
  • Documentation and comments have been updated to reflect these changes, which is great for future maintenance.
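Applying tpu_topology only to TPU node pools can be sketched as follows, assuming the caller already knows whether the pool is a TPU pool (for example, from a machine-type lookup). The function `node_pool_placement` and the `policy_name` key are hypothetical; the sketch only shows that a non-TPU pool never receives the TPU-only field.

```python
from typing import Any, Optional


def node_pool_placement(
    is_tpu: bool, placement_policy: Optional[dict[str, Any]]
) -> dict[str, Any]:
    """Hypothetical model of gke-node-pool building its placement settings.

    tpu_topology is forwarded only when the pool is a TPU pool AND a topology
    was provided; otherwise the key is omitted entirely.
    """
    policy = placement_policy or {}
    settings: dict[str, Any] = {"policy_name": policy.get("name")}  # assumed key
    if is_tpu and policy.get("tpu_topology") is not None:
        settings["tpu_topology"] = policy["tpu_topology"]
    return settings
```

Gating on both conditions means that a TPU pool without a topology and a non-TPU pool that happens to carry one both fall back to the plain policy settings.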

Overall, this is a high-quality contribution that strengthens the toolkit's support for TPUs. The code is clean, and the logic is sound. I have no further recommendations for changes.

Contributor

@SwarnaBharathiMantena left a comment

LGTM

@agrawalkhushi18 merged commit 0894963 into GoogleCloudPlatform:develop Nov 6, 2025
12 of 67 checks passed