-
Notifications
You must be signed in to change notification settings - Fork 259
Add tpu_topology as an output value for workload_policy #4813
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add tpu_topology as an output value for workload_policy #4813
Conversation
Summary of ChangesHello @agrawalkhushi18, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request enhances the Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request adds the tpu_topology field to the placement_policy output in the resource-policy module. This change is necessary for the downstream gke-node-pool module to correctly identify TPU node pools. The logic is correct. I've added one suggestion to improve the code's readability and maintainability by reducing code repetition.
shubpal07
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
816a076 to
a80dc40
Compare
|
/gemini review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request correctly adds tpu_topology to the output of the resource_policy module, allowing the gke-node-pool module to properly configure TPU node pools. A key improvement is the more robust method for identifying TPU node pools by checking the machine type against an authoritative list. The changes are well-aligned with the goal of improving TPU configuration. I have one suggestion to improve code readability in the tpu-definition module.
fde910a to
c864135
Compare
|
/gemini review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces several valuable improvements for handling TPU node pools. The changes are well-thought-out and enhance the robustness and maintainability of the modules.
Key improvements include:
- The
resource-policymodule now correctly outputs thetpu_topology, which is essential for configuring TPU pod slices in GKE. - The
gke-node-poolmodule is updated to conditionally apply thetpu_topologyonly when a TPU node pool is being created, preventing potential errors with non-TPU node pools. - The logic for identifying a TPU node pool in the
tpu-definitionmodule has been significantly refactored. It now uses an authoritative map of TPU machine types, which is a much more reliable method than the previous approach of checking for a placement policy attribute. This decoupling improves robustness. - Documentation and comments have been updated to reflect these changes, which is great for future maintenance.
Overall, this is a high-quality contribution that strengthens the toolkit's support for TPUs. The code is clean, and the logic is sound. I have no further recommendations for changes.
SwarnaBharathiMantena
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
0894963
into
GoogleCloudPlatform:develop
This change updates the resource_policy module to correctly shim its output for the
gke-node-poolmodule. We are mapping theaccelerator_topologyin workload_policy to thetpu_topologyfield in the output. This ensures that the gke-node-pool module receives the topology it expects, allowing its internaltpu-definitionsmodule to successfully validate that this is a TPU node pool.Submission Checklist
NOTE: Community submissions can take up to 2 weeks to be reviewed.
Please take the following actions before submitting this pull request.