-
Notifications
You must be signed in to change notification settings - Fork 259
Delete ddn-exascaler filesystem module reference blueprints #4790
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Delete ddn-exascaler filesystem module reference blueprints #4790
Conversation
Summary of ChangesHello @sudheer-quad, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request focuses on removing the deprecated DDN EXAScaler filesystem module and all its associated components from the repository. The removal encompasses the module's core Terraform files, documentation, and all related testing and deployment blueprints and configurations. This change is a direct consequence of the module's deprecation, aiming to simplify the project structure and eliminate unsupported elements, guiding users towards actively maintained solutions. Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request removes the deprecated DDN-EXAScaler filesystem module, along with its associated documentation, Terraform code, and test files. This is a good step towards cleaning up the codebase.
However, the removal appears to be incomplete. I've found several lingering references to the DDN-EXAScaler module in files that were not modified in this pull request. Leaving these references will lead to broken links and potentially failing deployments for users who try to use examples that reference the deleted module.
Please update the following files to remove all references to DDN-EXAScaler:
docs/network_storage.md: This file still contains links and compatibility information for theDDN-EXAScalermodule.examples/hpc-enterprise-slurm.yaml: This example blueprint still attempts to use thecommunity/modules/file-system/DDN-EXAScalermodule.
Addressing these dangling references is critical to ensure a clean removal of the deprecated module.
|
/gcbrun |
|
/gcbrun |
cboneti
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left several comments, but not in every test file.
Directionally, I would like us to assess when we should replace ddn exascaler for managed lustre, versus removing the example. Furthermore, the testing coverage of managed lustre must be the same or superior to the unmanaged version. Please review the tests.
tools/cloud-build/daily-tests/ansible_playbooks/test-validation/test-lustre-slurm.yml
Show resolved
Hide resolved
| @@ -1,21 +0,0 @@ | |||
| # Copyright 2023 Google LLC | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should remain since we want to keep the lustre vm example, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently, managed lustre vm is not configured with the above test validation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
here: https://github.com/GoogleCloudPlatform/cluster-toolkit/blob/main/tools/cloud-build/daily-tests/tests/pfs-managed-lustre-vm.yml#L27, we should have:
- test-validation/test-mounts.yml
- test-validation/test-lustre-vm.yml
cboneti
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left a few new comments.
| modules: | ||
| - id: network1 | ||
| source: modules/network/vpc | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same comment as exascaler-existing-vpc
0384682 to
82be1fb
Compare
|
Please feel free to reassign this PR to me when you feel it is ready to review again. |
|
/gcbrun |
cboneti
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have done another round of reviews, resolved old comments and add a few new comments, PTAL.
tools/cloud-build/daily-tests/ansible_playbooks/test-validation/test-lustre-slurm.yml
Show resolved
Hide resolved
| @@ -1,21 +0,0 @@ | |||
| # Copyright 2023 Google LLC | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
here: https://github.com/GoogleCloudPlatform/cluster-toolkit/blob/main/tools/cloud-build/daily-tests/tests/pfs-managed-lustre-vm.yml#L27, we should have:
- test-validation/test-mounts.yml
- test-validation/test-lustre-vm.yml
|
/gcbrun |
82eb6c7 to
79f2067
Compare
| remote_mount: lustrefs | ||
| size_gib: 36000 | ||
|
|
||
| - id: startup-script |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code was in the internal, test-only blueprint: tools/cloud-build/daily-tests/blueprints/lustre-vm.yaml. It is ok to have test code in the internal blueprint, but I would not leave it here, in a public blueprint. It is confusing as it serves no practical purpose for our customers.
I recommend removing this startup script from here and then change tools/cloud-build/daily-tests/ansible_playbooks/test-validation/test-managed-lustre-vm.yml to do another meaningful test. For instance, something like this (untested), it could be added to slurm (in addition to current test) and vms. This is a full playbook, you might need to use just the tasks...
- name: Lustre Integration Test
hosts: lustre_clients
vars:
mount_point: "/lustre"
test_file: "{{ mount_point }}/ansible_test_{{ 9999 | random }}.txt"
tasks:
# 1. Magic Number Check
- name: Verify Filesystem Type
ansible.builtin.command: stat -f -c %T "{{ mount_point }}"
register: fs_type
changed_when: false
failed_when: fs_type.stdout != 'lustre'
# 2. Write Test (Canary File)
- name: Write Canary File to Lustre
ansible.builtin.copy:
content: "Lustre Write Test Successful"
dest: "{{ test_file }}"
# 3. Read Test (Verify Content)
- name: Read Canary File
ansible.builtin.slurp:
src: "{{ test_file }}"
register: file_content
- name: Assert Content Matches
ansible.builtin.assert:
that:
- "file_content['content'] | b64decode == 'Lustre Write Test Successful'"
# 4. Cleanup
- name: Remove Canary File
ansible.builtin.file:
path: "{{ test_file }}"
state: absent
Submission Checklist
NOTE: Community submissions can take up to 2 weeks to be reviewed.
Please take the following actions before submitting this pull request.