Skip to content

Conversation

@sudheer-quad
Copy link
Contributor

Submission Checklist

NOTE: Community submissions can take up to 2 weeks to be reviewed.

Please take the following actions before submitting this pull request.

  • Fork your PR branch from the Toolkit "develop" branch (not main)
  • Test all changes with pre-commit in a local branch #
  • Confirm that "make tests" passes all tests
  • Add or modify unit tests to cover code changes
  • Ensure that unit test coverage remains above 80%
  • Update all applicable documentation
  • Follow Cluster Toolkit Contribution guidelines #

@sudheer-quad sudheer-quad requested review from a team and samskillman as code owners October 27, 2025 06:36
@gemini-code-assist
Copy link
Contributor

Summary of Changes

Hello @sudheer-quad, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request focuses on removing the deprecated DDN EXAScaler filesystem module and all its associated components from the repository. The removal encompasses the module's core Terraform files, documentation, and all related testing and deployment blueprints and configurations. This change is a direct consequence of the module's deprecation, aiming to simplify the project structure and eliminate unsupported elements, guiding users towards actively maintained solutions.

Highlights

  • DDN EXAScaler Module Removal: The entire community/modules/file-system/DDN-EXAScaler module, including its Terraform configurations, variables, outputs, and documentation, has been removed from the repository. This action aligns with the module's deprecation notice, which indicated its planned removal by July 1, 2025.
  • Associated Test Infrastructure Deleted: All related test infrastructure for the DDN EXAScaler module has been deleted. This includes Ansible playbooks for Lustre testing with Slurm and VMs, as well as Cloud Build blueprints and test configurations that referenced the deprecated module.
  • Codebase Streamlining: This cleanup streamlines the codebase by removing an unsupported and deprecated file system module, reducing maintenance overhead and focusing resources on recommended alternatives like the GCP Managed Lustre module.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request removes the deprecated DDN-EXAScaler filesystem module, along with its associated documentation, Terraform code, and test files. This is a good step towards cleaning up the codebase.

However, the removal appears to be incomplete. I've found several lingering references to the DDN-EXAScaler module in files that were not modified in this pull request. Leaving these references will lead to broken links and potentially failing deployments for users who try to use examples that reference the deleted module.

Please update the following files to remove all references to DDN-EXAScaler:

  • docs/network_storage.md: This file still contains links and compatibility information for the DDN-EXAScaler module.
  • examples/hpc-enterprise-slurm.yaml: This example blueprint still attempts to use the community/modules/file-system/DDN-EXAScaler module.

Addressing these dangling references is critical to ensure a clean removal of the deprecated module.

@aslam-quad
Copy link
Contributor

/gcbrun

@aslam-quad aslam-quad added the release-chore To not include into release notes label Oct 27, 2025
@aslam-quad
Copy link
Contributor

/gcbrun

@cboneti cboneti self-assigned this Oct 28, 2025
Copy link
Member

@cboneti cboneti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left several comments, but not in every test file.
Directionally, I would like us to assess when we should replace ddn exascaler for managed lustre, versus removing the example. Furthermore, the testing coverage of managed lustre must be the same or superior to the unmanaged version. Please review the tests.

@@ -1,21 +0,0 @@
# Copyright 2023 Google LLC
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should remain since we want to keep the lustre vm example, right?

Copy link
Contributor Author

@sudheer-quad sudheer-quad Oct 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, managed lustre vm is not configured with the above test validation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here: https://github.com/GoogleCloudPlatform/cluster-toolkit/blob/main/tools/cloud-build/daily-tests/tests/pfs-managed-lustre-vm.yml#L27, we should have:

  • test-validation/test-mounts.yml
  • test-validation/test-lustre-vm.yml

Copy link
Member

@cboneti cboneti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a few new comments.

modules:
- id: network1
source: modules/network/vpc

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment as exascaler-existing-vpc

@cboneti cboneti assigned sudheer-quad and unassigned cboneti Oct 30, 2025
@sudheer-quad sudheer-quad force-pushed the DDN-EXAScaler_deletion branch from 0384682 to 82be1fb Compare October 31, 2025 05:14
@cboneti
Copy link
Member

cboneti commented Oct 31, 2025

Please feel free to reassign this PR to me when you feel it is ready to review again.

@aslam-quad
Copy link
Contributor

/gcbrun

Copy link
Member

@cboneti cboneti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have done another round of reviews, resolved old comments and add a few new comments, PTAL.

@@ -1,21 +0,0 @@
# Copyright 2023 Google LLC
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here: https://github.com/GoogleCloudPlatform/cluster-toolkit/blob/main/tools/cloud-build/daily-tests/tests/pfs-managed-lustre-vm.yml#L27, we should have:

  • test-validation/test-mounts.yml
  • test-validation/test-lustre-vm.yml

@kadupoornima
Copy link
Contributor

/gcbrun

@sudheer-quad sudheer-quad force-pushed the DDN-EXAScaler_deletion branch from 82eb6c7 to 79f2067 Compare November 11, 2025 08:20
remote_mount: lustrefs
size_gib: 36000

- id: startup-script
Copy link
Member

@cboneti cboneti Nov 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code was in the internal, test-only blueprint: tools/cloud-build/daily-tests/blueprints/lustre-vm.yaml. It is ok to have test code in the internal blueprint, but I would not leave it here, in a public blueprint. It is confusing as it serves no practical purpose for our customers.
I recommend removing this startup script from here and then change tools/cloud-build/daily-tests/ansible_playbooks/test-validation/test-managed-lustre-vm.yml to do another meaningful test. For instance, something like this (untested), it could be added to slurm (in addition to current test) and vms. This is a full playbook, you might need to use just the tasks...

- name: Lustre Integration Test
  hosts: lustre_clients
  vars:
    mount_point: "/lustre"
    test_file: "{{ mount_point }}/ansible_test_{{ 9999 | random }}.txt"
  
  tasks:
    # 1. Magic Number Check
    - name: Verify Filesystem Type
      ansible.builtin.command: stat -f -c %T "{{ mount_point }}"
      register: fs_type
      changed_when: false
      failed_when: fs_type.stdout != 'lustre'

    # 2. Write Test (Canary File)
    - name: Write Canary File to Lustre
      ansible.builtin.copy:
        content: "Lustre Write Test Successful"
        dest: "{{ test_file }}"

    # 3. Read Test (Verify Content)
    - name: Read Canary File
      ansible.builtin.slurp:
        src: "{{ test_file }}"
      register: file_content

    - name: Assert Content Matches
      ansible.builtin.assert:
        that:
          - "file_content['content'] | b64decode == 'Lustre Write Test Successful'"

    # 4. Cleanup
    - name: Remove Canary File
      ansible.builtin.file:
        path: "{{ test_file }}"
        state: absent

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

release-chore To not include into release notes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants