Skip to content

Refactor on DoMINO utils.py to improve code quality #985

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 23 commits into from
Jul 8, 2025

Conversation

peterdsharpe
Copy link
Collaborator

@peterdsharpe peterdsharpe commented Jun 20, 2025

PhysicsNeMo Pull Request

Major changes:

  • Adds complete function docstrings with argument definitions, return type definitions, notes on what exceptions can raise, type hints, and example usages.
  • Refactors to use pathlib.Path over os.path for file path management (preferred since Python 3.4) [1, 2]
  • Fixes functions that have mutable default values (e.g., defaulting an argument to a list, rather than a tuple) - this should essentially-never be used, and leads to very-difficult-to-trace bugs.
  • Removes a usage of from {package} import *, which is not necessary here and should essentially-never be used.
  • Fixes missing requirements from DoMINO requirements.txt, to facilitate reproducible training
  • Removes logic throughout that handled cases where PyVista/VTK/CuPy is not installed; this is unnecessary as the top level package dependencies already assert that these should be present (in "all"), and the pre-packaged container already contains these.
  • Updates type hinting style to use latest Black/Ruff style recommendations for modern Python (e.g., list instead of from typing import List; use | rather than Union; prefer Type | None to Optional)
  • Applies black formatting to utils.py, to bring it into compliance with repository-wide pre-commit hooks requirements. (Note: all future PRs should comply with this per CI, and we should not allow PRs to merge that bypass this check.)

Changes not yet implemented in this PR; these should be implemented in a future PR:

  • DoMINO utils.py itself has no unit test coverage (and is only indirectly tested by the DoMINO forward pass test).
  • Randomness in data processing and forward passes of model architectures should always be seeded for reproducibility; currently, it is not.
  • Many VTK utilities could be refactored onto PyVista via pv.wrap(), which would reduce verbosity and help long-term maintainability.
  • In DoMINO model.py, fixing instances that do torch.cat() on each iteration of Python-level for loops to incrementally append to tensors (we should instead build up a list[torch.Tensor] to cat all-at-once, or vectorize)

Description

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.
  • The CHANGELOG.md is up to date with these changes.
  • An issue is linked to this pull request.

Dependencies

@peterdsharpe peterdsharpe marked this pull request as ready for review June 20, 2025 12:50
@peterdsharpe peterdsharpe requested a review from ktangsali June 20, 2025 13:00
@peterdsharpe
Copy link
Collaborator Author

Note: this PR should ideally be merged after #973; however, the changes are largely orthogonal between these two, so there should be minimal merge issues.

@peterdsharpe peterdsharpe added the 3 - Ready for Review Ready for review by team label Jun 20, 2025
…arameter for adjustable sampling bias; updates docstring with detailed explanation and examples.
…mples in calculate_center_of_mass, standardize, nd_interpolator, pad, and pad_inp functions; adjusts k-nearest neighbors parameter in nd_interpolator for flexibility; corrects boolean checks in pad and pad_inp examples.
@peterdsharpe
Copy link
Collaborator Author

/blossom-ci

Copy link
Collaborator

@coreyjadams coreyjadams left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for cleaning this up so extensively, @peterdsharpe . I have a few nits that will make our lives easier long term, but overall this looks great.

  1. You have introduced a hard dependency on CUPY, which may be fine or maybe we want to avoid. If we do it, let's put it in requirements explicitly. (Its already there in the cuml dep)
  2. Cupy has a built in array selection function - let's use that if we're requiring cupy and not have a wrapper.
  3. You have made unit tests in the docstrings. That's 98% of the way to unit test in the test suite too, can we do that? They will both actually get tested, but it will help raise the test coverage bar a little.
  4. For PyVista and VTK, what about factoring this utils.py file into two? One for pure-array and pure-python functions, and one which has an additional dependency on PV and VTK? This will allow more usage without those deps, which I think aren't needed once you have a preprocessed training set. As it stands, I think if someone handed me a curator-processed data set and I wanted to train DoMINO, I'd need VTK just to import some of these functions.

peterdsharpe and others added 3 commits July 7, 2025 15:55
This commit introduces a new test file `test_domino_utils.py` that includes comprehensive unit tests for various functions in the domino utils module. Each test verifies the functionality of the corresponding utility function using examples from the documentation, ensuring correctness and reliability.
@peterdsharpe
Copy link
Collaborator Author

Tests added: daac91a

…imize area_weighted_shuffle_array for consistent array handling. Remove redundant test for array_type.
@peterdsharpe
Copy link
Collaborator Author

CuML hard-dependency removed: d2b11c7

@peterdsharpe
Copy link
Collaborator Author

/blossom-ci

@peterdsharpe peterdsharpe merged commit 21b93af into NVIDIA:main Jul 8, 2025
1 check passed
@peterdsharpe peterdsharpe deleted the psharpe/code-quality branch July 8, 2025 18:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3 - Ready for Review Ready for review by team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants