
Conversation

@Dibyajyoti-Chakraborty
Collaborator

PhysicsNeMo Pull Request

Description

Added an option to use multiple workers per GPU in generate_data.py for diffusion_FWI. The script can now run multiple worker processes (default 8) per GPU for faster data generation.
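
For context, a minimal sketch (assumed names, not the exact diff) of how a per-GPU worker count can be wired into a multiprocessing pool; process_file, file_list, and the hard-coded output path and source frequency below are placeholders for the script's own values:

import argparse
from multiprocessing import Pool

import torch


def process_file(task: tuple[str, str, int, float]) -> None:
    filepath, output_path, gpu_id, source_frequency = task
    device = torch.device(f"cuda:{gpu_id}")
    # placeholder: load the model and data onto `device` and run generation
    ...


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--n_workers",
        type=int,
        default=8,
        help="Number of worker processes per GPU",
    )
    cli = parser.parse_args()

    num_gpus = torch.cuda.device_count()
    file_list: list[str] = []  # placeholder: input files to process

    tasks = [
        (filepath, "outputs/", i % num_gpus, 15.0)
        for i, filepath in enumerate(file_list)
    ]
    # the pool size scales with both the GPU count and the workers per GPU
    with Pool(processes=num_gpus * cli.n_workers) as pool:
        pool.map(process_file, tasks)

A hypothetical invocation would look like `python generate_data.py --n_workers 8` (any other flags the script requires are omitted here).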

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.
  • The CHANGELOG.md is up to date with these changes.
  • An issue is linked to this pull request.

Dependencies

Review Process

All PRs are reviewed by the PhysicsNeMo team before merging.

Depending on which files are changed, GitHub may automatically assign a maintainer for review.

We are also testing AI-based code review tools (e.g., Greptile), which may add automated comments with a confidence score.
This score reflects the AI's assessment of merge readiness and is not a qualitative judgment of your work, nor an indication that the PR will be accepted or rejected.

AI-generated feedback should be reviewed critically for usefulness.
You are not required to respond to every AI comment, but they are intended to help both authors and reviewers.
Please react to Greptile comments with 👍 or 👎 to provide feedback on their accuracy.

@greptile-apps
Contributor

greptile-apps bot commented Nov 13, 2025

Greptile Overview

Greptile Summary

This PR adds a --n_workers command-line argument to the diffusion_FWI data generation script, allowing multiple worker processes per GPU (default 8) for improved parallel processing throughput. The change modifies the multiprocessing pool to use num_gpus * n_workers total processes while distributing work across GPUs using modulo-based assignment. While this could improve performance by better utilizing GPU resources, there are critical resource management concerns with multiple workers potentially overwhelming individual GPUs.
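
As a toy illustration (assumed values) of the distribution described above: with 2 GPUs and the default of 8 workers per GPU, the pool holds 16 processes and consecutive files alternate between the two devices:

num_gpus, n_workers = 2, 8
file_list = [f"shot_{i}.npy" for i in range(6)]  # placeholder file names
print([(f, i % num_gpus) for i, f in enumerate(file_list)])
# [('shot_0.npy', 0), ('shot_1.npy', 1), ('shot_2.npy', 0), ...]
print(num_gpus * n_workers)  # 16 processes in the pool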

PR Description Notes:

  • The checklist indicates tests, documentation, changelog, and linked issues are not addressed
  • No new dependencies are introduced

Important Files Changed

Filename: examples/geophysics/diffusion_fwi/data/generate_data.py
Score: 2/5
Overview: Added configurable workers per GPU, but introduces potential GPU memory conflicts and has formatting issues

Critical Issues Identified:

  1. GPU Memory Management Risk: Multiple workers assigned to the same GPU through modulo operation (i % num_gpus) could cause CUDA out-of-memory errors since there's no coordination or memory limiting between workers on the same device.

  2. Missing Resource Constraints: No mechanism to prevent GPU oversubscription or manage memory allocation when multiple workers target the same GPU simultaneously.

  3. Code Quality Issues:

    • Inconsistent formatting in the logging statement with unnecessary line continuation and extra spaces
    • The argument help text has a minor grammatical issue ("Num" should be "Number")

  4. Missing Documentation: The PR checklist shows documentation and changelog updates are incomplete, which is important for user-facing CLI changes.

Recommendation: Before merging, implement proper GPU memory management (e.g., memory limits per worker, GPU locks, or torch multiprocessing) and address the formatting issues. Consider adding validation to ensure n_workers doesn't exceed reasonable limits for available GPU memory.
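
One possible shape for such a safeguard (a sketch under assumptions: the 0.9 headroom factor, the 16-worker cap, and the helper names are not part of the PR), using torch.cuda.set_per_process_memory_fraction, which each worker would call before loading the model:

import torch


def cap_worker_memory(gpu_id: int, n_workers: int, headroom: float = 0.9) -> None:
    # limit this worker's share of the caching allocator so that n_workers
    # processes on the same device cannot jointly exceed ~90% of its VRAM
    torch.cuda.set_per_process_memory_fraction(headroom / n_workers, device=gpu_id)


def validate_n_workers(n_workers: int, max_workers: int = 16) -> int:
    # guard against oversubscribing a GPU before the pool is created
    if not 1 <= n_workers <= max_workers:
        raise ValueError(f"--n_workers must be between 1 and {max_workers}")
    return n_workers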

Confidence Score: 2/5 - While the performance optimization intent is valid, the implementation has significant resource management risks that could cause runtime failures.


@greptile-apps greptile-apps bot left a comment


1 file reviewed, 2 comments


Comment on lines 417 to 419
args: list[tuple[str, str, int, int]] = [
(filepath, output_path, i % num_gpus, user_source_frequency)
for i, filepath in enumerate(file_list)
Contributor


logic: Multiple workers will be assigned to the same GPU (via modulo), but each worker loads models/data onto the same GPU device without coordination. This could cause CUDA out-of-memory errors. Have you tested this with multiple workers per GPU to ensure GPU memory usage doesn't exceed available VRAM?

Collaborator Author


It has been tested. The default n_workers value was chosen based on those tests.

String formatting in logging

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>