🐛[BUG]: @StaticCaptureEvaluateNoGrad decorator can cause NaN values to show up during inference #870

@jerrylin96

Description

Version

24.01

On which installation method(s) does this occur?

Source

Describe the issue

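Running inference on my Squeezeformer model through a function wrapped with the @StaticCaptureEvaluateNoGrad decorator (with use_graphs=False) can produce NaN values in the output. Running the same model on the same input with plain PyTorch (a forward pass under torch.no_grad()) produces no NaNs.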

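import torch
from modulus.utils import StaticCaptureEvaluateNoGrad

# `model` is the Squeezeformer instance, constructed beforehand (not shown)
@StaticCaptureEvaluateNoGrad(model=model, use_graphs=False)
def eval_step_forward(my_model, invar):
    return my_model(invar)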
...
In [19]: output_modulus = eval_step_forward(model, data_input)

In [20]: torch.isnan(output_modulus).any()
Out[20]: tensor(True, device='cuda:0')

In [21]: with torch.no_grad():
    ...:     output_pytorch = model(data_input)
    ...: 

In [22]: torch.isnan(output_pytorch).any()
Out[22]: tensor(False, device='cuda:0')
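To narrow down where the NaNs first appear, one option is to record which submodule first emits a non-finite tensor in each inference path. The sketch below uses only standard PyTorch forward hooks (`nn.Module.register_forward_hook`). `nonfinite_modules`, `CastToHalf`, `toy` and `run_plain` are hypothetical names for illustration, and the explicit float16 cast in the toy model only mimics a reduced-precision overflow. It is an assumed stand-in, not a confirmed cause of this bug.

```python
from typing import Callable, List

import torch
from torch import nn


def nonfinite_modules(model: nn.Module, run: Callable[[], object]) -> List[str]:
    """Call `run()` once and return the names of modules (in the order their
    forward passes finish) whose tensor output contains NaN or +/-inf."""
    bad: List[str] = []
    handles = []

    def make_hook(name: str):
        def hook(module, inputs, output):
            # Only plain tensor outputs are checked in this sketch; upcast to
            # float32 first so the check works for float16 outputs too.
            if isinstance(output, torch.Tensor) and not torch.isfinite(output.float()).all():
                bad.append(name)
        return hook

    for name, module in model.named_modules():
        handles.append(module.register_forward_hook(make_hook(name or "<root>")))
    try:
        run()
    finally:
        # Always detach the hooks so the model is left unchanged.
        for h in handles:
            h.remove()
    return bad


# Hypothetical toy stand-in: a linear layer whose float32 output is finite,
# followed by a cast to float16, which overflows values above 65504 to inf.
class CastToHalf(nn.Module):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x.to(torch.float16)


toy = nn.Sequential(nn.Linear(4, 4), CastToHalf())
with torch.no_grad():
    toy[0].weight.fill_(1e5)  # each output element becomes 4e5 for an all-ones input
    toy[0].bias.zero_()
x = torch.ones(2, 4)


def run_plain():
    with torch.no_grad():
        return toy(x)


print(nonfinite_modules(toy, run_plain))  # ['1', '<root>']
```

With the real model, the same helper could be called once as `nonfinite_modules(model, lambda: eval_step_forward(model, data_input))` and once with a plain `torch.no_grad()` forward pass, and the two lists compared. Hooks only run during eager execution, so this assumes `use_graphs=False`, as in the report; replayed CUDA graphs would bypass them.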

Minimum reproducible example

Relevant log output

Environment details

Using a container on NERSC Perlmutter:

#SBATCH --image=nvcr.io/nvidia/modulus/modulus:24.01

Metadata

Labels: ? - Needs Triage (Need team to review and classify), bug (Something isn't working)
