
Conversation

@ScottTodd
Member

@ScottTodd ScottTodd commented Nov 21, 2025

Warning

Because we have test failures, this PR will stop promotion from v2-staging to v2 on Windows for GPU families where we have test runners like gfx1151 and gfx110X, as is already done on Linux.

Motivation

Progress on #2258 and #1073. This changes the test_pytorch_wheels.yml workflow from only running our PyTorch smoke tests to running the full set in our run_linux_pytorch_tests.py script.

Technical Details

Due to #999, I added a force_exit_with_code() hack to run_pytorch_tests.py. Since the test process does not terminate on its own, even after all test cases complete, I kill the process with os.kill(). I tried to use nicer methods like sys.exit() and os._exit() but these were not sufficient. A consequence of this is that the exit code of the process is now always 15 (SIGTERM) on Windows, so the script now writes exit_code.txt to the current directory for the test_pytorch_wheels.yml workflow to use.
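For reference, here is a minimal sketch of how such a forced exit can look, assuming the force_exit_with_code() name and exit_code.txt file described above (an illustration, not the exact code in this PR):

```python
import os
import signal
from pathlib import Path


def force_exit_with_code(exit_code: int) -> None:
    """Force process termination to work around https://github.com/ROCm/TheRock/issues/999.

    The real pytest exit code is written to a file first, because os.kill()
    on Windows terminates the process with the signal number (15) as its exit
    code, so the workflow cannot read the result from the process itself.
    """
    Path("exit_code.txt").write_text(str(exit_code))
    # sys.exit() and os._exit() were not sufficient to stop the lingering
    # process, so terminate it outright.
    os.kill(os.getpid(), signal.SIGTERM)
```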

Two test cases caused additional issues:

  • test_cublas_config_nondeterministic_alert_cuda in test_torch.py
  • test_graph_error in test_cuda.py

These test cases should be fixed or conditionally skipped in the upstream pytorch test files. Until then, I marked them as skipped using our new test filtering under a new "platform/windows" category.
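For illustration, a hypothetical shape for such a platform-scoped skip entry (the actual skip-list format may differ; only the two test names come from this PR):

```python
# Hypothetical illustration of a platform-scoped skip list; the format is assumed.
SKIPPED_TESTS = {
    "platform/windows": [
        "test_cublas_config_nondeterministic_alert_cuda",  # from test_torch.py
        "test_graph_error",  # from test_cuda.py
    ],
}
```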

Test Plan

Test Result

The specific set of tests running and their current results on my gfx1100 system for PyTorch 2.9 are:

test file               results
test_nn.py              9 failed, 1361 passed, 879 skipped, 2 deselected, 3 xfailed in 183.09s
test_torch.py           786 passed, 187 skipped, 3 deselected in 87.59s
test_cuda.py            18 failed, 157 passed, 57 skipped, 7 deselected, 3 xfailed in 42.21s
test_unary_ufuncs.py    167 passed, 23080 skipped, 61 deselected in 80.92s
test_binary_ufuncs.py   1 failed, 12487 passed, 331 skipped, 38 xfailed in 188.67s
test_autograd.py        1 failed, 639 passed, 8 skipped, 2 deselected, 1 xfailed in 165.35s
overall*                28 failed, 15635 passed, 24540 skipped, 75 deselected, 45 xfailed in 689.72s

* (The overall numbers might not quite add up since I ran the suites at a few different pytorch commits.)

Submission Checklist

Contributor

@HereThereBeDragons HereThereBeDragons left a comment


Overall this already looks quite good to me. Here are a couple of comments for improvement:

I wonder whether we will need to extend our test skipping depending on the platform, e.g. extending it with the files below (see the sketch after this list):

skip_tests/generic.py
skip_tests/generic_linux.py
skip_tests/generic_win.py
skip_tests/pytorch_2.9.py
skip_tests/pytorch_2.9_linux.py
skip_tests/pytorch_2.9_win.py
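A hypothetical sketch of how the test runner could pick up such per-platform skip files (the directory layout follows the list above, while the function name and selection logic are assumptions, not the actual implementation):

```python
import platform
from pathlib import Path


def applicable_skip_files(skip_dir: Path, torch_version: str) -> list[Path]:
    """Return the skip-list files that apply to this platform and torch version."""
    suffix = "win" if platform.system() == "Windows" else "linux"
    candidates = [
        skip_dir / "generic.py",
        skip_dir / f"generic_{suffix}.py",
        skip_dir / f"pytorch_{torch_version}.py",
        skip_dir / f"pytorch_{torch_version}_{suffix}.py",
    ]
    # Platform- and version-specific files are optional, so keep only those
    # that actually exist.
    return [path for path in candidates if path.exists()]


# Example: applicable_skip_files(Path("skip_tests"), "2.9") on Windows would
# select generic.py, generic_win.py, pytorch_2.9.py, and pytorch_2.9_win.py
# (whichever of those exist).
```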

I also think it is worth considering adding a comment describing the format of torch_version, or maybe renaming it to torch_rocm_version, to clarify that it is the 2.9.0+rocm7.10a... string in build_..pytorch_wheels.yml, as you already added some comments about it in test_pytorch_wheels.yml.

"""Forces termination to work around https://github.com/ROCm/TheRock/issues/999."""
import signal

retcode_file = Path("exit_code.txt")
Contributor


See my comment at r497 about whether we need the printing of the exit code here in the first place.

If yes: maybe rename it to pytorch_pytest_exit_code.txt? And do we need to upload it somewhere to the artifacts?

Member Author


Renamed to run_pytorch_tests_exit_code.txt, matching the script file name. If we add other test scripts they can (ughhh) use the same pattern.

and do we need to upload it somewhere to the artifacts?

I don't think we need to upload this exit code file. We should generate test reports and upload those. The test reports will then be authoritative for result status.

Maybe if we do that, we can add continue-on-error: true on Linux too and have a common step that runs after the tests and checks the results in the reports, so we can ignore the exit code altogether.
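A rough sketch of what such a common post-test step could check, assuming the tests are run with pytest's --junitxml option so a report file exists (the report path and step wiring here are assumptions):

```python
import sys
import xml.etree.ElementTree as ET
from pathlib import Path


def report_has_failures(report_path: Path) -> bool:
    """Return True if a JUnit XML report contains any failures or errors."""
    root = ET.parse(report_path).getroot()
    # pytest writes either a <testsuites> wrapper or a single <testsuite> root;
    # iter() covers both cases.
    return any(
        int(suite.get("failures", 0)) > 0 or int(suite.get("errors", 0)) > 0
        for suite in root.iter("testsuite")
    )


if __name__ == "__main__":
    sys.exit(1 if report_has_failures(Path("pytest_report.xml")) else 0)
```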

Contributor


Well.. then we could also change the script to not return the pytest return code, and just use the printing of the return code we already have?

]

retcode = pytest.main(pytorch_args)
print(f"Pytest finished with return code: {retcode}")
Contributor


I am already printing the return code there. Maybe we do not need the extra file for Windows?

Member Author


We would need to capture stdout somehow (e.g. pipe the output to a file) to use this print() for return code handling.

I'm considering putting this code in a python script, but I really don't want this hack to live for long:

      - name: (Windows) Read and propagate exit code
        if: ${{ runner.os == 'Windows' }}
        run: |
          if [ -f run_pytorch_tests_exit_code.txt ]; then
            EXIT_CODE=$(cat run_pytorch_tests_exit_code.txt)
            echo "Exit code from file: $EXIT_CODE"
            exit $EXIT_CODE
          else
            echo "No run_pytorch_tests_exit_code.txt found"
            exit 1
          fi

Contributor


We are already capturing it in the "Run PyTorch tests" step:

= 38 failed, 15608 passed, 24557 skipped, 75 deselected, 45 xfailed in 1354.49s (0:22:34) =
Pytest finished with return code: 1        <<<<< this line here
Writing retcode 1 to 'exit_code.txt'

Member Author


GitHub Actions logs stdout, but we can't (I don't think?) access it unless we capture it ourselves somehow too?

Ah... there's an idea. We could write to GITHUB_OUTPUT instead of tracking our own custom file. I think that would be a bit too roundabout though 🤔
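For reference, a minimal sketch of what writing the exit code to GITHUB_OUTPUT from the Python script could look like (the output name exit_code is an assumption):

```python
import os


def write_github_output(name: str, value: str) -> None:
    """Append a name=value pair to the step's GITHUB_OUTPUT file, if set."""
    output_path = os.environ.get("GITHUB_OUTPUT")
    if not output_path:
        return  # not running under GitHub Actions
    with open(output_path, "a", encoding="utf-8") as f:
        f.write(f"{name}={value}\n")


# e.g. write_github_output("exit_code", str(retcode)) after pytest.main()
```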

Contributor


What do you mean? It is captured as part of the "Run PyTorch tests" step; this was an extract from the CI runner log.

E.g. https://github.com/ROCm/TheRock/actions/runs/19635001346/job/56229351773#step:11:40789

Member Author


"Captured" meaning we can do something with it (e.g. have the value in an environment variable, a file, a bash variable, etc.). We can't just parse through all of stdout from a prior job step to determine if a step should pass or fail, unless I'm missing some way that steps can read stdout from prior steps.

Member Author


[TBD] More complete release workflow runs for Windows and Linux?

@HereThereBeDragons @araravik-psd would you like me to trigger a full ROCm dev release for this PR to test across all pytorch versions and supported gfx families, or is spot checking with jobs like https://github.com/ROCm/TheRock/actions/runs/19586629648/job/56096766330 sufficient and then we'll see results from the next nightly release?

As this is now, the "release gating" will stop promoting packages from v2-staging to v2 for Windows once this PR is merged, until we get all test failures addressed (#2156). That is already the case for Linux nightly releases.

Contributor

@HereThereBeDragons HereThereBeDragons Nov 24, 2025


Up to you. I would just wait for the nightlies.

Just considering the runtime, you don't get signals before tomorrow anyway.

Member Author


Given how unstable the tests appeared when I was testing, I think I will split this into two PRs:

  1. The external-builds/pytorch/* changes allowing for running tests on Windows locally
  2. The .github/workflows/test_pytorch_wheels.yml changes that include those tests on our Windows runners

That way we can more easily revert just the workflow changes as needed while keeping support for testing locally.

Contributor

@HereThereBeDragons HereThereBeDragons left a comment


see discussion comments

Contributor

@jayhawk-commits jayhawk-commits left a comment


Looks good as a starter to get Windows results.

Comment on lines +148 to +149
# Skip tests that hang. Perhaps related to processes not terminating
# on their own: https://github.com/ROCm/TheRock/issues/999.
Member Author


Seeing more tests hang on 'nightly' than just on 'release/2.9':

https://github.com/ROCm/TheRock/actions/runs/19651083796/job/56278015582

Mon, 24 Nov 2025 22:27:09 GMT external-builds\pytorch\pytorch\test\test_cuda.py::TestCuda::test_nvtx PASSED [0.0007s] [  8%]
Mon, 24 Nov 2025 22:27:09 GMT external-builds\pytorch\pytorch\test\test_cuda.py::TestCuda::test_out_of_memory PASSED [0.0014s] [  8%]
Mon, 24 Nov 2025 22:27:09 GMT external-builds\pytorch\pytorch\test\test_cuda.py::TestCuda::test_out_of_memory_retry FAILED [0.7945s] [  8%]
Mon, 24 Nov 2025 22:27:09 GMT external-builds\pytorch\pytorch\test\test_cuda.py::TestCuda::test_pinned_memory_empty_cache PASSED [0.0043s] [  8%]
Mon, 24 Nov 2025 23:02:02 GMT external-builds\pytorch\pytorch\test\test_cuda.py::TestCuda::test_pinned_memory_use_background_threads [TORCH_VITAL] CUDA.used		 true
Mon, 24 Nov 2025 23:02:02 GMT [TORCH_VITAL] Dataloader.basic_unit_test		 TEST_VALUE_STRING
Mon, 24 Nov 2025 23:02:02 GMT [TORCH_VITAL] Dataloader.enabled		 True
Mon, 24 Nov 2025 23:02:03 GMT Error: The operation was canceled.

I'll pin that down through local testing and push another test skip before merging this.

Member Author


Fixed (hopefully) by skipping two more tests - one timeout and one crash. Testing again at https://github.com/ROCm/TheRock/actions/runs/19653570950/job/56285463242 before merge.

Member Author


Seeing a bunch of crashes on CI runners that I can't reproduce locally. I'll have to debug more tomorrow, can't merge this yet.

Latest is https://github.com/ROCm/TheRock/actions/runs/19654631051/job/56288648443#step:12:5886

external-builds\pytorch\pytorch\test\test_cuda.py::TestCuda::test_hip_device_count PASSED [6.0132s] [  8%]
external-builds\pytorch\pytorch\test\test_cuda.py::TestCuda::test_huge_index SKIPPED [0.0007s] [  8%]
external-builds\pytorch\pytorch\test\test_cuda.py::TestCuda::test_index_out_of_bounds_exception_cuda SKIPPED [0.0005s] [  8%]
external-builds\pytorch\pytorch\test\test_cuda.py::TestCuda::test_invalid_status_for_legacy_api FAILED [0.0005s] [  8%]
Traceback (most recent call last):
  File "<string>", line 35, in <module>
  File "<string>", line 22, in fork_and_check_is_pinned
  File "B:\runner\_work\_tool\Python\3.12.10\x64\Lib\multiprocessing\process.py", line 121, in start
    self._popen = self._Popen(self)
                  ^^^^^^^^^^^^^^^^^
  File "B:\runner\_work\_tool\Python\3.12.10\x64\Lib\multiprocessing\context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "B:\runner\_work\_tool\Python\3.12.10\x64\Lib\multiprocessing\context.py", line 337, in _Popen
    return Popen(process_obj)
           ^^^^^^^^^^^^^^^^^^
  File "B:\runner\_work\_tool\Python\3.12.10\x64\Lib\multiprocessing\popen_spawn_win32.py", line 95, in __init__
    reduction.dump(process_obj, to_child)
  File "B:\runner\_work\_tool\Python\3.12.10\x64\Lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't get local object 'fork_and_check_is_pinned.<locals>.worker'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "B:\runner\_work\_tool\Python\3.12.10\x64\Lib\multiprocessing\spawn.py", line 122, in spawn_main
    exitcode = _main(fd, parent_sentinel)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "B:\runner\_work\_tool\Python\3.12.10\x64\Lib\multiprocessing\spawn.py", line 132, in _main
    self = reduction.pickle.load(from_parent)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
EOFError: Ran out of input
external-builds\pytorch\pytorch\test\test_cuda.py::TestCuda::test_is_pinned_no_context FAILED [1.4856s] [  8%]
external-builds\pytorch\pytorch\test\test_cuda.py::TestCuda::test_lazy_init PASSED [1.4903s] [  8%]
external-builds\pytorch\pytorch\test\test_cuda.py::TestCuda::test_manual_seed PASSED [0.0038s] [  8%]
external-builds\pytorch\pytorch\test\test_cuda.py::TestCuda::test_matmul_device_mismatch PASSED [0.0012s] [  8%]
external-builds\pytorch\pytorch\test\test_cuda.py::TestCuda::test_matmul_memory_use PASSED [0.0143s] [  8%]
external-builds\pytorch\pytorch\test\test_cuda.py::TestCuda::test_max_large_axis SKIPPED [0.0005s] [  8%]
external-builds\pytorch\pytorch\test\test_cuda.py::TestCuda::test_mean_fp16 PASSED [0.0008s] [  8%]
external-builds\pytorch\pytorch\test\test_cuda.py::TestCuda::test_memory_allocation PASSED [0.2754s] [  8%]
external-builds\pytorch\pytorch\test\test_cuda.py::TestCuda::test_memory_stats PASSED [0.5461s] [  8%]
external-builds\pytorch\pytorch\test\test_cuda.py::TestCuda::test_memory_stats_of_multiple_generators_and_graphs PASSED [0.8383s] [  8%]
external-builds\pytorch\pytorch\test\test_cuda.py::TestCuda::test_min_max_inits PASSED [0.0020s] [  8%]
external-builds\pytorch\pytorch\test\test_cuda.py::TestCuda::test_multi_device_context_manager SKIPPED [0.0001s] [  8%]
external-builds\pytorch\pytorch\test\test_cuda.py::TestCuda::test_multi_device_stream_context_manager SKIPPED [0.0001s] [  8%]
external-builds\pytorch\pytorch\test\test_cuda.py::TestCuda::test_multinomial_ext PASSED [0.0034s] [  8%]
external-builds\pytorch\pytorch\test\test_cuda.py::TestCuda::test_multinomial_invalid_probs_cuda SKIPPED [0.0001s] [  8%]
external-builds\pytorch\pytorch\test\test_cuda.py::TestCuda::test_noncontiguous_pinned_memory PASSED [0.0006s] [  8%]
external-builds\pytorch\pytorch\test\test_cuda.py::TestCuda::test_norm_type_conversion PASSED [0.0017s] [  8%]
[W1125 01:12:59.000000000 nvtx.cpp:75] Warning: Warning: roctracer isn't available on Windows (function operator())
external-builds\pytorch\pytorch\test\test_cuda.py::TestCuda::test_nvtx PASSED [0.0005s] [  8%]
external-builds\pytorch\pytorch\test\test_cuda.py::TestCuda::test_out_of_memory PASSED [0.0012s] [  8%]
external-builds\pytorch\pytorch\test\test_cuda.py::TestCuda::test_out_of_memory_retry FAILED [0.7698s] [  8%]
external-builds\pytorch\pytorch\test\test_cuda.py::TestCuda::test_pinned_memory_empty_cache PASSED [0.0044s] [  8%]
external-builds\pytorch\pytorch\test\test_cuda.py::TestCuda::test_pinned_memory_with_cudaregister PASSED [0.0195s] [  8%]
external-builds\pytorch\pytorch\test\test_cuda.py::TestCuda::test_pinned_memory_with_cudaregister_multithread PASSED [0.0176s] [  8%]
external-builds\pytorch\pytorch\test\test_cuda.py::TestCuda::test_preferred_blas_library_settings PASSED [3.0720s] [  8%]
external-builds\pytorch\pytorch\test\test_cuda.py::TestCuda::test_prod_large PASSED [0.0018s] [  8%]
external-builds\pytorch\pytorch\test\test_cuda.py::TestCuda::test_randint_generation_for_large_numel PASSED [1.3064s] [  8%]
external-builds\pytorch\pytorch\test\test_cuda.py::TestCuda::test_randint_randomness_for_large_range PASSED [0.1446s] [  8%]
external-builds\pytorch\pytorch\test\test_cuda.py::TestCuda::test_random_no_reused_random_states_float32 PASSED [0.5986s] [  8%]
external-builds\pytorch\pytorch\test\test_cuda.py::TestCuda::test_random_no_reused_random_states_float64 PASSED [0.4658s] [  8%]
external-builds\pytorch\pytorch\test\test_cuda.py::TestCuda::test_record_stream PASSED [0.0513s] [  8%]
external-builds\pytorch\pytorch\test\test_cuda.py::TestCuda::test_record_stream_on_shifted_view PASSED [12.3244s] [  8%]
external-builds\pytorch\pytorch\test\test_cuda.py::TestCuda::test_reduction_gpu_memory_accessing PASSED [0.0009s] [  8%]
external-builds\pytorch\pytorch\test\test_cuda.py::TestCuda::test_repeat_graph_capture_cublas_workspace_memory PASSED [0.9795s] [  8%]
external-builds\pytorch\pytorch\test\test_cuda.py::TestCuda::test_rocm_backward_pass_guard PASSED [0.0012s] [  8%]
external-builds\pytorch\pytorch\test\test_cuda.py::TestCuda::test_set_per_process_memory_fraction FAILED [0.0882s] [  8%]
external-builds\pytorch\pytorch\test\test_cuda.py::TestCuda::test_specify_improper_device_name PASSED [0.0129s] [  8%]
external-builds\pytorch\pytorch\test\test_cuda.py::TestCuda::test_stream_compatibility PASSED [0.0008s] [  8%]
external-builds\pytorch\pytorch\test\test_cuda.py::TestCuda::test_stream_context_manager PASSED [0.0007s] [  8%]
external-builds\pytorch\pytorch\test\test_cuda.py::TestCuda::test_stream_event_repr PASSED [0.0006s] [  8%]
external-builds\pytorch\pytorch\test\test_cuda.py::TestCuda::test_streaming_backwards_callback FAILED [0.0545s] [  8%]
external-builds\pytorch\pytorch\test\test_cuda.py::TestCuda::test_streaming_backwards_multiple_streams PASSED [0.0103s] [  8%]
external-builds\pytorch\pytorch\test\test_cuda.py::TestCuda::test_streaming_backwards_sync PASSED [0.0017s] [  8%]
external-builds\pytorch\pytorch\test\test_cuda.py::TestCuda::test_streaming_backwards_sync_graph_root FAILED [0.0517s] [  8%]
external-builds\pytorch\pytorch\test\test_cuda.py::TestCuda::test_streams FAILED [0.0008s] [  8%]
external-builds\pytorch\pytorch\test\test_cuda.py::TestCuda::test_sum_fp16 FAILED [0.0007s] [  8%]
Windows fatal exception: access violation

Thread 0x00001290 (most recent call first):
  <no Python frame>

Thread 0x00001e44 (most recent call first):
  File "B:\runner\_work\TheRock\TheRock\external-builds\pytorch\pytorch\test\test_cuda.py", line 1577 in test_tiny_half_norm_
  File "B:\runner\_work\_tool\Python\3.12.10\x64\Lib\unittest\case.py", line 589 in _callTestMethod
  File "B:\runner\_work\_tool\Python\3.12.10\x64\Lib\unittest\case.py", line 634 in run
  File "B:\runner\_work\TheRock\TheRock\.venv\Lib\site-packages\torch\testing\_internal\common_utils.py", line 3484 in _run_custom
  File "B:\runner\_work\TheRock\TheRock\.venv\Lib\site-packages\torch\testing\_internal\common_utils.py", line 3514 in run
  File "B:\runner\_work\_tool\Python\3.12.10\x64\Lib\unittest\case.py", line 690 in __call__
  File "B:\runner\_work\TheRock\TheRock\.venv\Lib\site-packages\_pytest\unittest.py", line 351 in runtest
  File "B:\runner\_work\TheRock\TheRock\.venv\Lib\site-packages\_pytest\runner.py", line 174 in pytest_runtest_call
  File "B:\runner\_work\TheRock\TheRock\.venv\Lib\site-packages\pluggy\_callers.py", line 121 in _multicall
  File "B:\runner\_work\TheRock\TheRock\.venv\Lib\site-packages\pluggy\_manager.py", line 120 in _hookexec
  File "B:\runner\_work\TheRock\TheRock\.venv\Lib\site-packages\pluggy\_hooks.py", line 512 in __call__
  File "B:\runner\_work\TheRock\TheRock\.venv\Lib\site-packages\_pytest\runner.py", line 242 in <lambda>
  File "B:\runner\_work\TheRock\TheRock\.venv\Lib\site-packages\_pytest\runner.py", line 341 in from_call
  File "B:\runner\_work\TheRock\TheRock\.venv\Lib\site-packages\_pytest\runner.py", line 241 in call_and_report
  File "B:\runner\_work\TheRock\TheRock\.venv\Lib\site-packages\_pytest\runner.py", line 132 in runtestprotocol
  File "B:\runner\_work\TheRock\TheRock\.venv\Lib\site-packages\_pytest\runner.py", line 113 in pytest_runtest_protocol
  File "B:\runner\_work\TheRock\TheRock\.venv\Lib\site-packages\pluggy\_callers.py", line 121 in _multicall
  File "B:\runner\_work\TheRock\TheRock\.venv\Lib\site-packages\pluggy\_manager.py", line 120 in _hookexec
  File "B:\runner\_work\TheRock\TheRock\.venv\Lib\site-packages\pluggy\_hooks.py", line 512 in __call__
  File "B:\runner\_work\TheRock\TheRock\.venv\Lib\site-packages\_pytest\main.py", line 362 in pytest_runtestloop
  File "B:\runner\_work\TheRock\TheRock\.venv\Lib\site-packages\pluggy\_callers.py", line 121 in _multicall
  File "B:\runner\_work\TheRock\TheRock\.venv\Lib\site-packages\pluggy\_manager.py", line 120 in _hookexec
  File "B:\runner\_work\TheRock\TheRock\.venv\Lib\site-packages\pluggy\_hooks.py", line 512 in __call__
  File "B:\runner\_work\TheRock\TheRock\.venv\Lib\site-packages\_pytest\main.py", line 337 in _main
  File "B:\runner\_work\TheRock\TheRock\.venv\Lib\site-packages\_pytest\main.py", line 283 in wrap_session
  File "B:\runner\_work\TheRock\TheRock\.venv\Lib\site-packages\_pytest\main.py", line 330 in pytest_cmdline_main
  File "B:\runner\_work\TheRock\TheRock\.venv\Lib\site-packages\pluggy\_callers.py", line 121 in _multicall
  File "B:\runner\_work\TheRock\TheRock\.venv\Lib\site-packages\pluggy\_manager.py", line 120 in _hookexec
  File "B:\runner\_work\TheRock\TheRock\.venv\Lib\site-packages\pluggy\_hooks.py", line 512 in __call__
  File "B:\runner\_work\TheRock\TheRock\.venv\Lib\site-packages\_pytest\config\__init__.py", line 175 in main
  File "B:\runner\_work\TheRock\TheRock\external-builds\pytorch\run_pytorch_tests.py", line 499 in main
  File "B:\runner\_work\TheRock\TheRock\external-builds\pytorch\run_pytorch_tests.py", line 531 in <module>
Exception Code: 0xC0000005
0x00007FF97A670983, B:\runner\_work\TheRock\TheRock\.venv\Lib\site-packages\_rocm_sdk_core\bin\amdhip64_7.dll(0x00007FF979D50000) + 0x920983 byte(s), hipHccModuleLaunchKernel() + 0x59B5F3 byte(s)
0x00007FF97A1A4315, B:\runner\_work\TheRock\TheRock\.venv\Lib\site-packages\_rocm_sdk_core\bin\amdhip64_7.dll(0x00007FF979D50000) + 0x454315 byte(s), hipHccModuleLaunchKernel() + 0xCEF85 byte(s)
0x00007FF97A1DEF47, B:\runner\_work\TheRock\TheRock\.venv\Lib\site-packages\_rocm_sdk_core\bin\amdhip64_7.dll(0x00007FF979D50000) + 0x48EF47 byte(s), hipHccModuleLaunchKernel() + 0x109BB7 byte(s)
0x00007FF97A1DDEC6, B:\runner\_work\TheRock\TheRock\.venv\Lib\site-packages\_rocm_sdk_core\bin\amdhip64_7.dll(0x00007FF979D50000) + 0x48DEC6 byte(s), hipHccModuleLaunchKernel() + 0x108B36 byte(s)
0x00007FF97A1DE1B4, B:\runner\_work\TheRock\TheRock\.venv\Lib\site-packages\_rocm_sdk_core\bin\amdhip64_7.dll(0x00007FF979D50000) + 0x48E1B4 byte(s), hipHccModuleLaunchKernel() + 0x108E24 byte(s)
0x00007FF97A1CB105, B:\runner\_work\TheRock\TheRock\.venv\Lib\site-packages\_rocm_sdk_core\bin\amdhip64_7.dll(0x00007FF979D50000) + 0x47B105 byte(s), hipHccModuleLaunchKernel() + 0xF5D75 byte(s)
0x00007FF97A14010F, B:\runner\_work\TheRock\TheRock\.venv\Lib\site-packages\_rocm_sdk_core\bin\amdhip64_7.dll(0x00007FF979D50000) + 0x3F010F byte(s), hipHccModuleLaunchKernel() + 0x6AD7F byte(s)
0x00007FF97A140231, B:\runner\_work\TheRock\TheRock\.venv\Lib\site-packages\_rocm_sdk_core\bin\amdhip64_7.dll(0x00007FF979D50000) + 0x3F0231 byte(s), hipHccModuleLaunchKernel() + 0x6AEA1 byte(s)
0x00007FF97A163A86, B:\runner\_work\TheRock\TheRock\.venv\Lib\site-packages\_rocm_sdk_core\bin\amdhip64_7.dll(0x00007FF979D50000) + 0x413A86 byte(s), hipHccModuleLaunchKernel() + 0x8E6F6 byte(s)
0x00007FF97A0FB0FF, B:\runner\_work\TheRock\TheRock\.venv\Lib\site-packages\_rocm_sdk_core\bin\amdhip64_7.dll(0x00007FF979D50000) + 0x3AB0FF byte(s), hipHccModuleLaunchKernel() + 0x25D6F byte(s)
0x00007FF9B1A4E8D7, C:\Windows\System32\KERNEL32.DLL(0x00007FF9B1A20000) + 0x2E8D7 byte(s), BaseThreadInitThunk() + 0x17 byte(s)
0x00007FF9B232C53C, C:\Windows\SYSTEM32\ntdll.dll(0x00007FF9B22A0000) + 0x8C53C byte(s), RtlUserThreadStart() + 0x2C byte(s)
B:\runner\_work\_temp\211244a6-284a-4d32-9dec-bf7bac56d6e0.sh: line 1:   507 Segmentation fault      python ./external-builds/pytorch/run_pytorch_tests.py
external-builds\pytorch\pytorch\test\test_cuda.py::TestCuda::test_tiny_half_norm_ 
Error: Process completed with exit code 139.
