Run unit tests in test_pytorch_wheels.yml on Windows #2265
Conversation
HereThereBeDragons
left a comment
overall this already looks quite good to me. here are a couple of comments for improvement:
i wonder if we will need to extend our test skipping depending on the platform, extending it with
skip_tests/generic.py
skip_tests/generic_linux.py
skip_tests/generic_win.py
skip_tests/pytorch_2.9.py
skip_tests/pytorch_2.9_linux.py
skip_tests/pytorch_2.9_win.py
i think it is also worth considering whether we can add a comment on how torch_version looks format-wise, or maybe rename it to torch_rocm_version to clarify that it is the 2.9.0+rocm7.10a... value in the build_..pytorch_wheels.yml, as you already add some comments about it in test_pytorch_wheels.yml
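To illustrate the idea, the selection could be roughly something like this (just a sketch; the helper name and the version parsing are made up, not existing code):

import platform
from pathlib import Path

def collect_skip_files(skip_dir: Path, torch_version: str) -> list[Path]:
    # Hypothetical helper: pick whichever skip lists exist for this
    # platform and torch major.minor version, most generic first.
    suffix = "win" if platform.system() == "Windows" else "linux"
    # e.g. "2.9.0+rocm7.10a..." -> "2.9" (assumed version format)
    major_minor = ".".join(torch_version.split("+")[0].split(".")[:2])
    candidates = [
        skip_dir / "generic.py",
        skip_dir / f"generic_{suffix}.py",
        skip_dir / f"pytorch_{major_minor}.py",
        skip_dir / f"pytorch_{major_minor}_{suffix}.py",
    ]
    return [path for path in candidates if path.exists()]

# e.g. collect_skip_files(Path("skip_tests"), "2.9.0+rocm7.10a...")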
"""Forces termination to work around https://github.com/ROCm/TheRock/issues/999."""
import signal

retcode_file = Path("exit_code.txt")
see my comment in r497 about whether we need the printing of the error code here in the first place.
if yes:
maybe rename to pytorch_pytest_exit_code.txt? and do we need to upload it somewhere to the artifacts?
Renamed to run_pytorch_tests_exit_code.txt, matching the script file name. If we add other test scripts they can (ughhh) use the same pattern.
and do we need to upload it somewhere to the artifacts?
I don't think we need to upload this exit code file. We should generate test reports and upload those. The test reports will then be authoritative for result status.
- https://docs.pytest.org/en/stable/how-to/output.html#creating-junitxml-format-files
- https://github.com/pytorch/pytorch/blob/33d4cf4fcb7f0cba6191b242dae53b48057e05b9/test/run_test.py#L1266-L1270
- https://github.com/pytorch/pytorch/blob/33d4cf4fcb7f0cba6191b242dae53b48057e05b9/test/run_test.py#L598-L605
Maybe if we do that we can add continue-on-error: true to Linux too, have a common step after the tests that checks the results in the reports, and ignore the exit code altogether.
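Roughly what I have in mind for that common check step, as a sketch (the report file name and the helper are hypothetical, nothing in the repo does this yet):

import sys
import xml.etree.ElementTree as ET

import pytest

def run_and_check(pytest_args: list[str], report_path: str = "pytest_report.xml") -> int:
    # Let pytest write a JUnit-style XML report; ignore its exit code here.
    pytest.main(pytest_args + [f"--junit-xml={report_path}"])

    # Decide pass/fail from the report contents instead of the process exit code.
    root = ET.parse(report_path).getroot()
    failures = sum(
        int(suite.get("failures", 0)) + int(suite.get("errors", 0))
        for suite in root.iter("testsuite")
    )
    print(f"Failures/errors recorded in {report_path}: {failures}")
    return 1 if failures else 0

if __name__ == "__main__":
    sys.exit(run_and_check(sys.argv[1:]))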
well.. then we could also change the script so it does not return the pytest return code, and just use the printing of the error code we already have?
]

retcode = pytest.main(pytorch_args)
print(f"Pytest finished with return code: {retcode}")
i am already printing the return code there. maybe we do not need the extra file for windows?
We would need to capture stdout somehow (e.g. pipe the output to a file) to use this print() for return code handling.
I'm considering putting this code in a python script, but I really don't want this hack to live for long:
- name: (Windows) Read and propagate exit code
if: ${{ runner.os == 'Windows' }}
run: |
if [ -f run_pytorch_tests_exit_code.txt ]; then
EXIT_CODE=$(cat run_pytorch_tests_exit_code.txt)
echo "Exit code from file: $EXIT_CODE"
exit $EXIT_CODE
else
echo "No run_pytorch_tests_exit_code.txt found"
exit 1
fi
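If it does move into a Python helper, it would just be the same logic, something like (sketch only; only the file name matches what's above):

import sys
from pathlib import Path

def propagate_exit_code(path: Path = Path("run_pytorch_tests_exit_code.txt")) -> int:
    # Mirror the bash step above: read the recorded exit code and return it,
    # failing if the file was never written.
    if not path.is_file():
        print(f"No {path} found")
        return 1
    code = int(path.read_text().strip())
    print(f"Exit code from file: {code}")
    return code

if __name__ == "__main__":
    sys.exit(propagate_exit_code())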
we are already capturing it in the Run PyTorch tests step:
= 38 failed, 15608 passed, 24557 skipped, 75 deselected, 45 xfailed in 1354.49s (0:22:34) =
Pytest finished with return code: 1 <<<<< this line here
Writing retcode 1 to 'exit_code.txt'
GitHub Actions logs stdout, but we can't (I don't think?) access it unless we capture it ourselves somehow too?
Ah... there's an idea. We could write to GITHUB_OUTPUT instead of tracking our own custom file. I think that would be a bit too roundabout though 🤔
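For reference, that would look roughly like this from the test script (sketch; the output name is made up):

import os

def write_step_output(name: str, value: str) -> None:
    # GitHub Actions exposes a file via GITHUB_OUTPUT; appending "name=value"
    # makes the value available to later steps as steps.<id>.outputs.<name>.
    output_file = os.environ.get("GITHUB_OUTPUT")
    if not output_file:
        print(f"{name}={value}")  # not running under Actions; just log it
        return
    with open(output_file, "a", encoding="utf-8") as f:
        f.write(f"{name}={value}\n")

# e.g. write_step_output("exit_code", str(retcode)) after pytest.main() returns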
what do you mean? it is captured as part of the Run PyTorch tests step. this was an extract from the ci runner log.
E.g. https://github.com/ROCm/TheRock/actions/runs/19635001346/job/56229351773#step:11:40789
"Captured" meaning we can do something with it (e.g. have the value in an environment variable, a file, a bash variable, etc.). We can't just parse through all of stdout from a prior job step to determine if a step should pass or fail, unless I'm missing some way that steps can read stdout from prior steps.
[TBD] More complete release workflow runs for Windows and Linux?
@HereThereBeDragons @araravik-psd would you like me to trigger a full ROCm dev release for this PR to test across all pytorch versions and supported gfx families, or is spot checking with jobs like https://github.com/ROCm/TheRock/actions/runs/19586629648/job/56096766330 sufficient and then we'll see results from the next nightly release?
As this is now, the "release gating" will stop promoting packages from v2-staging to v2 for Windows once this PR is merged, until we get all test failures addressed (#2156). That is already the case for Linux nightly releases.
up to you. i would just wait for the nightlies.
just considering the runtime, you don't get signals before tomorrow anyway
Given how unstable the tests appeared when I was testing, I think I will split this into two PRs:

- The external-builds/pytorch/* changes allowing for running tests on Windows locally
- The .github/workflows/test_pytorch_wheels.yml changes that include those tests on our Windows runners

That way we can more easily revert just the workflow changes as needed while keeping support for testing locally.
HereThereBeDragons
left a comment
see discussion comments
jayhawk-commits
left a comment
looks good as a starter to get windows results
# Skip tests that hang. Perhaps related to processes not terminating
# on their own: https://github.com/ROCm/TheRock/issues/999.
Seeing more tests hang on 'nightly' than just on 'release/2.9':
https://github.com/ROCm/TheRock/actions/runs/19651083796/job/56278015582
Mon, 24 Nov 2025 22:27:09 GMT external-builds\pytorch\pytorch\test\test_cuda.py::TestCuda::test_nvtx PASSED [0.0007s] [ 8%]
Mon, 24 Nov 2025 22:27:09 GMT external-builds\pytorch\pytorch\test\test_cuda.py::TestCuda::test_out_of_memory PASSED [0.0014s] [ 8%]
Mon, 24 Nov 2025 22:27:09 GMT external-builds\pytorch\pytorch\test\test_cuda.py::TestCuda::test_out_of_memory_retry FAILED [0.7945s] [ 8%]
Mon, 24 Nov 2025 22:27:09 GMT external-builds\pytorch\pytorch\test\test_cuda.py::TestCuda::test_pinned_memory_empty_cache PASSED [0.0043s] [ 8%]
Mon, 24 Nov 2025 23:02:02 GMT external-builds\pytorch\pytorch\test\test_cuda.py::TestCuda::test_pinned_memory_use_background_threads [TORCH_VITAL] CUDA.used true
Mon, 24 Nov 2025 23:02:02 GMT [TORCH_VITAL] Dataloader.basic_unit_test TEST_VALUE_STRING
Mon, 24 Nov 2025 23:02:02 GMT [TORCH_VITAL] Dataloader.enabled True
Mon, 24 Nov 2025 23:02:03 GMT Error: The operation was canceled.
I'll pin that down through local testing and push another test skip before merging this.
Fixed (hopefully) by skipping two more tests - one timeout and one crash. Testing again at https://github.com/ROCm/TheRock/actions/runs/19653570950/job/56285463242 before merge.
Seeing a bunch of crashes on CI runners that I can't reproduce locally. I'll have to debug more tomorrow, can't merge this yet.
Latest is https://github.com/ROCm/TheRock/actions/runs/19654631051/job/56288648443#step:12:5886
external-builds\pytorch\pytorch\test\test_cuda.py::TestCuda::test_hip_device_count PASSED [6.0132s] [ 8%]
external-builds\pytorch\pytorch\test\test_cuda.py::TestCuda::test_huge_index SKIPPED [0.0007s] [ 8%]
external-builds\pytorch\pytorch\test\test_cuda.py::TestCuda::test_index_out_of_bounds_exception_cuda SKIPPED [0.0005s] [ 8%]
external-builds\pytorch\pytorch\test\test_cuda.py::TestCuda::test_invalid_status_for_legacy_api FAILED [0.0005s] [ 8%]
Traceback (most recent call last):
File "<string>", line 35, in <module>
File "<string>", line 22, in fork_and_check_is_pinned
File "B:\runner\_work\_tool\Python\3.12.10\x64\Lib\multiprocessing\process.py", line 121, in start
self._popen = self._Popen(self)
^^^^^^^^^^^^^^^^^
File "B:\runner\_work\_tool\Python\3.12.10\x64\Lib\multiprocessing\context.py", line 224, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "B:\runner\_work\_tool\Python\3.12.10\x64\Lib\multiprocessing\context.py", line 337, in _Popen
return Popen(process_obj)
^^^^^^^^^^^^^^^^^^
File "B:\runner\_work\_tool\Python\3.12.10\x64\Lib\multiprocessing\popen_spawn_win32.py", line 95, in __init__
reduction.dump(process_obj, to_child)
File "B:\runner\_work\_tool\Python\3.12.10\x64\Lib\multiprocessing\reduction.py", line 60, in dump
ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't get local object 'fork_and_check_is_pinned.<locals>.worker'
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "B:\runner\_work\_tool\Python\3.12.10\x64\Lib\multiprocessing\spawn.py", line 122, in spawn_main
exitcode = _main(fd, parent_sentinel)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "B:\runner\_work\_tool\Python\3.12.10\x64\Lib\multiprocessing\spawn.py", line 132, in _main
self = reduction.pickle.load(from_parent)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
EOFError: Ran out of input
external-builds\pytorch\pytorch\test\test_cuda.py::TestCuda::test_is_pinned_no_context FAILED [1.4856s] [ 8%]
external-builds\pytorch\pytorch\test\test_cuda.py::TestCuda::test_lazy_init PASSED [1.4903s] [ 8%]
external-builds\pytorch\pytorch\test\test_cuda.py::TestCuda::test_manual_seed PASSED [0.0038s] [ 8%]
external-builds\pytorch\pytorch\test\test_cuda.py::TestCuda::test_matmul_device_mismatch PASSED [0.0012s] [ 8%]
external-builds\pytorch\pytorch\test\test_cuda.py::TestCuda::test_matmul_memory_use PASSED [0.0143s] [ 8%]
external-builds\pytorch\pytorch\test\test_cuda.py::TestCuda::test_max_large_axis SKIPPED [0.0005s] [ 8%]
external-builds\pytorch\pytorch\test\test_cuda.py::TestCuda::test_mean_fp16 PASSED [0.0008s] [ 8%]
external-builds\pytorch\pytorch\test\test_cuda.py::TestCuda::test_memory_allocation PASSED [0.2754s] [ 8%]
external-builds\pytorch\pytorch\test\test_cuda.py::TestCuda::test_memory_stats PASSED [0.5461s] [ 8%]
external-builds\pytorch\pytorch\test\test_cuda.py::TestCuda::test_memory_stats_of_multiple_generators_and_graphs PASSED [0.8383s] [ 8%]
external-builds\pytorch\pytorch\test\test_cuda.py::TestCuda::test_min_max_inits PASSED [0.0020s] [ 8%]
external-builds\pytorch\pytorch\test\test_cuda.py::TestCuda::test_multi_device_context_manager SKIPPED [0.0001s] [ 8%]
external-builds\pytorch\pytorch\test\test_cuda.py::TestCuda::test_multi_device_stream_context_manager SKIPPED [0.0001s] [ 8%]
external-builds\pytorch\pytorch\test\test_cuda.py::TestCuda::test_multinomial_ext PASSED [0.0034s] [ 8%]
external-builds\pytorch\pytorch\test\test_cuda.py::TestCuda::test_multinomial_invalid_probs_cuda SKIPPED [0.0001s] [ 8%]
external-builds\pytorch\pytorch\test\test_cuda.py::TestCuda::test_noncontiguous_pinned_memory PASSED [0.0006s] [ 8%]
external-builds\pytorch\pytorch\test\test_cuda.py::TestCuda::test_norm_type_conversion PASSED [0.0017s] [ 8%]
[W1125 01:12:59.000000000 nvtx.cpp:75] Warning: Warning: roctracer isn't available on Windows (function operator())
external-builds\pytorch\pytorch\test\test_cuda.py::TestCuda::test_nvtx PASSED [0.0005s] [ 8%]
external-builds\pytorch\pytorch\test\test_cuda.py::TestCuda::test_out_of_memory PASSED [0.0012s] [ 8%]
external-builds\pytorch\pytorch\test\test_cuda.py::TestCuda::test_out_of_memory_retry FAILED [0.7698s] [ 8%]
external-builds\pytorch\pytorch\test\test_cuda.py::TestCuda::test_pinned_memory_empty_cache PASSED [0.0044s] [ 8%]
external-builds\pytorch\pytorch\test\test_cuda.py::TestCuda::test_pinned_memory_with_cudaregister PASSED [0.0195s] [ 8%]
external-builds\pytorch\pytorch\test\test_cuda.py::TestCuda::test_pinned_memory_with_cudaregister_multithread PASSED [0.0176s] [ 8%]
external-builds\pytorch\pytorch\test\test_cuda.py::TestCuda::test_preferred_blas_library_settings PASSED [3.0720s] [ 8%]
external-builds\pytorch\pytorch\test\test_cuda.py::TestCuda::test_prod_large PASSED [0.0018s] [ 8%]
external-builds\pytorch\pytorch\test\test_cuda.py::TestCuda::test_randint_generation_for_large_numel PASSED [1.3064s] [ 8%]
external-builds\pytorch\pytorch\test\test_cuda.py::TestCuda::test_randint_randomness_for_large_range PASSED [0.1446s] [ 8%]
external-builds\pytorch\pytorch\test\test_cuda.py::TestCuda::test_random_no_reused_random_states_float32 PASSED [0.5986s] [ 8%]
external-builds\pytorch\pytorch\test\test_cuda.py::TestCuda::test_random_no_reused_random_states_float64 PASSED [0.4658s] [ 8%]
external-builds\pytorch\pytorch\test\test_cuda.py::TestCuda::test_record_stream PASSED [0.0513s] [ 8%]
external-builds\pytorch\pytorch\test\test_cuda.py::TestCuda::test_record_stream_on_shifted_view PASSED [12.3244s] [ 8%]
external-builds\pytorch\pytorch\test\test_cuda.py::TestCuda::test_reduction_gpu_memory_accessing PASSED [0.0009s] [ 8%]
external-builds\pytorch\pytorch\test\test_cuda.py::TestCuda::test_repeat_graph_capture_cublas_workspace_memory PASSED [0.9795s] [ 8%]
external-builds\pytorch\pytorch\test\test_cuda.py::TestCuda::test_rocm_backward_pass_guard PASSED [0.0012s] [ 8%]
external-builds\pytorch\pytorch\test\test_cuda.py::TestCuda::test_set_per_process_memory_fraction FAILED [0.0882s] [ 8%]
external-builds\pytorch\pytorch\test\test_cuda.py::TestCuda::test_specify_improper_device_name PASSED [0.0129s] [ 8%]
external-builds\pytorch\pytorch\test\test_cuda.py::TestCuda::test_stream_compatibility PASSED [0.0008s] [ 8%]
external-builds\pytorch\pytorch\test\test_cuda.py::TestCuda::test_stream_context_manager PASSED [0.0007s] [ 8%]
external-builds\pytorch\pytorch\test\test_cuda.py::TestCuda::test_stream_event_repr PASSED [0.0006s] [ 8%]
external-builds\pytorch\pytorch\test\test_cuda.py::TestCuda::test_streaming_backwards_callback FAILED [0.0545s] [ 8%]
external-builds\pytorch\pytorch\test\test_cuda.py::TestCuda::test_streaming_backwards_multiple_streams PASSED [0.0103s] [ 8%]
external-builds\pytorch\pytorch\test\test_cuda.py::TestCuda::test_streaming_backwards_sync PASSED [0.0017s] [ 8%]
external-builds\pytorch\pytorch\test\test_cuda.py::TestCuda::test_streaming_backwards_sync_graph_root FAILED [0.0517s] [ 8%]
external-builds\pytorch\pytorch\test\test_cuda.py::TestCuda::test_streams FAILED [0.0008s] [ 8%]
external-builds\pytorch\pytorch\test\test_cuda.py::TestCuda::test_sum_fp16 FAILED [0.0007s] [ 8%]
Windows fatal exception: access violation
Thread 0x00001290 (most recent call first):
<no Python frame>
Thread 0x00001e44 (most recent call first):
File "B:\runner\_work\TheRock\TheRock\external-builds\pytorch\pytorch\test\test_cuda.py", line 1577 in test_tiny_half_norm_
File "B:\runner\_work\_tool\Python\3.12.10\x64\Lib\unittest\case.py", line 589 in _callTestMethod
File "B:\runner\_work\_tool\Python\3.12.10\x64\Lib\unittest\case.py", line 634 in run
File "B:\runner\_work\TheRock\TheRock\.venv\Lib\site-packages\torch\testing\_internal\common_utils.py", line 3484 in _run_custom
File "B:\runner\_work\TheRock\TheRock\.venv\Lib\site-packages\torch\testing\_internal\common_utils.py", line 3514 in run
File "B:\runner\_work\_tool\Python\3.12.10\x64\Lib\unittest\case.py", line 690 in __call__
File "B:\runner\_work\TheRock\TheRock\.venv\Lib\site-packages\_pytest\unittest.py", line 351 in runtest
File "B:\runner\_work\TheRock\TheRock\.venv\Lib\site-packages\_pytest\runner.py", line 174 in pytest_runtest_call
File "B:\runner\_work\TheRock\TheRock\.venv\Lib\site-packages\pluggy\_callers.py", line 121 in _multicall
File "B:\runner\_work\TheRock\TheRock\.venv\Lib\site-packages\pluggy\_manager.py", line 120 in _hookexec
File "B:\runner\_work\TheRock\TheRock\.venv\Lib\site-packages\pluggy\_hooks.py", line 512 in __call__
File "B:\runner\_work\TheRock\TheRock\.venv\Lib\site-packages\_pytest\runner.py", line 242 in <lambda>
File "B:\runner\_work\TheRock\TheRock\.venv\Lib\site-packages\_pytest\runner.py", line 341 in from_call
File "B:\runner\_work\TheRock\TheRock\.venv\Lib\site-packages\_pytest\runner.py", line 241 in call_and_report
File "B:\runner\_work\TheRock\TheRock\.venv\Lib\site-packages\_pytest\runner.py", line 132 in runtestprotocol
File "B:\runner\_work\TheRock\TheRock\.venv\Lib\site-packages\_pytest\runner.py", line 113 in pytest_runtest_protocol
File "B:\runner\_work\TheRock\TheRock\.venv\Lib\site-packages\pluggy\_callers.py", line 121 in _multicall
File "B:\runner\_work\TheRock\TheRock\.venv\Lib\site-packages\pluggy\_manager.py", line 120 in _hookexec
File "B:\runner\_work\TheRock\TheRock\.venv\Lib\site-packages\pluggy\_hooks.py", line 512 in __call__
File "B:\runner\_work\TheRock\TheRock\.venv\Lib\site-packages\_pytest\main.py", line 362 in pytest_runtestloop
File "B:\runner\_work\TheRock\TheRock\.venv\Lib\site-packages\pluggy\_callers.py", line 121 in _multicall
File "B:\runner\_work\TheRock\TheRock\.venv\Lib\site-packages\pluggy\_manager.py", line 120 in _hookexec
File "B:\runner\_work\TheRock\TheRock\.venv\Lib\site-packages\pluggy\_hooks.py", line 512 in __call__
File "B:\runner\_work\TheRock\TheRock\.venv\Lib\site-packages\_pytest\main.py", line 337 in _main
File "B:\runner\_work\TheRock\TheRock\.venv\Lib\site-packages\_pytest\main.py", line 283 in wrap_session
File "B:\runner\_work\TheRock\TheRock\.venv\Lib\site-packages\_pytest\main.py", line 330 in pytest_cmdline_main
File "B:\runner\_work\TheRock\TheRock\.venv\Lib\site-packages\pluggy\_callers.py", line 121 in _multicall
File "B:\runner\_work\TheRock\TheRock\.venv\Lib\site-packages\pluggy\_manager.py", line 120 in _hookexec
File "B:\runner\_work\TheRock\TheRock\.venv\Lib\site-packages\pluggy\_hooks.py", line 512 in __call__
File "B:\runner\_work\TheRock\TheRock\.venv\Lib\site-packages\_pytest\config\__init__.py", line 175 in main
File "B:\runner\_work\TheRock\TheRock\external-builds\pytorch\run_pytorch_tests.py", line 499 in main
File "B:\runner\_work\TheRock\TheRock\external-builds\pytorch\run_pytorch_tests.py", line 531 in <module>
Exception Code: 0xC0000005
0x00007FF97A670983, B:\runner\_work\TheRock\TheRock\.venv\Lib\site-packages\_rocm_sdk_core\bin\amdhip64_7.dll(0x00007FF979D50000) + 0x920983 byte(s), hipHccModuleLaunchKernel() + 0x59B5F3 byte(s)
0x00007FF97A1A4315, B:\runner\_work\TheRock\TheRock\.venv\Lib\site-packages\_rocm_sdk_core\bin\amdhip64_7.dll(0x00007FF979D50000) + 0x454315 byte(s), hipHccModuleLaunchKernel() + 0xCEF85 byte(s)
0x00007FF97A1DEF47, B:\runner\_work\TheRock\TheRock\.venv\Lib\site-packages\_rocm_sdk_core\bin\amdhip64_7.dll(0x00007FF979D50000) + 0x48EF47 byte(s), hipHccModuleLaunchKernel() + 0x109BB7 byte(s)
0x00007FF97A1DDEC6, B:\runner\_work\TheRock\TheRock\.venv\Lib\site-packages\_rocm_sdk_core\bin\amdhip64_7.dll(0x00007FF979D50000) + 0x48DEC6 byte(s), hipHccModuleLaunchKernel() + 0x108B36 byte(s)
0x00007FF97A1DE1B4, B:\runner\_work\TheRock\TheRock\.venv\Lib\site-packages\_rocm_sdk_core\bin\amdhip64_7.dll(0x00007FF979D50000) + 0x48E1B4 byte(s), hipHccModuleLaunchKernel() + 0x108E24 byte(s)
0x00007FF97A1CB105, B:\runner\_work\TheRock\TheRock\.venv\Lib\site-packages\_rocm_sdk_core\bin\amdhip64_7.dll(0x00007FF979D50000) + 0x47B105 byte(s), hipHccModuleLaunchKernel() + 0xF5D75 byte(s)
0x00007FF97A14010F, B:\runner\_work\TheRock\TheRock\.venv\Lib\site-packages\_rocm_sdk_core\bin\amdhip64_7.dll(0x00007FF979D50000) + 0x3F010F byte(s), hipHccModuleLaunchKernel() + 0x6AD7F byte(s)
0x00007FF97A140231, B:\runner\_work\TheRock\TheRock\.venv\Lib\site-packages\_rocm_sdk_core\bin\amdhip64_7.dll(0x00007FF979D50000) + 0x3F0231 byte(s), hipHccModuleLaunchKernel() + 0x6AEA1 byte(s)
0x00007FF97A163A86, B:\runner\_work\TheRock\TheRock\.venv\Lib\site-packages\_rocm_sdk_core\bin\amdhip64_7.dll(0x00007FF979D50000) + 0x413A86 byte(s), hipHccModuleLaunchKernel() + 0x8E6F6 byte(s)
0x00007FF97A0FB0FF, B:\runner\_work\TheRock\TheRock\.venv\Lib\site-packages\_rocm_sdk_core\bin\amdhip64_7.dll(0x00007FF979D50000) + 0x3AB0FF byte(s), hipHccModuleLaunchKernel() + 0x25D6F byte(s)
0x00007FF9B1A4E8D7, C:\Windows\System32\KERNEL32.DLL(0x00007FF9B1A20000) + 0x2E8D7 byte(s), BaseThreadInitThunk() + 0x17 byte(s)
0x00007FF9B232C53C, C:\Windows\SYSTEM32\ntdll.dll(0x00007FF9B22A0000) + 0x8C53C byte(s), RtlUserThreadStart() + 0x2C byte(s)
B:\runner\_work\_temp\211244a6-284a-4d32-9dec-bf7bac56d6e0.sh: line 1: 507 Segmentation fault python ./external-builds/pytorch/run_pytorch_tests.py
external-builds\pytorch\pytorch\test\test_cuda.py::TestCuda::test_tiny_half_norm_
Error: Process completed with exit code 139.
Warning
Because we have test failures, this PR will stop promotion from v2-staging to v2 on Windows for GPU families where we have test runners like gfx1151 and gfx110X, as is already done on Linux.
Motivation
Progress on #2258 and #1073. This changes the test_pytorch_wheels.yml workflow from only running our PyTorch smoke tests to running the full set in our run_linux_pytorch_tests.py script.

Technical Details
Due to #999, I added a force_exit_with_code() hack to run_pytorch_tests.py. Since the test process does not terminate on its own, even after all test cases complete, I kill the process with os.kill(). I tried to use nicer methods like sys.exit() and os._exit() but these were not sufficient. A consequence of this is that the exit code of the process is now always 15 (SIGTERM) on Windows, so the script now writes exit_code.txt to the current directory for the test_pytorch_wheels.yml workflow to use.
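The shape of that hack is roughly the following (simplified sketch, not the exact code in run_pytorch_tests.py):

import os
import signal
from pathlib import Path

def force_exit_with_code(retcode: int, retcode_file: Path = Path("exit_code.txt")) -> None:
    # Record the pytest return code for the workflow to read, then force
    # termination since sys.exit()/os._exit() were not sufficient (#999).
    print(f"Writing retcode {retcode} to '{retcode_file}'")
    retcode_file.write_text(str(retcode))
    # On Windows this terminates the process with exit code 15 (SIGTERM),
    # which is why the workflow reads the file instead of the exit code.
    os.kill(os.getpid(), signal.SIGTERM)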
Two test cases caused additional issues:

- test_cublas_config_nondeterministic_alert_cuda in test_torch.py
- test_graph_error in test_cuda.py

These test cases should be fixed or conditionally skipped in the upstream pytorch test files. Until then, I marked them as skipped using our new test filtering under a new "platform/windows" category.
Test Plan
- python D:/projects/TheRock/external-builds/pytorch/run_pytorch_tests.py --pytorch-dir D:/b/pytorch --amdgpu-family=gfx110X-dgpu > C:\Users\Nod-Shark16\.therock\logs\run_pytorch_tests_%date%_%time::=%.txt 2>&1
- test_torch.py only: https://github.com/ROCm/TheRock/actions/runs/19585433265/job/56093159807

Test Result
The specific set of tests running and their current results on my gfx1100 system for PyTorch 2.9 are:
- test_nn.py
- test_torch.py
- test_cuda.py
- test_unary_ufuncs.py
- test_binary_ufuncs.py
- test_autograd.py

* (Numbers might not quite add up there since I ran at a few different pytorch commits)
Submission Checklist