
windows-2022 + WSL runner is mysteriously hanging (disconnection/cancelled) after successfully completing a Python step #12321

@joshuacwnewton

Description


I maintain a Python software project that installs itself into a conda environment.

For the last several years, we have had a working WSL test suite on the windows-2019 runner (last passing run and associated workflow file).

Due to the windows-2019 brownout, we switched over to windows-2022 in this PR: spinalcordtoolbox/spinalcordtoolbox#4907 (comment)

```
# before
Linux fv-az1346-332 4.4.0-17763-Microsoft #2268-Microsoft Thu Oct 07 16:36:00 PST 2021 x86_64 x86_64 x86_64 GNU/Linux

# after
Linux fv-az1212-522 4.4.0-20348-Microsoft #2849-Microsoft Thu Nov 01 17:32:00 PST 2024 x86_64 x86_64 x86_64 GNU/Linux
```
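Both kernel strings above follow the WSL1 naming scheme, in which the release field is `4.4.0-<Windows build>-Microsoft`. The short sketch below assumes exactly that pattern and pulls the host's Windows build number out of each `uname -a` line, so runs from different images can be compared at a glance. The function and constant names are hypothetical.

```python
import re
from typing import Optional

# Assumption: WSL1 reports a synthetic kernel release "4.4.0-<Windows build>-Microsoft".
WSL1_RELEASE = re.compile(r"^4\.4\.0-(\d+)-Microsoft$")


def release_from_uname_line(line: str) -> str:
    """Return the kernel release, i.e. the third field of a `uname -a` line."""
    return line.split()[2]


def windows_build_from_wsl1_release(release: str) -> Optional[int]:
    """Return the Windows build number embedded in a WSL1 release, or None otherwise."""
    match = WSL1_RELEASE.match(release)
    return int(match.group(1)) if match else None


BEFORE = ("Linux fv-az1346-332 4.4.0-17763-Microsoft #2268-Microsoft "
          "Thu Oct 07 16:36:00 PST 2021 x86_64 x86_64 x86_64 GNU/Linux")
AFTER = ("Linux fv-az1212-522 4.4.0-20348-Microsoft #2849-Microsoft "
         "Thu Nov 01 17:32:00 PST 2024 x86_64 x86_64 x86_64 GNU/Linux")

for label, line in (("before", BEFORE), ("after", AFTER)):
    build = windows_build_from_wsl1_release(release_from_uname_line(line))
    print(f"{label}: Windows build {build}")
# before: Windows build 17763
# after: Windows build 20348
```

Build 17763 corresponds to Windows Server 2019 and build 20348 to Windows Server 2022, so the runner switch shows up here as a change in the underlying Windows build. On a live runner, `platform.release()` returns the same release string without parsing `uname -a` output.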

However, immediately after merging this change, we started noticing that runs would mysteriously hang, disconnect, or get cancelled, with a number of strange errors and annotations:

Sample run 1

Received request to deprovision: The request was cancelled by the remote provider.

**Run Vampire/setup-wsl@v3**
Failed to restore: getCacheEntry failed: connect ETIMEDOUT 20.246.192.124:443

**Run Vampire/setup-wsl@v3**
Failed to save: reserveCache failed: connect ETIMEDOUT 20.246.192.124:443

Sample run 2

The hosted runner encountered an error while running your job. (Error Type: Disconnect).
Failed to restore: getCacheEntry failed: Cache service responded with 503
Failed to save: reserveCache failed: Cache service responded with 503

Even after changing multiple settings (setup-wsl@v3 -> setup-wsl@v5, WSL2 -> WSL1, etc.), the step still hangs:

Sample run 3

The hosted runner encountered an error while running your job. (Error Type: Disconnect).

Any advice on how to debug this issue would be greatly appreciated, given that there are no logs and no way to access the runner once it freezes.

Note: This may be a setup-wsl issue. However, a colleague of mine is also encountering similar "freezing runner" issues on macOS (Arm), so I'm curious whether this could be a kernel issue or something similar.

Platforms affected

  • Azure DevOps
  • GitHub Actions - Standard Runners
  • GitHub Actions - Larger Runners

Runner images affected

  • Ubuntu 22.04
  • Ubuntu 24.04
  • macOS 13
  • macOS 13 Arm64
  • macOS 14
  • macOS 14 Arm64
  • macOS 15
  • macOS 15 Arm64
  • Windows Server 2019
  • Windows Server 2022
  • Windows Server 2025

Image version and build link

  Image: windows-2022
  Version: 20250527.1.0
  Included Software: https://github.com/actions/runner-images/blob/win22/20250527.1/images/windows/Windows2022-Readme.md
  Image Release: https://github.com/actions/runner-images/releases/tag/win22%2F20250527.1

See example links above.

Is it regression?

Kind of? (windows-2019 worked fine)

Expected behavior

Either a clear failure (and a log to debug), or a passing run.
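If the hang is confined to the Python process itself (rather than the whole runner VM), one way to get the clear failure described above is to launch the step through a small wrapper that enforces its own deadline. The sketch below is hypothetical and not part of spinalcordtoolbox or setup-wsl: it streams the child's output straight to the job log, so everything printed before the hang is kept, and it turns a timeout into a non-zero exit code. The value 124 is borrowed from the convention of GNU `timeout`.

```python
import subprocess
import sys
from typing import Sequence

TIMEOUT_EXIT_CODE = 124  # same convention as GNU `timeout`


def run_with_deadline(cmd: Sequence[str], timeout_s: float) -> int:
    """Run cmd and return its exit code, failing loudly if it outlives timeout_s.

    Output is not captured, so it reaches the job log as it is produced.
    """
    try:
        completed = subprocess.run(list(cmd), timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run() has already killed and reaped the child at this point.
        print(f"ERROR: {list(cmd)!r} did not exit within {timeout_s} s; killed it.",
              file=sys.stderr, flush=True)
        return TIMEOUT_EXIT_CODE
    return completed.returncode
```

For example, `sys.exit(run_with_deadline([sys.executable, "my_step.py"], timeout_s=30 * 60))`, where `my_step.py` and the 30-minute limit are placeholders. The per-step `timeout-minutes` setting in GitHub Actions serves a similar purpose at the workflow level, but neither helps if the runner itself stops responding.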

Actual behavior

The step appears to pass: it does not halt midway through execution, but runs to completion and then gets stuck at the very end of the process.

Running with `python -v` shows that the interpreter's teardown steps never occur. However, this is hard to demonstrate, because once the run is cancelled, its logs are inaccessible.
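One way to get a log out of a process that finishes its work but never exits is to arm the standard-library `faulthandler` watchdog as the last thing the script does. In the sketch below, `arm_shutdown_watchdog` and the grace period are hypothetical choices, not something the project already does. If the interpreter is still alive `grace_s` seconds later (for example, because shutdown is blocked joining a non-daemon thread), `faulthandler.dump_traceback_later(..., exit=True)` writes the traceback of every thread to stderr and exits with status 1; if the process exits normally first, nothing is dumped. It cannot help if the whole runner VM freezes, and a hang late enough in interpreter finalization may fall outside its reach.

```python
import faulthandler
import sys


def arm_shutdown_watchdog(grace_s: float) -> None:
    """Dump every thread's stack and exit(1) if the process is still alive after grace_s.

    Meant to be called as the very last statement of the script, once its real
    work is done; if the interpreter then shuts down normally, nothing is dumped.
    """
    print(f"work finished; shutdown watchdog armed for {grace_s} s",
          file=sys.stderr, flush=True)
    # faulthandler uses its own watchdog thread, so the dump can fire even while
    # the main thread is blocked (e.g. waiting for non-daemon threads at shutdown).
    faulthandler.dump_traceback_later(grace_s, exit=True)
```

For example, call `arm_shutdown_watchdog(120)` right after `main()` returns in the script's `if __name__ == "__main__":` block. The marker line also confirms in the log that the step's own work completed, which separates "stuck in the work" from "stuck in shutdown".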

Repro steps

See linked workflow file above.
