Skip to content

Fix occasional thread-local memcheck fail#45

Open
KarlLudwig3485 wants to merge 2 commits intoprimesearch:mainfrom
KarlLudwig3485:threadlocal_memcheck_fail
Open

Fix occasional thread-local memcheck fail#45
KarlLudwig3485 wants to merge 2 commits intoprimesearch:mainfrom
KarlLudwig3485:threadlocal_memcheck_fail

Conversation

@KarlLudwig3485
Copy link
Collaborator

@KarlLudwig3485 KarlLudwig3485 commented Jul 9, 2025

(Always occurs on x86-64 Windows, one report on ARM Linux)

Note that I branched from before ARM Windows was added to the CI, as that broke Windows CI entirely.
No one seems to have noticed, as Windows builds were already broken. Probably ought to be fixed...

Fixes #8

@tdulcet
Copy link
Member

tdulcet commented Jul 9, 2025

Note that I branched from before ARM Windows was added to the CI, as that broke Windows CI entirely.
No one seems to have noticed, as Windows builds were already broken. Probably ought to be fixed...

Thanks for noticing that is was broken. Looks like this was partially my fault, although GitHub also made a breaking change. I fixed it in 1c2c0fd. You should now be able to rebase your PR, so we can run the CI and test your changes.

@tdulcet tdulcet added the bug Something isn't working label Jul 9, 2025
@KarlLudwig3485
Copy link
Collaborator Author

KarlLudwig3485 commented Jul 9, 2025

I fixed it in 1c2c0fd. You should now be able to rebase your PR, so we can run the CI and test your changes.

Sorry to be the bearer of bad news, but ARM Windows CI is still broken. (This issue doesn't affect ARM Windows, though.)
It seems to be building the x86-64 version on the ARM runner. The ARM version has to be built with the CLANGARM64 environment, as GCC doesn't support ARM Windows yet. I don't know enough about GitHub CI to fix it myself, otherwise I would have done before this PR.

Also, why does the ARM Windows CI not build with GMP and hwloc? They're in the repos.
Full disclosure, I don't have an ARM msys2 system to test on, as I make my ARM builds with LLVM-MinGW.

(Always occurs on x86-64 Windows, one report on ARM Linux)
@KarlLudwig3485 KarlLudwig3485 force-pushed the threadlocal_memcheck_fail branch from 842ad20 to 7c9e0ef Compare July 9, 2025 15:55
@tdulcet
Copy link
Member

tdulcet commented Jul 10, 2025

Sorry to be the bearer of bad news, but ARM Windows CI is still broken.

Yeah, I saw that about the Windows ARM runners actually building x86-64. However, based on the CI results for this PR, the Windows ARM builds are OK. It is the Windows x86-64 builds with Clang that are seg faulting. I thought that was also related to #8, but maybe it is #29 or something else.

Also, why does the ARM Windows CI not build with GMP and hwloc? They're in the repos.

The limitations with the Windows ARM builds are due to the runners provided by GitHub Actions, which include fewer tools than the equivalent x86-64 runners, so there is not much we can (easily) do about it. Specifically, MSYS2 is not installed (and does not seem supported on ARM), which means the pacman package manger is not available and thus we cannot install GMP or hwloc. Clang is also not installed. See this list for what is available. I do not have access to any Windows ARM systems, so I am not sure if there is anything we could do short of compiling the tools and libraries ourselves.

@KarlLudwig3485
Copy link
Collaborator Author

Sorry to be the bearer of bad news, but ARM Windows CI is still broken.

Yeah, I saw that about the Windows ARM runners actually building x86-64. However, based on the CI results for this PR, the Windows ARM builds are OK. It is the Windows x86-64 builds with Clang that are seg faulting. I thought that was also related to #8, but maybe it is #29 or something else.

Also, why does the ARM Windows CI not build with GMP and hwloc? They're in the repos.

The limitations with the Windows ARM builds are due to the runners provided by GitHub Actions, which include fewer tools than the equivalent x86-64 runners, so there is not much we can (easily) do about it. Specifically, MSYS2 is not installed (and does not seem supported on ARM), which means the pacman package manger is not available and thus we cannot install GMP or hwloc. Clang is also not installed. See this list for what is available. I do not have access to any Windows ARM systems, so I am not sure if there is anything we could do short of compiling the tools and libraries ourselves.

MSYS2's ARM support is a little confusing. MSYS2 itself only runs on ARM under x64 emulation, but the CLANGARM64 environment is native aarch64.

If the ARM Windows image (which I see is still in beta) doesn't include MSYS2, I don't quite see the point of building the x64 version on ARM. Still, I suppose we can just leave it as-is until MSYS2 gets added, at which point we can test ARM Windows builds in addition to x64-under-emulation.

Back to the issue at hand, I myself don't encounter the segfault with the Clang build, but I am yet to try it with MSYS2.
Does the Clang build use the CLANG64 environment (i.e. a pure LLVM toolchain), or do you just install Clang in the MINGW64 environment, and still use the rest of the GNU toolchain?

The segfault is definitely a separate issue.

@tdulcet
Copy link
Member

tdulcet commented Jul 11, 2025

If the ARM Windows image (which I see is still in beta) doesn't include MSYS2, I don't quite see the point of building the x64 version on ARM.

Yeah, I did not realize until after I had already added the Windows ARM runner that one could not actually build it yet for native ARM on Windows or I probably would not have bothered.

Does the Clang build use the CLANG64 environment (i.e. a pure LLVM toolchain), or do you just install Clang in the MINGW64 environment, and still use the rest of the GNU toolchain?

The latter, it uses the MINGW64 environment, as I was trying to mirror how it is built on Linux. From looking closer at the output, it actually seems to be #7. If we update the CI to skip the "teensy" self-test on Windows, it would likely resolve the problem.

More concerning is the Windows ASan build, which does use the CLANG64 environment. It is still producing the error (in the Fermat code), for example:

ERROR: Function radix32_ditN_cy_dif1, at line 1234 of file ../src/radix32_ditN_cy_dif1.c
Assertion 'tdat[ithread].rn0 == rn0' failed: thread-local memcheck fail!

@KarlLudwig3485 KarlLudwig3485 force-pushed the threadlocal_memcheck_fail branch from a85b2a4 to 7c9e0ef Compare July 11, 2025 14:07
@KarlLudwig3485
Copy link
Collaborator Author

KarlLudwig3485 commented Jul 15, 2025

I've added a fix for Fermat numbers.

Also, at some point I accidentally pushed a commit to this branch (and then reverted it), which caused the CI to re-run. The Clang build didn't segfault. Now that I've pushed a commit and the CI has run again, the Clang build failed again. I intend to look into it, but I've been having some trouble reproducing it. edit: Nevermind, I've got it

@tdulcet
Copy link
Member

tdulcet commented Jul 15, 2025

The Clang build didn't segfault. Now that I've pushed a commit and the CI has run again, the Clang build failed again.

Some of the GitHub Actions runners are slightly different, which can sometimes affect the reproducibility of volatile bugs like seg faults. I am able to reproduce it on both Windows and Linux. However, as you said, it is a separate issue, so it should not block us from merging this PR.

@Artoria2e5
Copy link

Might be a good idea to pull these changes onto the ld-warnings branch, since the memcheck failure is preventing people from evalulating the effects of that branch's changes on Windows too. Also both cover many of the same files, so perhaps reviewing the sum-effects of these two is a good idea.

@tdulcet
Copy link
Member

tdulcet commented Nov 11, 2025

Might be a good idea to pull these changes onto the ld-warnings branch, since the memcheck failure is preventing people from evalulating the effects of that branch's changes on Windows too.

One should be able to merge the two branches locally for testing. I recall Ken (on the forum) did this and there were no merge conflicts. This is a much smaller PR and will likely be merged before #34, so we could then rebase #34 if needed.

@tdulcet tdulcet force-pushed the main branch 2 times, most recently from 07df4ce to 53cf9dc Compare November 20, 2025 17:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Error with native Windows builds: Assertion failed: thread-local memcheck fail!

3 participants