lifecycle: Add exception handling for CPU affinity by SangKyeong-Jeong · Pull Request #113 · eclipse-score/lifecycle

SangKyeong-Jeong · 2026-03-11T04:14:31Z

With exactly 32 CPU cores, the process cannot be executed, and with more than 32 cores, fewer cores than expected are used. To resolve this, exception handling is added so that, when there are more than 32 cores, the maximum number of cores representable in the uint32_t type is used.

github-actions · 2026-03-11T04:15:27Z

License Check Results

🚀 The license check job ran with the Bazel command:

bazel run --lockfile_mode=error //:license-check

Status: ⚠️ Needs Review


[License Check Output]
Extracting Bazel installation...
Starting local Bazel server (8.4.2) and connecting to it...
INFO: Invocation ID: 58dd8f8e-8d14-420b-ad51-aaa82121083f
Computing main repo mapping: 
WARNING: For repository 'score_rust_policies', the root module requires module version score_rust_policies@0.0.3, but got score_rust_policies@0.0.5 in the resolved dependency graph. Please update the version in your MODULE.bazel or set --check_direct_dependencies=off
Loading: 0 packages loaded
Analyzing: target //:license-check (1 packages loaded, 0 targets configured)
Analyzing: target //:license-check (53 packages loaded, 9 targets configured)
Analyzing: target //:license-check (109 packages loaded, 39 targets configured)
Analyzing: target //:license-check (152 packages loaded, 2669 targets configured)
Analyzing: target //:license-check (154 packages loaded, 4255 targets configured)
Analyzing: target //:license-check (165 packages loaded, 7889 targets configured)
Analyzing: target //:license-check (165 packages loaded, 7901 targets configured)
Analyzing: target //:license-check (168 packages loaded, 9789 targets configured)
INFO: Analyzed target //:license-check (170 packages loaded, 10039 targets configured).
[13 / 16] JavaToolchainCompileClasses external/rules_java+/toolchains/platformclasspath_classes; 0s disk-cache, processwrapper-sandbox
[15 / 16] [Sched] Building license.check.license_check.jar ()
INFO: Found 1 target...
Target //:license.check.license_check up-to-date:
  bazel-bin/license.check.license_check
  bazel-bin/license.check.license_check.jar
INFO: Elapsed time: 28.252s, Critical Path: 2.38s
INFO: 16 processes: 12 internal, 3 processwrapper-sandbox, 1 worker.
INFO: Build completed successfully, 16 total actions
INFO: Running command line: bazel-bin/license.check.license_check ./formatted.txt <args omitted>
usage: org.eclipse.dash.licenses.cli.Main [-batch <int>] [-cd <url>]
       [-confidence <int>] [-ef <url>] [-excludeSources <sources>] [-help] [-lic
       <url>] [-project <shortname>] [-repo <url>] [-review] [-summary <file>]
       [-timeout <seconds>] [-token <token>]

github-actions · 2026-03-11T04:18:32Z

The created documentation from the pull request is available at: docu-html

FScholPer · 2026-03-11T10:04:26Z

@SangKyeong-Jeong What's the status here? Can we review it?

FScholPer

Can you please also check the OS Abstraction Layer (OSAL)?

daeyoung-jeong-lge · 2026-03-11T23:53:57Z

@FScholPer We will double-check the OSAL implementation files (specifically under src/internal/osal/linux and qnx) to ensure the CPU core limit is handled consistently across different platforms. We'll update the PR if any adjustments are needed.
Also, once we are ready, is it OK for us to change the status of this PR to "Ready for review"?

SangKyeong-Jeong · 2026-03-12T00:58:32Z

@FScholPer

This issue is unrelated to the OSAL.

When the CPU core count is 32, the return value of the ConfigurationManager::kDefaultProcessorAffinityMask() function is 0, causing an error when setting the CPU affinity to 0.

Shifting a 32-bit integer by 32 bits is a shift-count overflow, which is undefined behavior. In practice, ConfigurationManager::kDefaultProcessorAffinityMask() then returns 0, which causes this issue. For example, x86 masks the shift count of a 32-bit shift to 5 bits, so 1U << 32 evaluates to 1 and the mask becomes 1 - 1 = 0.
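The following minimal sketch shows why a mask of 0 is a hard error and not just a poor default. It is Linux-only and is not Launch Manager code. LM goes through its OSAL, and the QNX path may behave differently. `trySetEmptyAffinity` is a hypothetical helper that passes an empty CPU set to `sched_setaffinity()`. Linux rejects that call with `EINVAL` when the mask contains no usable CPU.

```cpp
// Linux-only illustration (not Launch Manager code): an all-zero affinity mask is rejected.
#include <sched.h>  // sched_setaffinity, cpu_set_t, CPU_ZERO (glibc; g++ defines _GNU_SOURCE)
#include <cassert>
#include <cerrno>

// Hypothetical helper: tries to pin the calling thread to an empty CPU set.
// Returns 0 on success, otherwise the errno value reported by the kernel.
int trySetEmptyAffinity() {
    cpu_set_t set;
    CPU_ZERO(&set);  // no CPU bit set: the equivalent of an affinity mask of 0
    if (sched_setaffinity(0, sizeof(set), &set) == -1) {
        return errno;
    }
    return 0;
}
```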

daeyoung-jeong-lge · 2026-03-12T01:05:38Z

@FScholPer
As SangKyeong said, this issue is not related to the OSAL. If we find any issue in the OSAL, we will handle it in a separate issue and PR.

Please review this. Thanks.

daeyoung-jeong-lge · 2026-03-12T05:16:45Z

The formatting check failed. It looks like a temporary download/connection problem, so I would like to re-run the job, but I don't seem to have permission to do so.

pawelrutkaq

Please rebase the branch.

pawelrutkaq · 2026-03-12T11:18:48Z

src/launch_manager_daemon/src/configuration_manager/configurationmanager.cpp

 const uint32_t ConfigurationManager::kDefaultProcessExecutionError = 1U;
 uint32_t ConfigurationManager::kDefaultProcessorAffinityMask() {
-    return (1U << osal::getNumCores()) - 1U;
+    return static_cast<uint32_t>((1ULL << osal::getNumCores()) - 1ULL);
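The PR's approach can be reduced to a free function for experimenting outside the daemon. This is a sketch. `defaultAffinityMask` is a hypothetical stand-in, and `numCores` replaces `osal::getNumCores()`, which, according to the discussion below, already caps the count at 32. Because `unsigned long long` is at least 64 bits wide, `1ULL << 32` is well defined. The cast back to `uint32_t` keeps the low 32 bits.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for ConfigurationManager::kDefaultProcessorAffinityMask().
// Assumed precondition: numCores <= 32, as guaranteed by the OSAL cap.
std::uint32_t defaultAffinityMask(std::uint32_t numCores) {
    assert(numCores <= 32U);
    // Shift in a 64-bit type so that a shift count of 32 is well defined,
    // then narrow back to the 32-bit mask type.
    return static_cast<std::uint32_t>((1ULL << numCores) - 1ULL);
}
```

With 32 cores, this yields 0xFFFFFFFF, whereas the old expression produced 0 on the reporter's system.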


With 64 cores, does the issue still occur? ;) In general, affinity in LM supports only up to 32 cores, since cpu_mask_ further down the line does not support more either. So either we assert if there are more than 32 cores, or we log a warning that the default affinity is limited to the first 32 cores ;)

This PR addresses the Undefined Behavior on 32-core systems. As chungsky mentioned, getNumCores() already caps the count at 32, so >32 core systems are safe with this default.

I know. What I am saying is that we should probably add at least a warning (since you are already fixing issues around it) that the core count was capped somewhere, since the default behaviour for, e.g., 32 and 33 cores is completely different. @NicolasFussberger what do you think?

I agree, logging a warning could be helpful.
However, that might not be trivial. You probably do not want to log the warning every time osal::getNumCores() is called, but only once during startup. So I think it would require refactoring the getNumCores() method or adding another method for validation purposes.
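One way to emit the warning only once, without touching every caller, is a small wrapper that uses an atomic flag. Everything here is hypothetical:
- `cappedCoreCount` and `kMaxMaskCores` are illustrative names.
- Plain `std::cerr` output stands in for Launch Manager's actual logging facility.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstdint>
#include <iostream>

// Hypothetical limit imposed by the 32-bit affinity mask.
constexpr std::uint32_t kMaxMaskCores = 32U;

// Hypothetical wrapper: caps the detected core count to what the mask can hold
// and prints a warning only the first time cores have to be dropped.
std::uint32_t cappedCoreCount(std::uint32_t detectedCores) {
    static std::atomic<bool> warned{false};
    // exchange() returns the previous value, so only the first caller sees false.
    if (detectedCores > kMaxMaskCores && !warned.exchange(true)) {
        std::cerr << "warning: " << detectedCores
                  << " cores detected; default affinity limited to the first "
                  << kMaxMaskCores << " cores\n";
    }
    return std::min(detectedCores, kMaxMaskCores);
}
```

If the check should happen strictly at startup, the same function could be called once from an initialization path rather than from every `getNumCores()` user.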

The behaviour in case of > 32 cores seems reasonable to me. LaunchManager will just use the first 32 cores.

We could also add support for 64 cores if that is required, but that is probably a separate task.

After a short investigation, it looks like we are hitting undefined behavior here.
For reference, please have a look at the C Standard (C18, ISO/IEC 9899:2018) section 6.5.7 Bitwise shift operators and the C++ Standard (C++20, ISO/IEC 14882:2020) section 7.6.7 Shift operators [expr.shift].

If the value of the right operand is negative, or greater than or equal to the number of bits in the promoted left operand, the behavior is undefined.

IMHO, the hardware is not really doing the logical thing here. Please consider the following example:

#include <cstdint>
#include <iostream>

void shiftLeftTest(std::uint32_t num) {
    // Both shifts are undefined behavior: 32 equals the bit width of std::uint32_t.
    std::uint32_t first_result = 1U << num;  // run-time shift (unless the call is inlined and folded)
    std::uint32_t second_result = 1U << 32;  // constant shift, folded by the compiler
    std::cout << "Shifting left 1U by " << num << std::endl;
    std::cout << "first_result --> " << first_result << std::endl;
    std::cout << "second_result --> " << second_result << std::endl;
}

int main() {
    shiftLeftTest(32);
}

In my build, this code prints the following:

Shifting left 1U by 32
first_result --> 1
second_result --> 0

So essentially, the compiler computes something different from the hardware.
For this reason, maybe we should go with the following code, which explicitly addresses this case. This may be handy if we support 64 cores in the future.

uint32_t ConfigurationManager::kDefaultProcessorAffinityMask() {
    constexpr uint32_t BITS_PER_BYTE = 8;
    uint32_t bitMask = 0U;
    uint32_t cores = osal::getNumCores();
    uint32_t maskSizeInBits = sizeof(bitMask) * BITS_PER_BYTE;
    // A shift count >= the bit width of the mask is undefined behavior,
    // so saturate to an all-ones mask in that case.
    return (cores >= maskSizeInBits ? -1U : (1U << cores) - 1U);
}
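The same guard can be written once for any unsigned mask type. That would pay off if the mask is ever widened to 64 bits. This is a sketch with a few substitutions:
- The name `lowBitsMask` is hypothetical.
- `std::numeric_limits<T>::digits` replaces the manual `sizeof * BITS_PER_BYTE` computation. Both give the same value for unsigned integer types.
- `~T{0}` spells out the all-ones value.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <type_traits>

// Hypothetical helper: returns a mask with the lowest `count` bits set.
// Counts >= the width of T saturate to all ones instead of invoking the
// undefined behavior of shifting by >= the type's width.
template <typename T>
constexpr T lowBitsMask(unsigned count) {
    static_assert(std::is_unsigned<T>::value, "mask type must be unsigned");
    static_assert(sizeof(T) >= sizeof(unsigned), "avoid integer promotion of narrow types");
    constexpr unsigned width = std::numeric_limits<T>::digits;
    return count >= width ? static_cast<T>(~T{0})
                          : static_cast<T>((T{1} << count) - T{1});
}

// Compile-time checks: undefined behavior is not allowed in constant evaluation.
static_assert(lowBitsMask<std::uint32_t>(32) == 0xFFFFFFFFu, "32 cores, 32-bit mask");
static_assert(lowBitsMask<std::uint64_t>(32) == 0xFFFFFFFFull, "32 cores, 64-bit mask");
static_assert(lowBitsMask<std::uint64_t>(64) == ~std::uint64_t{0}, "64 cores, 64-bit mask");
```

The `static_assert` checks are evaluated at compile time, so the compiler has to reject them if any of those shifts were undefined. That makes them a cheap regression guard.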

Any opinion?

Yup, true. Fine for me. @SimonKozik / @NicolasFussberger can we create a GitHub issue to support 64 cores and/or warn the user if there are even more cores?

To resolve the Undefined Behavior issue that occurs with 32 cores, is there any problem with the method I applied in the patch, which performs the shift operation using a 64-bit integer (1ULL) and then casts it?

Since the return type of the kDefaultProcessorAffinityMask function is uint32_t, returning -1U seems awkward because it applies a negative sign to an unsigned type.
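For what it is worth, `-1U` is well defined in C++. The literal `1U` is unsigned, and unary minus on an unsigned value wraps modulo 2^N, so `-1U` is the maximum `unsigned int`. Writing `std::numeric_limits<std::uint32_t>::max()` or `~0U` expresses the same intent without the minus sign. The sketch below assumes that `unsigned int` is 32 bits wide on the target. The constant names are illustrative.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Assumption: unsigned int is 32 bits on the target, so -1U and UINT32_MAX coincide.
static_assert(std::numeric_limits<unsigned>::digits == 32, "assumes 32-bit unsigned int");

// Unsigned negation wraps modulo 2^32, so all three spell the same all-ones value.
constexpr std::uint32_t kAllOnesNegated = -1U;
constexpr std::uint32_t kAllOnesComplement = ~0U;
constexpr std::uint32_t kAllOnesMax = std::numeric_limits<std::uint32_t>::max();

static_assert(kAllOnesNegated == kAllOnesMax && kAllOnesComplement == kAllOnesMax,
              "all spellings denote 0xFFFFFFFF");
```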

It is true that 32 cores is somewhat hard-coded now, and changing the maximum to 64 will require changes in a few files. For this reason, it is hard to argue that we should make this function very future-proof.

My main concern is that the change proposed in this PR just masks the problem rather than really solving it. If we increase the mask size to 64 bits in the future, your code will run into the same problem. Am I right here?

I would argue that we should document the root problem we are facing here, probably in the form of a comment.
Apart from that, we should have a fix.
The fix can take the form proposed in my comment, or the form proposed in this PR. But if we go with 1ULL, then we should document that we avoid the undefined behavior by widening the type on which the calculation is performed.
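If the 1ULL variant is kept, a compile-time check can back up the documentation. Widening the mask type later then fails the build instead of silently reintroducing the undefined behavior. This sketch relies on these assumptions:
- `AffinityMask` is a hypothetical alias for the mask type, which is `uint32_t` today.
- `defaultAffinityMaskDocumented` is an illustrative name.
- `numCores` stands in for `osal::getNumCores()`.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical alias for the affinity mask type used by Launch Manager.
using AffinityMask = std::uint32_t;

// Shifting an operand by >= its bit width is undefined behavior
// (C18 6.5.7, C++ [expr.shift]). We avoid it by shifting in a type that is
// strictly wider than AffinityMask, so a shift by the full mask width is valid.
static_assert(std::numeric_limits<unsigned long long>::digits >
                  std::numeric_limits<AffinityMask>::digits,
              "1ULL trick requires a type wider than AffinityMask; "
              "use an explicit range check instead");

AffinityMask defaultAffinityMaskDocumented(unsigned numCores) {
    // Precondition: numCores <= width of AffinityMask (the OSAL caps it).
    assert(numCores <= static_cast<unsigned>(std::numeric_limits<AffinityMask>::digits));
    return static_cast<AffinityMask>((1ULL << numCores) - 1ULL);
}
```

Changing the alias to `std::uint64_t` makes the `static_assert` fire, which is exactly the 64-core concern raised in this thread.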

I just want to avoid stepping into the same problem when we eventually provide support for 64 cores.

PS.
A big plus of using 1ULL is that it avoids a branch in the calculation.
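If avoiding the branch matters and the code should also survive a wider mask, a right shift of an all-ones value is an alternative to both proposals. This is a different technique from the one in the PR, shown only as a sketch. The name `lowBitsMaskBranchless` is hypothetical. The technique assumes that the core count is at least 1, so that the shift count `width - count` stays within `[0, width - 1]`. For `count == 0`, the shift would equal the full width and be undefined again.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <type_traits>

// Hypothetical branchless helper: lowest `count` bits set.
// Precondition: 1 <= count <= width of T (count == 0 would shift by the full width: UB).
template <typename T>
constexpr T lowBitsMaskBranchless(unsigned count) {
    static_assert(std::is_unsigned<T>::value && sizeof(T) >= sizeof(unsigned),
                  "unsigned type at least as wide as unsigned int expected");
    constexpr unsigned width = std::numeric_limits<T>::digits;
    // ~T{0} has all bits set; shifting right by (width - count) keeps `count` ones.
    return static_cast<T>(~T{0}) >> (width - count);
}

static_assert(lowBitsMaskBranchless<std::uint32_t>(32) == 0xFFFFFFFFu, "32 cores");
```

Compared with the ternary version, this trades the branch for a precondition (`1 <= count <= width`) that callers must guarantee.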

I agree that if the mask size is increased to 64 bits, we may encounter the same problem again. Currently, the maximum value is limited to 32 in the osal::getNumCores() function, which is why I made this change.
It also seems important to document this so that the issue does not occur again.

I added ticket #122 to support 64 cores, linking to the bit-shifting proposal in #113 (comment).

SangKyeong-Jeong · 2026-03-13T01:57:31Z

Done rebasing

FScholPer · 2026-03-13T15:08:43Z

@pawelrutkaq can you please check again as well?

lifecycle: Add exception handling for CPU affinity

84a8717

SangKyeong-Jeong temporarily deployed to workflow-approval March 11, 2026 04:14 — with GitHub Actions Inactive

SangKyeong-Jeong mentioned this pull request Mar 11, 2026

Bugfix: Launch Manager Process Startup Failure on CPUs with 32 Cores #112

Open

FScholPer reviewed Mar 11, 2026

SangKyeong-Jeong marked this pull request as ready for review March 12, 2026 01:02

lifecycle: Use 64-bit operations

21d3d28

Shifting a 32-bit number by 32 bits causes undefined behavior, so modify it to perform the shift operation on a 64-bit number and then cast it.

SangKyeong-Jeong requested a deployment to workflow-approval March 12, 2026 05:00 — with GitHub Actions Waiting

pawelrutkaq requested changes Mar 12, 2026

Merge branch 'eclipse-score:main' into lifecycle/bugfix/affinity-exception-handling

af714a8

SangKyeong-Jeong temporarily deployed to workflow-approval March 13, 2026 01:47 — with GitHub Actions Inactive

Merge branch 'main' into lifecycle/bugfix/affinity-exception-handling

7c700e5

FScholPer had a problem deploying to workflow-approval March 13, 2026 14:42 — with GitHub Actions Failure

FScholPer had a problem deploying to workflow-approval March 13, 2026 14:42 — with GitHub Actions Error

NicolasFussberger mentioned this pull request Mar 13, 2026

Improvement: Extend LaunchManager to support runmask for 64 cores #122

Open

FScholPer deployed to workflow-approval March 13, 2026 15:09 — with GitHub Actions Active

6 participants
