Skip to content

Conversation

@BasharRadya
Copy link
Collaborator

@BasharRadya BasharRadya commented Apr 20, 2025

User description

We cannot disable RoCE LAG (deprecated option in OFED 5.1).

Change type

What kind of change does this PR introduce?

  • Bugfix
  • Feature
  • Code style update
  • Refactoring (no functional changes, no api changes)
  • Build related changes
  • CI related changes
  • Documentation content changes
  • Tests
  • Other

Check list

  • Code follows the style de facto guidelines of this project
  • Comments have been inserted in hard to understand places
  • Documentation has been updated (if necessary)
  • Test has been added (if possible)

PR Type

Bug fix, Enhancement


Description

  • Removed deprecated RoCE LAG disable path references.

  • Simplified print_roce_lag_warnings function signature and logic.

  • Updated RoCE LAG warning messages for clarity and relevance.

  • Adjusted related function calls to match updated signature.


Changes walkthrough 📝

Relevant files
Enhancement
net_device_val.cpp
Adjusted RoCE LAG warning logic and function calls             

src/core/dev/net_device_val.cpp

  • Removed disable_path parameter from print_roce_lag_warnings calls.
  • Adjusted function calls to match updated signature.
  • Simplified logic for RoCE LAG warning handling.
  • +2/-3     
    utils.cpp
    Simplified `print_roce_lag_warnings` and updated messages

    src/core/util/utils.cpp

  • Removed disable_path parameter from print_roce_lag_warnings.
  • Updated warning messages for clarity and removed deprecated
    instructions.
  • Simplified logic in the warning function.
  • +3/-10   
    utils.h
    Updated `print_roce_lag_warnings` function signature         

    src/core/util/utils.h

  • Updated print_roce_lag_warnings function signature.
  • Removed disable_path parameter from declaration.
  • +2/-2     

    Need help?
  • Type /help how to ... in the comments thread for any questions about PR-Agent usage.
  • Check out the documentation for more information.
  • @BasharRadya BasharRadya requested a review from galnoam April 20, 2025 08:32
    @BasharRadya
    Copy link
    Collaborator Author

    bot:retest

    2 similar comments
    @BasharRadya
    Copy link
    Collaborator Author

    bot:retest

    @AlexanderGrissik
    Copy link
    Collaborator

    bot:retest

    @galnoam galnoam requested a review from tomerdbz April 28, 2025 12:41
    @tomerdbz
    Copy link
    Collaborator

    Could you please update the description to detail more about the "why"?

    @tomerdbz
    Copy link
    Collaborator

    /review

    @tomerdbz
    Copy link
    Collaborator

    /describe

    @tomerdbz
    Copy link
    Collaborator

    /improve

    @pr-review-bot-app
    Copy link

    PR Reviewer Guide 🔍

    Here are some key observations to aid the review process:

    ⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
    🧪 No relevant tests
    🔒 No security concerns identified
    ⚡ Recommended focus areas for review

    Possible Issue

    The function print_roce_lag_warnings is invoked without the bond_roce_lag_path argument in some cases, which may lead to incorrect behavior or missing information in the warning logs.

    print_roce_lag_warnings(get_ifname_link(), guid_iter->second.front().c_str(),
                            guid_iter->second.back().c_str());
    Incomplete Warning Message

    The updated print_roce_lag_warnings function no longer includes a message about disabling RoCE LAG or referencing the release notes, which might reduce clarity for users troubleshooting the issue.

    "* " PRODUCT_NAME " cannot offload the device; issue with RoCE LAG.\n");

    @pr-review-bot-app
    Copy link

    PR Description updated to latest commit (d6567d8)

    @pr-review-bot-app
    Copy link

    pr-review-bot-app bot commented Apr 30, 2025

    PR Code Suggestions ✨

    CategorySuggestion                                                                                                                                    Impact
    Possible issue
    Add bounds check for safe access

    Verify that guid_iter->second contains at least two elements before accessing
    front() and back() to avoid potential out-of-bounds access.

    src/core/dev/net_device_val.cpp [1482-1483]

    -print_roce_lag_warnings(get_ifname_link(), guid_iter->second.front().c_str(),
    -                        guid_iter->second.back().c_str());
    +if (guid_iter->second.size() >= 2) {
    +    print_roce_lag_warnings(get_ifname_link(), guid_iter->second.front().c_str(),
    +                            guid_iter->second.back().c_str());
    +}
    Suggestion importance[1-10]: 9

    __

    Why: Adding a bounds check ensures that the code does not attempt to access elements of guid_iter->second without verifying their existence, preventing potential out-of-bounds errors. This is a critical improvement for code safety and correctness.

    High
    Validate pointer before function call

    Ensure that get_ifname_link() does not return a null or invalid pointer before
    passing it to print_roce_lag_warnings to prevent undefined behavior.

    src/core/dev/net_device_val.cpp [1534]

    -print_roce_lag_warnings(get_ifname_link());
    +const char* ifname_link = get_ifname_link();
    +if (ifname_link) {
    +    print_roce_lag_warnings(ifname_link);
    +}
    Suggestion importance[1-10]: 8

    __

    Why: Validating the pointer returned by get_ifname_link() before using it in print_roce_lag_warnings prevents undefined behavior in case of a null or invalid pointer. This is an important safeguard for robust code execution.

    Medium
    General
    Improve clarity of warning message

    Clarify the warning message to provide more actionable information about the issue
    with RoCE LAG for better user understanding.

    src/core/util/utils.cpp [183]

    -vlog_printf(VLOG_WARNING, "* " PRODUCT_NAME " cannot offload the device; issue with RoCE LAG.\n");
    +vlog_printf(VLOG_WARNING, "* " PRODUCT_NAME " cannot offload the device due to an issue with RoCE LAG configuration. Please check the configuration and retry.\n");
    Suggestion importance[1-10]: 6

    __

    Why: Enhancing the warning message with more actionable information improves user understanding and usability. While this change is not critical, it contributes to better communication of issues to the user.

    Low

    Copy link
    Collaborator

    @tomerdbz tomerdbz left a comment

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    Small fixes. See also AI comments which are good imho.

    Main issue I personally have is lacking documentation in redmine ticket, in the description and in the commit message. I can't understand as a reader the justification for the patch.

    We can not disable RoCE LAG (deprecated option in OFED 5.1).
    
    Signed-off-by: Bashar Abdelgafer <[email protected]>
    @BasharRadya BasharRadya force-pushed the remove_roce_lag_warning branch from d6567d8 to e5a2dc2 Compare May 15, 2025 07:33
    @BasharRadya BasharRadya requested a review from tomerdbz May 15, 2025 07:35
    @dpressle
    Copy link
    Collaborator

    bot:retest

    @BasharRadya
    Copy link
    Collaborator Author

    @tomerdbz please review in your convenient time

    @BasharRadya
    Copy link
    Collaborator Author

    bot:retest

    @BasharRadya
    Copy link
    Collaborator Author

    @tomerdbz REMINDER - i think you approved it in VMA but not in XLIO?

    @galnoam galnoam merged commit 133d46f into Mellanox:vNext Jul 14, 2025
    1 check passed
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

    Projects

    None yet

    Development

    Successfully merging this pull request may close these issues.

    5 participants