jfeng-arista
Contributor

When a fabric link stays flaky for a long time, Orchagent can crash with an out-of-range error. The crash happens when the link-down count field read from state_db is converted to a uint8_t. This field is a cumulative count of how many times the link has gone down, so to prevent the out-of-range error the value should be converted to a larger type, such as uint64_t.

The issue is tracked at sonic-net/sonic-buildimage#24020

This PR revisits the related code and converts the related fields to uint64_t.
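
To illustrate the failure mode described above, here is a minimal sketch. It is not the Orchagent code: `parse_unsigned` is a hypothetical helper that stands in for whatever range-checked conversion is applied to the state_db string, and the value `"300"` is only an illustrative link-down count. It assumes the conversion throws when the parsed number does not fit in the target type.

```cpp
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <string>

// Hypothetical helper (an assumption, not the Orchagent implementation):
// parse a decimal string into an unsigned type T and throw std::out_of_range
// if the parsed value does not fit in T.
template <typename T>
T parse_unsigned(const std::string &text)
{
    const unsigned long long value = std::stoull(text);
    const auto max_value =
        static_cast<unsigned long long>(std::numeric_limits<T>::max());
    if (value > max_value)
    {
        throw std::out_of_range("value " + text + " does not fit in target type");
    }
    return static_cast<T>(value);
}

// Returns true when a counter string can be converted to T without throwing.
template <typename T>
bool converts_cleanly(const std::string &text)
{
    try
    {
        parse_unsigned<T>(text);
        return true;
    }
    catch (const std::out_of_range &)
    {
        return false;
    }
}
```

With these definitions, `converts_cleanly<uint8_t>("300")` is false because a uint8_t holds at most 255, while `converts_cleanly<uint64_t>("300")` is true. A cumulative counter on a long-flapping link will eventually pass 255, which is why the wider type is the safer choice.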
What I did

Why I did it

How I verified it

Details if related

@mssonicbld
Collaborator

/azp run

@jfeng-arista
Contributor Author

Azure Pipelines successfully started running 1 pipeline(s).

@jfeng-arista
Contributor Author

It looks like the run failed with:

2025-09-17T18:59:11.7239608Z Traceback (most recent call last):
2025-09-17T18:59:11.7239766Z File "/agent/_work/1/s/tests/conftest.py", line 187, in __del__
2025-09-17T18:59:11.7240063Z neighbors = self.get_keys(self.NEIGH_TABLE)
2025-09-17T18:59:11.7240239Z File "/agent/_work/1/s/tests/dvslib/dvs_database.py", line 138, in get_keys
2025-09-17T18:59:11.7240406Z table = swsscommon.Table(self.db_connection, table_name)
2025-09-17T18:59:11.7240590Z File "/usr/lib/python3/dist-packages/swsscommon/swsscommon.py", line 3180, in __init__
2025-09-17T18:59:11.7240777Z _swsscommon.Table_swiginit(self, _swsscommon.new_Table(*args))
2025-09-17T18:59:11.7240979Z RuntimeError: Unable to connect to redis (unix-socket) - No such file or directory(1): Cannot assign requested address
2025-09-17T18:59:11.7241501Z
2025-09-17T18:59:11.7241780Z Enable tracemalloc to get traceback where the object was allocated.
2025-09-17T18:59:11.7242121Z See https://docs.pytest.org/en/stable/how-to/capture-warnings.html#resource-warnings for more info.
2025-09-17T18:59:11.7242527Z warnings.warn(pytest.PytestUnraisableExceptionWarning(msg))

This does not look related to the change.

@jfeng-arista
Contributor Author

ERROR p4rt/test_l3.py::TestP4RTL3::test_IPv4RouteWithNexthopAddUpdateDeletePass
ERROR p4rt/test_l3.py::TestP4RTL3::test_IPv6WithWcmpRouteAddUpdateDeletePass

These do not look related to the change.

@jfeng-arista
Contributor Author

/Azp run Azure.sonic-swss

Commenter does not have sufficient privileges for PR 3889 in repo sonic-net/sonic-swss

@jfeng-arista
Contributor Author

/Azpw run Azure.sonic-swss

@mssonicbld
Collaborator

/AzurePipelines run Azure.sonic-swss

Azure Pipelines successfully started running 1 pipeline(s).

@rlhui rlhui added the P0 label Sep 23, 2025
@rlhui rlhui requested a review from vmittal-msft September 23, 2025 04:03
@vmittal-msft
Contributor

@jfeng-arista please rebase.

@mssonicbld
Collaborator

/azp run

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Collaborator

/azp run

Azure Pipelines successfully started running 1 pipeline(s).

@jfeng-arista
Contributor Author

Updated, and all checks passed.

@mssonicbld
Collaborator

/azp run

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Collaborator

/azp run

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Collaborator

/azp run

Azure Pipelines successfully started running 1 pipeline(s).

@abdosi
Contributor

abdosi commented Oct 1, 2025

@prsunny can you help merge this?
