Skip to content

Conversation

@qiluo-msft
Copy link
Contributor

@qiluo-msft qiluo-msft commented Nov 25, 2025

unwrap() function may crash in runtime.
Drop should not crash.
Clone should not crash, so remove it.
Protect casting Duration

Depends on sonic-net/sonic-dash-ha#132

@mssonicbld
Copy link
Collaborator

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@qiluo-msft qiluo-msft marked this pull request as ready for review November 25, 2025 00:46
Copilot AI review requested due to automatic review settings November 25, 2025 00:46
Copilot finished reviewing on behalf of qiluo-msft November 25, 2025 00:49
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR improves Rust code safety by removing risky operations (unwrap(), expect()) and unnecessary type casting. The changes span both Rust FFI bindings and C API implementations, changing file descriptor types from uint32_t to int32_t to match the underlying C++ int getFd() return type, and replacing Duration parameters with u32 timeout_ms for more direct control.

  • Changed file descriptor types from uint32_t to int32_t in C API to match underlying C++ implementation
  • Removed unwrap() calls when creating BorrowedFd since type casting is no longer needed
  • Replaced expect() calls in Drop implementations with eprintln! to prevent panics during cleanup
  • Changed read_data() API from accepting Duration to u32 timeout_ms (breaking change)
  • Removed Clone trait implementation for DbConnector to prevent potential panics

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated no comments.

Show a summary per file
File Description
tests/c_api_ut.cpp Updated test file descriptors from uint32_t to int32_t
common/c-api/consumerstatetable.h Changed getFd signature to return int32_t instead of uint32_t
common/c-api/consumerstatetable.cpp Removed numeric_cast from getFd implementation
common/c-api/subscriberstatetable.h Changed getFd signature to return int32_t instead of uint32_t
common/c-api/subscriberstatetable.cpp Removed numeric_cast from getFd implementation
common/c-api/zmqconsumerstatetable.h Changed getFd signature to return int32_t instead of uint32_t
common/c-api/zmqconsumerstatetable.cpp Removed numeric_cast from getFd implementation
crates/swss-common/src/types/consumerstatetable.rs Removed Duration import, unwrap() call, and updated Drop to use eprintln!
crates/swss-common/src/types/subscriberstatetable.rs Removed Duration import, unwrap() call, and updated Drop to use eprintln!
crates/swss-common/src/types/zmqconsumerstatetable.rs Removed Duration import, unwrap() call, updated Drop to use eprintln!, and improved comment capitalization
crates/swss-common/src/types/table.rs Updated Drop implementation to use eprintln! instead of expect()
crates/swss-common/src/types/producerstatetable.rs Updated Drop implementation to use eprintln! instead of expect()
crates/swss-common/src/types/zmqclient.rs Updated Drop implementation to use eprintln! instead of expect()
crates/swss-common/src/types/dbconnector.rs Removed Clone trait implementation and updated Drop to use eprintln!

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@qiluo-msft qiluo-msft marked this pull request as draft November 25, 2025 01:06
@mssonicbld
Copy link
Collaborator

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Copy link
Collaborator

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).


pub fn read_data(&self, timeout: Duration, interrupt_on_signal: bool) -> Result<SelectResult> {
let timeout_ms = timeout.as_millis().try_into().unwrap();
pub fn read_data(&self, timeout_ms: u32, interrupt_on_signal: bool) -> Result<SelectResult> {
Copy link
Contributor

@saiarcot895 saiarcot895 Nov 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason that Duration is used is to signify to the caller that this argument is a time argument, and so something time-based should be passed in. C language not having a discrete time type shouldn't be a reason not to use it in Rust code. #Resolved

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would instead recommend handling a duration that is too large gracefully: return an error saying the duration was too big (or similar).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

resolved. and also change return type.

unsafe {
if let Err(e) = swss_try!(SWSSSubscriberStateTable_free(self.ptr)) {
eprintln!("Error dropping SubscriberStateTable: {}", e);
}
Copy link
Contributor

@saiarcot895 saiarcot895 Nov 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does it mean for a *_free function to fail? The object got corrupted? I'm not sure if just printing on stderr and continuing the application is safe at that point. #Resolved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may be possible that a programmer mistake of throwing in dtor in this class or any base class or any member class. Crashing a critical daemon is not good for online service, so use eprintln as defensive programming programming.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Running after an exception is thrown in dtor is not good either; the runtime state of the program is no longer known or consistent.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I plan to fix exception is thrown in dtor issue in another future PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The future PR is #1115

unsafe {
let fd = swss_try!(p_fd => SWSSConsumerStateTable_getFd(self.ptr, p_fd))?;
let fd = BorrowedFd::borrow_raw(fd.try_into().unwrap());
let fd = BorrowedFd::borrow_raw(fd);
Copy link
Contributor

@yue-fred-gao yue-fred-gao Nov 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

borrow_raw document says "The resource pointed to by fd must remain open for the duration of the returned BorrowedFd, and it must not have the value -1.". We should check the value is not negative. #Resolved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

checked for -1, not for negative. Since BorrowedFd and hiredis cares only about -1.


pub fn read_data(&self, timeout: Duration, interrupt_on_signal: bool) -> Result<SelectResult> {
let timeout_ms = timeout.as_millis().try_into().unwrap();
pub fn read_data(&self, timeout_ms: u32, interrupt_on_signal: bool) -> Result<SelectResult> {
Copy link
Contributor

@yue-fred-gao yue-fred-gao Nov 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of changing the argument to u32, you can timeout.as_millis().try_into().map_err(|_| Error::new("Timeout value too large to convert"))? #Resolved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

resolved. and also change return type.

@mssonicbld
Copy link
Collaborator

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Copy link
Collaborator

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@qiluo-msft qiluo-msft changed the title Remove risky unwrap(), expect(), and casting Fix risky unwrap(), expect(), and casting Nov 26, 2025
@qiluo-msft qiluo-msft marked this pull request as ready for review November 26, 2025 00:07
let redis = Redis::start();
let pst = ProducerStateTable::new(redis.db_connector(), "table_a")?;
let cst = ConsumerStateTable::new(redis.db_connector(), "table_a", None, None)?;
let pst = ProducerStateTable::new(redis.db_connector(), "table_a").unwrap();
Copy link
Contributor

@saiarcot895 saiarcot895 Nov 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why unwrap() on test code? The old code will already return an error, and the test will be gracefully marked as failed. #Resolved

Copy link
Contributor Author

@qiluo-msft qiluo-msft Nov 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The challenge is that read_data, and other function? will return different Result. So it is easy to unwrap and let testcase fail. The failure message is also kind of graceful

thread 'consumer_producer_state_tables_sync_api_basic_test' panicked at crates/swss-common/tests/sync.rs:76:12:
called `Option::unwrap()` on a `None` value

test result: FAILED. 7 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out; finished in 2.43s

error: test failed, to rerun pass `-p swss-common --test sync`

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's the reason to change return type of read_data? Is it really necessary. It seems a lot of churn due to that

Copy link
Contributor Author

@qiluo-msft qiluo-msft Nov 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason is that we want to use Duration instead of u32, and the casting error handling introduce new Result type. Similar change also happens on get_fd().

pub fn read_data(&self, timeout: Duration, interrupt_on_signal: bool) -> std::io::Result<SelectResult>

Another option is to keep both function as thin wrapper of c-api and keep the most native data types, then we can keep the Result type.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can keep the Result type, which is

pub type Result<T, E = Exception> = std::result::Result<T, E>;
. You can add a constructor to Exception then you can use it in map_err.

@mssonicbld
Copy link
Collaborator

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@qiluo-msft qiluo-msft merged commit d6ccff7 into sonic-net:master Nov 29, 2025
18 checks passed
@liushilongbuaa
Copy link
Contributor

@qiluo-msft , It seems this PR breaks official build.
Why we didn't update submodule ref but it takes this commit?

@yue-fred-gao
Copy link
Contributor

@qiluo-msft , It seems this PR breaks official build. Why we didn't update submodule ref but it takes this commit?

I think this is because sonic-dash-ha and all rust components are pulling swss-common master. See https://github.com/sonic-net/sonic-dash-ha/blob/b38d8fb68c7e1a1d3c866fdb65e840c7da1021d8/Cargo.toml#L100. sonic-dash-ha has been updated to accommodate the change in swss-common. We need to update its ref in sonic-buildimage.

@yue-fred-gao
Copy link
Contributor

@qiluo-msft , It seems this PR breaks official build. Why we didn't update submodule ref but it takes this commit?

I saw the error in sonic-buildimage. The issue is that libswsscommon is built from src/sonic-swss-common mounted by sonic-buildimage but rust is pulled directly from swss-common master branch. libswsscommon lags behind rust swss-common crate and causes the compilation error. We need to update ref of swss-common in sonic-buildimage to the latest.

Going forward, probably we should change the way rust components pulling swss-common. Instead of always from the latest of the branch, they can use relative path to find swss-common in sonic-buildimage. When we cut a new branch, we don't need to update cargo.toml to the new branch (otherwise, they always use swss-common master branch).

@yue-fred-gao
Copy link
Contributor

@qiluo-msft , I submitted PR sonic-net/sonic-buildimage#24663 to fix the issue. And let's discuss the long term fix I proposed above.

@qiluo-msft qiluo-msft deleted the qiluo/unwrap branch December 1, 2025 17:37
dprital added a commit to dprital/sonic-swss-common that referenced this pull request Dec 1, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants