-
Notifications
You must be signed in to change notification settings - Fork 322
Fix risky unwrap(), expect(), and casting #1113
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: Qi Luo <[email protected]>
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR improves Rust code safety by removing risky operations (unwrap(), expect()) and unnecessary type casting. The changes span both Rust FFI bindings and C API implementations, changing file descriptor types from uint32_t to int32_t to match the underlying C++ int getFd() return type, and replacing Duration parameters with u32 timeout_ms for more direct control.
- Changed file descriptor types from
uint32_ttoint32_tin C API to match underlying C++ implementation - Removed
unwrap()calls when creatingBorrowedFdsince type casting is no longer needed - Replaced
expect()calls in Drop implementations witheprintln!to prevent panics during cleanup - Changed
read_data()API from acceptingDurationtou32 timeout_ms(breaking change) - Removed Clone trait implementation for DbConnector to prevent potential panics
Reviewed changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| tests/c_api_ut.cpp | Updated test file descriptors from uint32_t to int32_t |
| common/c-api/consumerstatetable.h | Changed getFd signature to return int32_t instead of uint32_t |
| common/c-api/consumerstatetable.cpp | Removed numeric_cast from getFd implementation |
| common/c-api/subscriberstatetable.h | Changed getFd signature to return int32_t instead of uint32_t |
| common/c-api/subscriberstatetable.cpp | Removed numeric_cast from getFd implementation |
| common/c-api/zmqconsumerstatetable.h | Changed getFd signature to return int32_t instead of uint32_t |
| common/c-api/zmqconsumerstatetable.cpp | Removed numeric_cast from getFd implementation |
| crates/swss-common/src/types/consumerstatetable.rs | Removed Duration import, unwrap() call, and updated Drop to use eprintln! |
| crates/swss-common/src/types/subscriberstatetable.rs | Removed Duration import, unwrap() call, and updated Drop to use eprintln! |
| crates/swss-common/src/types/zmqconsumerstatetable.rs | Removed Duration import, unwrap() call, updated Drop to use eprintln!, and improved comment capitalization |
| crates/swss-common/src/types/table.rs | Updated Drop implementation to use eprintln! instead of expect() |
| crates/swss-common/src/types/producerstatetable.rs | Updated Drop implementation to use eprintln! instead of expect() |
| crates/swss-common/src/types/zmqclient.rs | Updated Drop implementation to use eprintln! instead of expect() |
| crates/swss-common/src/types/dbconnector.rs | Removed Clone trait implementation and updated Drop to use eprintln! |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
|
||
| pub fn read_data(&self, timeout: Duration, interrupt_on_signal: bool) -> Result<SelectResult> { | ||
| let timeout_ms = timeout.as_millis().try_into().unwrap(); | ||
| pub fn read_data(&self, timeout_ms: u32, interrupt_on_signal: bool) -> Result<SelectResult> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason that Duration is used is to signify to the caller that this argument is a time argument, and so something time-based should be passed in. C language not having a discrete time type shouldn't be a reason not to use it in Rust code. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would instead recommend handling a duration that is too large gracefully: return an error saying the duration was too big (or similar).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
resolved. and also change return type.
| unsafe { | ||
| if let Err(e) = swss_try!(SWSSSubscriberStateTable_free(self.ptr)) { | ||
| eprintln!("Error dropping SubscriberStateTable: {}", e); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does it mean for a *_free function to fail? The object got corrupted? I'm not sure if just printing on stderr and continuing the application is safe at that point. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It may be possible that a programmer mistake of throwing in dtor in this class or any base class or any member class. Crashing a critical daemon is not good for online service, so use eprintln as defensive programming programming.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Running after an exception is thrown in dtor is not good either; the runtime state of the program is no longer known or consistent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I plan to fix exception is thrown in dtor issue in another future PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The future PR is #1115
| unsafe { | ||
| let fd = swss_try!(p_fd => SWSSConsumerStateTable_getFd(self.ptr, p_fd))?; | ||
| let fd = BorrowedFd::borrow_raw(fd.try_into().unwrap()); | ||
| let fd = BorrowedFd::borrow_raw(fd); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
borrow_raw document says "The resource pointed to by fd must remain open for the duration of the returned BorrowedFd, and it must not have the value -1.". We should check the value is not negative. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
checked for -1, not for negative. Since BorrowedFd and hiredis cares only about -1.
|
|
||
| pub fn read_data(&self, timeout: Duration, interrupt_on_signal: bool) -> Result<SelectResult> { | ||
| let timeout_ms = timeout.as_millis().try_into().unwrap(); | ||
| pub fn read_data(&self, timeout_ms: u32, interrupt_on_signal: bool) -> Result<SelectResult> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of changing the argument to u32, you can timeout.as_millis().try_into().map_err(|_| Error::new("Timeout value too large to convert"))? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
resolved. and also change return type.
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
crates/swss-common/tests/sync.rs
Outdated
| let redis = Redis::start(); | ||
| let pst = ProducerStateTable::new(redis.db_connector(), "table_a")?; | ||
| let cst = ConsumerStateTable::new(redis.db_connector(), "table_a", None, None)?; | ||
| let pst = ProducerStateTable::new(redis.db_connector(), "table_a").unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why unwrap() on test code? The old code will already return an error, and the test will be gracefully marked as failed. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The challenge is that read_data, and other function? will return different Result. So it is easy to unwrap and let testcase fail. The failure message is also kind of graceful
thread 'consumer_producer_state_tables_sync_api_basic_test' panicked at crates/swss-common/tests/sync.rs:76:12:
called `Option::unwrap()` on a `None` value
test result: FAILED. 7 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out; finished in 2.43s
error: test failed, to rerun pass `-p swss-common --test sync`
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's the reason to change return type of read_data? Is it really necessary. It seems a lot of churn due to that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason is that we want to use Duration instead of u32, and the casting error handling introduce new Result type. Similar change also happens on get_fd().
pub fn read_data(&self, timeout: Duration, interrupt_on_signal: bool) -> std::io::Result<SelectResult>
Another option is to keep both function as thin wrapper of c-api and keep the most native data types, then we can keep the Result type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can keep the Result type, which is
| pub type Result<T, E = Exception> = std::result::Result<T, E>; |
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
@qiluo-msft , It seems this PR breaks official build. |
This reverts commit d6ccff7.
I think this is because sonic-dash-ha and all rust components are pulling swss-common master. See https://github.com/sonic-net/sonic-dash-ha/blob/b38d8fb68c7e1a1d3c866fdb65e840c7da1021d8/Cargo.toml#L100. sonic-dash-ha has been updated to accommodate the change in swss-common. We need to update its ref in sonic-buildimage. |
I saw the error in sonic-buildimage. The issue is that libswsscommon is built from src/sonic-swss-common mounted by sonic-buildimage but rust is pulled directly from swss-common master branch. libswsscommon lags behind rust swss-common crate and causes the compilation error. We need to update ref of swss-common in sonic-buildimage to the latest. Going forward, probably we should change the way rust components pulling swss-common. Instead of always from the latest of the branch, they can use relative path to find swss-common in sonic-buildimage. When we cut a new branch, we don't need to update cargo.toml to the new branch (otherwise, they always use swss-common master branch). |
|
@qiluo-msft , I submitted PR sonic-net/sonic-buildimage#24663 to fix the issue. And let's discuss the long term fix I proposed above. |
This reverts commit d6ccff7.
unwrap() function may crash in runtime.
Drop should not crash.
Clone should not crash, so remove it.
Protect casting Duration
Depends on sonic-net/sonic-dash-ha#132