man/fi_av, prov/util, verbs, rxd, shm, sockets, fabtests/unit: Define out of range lookup behavior & add negative lookup test #11605
base: main
DISCUSSION: The current behavior for this case is provider-specific. Most providers use the util AV code, which does not check for this error and, on a non-debug build, will segfault in ofi_mem.h:ofi_bufpool_get_ibuf:501 when it accesses a region outside the table. Providers like ucx explicitly check for this case and return -FI_EINVAL if the requested fi_addr is out of bounds. The next most common error case is already handled: 4 addresses are inserted into an AV with a maximum of 32 entries, fi_addr 3 is removed, and fi_addr 3 is then looked up. This returns -FI_ADDR_NOTAVAIL. The out-of-bounds lookup case must either be documented as unsupported in the man page, or a common error such as -FI_EINVAL or -FI_ADDR_NOTAVAIL must be returned so that an application can continue when the lookup "fails". I believe returning -FI_ADDR_NOTAVAIL makes the most sense for both error cases, so the application can handle them the same way. No decision will be made until @j-xiong returns from vacation, but please post thoughts here.
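A minimal sketch of the bounds check being discussed, assuming a simplified table-backed AV rather than the real util AV or bufpool code. `struct toy_av`, `toy_av_lookup`, `TOY_AV_COUNT`, and `TOY_ADDR_LEN` are hypothetical names; the function only mirrors the argument shape of `fi_av_lookup`. It rejects an out-of-range `fi_addr` before indexing the table, then rejects an in-range slot that was never inserted or was removed, and returns the same error for both. That error is `-EINVAL`, as in the later commits on this PR, not `-FI_ADDR_NOTAVAIL`.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified AV: a fixed-size table of address slots. */
#define TOY_AV_COUNT 32
#define TOY_ADDR_LEN 16

struct toy_av {
	size_t count;                              /* entries requested at open time */
	bool used[TOY_AV_COUNT];                   /* slot holds an inserted address */
	uint8_t addr[TOY_AV_COUNT][TOY_ADDR_LEN];  /* raw address bytes */
};

/* Shaped like fi_av_lookup(av, fi_addr, addr, addrlen); not the real code. */
static int toy_av_lookup(const struct toy_av *av, uint64_t fi_addr,
			 void *addr, size_t *addrlen)
{
	/* Out of range: reject before touching the table at all. */
	if (fi_addr >= av->count || fi_addr >= TOY_AV_COUNT)
		return -EINVAL;

	/* In range but never inserted, or already removed. */
	if (!av->used[fi_addr])
		return -EINVAL;

	/* Copy at most the caller's buffer size, then report the full length. */
	size_t n = *addrlen < TOY_ADDR_LEN ? *addrlen : TOY_ADDR_LEN;
	memcpy(addr, av->addr[fi_addr], n);
	*addrlen = TOY_ADDR_LEN;
	return 0;
}
```

Because both failure paths return one code, an application can treat "never valid" and "no longer valid" the same way. That matches the view that the man page need not distinguish them.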
I don't think the man page needs to distinguish between "out of bounds" and "not a valid fi_addr". I think both should return the same error, indicating that the fi_addr is not valid.
Looking at the EFA code, it looks compliant with the proposal? It has the following check before accessing the AV entry and the
Great! I'll tag you when I open the PR that adds the negative test so you can run it by hand.
Maybe I am wrong; I am not sure whether … And I think that assertion will fail if the index is too large. It seems UCX checked it as …
Maybe we should just fix ofi_bufpool_ibuf_is_valid? I do not see a strong reason to assert here. @aingerson @j-xiong WDYT?
Changing that function is part of the changeset of the PR that adds the negative test. Still fixing other edge cases before opening it. |
@shijin-aws Yeah, I think it might be best to remove the assert from ofi_bufpool_ibuf_is_valid and let the caller assert if needed, depending on the use case.
I agree. Assertions are for conditions that shouldn't happen. Here the value can come from user input, so we should handle invalid input instead of asserting.
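The split agreed on above could look like this minimal sketch. The validity check becomes a pure predicate that reports an invalid index instead of asserting. A caller on a user-facing path, where the index comes from an application-supplied fi_addr, converts that result into an error code. A caller that works only with trusted internal state can still assert. `struct toy_pool`, `toy_pool_ibuf_is_valid`, `toy_lookup_index`, and `toy_internal_use` are hypothetical stand-ins, not the actual `ofi_bufpool_ibuf_is_valid` implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical buffer pool: entry_cnt slots, some currently allocated. */
struct toy_pool {
	size_t entry_cnt;
	const bool *allocated; /* allocated[i] is true if slot i is in use */
};

/* Pure predicate: no assert, because the index may be user input. */
static bool toy_pool_ibuf_is_valid(const struct toy_pool *pool, size_t index)
{
	return index < pool->entry_cnt && pool->allocated[index];
}

/* User-facing caller: an invalid index becomes an error return. */
static int toy_lookup_index(const struct toy_pool *pool, size_t index)
{
	if (!toy_pool_ibuf_is_valid(pool, index))
		return -EINVAL;
	return 0;
}

/* Internal caller: the index comes from trusted state, so asserting is fine. */
static void toy_internal_use(const struct toy_pool *pool, size_t index)
{
	assert(toy_pool_ibuf_is_valid(pool, index));
	(void)pool;  /* silence unused warnings when NDEBUG removes the assert */
	(void)index;
}
```

The bounds test comes first in the predicate, so `allocated[index]` is never read for an out-of-range index. This is the access that currently faults in the util AV path.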
bot:aws:retest
@shijin-aws This PR isn't ready for CI testing yet. I will repush with the full changeset.
f7cd052 to 885101b
Please add
e10250f to c35006c
8893a55 to 74111d0
bdb31a2 to 56b19c9
Enforce checking AV[fi_addr] unset to return EINVAL. Signed-off-by: Zach Dworkin <[email protected]>
Enforce checking AV bounds before lookup. Enforce checking AV[fi_addr] unset to return EINVAL. Signed-off-by: Zach Dworkin <[email protected]>
Enforce checking AV bounds before lookup. Enforce checking AV[fi_addr] unset to return EINVAL. Signed-off-by: Zach Dworkin <[email protected]>
MAP has been deprecated so we don't need to test for it anymore. Signed-off-by: Zach Dworkin <[email protected]>
Premise: AV opened with num_good_addr + 1 entries. Insert num_good_addr addresses so that fi_addr 0 through num_good_addr - 1 are taken.
Negative tests:
Case 1: Look up a non-inserted location within the opened range (fi_addr num_good_addr). Expected result: -EINVAL
Case 2: Look up an address outside the av_open range (fi_addr num_good_addr + 32). Expected result: -EINVAL
Case 3: Look up fi_addr UINT_MAX. Expected result: -EINVAL
Case 4: Remove an address, then look up its location (remove fi_addr 0, then look up fi_addr 0). Expected result: -EINVAL
Signed-off-by: Zach Dworkin <[email protected]>
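The four negative cases above can be sketched as a small self-checking driver. This sketch assumes a toy in-memory AV instead of the real fabtests harness and `fi_av_*` calls. `struct toy_av`, `toy_av_lookup`, `toy_av_remove`, `run_negative_lookup_cases`, and the value `NUM_GOOD_ADDR = 4` are all hypothetical. It follows the stated premise by opening num_good_addr + 1 slots and filling the first num_good_addr. It then counts how many cases fail to produce `-EINVAL`.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_GOOD_ADDR 4 /* hypothetical value chosen for illustration */

/* Hypothetical stand-in for an AV opened with NUM_GOOD_ADDR + 1 entries. */
struct toy_av {
	size_t count;
	bool used[NUM_GOOD_ADDR + 1];
};

static int toy_av_lookup(const struct toy_av *av, uint64_t fi_addr)
{
	/* Bounds check first, then the "slot unset" check. */
	if (fi_addr >= av->count || !av->used[fi_addr])
		return -EINVAL;
	return 0;
}

static int toy_av_remove(struct toy_av *av, uint64_t fi_addr)
{
	if (fi_addr >= av->count || !av->used[fi_addr])
		return -EINVAL;
	av->used[fi_addr] = false;
	return 0;
}

/* Runs the negative cases; returns the number of failed checks. */
static int run_negative_lookup_cases(void)
{
	struct toy_av av = { .count = NUM_GOOD_ADDR + 1 };
	int failures = 0;

	/* Premise: fi_addr 0 .. NUM_GOOD_ADDR - 1 are taken. */
	for (size_t i = 0; i < NUM_GOOD_ADDR; i++)
		av.used[i] = true;

	/* Case 1: inside the opened range, never inserted. */
	failures += toy_av_lookup(&av, NUM_GOOD_ADDR) != -EINVAL;
	/* Case 2: beyond the range given at open time. */
	failures += toy_av_lookup(&av, NUM_GOOD_ADDR + 32) != -EINVAL;
	/* Case 3: extreme value. */
	failures += toy_av_lookup(&av, UINT_MAX) != -EINVAL;
	/* Case 4: remove fi_addr 0, then look it up. */
	failures += toy_av_remove(&av, 0) != 0;
	failures += toy_av_lookup(&av, 0) != -EINVAL;

	return failures;
}
```

Counting failures instead of stopping at the first one makes it easy to see which providers miss several cases at once when run by hand.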
Combine declarations, fix calloc, make failure strings better. Signed-off-by: Zach Dworkin <[email protected]>
Calling fi_av_lookup with an fi_addr larger than the maximum number of entries in the AV results in undefined behavior. Applications are expected not to look up an out-of-bounds fi_addr.
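Under this wording, the range check moves to the application. A minimal sketch of such a guard follows. It assumes the application never inserts more addresses than the count it requested when opening the AV, so that count can serve as the upper bound. `guarded_lookup`, `lookup_fn`, and `stub_lookup` are hypothetical. The callback only mirrors the argument shape of `fi_av_lookup` with the AV handle reduced to `void *`, and `stub_lookup` stands in for a real provider call.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback shaped like fi_av_lookup(av, fi_addr, addr, addrlen). */
typedef int (*lookup_fn)(void *av, uint64_t fi_addr, void *addr, size_t *addrlen);

/* Application-side guard: an out-of-range fi_addr never reaches the provider.
 * av_count is assumed to be an upper bound on every fi_addr handed out. */
static int guarded_lookup(lookup_fn lookup, void *av, size_t av_count,
			  uint64_t fi_addr, void *addr, size_t *addrlen)
{
	if (fi_addr >= av_count)
		return -EINVAL;
	return lookup(av, fi_addr, addr, addrlen);
}

/* Stand-in for a provider lookup in this sketch; always succeeds. */
static int stub_lookup(void *av, uint64_t fi_addr, void *addr, size_t *addrlen)
{
	(void)av;
	(void)fi_addr;
	(void)addr;
	(void)addrlen;
	return 0;
}
```

If a provider may grow the AV beyond the requested count, this bound would reject valid addresses. The guard is only as accurate as the application's bookkeeping.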