-
Notifications
You must be signed in to change notification settings - Fork 43
Use string_view in address methods
#392
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: branch-0.44
Are you sure you want to change the base?
Conversation
When setting or getting the underlying string data used for an `Address` object, avoid extra copying by using `string_view`. During initialization a copy is performed into the `Address` object's local stroage, which makes sense. Though the input to the function need not copy to `string`. It can just be viewed while the data is copied into the object's local memory. Further when converting the `Address` to a string, a view can be used to avoid a copy during the return. If the caller would like to persist the string longer than the lifetime of the `Address` object, they can copy it at that point. Though we can save them a copy before then. Especially if they will use the string while the `Address` object is present.
|
Thanks for submitting this John! Unsurprisingly the tests failed: That's because the Cython API has not been updated as well. Should we make all the necessary changes here, including bumping Cython to 3.1 and leave the PR ready for the time Cython 3.1 is out? |
|
Ah sometimes the Cython API will let these things slide as it is usually just checking that something will work with the underlying C++ API (as opposed to enforcing it) We could certainly do that For now would it make sense to use |
|
|
||
| std::shared_ptr<Address> createAddressFromString(std::string addressString) | ||
| std::shared_ptr<Address> createAddressFromString(std::string_view addressString) | ||
| { | ||
| ucp_address_t* address = reinterpret_cast<ucp_address_t*>(new char[addressString.length()]); | ||
| size_t length = addressString.length(); | ||
| memcpy(address, addressString.c_str(), length); | ||
| ucp_address_t* address = reinterpret_cast<ucp_address_t*>(new char[length]); | ||
| memcpy(address, addressString.data(), length); | ||
| return std::shared_ptr<Address>(new Address(nullptr, address, length)); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For additional context, this is used in one place in Cython where we already have something that could be std::string_view compliant (a Python bytes object). Currently in the construction we...
- Copy the
bytesobject into astd::string - Copy the
std::stringto thecreateAddressFromStringfunction call (noconst std::string¤tly) - Allocate and copy the
std::stringto achar[]
ucxx/python/ucxx/ucxx/_lib/libucxx.pyx
Lines 407 to 416 in 4cd6354
| @classmethod | |
| def create_from_buffer(cls, bytes buf) -> UCXAddress: | |
| cdef UCXAddress address = UCXAddress.__new__(UCXAddress) | |
| cdef string address_str = string(<const char*>buf, len(buf)) | |
| with nogil: | |
| address._address = createAddressFromString(address_str) | |
| address._handle = address._address.get().getHandle() | |
| address._length = address._address.get().getLength() | |
| address._string = address._address.get().getString() |
- Note there is a
getStringcall at the end doing one more copy
|
|
||
| std::string Address::getString() const | ||
| std::string_view Address::getString() const | ||
| { | ||
| return std::string{reinterpret_cast<char*>(_handle), _length}; | ||
| return std::string_view{reinterpret_cast<const char*>(_handle), _length}; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NAK, this is horrible because now I have to remember to tie the lifetime of the returned object to the Address. And the docstring is now wrong.
I cannot believe this is worth it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good point, we can make it the user's responsibility to ensure the lifetime, this is already the case for the UCX handles. I don't have a strong objection either way, I'm a little more inclined towards Lawrence's point simply because that would avoid the potential to dereference an invalid pointer, whereas if attempting to use the address (that has become invalid because the worker is not available anymore) would simply result in an error from UCX/exception from UCXX.
@wence- @jakirkham do you have strong opinions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAICT there are only two uses of this method. Both involve the creation of a Cython UCXAddress object
- From
bytesas described above (currently 4 allocations & copies): Usestring_viewin address methods #392 (comment) - From the Cython
UCXWorkerdescribed below
In the Cython UCXWorker case, we take and store a copy of the std::shared_ptr<Address>. The Address in turn owns and keeps alive the ucp_address_t*/char* representation. Additionally we...
- Allocate and copy
ucp_address_t*tostd::string - Copy this
std::stringinto the CythonUCXAddress
ucxx/python/ucxx/ucxx/_lib/libucxx.pyx
Lines 395 to 405 in 4cd6354
| @classmethod | |
| def create_from_worker(cls, UCXWorker worker) -> UCXAddress: | |
| cdef UCXAddress address = UCXAddress.__new__(UCXAddress) | |
| with nogil: | |
| address._address = worker._worker.get().getAddress() | |
| address._handle = address._address.get().getHandle() | |
| address._length = address._address.get().getLength() | |
| address._string = address._address.get().getString() | |
| return address |
In both cases we are doing a fair bit of intermediate object creation and copying. The goal of this PR was to cutdown on that
Though would acknowledge that
- Many ways to achieve the same goal
- Some amount of creation and copying may be needed
That said, think it would be good if we could reduce the object creation and copying. Ideally deferring copying until the final object is constructed
Side note: std::string can be trivially constructed from a std::string_view
Yes, using |
When setting or getting the underlying string data used for an
Addressobject, avoid extra copying by usingstring_view. During initialization a copy is performed into theAddressobject's local stroage, which makes sense. Though the input to the function need not copy tostring. It can just be viewed while the data is copied into the object's local memory. Further when converting theAddressto a string, a view can be used to avoid a copy during the return. If the caller would like to persist the string longer than the lifetime of theAddressobject, they can copy it at that point. Though we can save them a copy before then. Especially if they will use the string while theAddressobject is present.