Very good question!

First, libcu++ atomics currently rely on implementation details (EDIT: the CUDA 13.0 PTX Atomic ABI docs enable any software to make use of them) which, on currently supported platforms, allow libcu++ to lower:

  • sequentially-consistent stores to `fence.sc; st.relaxed;` instead of `fence.sc; st.release;`
  • sequentially-consistent RMWs to `fence.sc; atom.acquire;` instead of `fence.sc; atom.acq_rel;`

libcu++ is closely tied to the implementation (CUDA Toolkit, compiler, driver, hardware), and if the above changes, we'll update it accordingly.
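To make the two cases concrete, here is a minimal host-side sketch of the source-level operations being discussed. It uses `std::atomic` so that it compiles with a plain C++ compiler; on the device the corresponding type would be `cuda::atomic` from `<cuda/atomic>`, which I am assuming is used with the same `store` / `fetch_add` calls. The helper names `publish`, `bump`, and `self_check` are made up for illustration, and the PTX sequences in the comments are simply the ones quoted above, not something this host code emits.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical helper: a sequentially-consistent store.
// Per the lowering described above, the libcu++ device equivalent is
// currently emitted as  fence.sc; st.relaxed;  rather than
// fence.sc; st.release;
inline void publish(std::atomic<int>& slot, int value) {
    slot.store(value, std::memory_order_seq_cst);
}

// Hypothetical helper: a sequentially-consistent read-modify-write.
// Per the lowering described above, the libcu++ device equivalent is
// currently emitted as  fence.sc; atom.acquire;  rather than
// fence.sc; atom.acq_rel;
// Returns the value held before the increment.
inline int bump(std::atomic<int>& counter) {
    return counter.fetch_add(1, std::memory_order_seq_cst);
}

// Single-threaded sanity check of the helpers' functional behaviour.
// It says nothing about ordering, which is the point of the discussion.
inline void self_check() {
    std::atomic<int> slot{0};
    publish(slot, 42);
    assert(slot.load() == 42);

    std::atomic<int> counter{7};
    assert(bump(counter) == 7);   // fetch_add returns the old value
    assert(counter.load() == 8);
}
```

Calling `self_check()` from `main` exercises both helpers. The source code is identical under either lowering; only the instructions generated for the store and the RMW after the `fence.sc` differ.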

Second, you are totally right that the current expansion is not correct according to the model published in the ASPLOS ’19 paper or the PTX Atomics ABI.

Answer selected by admbbs