cj401-amd

This PR is based on ROCm/xla#302 (the PR for xla 0.6.0) and on upstream PR openxla/xla#29769.

Integrates rocprofiler-sdk for improved profiling, using rocprofiler_force_configure() and annotations. Both time-based and step-based profiling are supported.

  • Keep roctracer (v1) for ROCm versions < 6.3
  • Select rocprofiler-sdk or roctracer at compile time with a ROCm version guard
  • Add unit tests for rocm_collector and rocm_tracer for v3 (ROCm version >= 6.3)
  • TODO: figure out how to add more kernel-related stats, e.g., kernel size, occupancy, DMA copy, etc.
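
For reviewers unfamiliar with the pattern, here is a minimal sketch of what a compile-time version guard of this kind can look like. It is not the code in this PR: the macro `EXAMPLE_ROCM_VERSION`, its `major * 10000 + minor * 100 + patch` encoding, and the names `kProfilerBackend` and `SelectBackendFor` are all hypothetical and used only for illustration.

```cpp
#include <string_view>

// Hypothetical version macro (an assumption, not the macro used in the PR).
// Encoded as major * 10000 + minor * 100 + patch, so ROCm 6.3.0 is 60300.
#ifndef EXAMPLE_ROCM_VERSION
#define EXAMPLE_ROCM_VERSION 60300
#endif

#if EXAMPLE_ROCM_VERSION >= 60300
// ROCm >= 6.3: only the rocprofiler-sdk (v3) path is compiled in.
constexpr std::string_view kProfilerBackend = "rocprofiler-sdk";
#else
// ROCm < 6.3: only the roctracer (v1) path is compiled in.
constexpr std::string_view kProfilerBackend = "roctracer";
#endif

// Hypothetical helper that mirrors the same threshold as a constant
// expression, so both sides of the guard can be checked in one build.
constexpr std::string_view SelectBackendFor(int rocm_version) {
  return rocm_version >= 60300 ? "rocprofiler-sdk" : "roctracer";
}

static_assert(SelectBackendFor(60300) == "rocprofiler-sdk");
static_assert(SelectBackendFor(60204) == "roctracer");
```

Because the choice is made by the preprocessor, a given build contains exactly one tracer implementation, which is why the v3 unit tests only apply when building against ROCm >= 6.3.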
