ENH: GPU idle_time and assorted GPU fixes #148

Merged
lithomas1 merged 3 commits into hpcgroup:develop from lithomas1:gpu-idle-time on May 2, 2025

Conversation

@lithomas1
Collaborator

  • Account for idle time on the GPU (calculated as total time minus the time spent in events).
  • Add parent/child relationships for annotations (restores the hierarchical nature of NVTX ranges).
  • Refactor the exclusive metrics calculation to account for overlap between parent and child events (a child event can run on a different device and be launched after its parent event finishes).
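
To illustrate the exclusive-metrics point, here is a minimal sketch of subtracting only the overlapping portion of each child from its parent. It is not the code in `pipit/trace.py`: the `Event` class, `overlap`, and `exclusive_time` are hypothetical names introduced for this example, and the sketch assumes that sibling events do not overlap one another.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Event:
    """Hypothetical event record; not Pipit's actual data model."""
    name: str
    start: float
    end: float
    children: List["Event"] = field(default_factory=list)


def overlap(a_start, a_end, b_start, b_end):
    """Length of the intersection of two intervals (0 if they are disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def exclusive_time(event):
    """Inclusive time minus only the part of each child that overlaps the parent.

    Assumes siblings do not overlap each other; otherwise the shared
    portion would be subtracted more than once.
    """
    inclusive = event.end - event.start
    child_overlap = sum(
        overlap(event.start, event.end, c.start, c.end) for c in event.children
    )
    return inclusive - child_overlap


# A CPU-side launch call whose GPU kernel starts after the call has returned.
kernel = Event("kernel", start=12.0, end=20.0)
launch = Event("launch", start=0.0, end=10.0, children=[kernel])

# A parent whose child partially overlaps it.
late_child = Event("child", start=8.0, end=14.0)
parent = Event("parent", start=0.0, end=10.0, children=[late_child])
```

Subtracting the full child duration would give `launch` an exclusive time of 10 - 8 = 2, even though the kernel never ran while the launch call was active. Using the overlap instead leaves the full 10 units with `launch` and 8 units with `parent`.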

@jhdavis8 requested review from Copilot and jhdavis8 on May 2, 2025, 17:26

Copilot AI left a comment

Pull Request Overview

This PR enhances GPU trace analysis by accounting for GPU idle time and fixes issues related to GPU event annotations and exclusive metric calculations.

  • Refactors the caller/callee matching logic using groupby and a helper function.
  • Updates exclusive metric calculations to correctly handle overlapping GPU event timings.
  • Revises GPU idle time computation using groupby to streamline the calculation.
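
As a rough illustration of the idle-time idea (total time minus time spent in events, computed per group), here is a minimal sketch. It is not the PR's implementation: the PR uses groupby, presumably on a pandas DataFrame, while this example stands in a plain dictionary grouping. The function names and the `(device, start, end)` tuple layout are hypothetical. Merging overlapping events before subtracting, and measuring total time over the whole trace rather than per device, are both assumptions of this sketch.

```python
from collections import defaultdict


def busy_time(intervals):
    """Total length of the union of (start, end) intervals."""
    total = 0.0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # Close the previous merged interval and start a new one.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlaps (or touches) the current merged interval: extend it.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def idle_time_by_device(events, trace_start, trace_end):
    """Idle time per device for an iterable of (device, start, end) tuples."""
    by_device = defaultdict(list)
    for device, start, end in events:
        by_device[device].append((start, end))
    total = trace_end - trace_start
    return {device: total - busy_time(iv) for device, iv in by_device.items()}


events = [
    ("gpu0", 0.0, 4.0),
    ("gpu0", 3.0, 6.0),   # overlaps the previous gpu0 event
    ("gpu0", 8.0, 10.0),
    ("gpu1", 2.0, 5.0),
]
idle = idle_time_by_device(events, trace_start=0.0, trace_end=10.0)
```

Here `gpu0` is busy for 8 units (the merged span 0 to 6 plus 8 to 10), so it is idle for 2, and `gpu1` is idle for 7. Without the merge step, the overlapping `gpu0` events would be counted twice and the idle time would be underestimated.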

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| pipit/trace.py | Refactored _match_caller_callee logic, updated exclusive metric calculation and idle time computation for improved handling of GPU overlaps and asynchronous execution. |
| pipit/readers/nsight_sqlite_reader.py | Updated DataFrame conversion and sorting, and refined parallelism level assignments based on trace types. |

@jhdavis8 left a comment

Looks good, thanks @lithomas1!

@lithomas1 merged commit 97beb97 into hpcgroup:develop on May 2, 2025 (8 checks passed)
@lithomas1 deleted the gpu-idle-time branch on May 2, 2025, 18:48
@lithomas1
Collaborator Author

Thanks for the review!
