Commit 66aa7be

SWDEV-566950 - update changelog for ROCM 7.2 (#2006)
1 parent c3108fe commit 66aa7be

1 file changed: +37 -10 lines changed
projects/clr/CHANGELOG.md

Lines changed: 37 additions & 10 deletions
@@ -7,16 +7,49 @@ Full documentation for HIP is available at [rocm.docs.amd.com](https://rocm.docs
 ### Added
 
 * New HIP APIs
-    - `hipLibraryEnumerateKernels` Return Kernel handles within a library
-    - `hipKernelGetLibrary` Return Library handle for a hipKernel_t handle
-    - `hipKernelGetName` Return function name for a hipKernel_t handle
+    - `hipLibraryEnumerateKernels` returns kernel handles within a library
+    - `hipKernelGetLibrary` returns library handle for a hipKernel_t handle
+    - `hipKernelGetName` returns function name for a hipKernel_t handle
     - `hipLibraryLoadData` creates library object from code
     - `hipLibraryLoadFromFile` creates library object from file
     - `hipLibraryUnload` unloads library
     - `hipLibraryGetKernel` gets a kernel from library
     - `hipLibraryGetKernelCount` gets kernel count in library
     - `hipStreamCopyAttributes` copies attributes from source stream to destination stream
-    - `hipOccupancyAvailableDynamicSMemPerBlock` Returns dynamic shared memory available per block when launching numBlocks blocks on CU.
+    - `hipOccupancyAvailableDynamicSMemPerBlock` returns dynamic shared memory available per block when launching numBlocks blocks on CU.
+* New HIP flags
+    - `hipMemLocationTypeHost` enables handling virtual memory management in host memory location, in addition to device memory.
+    - Support for flags in hipGetProcAddress, enables searching for the per-thread version symbols.
+        - `HIP_GET_PROC_ADDRESS_DEFAULT`
+        - `HIP_GET_PROC_ADDRESS_LEGACY_STREAM`
+        - `HIP_GET_PROC_ADDRESS_PER_THREAD_DEFAULT_STREAM`
+
+### Resolved issues
+
+* Corrected the calculation of the value of maximum shared memory per multiprocessor, in HIP device properties.
+
+### Optimized
+
+* Graph node scaling:
+    HIP runtime implements optimized doorbell ring mechanism for certain topologies of graph execution. It enables efficient batching of graph nodes. This enhancement provides better alignment with CUDA Graph optimizations.
+    HIP also adds a new performance test for HIP graphs with programmable topologies to measure graph performance across different structures. The test evaluates graph instantiation time, first launch time, repeat launch times, and end-to-end execution for various graph topologies. The test implements comprehensive timing measurements including CPU overhead and device execution time.
+* Back memory set (memset) optimization:
+    HIP runtime now implements a back memory set (memset) optimization to improve how memset nodes are processed during graph execution. This enhancement specifically handles varying number of AQL (Architected Queue Language) packets for memset graph node due to graph node set params for AQL batch submission approach.
+* Async handler performance improvement:
+    HIP runtime has removed the lock contention in async handler enqueue path. This enhancement reduces runtime overhead and maximizes GPU throughput, for asynchronous kernel execution, especially in multi-threaded applications.
+
+## HIP 7.1.1 for ROCm 7.1.1
+
+### Added
+
+* Support for the flag hipHostRegisterIoMemory in hipHostRegister, used to register I/O memory with HIP runtime so it can be accessed by the GPU.
+
+### Resolved issues
+
+* Incorrect Compute Unit (CU) mask in logging. HIP runtime now correctly sets the field width for the output print operation. When logging is enabled via the environment variable AMD_LOG_LEVEL, the runtime logs the accurate CU mask.
+* A segmentation fault occurred when dynamic queue management mechanism was enabled. HIP runtime now ensures GPU queues aren't NULL during marker submission, preventing crashes and improving robustness.
+* An error encountered on hip tear-down after device reset in certain applications due to accessing stale memory objects. HIP runtime now properly releases memory associated with host calls, ensuring reliable device resets.
+* A race condition occurred in certain graph-related applications when pending asynchronous signal handlers referenced device memory that had already been released, leading to memory corruption. HIP runtime now uses a reference counting strategy to manage access to device objects in asynchronous event handlers, ensuring safe and reliable memory usage.
 
 ## HIP 7.1 for ROCm 7.1
 

@@ -48,12 +81,6 @@ Full documentation for HIP is available at [rocm.docs.amd.com](https://rocm.docs
     - `hipLibraryUnload` Unload library
     - `hipLibraryGetKernel` Get a kernel from library
     - `hipLibraryGetKernelCount` Get kernel count in library
-* Changed HIP APIs
-    - `hipMemAllocationType` now has hip exclusive enum hipMemAllocationTypeUncached
-    - `hipMemCreate` now checks for hipMemAllocationTypeUncached enum from
-      hipMemAllocationType and allocates uncached memory if so
-    - `hipHostRegister` now supports hipHostRegisterIoMemory flag
-* Support for the flag `hipMemLocationTypeHost`, enables handling virtual memory management in host memory location, in addition to device memory.
 * Support for nested tile partitioning within cooperative groups, matching NVIDIA CUDA functionality.
 
 ### Resolved issues
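
Note on the 7.1.1 entry about the CU mask in logging: the changelog attributes the fix to setting the field width for the print operation but does not show the runtime's logging code. The sketch below illustrates why field width matters when a mask is printed word by word. It is plain C++, not HIP runtime code; the helper name `format_cu_mask` and the layout (32-bit words, least significant first) are assumptions made for illustration.

```cpp
#include <cstdint>
#include <iomanip>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper, not part of the HIP runtime: renders a CU mask stored
// as 32-bit words (least significant word first) as a single hex string.
std::string format_cu_mask(const std::vector<std::uint32_t>& words) {
    std::ostringstream out;
    out << "0x" << std::hex << std::setfill('0');
    // Emit the most significant word first.
    for (auto it = words.rbegin(); it != words.rend(); ++it) {
        // setw applies only to the next insertion, so it is set for every word.
        // Without it, leading zeros are dropped and the words run together.
        out << std::setw(8) << *it;
    }
    return out.str();
}
```

With a fixed width of 8 hex digits per word, a two-word mask `{0xF, 0x1}` keeps each word in its own position; without the width, the same mask would collapse to `0x1f` and be indistinguishable from a single-word mask.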

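
Note on the 7.1.1 race-condition entry: the changelog says the runtime now uses reference counting so that pending asynchronous handlers cannot touch device objects that were already released, but gives no implementation detail. The following is a minimal, single-threaded sketch of that general idea using `std::shared_ptr`. The types `DeviceObject` and `HandlerQueue` are hypothetical stand-ins, not HIP runtime classes, and the real runtime may well use its own reference-counting scheme.

```cpp
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a device-side object; not a HIP runtime type.
struct DeviceObject {
    explicit DeviceObject(bool* released) : released_(released) {}
    ~DeviceObject() { *released_ = true; }  // records when the object is freed
    bool* released_;
};

// Hypothetical queue of pending asynchronous handlers.
class HandlerQueue {
public:
    // The handler captures a shared_ptr, so it holds one reference of its own.
    void enqueue(std::shared_ptr<DeviceObject> obj) {
        pending_.emplace_back([obj = std::move(obj)] {
            // A real handler would use *obj here; it is guaranteed to be alive.
        });
    }

    // Run every pending handler, then drop them. Dropping a handler releases
    // the reference it held.
    void drain() {
        for (auto& handler : pending_) handler();
        pending_.clear();
    }

private:
    std::vector<std::function<void()>> pending_;
};
```

In this sketch, if the owner drops its own pointer while a handler is still queued, the object stays alive until `drain()` has run and cleared the handler. That is the property the changelog entry describes: release is deferred until the last pending handler is done with the object.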