projects/clr/CHANGELOG.md
37 additions & 10 deletions
@@ -7,16 +7,49 @@ Full documentation for HIP is available at [rocm.docs.amd.com](https://rocm.docs
 ### Added
 
 * New HIP APIs
-    - `hipLibraryEnumerateKernels` Return Kernel handles within a library
-    - `hipKernelGetLibrary` Return Library handle for a hipKernel_t handle
-    - `hipKernelGetName` Return function name for a hipKernel_t handle
+    - `hipLibraryEnumerateKernels` returns kernel handles within a library
+    - `hipKernelGetLibrary` returns library handle for a hipKernel_t handle
+    - `hipKernelGetName` returns function name for a hipKernel_t handle
     - `hipLibraryLoadData` creates library object from code
     - `hipLibraryLoadFromFile` creates library object from file
     - `hipLibraryUnload` unloads library
     - `hipLibraryGetKernel` gets a kernel from library
     - `hipLibraryGetKernelCount` gets kernel count in library
     - `hipStreamCopyAttributes` copies attributes from source stream to destination stream
-    - `hipOccupancyAvailableDynamicSMemPerBlock` Returns dynamic shared memory available per block when launching numBlocks blocks on CU.
+    - `hipOccupancyAvailableDynamicSMemPerBlock` returns dynamic shared memory available per block when launching numBlocks blocks on CU.
+* New HIP flags
+    - `hipMemLocationTypeHost` enables handling virtual memory management in host memory location, in addition to device memory.
+    - Support for flags in hipGetProcAddress, enables searching for the per-thread version symbols.
+        - `HIP_GET_PROC_ADDRESS_DEFAULT`
+        - `HIP_GET_PROC_ADDRESS_LEGACY_STREAM`
+        - `HIP_GET_PROC_ADDRESS_PER_THREAD_DEFAULT_STREAM`
+
+### Resolved issues
+
+* Corrected the calculation of the value of maximum shared memory per multiprocessor, in HIP device properties.
+
+### Optimized
+
+* Graph node scaling:
+    HIP runtime implements optimized doorbell ring mechanism for certain topologies of graph execution. It enables efficient batching of graph nodes. This enhancement provides better alignment with CUDA Graph optimizations.
+    HIP also adds a new performance test for HIP graphs with programmable topologies to measure graph performance across different structures. The test evaluates graph instantiation time, first launch time, repeat launch times, and end-to-end execution for various graph topologies. The test implements comprehensive timing measurements including CPU overhead and device execution time.
+* Back memory set (memset) optimization:
+    HIP runtime now implements a back memory set (memset) optimization to improve how memset nodes are processed during graph execution. This enhancement specifically handles varying number of AQL (Architected Queue Language) packets for memset graph node due to graph node set params for AQL batch submission approach.
+* Async handler performance improvement:
+    HIP runtime has removed the lock contention in async handler enqueue path. This enhancement reduces runtime overhead and maximizes GPU throughput, for asynchronous kernel execution, especially in multi-threaded applications.
+
+## HIP 7.1.1 for ROCm 7.1.1
+
+### Added
+
+* Support for the flag hipHostRegisterIoMemory in hipHostRegister, used to register I/O memory with HIP runtime so it can be accessed by the GPU.
+
+### Resolved issues
+
+* Incorrect Compute Unit (CU) mask in logging. HIP runtime now correctly sets the field width for the output print operation. When logging is enabled via the environment variable AMD_LOG_LEVEL, the runtime logs the accurate CU mask.
+* A segmentation fault occurred when dynamic queue management mechanism was enabled. HIP runtime now ensures GPU queues aren't NULL during marker submission, preventing crashes and improving robustness.
+* An error encountered on hip tear-down after device reset in certain applications due to accessing stale memory objects. HIP runtime now properly releases memory associated with host calls, ensuring reliable device resets.
+* A race condition occurred in certain graph-related applications when pending asynchronous signal handlers referenced device memory that had already been released, leading to memory corruption. HIP runtime now uses a reference counting strategy to manage access to device objects in asynchronous event handlers, ensuring safe and reliable memory usage.
 
 ## HIP 7.1 for ROCm 7.1
 
@@ -48,12 +81,6 @@ Full documentation for HIP is available at [rocm.docs.amd.com](https://rocm.docs
     - `hipLibraryUnload` Unload library
     - `hipLibraryGetKernel` Get a kernel from library
     - `hipLibraryGetKernelCount` Get kernel count in library
-* Changed HIP APIs
-    - `hipMemAllocationType` now has hip exclusive enum hipMemAllocationTypeUncached
-    - `hipMemCreate` now checks for hipMemAllocationTypeUncached enum from
-      hipMemAllocationType and allocates uncached memory if so
-    - `hipHostRegister` now supports hipHostRegisterIoMemory flag
-* Support for the flag `hipMemLocationTypeHost`, enables handling virtual memory management in host memory location, in addition to device memory.
 * Support for nested tile partitioning within cooperative groups, matching NVIDIA CUDA functionality.
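
The last resolved issue under HIP 7.1.1 describes a reference counting strategy that keeps device objects alive while asynchronous handlers still refer to them. The changelog does not show how the runtime implements this, so the following is only a minimal sketch of the general technique in standard C++, using `std::shared_ptr` as the reference count. `DeviceObject`, `HandlerQueue`, and `object_survives_until_handlers_finish` are hypothetical names introduced for illustration and are not HIP runtime types or APIs.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <vector>

// Hypothetical stand-in for a device-side object; not a HIP runtime type.
struct DeviceObject {
    explicit DeviceObject(bool* released_flag) : released(released_flag) {}
    ~DeviceObject() { *released = true; }  // records that the object was freed
    int value = 42;
    bool* released;
};

// Hypothetical queue of pending asynchronous handlers.
class HandlerQueue {
public:
    // Each handler captures its own shared_ptr copy, so it holds a
    // reference to the object until the handler itself is destroyed.
    void enqueue(std::shared_ptr<DeviceObject> obj) {
        pending_.push_back([obj]() { return obj->value; });
    }

    // Runs every pending handler, then discards them, which drops
    // the references they were holding. Returns the sum of results.
    int drain() {
        int sum = 0;
        for (auto& handler : pending_) sum += handler();
        pending_.clear();
        return sum;
    }

private:
    std::vector<std::function<int()>> pending_;
};

// The owner drops its reference before the handler runs; the object
// must stay alive until the handler has finished and been discarded.
bool object_survives_until_handlers_finish() {
    bool released = false;
    HandlerQueue queue;
    auto obj = std::make_shared<DeviceObject>(&released);
    queue.enqueue(obj);
    obj.reset();                           // owner releases its reference
    bool alive_while_pending = !released;  // handler still holds one
    int result = queue.drain();            // handler reads a valid object
    return alive_while_pending && result == 42 && released;
}
```

In this sketch the object is destroyed only when the last holder, here the queued handler, goes away. That ordering is what prevents a pending handler from touching memory that has already been released.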