[rocPRIM][hipCUB][rocThrust][rocRAND] CP Versioning and changelog updates for 7.2 release (#3331)
## Motivation
Changelog entries, internal version numbers, and rocPRIM/rocRAND
dependency release branches need to be updated for the 7.2 release.
## Technical Details
Updates the items mentioned above. Note that hipRAND has no noteworthy
changes for 7.2.
## Test Plan
Run a build, make sure there are no cmake errors. View the changelogs to
make sure there are no formatting errors.
## Test Result
No build issues.
## Submission Checklist
---------
Co-authored-by: Wayne Franz <[email protected]>
Co-authored-by: Pratik Basyal <[email protected]>
CHANGELOG.md: 30 additions & 8 deletions
@@ -2,6 +2,35 @@
 
 Full documentation for rocPRIM is available at [https://rocm.docs.amd.com/projects/rocPRIM/en/latest/](https://rocm.docs.amd.com/projects/rocPRIM/en/latest/).
 
+## rocPRIM 4.2.0 for ROCm 7.2
+
+### Added
+
+* Added missing benchmarks, such that every autotuned specialization is now benchmarked.
+* Added a new cmake option, `BENCHMARK_USE_AMDSMI`. It is set to `OFF` by default. When this option is set to `ON`, it lets benchmarks use AMD SMI to output more GPU statistics.
+* Added the first tested example program for `device_search`, which is linked in the documentation.
+* Added `apply_config_improvements.py`, which generates improved configs by taking the best specializations from old and new configs.
+* Run the script with `--help` for usage instructions, and see `projects/rocprim/docs/concepts/tuning.rst` for documentation.
+* Kernel Tuner proof-of-concept.
+* Enhanced SPIR-V support and performance.
+
+### Optimizations
+
+* Improved performance of `device_radix_sort` onesweep variant
+
+### Resolved issues
+
+* Fixed the issue where `rocprim::device_scan_by_key` failed when performing an "in-place" inclusive scan by reusing "keys" as output, by adding a buffer to store the last keys of each block (excluding the last block). This fix only affects the specific case of reusing "keys" as output in an inclusive scan, and does not affect other cases.
+* Fixed benchmark build error on Windows.
+* Fixed offload compress build option.
+* Fixed `float_bit_mask` for `rocprim::half`.
+* Fixed handling of undefined behaviour when `__builtin_clz`, `__builtin_ctz`, and similar builtins are called.
+* Fixed potential build error with `rocprim::detail::histogram_impl`.
+
+### Known issues
+
+* Potential hang with `rocprim::partition_threeway` with large input data sizes on later ROCm builds. A workaround is currently in place.
+
 ## rocPRIM 4.1.0 for ROCm 7.1
 
 ### Added
@@ -12,11 +41,7 @@ Full documentation for rocPRIM is available at [https://rocm.docs.amd.com/projec
 * Added a new cmake option, `BUILD_OFFLOAD_COMPRESS`. When rocPRIM is build with this option enabled, the `--offload-compress` switch is passed to the compiler. This causes the compiler to compress the binary that it generates. Compression can be useful in cases where you are compiling for a large number of targets, since this often results in a large binary. Without compression, in some cases, the generated binary may become so large symbols are placed out of range, resulting in linking errors. The new `BUILD_OFFLOAD_COMPRESS` option is set to `ON` by default.
 * Added a new CMake option `-DUSE_SYSTEM_LIB` to allow tests to be built from `ROCm` libraries provided by the system.
 * Added `rocprim::apply` which applies a function to a `rocprim::tuple`.
-* Added a new cmake option, `BENCHMARK_USE_AMDSMI`. It is set to `OFF` by default. When this option is set to `ON`, it lets benchmarks use AMD SMI to output more GPU statistics.
-* Added missing benchmarks, such that every autotuned specialization is now benchmarked.
-* Added `apply_config_improvements.py`, which generates improved configs by taking the best specializations from old and new configs.
-* Run the script with `--help` for usage instructions, and see `projects/rocprim/docs/concepts/tuning.rst` for documentation.
-* Added the first tested example program for `device_search`, which is linked in the documentation.
+
 
 ### Changed
 
@@ -37,7 +62,6 @@ Full documentation for rocPRIM is available at [https://rocm.docs.amd.com/projec
 * Fixed `device_select`, `device_merge`, and `device_merge_sort` not allocating the correct amount of virtual shared memory on the host.
 * Fixed the `->` operator for the `transform_iterator`, the `texture_cache_iterator` and the `arg_index_iterator`, by now returning a proxy pointer.
 * The `arg_index_iterator` also now only returns the internal iterator for the `->`.
-* Fixed the issue where `rocprim::device_scan_by_key` failed when performing an "in-place" inclusive scan by reusing "keys" as output, by adding a buffer to store the last keys of each block (excluding the last block). This fix only affects the specific case of reusing "keys" as output in an inclusive scan, and does not affect other cases.
 
 ## rocPRIM 4.0.1 for ROCm 7.0.2
 
@@ -687,5 +711,3 @@ when the input or inital type was smaller than the output type.
 
 * Switched to HIP-Clang as the default compiler
 * CMake searches for rocPRIM locally first; if t's not found, CMake downloads it from GitHub
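The `rocprim::device_scan_by_key` entry above describes a hazard that can be illustrated on the host. The following is a minimal sketch, not rocPRIM code: it assumes a sequential, block-by-block inclusive sum scan whose output aliases the key storage, and every name in it (`blocked_scan_by_key_in_place`, `save_boundary_keys`) is hypothetical. It shows why a block needs the previous block's last key, and how snapshotting those boundary keys up front (the approach the changelog names) avoids reading an already overwritten value.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side illustration only; hypothetical names, not the rocPRIM implementation.
// Inclusive sum-scan by key, processed in fixed-size blocks (block_size > 0),
// with results written into the same storage that holds the keys.
std::vector<int> blocked_scan_by_key_in_place(std::vector<int>        keys,
                                              const std::vector<int>& values,
                                              std::size_t             block_size,
                                              bool                    save_boundary_keys)
{
    const std::size_t n          = keys.size();
    const std::size_t num_blocks = (n + block_size - 1) / block_size;

    // Snapshot the last key of every block except the last one, before any
    // block overwrites the key storage with scan results.
    std::vector<int> last_keys;
    if(save_boundary_keys)
    {
        for(std::size_t b = 0; b + 1 < num_blocks; ++b)
            last_keys.push_back(keys[(b + 1) * block_size - 1]);
    }

    int carry = 0; // running sum carried across block boundaries
    for(std::size_t b = 0; b < num_blocks; ++b)
    {
        const std::size_t begin = b * block_size;
        const std::size_t end   = begin + block_size < n ? begin + block_size : n;

        // Load this block's own keys before they are overwritten.
        const std::vector<int> block_keys(keys.begin() + begin, keys.begin() + end);

        for(std::size_t i = begin; i < end; ++i)
        {
            bool same_segment = false;
            if(i > begin)
            {
                same_segment = block_keys[i - begin] == block_keys[i - begin - 1];
            }
            else if(b > 0)
            {
                // Previous block's last key: taken from the snapshot, or (buggy
                // path) from storage that already holds an output value.
                const int prev_key = save_boundary_keys ? last_keys[b - 1] : keys[begin - 1];
                same_segment       = block_keys[0] == prev_key;
            }
            carry   = same_segment ? carry + values[i] : values[i];
            keys[i] = carry; // output aliases the key storage
        }
    }
    return keys;
}
```

With keys `{1, 1, 1, 1, 2, 2}`, all-ones values, and a block size of 2, the snapshot path yields the expected `{1, 2, 3, 4, 1, 2}`, while the path without the snapshot compares against overwritten storage and produces a different result. On a GPU the blocks run concurrently, so the real failure mode is a race rather than this deterministic ordering.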
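The changelog entry about `__builtin_clz`, `__builtin_ctz`, and similar builtins does not say how the undefined behaviour was handled. For background, the GCC/Clang builtins have an undefined result when the argument is zero, so callers usually guard that case. The sketch below shows one common guard; the wrapper names `safe_clz` and `safe_ctz` are hypothetical and are not rocPRIM functions, and returning the full bit width for zero is an assumed convention.

```cpp
#include <cassert>
#include <climits>

// Hypothetical helpers, not part of rocPRIM: guarded wrappers around the
// GCC/Clang builtins, whose result is undefined for a zero argument.
inline int safe_clz(unsigned int x)
{
    // Assumed convention: for zero, every bit counts as a leading zero.
    constexpr int bits = static_cast<int>(sizeof(unsigned int) * CHAR_BIT);
    return x == 0u ? bits : __builtin_clz(x);
}

inline int safe_ctz(unsigned int x)
{
    // Assumed convention: for zero, every bit counts as a trailing zero.
    constexpr int bits = static_cast<int>(sizeof(unsigned int) * CHAR_BIT);
    return x == 0u ? bits : __builtin_ctz(x);
}
```

Calling `safe_ctz(8u)` returns 3, and both wrappers return the bit width of `unsigned int` for a zero argument instead of relying on whatever the builtin happens to produce.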