Commit a272f74

stanleytsang-amd, umfranzw, and prbasyal-amd authored
[rocPRIM][hipCUB][rocThrust][rocRAND] CP Versioning and changelog updates for 7.2 release (#3331)
## Motivation

Changelog entries, internal version numbers, and rocPRIM/rocRAND dependency release branches need to be updated for the 7.2 release.

## Technical Details

Updates the items mentioned above. Note that hipRAND has no noteworthy changes for 7.2.

## Test Plan

Run a build and make sure there are no CMake errors. View the changelogs to make sure there are no formatting errors.

## Test Result

No build issues.

## Submission Checklist

---------

Co-authored-by: Wayne Franz <[email protected]>
Co-authored-by: Pratik Basyal <[email protected]>
1 parent 32ed0c0 commit a272f74

2 files changed

Lines changed: 31 additions & 9 deletions

CHANGELOG.md

Lines changed: 30 additions & 8 deletions
@@ -2,6 +2,35 @@
 
 Full documentation for rocPRIM is available at [https://rocm.docs.amd.com/projects/rocPRIM/en/latest/](https://rocm.docs.amd.com/projects/rocPRIM/en/latest/).
 
+## rocPRIM 4.2.0 for ROCm 7.2
+
+### Added
+
+* Added missing benchmarks, such that every autotuned specialization is now benchmarked.
+* Added a new cmake option, `BENCHMARK_USE_AMDSMI`. It is set to `OFF` by default. When this option is set to `ON`, it lets benchmarks use AMD SMI to output more GPU statistics.
+* Added the first tested example program for `device_search`, which is linked in the documentation.
+* Added `apply_config_improvements.py`, which generates improved configs by taking the best specializations from old and new configs.
+* Run the script with `--help` for usage instructions, and see `projects/rocprim/docs/concepts/tuning.rst` for documentation.
+* Kernel Tuner proof-of-concept.
+* Enhanced SPIR-V support and performance.
+
+### Optimizations
+
+* Improved performance of `device_radix_sort` onesweep variant
+
+### Resolved issues
+
+* Fixed the issue where `rocprim::device_scan_by_key` failed when performing an "in-place" inclusive scan by reusing "keys" as output, by adding a buffer to store the last keys of each block (excluding the last block). This fix only affects the specific case of reusing "keys" as output in an inclusive scan, and does not affect other cases.
+* Fixed benchmark build error on Windows.
+* Fixed offload compress build option.
+* Fixed `float_bit_mask` for `rocprim::half`.
+* Fixed handling of undefined behaviour when `__builtin_clz`, `__builtin_ctz`, and similar builtins are called.
+* Fixed potential build error with `rocprim::detail::histogram_impl`.
+
+### Known issues
+
+* Potential hang with `rocprim::partition_threeway` with large input data sizes on later ROCm builds. A workaround is currently in place.
+
 ## rocPRIM 4.1.0 for ROCm 7.1
 
 ### Added
@@ -12,11 +41,7 @@ Full documentation for rocPRIM is available at [https://rocm.docs.amd.com/projec
 * Added a new cmake option, `BUILD_OFFLOAD_COMPRESS`. When rocPRIM is build with this option enabled, the `--offload-compress` switch is passed to the compiler. This causes the compiler to compress the binary that it generates. Compression can be useful in cases where you are compiling for a large number of targets, since this often results in a large binary. Without compression, in some cases, the generated binary may become so large symbols are placed out of range, resulting in linking errors. The new `BUILD_OFFLOAD_COMPRESS` option is set to `ON` by default.
 * Added a new CMake option `-DUSE_SYSTEM_LIB` to allow tests to be built from `ROCm` libraries provided by the system.
 * Added `rocprim::apply` which applies a function to a `rocprim::tuple`.
-* Added a new cmake option, `BENCHMARK_USE_AMDSMI`. It is set to `OFF` by default. When this option is set to `ON`, it lets benchmarks use AMD SMI to output more GPU statistics.
-* Added missing benchmarks, such that every autotuned specialization is now benchmarked.
-* Added `apply_config_improvements.py`, which generates improved configs by taking the best specializations from old and new configs.
-* Run the script with `--help` for usage instructions, and see `projects/rocprim/docs/concepts/tuning.rst` for documentation.
-* Added the first tested example program for `device_search`, which is linked in the documentation.
+
 
 ### Changed
 
@@ -37,7 +62,6 @@ Full documentation for rocPRIM is available at [https://rocm.docs.amd.com/projec
 * Fixed `device_select`, `device_merge`, and `device_merge_sort` not allocating the correct amount of virtual shared memory on the host.
 * Fixed the `->` operator for the `transform_iterator`, the `texture_cache_iterator` and the `arg_index_iterator`, by now returning a proxy pointer.
 * The `arg_index_iterator` also now only returns the internal iterator for the `->`.
-* Fixed the issue where `rocprim::device_scan_by_key` failed when performing an "in-place" inclusive scan by reusing "keys" as output, by adding a buffer to store the last keys of each block (excluding the last block). This fix only affects the specific case of reusing "keys" as output in an inclusive scan, and does not affect other cases.
 
 ## rocPRIM 4.0.1 for ROCm 7.0.2
 
@@ -687,5 +711,3 @@ when the input or inital type was smaller than the output type.
 
 * Switched to HIP-Clang as the default compiler
 * CMake searches for rocPRIM locally first; if t's not found, CMake downloads it from GitHub
-
-
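The `device_scan_by_key` entry in the changelog describes an aliasing hazard that is easier to see in a small CPU model. The sketch below is not rocPRIM code: the function names, the sequential block loop, and the `save_boundary_keys` flag are all assumptions made for illustration. It only models the idea stated in the changelog, namely that when the output reuses the keys buffer, a block can no longer read its predecessor's last key from that buffer, so those boundary keys are copied to a separate buffer first.

```python
from typing import List


def scan_by_key_reference(keys: List[int], values: List[int]) -> List[int]:
    """Sequential inclusive scan by key: the sum restarts whenever the key changes."""
    out: List[int] = []
    for i, v in enumerate(values):
        if i > 0 and keys[i] == keys[i - 1]:
            out.append(out[-1] + v)
        else:
            out.append(v)
    return out


def blockwise_scan_by_key(keys: List[int], values: List[int], out: List[int],
                          block_size: int, save_boundary_keys: bool) -> None:
    """Hypothetical block-wise model; `out` may be the same list object as `keys`."""
    n = len(values)
    num_blocks = (n + block_size - 1) // block_size
    # Snapshot the last key of every block except the last one, before any output is written.
    boundary_keys = [keys[(b + 1) * block_size - 1] for b in range(num_blocks - 1)]
    carry = 0  # running sum at the end of the previous block
    for b in range(num_blocks):
        start = b * block_size
        end = min(start + block_size, n)
        local_keys = keys[start:end]      # slices copy, like loading into registers
        local_values = values[start:end]
        has_prev = b > 0
        prev_key = None
        if has_prev:
            # Without the snapshot, this reads a slot that may already hold output.
            prev_key = boundary_keys[b - 1] if save_boundary_keys else keys[start - 1]
        running = carry
        for offset, (k, v) in enumerate(zip(local_keys, local_values)):
            running = running + v if (has_prev and k == prev_key) else v
            out[start + offset] = running
            prev_key, has_prev = k, True
        carry = running


if __name__ == "__main__":
    keys, values = [7] * 6, [1] * 6
    broken = list(keys)
    blockwise_scan_by_key(broken, values, broken, block_size=3, save_boundary_keys=False)
    fixed = list(keys)
    blockwise_scan_by_key(fixed, values, fixed, block_size=3, save_boundary_keys=True)
    print(broken)  # [1, 2, 3, 1, 2, 3]
    print(fixed)   # [1, 2, 3, 4, 5, 6]
```

In this model the snapshot needs one entry per block except the last, which matches the "excluding the last block" wording in the changelog entry; the real kernel's synchronization and memory layout are not represented here.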

CMakeLists.txt

Lines changed: 1 addition & 1 deletion
@@ -184,7 +184,7 @@ list(APPEND CMAKE_PREFIX_PATH ${ROCM_PATH} ${ROCM_PATH}/hip ${ROCM_PATH}/llvm ${
 find_package(hip REQUIRED CONFIG PATHS ${HIP_DIR} ${ROCM_PATH} /opt/rocm)
 
 # Setup VERSION
-set(VERSION_STRING "4.1.0")
+set(VERSION_STRING "4.2.0")
 rocm_setup_version(VERSION ${VERSION_STRING})
 math(EXPR rocprim_VERSION_NUMBER "${rocprim_VERSION_MAJOR} * 100000 + ${rocprim_VERSION_MINOR} * 100 + ${rocprim_VERSION_PATCH}")
 
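The `math(EXPR ...)` line packs the three version components into a single integer, so the bump from `4.1.0` to `4.2.0` also changes `rocprim_VERSION_NUMBER`. The following Python sketch reproduces that arithmetic outside CMake; the function names are hypothetical, and the decoding step assumes the minor version stays below 1000 and the patch version below 100.

```python
from typing import Tuple


def version_number(version_string: str) -> int:
    """Mirror the CMake expression: MAJOR * 100000 + MINOR * 100 + PATCH."""
    major, minor, patch = (int(part) for part in version_string.split("."))
    return major * 100000 + minor * 100 + patch


def split_version_number(number: int) -> Tuple[int, int, int]:
    """Inverse of version_number, valid while minor < 1000 and patch < 100."""
    return number // 100000, (number % 100000) // 100, number % 100


if __name__ == "__main__":
    print(version_number("4.1.0"))       # 400100
    print(version_number("4.2.0"))       # 400200
    print(split_version_number(400200))  # (4, 2, 0)
```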
