Documentation for rocSPARSE is available at https://rocm.docs.amd.com/projects/rocSPARSE/en/latest/.
- Adds `SpGEAM` generic routine for computing sparse matrix addition in CSR format
- Adds `v2_SpMV` generic routine for computing sparse matrix vector multiplication. As opposed to the deprecated `rocsparse_spmv` routine, this routine does not use a fallback algorithm if a non-implemented configuration is encountered and will return an error in such a case. For the deprecated routine `rocsparse_spmv`, the user can enable warning messages in situations where a fallback algorithm is used by either calling upfront the routine `rocsparse_enable_debug` or exporting the variable `ROCSPARSE_DEBUG` (with the shell command `export ROCSPARSE_DEBUG=1`).
- Adds half float mixed precision to `rocsparse_axpby` where X and Y use float16 and result and the compute type use float
- Adds half float mixed precision to `rocsparse_spvv` where X and Y use float16 and result and the compute type use float
- Adds half float mixed precision to `rocsparse_spmv` where A and X use float16 and Y and the compute type use float
- Adds half float mixed precision to `rocsparse_spmm` where A and B use float16 and C and the compute type use float
- Adds half float mixed precision to `rocsparse_sddmm` where A and B use float16 and C and the compute type use float
- Adds half float uniform precision to `rocsparse_scatter` and `rocsparse_gather` routines
- Adds half float uniform precision to `rocsparse_sddmm` routine
- Added `rocsparse_spmv_alg_csr_rowsplit` algorithm.
- Added support for gfx950
- Add ROC-TX instrumentation support in rocSPARSE (not available on Windows or in the static library version on Linux).
- Added the `almalinux` OS name to correct the gfortran dependency
- Switch to defaulting to C++17 when building rocSPARSE from source. Previously rocSPARSE was using C++14 by default.
- Reduced the number of template instantiations in the library to further reduce the shared library binary size and improve compile times
- Allow SpGEMM routines to use more shared memory when available. This can speed up performance for matrices with a large number of intermediate products.
- Use of the `rocsparse_spmv_alg_csr_adaptive` or `rocsparse_spmv_alg_csr_default` algorithms in `rocsparse_spmv` to perform transposed sparse matrix-vector multiplication (`y=alpha*A^T*x+beta*y`) resulted in unnecessary analysis on A and needless slowdown during the analysis phase. This has been fixed by skipping the analysis when performing the transposed sparse matrix-vector multiplication.
- Improved the user documentation
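
For reference, the transposed operation mentioned above can be written out as a small host-side sketch. This is plain Python for illustration only, not rocSPARSE code; the function name `csr_spmv_transposed` and its argument layout are hypothetical, and 0-based CSR indexing is assumed.

```python
def csr_spmv_transposed(m, n, row_ptr, col_ind, val, alpha, x, beta, y):
    """Return alpha * A^T * x + beta * y for an m x n CSR matrix A (0-based)."""
    # A^T is n x m, so x has length m and y has length n.
    out = [beta * yj for yj in y]
    for i in range(m):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            # Entry A[i, col_ind[k]] contributes to row col_ind[k] of A^T.
            out[col_ind[k]] += alpha * val[k] * x[i]
    return out


# A = [[1, 2, 0],
#      [0, 3, 4]]
row_ptr = [0, 2, 4]
col_ind = [0, 1, 1, 2]
val = [1.0, 2.0, 3.0, 4.0]
result = csr_spmv_transposed(2, 3, row_ptr, col_ind, val,
                             2.0, [1.0, 1.0], 1.0, [1.0, 1.0, 1.0])
print(result)
```

Each stored entry of row `i` is scattered into the output at its column index, so the loop walks the CSR rows of A without ever forming A^T explicitly.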
- Fixed an issue in the public headers where `extern "C"` was not wrapped by `#ifdef __cplusplus`, which caused failures when building C programs with rocSPARSE.
- Fixed a memory access fault in the `rocsparse_Xbsrilu0` routines.
- Fixed failures that could occur in `rocsparse_Xbsrsm_solve` or `rocsparse_spsm` with BSR format when using host pointer mode.
- Fixed ASAN compilation failures
- Fixed failure that occurred when using const descriptor `rocsparse_create_const_csr_descr` with the generic routine `rocsparse_sparse_to_sparse`. Issue was not observed when using non-const descriptor `rocsparse_create_csr_descr` with `rocsparse_sparse_to_sparse`.
- Fixed a memory leak in the rocsparse handle
- The deprecated `rocsparse_spmv_ex` routine
- The deprecated `rocsparse_sbsrmv_ex`, `rocsparse_dbsrmv_ex`, `rocsparse_cbsrmv_ex`, and `rocsparse_zbsrmv_ex` routines
- The deprecated `rocsparse_sbsrmv_ex_analysis`, `rocsparse_dbsrmv_ex_analysis`, `rocsparse_cbsrmv_ex_analysis`, and `rocsparse_zbsrmv_ex_analysis` routines
- Deprecated the `rocsparse_spmv` routine. Users should use the `rocsparse_v2_spmv` routine going forward.
- Deprecated `rocsparse_spmv_alg_csr_stream` algorithm. Users should use the `rocsparse_spmv_alg_csr_rowsplit` algorithm going forward.
- Deprecated the `rocsparse_itilu0_alg_sync_split_fusion` algorithm. Users should use one of `rocsparse_itilu0_alg_async_inplace`, `rocsparse_itilu0_alg_async_split`, or `rocsparse_itilu0_alg_sync_split` going forward.
- Added support for `rocsparse_matrix_type_triangular` in `rocsparse_spsv`
- Added test filters `smoke`, `regression`, and `extended` for emulation tests.
- Added `rocsparse_[s|d|c|z]csritilu0_compute_ex` routines for iterative ILU
- Added `rocsparse_[s|d|c|z]csritsv_solve_ex` routines for iterative triangular solve
- Added `GPU_TARGETS` to replace the now deprecated `AMDGPU_TARGETS` in cmake files
- Added BSR format to the SpMM generic routine `rocsparse_spmm`
- By default, build rocsparse shared library using `--offload-compress` compiler option which compresses the fat binary. This significantly reduces the shared library binary size.
- Improved the performance of `rocsparse_spmm` when used with row order for `B` and `C` dense matrices and the row split algorithm, `rocsparse_spmm_alg_csr_row_split`.
- Improved the adaptive CSR sparse matrix-vector multiplication algorithm when the sparse matrix has many empty rows at the beginning or at the end of the matrix. This improves the routines `rocsparse_spmv` and `rocsparse_spmv_ex` when the adaptive algorithm `rocsparse_spmv_alg_csr_adaptive` is used.
- Improved stream CSR sparse matrix-vector multiplication algorithm when the sparse matrix size (number of rows) decreases. This improves the routines `rocsparse_spmv` and `rocsparse_spmv_ex` when the stream algorithm `rocsparse_spmv_alg_csr_stream` is used.
- Compared to `rocsparse_[s|d|c|z]csritilu0_compute`, the routines `rocsparse_[s|d|c|z]csritilu0_compute_ex` introduce a number of free iterations. A free iteration is an iteration that does not compute the evaluation of the stopping criteria, if enabled. This allows the user to tune the algorithm for performance improvements.
- Compared to `rocsparse_[s|d|c|z]csritsv_solve`, the routines `rocsparse_[s|d|c|z]csritsv_solve_ex` introduce a number of free iterations. A free iteration is an iteration that does not compute the evaluation of the stopping criteria. This allows the user to tune the algorithm for performance improvements.
- Improved user documentation
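
The idea of free iterations can be illustrated with a generic iterative solver. The sketch below is plain Python using a Jacobi iteration, not the rocSPARSE algorithm; the function name, the parameter `nfree`, and the assumption that the stopping criterion is evaluated once after every `nfree` free iterations are all hypothetical and chosen only for illustration.

```python
def jacobi_with_free_iterations(A, b, tol, max_iter, nfree):
    """Jacobi iteration that evaluates the stopping criterion only
    after every `nfree` unchecked ("free") iterations."""
    n = len(b)
    x = [0.0] * n
    iterations = 0
    checks = 0
    while iterations < max_iter:
        # One Jacobi sweep using the previous iterate.
        x = [(b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
             for i in range(n)]
        iterations += 1
        # Free iteration: skip the (comparatively costly) residual evaluation.
        if iterations % (nfree + 1) != 0:
            continue
        checks += 1
        residual = max(abs(b[i] - sum(A[i][j] * x[j] for j in range(n)))
                       for i in range(n))
        if residual <= tol:
            break
    return x, iterations, checks


A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
print(jacobi_with_free_iterations(A, b, 1e-10, 100, 0))  # check every iteration
print(jacobi_with_free_iterations(A, b, 1e-10, 100, 3))  # 3 free iterations per check
```

With more free iterations the solver may run a few sweeps past convergence, but it evaluates the stopping criterion far less often, which is the trade-off the tuning parameter exposes.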
- Fixed an issue in `rocsparse_spgemm`, `rocsparse_[s|d|c|z]csrgemm`, and `rocsparse_[s|d|c|z]bsrgemm` where incorrect results could be produced when rocSPARSE was built with optimization level `O0`. This was caused by a bug in the hash tables that could allow keys to be inserted twice.
- Fixed an issue in the routine `rocsparse_spgemm` when using `rocsparse_spgemm_stage_symbolic` and `rocsparse_spgemm_stage_numeric`, where the routine would crash when `alpha` and `beta` were passed as host pointers and where `beta != 0`.
- Fixed compilation error resulting from incorrectly using `reinterpret_cast` to cast away a const qualifier in the `rocsparse_complex_num` constructor. See #434 for more information.
- Deprecated `rocsparse_[s|d|c|z]csritilu0_compute` routines. Users should use the newly added `rocsparse_[s|d|c|z]csritilu0_compute_ex` routines going forward.
- Deprecated `rocsparse_[s|d|c|z]csritsv_solve` routines. Users should use the newly added `rocsparse_[s|d|c|z]csritsv_solve_ex` routines going forward.
- Deprecated use of `AMDGPU_TARGETS` in cmake files. Users should use `GPU_TARGETS` going forward.
- Under certain conditions, rocSPARSE might fail to compile with ASAN on Ubuntu 22.04.
- Added the `azurelinux` OS name to correct the gfortran dependency
- Add `rocsparse_create_extract_descr`, `rocsparse_destroy_extract_descr`, `rocsparse_extract_buffer_size`, `rocsparse_extract_nnz`, and `rocsparse_extract` APIs to allow extraction of the upper or lower part of sparse CSR or CSC matrices.
- Support for the gfx1151, gfx1200, and gfx1201 architectures.
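
What extracting the upper or lower part of a CSR matrix means can be shown with a small host-side sketch. This is plain Python for illustration and does not use or mirror the rocSPARSE extract API; the function name `csr_extract_triangle` and its `lower` and `keep_diagonal` parameters are hypothetical, and 0-based indexing is assumed.

```python
def csr_extract_triangle(row_ptr, col_ind, val, lower=True, keep_diagonal=True):
    """Return the CSR arrays of the lower (or upper) triangular part."""
    out_ptr, out_col, out_val = [0], [], []
    for i in range(len(row_ptr) - 1):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            j = col_ind[k]
            in_part = (j < i) if lower else (j > i)
            if in_part or (keep_diagonal and j == i):
                out_col.append(j)
                out_val.append(val[k])
        # The row pointer records how many entries have been kept so far.
        out_ptr.append(len(out_col))
    return out_ptr, out_col, out_val


# A = [[1, 2, 0],
#      [3, 4, 5],
#      [0, 6, 7]]
row_ptr = [0, 2, 5, 7]
col_ind = [0, 1, 0, 1, 2, 1, 2]
val = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
print(csr_extract_triangle(row_ptr, col_ind, val, lower=True))
print(csr_extract_triangle(row_ptr, col_ind, val, lower=False, keep_diagonal=False))
```

The number of nonzeros in the extracted part is not known in advance, which is why a sketch like this builds the output arrays incrementally; a two-pass variant would first count the kept entries and then fill preallocated arrays.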
- Change the default compiler from hipcc to amdclang in install script and cmake files.
- Change address sanitizer build targets so that only gfx908:xnack+, gfx90a:xnack+, gfx940:xnack+, gfx941:xnack+, and gfx942:xnack+ are built when `BUILD_ADDRESS_SANITIZER=ON` is configured.
- Improved user documentation
- Fixed the `csrmm` merge path algorithm so that diagonal is clamped to the correct range.
- Fixed a race condition in `bsrgemm` that could on rare occasions cause incorrect results.
- Fixed an issue in `hyb2csr` where the CSR row pointer array was not being properly filled when `n=0`, `coo_nnz=0`, or `ell_nnz=0`.
- Fixed scaling in `rocsparse_Xhybmv` when only performing `y=beta*y`, for example, where `alpha==0` in `y=alpha*Ax+beta*y`.
- Fixed `rocsparse_Xgemmi` failures when the y grid dimension is too large. This occurred when n >= 65536.
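
The row pointer fix above concerns a general CSR invariant: the row pointer array always has `m + 1` entries, and for a matrix with no nonzeros every entry must equal the index base. A minimal host-side sketch in plain Python (not rocSPARSE code; the function name `build_csr_row_ptr` is hypothetical) shows a construction that satisfies this in the empty case as well:

```python
def build_csr_row_ptr(m, row_ind, base=0):
    """Build a CSR row pointer array of length m + 1 from COO row indices."""
    counts = [0] * m
    for r in row_ind:
        counts[r - base] += 1
    # Prefix sum starting at the index base; with no nonzeros every
    # entry stays equal to `base`.
    row_ptr = [base]
    for c in counts:
        row_ptr.append(row_ptr[-1] + c)
    return row_ptr


print(build_csr_row_ptr(3, [0, 0, 2]))    # matrix with three nonzeros
print(build_csr_row_ptr(3, []))           # empty matrix, 0-based
print(build_csr_row_ptr(3, [], base=1))   # empty matrix, 1-based
```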
- New Merge-Path algorithm to SpMM, supporting CSR format
- SpSM now supports row order
- rocsparseio I/O functionality has been added to the library
- `rocsparse_set_identity_permutation` has been added
- Adjusted rocSPARSE dependencies to related HIP packages
- Binary size has been reduced
- A namespace has been wrapped around internal rocSPARSE functions and kernels
- `rocsparse_csr_set_pointers`, `rocsparse_csc_set_pointers`, and `rocsparse_bsr_set_pointers` now allow the column indices and values arrays to be nullptr if `nnz` is 0
- gfx803 target has been removed from address sanitizer builds
- Improved user manual
- Improved contribution guidelines
- SpMV adaptive and LRB algorithms have been further optimized on CSR format
- Improved performance of SpMV adaptive with symmetrically stored matrices on CSR format
- Compilation errors with `BUILD_ROCSPARSE_ILP64=ON` have been resolved
- New LRB algorithm to SpMV, supporting CSR format
- rocBLAS is now an optional dependency for SDDMM algorithms
- Additional verbose output for `csrgemm` and `bsrgemm`
- CMake support for documentation
- Triangular solve with multiple rhs (SpSM, csrsm, ...) now calls SpSV, csrsv, etcetera when nrhs equals 1
- Improved user manual section Installation and Building for Linux and Windows
- `rocsparse_inverse_permutation`
- Mixed-precisions for SpVV
- Uniform int8 precision for gather and scatter
- Added new `rocsparse_spmv` routine
- Added new `rocsparse_xbsrmv` routines
- When using host pointer mode, you must now call `hipStreamSynchronize` following `doti`, `dotci`, `spvv`, and `csr2ell`
- `doti` routine
- Improved spin-looping algorithms
- Improved documentation
- Improved verbose output during argument checking on API function calls
- `rocsparse_spmv_ex`
- `rocsparse_xbsrmv_ex`
- Auto stages from `spmv`, `spmm`, `spgemm`, `spsv`, `spsm`, and `spitsv`
- Formerly deprecated `rocsparse_spmv` routines
- Formerly deprecated `rocsparse_xbsrmv` routines
- Formerly deprecated `rocsparse_spmm_ex` routine
- Bug in `rocsparse-bench` where the SpMV algorithm was not taken into account in CSR format
- BSR and GEBSR routines (`bsrmv`, `bsrsv`, `bsrmm`, `bsrgeam`, `gebsrmv`, `gebsrmm`) didn't always show `block_dim==0` as an invalid size
- Passing `nnz = 0` to `doti` or `dotci` wasn't always returning a dot product of 0
- `gpsv` minimum size is now `m >= 3`
- More mixed-precisions for SpMV (`matrix: float`, `vectors: double`, `calculation: double`) and (`matrix: rocsparse_float_complex`, `vectors: rocsparse_double_complex`, `calculation: rocsparse_double_complex`)
- Support for gfx940, gfx941, and gfx942
- Bug in `csrsm` and `bsrsm`
- In `csritilu0`, the algorithm `rocsparse_itilu0_alg_sync_split_fusion` has some accuracy issues when XNACK is enabled (you can use `rocsparse_itilu0_alg_sync_split` as an alternative)
- Memory leak in `csritsv`
- Bug in `csrsm` and `bsrsm`
- `bsrgemm` and `spgemm` for BSR format
- `bsrgeam`
- Build support for Navi32
- Experimental hipGraph support for some rocSPARSE routines
- `csritsv`, `spitsv` CSR iterative triangular solve
- Mixed-precisions for SpMV
- Batched SpMM for transpose A in COO format with atomic algorithm
- `csr2bsr`
- `csr2csr_compress`
- `csr2coo`
- `gebsr2csr`
- `csr2gebsr`
- Documentation
- Bug in COO SpMV grid size
- Bug in SpMM grid size when using very large matrices
- In `csritilu0`, the algorithm `rocsparse_itilu0_alg_sync_split_fusion` has some accuracy issues when XNACK is enabled (you can use `rocsparse_itilu0_alg_sync_split` as an alternative)
- `rocsparse_spmv_ex` routine
- `rocsparse_bsrmv_ex_analysis` and `rocsparse_bsrmv_ex` routines
- `csritilu0` routine
- Build support for Navi31 and Navi33
- Segmented algorithm for COO SpMV by performing analysis
- Improved performance when generating random matrices
- `bsr2csr` routine
- Integer overflow bugs
- Bug in `ellmv`
- Transpose A for SpMM COO format
- Matrix checker routines for verifying matrix data
- Atomic algorithm for COO SpMV
- `bsrpad` routine
- Bug in `csrilu0` that could cause a deadlock
- Bug where asynchronous `memcpy` would use wrong stream
- Potential size overflows
- Batched SpMM for CSR, CSC, and COO formats
- Packages for test and benchmark executables on all supported operating systems using CPack
- Clients file importers and exporters
- Clients code size reduction
- Clients error handling
- Clients benchmarking for performance tracking
- Test adjustments due to round-off errors
- Fixing API call compatibility with rocPRIM
- `gtsv_interleaved_batch`
- `gpsv_interleaved_batch`
- `SpGEMM_reuse`
- Allow copying of mat info struct
- Optimization for SDDMM
- Allow unsorted matrices in `csrgemm` multipass algorithm
- `csrmv`, `coomv`, `ellmv`, and `hybmv` for (conjugate) transposed matrices
- `csrmv` for symmetric matrices
- Packages for test and benchmark executables on all supported operating systems using CPack
- `spmm_ex` has been deprecated and will be removed in the next major release
- Optimization for `gtsv`
- Triangular solve for multiple right-hand sides using BSR format
- SpMV for BSRX format
- SpMM in CSR format enhanced to work with transposed A
- Matrix coloring for CSR matrices
- Added batched tridiagonal solve (`gtsv_strided_batch`)
- SpMM for BLOCKED ELL format
- Generic routines for SpSV and SpSM
- Beta support for Windows 10
- Additional atomic-based algorithms for SpMM in COO format
- Extended version of SpMM
- Additional algorithm for SpMM in CSR format
- Added (conjugate) transpose support for CsrMV and SpMV (CSR) routines
- Packaging has been split into a runtime package (`rocsparse`) and a development package (`rocsparse-devel`): The development package depends on the runtime package. When installing the runtime package, the package manager will suggest the installation of the development package to aid users transitioning from the previous version's combined package. This suggestion by package manager is for all supported operating systems (except CentOS 7) to aid in the transition. The `suggestion` feature in the runtime package is introduced as a deprecated feature and will be removed in a future ROCm release.
- Bug with `gemvi` on Navi21
- Bug with adaptive CsrMV
- Optimization for pivot-based `gtsv`
- (batched) Tridiagonal solver with and without pivoting
- Dense matrix sparse vector multiplication (gemvi)
- Support for gfx90a
- Sampled dense-dense matrix multiplication (SDDMM)
- Client matrix download mechanism
- Removed boost dependency in clients
- SpMM (CSR, COO)
- Code coverage analysis
- Install script
- Level 2/3 unit tests
- `rocsparse-bench` no longer depends on boost
- `gebsrmm`
- `gebsrmv`
- `gebsrsv`
- `coo2dense` and `dense2coo`
- Generic APIs, including `axpby`, `gather`, `scatter`, `rot`, `spvv`, `spmv`, `spgemm`, `sparsetodense`, `densetosparse`
- Support for mixed indexing types in matrix formats
- Changelog
- `csr2gebsr`
- `gebsr2gebsc`
- `gebsr2gebsr`
- Treating filename as regular expression for YAML-based testing generation
- Documentation for `gebsr2csr`
- `bsric0`
- gfx1030 has been adjusted to the latest compiler
- Replace old XNACK 'off' compiler flag with new version
- Updated Debian package name
- `prune_csr2csr`, `prune_dense2csr_percentage` and `prune_csr2csr_percentage` added
- `bsrilu0` added
- `csrilu0_numeric_boost` functionality added
- `bsric0`
- No changes for this ROCm release
- Fortran bindings
- CentOS 6 support
- `bsrmv`
- Default compiler switched to HIP-Clang
- `csr2dense`, `csc2dense`, `csr2csr_compress`, `nnz_compress`, `bsr2csr`, `csr2bsr`, `bsrmv`, and `csrgeam`
- Triangular solve for BSR format (`bsrsv`)
- Options for static build
- Examples
- `dense2csr` and `dense2csc`
- Installation process