GPU assembly support on AMD and CUDA by multitalentloes · Pull Request #7042 · OPM/opm-simulators

multitalentloes · 2026-05-07T11:27:57Z

The GPU assembly support extracted from #7018.
It still contains more code from that branch than the assembly strictly needs; the goal is to reduce this PR to the smallest change set that still supports the new GPU linearization.

Copilot

Pull request overview

This PR extracts and wires up GPU matrix assembly/linearization support for Flow (CUDA + HIP/AMD), building on prior GPU POC work, and introduces GPU-oriented model/problem shims plus supporting GPU ISTL utilities.

Changes:

Add GPU TPFA linearization/assembly path with supporting GPU data transfers (intensive quantities, boundary info, residual/Jacobian buffers).
Introduce simplified GPU-friendly model/problem wrappers used by the GPU assembly kernels.
Extend GPU ISTL utilities (typed diagonal block pointers; GpuView access changes) and update build system to compile a flow_gpu variant.
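
To make the "typed diagonal block pointers" item concrete, here is a minimal host-only sketch. The names `Block`, `BlockCsr`, and `diagonalBlockPointers` are hypothetical and are not the PR's `getDiagPtrsTyped()`; the sketch only assumes a block-CSR layout (row offsets, column indices, one fixed-size block per stored entry) and shows why returning pointers of the block type, rather than raw scalar pointers, lets later code work on whole blocks.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical dense block type; stands in for a MiniMatrix-like N x N block.
template <class Scalar, int N>
using Block = std::array<Scalar, N * N>;

// Hypothetical block-CSR storage: one Block per stored nonzero entry.
template <class Scalar, int N>
struct BlockCsr {
    std::vector<int> rowOffsets;           // size = numRows + 1
    std::vector<int> columnIndices;        // size = number of stored blocks
    std::vector<Block<Scalar, N>> blocks;  // same ordering as columnIndices
};

// Return one typed pointer per row, pointing at that row's diagonal block
// (nullptr if the row stores no diagonal entry).
template <class Scalar, int N>
std::vector<Block<Scalar, N>*> diagonalBlockPointers(BlockCsr<Scalar, N>& m)
{
    assert(!m.rowOffsets.empty());
    const std::size_t numRows = m.rowOffsets.size() - 1;
    std::vector<Block<Scalar, N>*> diag(numRows, nullptr);
    for (std::size_t row = 0; row < numRows; ++row) {
        for (int k = m.rowOffsets[row]; k < m.rowOffsets[row + 1]; ++k) {
            if (m.columnIndices[k] == static_cast<int>(row)) {
                diag[row] = &m.blocks[k];
                break;
            }
        }
    }
    return diag;
}
```

On a GPU the same lookup would typically run once at setup, with the resulting pointer array kept on the device so assembly kernels can write diagonal contributions directly.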

Reviewed changes

Copilot reviewed 26 out of 26 changed files in this pull request and generated 8 comments.

File	Description
tests/gpuistl/test_GpuSparseTable.cu	Extends kernel test to validate `SparseTable::dataSize()` usage on device.
opm/simulators/linalg/gpuistl/GpuView.hpp	Adjusts `operator[]` to return `const T&`; adds an extra include.
opm/simulators/linalg/gpuistl/detail/gpusparse_matrix_operations.hpp	Declares typed diagonal block pointer extraction API.
opm/simulators/linalg/gpuistl/detail/gpusparse_matrix_operations.cu	Implements `getDiagPtrsTyped()` and instantiates for MiniMatrix block sizes.
opm/simulators/flow/Transmissibility.hpp	Adds accessor for thermal boundary half-transmissibility map.
opm/simulators/flow/Transmissibility_impl.hpp	Implements new transmissibility map accessor.
opm/simulators/flow/SimplifiedGpuBlackOilModel.hpp	Adds simplified FI black-oil model wrapper and GPU copy/view helpers.
opm/simulators/flow/SimplifiedFlowProblemGPU.hpp	Adds simplified GPU problem wrapper for boundary thermal transmissibility (`alpha`) + module params.
opm/simulators/flow/NewTranFluxModule.hpp	Generalizes pressure-diff calculation template to accept alternative module params type.
opm/simulators/flow/FlowProblem.hpp	Exposes `thermalLawManager()` accessor.
opm/simulators/aquifers/BlackoilAquiferModel.hpp	Disables SupportsFaceTag include (commented).
opm/simulators/aquifers/BlackoilAquiferModel_impl.hpp	Disables grid SupportsFaceTag `static_assert` (commented).
opm/models/discretization/common/tpfalinearizer.hh	Major: adds GPU assembly path (domain/neighbor/boundary transfers, kernels, GPU Jacobian/residual handling).
opm/models/discretization/common/fvbaseproperties.hh	Introduces `to_gpu_type(_t)` mapping and a `GpuFIBlackOilModel` property hook.
opm/models/discretization/common/fvbasediscretization.hh	Adds helpers to fetch all cached IQs (timeIdx 0/1) and a `numDof()` accessor.
opm/models/blackoil/blackoillocalresidualtpfa.hh	Refactors boundary flux handling for GPU/non-static fluid system; adds GPU-related helpers.
opm/models/blackoil/blackoilintensivequantities.hh	Adjusts assignment for host/device; adds GPU-switch constructors; changes `withOtherFluidSystem()` to accept a pointer.
opm/models/blackoil/blackoilextbomodules.hh	Replaces a raw `throw` with `OPM_THROW`.
opm/models/blackoil/blackoildiffusionmodule.hh	Adds accessors for diffusion/tortuosity arrays; marks update as host/device.
opm/models/blackoil/blackoilconvectivemixingmodule.hh	Minor fluid-system index usage adjustments; adds a fluidsystem include.
flow/flow_gpu.hpp	Adds GPU flow typetag mapping and declarations shared between CUDA/HIP entrypoints.
flow/flow_gpu.hip	Adds HIP entrypoint implementation for GPU flow variant.
flow/flow_gpu.cu	Adds CUDA entrypoint implementation for GPU flow variant.
flow/flow_gpu_main.cpp	Adds standalone main for `flow_gpu` binary.
CMakeLists.txt	Adds `gpu` Flow model variant and selects `.cu`/`.hip` source accordingly.
CMakeLists_files.cmake	Adds per-file CUDA flags for new GPU sources and installs new public headers.
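
The `to_gpu_type(_t)` entry in the table describes a type-level mapping from host types to GPU-adapted ones. A minimal sketch of that kind of trait follows. `HostBuffer` and `DeviceBuffer` are hypothetical placeholders, and mapping unknown types to themselves in the primary template is an assumption of this sketch, not something the summary confirms about the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical stand-ins for a host container and its GPU counterpart.
template <class T>
struct HostBuffer { std::vector<T> data; };

template <class T>
struct DeviceBuffer { T* data; std::size_t size; };

// Primary template (assumption): types without a GPU counterpart map to themselves.
template <class T>
struct to_gpu_type { using type = T; };

// Specialization: a host buffer maps to the device buffer of the same element type.
template <class T>
struct to_gpu_type<HostBuffer<T>> { using type = DeviceBuffer<T>; };

// Convenience alias, following the usual `_t` naming convention.
template <class T>
using to_gpu_type_t = typename to_gpu_type<T>::type;

static_assert(std::is_same<to_gpu_type_t<double>, double>::value,
              "scalars map to themselves in this sketch");
static_assert(std::is_same<to_gpu_type_t<HostBuffer<double>>, DeviceBuffer<double>>::value,
              "host buffers map to device buffers");
```

With such a trait, generic code can declare members as `to_gpu_type_t<X>` and pick up the GPU variant without naming it explicitly.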

+#include <iostream>
+


+#include <opm/common/utility/gpuistl_if_available.hpp>
+#include <opm/common/utility/VectorWithDefaultAllocator.hpp>
+
+namespace Opm {
+
+template<class Scalar, template <class> class Storage = Opm::VectorWithDefaultAllocator>
+class SimplifiedFlowProblemGPU
+{
+public:
+    using ModuleParams = BlackoilModuleParams<ConvectiveMixingModuleParam<Scalar, Storage>>;
+
+    SimplifiedFlowProblemGPU() = default;


+        } else if (boundaryFaceIndex == 2) {
+            return alpha2_[globalIndex];
+        } else {
+            OPM_THROW(std::logic_error, "Invalid boundary face index: " + std::to_string(boundaryFaceIndex));


+#include <opm/models/blackoil/blackoilmodel.hh>
+#include <opm/models/utils/propertysystem.hh>
+
+#include <opm/common/ErrorMacros.hpp>
+#include <opm/models/utils/propertysystem.hh>
+
+#include <opm/grid/CpGrid.hpp>
+#include <opm/grid/utility/ElementChunks.hpp>
+
+#include <opm/models/blackoil/blackoilconvectivemixingmodule.hh>
+#include <opm/models/blackoil/blackoilmoduleparams.hh>
+#include <opm/models/parallel/threadmanager.hpp>
+#include <opm/simulators/utils/DeferredLoggingErrorHelpers.hpp>
+
+#include <opm/material/fluidmatrixinteractions/EclMultiplexerMaterialParams.hpp>
+
+#include <cstddef>
+#include <stdexcept>
+#include <type_traits>
+
+#ifdef _OPENMP
+#include <omp.h>
+#endif


 #include <opm/simulators/aquifers/AquiferFetkovich.hpp>
 #include <opm/simulators/aquifers/AquiferNumerical.hpp>
-#include <opm/simulators/aquifers/SupportsFaceTag.hpp>
+// #include <opm/simulators/aquifers/SupportsFaceTag.hpp>


    // Grid needs to support Facetag
    using Grid = std::remove_const_t<std::remove_reference_t<decltype(simulator.vanguard().grid())>>;
-    static_assert(SupportsFaceTag<Grid>::value, "Grid has to support assumptions about face tag.");
+    // static_assert(SupportsFaceTag<Grid>::value, "Grid has to support assumptions about face tag.");


+    // TODO: make this more efficient (avoid copy!) and still valid for gpu
+    // TODO: did not want to delve into the special IntQuants vector with special allocators
+    std::vector<IntensiveQuantities> allIntensiveQuantities0()
+    {
+        std::vector<IntensiveQuantities> allIntensiveQuantities;
+        auto& timeZeroIntQuants = intensiveQuantityCache_[0];
+        allIntensiveQuantities.insert(allIntensiveQuantities.end(),
+                                        timeZeroIntQuants.begin(),
+                                        timeZeroIntQuants.end());
+
+
+        return allIntensiveQuantities;
+    }
+
+    std::vector<IntensiveQuantities> allIntensiveQuantities1()
+    {
+        std::vector<IntensiveQuantities> allIntensiveQuantities;
+        auto& timeOneIntQuants = intensiveQuantityCache_[1];
+        assert(!timeOneIntQuants.empty());
+        allIntensiveQuantities.insert(allIntensiveQuantities.end(),
+                timeOneIntQuants.begin(),
+                timeOneIntQuants.end());
+        return allIntensiveQuantities;


multitalentloes added the manual:irrelevant label ("This PR is a minor fix and should not appear in the manual") May 7, 2026

multitalentloes mentioned this pull request May 7, 2026

GPU assembly support on AMD and CUDA OPM/opm-common#5145

Merged

multitalentloes force-pushed the add_gpu_thermalgaswaterassembly2 branch 2 times, most recently from d3c8416 to 60c8298 May 8, 2026 07:59

atgeirr requested a review from Copilot May 11, 2026 08:13

Copilot started reviewing on behalf of atgeirr May 11, 2026 08:14

Copilot AI reviewed May 11, 2026

kjetilly and others added 9 commits May 12, 2026 10:33

GPU assembly support on AMD and CUDA

197f438

clean up code

e342ac7

reduce diff

43ff545

revert variable name change

5a506b9

more cleaning...

37f2707

refactor argument helper structs

21fa50b

make getdiagptrs blocktype aware

3deb1ca

improve sparsetable copytogpu

ca2a67f

avoid compilation error on nvcc

c87cfc5

multitalentloes force-pushed the add_gpu_thermalgaswaterassembly2 branch from 9b6e313 to c87cfc5 May 12, 2026 08:39

multitalentloes mentioned this pull request May 12, 2026

NVCC requires seeing DatumDepth classes OPM/opm-common#5150

Merged

Atgeirr Flø Rasmussen and others added 12 commits May 13, 2026 09:18

Make branch compile without GPU.

98d04f6

Squashed warnings.

a667a9d

Remove LocalIntensiveQuantities template param.

e7cd619

Remove unneded enableBioeffects function argument.

a7c27d3

Remove unneeded template parameters.

b5d7316

Now the member types MatrixBlock, VectorBlock and ADVectorBlock may be GPU-adapted variants, so no need to pass explicit types to kernels etc.

Auto-reformat some unreadable parts.

000ad00

now compiles

c5ff66b

improve variable and function names

badfceb

implement some copilot improvements

230190b

improve naming

da041d7

backup

f8778c6

improve template order

50835f3

multitalentloes added 2 commits May 13, 2026 16:19

make variable names more consistent

dc6818f

fix typo

40588db

multitalentloes wants to merge 23 commits into OPM:master from multitalentloes:add_gpu_thermalgaswaterassembly2

4 participants
