GPU assembly support on AMD and CUDA #7042
Open
multitalentloes wants to merge 23 commits into
d3c8416 to 60c8298
Pull request overview
This PR extracts and wires up GPU matrix assembly/linearization support for Flow (CUDA + HIP/AMD), building on prior GPU POC work, and introduces GPU-oriented model/problem shims plus supporting GPU ISTL utilities.
Changes:
- Add GPU TPFA linearization/assembly path with supporting GPU data transfers (intensive quantities, boundary info, residual/Jacobian buffers).
- Introduce simplified GPU-friendly model/problem wrappers used by the GPU assembly kernels.
- Extend GPU ISTL utilities (typed diagonal block pointers; GpuView access changes) and update build system to compile a flow_gpu variant.
Reviewed changes
Copilot reviewed 26 out of 26 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| tests/gpuistl/test_GpuSparseTable.cu | Extends kernel test to validate SparseTable::dataSize() usage on device. |
| opm/simulators/linalg/gpuistl/GpuView.hpp | Adjusts operator[] to return const T&; adds an extra include. |
| opm/simulators/linalg/gpuistl/detail/gpusparse_matrix_operations.hpp | Declares typed diagonal block pointer extraction API. |
| opm/simulators/linalg/gpuistl/detail/gpusparse_matrix_operations.cu | Implements getDiagPtrsTyped() and instantiates for MiniMatrix block sizes. |
| opm/simulators/flow/Transmissibility.hpp | Adds accessor for thermal boundary half-transmissibility map. |
| opm/simulators/flow/Transmissibility_impl.hpp | Implements new transmissibility map accessor. |
| opm/simulators/flow/SimplifiedGpuBlackOilModel.hpp | Adds simplified FI black-oil model wrapper and GPU copy/view helpers. |
| opm/simulators/flow/SimplifiedFlowProblemGPU.hpp | Adds simplified GPU problem wrapper for boundary thermal transmissibility (alpha) + module params. |
| opm/simulators/flow/NewTranFluxModule.hpp | Generalizes pressure-diff calculation template to accept alternative module params type. |
| opm/simulators/flow/FlowProblem.hpp | Exposes thermalLawManager() accessor. |
| opm/simulators/aquifers/BlackoilAquiferModel.hpp | Disables SupportsFaceTag include (commented). |
| opm/simulators/aquifers/BlackoilAquiferModel_impl.hpp | Disables grid SupportsFaceTag static_assert (commented). |
| opm/models/discretization/common/tpfalinearizer.hh | Major: adds GPU assembly path (domain/neighbor/boundary transfers, kernels, GPU Jacobian/residual handling). |
| opm/models/discretization/common/fvbaseproperties.hh | Introduces to_gpu_type(_t) mapping and a GpuFIBlackOilModel property hook. |
| opm/models/discretization/common/fvbasediscretization.hh | Adds helpers to fetch all cached IQs (timeIdx 0/1) and a numDof() accessor. |
| opm/models/blackoil/blackoillocalresidualtpfa.hh | Refactors boundary flux handling for GPU/non-static fluid system; adds GPU-related helpers. |
| opm/models/blackoil/blackoilintensivequantities.hh | Adjusts assignment for host/device; adds GPU-switch constructors; changes withOtherFluidSystem() to accept a pointer. |
| opm/models/blackoil/blackoilextbomodules.hh | Replaces a raw throw with OPM_THROW. |
| opm/models/blackoil/blackoildiffusionmodule.hh | Adds accessors for diffusion/tortuosity arrays; marks update as host/device. |
| opm/models/blackoil/blackoilconvectivemixingmodule.hh | Minor fluid-system index usage adjustments; adds a fluidsystem include. |
| flow/flow_gpu.hpp | Adds GPU flow typetag mapping and declarations shared between CUDA/HIP entrypoints. |
| flow/flow_gpu.hip | Adds HIP entrypoint implementation for GPU flow variant. |
| flow/flow_gpu.cu | Adds CUDA entrypoint implementation for GPU flow variant. |
| flow/flow_gpu_main.cpp | Adds standalone main for flow_gpu binary. |
| CMakeLists.txt | Adds gpu Flow model variant and selects .cu/.hip source accordingly. |
| CMakeLists_files.cmake | Adds per-file CUDA flags for new GPU sources and installs new public headers. |
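
The typed diagonal block pointer extraction listed for gpusparse_matrix_operations can be pictured on the host roughly as follows. This is a minimal sketch, assuming a block compressed sparse row layout; `Block`, `BlockCsr` and `diagonalBlockPointers` are hypothetical names chosen for illustration and do not reproduce the signature of the PR's getDiagPtrsTyped().

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical dense N x N block; the PR uses MiniMatrix on the GPU side.
template <class Scalar, std::size_t N>
using Block = std::array<Scalar, N * N>;

// Hypothetical host-side block CSR matrix; rowStart has numRows + 1 entries.
template <class Scalar, std::size_t N>
struct BlockCsr {
    std::vector<std::size_t> rowStart;
    std::vector<std::size_t> colIndex;
    std::vector<Block<Scalar, N>> blocks;
};

// Return one typed pointer per row, pointing at that row's diagonal block.
template <class Scalar, std::size_t N>
std::vector<Block<Scalar, N>*> diagonalBlockPointers(BlockCsr<Scalar, N>& m)
{
    assert(!m.rowStart.empty());
    const std::size_t numRows = m.rowStart.size() - 1;
    std::vector<Block<Scalar, N>*> diag(numRows, nullptr);
    for (std::size_t row = 0; row < numRows; ++row) {
        // Scan the stored entries of this row for the one on the diagonal.
        for (std::size_t k = m.rowStart[row]; k < m.rowStart[row + 1]; ++k) {
            if (m.colIndex[k] == row) {
                diag[row] = &m.blocks[k];
                break;
            }
        }
        assert(diag[row] != nullptr && "every row needs a diagonal block");
    }
    return diag;
}
```

Returning pointers typed by block size, rather than raw scalar pointers, lets the caller index whole blocks without repeating the block-size arithmetic at each use.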
Comment on lines +34 to +35
    #include <iostream>
Comment on lines +19 to +30
    #include <opm/common/utility/gpuistl_if_available.hpp>
    #include <opm/common/utility/VectorWithDefaultAllocator.hpp>

    namespace Opm {

    template<class Scalar, template <class> class Storage = Opm::VectorWithDefaultAllocator>
    class SimplifiedFlowProblemGPU
    {
    public:
        using ModuleParams = BlackoilModuleParams<ConvectiveMixingModuleParam<Scalar, Storage>>;

        SimplifiedFlowProblemGPU() = default;

            } else if (boundaryFaceIndex == 2) {
                return alpha2_[globalIndex];
            } else {
                OPM_THROW(std::logic_error, "Invalid boundary face index: " + std::to_string(boundaryFaceIndex));
Comment on lines +31 to +53
    #include <opm/models/blackoil/blackoilmodel.hh>
    #include <opm/models/utils/propertysystem.hh>

    #include <opm/common/ErrorMacros.hpp>
    #include <opm/models/utils/propertysystem.hh>

    #include <opm/grid/CpGrid.hpp>
    #include <opm/grid/utility/ElementChunks.hpp>

    #include <opm/models/blackoil/blackoilconvectivemixingmodule.hh>
    #include <opm/models/blackoil/blackoilmoduleparams.hh>
    #include <opm/models/parallel/threadmanager.hpp>
    #include <opm/simulators/utils/DeferredLoggingErrorHelpers.hpp>

    #include <opm/material/fluidmatrixinteractions/EclMultiplexerMaterialParams.hpp>

    #include <cstddef>
    #include <stdexcept>
    #include <type_traits>

    #ifdef _OPENMP
    #include <omp.h>
    #endif

      #include <opm/simulators/aquifers/AquiferFetkovich.hpp>
      #include <opm/simulators/aquifers/AquiferNumerical.hpp>
    - #include <opm/simulators/aquifers/SupportsFaceTag.hpp>
    + // #include <opm/simulators/aquifers/SupportsFaceTag.hpp>

      // Grid needs to support Facetag
      using Grid = std::remove_const_t<std::remove_reference_t<decltype(simulator.vanguard().grid())>>;
    - static_assert(SupportsFaceTag<Grid>::value, "Grid has to support assumptions about face tag.");
    + // static_assert(SupportsFaceTag<Grid>::value, "Grid has to support assumptions about face tag.");
Comment on lines +663 to +685
    // TODO: make this more efficient (avoid copy!) and still valid for gpu
    // TODO: did not want to delve into the special IntQuants vector with special allocators
    std::vector<IntensiveQuantities> allIntensiveQuantities0()
    {
        std::vector<IntensiveQuantities> allIntensiveQuantities;
        auto& timeZeroIntQuants = intensiveQuantityCache_[0];
        allIntensiveQuantities.insert(allIntensiveQuantities.end(),
                                      timeZeroIntQuants.begin(),
                                      timeZeroIntQuants.end());


        return allIntensiveQuantities;
    }

    std::vector<IntensiveQuantities> allIntensiveQuantities1()
    {
        std::vector<IntensiveQuantities> allIntensiveQuantities;
        auto& timeOneIntQuants = intensiveQuantityCache_[1];
        assert(!timeOneIntQuants.empty());
        allIntensiveQuantities.insert(allIntensiveQuantities.end(),
                                      timeOneIntQuants.begin(),
                                      timeOneIntQuants.end());
        return allIntensiveQuantities;
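
The TODO in this snippet asks to avoid the copy. One possible direction is to hand out a non-owning view instead of a freshly filled vector. The sketch below assumes the cache is stored contiguously and is neither resized nor destroyed while the view is in use; that assumption may not hold for the special IntQuants vector mentioned in the second TODO. `IntensiveQuantitiesStub`, `ConstView` and `CacheHolder` are hypothetical stand-ins, not types from the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the real IntensiveQuantities type.
struct IntensiveQuantitiesStub {
    double pressure;
};

// Minimal non-owning view over contiguous elements (similar in spirit to std::span).
template <class T>
struct ConstView {
    const T* data;
    std::size_t size;

    const T& operator[](std::size_t i) const
    {
        assert(i < size);
        return data[i];
    }
};

// Hypothetical holder mimicking a cache with one vector per time index.
struct CacheHolder {
    std::vector<IntensiveQuantitiesStub> cache[2];

    // Copying variant: same effect as the insert() calls in the diff.
    std::vector<IntensiveQuantitiesStub> copyAll(unsigned timeIdx) const
    {
        assert(timeIdx < 2);
        return cache[timeIdx];
    }

    // Non-copying variant: valid only while the cache stays alive and unchanged in size.
    ConstView<IntensiveQuantitiesStub> viewAll(unsigned timeIdx) const
    {
        assert(timeIdx < 2);
        return {cache[timeIdx].data(), cache[timeIdx].size()};
    }
};
```

A view like this still has to be followed by a host-to-device transfer for GPU use; it only removes the intermediate host copy.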
9b6e313 to c87cfc5
The member types MatrixBlock, VectorBlock and ADVectorBlock may now be GPU-adapted variants, so explicit types no longer need to be passed to kernels.
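
The idea of member types that are already GPU-adapted can be illustrated with a type-mapping trait, in the spirit of the to_gpu_type(_t) mapping listed for fvbaseproperties.hh. This is a sketch only: `HostBlock`, `GpuBlock`, `LinearizerTypes`, `makeZeroBlock` and the trait definition below are hypothetical and do not reproduce the PR's implementation.

```cpp
#include <type_traits>

// Hypothetical host and GPU block types.
template <class Scalar, int N> struct HostBlock { Scalar v[N]; };
template <class Scalar, int N> struct GpuBlock { Scalar v[N]; };

// Primary template: types without a GPU counterpart map to themselves.
template <class T>
struct to_gpu_type { using type = T; };

// Specialization: a host block maps to its GPU-adapted variant.
template <class Scalar, int N>
struct to_gpu_type<HostBlock<Scalar, N>> { using type = GpuBlock<Scalar, N>; };

template <class T>
using to_gpu_type_t = typename to_gpu_type<T>::type;

// A linearizer-like class picks its member type once, depending on the backend.
template <class Block, bool useGpu>
struct LinearizerTypes {
    using MatrixBlock =
        typename std::conditional<useGpu, to_gpu_type_t<Block>, Block>::type;
};

// Kernel-like code reads the block type from the class, so call sites
// do not have to spell out the block type themselves.
template <class Types>
typename Types::MatrixBlock makeZeroBlock()
{
    return typename Types::MatrixBlock{};
}

static_assert(std::is_same<LinearizerTypes<HostBlock<double, 3>, true>::MatrixBlock,
                           GpuBlock<double, 3>>::value,
              "GPU backend selects the GPU-adapted block");
```

With this arrangement, switching backends changes one template argument on the class rather than every kernel launch.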
This PR contains the GPU assembly support extracted from #7018.
It still carries more code from that branch than the assembly needs; the goal is to reduce this PR to the smallest change set that supports the new GPU linearization.