GPU Acceleration

RegularizedLeastSquares.jl supports generic GPU acceleration: any GPU array type implementing the GPUArrays.jl interface can be used, such as the array types provided by CUDA.jl, AMDGPU.jl, and Metal.jl. To perform a reconstruction on the GPU, load a GPU backend package such as CUDA.jl and specify the GPU array type:

using CUDA
gpu = CuArray
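
Other GPU backends follow the same pattern. For example, with an AMD or Apple GPU one would instead write (assuming the corresponding hardware and drivers are available):

using AMDGPU
gpu = ROCArray

or

using Metal
gpu = MtlArray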

First, we will look at an example with dense GPU arrays.

using RegularizedLeastSquares
A = gpu(rand(Float32, 32, 16))
x = gpu(rand(Float32, 16))
b = A*x;

Solvers adapt their internal state to the type of the given measurement vector. This means that a solver automatically switches to GPU acceleration if a GPU array is passed as the measurement vector.

solver = createLinearSolver(CGNR, A; reg = L2Regularization(0.0001), iterations=32);
x_approx = solve!(solver, b)
16-element Vector{Float32}:
 0.11828006
 0.20086356
 0.77634
 0.30057234
 0.22131151
 0.048125662
 0.84582514
 0.46787456
 0.1725377
 0.8522468
 0.22672482
 0.9869703
 0.122989774
 0.2000143
 0.7840195
 0.73087007
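
The solution is returned as a GPU array of the same type as the measurement vector. To inspect or post-process it on the CPU, copy it back to host memory with Array:

x_host = Array(x_approx);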

This adaptation does not extend to the operator. If we want to compare with the CPU result, we therefore need to construct a new solver with a CPU operator.

solver = createLinearSolver(CGNR, Array(A); reg = L2Regularization(0.0001), iterations=32);
x_cpu = solve!(solver, Array(b))
isapprox(Array(x_approx), x_cpu)
true

Matrix-Free Operators

A special case is the use of matrix-free operators. Since these operators have no concrete matrix representation, their GPU support depends on their implementation. Because not all multiplications within a solver are performed in-place, the operator also needs to support the * operation and construct an appropriate result vector. For matrix-free operators based on LinearOperators.jl, this can be achieved by implementing the LinearOperators.storage_type method.
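
For a custom operator this amounts to a one-line method definition. Here is a minimal sketch, where MyGPUOp is a hypothetical matrix-free operator type whose internal buffers live in CUDA memory:

import LinearOperators: storage_type
# MyGPUOp is a hypothetical operator type, used for illustration only
storage_type(op::MyGPUOp) = CuVector{Float32}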

In the following, we will take another look at the compressed sensing (CS) example and execute it on the GPU. Note that we chose a small phantom here, since the example can also be run with the JLArray type (see the sketch below), whose implementation is not optimized for performance.
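
If you want to follow along without GPU hardware, a minimal sketch of swapping in JLArrays:

using JLArrays
gpu = JLArray  # CPU-backed reference implementation of the GPUArrays interface

With a backend selected, we set up the phantom: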

using ImagePhantoms, ImageGeoms
N = 32
image = shepp_logan(N, SheppLoganToft())

using Random, LinearOperatorCollection
randomIndices = shuffle(eachindex(image))             # random permutation of all pixel indices
sampledIndices = sort(randomIndices[1:div(end, 3)]);  # keep a random third of the pixels

To construct the operator, we need to convert the indices to a GPU array and specify the correct storage type. In both LinearOperators.jl and LinearOperatorCollection.jl, the storage type is set with the S keyword argument.

gpu_indices = gpu(sampledIndices)
A = SamplingOp(eltype(image), pattern = gpu_indices, shape = size(image), S = typeof(b));

Let's inspect the storage type of the operator:

using LinearOperatorCollection.LinearOperators
LinearOperators.storage_type(A)
Vector{Float32} (alias for Array{Float32, 1})

Afterwards we can use the operator as usual:

b = A*vec(gpu(image));
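
Solvers also rely on in-place products internally. Operators from LinearOperators.jl implement the standard mul! interface, so this can be checked manually; a quick sketch:

using LinearAlgebra
y = similar(b)
mul!(y, A, vec(gpu(image)));  # in-place forward multiplication into y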

And use it in the solver:

using RegularizedLeastSquares
reg = TVRegularization(0.01; shape=size(image))
solver = createLinearSolver(FISTA, A; reg=reg, iterations=20)
img_approx = solve!(solver,b);

To visualize the reconstructed image, we need to reshape the result vector to the correct shape and convert it to a CPU array:

img_approx = reshape(Array(img_approx),size(image))
32×32 Matrix{Float32}:
 0.00101102  0.00110708  0.00129486  …  0.00191152  0.00181119  0.00176317
 0.00108485  0.00120516  0.00142485     0.00207642  0.00194465  0.0018842
 0.00129354  0.00144831  0.00170987     0.00242752  0.00223714  0.00215599
 0.00168354  0.00187515  0.00220025     0.00299899  0.00268924  0.00256231
 0.00225214  0.00251522  0.00299666     0.00382381  0.00326671  0.00301945
 0.00300902  0.00342707  0.00417959  …  0.00496836  0.00402311  0.00353279
 0.00405923  0.00467085  0.0056849      0.00638751  0.00506009  0.00425931
 0.00562917  0.00629648  0.0073227      0.00768373  0.00628972  0.00531311
 0.007606    0.00815616  0.00874637     0.0078579   0.00728854  0.00652414
 0.00929491  0.00952994  0.0102001      0.0081802   0.00755981  0.0072924
 ⋮                                   ⋱              ⋮           
 0.0130903   0.0136095   0.0143         0.00888475  0.00857639  0.0080658
 0.0117656   0.0123412   0.0130407      0.00739936  0.00652403  0.00584041
 0.00963435  0.010353    0.0114217   …  0.0056574   0.00473653  0.00417112
 0.00755205  0.00801218  0.00868098     0.00403859  0.0033761   0.00302598
 0.00584923  0.00605821  0.0064204      0.00293018  0.00249776  0.00228187
 0.00443461  0.00446142  0.0045889      0.00228068  0.00196123  0.00180157
 0.00327395  0.00322906  0.00321365     0.00188298  0.00162933  0.00150206
 0.00242846  0.0024003   0.00238059  …  0.00164232  0.00145232  0.00136058
 0.00196044  0.00197145  0.00199666     0.00153157  0.00139066  0.00132936

We will again use CairoMakie for visualization:

using CairoMakie
function plot_image(figPos, img; title = "", width = 150, height = 150)
  ax = CairoMakie.Axis(figPos; yreversed=true, title, width, height)
  hidedecorations!(ax)
  heatmap!(ax, img)
end
fig = Figure()
plot_image(fig[1,1], image, title = "Image")
samplingMask = fill(false, size(image))
samplingMask[sampledIndices] .= true
plot_image(fig[1,2], image .* samplingMask, title = "Sampled Image")
plot_image(fig[1,3], img_approx, title = "Reconstructed Image")
resize_to_layout!(fig)
fig
[Figure: the original image, the sampled image, and the reconstructed image]

This page was generated using Literate.jl.