GPU Acceleration

RegularizedLeastSquares.jl supports generic GPU acceleration: any GPU array type implementing the GPUArrays.jl interface can be used, such as the array types provided by CUDA.jl, AMDGPU.jl, and Metal.jl. To perform a reconstruction on the GPU, load a GPU backend package such as CUDA.jl and specify the GPU array type:

using CUDA
gpu = CuArray
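
Other GPU backends follow the same pattern. For example, with an AMD or Apple GPU one would instead write (assuming the corresponding hardware and drivers are available):

using AMDGPU
gpu = ROCArray

or

using Metal
gpu = MtlArray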

First, we will look at an example with dense GPU arrays.

using RegularizedLeastSquares
A = gpu(rand(Float32, 32, 16))
x = gpu(rand(Float32, 16))
b = A*x;

Solvers adapt their internal state to the type of the given measurement vector. This means that a solver automatically switches to GPU acceleration if a GPU array is passed as the measurement vector.

solver = createLinearSolver(CGNR, A; reg = L2Regularization(0.0001), iterations=32);
x_approx = solve!(solver, b)
16-element Vector{Float32}:
 0.11828006
 0.20086356
 0.77634
 0.30057234
 0.22131151
 0.048125662
 0.84582514
 0.46787456
 0.1725377
 0.8522468
 0.22672482
 0.9869703
 0.122989774
 0.2000143
 0.7840195
 0.73087007
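
The solution is returned as a GPU array of the same type as the measurement vector. To inspect or post-process it on the CPU, copy it back to host memory with Array:

x_host = Array(x_approx);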

This adaptation does not extend to the operator. If we want to compare with the CPU result, we therefore need to construct a new solver with a CPU operator.

solver = createLinearSolver(CGNR, Array(A); reg = L2Regularization(0.0001), iterations=32);
x_cpu = solve!(solver, Array(b))
isapprox(Array(x_approx), x_cpu)
true

Matrix-Free Operators

A special case is the use of matrix-free operators. Since these operators have no concrete matrix representation, their GPU support depends on their implementation. Because not all multiplications within a solver are performed in-place, the operator also needs to support the * operation and construct an appropriate result vector. For matrix-free operators based on LinearOperators.jl, this can be achieved by implementing the LinearOperators.storage_type method.
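
For a custom operator this amounts to a one-line method definition. Here is a minimal sketch, where MyGPUOp is a hypothetical matrix-free operator type whose internal buffers live in CUDA memory:

import LinearOperators: storage_type
# MyGPUOp is a hypothetical operator type, used for illustration only
storage_type(op::MyGPUOp) = CuVector{Float32}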

In the following, we will take another look at the compressed sensing (CS) example and execute it on the GPU. Note that we chose a small phantom here, since the example can also be run with the JLArray type (see the sketch below), whose implementation is not optimized for performance.
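
If you want to follow along without GPU hardware, a minimal sketch of swapping in JLArrays:

using JLArrays
gpu = JLArray  # CPU-backed reference implementation of the GPUArrays interface

With a backend selected, we set up the phantom: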

using ImagePhantoms, ImageGeoms
N = 32
image = shepp_logan(N, SheppLoganToft())

using Random, LinearOperatorCollection
randomIndices = shuffle(eachindex(image))             # random permutation of all pixel indices
sampledIndices = sort(randomIndices[1:div(end, 3)]);  # keep a random third of the pixels

To construct the operator, we need to convert the indices to a GPU array and specify the correct storage type. In both LinearOperators.jl and LinearOperatorCollection.jl, the storage type is set with the S keyword argument.

gpu_indices = gpu(sampledIndices)
A = SamplingOp(eltype(image), pattern = gpu_indices, shape = size(image), S = typeof(b));

Let's inspect the storage type of the operator:

using LinearOperatorCollection.LinearOperators
LinearOperators.storage_type(A)
Vector{Float32} (alias for Array{Float32, 1})

Afterwards we can use the operator as usual:

b = A*vec(gpu(image));
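
Solvers also rely on in-place products internally. Operators from LinearOperators.jl implement the standard mul! interface, so this can be checked manually; a quick sketch:

using LinearAlgebra
y = similar(b)
mul!(y, A, vec(gpu(image)));  # in-place forward multiplication into y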

And use it in the solver:

using RegularizedLeastSquares
reg = TVRegularization(0.01; shape=size(image))
solver = createLinearSolver(FISTA, A; reg=reg, iterations=20)
img_approx = solve!(solver,b);

To visualize the reconstructed image, we need to reshape the result vector to the correct shape and convert it to a CPU array:

img_approx = reshape(Array(img_approx),size(image))
32×32 Matrix{Float32}:
 0.00101102  0.00110708  0.00129486  …  0.00191152  0.00181119  0.00176317
 0.00108485  0.00120516  0.00142485     0.00207642  0.00194465  0.0018842
 0.00129354  0.00144831  0.00170987     0.00242752  0.00223714  0.00215599
 0.00168354  0.00187515  0.00220025     0.00299899  0.00268924  0.00256231
 0.00225214  0.00251522  0.00299666     0.00382381  0.00326671  0.00301945
 0.00300902  0.00342707  0.00417959  …  0.00496836  0.00402311  0.00353279
 0.00405923  0.00467085  0.0056849      0.00638751  0.00506009  0.00425931
 0.00562917  0.00629648  0.0073227      0.00768373  0.00628972  0.00531311
 0.007606    0.00815616  0.00874637     0.0078579   0.00728854  0.00652414
 0.00929491  0.00952994  0.0102001      0.0081802   0.00755981  0.0072924
 ⋮                                   ⋱              ⋮           
 0.0130903   0.0136095   0.0143         0.00888475  0.00857639  0.0080658
 0.0117656   0.0123412   0.0130407      0.00739936  0.00652403  0.00584041
 0.00963435  0.010353    0.0114217   …  0.0056574   0.00473653  0.00417112
 0.00755205  0.00801218  0.00868098     0.00403859  0.0033761   0.00302598
 0.00584923  0.00605821  0.0064204      0.00293018  0.00249776  0.00228187
 0.00443461  0.00446142  0.0045889      0.00228068  0.00196123  0.00180157
 0.00327395  0.00322906  0.00321365     0.00188298  0.00162933  0.00150206
 0.00242846  0.0024003   0.00238059  …  0.00164232  0.00145232  0.00136058
 0.00196044  0.00197145  0.00199666     0.00153157  0.00139066  0.00132936

We will again use CairoMakie for visualization:

using CairoMakie
function plot_image(figPos, img; title = "", width = 150, height = 150)
  ax = CairoMakie.Axis(figPos; yreversed=true, title, width, height)
  hidedecorations!(ax)
  heatmap!(ax, img)
end
fig = Figure()
plot_image(fig[1,1], image, title = "Image")
samplingMask = fill(false, size(image))
samplingMask[sampledIndices] .= true
plot_image(fig[1,2], image .* samplingMask, title = "Sampled Image")
plot_image(fig[1,3], img_approx, title = "Reconstructed Image")
resize_to_layout!(fig)
fig
[Figure: the original image, the sampled image, and the reconstructed image]

This page was generated using Literate.jl.