Multi-Threading

There are different ways multi-threading can be used with RegularizedLeastSquares.jl. To use multi-threading in Julia, the session must be started with multiple threads, e.g. with `julia --threads=auto`; see the Julia documentation on multi-threading for more information.

Solver Based Multi-Threading

This type of multi-threading is transparent to the solver and is applicable if the total solution is composed of individual solutions that can be solved in parallel. In particular, this approach also allows for using solvers with different parameters, such as their operator or regularization parameters.

using RegularizedLeastSquares
As = [rand(32, 16) for _ in 1:4]
xs = [rand(16) for _ in 1:4]
bs = [A*x for (A, x) in zip(As, xs)]

xs_approx = similar(xs)
Threads.@threads for i in 1:4
  solver = createLinearSolver(CGNR, As[i]; iterations=32)
  xs_approx[i] = solve!(solver, bs[i])
end
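Because each task constructs its own solver, the individual problems do not have to share parameters. The following is a hedged sketch, assuming the `reg` keyword and the `L2Regularization` constructor behave as elsewhere in this documentation, in which each system is solved with its own regularization strength:

```julia
using RegularizedLeastSquares

# Four independent least-squares problems, as above
As = [rand(32, 16) for _ in 1:4]
bs = [A * rand(16) for A in As]

# Hypothetical per-problem regularization strengths
λs = [0.1, 0.05, 0.01, 0.001]

xs_reg = Vector{Vector{Float64}}(undef, 4)
Threads.@threads for i in 1:4
  # Each task builds a solver with its own operator and regularization
  solver = createLinearSolver(CGNR, As[i]; iterations = 32, reg = L2Regularization(λs[i]))
  xs_reg[i] = solve!(solver, bs[i])
end
```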

Operator Based Multi-Threading

This type of multi-threading involves linear operators or proximal maps that can be implemented in parallel. Examples include the proximal map of the TV regularization term, which is based on the multi-threaded GradientOp from LinearOperatorCollection. GPU acceleration also falls under this approach; see GPU Acceleration for more information.
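From the user's perspective no extra code is required; the parallelism lives inside the operator or proximal map. A minimal sketch, assuming the `TVRegularization` constructor with its `shape` keyword, where the TV proximal map internally uses the multi-threaded GradientOp:

```julia
using RegularizedLeastSquares

A = rand(32, 16)
b = A * rand(16)

# The TV proximal map is multi-threaded internally, so solve!
# benefits from multiple threads without further changes on our side.
reg = TVRegularization(0.01; shape = (16,))
solver = createLinearSolver(FISTA, A; reg = reg, iterations = 32)
x_tv = solve!(solver, b)
```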

Measurement Based Multi-Threading

This level of multi-threading applies the same solver (and its parameters) to multiple measurement vectors or rather a measurement matrix B. This is useful in the case of multiple measurements that can be solved in parallel and can reuse the same solver. This approach is not applicable if the operator is stateful.

To use this approach we first build a measurement matrix B and a corresponding solver:

A = first(As)
B = mapreduce(x -> A*x, hcat, xs)
solver = createLinearSolver(CGNR, A; iterations=32)
CGNR{Matrix{Float64}, Matrix{Float64}, L2Regularization{Float64}, Vector{Any}}(…)

We can then simply pass the measurement matrix to the solver. The result is the same as if we had passed each column of B separately:

x_approx = solve!(solver, B)
size(x_approx)
(16, 4)
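To illustrate the column-wise equivalence, one can compare a column of the matrix result against a single-vector solve. A sketch, assuming `solve!` can be called repeatedly on the same solver:

```julia
using RegularizedLeastSquares

A = rand(32, 16)
B = A * rand(16, 4)
solver = createLinearSolver(CGNR, A; iterations = 32)

x_approx = solve!(solver, B)       # all right-hand sides at once
x_col = solve!(solver, B[:, 1])    # only the first right-hand side

# The first column of the matrix solve matches the single-vector solve
x_approx[:, 1] ≈ x_col
```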

The previous solve! call was still executed sequentially. To execute it in parallel, we have to specify a multi-threaded scheduler as a keyword argument of the solve! call. RegularizedLeastSquares.jl provides a MultiThreadingState scheduler for this purpose. This scheduler is based on the Threads.@threads macro:

x_multi = solve!(solver, B; scheduler = MultiThreadingState)
x_approx == x_multi
true

Custom Scheduling

It is possible to implement custom scheduling. The following code shows how to do this with the Threads.@spawn macro. In the same way, one can implement multi-threading with a package such as FLoops.jl, or use ThreadPools.jl for thread pinning:

Since most solvers have convergence criteria, they can finish after a different number of iterations. We track this information with an active flag per state.

mutable struct SpawnState{S, ST <: AbstractSolverState{S}} <: RegularizedLeastSquares.AbstractMatrixSolverState{S}
  states::Vector{ST}
  active::Vector{Bool}
  SpawnState(states::Vector{ST}) where {S, ST <: AbstractSolverState{S}} = new{S, ST}(states, fill(true, length(states)))
end

To hook into the existing init! code, we only have to supply a method that returns a copyable "vector" state. This will invoke our SpawnState constructor with copies of the given state.

prepareMultiStates(solver::AbstractLinearSolver, state::SpawnState, b::AbstractMatrix) = prepareMultiStates(solver, first(state.states), b);

We specialize the iterate function, which is called with the indices of the still-active states:

function Base.iterate(solver::AbstractLinearSolver, state::SpawnState, activeIdx)
  @sync for i in activeIdx
    # Spawn one task per active state so the iterations run in parallel
    Threads.@spawn begin
      res = iterate(solver, state.states[i])
      if isnothing(res)
        state.active[i] = false
      end
    end
  end
  return state.active, state
end

Now we can simply use the SpawnState scheduler in the solve! call:

x_custom = solve!(solver, B; scheduler = SpawnState)
x_approx == x_custom
true

This page was generated using Literate.jl.