Efficient Kaczmarz

Unlike many of the other solvers provided by RegularizedLeastSquares.jl, the Kaczmarz method uses neither a matrix-vector product with the operator $\mathbf{A}$ nor the normal operator $\mathbf{A}^*\mathbf{A}$. Instead, it updates the solution iteratively using the rows of $\mathbf{A}$. An efficient Kaczmarz implementation therefore requires very efficient dot products with the rows of $\mathbf{A}$. In RegularizedLeastSquares.jl, this is achieved with the dot_with_matrix_row function.
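As a rough illustration of why row access dominates the cost, one sweep of the classical Kaczmarz method projects the current iterate onto the hyperplane defined by each row in turn. The sketch below is illustrative only, not the RegularizedLeastSquares.jl implementation:

```julia
using LinearAlgebra

# One classical Kaczmarz sweep: for each row a_i, project x onto
# the hyperplane {x : a_i' * x = b[i]}. Illustrative sketch only.
function kaczmarz_sweep!(x, A, b)
    for i in axes(A, 1)
        a = view(A, i, :)                      # i-th row of A
        r = (b[i] - dot(a, x)) / dot(a, a)     # signed distance to hyperplane
        x .+= r .* a                           # projection step
    end
    return x
end

A = randn(50, 20); xtrue = randn(20); b = A * xtrue
x = zeros(20)
for _ in 1:500
    kaczmarz_sweep!(x, A, b)                   # converges to xtrue for this consistent system
end
```

Every step touches a full row of A, which is why the memory layout of those rows matters so much.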

using RegularizedLeastSquares
A = randn(256, 256)
x = randn(256)
b = A*x;

The dot_with_matrix_row function calculates the dot product between a row of A and the current approximation of the solution x:

row = 1
isapprox(RegularizedLeastSquares.dot_with_matrix_row(A, x, row), sum(A[row, :] .* x))
true

Since Julia stores dense arrays in column-major order, such a row-based operation is quite inefficient: the elements of a row are not contiguous in memory. A workaround is to store the transposed matrix and pass its (lazy) transpose to the Kaczmarz solver.
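To see the layout concretely, vec walks a dense matrix in memory order, column by column, so the elements of a row are size(M, 1) apart:

```julia
# Julia stores dense matrices column by column: vec(M) walks down
# each column in turn, so row elements are `size(M, 1)` apart in memory.
M = [1 2 3;
     4 5 6]
vec(M)       # memory order: [1, 4, 2, 5, 3, 6]
M[1, :]      # a row gathers strided elements: [1, 2, 3]
```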

At = collect(transpose(A))
A_eff = transpose(At)
256×256 transpose(::Matrix{Float64}) with eltype Float64:
  0.138222   -0.815986   -0.458157  …   1.13464    -1.2559      0.0904353
 -0.0901411  -0.790107   -1.64117      -0.104218   -1.25769    -0.752571
 -1.3977     -0.0658572  -0.418464      1.70363    -1.29695     0.631492
 -0.415756   -1.63742    -0.590953     -1.06943    -0.165019   -1.52824
  0.548712   -0.268245   -0.786453     -1.24014    -0.693496   -0.0245972
 -1.58254    -0.116031    0.290272  …  -1.8788     -1.95892     1.34318
 -0.814817   -1.00597    -1.14166      -0.925887   -1.43197    -0.564316
 -0.298049   -0.654494    0.147558     -1.43565     0.542322    0.375231
 -0.554052   -0.621265    0.986895     -0.518966    0.0533121  -1.00655
  0.491985    1.7973      1.14473       0.0242168   1.05972    -0.619231
  ⋮                                 ⋱                           ⋮
 -0.903275    0.139654   -2.40032      -0.740434   -0.262511    0.674176
 -0.541804    0.454915    0.545221     -1.50538     0.672708    0.029648
 -0.0227916  -0.349905    0.213702      1.31863     0.534436   -0.662495
 -0.286013    2.17479    -1.175     …   0.555187   -0.520166    0.746992
  0.842314    1.13362    -0.741895     -0.494821   -0.554388    0.40132
  0.588076   -2.53672     1.55062      -0.541471   -0.405766    0.813539
  0.804086    1.1432      0.176165     -1.28705     0.105536    1.03964
 -1.53222     1.40588     1.08677      -0.538044   -0.438009   -1.39189
 -1.44139    -0.70507     2.55221   …  -1.28328    -1.31252     0.300821

Note that the transpose function returns a lazy transpose object, so we first collect the transpose into a dense matrix. Transposing that matrix again yields a representation with the same entries as A, but whose rows now correspond to columns of the materialized transpose and are therefore contiguous in memory.
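The double-transpose trick can be checked directly: the values are unchanged, and each row of the wrapped matrix is a contiguous column of the materialized transpose.

```julia
# A row of A_eff = transpose(At) is a column of At, which is
# contiguous in memory; the entries are identical to those of A.
A     = randn(4, 4)
At    = collect(transpose(A))   # materialize the transpose (column-major)
A_eff = transpose(At)           # lazy wrapper: same entries as A
A_eff == A                      # true
A_eff[2, :] == At[:, 2]         # row 2 of A_eff is a contiguous column of At
```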

We can compare the performance using the BenchmarkTools.jl package. First for the original matrix:

using BenchmarkTools
solver = createLinearSolver(Kaczmarz, A; reg = L2Regularization(0.0001), iterations=100)
@benchmark solve!(solver, b) samples = 100
BenchmarkTools.Trial: 100 samples with 1 evaluation per sample.
 Range (minmax):  38.329 ms38.600 ms   GC (min … max): 0.00% … 0.00%
 Time  (median):     38.450 ms               GC (median):    0.00%
 Time  (mean ± σ):   38.448 ms ± 56.091 μs   GC (mean ± σ):  0.00% ± 0.00%

              ▁  ▁▃ ▃    ▃▁▃▁▃█▁▁  ▁ ▃▃                      
  ▇▁▄▄▄▄▇▄▄▇▁▁█▄▇██▇█▁▁▇▄████████▄▇█▇██▄▁▁▇▇▁▇▇▁▁▁▄▁▁▁▁▁▁▁▄ ▄
  38.3 ms         Histogram: frequency by time        38.6 ms <

 Memory estimate: 17.31 KiB, allocs estimate: 505.

And then for the efficient matrix:

solver_eff = createLinearSolver(Kaczmarz, A_eff; reg = L2Regularization(0.0001), iterations=100)
@benchmark solve!(solver_eff, b) samples = 100
BenchmarkTools.Trial: 100 samples with 1 evaluation per sample.
 Range (minmax):  1.795 ms1.851 ms   GC (min … max): 0.00% … 0.00%
 Time  (median):     1.811 ms              GC (median):    0.00%
 Time  (mean ± σ):   1.812 ms ± 8.681 μs   GC (mean ± σ):  0.00% ± 0.00%

               ▂ ▄▂  █▆ ▂   ▂    ▂                       
  ▄▁▄▁▄▁▁▆▁▄▄▄▄████▆▆██████▆▆▄█▄█▄██▄▄▁▄▁▆▁▁▁▄▄▄▄▄▁▁▁▁▁▁▆ ▄
  1.79 ms        Histogram: frequency by time       1.83 ms <

 Memory estimate: 17.34 KiB, allocs estimate: 507.

We can also combine the efficient matrix with a weighting matrix, as is shown in the Weighting example.

Custom operators need to implement the dot_with_matrix_row function to be used with the Kaczmarz solver. Ideally, such an implementation is allocation-free.
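As a sketch of what such a method might look like, the standalone example below defines a hypothetical operator type that stores the transposed matrix and computes the row dot product against its contiguous columns. The type name and the standalone function are illustrative assumptions, not the library's implementation:

```julia
# Hypothetical custom operator that stores transpose(A), so that
# rows of A are contiguous columns in memory. Sketch only.
struct MyTransposedOperator{T}
    At::Matrix{T}   # holds transpose(A)
end

# Allocation-free row dot product: row k of A is column k of At.
function dot_with_matrix_row(op::MyTransposedOperator, x::AbstractVector, k::Integer)
    acc = zero(eltype(op.At))
    @inbounds @simd for i in eachindex(x)
        acc += op.At[i, k] * x[i]
    end
    return acc
end

A  = randn(5, 3)
op = MyTransposedOperator(collect(transpose(A)))
x  = randn(3)
dot_with_matrix_row(op, x, 2)   # matches sum(A[2, :] .* x)
```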


This page was generated using Literate.jl.