Efficient Kaczmarz

Unlike many of the other solvers provided by RegularizedLeastSquares.jl, the Kaczmarz method uses neither a matrix-vector product with the operator $\mathbf{A}$ nor with the normal operator $\mathbf{A}^*\mathbf{A}$. Instead, it updates the solution iteratively using the rows of $\mathbf{A}$. An efficient Kaczmarz implementation therefore requires very efficient dot products with the rows of $\mathbf{A}$. In RegularizedLeastSquares.jl, this is achieved with the dot_with_matrix_row function.
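Concretely, each Kaczmarz step projects the current iterate onto the hyperplane defined by a single row $\mathbf{a}_i$ of $\mathbf{A}$; in the unregularized case the update reads

$$\mathbf{x}^{(k+1)} = \mathbf{x}^{(k)} + \frac{b_i - \langle \mathbf{a}_i, \mathbf{x}^{(k)} \rangle}{\lVert \mathbf{a}_i \rVert_2^2} \, \mathbf{a}_i,$$

so the only operation touching $\mathbf{A}$ is the row dot product $\langle \mathbf{a}_i, \mathbf{x}^{(k)} \rangle$ (the row norms $\lVert \mathbf{a}_i \rVert_2^2$ can be precomputed once).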

using RegularizedLeastSquares
A = randn(256, 256)
x = randn(256)
b = A*x;

The dot_with_matrix_row function computes the dot product between a row of A and the current approximate solution x:

row = 1
isapprox(RegularizedLeastSquares.dot_with_matrix_row(A, x, row), sum(A[row, :] .* x))
true
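To illustrate how these row dot products drive the iteration, here is a minimal sketch of one cyclic, unregularized Kaczmarz sweep. This is a simplification of the solver's internals, and kaczmarz_sweep! is a name invented for this illustration, not part of the package:

```julia
using LinearAlgebra

# Minimal sketch of one cyclic Kaczmarz sweep without regularization.
# Each step projects x onto the hyperplane <a_i, x> = b_i, touching A
# only through a dot product with row i.
function kaczmarz_sweep!(x, A, b)
    for i in axes(A, 1)
        ai = view(A, i, :)             # row i of A
        r  = b[i] - dot(ai, x)         # residual for row i
        x .+= (r / dot(ai, ai)) .* ai  # projection step
    end
    return x
end
```

Repeated sweeps converge to a solution of consistent systems; the packaged solver adds regularization and further options on top of this basic step.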

Since Julia stores dense arrays in column-major order, such a row-based operation is quite inefficient. A workaround is to store the transpose of the matrix densely and pass a lazily transposed view of it to the Kaczmarz solver.

At = collect(transpose(A))
A_eff = transpose(At)
256×256 transpose(::Matrix{Float64}) with eltype Float64:
 -0.845885  -1.03564      0.151952   …   0.671462   -0.329974  -0.04661
  0.155163   0.392258     0.897342      -1.8246      0.189523   1.10736
 -0.495033  -0.49133      0.778819       0.5618     -0.560283  -0.668798
 -0.233133   0.70318     -0.524632      -0.22617     0.978764  -0.400967
  0.17351    0.00356125   0.121891      -0.832413   -0.127142  -2.99181
  1.94344   -0.0894104    0.0727451  …   0.729585    0.44853   -1.62267
 -0.906821   0.320533    -0.825438       0.814818   -0.381371  -0.209179
  1.07931    0.823433     0.26865        0.757307    0.232194   1.54605
  0.725729   0.207188     0.549091      -0.924242    0.373392  -0.749068
 -0.159451   0.0493149    1.93157        0.98536    -1.68801    0.137009
  ⋮                                  ⋱                          ⋮
  0.340863  -0.398874    -0.454218      -0.904708    0.959747  -0.105418
  1.05571   -0.0929981    1.10612       -0.491928    1.75692   -1.30198
  2.69411    0.356481     0.711368       0.0130584   2.54326    1.15603
 -0.698125  -0.237754    -1.17593    …   0.100194    1.94983    0.0699787
  0.307402  -1.81184     -0.573497      -0.650915    0.329893   0.502739
  0.587114  -1.20302      0.234708       0.272464   -0.684008   0.278572
 -1.02732    0.891735    -2.78025        1.9955     -0.707748   1.71998
  1.56868   -0.998924    -1.04445       -0.186529   -0.662536  -0.29924
 -0.278334   0.558744    -0.606144   …  -0.341575    0.204934  -1.49226

Note that the transpose function returns a lazy transpose wrapper, so we first collect the transpose into a dense matrix. Lazily transposing that dense matrix again yields a representation with the same entries as the original, but whose rows are now contiguous in memory, which makes the row dot products fast.
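As an illustrative sanity check on a small example (not part of the package API), the double transpose changes only the memory layout, not the values:

```julia
# The double transpose changes only the memory layout, not the values.
B = randn(4, 3)
Bt = collect(transpose(B))   # dense 3×4 copy; rows of B become columns of Bt
B_eff = transpose(Bt)        # lazy 4×3 wrapper with the same entries as B

@assert B_eff == B                        # values unchanged
@assert strides(Bt) == (1, size(Bt, 1))   # columns of Bt (rows of B_eff) are contiguous
```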

We can compare the performance using the BenchmarkTools.jl package. First for the original matrix:

using BenchmarkTools
solver = createLinearSolver(Kaczmarz, A; reg = L2Regularization(0.0001), iterations=100)
@benchmark solve!(solver, b) samples = 100
BenchmarkTools.Trial: 100 samples with 1 evaluation per sample.
 Range (min … max):  37.972 ms … 38.524 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     38.139 ms               GC (median):    0.00%
 Time  (mean ± σ):   38.157 ms ± 96.939 μs   GC (mean ± σ):  0.00% ± 0.00%

              ▅    ▅▅▂▂▂▂█▂   ▂   █ █ ▂       ▅               
  ▅▅▅▁█▁▅▁▅▅▅▅███▅█████████▅▅█▁▅▅███▅█▅▅▅▁▅█▅█▁██▁▅▁▁▅▅▁▅▁█ ▅
  38 ms           Histogram: frequency by time        38.4 ms <

 Memory estimate: 17.31 KiB, allocs estimate: 505.

And then for the efficient matrix:

solver_eff = createLinearSolver(Kaczmarz, A_eff; reg = L2Regularization(0.0001), iterations=100)
@benchmark solve!(solver_eff, b) samples = 100
BenchmarkTools.Trial: 100 samples with 1 evaluation per sample.
 Range (min … max):  1.801 ms … 1.902 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     1.818 ms               GC (median):    0.00%
 Time  (mean ± σ):   1.820 ms ± 14.699 μs   GC (mean ± σ):  0.00% ± 0.00%

         ▂  ▆▅▆▂█                                          
  ▄▄▇▇▅▄▄█▅▇██████▇▄▅▅▄▁▄▁▁▄▁▁▁▁▁▁▄▁▄▁▁▁▄▁▁▁▁▁▁▄▄▁▄▁▁▁▁▁▄ ▄
  1.8 ms         Histogram: frequency by time        1.87 ms <

 Memory estimate: 17.34 KiB, allocs estimate: 507.

We can also combine the efficient matrix with a weighting matrix, as is shown in the Weighting example.

Custom operators need to implement the dot_with_matrix_row function to be usable with the Kaczmarz solver. Ideally, such an implementation is allocation-free.
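For example, a hypothetical operator that stores its rows explicitly could provide an allocation-free row dot product as sketched below. RowOperator is a made-up type for this illustration; in real code the method would extend RegularizedLeastSquares.dot_with_matrix_row, but here it is a standalone function so the sketch runs without the package:

```julia
# Hypothetical operator storing each matrix row as a contiguous vector.
struct RowOperator{T}
    rows::Vector{Vector{T}}
end

# Allocation-free row dot product: iterate over the stored row directly.
# In real code, extend RegularizedLeastSquares.dot_with_matrix_row instead.
function dot_with_matrix_row(op::RowOperator{T}, x::AbstractVector{T}, k::Integer) where T
    row = op.rows[k]
    s = zero(T)
    @inbounds @simd for j in eachindex(row, x)
        s += row[j] * x[j]
    end
    return s
end
```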


This page was generated using Literate.jl.