Efficient Kaczmarz

Unlike many of the other solvers provided by RegularizedLeastSquares.jl, the Kaczmarz method uses neither a matrix-vector product with the operator $\mathbf{A}$ nor one with the normal operator $\mathbf{A}^*\mathbf{A}$. Instead, it uses the rows of $\mathbf{A}$ to update the solution iteratively. An efficient Kaczmarz implementation therefore requires very efficient dot products with the rows of $\mathbf{A}$. In RegularizedLeastSquares.jl, this is achieved with the dot_with_matrix_row function.
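
To make the row-wise structure concrete, here is a minimal sketch of a single classical Kaczmarz update for a dense matrix. The function name is purely illustrative and not part of the package API; the solver in the package additionally handles regularization and other details.

using LinearAlgebra

# Illustrative only (real-valued case): one classical Kaczmarz step projects the
# current iterate x onto the hyperplane defined by a single row of A.
function kaczmarz_row_update!(x, A, b, row)
    a = @view A[row, :]                  # the row used for this update
    residual = b[row] - dot(a, x)        # how far x is from satisfying this row
    x .+= (residual / dot(a, a)) .* a    # project x onto {y : dot(a, y) == b[row]}
    return x
end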

using RegularizedLeastSquares
A = randn(256, 256)
x = randn(256)
b = A*x;

The dot_with_matrix_row function computes the dot product between a row of A and a given vector, such as the current approximation of the solution:

row = 1
isapprox(RegularizedLeastSquares.dot_with_matrix_row(A, x, row), sum(A[row, :] .* x))
true

Since dense arrays in Julia are stored in column-major order, such a row-based operation is quite inefficient. A workaround is to transpose the matrix and then pass it to the Kaczmarz solver.
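
The layout is easy to inspect directly: the stride along a row of A is the full column length, so consecutive elements of a row lie 256 values apart in memory, while consecutive elements of a column are adjacent.

strides(A)
(1, 256)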

At = collect(transpose(A))
A_eff = transpose(At)
256×256 transpose(::Matrix{Float64}) with eltype Float64:
 -1.31841     0.464512    2.42372   …  -0.0945286   0.471259    0.570839
  1.3711     -1.9538      0.900866      1.91157    -0.108455   -0.747888
  1.91357    -1.17461     0.348358      0.278152   -0.125971    1.25915
 -2.11938     1.63108    -0.318206     -0.22226     0.317672   -0.762615
 -0.727183    0.0874219   0.681325      0.518665   -0.954246    0.0553033
 -0.624659    0.721199   -1.14028   …   0.949141    0.500782   -1.04
 -1.1731      0.681074    0.741181     -2.01154     0.824422   -0.363923
 -0.256242    0.626777   -0.171964     -0.544962    0.0312801   0.511189
 -0.983862   -0.724239   -0.516382     -0.0980682   2.76171    -1.13302
 -0.267138    0.333605   -0.121356      1.22106    -0.818869    0.943387
  ⋮                                 ⋱                           ⋮
  1.66475     0.472479   -0.183441      1.2447      0.134755    0.753172
 -0.8757     -0.973089   -0.558428      0.141847   -0.379535    0.607302
 -0.157392    0.631304   -0.54518       1.77496    -2.13202    -0.48395
  0.35504    -0.816248   -0.409547  …   0.0170573  -0.678147   -0.28966
 -0.166725   -0.493589    0.068753     -1.202       1.19069    -2.22396
  0.0425969  -1.20375     0.173463     -0.227487    1.50817    -0.436831
  1.24449     0.899385   -1.54272       0.474964   -0.722716    0.694785
  0.418353    0.71747    -1.26546       0.5119      0.594369    0.495401
  0.460503   -0.679736    1.04913   …   0.098671   -0.253889    0.15625

Note that the transpose function returns a lazy transpose object, so we first collect the transpose into a dense matrix At and then wrap it in transpose again. The result A_eff is equal to A, but accessing one of its rows now amounts to accessing a contiguous column of At, which is the efficient layout the row-based Kaczmarz iterations benefit from.
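
As a quick sanity check, the doubly transposed matrix still compares equal to the original, while being merely a lazy wrapper around the densely collected At (nothing here is specific to the solver):

A_eff == A && parent(A_eff) === At
true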

We can compare the performance using the BenchmarkTools.jl package. First for the original matrix:

using BenchmarkTools
solver = createLinearSolver(Kaczmarz, A; reg = L2Regularization(0.0001), iterations=100)
@benchmark solve!(solver, b) samples = 100
BenchmarkTools.Trial: 100 samples with 1 evaluation per sample.
 Range (min … max):  28.407 ms … 28.548 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     28.457 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   28.461 ms ± 26.056 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

             ▁ ▃    ▄▄█   ▁  ▆      ▃                         
  ▄▁▆▄▄▄▄▆▁▁▄█▇█▁▇▄▇███▇█▆▄█▁▆▆▄▄▆█▇▆▁▁▆▁▄▁▁▁▁▄▁▁▁▁▁▁▁▁▁▁▄ ▄
  28.4 ms         Histogram: frequency by time        28.5 ms <

 Memory estimate: 17.31 KiB, allocs estimate: 505.

And then for the efficient matrix:

solver_eff = createLinearSolver(Kaczmarz, A_eff; reg = L2Regularization(0.0001), iterations=100)
@benchmark solve!(solver_eff, b) samples = 100
BenchmarkTools.Trial: 100 samples with 1 evaluation per sample.
 Range (min … max):  1.861 ms … 2.061 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     1.885 ms             ┊ GC (median):    0.00%
 Time  (mean ± σ):   1.887 ms ± 21.242 μs ┊ GC (mean ± σ):  0.00% ± 0.00%

          █    ▃▁▁▃▃▁ ▁  ▃ ▃▆  ▁ ▁    ▃                     
  ▄▁▁▁▄▁▄▇█▄▄▇▇██████▄█▄▇███▁▇█▄█▇▇▇▄█▄▇▇▁▁▁▁▁▁▄▄▄▁▁▄▁▁▁▄ ▄
  1.86 ms        Histogram: frequency by time        1.92 ms <

 Memory estimate: 17.34 KiB, allocs estimate: 507.

We can also combine the efficient matrix with a weighting matrix, as shown in the Weighting example.

To be used with the Kaczmarz solver, custom operators need to implement the dot_with_matrix_row function. Ideally, such an implementation is allocation-free.
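
As a purely hypothetical sketch, a custom operator that stores its rows contiguously could specialize the function as below. The type and field names are made up and not part of the package, and a complete custom operator will typically need further methods that the solver calls.

# Hypothetical operator whose k-th row is stored as the contiguous column rows[:, k].
struct RowStoredOp{T}
    rows::Matrix{T}
end

# Allocation-free dot product with row k of the operator.
function RegularizedLeastSquares.dot_with_matrix_row(op::RowStoredOp{T}, x::AbstractVector, k::Integer) where {T}
    acc = zero(T)
    @inbounds @simd for i in eachindex(x)
        acc += op.rows[i, k] * x[i]
    end
    return acc
end

An equivalent one-liner would be dot(view(op.rows, :, k), x) from LinearAlgebra, which also avoids temporaries; the explicit loop just makes the contiguous, allocation-free access pattern obvious.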


This page was generated using Literate.jl.