Efficient Kaczmarz

Unlike many of the other solvers provided by RegularizedLeastSquares.jl, the Kaczmarz method uses neither a matrix-vector product with the operator $\mathbf{A}$ nor one with the normal operator $\mathbf{A}^*\mathbf{A}$. Instead, it uses the rows of $\mathbf{A}$ to update the solution iteratively. An efficient Kaczmarz implementation therefore requires very efficient dot products with the rows of $\mathbf{A}$. In RegularizedLeastSquares.jl, this is achieved with the dot_with_matrix_row function.
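
To make the row-wise structure concrete, here is a minimal sketch of a single classical Kaczmarz update for a dense matrix. The function name is purely illustrative and not part of the package API; the solver in the package additionally handles regularization and other details.

using LinearAlgebra

# Illustrative only (real-valued case): one classical Kaczmarz step projects the
# current iterate x onto the hyperplane defined by a single row of A.
function kaczmarz_row_update!(x, A, b, row)
    a = @view A[row, :]                  # the row used for this update
    residual = b[row] - dot(a, x)        # how far x is from satisfying this row
    x .+= (residual / dot(a, a)) .* a    # project x onto {y : dot(a, y) == b[row]}
    return x
end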

using RegularizedLeastSquares
A = randn(256, 256)
x = randn(256)
b = A*x;

The dot_with_matrix_row function computes the dot product between a row of A and a given vector, such as the current approximation of the solution:

row = 1
isapprox(RegularizedLeastSquares.dot_with_matrix_row(A, x, row), sum(A[row, :] .* x))
true

Since dense arrays in Julia are stored in column-major order, such a row-based operation is quite inefficient. A workaround is to transpose the matrix and then pass it to the Kaczmarz solver.
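
The layout is easy to inspect directly: the stride along a row of A is the full column length, so consecutive elements of a row lie 256 values apart in memory, while consecutive elements of a column are adjacent.

strides(A)
(1, 256)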

At = collect(transpose(A))
A_eff = transpose(At)
256×256 transpose(::Matrix{Float64}) with eltype Float64:
 -1.31841     0.464512    2.42372   …  -0.0945286   0.471259    0.570839
  1.3711     -1.9538      0.900866      1.91157    -0.108455   -0.747888
  1.91357    -1.17461     0.348358      0.278152   -0.125971    1.25915
 -2.11938     1.63108    -0.318206     -0.22226     0.317672   -0.762615
 -0.727183    0.0874219   0.681325      0.518665   -0.954246    0.0553033
 -0.624659    0.721199   -1.14028   …   0.949141    0.500782   -1.04
 -1.1731      0.681074    0.741181     -2.01154     0.824422   -0.363923
 -0.256242    0.626777   -0.171964     -0.544962    0.0312801   0.511189
 -0.983862   -0.724239   -0.516382     -0.0980682   2.76171    -1.13302
 -0.267138    0.333605   -0.121356      1.22106    -0.818869    0.943387
  ⋮                                 ⋱                           ⋮
  1.66475     0.472479   -0.183441      1.2447      0.134755    0.753172
 -0.8757     -0.973089   -0.558428      0.141847   -0.379535    0.607302
 -0.157392    0.631304   -0.54518       1.77496    -2.13202    -0.48395
  0.35504    -0.816248   -0.409547  …   0.0170573  -0.678147   -0.28966
 -0.166725   -0.493589    0.068753     -1.202       1.19069    -2.22396
  0.0425969  -1.20375     0.173463     -0.227487    1.50817    -0.436831
  1.24449     0.899385   -1.54272       0.474964   -0.722716    0.694785
  0.418353    0.71747    -1.26546       0.5119      0.594369    0.495401
  0.460503   -0.679736    1.04913   …   0.098671   -0.253889    0.15625

Note that the transpose function returns a lazy transpose object, so we first collect the transpose into a dense matrix At and then wrap it in transpose again. The result A_eff is equal to A, but accessing one of its rows now amounts to accessing a contiguous column of At, which is the efficient layout the row-based Kaczmarz iterations benefit from.
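
As a quick sanity check, the doubly transposed matrix still compares equal to the original, while being merely a lazy wrapper around the densely collected At (nothing here is specific to the solver):

A_eff == A && parent(A_eff) === At
true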

We can compare the performance using the BenchmarkTools.jl package. First for the original matrix:

using BenchmarkTools
solver = createLinearSolver(Kaczmarz, A; reg = L2Regularization(0.0001), iterations=100)
@benchmark solve!(solver, b) samples = 100
BenchmarkTools.Trial: 100 samples with 1 evaluation per sample.
 Range (min … max):  28.407 ms … 28.548 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     28.457 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   28.461 ms ± 26.056 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

             ▁ ▃    ▄▄█   ▁  ▆      ▃                         
  ▄▁▆▄▄▄▄▆▁▁▄█▇█▁▇▄▇███▇█▆▄█▁▆▆▄▄▆█▇▆▁▁▆▁▄▁▁▁▁▄▁▁▁▁▁▁▁▁▁▁▄ ▄
  28.4 ms         Histogram: frequency by time        28.5 ms <

 Memory estimate: 17.31 KiB, allocs estimate: 505.

And then for the efficient matrix:

solver_eff = createLinearSolver(Kaczmarz, A_eff; reg = L2Regularization(0.0001), iterations=100)
@benchmark solve!(solver_eff, b) samples = 100
BenchmarkTools.Trial: 100 samples with 1 evaluation per sample.
 Range (min … max):  1.861 ms … 2.061 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     1.885 ms             ┊ GC (median):    0.00%
 Time  (mean ± σ):   1.887 ms ± 21.242 μs ┊ GC (mean ± σ):  0.00% ± 0.00%

          █    ▃▁▁▃▃▁ ▁  ▃ ▃▆  ▁ ▁    ▃                     
  ▄▁▁▁▄▁▄▇█▄▄▇▇██████▄█▄▇███▁▇█▄█▇▇▇▄█▄▇▇▁▁▁▁▁▁▄▄▄▁▁▄▁▁▁▄ ▄
  1.86 ms        Histogram: frequency by time        1.92 ms <

 Memory estimate: 17.34 KiB, allocs estimate: 507.

We can also combine the efficient matrix with a weighting matrix, as shown in the Weighting example.

To be used with the Kaczmarz solver, custom operators need to implement the dot_with_matrix_row function. Ideally, such an implementation is allocation-free.
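
As a purely hypothetical sketch, a custom operator that stores its rows contiguously could specialize the function as below. The type and field names are made up and not part of the package, and a complete custom operator will typically need further methods that the solver calls.

# Hypothetical operator whose k-th row is stored as the contiguous column rows[:, k].
struct RowStoredOp{T}
    rows::Matrix{T}
end

# Allocation-free dot product with row k of the operator.
function RegularizedLeastSquares.dot_with_matrix_row(op::RowStoredOp{T}, x::AbstractVector, k::Integer) where {T}
    acc = zero(T)
    @inbounds @simd for i in eachindex(x)
        acc += op.rows[i, k] * x[i]
    end
    return acc
end

An equivalent one-liner would be dot(view(op.rows, :, k), x) from LinearAlgebra, which also avoids temporaries; the explicit loop just makes the contiguous, allocation-free access pattern obvious.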


This page was generated using Literate.jl.