Efficient Kaczmarz

Unlike many of the other solvers provided by RegularizedLeastSquares.jl, the Kaczmarz method does not rely on matrix-vector products with the operator $\mathbf{A}$ or the normal operator $\mathbf{A}^*\mathbf{A}$. Instead, it uses the rows of $\mathbf{A}$ to update the solution iteratively. An efficient Kaczmarz implementation therefore requires very efficient dot products with the rows of $\mathbf{A}$. In RegularizedLeastSquares.jl, this is achieved with the dot_with_matrix_row function.
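To make the row-based update concrete, here is a minimal sketch of one cyclic Kaczmarz sweep in plain Julia. This is not the library's implementation; it only illustrates that each iteration touches a single row of $\mathbf{A}$, projecting the current iterate onto the hyperplane defined by that row (assuming a consistent system and no regularization):

```julia
using LinearAlgebra, Random

# Minimal sketch (not the library's implementation): one cyclic Kaczmarz
# sweep projects the iterate onto the hyperplane {x : aᵢᵀx = bᵢ} for each
# row aᵢ of A in turn.
function kaczmarz_sweep!(x, A, b)
    for i in axes(A, 1)
        ai = view(A, i, :)                      # i-th row of A
        r  = (b[i] - dot(ai, x)) / dot(ai, ai)  # signed distance to the hyperplane
        x .+= r .* ai                           # orthogonal projection step
    end
    return x
end

Random.seed!(1)
A = randn(64, 64); xtrue = randn(64); b = A * xtrue
x = zeros(64)
for _ in 1:500
    kaczmarz_sweep!(x, A, b)
end
```

Note that the inner loop consists almost entirely of dot products with rows of A, which is why row access speed dominates the solver's runtime.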

using RegularizedLeastSquares
A = randn(256, 256)
x = randn(256)
b = A*x;

The dot_with_matrix_row function calculates the dot product between a row of A and the current approximate solution x:

row = 1
isapprox(RegularizedLeastSquares.dot_with_matrix_row(A, x, row), sum(A[row, :] .* x))
true

Since dense arrays in Julia are stored in column-major order, such a row-based operation is quite inefficient. A workaround is to store the transposed matrix and wrap it in a lazy transpose before passing it to a Kaczmarz solver.
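The following standalone illustration (plain Julia, no RegularizedLeastSquares calls; the variable names are local to this example) shows why the transpose trick helps: a row view of a dense matrix is strided in memory, while the corresponding view into the transposed copy is contiguous:

```julia
using LinearAlgebra

A  = randn(4, 4)
At = collect(transpose(A))   # materialize Aᵀ as a dense matrix
A_eff = transpose(At)        # lazy transpose: same entries as A

# Row i of A_eff is column i of the dense buffer At — contiguous in memory.
i = 2
row_strided    = view(A, i, :)    # stride 4 between consecutive elements
row_contiguous = view(At, :, i)   # stride 1: elements are adjacent in memory
```

Contiguous access lets the dot product use SIMD and the cache efficiently, which is the entire benefit measured in the benchmarks below.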

At = collect(transpose(A))
A_eff = transpose(At)
256×256 transpose(::Matrix{Float64}) with eltype Float64:
  0.373784  -0.54312    0.349325   …   0.893351    1.73373   -2.1432
 -0.516792  -1.14507    1.31379       -0.775597    0.581488   0.849469
 -0.78072    1.50879   -0.0249041     -0.310329   -0.797523  -0.166914
 -0.52069    1.05935   -0.375865       0.192418    0.643546   0.427718
  0.715307  -1.03075   -0.433466       0.649053    0.756142   1.02974
 -1.18522    0.597532  -0.806605   …  -2.05184     0.978858  -1.2916
 -0.488956   1.35129   -1.30962       -0.106001    1.20288    1.34395
 -0.402171  -0.704042   0.347099       0.0959154  -0.469052   1.3633
  0.376355  -0.942417  -1.73957       -0.814914   -0.918854   0.504489
 -1.79248   -0.8531     0.942533      -0.193902    1.10743    0.0550654
  ⋮                                ⋱                          ⋮
  1.58513   -1.06558   -0.385501      -0.0650865   0.399914  -0.83391
  0.794599  -1.77515    0.141762      -0.734853    1.16374   -0.240044
 -0.210881  -0.46823    1.55944        0.530014   -0.500941  -1.70375
  0.244036  -2.4584     0.273192   …  -0.233904    1.27431   -0.780813
 -0.415182   0.439951  -2.10723        0.442822   -0.904055   0.833262
 -0.343712   0.142182  -0.125574       0.628302   -0.937728   0.0196701
 -1.89055   -1.94282   -0.323894      -2.46324     0.730791  -0.0403059
  0.391409  -0.691054   0.387689      -0.455906    0.923646  -0.244629
  0.756739  -2.15237    1.19982    …  -0.112755    0.588129  -1.00897

Note that the transpose function returns a lazy Transpose wrapper rather than a new array, so we first collect the transpose into a dense matrix. Transposing that dense copy again yields a representation of the original matrix whose rows are backed by contiguous columns in memory, which is the efficient layout for the Kaczmarz solver.
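A short type-level check makes the two-step construction explicit (plain Julia, names local to this example):

```julia
using LinearAlgebra

A = randn(3, 3)

T = transpose(A)       # lazy Transpose wrapper, no data copied
At = collect(T)        # materialized dense Matrix holding Aᵀ
A_eff = transpose(At)  # lazy wrapper again, now backed by the dense Aᵀ
```

A_eff compares equal to A entry-for-entry, but its parent array stores the rows of A as columns.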

We can compare the performance using the BenchmarkTools.jl package. First for the original matrix:

using BenchmarkTools
solver = createLinearSolver(Kaczmarz, A; reg = L2Regularization(0.0001), iterations=100)
@benchmark solve!(solver, b) samples = 100
BenchmarkTools.Trial: 100 samples with 1 evaluation per sample.
 Range (min … max):  27.856 ms … 28.098 ms   GC (min … max): 0.00% … 0.00%
 Time  (median):     27.945 ms               GC (median):    0.00%
 Time  (mean ± σ):   27.948 ms ± 44.343 μs   GC (mean ± σ):  0.00% ± 0.00%

            ▆   ▁ ▆▁  ▃▆▃▃██▁ █▁              ▁              
  ▇▁▁▄▄▁▄▁▁▄█▇▇▇█▄██▄▇███████▇██▄▇▄▁▄▄▇▄▁▄▄▁▁▁█▁▁▇▁▁▁▄▁▁▁▁▄ ▄
  27.9 ms         Histogram: frequency by time        28.1 ms <

 Memory estimate: 17.31 KiB, allocs estimate: 505.

And then for the efficient matrix:

solver_eff = createLinearSolver(Kaczmarz, A_eff; reg = L2Regularization(0.0001), iterations=100)
@benchmark solve!(solver_eff, b) samples = 100
BenchmarkTools.Trial: 100 samples with 1 evaluation per sample.
 Range (min … max):  1.828 ms … 2.133 ms   GC (min … max): 0.00% … 0.00%
 Time  (median):     1.852 ms               GC (median):    0.00%
 Time  (mean ± σ):   1.864 ms ± 38.186 μs   GC (mean ± σ):  0.00% ± 0.00%

      ▂ ▆▄█                                               
  ▃▄▆▃██████▆▄▄▆▄▄▆▃▄▁▃▃▁▁▃▁▄▁▃▃▃▃▁▃▁▁▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃▁▁▁▃ ▃
  1.83 ms        Histogram: frequency by time        1.98 ms <

 Memory estimate: 17.34 KiB, allocs estimate: 507.

We can also combine the efficient matrix with a weighting matrix, as is shown in the Weighting example.

Custom operators need to implement the dot_with_matrix_row function to be usable with the Kaczmarz solver. Ideally, such an implementation is allocation-free.
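As a sketch of what such an implementation might look like, consider a hypothetical structured operator $\mathbf{D} + \mathbf{u}\mathbf{v}^\mathsf{T}$ (diagonal plus rank one) whose rows never need to be materialized. The type and function below are illustrative and standalone; in practice you would add a method to RegularizedLeastSquares.dot_with_matrix_row for your own operator type:

```julia
using LinearAlgebra

# Hypothetical structured operator D + u*v' (illustrative; in practice,
# extend RegularizedLeastSquares.dot_with_matrix_row for your type).
struct DiagPlusRankOne{T}
    d::Vector{T}   # diagonal of D
    u::Vector{T}
    v::Vector{T}
end

# Row `row` of (D + u*v') is d[row]*e_rowᵀ + u[row]*vᵀ, so its dot product
# with x needs no materialized row and allocates nothing:
function dot_with_matrix_row(op::DiagPlusRankOne, x::AbstractVector, row::Int)
    return op.d[row] * x[row] + op.u[row] * dot(op.v, x)
end

# Sanity check against the dense representation:
d = randn(5); u = randn(5); v = randn(5); x = randn(5)
op = DiagPlusRankOne(d, u, v)
M  = Diagonal(d) + u * v'
```

The structured version costs O(n) per row instead of O(n) memory per materialized row, and the same pattern applies to any operator whose rows have exploitable structure.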


This page was generated using Literate.jl.