Efficient Kaczmarz
Unlike many of the other solvers provided by RegularizedLeastSquares.jl, the Kaczmarz method does not rely on matrix-vector products with the operator $\mathbf{A}$ or the normal operator $\mathbf{A}^*\mathbf{A}$. Instead, it uses the rows of $\mathbf{A}$ to update the solution iteratively. An efficient Kaczmarz implementation therefore requires very efficient dot products with the rows of $\mathbf{A}$. In RegularizedLeastSquares.jl, this is achieved with the `dot_with_matrix_row` function.
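To make the row-access pattern concrete, here is a minimal sketch of one deterministic, real-valued Kaczmarz sweep. This is illustrative only and not the solver used by RegularizedLeastSquares.jl; note that each step needs nothing from $\mathbf{A}$ except a dot product with a single row.

```julia
using LinearAlgebra

# Illustrative sketch of one Kaczmarz sweep (not the library's implementation).
# Each step orthogonally projects the iterate x onto the hyperplane
# a_i' * x = b_i, so the method only ever needs row dot products with A.
function kaczmarz_sweep!(x::AbstractVector, A::AbstractMatrix, b::AbstractVector)
    for i in axes(A, 1)
        ai = view(A, i, :)                       # i-th row of A
        λ  = (b[i] - dot(ai, x)) / dot(ai, ai)   # scaled residual of row i
        x .+= λ .* ai                            # projection step
    end
    return x
end

A = [2.0 1.0; 1.0 3.0]
b = A * [1.0, 1.0]        # consistent system with solution [1, 1]
x = zeros(2)
for _ in 1:100
    kaczmarz_sweep!(x, A, b)
end
```

For a consistent system, the sweeps converge to a solution of $\mathbf{A}x = b$.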
```julia
using RegularizedLeastSquares
A = randn(256, 256)
x = randn(256)
b = A*x;
```
The `dot_with_matrix_row` function calculates the dot product between a row of `A` and the current approximate solution `x`:
```julia
row = 1
isapprox(RegularizedLeastSquares.dot_with_matrix_row(A, x, row), sum(A[row, :] .* x))
```
```
true
```
Since dense arrays in Julia are stored in column-major order, such a row-based operation is quite inefficient. A workaround is to transpose the matrix and then pass it to the Kaczmarz solver.
```julia
At = collect(transpose(A))
A_eff = transpose(At)
```
```
256×256 transpose(::Matrix{Float64}) with eltype Float64
```
Note that the `transpose` function can return a lazy transpose object, so we first `collect` the transpose into a dense matrix. Then we transpose it again to get the efficient representation of the matrix.
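To see why the double transpose helps, note that row `i` of `A_eff` is backed by column `i` of the dense matrix `At`, which is a contiguous, stride-1 slice in Julia's column-major layout. A small sketch (the matrix here is illustrative):

```julia
A  = randn(4, 4)
At = collect(transpose(A))   # dense copy: rows of A stored as columns
A_eff = transpose(At)        # lazy wrapper with the same element values as A

eq  = A_eff == A             # same matrix elementwise
row = A[2, :] == At[:, 2]    # row 2 of A is column 2 of At
st  = strides(view(At, :, 2))  # (1,): contiguous memory access
```

So the Kaczmarz solver reads rows of `A_eff` as unit-stride columns of `At`, which is cache-friendly.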
We can compare the performance using the BenchmarkTools.jl package. First for the original matrix:
```julia
using BenchmarkTools
solver = createLinearSolver(Kaczmarz, A; reg = L2Regularization(0.0001), iterations=100)
@benchmark solve!(solver, b) samples = 100
```
```
BenchmarkTools.Trial: 100 samples with 1 evaluation per sample.
 Range (min … max):  28.407 ms … 28.548 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     28.457 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   28.461 ms ± 26.056 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%
 Memory estimate: 17.31 KiB, allocs estimate: 505.
```
And then for the efficient matrix:
```julia
solver_eff = createLinearSolver(Kaczmarz, A_eff; reg = L2Regularization(0.0001), iterations=100)
@benchmark solve!(solver_eff, b) samples = 100
```
```
BenchmarkTools.Trial: 100 samples with 1 evaluation per sample.
 Range (min … max):  1.861 ms … 2.061 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     1.885 ms             ┊ GC (median):    0.00%
 Time  (mean ± σ):   1.887 ms ± 21.242 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
 Memory estimate: 17.34 KiB, allocs estimate: 507.
```
We can also combine the efficient matrix with a weighting matrix, as is shown in the Weighting example.
Custom operators need to implement the `dot_with_matrix_row` function to be used with the Kaczmarz solver. Ideally, such an implementation is allocation-free.
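As a hedged illustration of what such a method could look like, the sketch below defines a hypothetical row-weighted operator (`RowScaled` and its weighting are invented for this example) with a standalone, allocation-free row dot product. In practice you would add this as a method of `RegularizedLeastSquares.dot_with_matrix_row` for your operator type, matching the `dot_with_matrix_row(A, x, row)` call shown earlier.

```julia
# Hypothetical custom operator: a dense matrix M with a per-row weight w,
# representing Diagonal(w) * M without materializing the product.
struct RowScaled{T}
    M::Matrix{T}
    w::Vector{T}
end

# Allocation-free dot product of row `row` with x; in practice this would
# extend RegularizedLeastSquares.dot_with_matrix_row for RowScaled.
function dot_with_matrix_row(A::RowScaled, x::AbstractVector, row::Integer)
    s = zero(promote_type(eltype(A.M), eltype(x)))
    @inbounds @simd for j in eachindex(x)
        s += A.M[row, j] * x[j]
    end
    return A.w[row] * s
end

A = RowScaled(randn(3, 3), [1.0, 2.0, 3.0])
x = randn(3)
d = dot_with_matrix_row(A, x, 2)   # equals A.w[2] * sum(A.M[2, :] .* x)
```

The loop accumulates into a scalar and indexes the existing arrays directly, so no temporary arrays are allocated.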
This page was generated using Literate.jl.