# Efficient Kaczmarz
Unlike many of the other solvers provided by RegularizedLeastSquares.jl, the Kaczmarz method utilizes neither a matrix-vector product with the operator $\mathbf{A}$ nor the normal operator $\mathbf{A}^*\mathbf{A}$. Instead, it uses the rows of $\mathbf{A}$ to update the solution iteratively. Efficient Kaczmarz implementations therefore require very efficient dot products with the rows of $\mathbf{A}$. In RegularizedLeastSquares.jl, this is achieved with the `dot_with_matrix_row` function.
```julia
using RegularizedLeastSquares
A = randn(256, 256)
x = randn(256)
b = A*x;
```

The `dot_with_matrix_row` function calculates the dot product between a row of `A` and the current approximate solution `x`:
```julia
row = 1
isapprox(RegularizedLeastSquares.dot_with_matrix_row(A, x, row), sum(A[row, :] .* x))
```

```
true
```

Since Julia stores dense arrays in column-major order, such a row-based operation is quite inefficient. A workaround is to transpose the matrix and then pass it to a Kaczmarz solver.
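To make the role of these row dot products concrete, here is a minimal sketch of a plain Kaczmarz sweep written against Base Julia only. It is illustrative and not the package's implementation, which additionally handles regularization and solver state:

```julia
using LinearAlgebra

# Illustrative sketch (not the package's implementation) of one Kaczmarz
# sweep: each equation <a_i, x> = b_i is enforced in turn by projecting
# x onto its hyperplane, which requires only a dot product with row i.
function kaczmarz_sweep!(x, A, b)
    for i in axes(A, 1)
        ai = view(A, i, :)             # i-th row of A
        r  = b[i] - dot(ai, x)         # residual of equation i
        x .+= (r / dot(ai, ai)) .* ai  # orthogonal projection step
    end
    return x
end

A = randn(64, 64); xtrue = randn(64); b = A * xtrue
x = zeros(64)
for _ in 1:200
    kaczmarz_sweep!(x, A, b)  # residual shrinks with each sweep
end
```

Every inner iteration touches one row of `A` twice, which is why the memory layout of the rows dominates the solver's runtime.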
```julia
At = collect(transpose(A))
A_eff = transpose(At)
```

```
256×256 transpose(::Matrix{Float64}) with eltype Float64:
  0.373784  -0.54312   0.349325  …   0.893351  1.73373   -2.1432
  ⋮                               ⋱                        ⋮
  0.756739  -2.15237   1.19982   …  -0.112755  0.588129  -1.00897
```

Note that the `transpose` function can return a lazy transpose object, so we first `collect` the transpose into a dense matrix. Then we transpose it again to get the efficient representation of the matrix.
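The layout argument can be checked directly with Base functions: row `k` of the re-transposed matrix is column `k` of the underlying buffer, and `strides` confirms that this column is contiguous in memory. A small sketch:

```julia
A  = randn(4, 4)
At = collect(transpose(A))  # materialize the transpose in memory
A_eff = transpose(At)       # lazy re-transpose: indexes like A again

# A_eff holds the same values as A, but row k of A_eff is column k of At,
# which is contiguous under Julia's column-major storage:
@show A_eff == A     # true: the values are unchanged
@show strides(At)    # (1, 4): elements within a column of At are adjacent
```

So the Kaczmarz solver can walk the rows of `A_eff` with unit-stride memory access.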
We can compare the performance using the BenchmarkTools.jl package. First for the original matrix:
```julia
using BenchmarkTools
solver = createLinearSolver(Kaczmarz, A; reg = L2Regularization(0.0001), iterations=100)
@benchmark solve!(solver, b) samples = 100
```

```
BenchmarkTools.Trial: 100 samples with 1 evaluation per sample.
 Range (min … max):  27.856 ms … 28.098 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     27.945 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   27.948 ms ± 44.343 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

 Memory estimate: 17.31 KiB, allocs estimate: 505.
```

And then for the efficient matrix:
```julia
solver_eff = createLinearSolver(Kaczmarz, A_eff; reg = L2Regularization(0.0001), iterations=100)
@benchmark solve!(solver_eff, b) samples = 100
```

```
BenchmarkTools.Trial: 100 samples with 1 evaluation per sample.
 Range (min … max):  1.828 ms … 2.133 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     1.852 ms             ┊ GC (median):    0.00%
 Time  (mean ± σ):   1.864 ms ± 38.186 μs ┊ GC (mean ± σ): 0.00% ± 0.00%

 Memory estimate: 17.34 KiB, allocs estimate: 507.
```

We can also combine the efficient matrix with a weighting matrix, as shown in the Weighting example.
Custom operators need to implement the `dot_with_matrix_row` function to be used with the Kaczmarz solver. Ideally, such an implementation is allocation-free.
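As an illustration, consider a hypothetical diagonal operator whose row dot product can be computed without allocations. In practice you would extend `RegularizedLeastSquares.dot_with_matrix_row` for your operator type; the sketch below defines a standalone function of the same shape to stay self-contained, and the solver may require further interface methods beyond this one:

```julia
# Hypothetical custom operator: a diagonal operator stored as a vector.
struct DiagonalOperator{T}
    d::Vector{T}
end

# Row k of a diagonal operator has a single nonzero d[k] at column k,
# so its row dot product reduces to one multiplication (allocation free).
# In real code this would be a method of
# RegularizedLeastSquares.dot_with_matrix_row.
dot_with_matrix_row(D::DiagonalOperator, x::AbstractVector, k::Integer) = D.d[k] * x[k]

D = DiagonalOperator([1.0, 2.0, 3.0])
x = [10.0, 10.0, 10.0]
dot_with_matrix_row(D, x, 2)  # 20.0
```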
This page was generated using Literate.jl.