Neat Randomized Algorithms: Randomized Cholesky QR

As a research area, randomized numerical linear algebra (RNLA) is as hot as ever. To celebrate the exciting work in this space, I’m starting a new series on my blog where I celebrate cool recent algorithms in the area. At some future point, I might talk about my own work in this series, but for now I’m hoping to use this series to highlight some of the awesome work being done by my colleagues.

Given a tall matrix A \in \real^{m\times n} with m\ge n, its (economy-size) QR factorization is a decomposition of the form A = QR, where Q \in \real^{m\times n} is a matrix with orthonormal columns and R \in \real^{n\times n} is upper triangular. QR factorizations are used to solve least-squares problems and as a computational procedure to orthonormalize the columns of a matrix.

Here’s an example in MATLAB, where we use QR factorization to orthonormalize the columns of a 10^6\times 10^2 test matrix. It takes about 2.5 seconds to run.

>> A = randn(1e6, 1e2) * randn(1e2) * randn(1e2); % test matrix
>> tic; [Q,R] = qr(A,"econ"); toc
Elapsed time is 2.647317 seconds.

The classical algorithm for computing a QR factorization uses Householder reflectors and is exceptionally numerically stable. Since Q has orthonormal columns, Q^\top Q = I is the identity matrix. Indeed, this relation holds up to a tiny error for the Q computed by Householder QR:

>> norm(Q'*Q - eye(1e2)) % || Q^T Q - I ||
ans =

The relative error \|A - QR\|/\|A\| is also small:

>> norm(A - Q*R) / norm(A)
ans =

Here is an alternate procedure for computing a QR factorization, known as Cholesky QR:

function [Q,R] = cholesky_qr(A)
   R = chol(A'*A);
   Q = A / R;       % Q = A * R^{-1}

This algorithm works by forming A^\top A, computing its (upper triangular) Cholesky decomposition A^\top A = R^\top R, and setting Q = AR^{-1}. Cholesky QR is very fast, about 5\times faster than Householder QR for this example:

>> tic; [Q,R] = cholesky_qr(A); toc
Elapsed time is 0.536694 seconds.

Unfortunately, Cholesky QR is much less accurate and numerically stable than Householder QR. Here, for instance, is the value of \|Q^\top Q - I\|, about ten million times larger than for Householder QR!:

>> norm(Q'*Q - eye(1e2))
ans =

What’s going on? As we’ve discussed before on this blog, forming A^\top A is typically problematic in linear algebraic computations. The “badness” of a matrix A is measured by its condition number, defined to be the ratio of its largest and smallest singular values \kappa(A) = \sigma_{\rm max}(A)/\sigma_{\rm min}(A). The condition number of A^\top A is the square of the condition number of A, \kappa(A^\top A) = \kappa(A)^2, which is at the root of Cholesky QR’s loss of accuracy. Thus, Cholesky QR is only appropriate for matrices that are well-conditioned, having a small condition number \kappa(A) \approx 1, say \kappa(A) \le 10.

The idea of randomized Cholesky QR is to use randomness to precondition A, producing a matrix R_1 that B = AR_1^{-1} is well-conditioned. Then, since B is well-conditioned, we can apply ordinary Cholesky QR to it without issue. Here are the steps:

  1. Draw a sketching matrix S of size 2n\times m; see these posts of mine for an introduction to sketching.
  2. Form the sketch SA. This step compresses the very tall matrix m\times n to the much shorter matrix SA of size 2n\times n.
  3. Compute a QR factorization SA = Q_1R_1 using Householder QR. Since the matrix SA is small, this factorization will be quick to compute.
  4. Form the preconditioned matrix B = AR_1^{-1}.
  5. Apply Cholesky QR to B to compute B = QR_2.
  6. Set R = R_2R_1. Observe that A = BR_1 = QR_2R_1 = QR, as desired.

MATLAB code for randomized Cholesky QR is provided below:1Code for the sparsesign subroutine can be found here.

function [Q,R] = rand_cholesky_qr(A)
   S = sparsesign(2*size(A,2),size(A,1),8); % sparse sign embedding
   R1 = qr(S*A,"econ"); % sketch and (Householder) QR factorize
   B = A / R1; % B = A * R_1^{-1}
   [Q,R2] = cholesky_qr(B);
   R = R2*R1;

Randomized Cholesky QR is still faster than ordinary Householder QR, about 3\times faster in our experiment:

>> tic; [Q,R] = rand_cholesky_qr(A); toc
Elapsed time is 0.920787 seconds.

Randomized Cholesky QR greatly improves on ordinary Cholesky QR in terms of accuracy and numerical stability. In fact, the size of \|Q^\top Q - I\| is even smaller than for Householder QR!

>> norm(Q'*Q - eye(1e2))
ans =

The relative error \|A - QR\|/\|A\| is small, too! Even smaller than for Householder QR in fact:

>> norm(A - Q*R) / norm(A)
ans =

Like many great ideas, randomized Cholesky QR was developed independently by a number of research groups. A version of this algorithm was first introduced in 2021 by Fan, Guo, and Lin. Similar algorithms were investigated in 2022 and 2023 by Balabanov, Higgins et al., and Melnichenko et al. Check out Melnichenko et al.‘s paper in particular, which shows very impressive results for using randomized Cholesky QR to compute column pivoted QR factorizations.

References: Primary references are A Novel Randomized XR-Based Preconditioned CholeskyQR Algorithm by Fan, Guo, and Lin (2021); Randomized Cholesky QR factorizations by Balabanov (2022); Analysis of Randomized Householder-Cholesky QR Factorization with Multisketching by Higgins et al. (2023); CholeskyQR with Randomization and Pivoting for Tall Matrices (CQRRPT) by Melnichenko et al. (2023). The idea of using sketching to precondition tall matrices originates in the paper A fast randomized algorithm for overdetermined linear least-squares regression by Rokhlin and Tygert (2008).

Leave a Reply

Your email address will not be published. Required fields are marked *