The Better Way to Convert an SVD into a Symmetric Eigenvalue Problem

A singular value decomposition of an m\times n matrix B is a factorization of the form B = U\Sigma V^\top, where U and V are square, orthogonal matrices and \Sigma is a diagonal matrix with (i,i)th entry \sigma_i \ge 0.1Everything carries over essentially unchanged for complex-valued matrices B with U and V being unitary matrices and every (\cdot)^\top being replaced by (\cdot)^* for (\cdot)^* the Hermitian transpose. The diagonal entries of \Sigma are referred to as the singular values of B and are conventionally ordered \sigma_{\rm max} = \sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_{\min(m,n)} = \sigma_{\rm min}. The columns of the matrices U and V are referred to as the right- and left- singular vectors of B and satisfy the relations Bv_i = \sigma_i u_i and B^\top u_i = \sigma_i v_i.

One can obtain the singular values and right and left singular vectors of B from the eigenvalues and eigenvectors of B^\top B and BB^\top. This follows from the calculations B^\top B = V\Sigma^2 V^\top and B^\top B = U\Sigma^2 U^\top. In other words, the nonzero singular values of B are the square roots of the nonzero eigenvalues of B^\top B and BB^\top. If one merely solves one of these problems, computing \Sigma along with U or V, one can obtain the other matrix V or U by computing U = BV \Sigma^{-1} or V = B^\top U \Sigma^{-1}. (These formulas are valid for invertible square matrices B, but similar formulas hold for singular or rectangular B to compute the singular vectors with nonzero singular values.)

This approach is often unundesirable for several reasons. Here are a few I’m aware of:

  1. Accuracy: Roughly speaking, in double-precision arithmetic, accurate stable numerical methods can resolve differences on the order of 16 orders of magnitude. This means an accurately computed SVD of B can resolve the roughly 16 orders of magnitude of decaying singular values, with singular values smaller than that difficult to compute accurately. By computing B^\top B, we square all of our singular values, so resolving 16 orders of magnitude of the eigenvalues of B^\top B means we only resolve 8 orders of magnitude of the singular values of B.2Relatedly, the two-norm condition number \kappa(B) := \sigma_{\rm max}(B) / \sigma_{\rm min}(B) of B^\top B is twice that of B. The dynamic range of our numerical computations has been cut in half!
  2. Loss of orthogonality: While U = BV \Sigma^{-1} and V = B^\top U \Sigma^{-1} are valid formulas in exact arithmetic, they fair poorly when implemented numerically. Specifically, the numerically computed values U_{\rm numerical} and V_{\rm numerical} may not be orthogonal matrices with, for example, U_{\rm numerical}^\top U_{\rm numerical} not even close to the identity matrix. One can, of course, orthogonalize the computed U or V, but this doesn’t fix the underlying problem that U or V have not been computed accurately.
  3. Loss of structure: If B possesses additional structure (e.g. sparsity), this structure may be lost or reduced by computing the product B^\top B.
  4. Nonlinearity: Even if we’re not actually computing the SVD numerically but doing analysis with pencil and paper, finding the SVD of B from B^\top B has the disadvantage of performing a nonlinear transformation on B. This prevents us from utilizing additive perturbation theorems for sums of symmetric matrices in our analysis.3For instance, one cannot prove Weyl’s perturbation theorem for singular values by considering B^\top B and applying Weyl’s perturbation theorem for symmetric eigenvalues.

There are times where these problems are insignificant and this approach is sensible: we shall return to this point in a bit. However, these problems should disqualify this approach from being the de facto way we reduce SVD computation to a symmetric eigenvalue problem. This is especially true since we have a better way.

The better way is by constructing the so-called Hermitian dilation4As Stewart and Sun detail in Section 4 of Chapter 1 of their monograph Matrix Perturbation Theory, the connections between the Hermitian dilation and the SVD go back to the discovery of the SVD itself, as it is used in Jordan’s construction of the SVD in 1874. (The SVD was also independently discovered by Beltrami the year previous.) Stewart and Sun refer to this matrix as the Jordan-Wiedlant matrix associated with B, as they attribute the widespread use of the matrix today to the work of Wiedlant. We shall stick to the term Hermitian dilation to refer to this matrix. of B, which is defined to be the matrix

(1)   \begin{equation*} \mathcal{H}(B) = \begin{bmatrix} 0 & B \\ B^\top & 0 \end{bmatrix}. \end{equation*}

One can show that the nonzero eigenvalues of \mathcal{H}(B) are precisely plus-or-minus the singular values of B. More specifically, we have

(2)   \begin{equation*} \mathcal{H}(B) \begin{bmatrix} u_i \\ \pm v_i \end{bmatrix} = \pm \sigma_i \begin{bmatrix} u_i \\ \pm v_i \end{bmatrix}. \end{equation*}

All of the remaining eigenvalues of \mathcal{H}(B) not of this form are zero.5This follows by noting \operatorname{rank}(\mathcal{H}(B)) = 2\operatorname{rank}(B) and thus \pm \sigma_i for i = 1,2,\ldots,\operatorname{rank}(B) account for all the nonzero eigenvalues of \mathcal{H}(B). Thus, the singular value decomposition of B is entirely encoded in the eigenvalue decomposition of \mathcal{H}(B).

This approach of using the Hermitian dilation \mathcal{H}(B) to compute the SVD of B fixes all the issues identified with the “B^\top B” approach. We are able to accurately resolve a full 16 orders of magnitude of singular values. The computed singular vectors are accurate and numerically orthogonal provided we use an accurate method for the symmetric eigenvalue problem. The Hermitian dilation \mathcal{H}(B) preserves important structural characteristics in B like sparsity. For purposes of theoretical analysis, the mapping B \mapsto \mathcal{H}(B) is linear.6The linearity of the Hermitian dilation gives direct extensions of most results about the symmetric eigenvalues to singular values; see Exercise 22.

Often one can work with the Hermitian dilation only implicitly: the matrix \mathcal{H}(B) need not actually be stored in memory with all its extra zeros. The programmer designs and implements an algorithm with \mathcal{H}(B) in mind, but deals with the matrix B directly for their computations. In a pinch, however, forming \mathcal{H}(B) directly in software and utilizing symmetric eigenvalue routines directly is often not too much less efficient than a dedicated SVD routine and can cut down on programmer effort significantly.

As with all things in life, there’s no free lunch here. There are a couple of downsides to the Hermitian dilation approach. First, \mathcal{H}(B) is, except for the trivial case B = 0, an indefinite symmetric matrix. By constast, B^\top B and BB^\top are positive semidefinite, which can be helpful in some contexts.7This is relevant if, say, we want to find the small singular values by inverse iteration. Positive definite linear systems are easier to solve by either direct or iterative methods. Further, if n\ll m (respectively, m \ll n), then B^\top B (respectively, BB^\top) is tiny compared to \mathcal{H}(B), so it might be considerably cheaper to compute an eigenvalue decomposition of B^\top B (or BB^\top) than \mathcal{H}(B).

Despite the somewhat salacious title of this article, the B^\top B and Hermitian dilation approaches both have their role, and the purpose of this article is not to say the B^\top B approach should be thrown in the dustbin. However, in my experience, I frequently hear the B^\top B approach stated as the definitive way of converting an SVD into an eigenvalue problem, with the Hermitian dilation approach not even mentioned. This, in my opinion, is backwards. For accuracy reasons alone, the Hermitian dilation should be the go-to tool for turning SVDs into symmetric eigenvalue problems, with the B^\top B approach only used when the problem is known to have singular values which don’t span many orders of magnitude or B is tall and skinny and the computational cost savings of the B^\top B approach are vital.