In the previous post, we looked at the randomized SVD. In this post, we continue this discussion by looking at the analysis of the randomized SVD. Our approach is adapted from a new analysis of the randomized SVD by Joel A. Tropp and Robert J. Webber.
There are many types of analysis one can do for the randomized SVD. For instance, letting be a matrix and the rank- output of the randomized SVD, natural questions include:
- What is the expected error measured in some norm ? What about expected squared error ? How do these answers change with different norms?
- How large can the randomized SVD error get except for some small failure probability ?
- How close is the randomized SVD truncated to rank compared to the best rank- approximation to ?
- How close are the singular values and vectors of the randomized SVD approximation compared to those of ?
Implicitly and explicitly, the analysis of Tropp and Webber provides satisfactory answers to a number of these questions. In the interest of simplifying the presentation, we shall focus our presentation on just one of these questions, proving following result:
Here, is the Frobenius norm. We encourage the interested reader to check out Tropp and Webber’s paper to see the methodology we summarize here used to prove numerous facts about the randomized SVD and its extensions using subspace iteration and block Krylov iteration.
Let us recall the randomized SVD as we presented it in last post:
- Collect information. Form where is an appropriately chosen random matrix.
- Orthogonalize. Compute an orthonormal basis for the column space of by, e.g., thin QR factorization.
- Project. Form , where denotes the conjugate transpose.
The randomized SVD is the approximation
It is easy to upgrade into a compact SVD form , as we did as steps 4 and 5 in the previous post.
For the analysis of the randomized SVD, it is helpful to notice that the approximation takes the form
Here, denotes the orthogonal projection onto the column space of the matrix . We call the projection formula for the randomized SVD.1This formula bares a good deal of similarity to the projection formula for the Nyström approximation. This is not a coincidence.
Aside: Quadratic Unitarily Invariant Norms
A unitarily invariant norm is said to be quadratic if there exists another unitarily invariant norm such that
Many examples of quadratic unitarily invariant norms are found among the Schatten -norms, defined as
The Schatten -norms are unitarily invariant norms for every . However, the Schatten -norms are quadratic unitarily invariant norms only for since
For the remainder of this post, we let and be a quadratic unitarily invariant norm pair satisfying (1).
The starting point of our analysis is the following decomposition of the error of the randomized SVD:
Recall that we have defined to be an optimal rank- approximation to :
Thus, the error decomposition says that the excess error is bounded by
Using the projection formula, we can prove the error decomposition (2) directly:
The first line is the projection formula, the second is relation between and , and the third is the triangle inequality. For the fifth line, we use the fact that is an orthoprojector and thus has unit spectral norm together with the fact that
For the final inequality, we used commutation rule
Bounding the Error
In this section, we shall continue to refine the error decomposition (2). Consider a thin SVD of , partitioned as
- and both have orthonormal columns,
- and are square diagonal matrices with nonnegative entries,
- and have columns, and
- is an matrix.
Under this notation, the best rank- approximation is . Define
We assume throughout that is full-rank (i.e., ).
Applying the error decomposition (2), we get
For the second line, we used the unitary invariance of . Observe that the column space of is a subset of the column space of so
Let’s work on simplifying the expression
First, observe that
Thus, the projector takes the form
Remarkably, this seemingly convoluted combination of matrices actually is a well-studied operation in matrix theory, the parallel sum. The parallel sum of positive semidefinite matrices and is defined as
We will have much more to say about the parallel sum soon. For now, we use this notation to rewrite (5) as
Plugging this back into (4) and then (3), we obtain the error bound
Simplifying this expression further will require more knowledge about parallel sums, which we shall discuss now.
Aside: Parallel Sums
Let us put aside the randomized SVD for a moment and digress to the seemingly unrelated topic of electrical circuits. Suppose we have a battery of voltage and we connect the ends using a wire of resistance . The current is given by Ohm’s law
Similarly, if the wire is replaced by a different wire with resistance , the current is then
Now comes the interesting question. What if we connect the battery by both wires (resistances and ) simultaneously in parallel, so that current can flow through either wire? Since wires of resistances and still have voltage , Ohm’s law still applies and thus the total current is
The net effect of connecting resisting wires of resistances and is the same as a single resisting wire of resistance
We call the the operation the parallel sum of and because it describes how resistances combine when connected in parallel.
The parallel sum operation can be extended to all nonnegative numbers and by continuity:
Electrically, this definition states that one obtains a short circuit (resistance ) if either of the wires carries zero resistance.
Parallel sums of matrices were introduced by William N. Anderson, Jr. and Richard J. Duffin as a generalization of the parallel sum operation from electrical circuits. There are several equivalent definitions. The natural definition (8) applies for positive definite matrices
This definition can be extended to positive semidefinite matrices by continuity. It can be useful to have an explicit definition which applies even to positive semidefinite matrices. To do so, observe that the (scalar) parallel sum satisfies the identity
To extend to matrices, we capitalize the letters and use the Moore–Penrose pseudoinverse in place of the inverse:
This was the definition (6) of the parallel sum we gave above which we found naturally in our randomized SVD error bounds.
The parallel sum enjoys a number of beautiful properties, some of which we list here:
- Symmetry. ,
- Simplified formula. for positive definite matrices,
- Bounds. and .
- Monotone. is monotone in the Loewner order.
- Concavity. The map is (jointly) concave (with respect to the Loewner order).
- Conjugation. For any square , .
- Trace. .2In fact, this property holds with the trace replaced by any positive linear map. That is, if is a linear map from square matrices to square matrices satisfying if , then .
- Unitarily invariant norms. .
For completionists, we include a proof of the last property for reference.
Nearing the Finish: Enter the Randomness
Equipped with knowledge about parallel sums, we are now prepared to return to bounding the error of the randomized SVD. Apply property 8 of the parallel sum to the randomized SVD bound (7) to obtain
This bound is totally deterministic: It holds for any choice of test matrix , random or otherwise. We can use this bound to analyze the randomized SVD in many different ways for different kinds of random (or non-random) matrices . Tropp and Webber provide a number of examples instantiating this general bound in different ways.
We shall present only the simplest error bound for randomized SVD. We make the following assumptions:
- The test matrix is standard Gaussian.4By which, we mean the entries of are independent and standard normal.
- The norm is the Frobenius norm .
- The randomized SVD rank is .
Under these assumptions and using the property , we obtain the following expression for the expected error of the randomized SVD
All that is left to is compute the expectation of . Remarkably, this can be done in closed form.
Aside: Expectation of Gaussian–Inverse Gaussian Matrix Product
The final ingredient in our randomized SVD analysis is the following computation involving Gaussian random matrices. Let and be independent standard Gaussian random matrices with of size , . Then
For the probability lovers, we will now go on to prove this. For those who are less probabilistically inclined, this result can be treated as a black box.
Wrapping it All Up
We’re now within striking distance. All that is left is to assemble the pieces we have been working with to obtain a final bound for the randomized SVD.
Recall we have chosen the random matrix to be standard Gaussian. By the rotational invariance of the Gaussian distribution, and are independent and standard Gaussian as well. Plugging the matrix expectation bound to (9) then completes the analysis