Today, I want to talk about the generalized Nyström approximation, which I regard as the one of the “big three” approaches to constructing a low-rank approximation to matrix.1The other two approaches are the “projection approximation”/randomized SVD approach and the Nyström approximation. Understanding this approximation, under what conditions it works and the sharpest possible error bounds for it, is a subject of two recent papers of mine:
- Faster Randomized Linear Algebra with Structured Random Matrices, joint with Chris Camaño, Raphael Meyer, and Joel Tropp.
- Sharp analysis of sketched least squares and randomized low-rank approximation, joint with Robert Webber.
On the occasion of the release of the second paper this morning, I felt it was a good time to talk about the generalized Nyström approximation on this blog. In this post, I will try and motivate the generalized Nyström approximation, describing the motivation for the method and when it might be preferable to alternatives.
Existing Characters: Nyström Approximation and the Randomized SVD
To begin our story, let me begin with a reminder of a couple of characters we’ve met in previous installments of this blog, randomized Nyström approximation and the randomized SVD.
Randomized Nyström approximation is a method for producing a low-rank approximation to a positive semidefinite2For this post, a positive semidefinite matrix will always be real symmetric or complex Hermitian. We will focus only on real matrices for this post, though the extension to complex matrices is straightforward. (psd) matrix
. For form this approximation, begin by drawing a random test matrix
, say, with independent standard normal random entries. (We will have more to say about the choice of
below). Using this test matrix, the Nyström approximation is defined as3Here, the inverse
should be interpreted as a Moore–Penrose pseudoinverse in the event where
is singular.
![]()
The randomized SVD is a method for constructing a low-rank approximation to a general, non-symmetric or even non-square matrix
. Again, begin by constructing a random test matrix
. To construct a low-rank approximation, compute the product
and orthonormalize its columns (e.g., by QR decomposition) to obtain
![]()
![]()
How do these two algorithms compare? There are at least three major differences between the two algorithms. Here are the first two:
- Scope. Nyström approximation applies only to psd matrices, and randomized SVD applies to a general rectangular mtrix.
- Single-pass? The Nyström approximation requires only a single pass over the matrix
to form. (Each entry of
needs to be read once to form the product
, after which we have all the information we need from
to form
.) By contrast, the randomized SVD requires two passes, one to compute
and a second to compute
.
The third point is more subtle and concerns the accuracy of these algorithms. As we saw in a previous post, the randomized SVD approximation satisfies the error bound
(1) ![]()
Here is analogous bound for the randomized Nyström approximation, taken from Corollary 8.3 in this paper of Tropp and Webber:
(2) ![]()
The nuclear norm
![]()
The conclusion of this discussion is that, for matrices with slowly decaying eigenvalues4Recall that the eigenvalues and singular values of a psd matrix coincide., the the randomized Nyström error bound (2) can be much larger than the randomized SVD error bound (1). For such problems, the error of the randomized SVD can be much smaller than the error of randomized Nyström approximation. We add this to our list of comparisons
- Frobenius-norm error bounds? The (Frobenius-norm) error of the randomized SVD
is bounded in terms of the Frobenius-norm error of the best rank-
approximation
for
. For the Nyström approximation, the error
is bounded by a more complicated expression that also involves the nuclear norm of the best rank-
approximation.
Generalized Nyström Approximation: The Best of Both Worlds
It is natural to ask: Is there one algorithm that achieves the positive attributes of both the randomized Nyström and randomized SVD algorithms? Is there a single-pass low-rank approximation algorithm that can be applying to any rectangular matrix and achieves Frobenius-norm error bounds? The answer is yes, and we will derive such an approximation now.
As with the randomized SVD and randomized Nyström approximation, we first compute the product
of
with a random matrix
. We may then search for the best approximation
to
spanned by the columns of
. Such an approximation takes the form
, and we may find the best
by solving a least-squares problem
(3) ![]()
To obtain a one-pass algorithm, we need a faster way of computing an (approximate) solution to the least-squares problem (3). As we have seen before on this blog, sketching provides a natural approach to quickly and approximately solving a least-squares problem. Specifically, we draw another random test matrix
and solve the “sketched” least squares problem
(4) ![]()
![]()
![]()
History
In the modern randomized linear algebra literature, the generalized Nyström approximation appears to have been concurrently discovered by Woolfe, Liberty, Rokhlin, & Tygert (2008) and Clarkson & Woodruff (2009). An algebraically equivalent but more numerically stable version of the generalized Nyström approximation was developed by Tropp, Yurtsever, Udell, & Cevher (2017). Nakatsukasa (2020) re-examined the low-rank approximation format, developed a different class of numerical stable implementations, and suggested the name generalized Nyström approximation. Alex Townsend and Per-Gunnar Martinsson trace the origins of this low-rank approximation format far earlier back to the works of Wedderburn (1934).
Implementation
This post is concerned with the generalized Nyström approximation as a type of low-rank approximation format. To use generalized Nyström approximation in practice, one must use an appropriate algorithm which computes the decomposition in a stable way.
Perhaps the simplest algorithm for computing a generalized Nyström approximation was studied by Nakatsukasa (2020). One begins by computing the matrices
![]()
Relationship to Other Formats
As the name suggests, the generalized Nyström approximation format generalizes the Nyström approximation beyond psd matrices. Indeed, the standard Nyström approximation
![]()
Perhaps less obviously, the generalized Nyström approximation also generalizes the randomized SVD approximation. Indeed, the randomized SVD approximation
is the generalized Nyström approximation with trivial right test matrix
.
Generalized Nyström Approximation = Sketch-and-Solve + Randomized SVD
What is the generalized Nyström approximation? There are several interpretations. For instance, if
and
is invertible, the generalized Nyström approximation is the unique approximation satisfying the interpolatory condition
![]()
Notwithstanding the validity and usefulness of other interpretations, my view is that the most useful interpretation of generalized Nyström approximation is the one we started with:
Generalized Nyström approximation is a sketched version of the randomized SVD approximation.
To see this insight in action, we will use it to analyze the generalized Nyström approximation with Gaussian test matrices. Let
and
be populated with independent standard Gaussian random entries, and as we have been, assume
. Let us analyze the expected (squared) Frobenius-norm error of the generalized Nyström approximation.
We will use the following result for sketching with a Gaussian embedding, due to Bartan & Pilanci (2020) and discussed in this previous blog post.
Theorem 1 (sketch-and-solve): Consider a (matrix) least-squares problem
with dimensions
and
. Let
be a (standard) Gaussian test matrix, and instate the sketch-and-solve solution
Then
We can apply this result to the generalized Nyström approximation by setting
. Let
denote the expectation over
alone. Then
![]()
![Rendered by QuickLaTeX.com \begin{align*}\expect \norm{B - B\Omega(\Phi^\top B\Omega)^\dagger\Phi^\top B}_{\rm F}^2 &= \expect_\Omega\left[\expect_\Phi \norm{B - B\Omega(\Phi^\top B\Omega)^\dagger\Phi^\top B}_{\rm F}^2\right] \\&= \left( 1 + \frac{k}{p - (k + 1)} \right)\expect_\Omega \norm{B - (B\Omega)(B\Omega)^\dagger B}_{\rm F}^2 \\&\le \left( 1 + \frac{k}{p - (k + 1)} \right)\left[\min_{r < k-1}\left( 1 + \frac{r}{k - (r + 1)} \right) \norm{B - \lowrank{B}_r}_{\rm F}^2\right].\end{align*}](https://www.ethanepperly.com/wp-content/ql-cache/quicklatex.com-f6a0cfb074737e67fcb3b172f5d5badc_l3.png)
Theorem 2 (generalized Nyström approximation): With the present setting, it holds that
This result is Theorem 4.3 in this paper of Tropp, Yurtsever, Udell, & Cevher (2017). A slight refinement of this bound appears in my paper with Robert Webber, and we show that our new bound is sharp on hard examples. Thus, Theorem 2 is nearly the best possible error bound for the generalized Nyström approximation.
Choice of Random Matrix
For most of this post, we have focused on the cases where the random test matrices
and
are unstructured matrices with Gaussian random entries. But can we use more structured random test matrices? Say, sparse test matrices? Do these lead to faster low-rank approximation algorithms?
For the randomized SVD, the results are disappointing. Computing
with a sparse random matrix is fast. But then we compute
and compute
; the matrix
is dense and unstructured, so computing
is slower and all benefits of the sparse test matrix have been erased.
The issue with the randomized SVD is that it’s a two-pass algorithm: The first pass, computing
, can be done using a sparse random test matrix. But the second pass
requires a matrix product with a dense matrix
.
The situation is much better for the generalized Nyström approximation, which requires only a single pass and can be implemented only by multiplying
against sparse matrices. Indeed, generating
and
to be sparse matrices, the generalized Nyström approximation can be written
![]()
Structured test matrices, like sparse ones, can be very powerful. But basic theoretical questions remain about their properties. We tackle these theoretical questions in my new paper (joint with Chris Camaño, Raphael Meyer, and Joel Tropp), and we provide experiments demonstrating how structured sketching matrices can lead to large speedups in generalized Nyström approximation and other linear algebra tasks. I think it’s a really neat paper, and my wonderful collaborator Chris did some really beautiful experiments for it. I hope you’ll check it out!