Stochastic Trace Estimation

I am delighted to share that me, Joel A. Tropp, and Robert J. Webber‘s paper XTrace: Making the Most of Every Sample in Stochastic Trace Estimation has recently been released as a preprint on arXiv. In it, we consider the implicit trace estimation problem:

Implicit trace estimation problem: Given access to a square matrix A via the matrix–vector product operation \omega \mapsto A\omega, estimate its trace \tr A = \sum_{i=1}^n A_{ii}.

Algorithms for this task have many uses such as log-determinant computations in machine learning, partition function calculations in statistical physics, and generalized cross validation for smoothing splines. I described another application to counting triangles in a large network in a previous blog post.

Our paper presents new trace estimators XTrace and XNysTrace which are highly efficient, producing accurate trace approximations using a small budget of matrix–vector products. In addition, these algorithms are fast to run and are supported by theoretical results which explain their excellent performance. I really hope that you will check out the paper to learn more about these estimators!

For the rest of this post, I’m going to talk about the most basic stochastic trace estimation algorithm, the GirardHutchinson estimator. This seemingly simple algorithm exhibits a number of nuances and forms the backbone for more sophisticated trace estimates such as Hutch++, Nyström++, XTrace, and XNysTrace. Toward the end, this blog post will be fairly mathematical, but I hope that the beginning will be fairly accessible to all.

Girard–Hutchinson Estimator: The Basics

The GirardHutchinson estimator for the trace of a square matrix A is

(1)   \[\hat{\tr} = \frac{1}{m} \sum_{i=1}^m \omega_i^* A \omega_i. \]

Here, \omega_1,\ldots,\omega_m are random vectors, usually chosen to be statistically independent, and {}^* denotes the conjugate transpose of a vector or matrix. The Girard–Hutchinson estimator only depends on the matrix A through the matrix–vector products A\omega_1,\ldots,A\omega_m.

Unbiasedness

Provided the random vectors are isotropic

(2)   \[\mathbb{E} [\omega_i\omega_i^*] = I, \]

the Girard–Hutchinson estimator is unbiased:

(3)   \[\mathbb{E} [\hat{\tr}] = \tr A.\]

Let us confirm this claim in some detail. First, we use linearity of expectation to evaluate

(4)   \[\mathbb{E} [\hat{\tr}] = \mathbb{E} \left[ \frac{1}{m} \sum_{i=1}^m \omega_i^*A\omega_i \right] = \frac{1}{m} \sum_{i=1}^m \mathbb{E} \left[ \omega_i^* A \omega_i\right]. \]

Therefore, to prove that \mathbb{E} [\hat{\tr}] = \tr A, it is sufficient to prove that \mathbb{E} \left[\omega_i^*A\omega_i\right] = \tr A for each i.

When working with traces, there are two tricks that solve 90% of derivations. The first trick is that, if we view a number as a 1\times 1 matrix, then a number equals its trace, x = \tr x. The second trick is the cyclic property: For a k\times p matrix B and a p\times k matrix C, we have \tr (BC) = \tr (CB). The cyclic property should be handled with care when one works with a product of three or more matrices. For instance, we have

    \[\tr[BCD] = \tr[(BC)D] = \tr[D(BC)] = \tr[DBC].\]

However,

    \[\tr [BCD] \ne \tr[CBD] \quad \text{in general}.\]

One should think of the matrix product BCD as beads on a closed loop of string. One can move the last bead D to the front of the other two, \tr [BCD] = \tr[DBC], but not interchange two beads, \tr[BCD] \ne \tr[CBD].

With this trick in hand, let’s return to proving that \mathbb{E} \left[\omega_i^*A\omega_i\right] = \tr A for every i. Apply our two tricks:

    \[\mathbb{E} \left[\omega_i^*A\omega_i\right] = \mathbb{E} \tr \left[\omega_i^*A\omega_i\right] = \mathbb{E} \tr \left[A\omega_i\omega_i^*\right].\]

The expectation is a linear operation and the matrix A is non-random, so we can bring the expectation into the trace as

    \[\mathbb{E} \left[\omega_i^*A\omega_i\right] = \mathbb{E} \tr \left[A\omega_i\omega_i^*\right] = \tr(A \mathbb{E}[\omega_i\omega_i^*] ).\]

Invoke the isotropy condition (2) and conclude:

    \[\mathbb{E} \left[\omega_i^*A\omega_i\right] = \tr(A \mathbb{E}[\omega_i\omega_i^*] ) = \tr(A\cdot I) = \tr A.\]

Plugging this into (4) confirms the unbiasedness claim (3).

Variance

Continue to assume that the \omega_i‘s are isotropic (3) and now assume that \omega_1,\ldots,\omega_m are independent. By independence, the variance can be written as

    \[\Var(\hat{\tr}) = \frac{1}{m^2} \sum_{i=1}^m \Var(\omega_i^*A\omega_i).\]

Assuming that \omega_1,\ldots,\omega_m are identically distributed \omega_1,\ldots,\omega_m \sim \omega, we then get

    \[\Var(\hat{\tr}) = \frac{1}{m} \Var(\omega^*A\omega).\]

The variance decreases like 1/m, which is characteristic of Monte Carlo-type algorithms. Since \hat{\tr} is unbiased (i.e, (3)), this means that the mean square error decays like 1/m so the average error (more precisely root-mean-square error) decays like

    \[\left| \hat{\tr} - \tr A \right| \lessapprox \frac{\mathrm{const}}{\sqrt{m}}.\]

This type of convergence is very slow. If I want to decrease the error by a factor of 10, I must do 100\times the work!

Variance-reduced trace estimators like Hutch++ and our new trace estimator XTrace improve the rate of convergence substantially. Even in the worst case, Hutch++ and XTrace reduce the variance at a rate 1/m^2 and (root-mean-square) error at rates 1/m:

    \[\Var(\hat{\tr}_{\text{H++ or X}}) \le \frac{\mathrm{const}}{m^2},\quad \left| \hat{\tr}_{\text{H++ or X}} - \tr A \right| \lessapprox \frac{\mathrm{const}}{m}.\]

For matrices with rapidly decreasing singular values, the variance and error can decrease much faster than this.

Variance Formulas

As the rate of convergence for the Girard–Hutchinson estimator is so slow, it is imperative to pick a distribution on test vectors \omega that makes the variance of the single–sample estimate \omega^*A\omega as low as possible. In this section, we will provide several explicit formulas for the variance of the Girard–Hutchinson estimator. Derivations of these formulas will appear at the end of this post. These variance formulas help illuminate the benefits and drawbacks of different test vector distributions.

To express the formulas, we will need some notation. For a complex number z = a + bi we use \Re(z) = a and \Im(z) = b to denote the real and imaginary parts. The variance of a random complex number z is

    \[\Var(z) := \mathbb{E} |z - \mathbb{E} z|^2 = \Var(\Re z) + \Var(\Im z).\]

The Frobenius norm of a matrix A is

    \[\left\|A\right\|_{\rm F}^2 = \sum_{i,j} |A_{ij}|^2.\]

If A is real symmetric or complex Hermitian with (real) eigenvalues \lambda_1,\ldots,\lambda_n, we have

(5)   \[\left\|A\right\|_{\rm F}^2 = \sum_{i=1}^n \lambda_i^2. \]

A^\top denotes the ordinary transpose of A and A^* denotes the conjugate transpose of A.

Real-Valued Test Vectors

We first focus on real-valued test vectors \omega. Since \omega is real, we can use the ordinary transpose {}^\top rather than the conjugate transpose {}^*. Since \omega^\top A\omega is a number, it is equal to its own transpose:

    \[\omega^\top A \omega = (\omega^\top A \omega)^\top = \omega^\top A^\top \omega.\]

Therefore,

    \[\omega^\top A\omega = \frac{\omega^\top A \omega + \omega^\top A^\top \omega}{2} = \omega^\top \left( \frac{A + A^\top}{2} \right)\omega.\]

The Girard–Hutchinson trace estimator applied to A is the same as the Girard–Hutchinson estimator applied to the symmetric part of A, (A+A^\top)/2.

For the following results, assume A is symmetric, A = A^\top.

  1. Real Gaussian: \omega_1,\ldots,\omega_m are independent standard normal random vectors.

        \[\Var(\omega^\top A\omega) = 2 \left\|A\right\|_{\rm F}^2.\]

  2. Uniform signs (Rademachers): \omega_1,\ldots,\omega_m are independent random vectors with uniform \pm 1 coordinates.

        \[\Var(\omega^\top A \omega) = 2\sum_{i\ne j} |A_{ij}|^2.\]

  3. Real sphere: Assume \omega_1,\ldots,\omega_n are uniformly distributed on the real sphere of radius \sqrt{n}: \omega \sim \text{Uniform} \{x\in \mathbb{R}^n : x^\top x = n\}.

        \[\Var(\omega^\top A\omega) = \frac{2n}{n+2} \left( \left\|A\right\|_{\rm F}^2 - \frac{1}{n} |\tr A|^2 \right).\]

These formulas continue to hold for nonsymmetric A by replacing A by its symmetric part (A+A^\top)/2 on the right-hand sides of these variance formulas.

Complex-Valued Test Vectors

We now move our focus to complex-valued test vectors \omega. As a rule of thumb, one should typically expect that the variance for complex-valued test vectors applied to a real symmetric matrix A is about half the natural real counterpart—e.g., for complex Gaussians, you get about half the variance than with real Gaussians.

A square complex matrix has a Cartesian decomposition

    \[A = A^{\rm H} + i A^{\rm SH}\]

where

    \[A^{\rm H} = \frac{A+A^*}{2} ,\quad A^{\rm SH} = \frac{A - A^*}{2i}\]

denote the Hermitian and skew-Hermitian parts of A. Similar to how the imaginary part of a complex number is real, the skew-Hermitian part of a complex matrix is Hermitian (and i A^{\rm SH} is skew-Hermitian). Since A^{\rm H} and A^{\rm SH} are both Hermitian, we have

    \[\Re(\omega^* A\omega) = \omega^* A^{\rm H} \omega, \quad \Im (\omega^* A \omega) = \omega^* A^{\rm SH} \omega.\]

Consequently, the variance of \omega^*A \omega can be broken into Hermitian and skew-Hermitian parts:

    \[\Var(\omega^* A\omega) = \Var(\omega^* A^{\rm H}\omega) + \Var(\omega^* A^{\rm SH}\omega).\]

For this reason, we will state the variance formulas only for Hermitian A, with the formula for general A following from the Cartesian decomposition.

For the following results, assume A is Hermitian, A = A^*.

  1. Complex Gaussian: \omega_1,\ldots,\omega_m are independent standard complex random vectors, i.e., each \omega_i has iid entries distributed as (g_1+ig_2)/\sqrt{2} for g_1,g_2 standard normal random variables.

        \[\Var(\omega^* A\omega) = \left\|A\right\|_{\rm F}^2.\]

  2. Uniform phases (Steinhauses): \omega_1,\ldots,\omega_m are independent random vectors whose entries are uniform on the complex unit circle \{ z \in \complex : |z| \}.

        \[\Var(\omega^* A \omega) = \sum_{i\ne j} |A_{ij}|^2.\]

  3. Complex sphere: Assume \omega_1,\ldots,\omega_n are uniformly distributed on the complex sphere of radius \sqrt{n}: \omega \sim \text{Uniform} \{x\in \complex^n : x^* x = n\}.

        \[\Var(\omega^* A\omega) = \frac{n}{n+1} \left( \left\|A\right\|_{\rm F}^2 - \frac{1}{n} |\tr A|^2 \right).\]

Optimality Properties

Let us finally address the question of what the best choice of test vectors is for the Girard–Hutchinson estimator. We will state two results with different restrictions on \omega_1,\ldots,\omega_m.

Our first result, due to Hutchinson, is valid for real symmetric matrices with real test vectors.

Optimality (independent test vectors with independent coordinates). If the test vectors \omega_1,\ldots,\omega_m \in \mathbb{R}^n are isotropic (2), independent from each other, and have independent entries, then for any fixed real symmetric matrix A, the minimum variance for \hat{\tr} is obtained when \omega_1,\ldots,\omega_m are populated with random signs (\omega_i)_j \sim \textnormal{Uniform} \{\pm 1\}.

The next optimality results will have real and complex versions. To present the results for \mathbb{R}-valued and an \complex-valued test vectors on unified footing, let \field denote either \mathbb{R} or \complex. We let a \field-Hermitian matrix be either a real symmetric matrix (if \field = \mathbb{R}) or a complex Hermitian matrix (if \field = \complex). Let a \field-unitary matrix be either a real orthogonal matrix (if \field = \mathbb{R}) or a complex unitary matrix (if \field = \complex).

The condition that the vectors \omega_1,\ldots,\omega_m have independent entries is often too restrictive in practice. It rules out, for instance, the case of uniform vectors on the sphere. If we relax this condition, we get a different optimal distribution:

Optimality (independent test vectors). Consider any set \mathscr{A} of \field-Hermitian matrices which is invariant under \field-unitary similary transformations:

    \[\text{If $A \in \mathscr{A}$ and $U$ is $\field$-unitary, then $U^*AU \in \mathscr{A}$.}\]

Assume that the test vectors \omega_1,\ldots,\omega_m are independent and isotropic (2). The worst-case variance \sup_{A \in \mathscr{A}} \Var(\hat{\tr}) is minimized by choosing \omega_1,\ldots,\omega_m uniformly on the \field-sphere: \omega_1,\ldots,\omega_m \sim \text{Uniform} \{ x \in \field^n : x^*x =n \}.

More simply, if you wants your stochastic trace estimator to be effective for a class of inputs \mathscr{A} (closed under \field-unitary similarity transformations) rather than a single input matrix A, then the best distribution are test vectors drawn uniformly from the sphere. Examples of classes of matrices \mathscr{A} include:

  • Fixed eigenvalues. For fixed real eigenvalues \lambda_1,\ldots,\lambda_n \in \mathbb{R}, the set of all \field-Hermitian matrices with these eigenvalues.
  • Density matrices. The class of all trace-one psd matrices.
  • Frobenius norm ball. The class of all \field-Hermitian matrices of Frobenius norm at most 1.

Derivation of Formulas

In this section, we provide derivations of the variance formulas. I have chosen to focus on derivations which are shorter but use more advanced techniques rather than derivations which are longer but use fewer tricks.

Real Gaussians

First assume A is real. Since A is real symmetric, A has an eigenvalue decomposition A = Q\Lambda Q^\top, where Q is orthogonal and \Lambda is a diagonal matrix reporting A‘s eigenvalues. Since the real Gaussian distribution is invariant under orthogonal transformations, \omega^\top A\omega = (Q^\top \omega)^\top \Lambda (Q^\top\omega) has the same distribution as \omega^\top \Lambda \omega. Therefore,

    \[\Var(\omega^\top A \omega) = \Var(\omega^\top \Lambda \omega) = \Var \left( \sum_{i=1}^n \lambda_i \omega_i^2 \right) = \sum_{i=1}^n \lambda_i^2 \Var(\omega_i^2) = 2\sum_{i=1}^n \lambda_i^2 = 2\left\|A\right\|_{\rm F}^2.\]

Here, we used that the variance of a squared standard normal random variable is two.

For A non-real matrix, we can break the matrix A into its entrywise real and imaginary parts A = \mathfrak{R}(A) + i \, \mathfrak{I}(A). Thus,

    \[\Var(\omega^\top A \omega) = \Var(\omega^\top \mathfrak{R}(A) \omega) + \Var(\omega^\top \mathfrak{I}(A) \omega) = 2\left\|\mathfrak{R}(A)\right\|_{\rm F}^2 + 2\left\|\mathfrak{I}(A)\right\|_{\rm F}^2 = 2\left\|A\right\|_{\rm F}^2.\]

Uniform Signs

First, compute

    \[\omega^\top A \omega - \mathbb{E}[\omega^\top A \omega] = \sum_{i,j=1}^n A_{ij} \omega_i\omega_j - \sum_{i=1}^n A_{ii} = \sum_{i\ne j} A_{ij} \omega_i\omega_j + \sum_{i=1}^n A_{ii}(\omega_i^2-1).\]

For a vector \omega of uniform random signs, we have \omega_i^2 = 1 for every i, so the second sum vanishes. Note that we have assumed A symmetric, so the sum over i\ne j can be replaced by two times the sum over i < j:

    \[\omega^\top A \omega - \mathbb{E}[\omega^\top A \omega] = 2\sum_{i< j} A_{ij} \omega_i\omega_j.\]

Note that \{ \omega_i \omega_j : i < j\} are pairwise independent. As a simple exercise, one can verify that the identity

    \[\Var(a_1 X_1+\cdots+a_kX_k) = |a_1|^2 \Var(X_1) + \cdots + |a_k|^2 \Var(X_k)\]

holds for any pairwise independent family of random variances X_1,\ldots,X_k and numbers a_1,\ldots,a_k. Ergo,

    \begin{align*}\Var(\omega^\top A\omega) &= \Var(\omega^\top A \omega - \mathbb{E}[\omega^\top A \omega]) \\&= \Var\left(\sum_{i< j} 2A_{ij} \omega_i\omega_j\right) \\&= \sum_{i<j} 4 |A_{ij}|^2 \Var(\omega_i\omega_j) \\&= \sum_{i<j} 4 |A_{ij}|^2 \\&= 2 \sum_{i\ne j} |A_{ij}|^2.\end{align*}

In the second-to-last line, we use the fact that \omega_i\omega_j is a uniform random sign, which has variance 1. The final line is a consequence of the symmetry of A.

Uniform on the Real Sphere

The simplest proof is I know is by the “camel principle”. Here’s the story (a lightly edited quotation from MathOverflow):

A father left 17 camels to his three sons and, according to the will, the eldest son was to be given a half of the camels, the middle son one-third, and the youngest son the one-ninth. The sons did not know what to do since 17 is not evenly divisible into either two, three, or nine parts, but a wise man helped the sons: he added his own camel, the oldest son took 18/2=9 camels, the second son took 18/3=6 camels, the third son 18/9=2 camels and the wise man took his own camel and went away.

We are interested in a vector \omega which is uniform on the sphere of radius \sqrt{n}. Performing averages on the sphere is hard, so we add a camel to the problem by “upgrading” \omega to a spherically symmetric vector g which has a random length. We want to pick a distribution for which the computation \Var(g^\top A g) is easy. Fortunately, we already know such a distribution, the Gaussian distribution, for which we already calculated \Var(g^\top A g) = 2\left\|A\right\|_{\rm F}^2.

The Gaussian vector g and the uniform vector \omega on the sphere are related by

    \[g = \sqrt{\frac{a}{n}} \omega,\]

where a is the squared length of the Gaussian vector g. In particular, a has the distribution of the sum of n squared Gaussian random variables, which is known as a \chi^2 random variable with n degrees of freedom.

Now, we take the camel back. Compute the variance of g^\top A g using the chain rule for variance:

    \[\Var(g^\top A g) = \mathbb{E}[\Var(g^\top A g \mid a)] + \Var(\mathbb{E}[g^\top A g \mid a]).\]

Here, \Var(\cdot \mid a) and \mathbb{E}[ \cdot \mid a] denote the conditional variance and conditional expectation with respect to the random variable a. The quick and dirty ways of working with these are to treat the random variable a “like a constant” with respect to the conditional variance and expectation.

Plugging in the formula g = \sqrt{a/n} \cdot \omega and treating a “like a constant”, we obtain

    \begin{align*}\Var(g^\top A g) &= \mathbb{E}[\Var(a/n \cdot \omega^\top A \omega \mid a)] + \Var(\mathbb{E}[a/n \cdot \omega^\top A \omega \mid a]) \\&=\mathbb{E}[(a/n)^2\Var(\omega^\top A \omega)] + \Var(a/n \cdot \mathbb{E}[\omega^\top A \omega]) \\&= \frac{1}{n^2} \mathbb{E}[a^2] \cdot \Var(\omega^\top A \omega) + \frac{1}{n^2} \Var(a) |\mathbb{E} [\omega^\top A \omega]|^2.\end{align*}

As we mentioned, a is a \chi^2 random variable with n degrees of freedom and \mathbb{E}[a^2] and \Var(a) are known quantities that can be looked up:

    \[\mathbb{E}[a^2] = n(n+2), \quad \Var(a) = 2n.\]

We know \Var(g^\top A g) = 2\left\|A\right\|_{\rm F}^2 and \mathbb{E} [\omega^\top A \omega] = \tr A. Plugging these all in, we get

    \[2\left\|A\right\|_{\rm F}^2 = \frac{n+2}{n} \Var(\omega^\top A\omega) + \frac{2}{n} |\tr A|^2.\]

Rearranging, we obtain

    \[\Var(\omega^\top A\omega) = \frac{2n}{n+2} \left( \left\|A\right\|_{\rm F}^2 - \frac{1}{n}|\tr A|^2\right).\]

Complex Gaussians

The trick is the same as for real Gaussians. By invariance of complex Gaussian random vectors under unitary transformations, we can reduce to the case where A is a diagonal matrix populated with eigenvalues \lambda_1,\ldots,\lambda_n. Then

    \[\Var(\omega^*A \omega) = \Var \left( \sum_{i=1}^n \lambda_i |\omega_i|^2 \right) = \sum_{i=1}^n \Var(|\omega_i|^2) \lambda_i^2 = \sum_{i=1}^n \lambda_i^2 = \left\|A\right\|_{\rm F}^2.\]

Here, we use the fact that 2|\omega_i|^2 is a \chi^2 random variable with two degrees of freedom, which has variance four.

Random Phases

The trick is the same as for uniform signs. A short calculation (remembering that A is Hermitian and thus \overline{A_{ij}} = A_{ji}) reveals that

    \[\Var\left( \omega^* A \omega \right) = \Var \left( \sum_{i<j} 2 \Re(A_{ij} \overline{\omega_i} \omega_j) \right).\]

The random variables \{\overline{\omega_i} \omega_j : i < j\} are pairwise independent so we have

    \[\Var\left( \omega^* A \omega \right) = \Var \left( \sum_{i<j} 2 \Re(A_{ij} \overline{\omega_i} \omega_j) \right) = 4\sum_{i<j} \Var \left( \Re(A_{ij} \overline{\omega_i} \omega_j) \right).\]

Since \overline{\omega}_i \omega_j is uniformly distributed on the complex unit circle, we can assume without loss of generality that A_{ij} = |A_{ij}|. Thus, letting \phi be uniform on the complex unit circle,

    \[\Var\left( \omega^* A \omega \right) = 4\sum_{i<j} \Var \left( |A_{ij}|\Re(\phi)) \right) = 4\Var\left( \Re(\phi) \right)\sum_{i<j}|A_{ij}|^2.\]

The real and imaginary parts of \phi have the same distribution so

    \[1 = \Var(\phi) = \Var(\Re \phi) + \Var(\Im \phi) = 2 \Var(\Re \phi)\]

so \Var(\Re \phi) = 1/2. Thus

    \[\Var\left( \omega^* A \omega \right) = 2 \sum_{i<j}|A_{ij}|^2 = \sum_{i\ne j} |A_{ij}|^2.\]

Uniform on the Complex Sphere: Derivation 1 by Reduction to Real Case

There are at least three simple ways of deriving this result: the camel trick, reduction to the real case, and Haar integration. Each of these techniques illustrates a trick that is useful in its own right beyond the context of trace estimation. Since we have already seen an example of the camel trick for the real sphere, I will present the other two derivations.

Let us begin with the reduction to the real case. Let \mathfrak{R}(\cdot) and \mathfrak{I}(\cdot) denote the real and imaginary parts of a vector or matrix, taken entrywise. The key insight is that if \omega is a uniform random vector on the complex sphere of radius \sqrt{n}, then

    \[\mathscr{R}(\omega) := \twobyone{\mathfrak{R}(\omega)}{\mathfrak{I}(\omega)}\in\real^{2n} \quad \text{is a uniform random vector on the real sphere of radius $\sqrt{n}$}.\]

We’ve converted the complex vector \omega into a real vector \mathscr{R}(\omega).

Now, we need to convert the complex matrix A into a real matrix \mathscr{R}(A). To do this, recall that one way of representing complex numbers is by 2\times 2 matrices:

    \[a + bi \iff \twobytwo{a}{-b}{b}{a}.\]

Using this correspondence addition and multiplication of complex numbers can be carried by addition and multiplication of the corresponding matrices.

To convert complex matrices to real matrices, we use a matrix-version of the same representation:

    \[\mathscr{R}(A) = \twobytwo{\mathfrak{R}(A)}{-\mathfrak{I}(A)}{\mathfrak{I}(A)}{\mathfrak{R}(A)}.\]

One can check that addition and multiplication of complex matrices can be carried out by addition and multiplication of the corresponding “realified” matrices, i.e.,

    \[\mathscr{R}(A + B) = \mathscr{R}(A) + \mathscr{R}(B), \quad \mathscr{R}(A\cdot B) = \mathscr{R}(A) \cdot \mathscr{R}(B)\]

holds for all complex matrices A and B.

We’ve now converted complex matrix A and vector \omega into real matrix \mathscr{R}(A) and vector \mathscr{R}(\omega). Let’s compare \omega^*A\omega to \mathscr{R}(\omega)^\top\mathscr{R}(A)\mathscr{R}(\omega). A short calculation reveals

    \[\omega^*A\omega = \mathscr{R}(\omega)^\top \mathscr{R}(A)\mathscr{R}(\omega) .\]

Since \mathscr{R}(\omega) is a uniform random vector on the sphere of radius \sqrt{n}, \sqrt{2}\cdot \mathscr{R}(\omega) is a uniform random vector on the sphere of radius \sqrt{2n}. Thus, by the variance formula for the real sphere, we get

    \[\Var(\omega^*A\omega) = \Var[(\sqrt{2}\mathscr{R}(\omega))^\top (\mathscr{R}(A)/2)(\sqrt{2}\mathscr{R}(\omega) )] = \frac{4n}{2n+2} \left[ \|\mathscr{R}(A)/2\|_{\rm F}^2 - \frac{1}{8n}(\tr\mathscr{R}(A))^2 \right].\]

A short calculation verifies that \tr \mathscr{R}(A) = 2\tr A and \norm{\mathscr{R}(A)}_{\rm F}^2 = 2\|A\|_{\rm F}^2. Plugging this in, we obtain

    \[\Var(\omega^*A\omega)= \frac{n}{n+1} \left[ \|A\|_{\rm F}^2 - \frac{1}{n}(\tr A)^2  \right].\]

Uniform on the Complex Sphere: Derivation 2 by Haar Integration

The proof by reduction to the real case requires some cumbersome calculations and requires that we have already computed the variance in the real case by some other means. The method of Haar integration is more slick, but it requires some pretty high-power machinery. Haar integration may be a little bit overkill for this problem, but this technique is worth learning as it can handle some truly nasty expected value computations that appear, for example, in quantum information.

We seek to compute

    \[\mathbb{E} [(\omega^*A \omega)^2].\]

The first trick will be to write this expession using a single matrix trace using the tensor (Kronecker) product \otimes. For those unfamiliar with the tensor product, the main properties we will be using are

(6)   \[(A\otimes B) (C\otimes D) = (AB) \otimes (CD), \quad \tr(A\otimes B) = \tr A \cdot \tr B. \]

We saw in the proof of unbiasedness that

    \[\omega^* A \omega = \tr (\omega^*A\omega) = \tr (A \omega\omega^*).\]

Therefore, by (6),

    \[(\omega^*A\omega)^2 = (\tr [A \omega\omega^*])^2 = \tr [A\omega\omega^* \otimes A\omega\omega^*] = \tr [(A\otimes A) (\omega\omega^* \otimes \omega\omega^*)].\]

Thus, to evaluate \mathbb{E}[(\omega^*A\omega)^2], it will be sufficient to evaluate \mathbb{E}[\omega\omega^* \otimes \omega\omega^*]. Forunately, there is a useful formula for these expectation provided by a field of mathematics known as representation theory (see Lemma 1 in this paper):

    \[\mathbb{E}[ \omega\omega^* \otimes \omega\omega^*] = \frac{2n}{n+1} \operatorname{Proj}_{\operatorname{Sym}^2(\complex^n)}.\]

Here, \operatorname{Proj}_{\operatorname{Sym}^2(\complex^n)} is the orthogonal projection onto the space of symmetric two-tensors \operatorname{Sym}^2(\complex^n) = \operatorname{span} \{ v \otimes v : v \in \complex^n \}. Therefore, we have that

    \[\mathbb{E}[(\omega^*A\omega)^2] = \tr [(A\otimes A) \mathbb{E}(\omega\omega^* \otimes \omega\omega^*)] = \frac{2n}{n+1} \tr [(A\otimes A) \operatorname{Proj}_{\operatorname{Sym}^2(\complex^n)}].\]

To evalute the trace on the right-hand side of this equation, there is another formula (see Lemma 6 in this paper):

    \[\tr \left[(A\otimes B) \operatorname{Proj}_{\operatorname{Sym}^2(\complex^n)}\right] = \frac{1}{2} \left( \tr(AB) + \tr A \cdot \tr B \right).\]

Therefore, we conclude

    \begin{align*}\Var(\omega^* A \omega) &= \mathbb{E}[(\omega^*A\omega)^2] - (\mathbb{E}[\omega^*A\omega])^2 \\&= \frac{2n}{n+1}\tr [(A\otimes A) \operatorname{Proj}_{\operatorname{Sym}^2(\complex^n)}] - (\tr A)^2 \\&= \frac{n}{n+1}\left[ \tr A^2 + (\tr A)^2 \right] - (\tr A)^2 \\&= \frac{n}{n+1}\left[ \left\|A\right\|_{\rm F}^2 - \frac{1}{n} (\tr A)^2 \right].\end{align*}

Proof of Optimality Properties

In this section, we provide proofs of the two optimality properties.

Optimality: Independent Vectors with Independent Coordinates

Assume A is real and symmetric and suppose that \omega is isotropic (2) with independent coordinates. The isotropy condition

    \[\mathbb{E}[\omega\omega^\top] = I\]

implies that \mathbb{E}[\omega_i\omega_j] = \delta_{ij}, where \delta is the Kronecker symbol. Using this fact, we compute the second moment:

    \begin{align*}\mathbb{E}[ (\omega^*A \omega)^2] &= \mathbb{E}\left[ \left( \sum_{i=1}^n A_{ii} \omega_i^2 +2 \sum_{i<j} A_{ij}\omega_i\omega_j) \right)^2\right] \\&= \sum_{i=1}^n A_{ii}^2 \mathbb{E}[\omega_i^4] + \sum_{i<j} (2A_{ii}A_{jj}+4A_{ij}^2) \mathbb{E}[\omega_i^2]\mathbb{E}[\omega_j^2] \\&= \sum_{i=1}^n A_{ii}^2 \mathbb{E}[\omega_i^4] + \sum_{i<j} (2A_{ii}A_{jj}+4A_{ij}^2) .\end{align*}

Thus

    \[\Var(\omega^*A\omega) = \mathbb{E}[ (\omega^*A \omega)^2] - (\mathbb{E}[\omega^* A \omega])^2 = \sum_{i=1}^n A_{ii}^2 (\mathbb{E}[|\omega_i|^4]-1) + 4\sum_{i<j} A_{ij}^2.\]

The variance is minimized by choosing \omega with \mathbb{E} \omega_i^4 as small as possible. Since \mathbb{E} \omega_i^2 = 1, the smallest possible value for \mathbb{E} \omega_i^4 is \mathbb{E} \omega_i^4 = 1, which is obtained by populating \omega with random signs.

Optimality: Independent Vectors

This result appears to have first been proven by Richard Kueng in unpublished work. We use an argument suggested to me by Robert J. Webber.

Assume \mathscr{A} is a class of \field-Hermitian matrices closed under \field-unitary similarity transformations and that \omega is an isotropic random vector (2). Decompose the test vector as

    \[\omega = a \cdot s \quad \text{for} \quad a \in [0,+\infty), \: s \in\{x\in \field^n : x^*x = n \}.\]

First, we shall show that the variance is reduced by replacing s with a vector t drawn uniformly from the sphere

(7)   \[\sup_{A\in\mathscr{A}} \Var(\tilde{\omega}^*A\tilde{\omega}) \le \sup_{A\in\mathscr{A}} \Var(\omega^*A\omega \]

where

(8)   \[\tilde{\omega} = a\cdot t \quad \text{and}\quad t\sim \text{Uniform} \{ x \in \field^n :x^*x = n \} \quad \text{is independent of $a$}. \]

Note that such a t can be generated as t = Qs for a uniformly random \field-unitary matrix Q. Therefore, we have

    \begin{align*}\sup_{A\in\mathscr{A}} \Var(\tilde{\omega}^*A\tilde{\omega})&= \sup_{A\in\mathscr{A}} \left[\mathbb{E}[(\tilde{\omega}^*A\tilde{\omega})^2] - (\tr A)^2\right]\\&= \sup_{A\in\mathscr{A}} \left[\mathbb{E}[a^2 \cdot s^*(Q^*AQ)s] - (\tr (Q^*AQ))^2\right].\end{align*}

Now apply Jensen’s inequality only over the randomness in Q to obtain

    \begin{align*}\sup_{A\in\mathscr{A}} \Var(\tilde{\omega}^*A\tilde{\omega})&= \sup_{A\in\mathscr{A}} \left[\mathbb{E}[a^2 \cdot s^*(Q^*AQ)s] - (\tr (Q^*AQ))^2\right] \\&\le \mathbb{E}_Q \sup_{A\in\mathscr{A}} \left[\mathbb{E}_{a,s}[a^2 \cdot s^*(Q^*AQ)s] - (\tr (Q^*AQ))^2\right].\end{align*}

Finally, note that since \mathscr{A} is closed under \field-unitary similarity transformations, the supremum over Q^*AQ for A \in \mathscr{A} is the same as the supremum of A \in \mathscr{A}, so we obtain

    \begin{align*}\sup_{A\in\mathscr{A}} \Var(\tilde{\omega}^*A\tilde{\omega})&\le \mathbb{E}_Q \sup_{A\in\mathscr{A}} \left[\mathbb{E}_{a,s}[a^2 \cdot s^*(Q^*AQ)s] - (\tr (Q^*AQ))^2\right] \\&= \mathbb{E}_Q \sup_{A\in\mathscr{A}} \left[\mathbb{E}_{a,s}[a^2 \cdot s^*As] - (\tr A)^2\right] \\&= \sup_{A\in\mathscr{A}} \Var(\omega^*A\omega).\end{align*}

We have successfully proven (7). This argument is a specialized version of a far more general result which appears as Proposition 4.1 in this paper.

Next, we shall prove

(9)   \[\sup_{A\in\mathscr{A}} \Var(t^*At) \le \sup_{A\in\mathscr{A}} \Var(\tilde{\omega}^*A\tilde{\omega}), \]

where t is still defined as in (8). Indeed, using the chain rule for variance, we obtain

    \begin{align*}\Var(\tilde{\omega}^*A\tilde{\omega})&= \Var(a^2\cdot t^*At) \\&= \mathbb{E}[\Var(a^2\cdot t^* A t \mid a)] + \Var(\mathbb{E}[a^2\cdot t^* A t \mid a]) \\&= \mathbb{E}[a^4]\Var(t^* A t )+ (\tr A)^2\Var(a^2) \\&\ge \mathbb{E}[a^4]\Var(t^* A t ).\end{align*}

Here, we have used that t is uniform on the sphere and thus \mathbb{E}[t^*At] = \tr A. By definition, a is the length of \omega divided by \sqrt{n}. Therefore,

    \[\mathbb{E}[a^2] = \frac{1}{n}\mathbb{E}[\omega^*\omega] = \frac{1}{n} \mathbb{E}[\tr (\omega\omega^*)] = \frac{1}{n} \tr (\mathbb{E}[\omega\omega^*]) = \frac{\tr I}{n} = 1.\]

Therefore, by Jensen’s inequality,

    \[\mathbb{E}[a^4] = \mathbb{E}[(a^2)^2] \ge (\mathbb{E}[a^2])^2 = 1.\]

Thus

    \[\Var(\tilde{\omega}^*A\tilde{\omega}) \ge \mathbb{E}[a^4]\Var(t^* A t ) \ge \Var(t^*At) \quad \text{for every }A,\]

which proves (9).

The Vandermonde Decomposition

In this post, I want to discuss a beautiful and somewhat subtle matrix factorization known as the Vandermonde decomposition that appears frequently in signal processing and control theory. We’ll begin from the very basics, introducing the controls-and-signals context, how the Vandermonde decomposition comes about, and why it’s useful. By the end, I’ll briefly share how we can push the Vandermonde decomposition beyond matrices to the realm of tensors, which will can allow us to separate mixed signals from multiple measurements. This tensorial generalization plays an important role in my paper (L_r,L_r,1)-decompositions, sparse component analysis, and the blind separation of sums of exponentials, joint work with Nithin Govindajaran and Lieven De Lathauwer, which recently appeared in the SIAM Journal of Matrix Analysis and Applications.

Finding the Frequencies

Suppose I give you a short recording of a musical chord consisting of three notes. How could you determine which three notes they were? Mathematically, we can represent such a three-note chord as a combination of scaled and shifted cosine functions

(1)   \[f(t) = a_0 \cos(\omega_0 t - \phi_0) + a_1 \cos(\omega_1 t - \phi_1) + a_2 \cos(\omega_2 t - \phi_2). \]

We are interested in obtaining the (angular) frequencies \omega_1, \omega_2, and \omega_3.

In the extreme limit, when we are given the values of the signal for all t, both positive and negative, the frequencies are immediately given by taking a Fourier transform of the function f(\cdot). In practice, we only have access to the function f at certain times t_0,\ldots,t_{n-1} which we assume are equally spaced

    \[t_j = j\Delta t \quad \textnormal{for} \quad j = 0,1,2,\ldots,n-1.\]

Given the samples

    \[f_j = f(t_j) \quad \textnormal{for} \quad j = 0,1,2,\ldots,n-1\]

we could try to identify \omega_1, \omega_2, and \omega_3 using a discrete Fourier transform.1The discrete Fourier transform can be computed very quickly using the fast Fourier transform, as I discussed in a previous post. Unfortunately, this generally requires a large number of samples to identify \omega_1, \omega_2, and \omega_3 accurately. (The accuracy scales roughly like 1/n, where n is the number of samples.) We are interested in finding a better way to identify the frequencies.

Now that we’ve moved from the function f(\cdot), defined for any real input t, to a set of samples f_0,f_1,\ldots,f_{n-1} it will be helpful to rewrite our formula (1) for f in a different way. By Euler’s identity, the cosines can be rewritten as

    \[\cos \alpha = \frac{\mathrm{e}^{\mathrm{i} \alpha}+\mathrm{e}^{-\mathrm{i} \alpha}}{2}.\]

As a consequence, we can rewrite one of the frequency components in (1) as

    \[a_0 \cos(\omega_0 t - \phi_0) = d_0 \mathrm{e}^{\mathrm{i} \omega_0t} + d_1 \mathrm{e}^{-\mathrm{i} \omega_0t}.\]

Here, d_0 and d_1 are complex coefficients d_0 = a_0 \mathrm{e}^{-\mathrm{i} \phi_0}/2 and d_1 = a_0 \mathrm{e}^{\mathrm{i} \phi_0}/2 which contain the same information as the original parameters a_0 and \phi_0. Now notice that we are only interest in values t_j = j\, \Delta t which are multiples of the spacing \Delta t. Thus, our frequency component can be further rewritten as

    \[a_0 \cos(\omega_0 t_j - \phi_0) = d_0 z_0^j + d_1 z_1^j\]

where z_0 := \mathrm{e}^{\mathrm{i} \omega_0\, \Delta t} and z_1 := \mathrm{e}^{-\mathrm{i}\omega_0\, \Delta t}. Performing these reductions, our samples f_j take the form

(2)   \[f_j = d_0 z_0^j + d_1 z_1^j + \cdots + d_5 z_5^j. \]

We’ve now reformulated our frequency problems in identifying the parameters d_0,\ldots,d_5 and z_0,\ldots,z_5 in the relation (2) from a small number of measurements f_0,f_1,\ldots,f_{n-1}.

Frequency Finding as a Matrix Factorization

We will return to the algorithmic problem of identifying the parameters in the relation (2) from measurements in a little bit. First, we will see that (2) can actually be written as a matrix factorization. Understanding computations by matrix factorization has been an extremely successful paradigm in applied mathematics, and we will see in this post how viewing (2) as a matrix factorization can be very useful.

While it may seem odd at first,2As pointed out to me on math stack exchange, one reason forming the Hankel matrix is sensible is because it effectively augments the sequence of numbers f_0,f_1,\ldots,f_{n-1} into a sequence of vectors given by the columns of H. This can reveal patterns in the sequence which are less obvious when it is represented as given just as numbers. For instance, any seven columns of H are linearly dependent, a surprising fact since the columns of H have length (n-1)/2+1 which can be much larger than seven. In addition, as we will soon effectively exploit later, vectors in the nullspace of H (or related Hankel matrices derived from the sequence) give recurrence relations obeyed by the sequence. This speaks to a general phenomenon where properties of sequence (say, arising from snapshots of a dynamical system) can sometimes become more clear by this procedure of delay embedding. it will be illuminating to repackage the measurements f_0,f_1,\ldots,f_{n-1} as a matrix:

(3)   \[H = \begin{bmatrix} f_0 & f_1 & f_2 & \cdots & f_{(n-1)/2} \\f_1 & f_2 & f_3 & \cdots & f_{(n-1)/2+1} \\f_2 & f_3 & f_4 & \cdots & f_{(n-1)/2+2} \\\vdots & \vdots & \vdots & \ddots & \vdots \\f_{(n-1)/2} & f{(n-1)/2+1} & f{(n-1)/2+2} & \cdots & f_{n-1} \end{bmatrix}. \]

Here, we have assumed n is odd. The matrix H is known as the Hankel matrix associated with the sequence f_0,\ldots,f_{n-1}. Observe that the entry in position ij of H depends only on the sum of the indices i and j, H_{ij} = f_{i+j}. (We use a zero-indexing system to label the rows and columns of H where, for instance, the first row of H is row 0.)

Let’s see how we can interpret the frequency decomposition (2) as a factorization of the Hankel matrix H. We first write out H_{ij} using (2):

(4)   \[H_{ij} = f_{i+j} = \sum_{k=0}^5 d_k z_k^{i+j} = \sum_{k=0}^5 d_k z_k^i \cdot z_k^j. \]

The power z_k^{i+j} was just begging to be factorized as z_k^i\cdot z_k^j, which we did. Equation (4) almost looks like the formula for the product of two matrices with entries z_k^i, so it makes sense to introduce the 6\times (n-1)/2 matrix V with entry V_{ki} = z_k^i. This is a so-called Vandermonde matrix associated with z_0,\ldots,z_5 and has the form

    \[V = \begin{bmatrix}z_0^0 & z_0^1 & z_0^2 & \cdots & z_0^{(n-1)/2} \\z_1^0 & z_1^1 & z_1^2 & \cdots & z_1^{(n-1)/2} \\\vdots & \vdots & \vdots & \ddots & \vdots \\z_5^0 & z_5^1 & z_5^2 & \cdots & z_5^{(n-1)/2}\end{bmatrix}.\]

If we also introduce the 6\times 6 diagonal matrix D = \operatorname{diag}(d_0,d_1,\ldots,d_5), the formula (4) for H can be written as the matrix factorization3In the Vandermonde decomposition H=V^\top D V, the factor V appears transposed even when V is populated with complex numbers! This differs from the usual case in linear algebra where we use the conjugate transpose rather than the ordinary transpose when working with complex matrices. As a related issue, observe that if at least one of the measurements f_0,\ldots,f_{n-1} is a (non-real) complex number, the Hankel matrix H is symmetric but not Hermitian.

(5)   \[H = V^\top D V. \]

This is the Vandermonde decomposition of the Hankel matrix H, a factorization of H as a product of the transpose of a Vandermonde matrix, a diagonal matrix, and that same Vandermonde matrix.

The Vandermonde decomposition immediately tells us all the information d_0,\ldots,d_5 and z_0,z_1,\ldots,z_5 describing our sampled recording f_0,\ldots,f_{n-1} via (2). Thus, the problem of determining d_0,\ldots,d_5 and z_0,z_1,\ldots,z_5 is equivalent to finding the Vandermonde decomposition (5) of the Hankel matrix H.

Computing the Vandermonde Decomposition: Prony’s Method

Computing the Vandermonde decomposition accurately can be a surprisingly hard task, particularly if the measurements f_0,f_1,\ldots,f_{n-1} are corrupted by even a small amount of measurement error. In view of this, I want to present a very classical way of computing this decomposition (dating back to 1795!) known as Prony’s method. This method is conceptually simple and will be a vehicle to continue exploring frequency finding and its connection with Hankel matrices. It remains in use, though it’s accuracy may be significantly worse compared to other methods.

As a first step to deriving Prony’s method, let’s reformulate the frequency finding problem in a different way. Sums of cosines like the ones in our expression (1) for the function f(\cdot) often appear as the solution to a (linear) ordinary differential equation (ODE). This means that one way we could find the frequencies comprising f(\cdot) would be to find a differential equation which f(\cdot) satisfies. Together with the initial condition f(0), determining all the frequencies would be very straightforward.

Since we only have access to samples f_0, f_1,\ldots,f_{n-1} of f(\cdot) at regular time intervals, we will instead look for the “discrete-time” analog of a linear ODE, a linear recurrence relation. This is an expression of the form

(6)   \[f_m = c_{k-1} f_{m-1} + \cdots + c_1f_{m-k+1}+ c_0 f_{m-k} \quad \textnormal{for every} \quad m = k,\,{k+1},\,{k+2},\cdots. \]

In our case, we’ll have k = 6 because there are six terms in the formula (2) for f_j. Together with initial conditions f_0,f_1,\ldots,f_{k-1}, such a recurrence will allow us to determine the parameters z_0,\ldots,z_5 and d_0,\ldots,d_5 in our formula (2) for our sampled recordings f_0,\ldots,f_{n-1} and hence also allow us to compute the Vandermonde decomposition (5).

Observe that the recurrence (6) is a linear equation in the variables c_0,\ldots,c_5. A very good rule of thumb in applied mathematics is to always write down linear equations in matrix–vector notation in see how it looks. Doing this, we obtain

(7)   \[\begin{bmatrix} f_6 \\ f_7 \\ \vdots \\ f_{n-1} \end{bmatrix} = \underbrace{\begin{bmatrix} f_0 & f_1 & f_2 & f_3 & f_4 & f_5 \\ f_1 & f_2 & f_3 & f_4 & f_5 & f_6 \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\ f_{n-7} & f_{n-6} & f_{n-5} & f_{n-4} & f_{n-3} & f_{n-2} \end{bmatrix}}_{=F}\begin{bmatrix} c_0 \\ c_1 \\ \vdots \\ c_5 \end{bmatrix}. \]

Observe that the matrix on the right-hand side of this equation is also a Hankel matrix (like H in (3)) formed from the samples f_0,\ldots,f_{n-1}. Call this Hankel matrix F. Unlike H in (3), F is rectangular. If n is much larger than 6, F will be tall, possessing many more rows than columns. We assume n > 12 going forward.4n=12 would also be fine for our purposes, but we assume n > 12 to illustrate this highly typical case.

Let’s write (7) a little more compactly as

(8)   \[f_{6 \, :\, n-1} = F c, \]

where we’ve introduced f_{6\,:\, n-1} for the vector on the left-hand side of (7) and collected the recurrence coefficients c_0,\ldots,c_5 into a vector c. For a typical system of linear equations like (8), we would predict the system to have no solution c: Because F has more rows than columns (if n > 12), the system equations (8) has more equations than unknowns. Fortunately, we are not in the typical case. Despite the fact that we have more equations than unknowns, the linear equations (8) have a unique solution c.5This solution can be computed by solving the 6\times 6 system of linear equations \begin{bmatrix} f_6 \\ f_7 \\ \vdots \\ f_{11} \end{bmatrix} = \begin{bmatrix} f_0 & f_1 & \cdots & f_5 \\ f_1 & f_2 & \cdots & f_6 \\ \vdots & \vdots & \ddots & \vdots \\ f_{5} & f_6 & \cdots & f_{11} \end{bmatrix}\begin{bmatrix} c_0 \\ c_1 \\ \vdots \\ c_5\end{bmatrix}. In particular, the matrix on the right-hand side of this equation is guaranteed to be nonsingular under our assumptions. Using the Vandermonde decomposition, can you see why? The existence of a unique solution is a consequence of the fact that the samples f_0,\ldots,f_{n-1} satisfy the formula (2). As a fun exercise, you might want to verify the existence of a unique c satisfying (8)!

As a quick aside, if the measurements f_0,\ldots,f_{n-1} are corrupted by small measurement errors, then the equations (8) will usually not possess a solution. In this case, it would be appropriate to find the least squares solution to equation (8) as a way of mitigating these errors.

Hurrah! We’ve found the coefficients c_0,\ldots,c_5 providing a recurrence relation (6) for our measurements f_0,\ldots,f_{n-1}. All that is left is to find the parameters z_0,\ldots,z_5 and d_0,\ldots,d_5 in our signal formula (2) and the Vandermonde decomposition (5). Fortunately, this is just a standard computation for linear recurrence relations, nicely paralleling the solution of (homogenous) linear ODEs by means of the so-called “characteristic equation”. I’ll go through fairly quickly since this material is well-explained elsewhere on the internet (like Wikipedia). Let’s guess that our recurrence (6) has a solution of the form f_j = z^j; we seek to find all complex numbers z for which this is a bonafide solution. Plugging this solution into the formula (6) for f_6 gives

(9)   \[z^6 = c_0 z^5 + c_1 z^4 + \cdots + c_6. \]

This is the so-called characteristic equation for the recurrence (6). As a single-variable polynomial equation of degree six, it has six complex solutions z_0,z_1,\ldots,z_5. These numbers z_0,z_1,\ldots,z_5 are precisely those numbers which appear in the sequence formula (2) and the Vandermonde decomposition (5).

Finally, we need to compute the coefficients d_0,\ldots,d_5. But this is easy. Observe that the formula (2) provides the following system of linear equations for d_0,\ldots,d_5:

(10)   \[\begin{bmatrix}f_0 \\ f_1 \\ \vdots \\ f_{n-1}\end{bmatrix} = \begin{bmatrix}1 & 1 & \cdots & 1 \\ z_0 & z_1 & \cdots & z_5 \\ \vdots & \vdots & \ddots & \vdots \\ z_0^{n-1} & z_1^{n-1} & \cdots & z_5^{n-1}\end{bmatrix} \begin{bmatrix}d_0 \\ d_1 \\ \vdots \\ d_{n-1}.\end{bmatrix}. \]

Again, this system of equations will have a unique solution if the measurements f_0,\ldots,f_{n-1} are uncorrupted by errors (and can be solved in the least squares sense if corrupted). This gives d_0,\ldots,d_5, completing our goal of computing the parameters in the formula (2) or, equivalently, finding the Vandermonde decomposition (5).

We have accomplished our goal of computing the Vandermonde decomposition. The approach by which we did so is known as Prony’s method, as mentioned in the introduction to this section. As suggested, this method may not always give high-accuracy results. There are two obvious culprits that jump out about why this is the case. Prony’s method requires solving for the roots of the polynomial equation (9) expressed “in the monomial basis” and solving a system of linear equations (10) with a (transposed) Vandermonde matrix. Both of these problems can be notoriously ill-conditioned and thus challenging to solve accurately and may require the measurements f_0,\ldots,f_{n-1} to be done to very high accuracy. Notwithstanding this, Prony’s method does useful results in some cases and forms the basis for potentially more accurate methods, such as those involving generalized eigenvalue problems.

Separating Signals: Extending the Vandermonde Decomposition to Tensors

In our discussion of the frequency identification problem, the Vandermonde decomposition (5) has effectively been an equivalent way of showing the samples f_j are a combination of exponentials z^j. So far, the benefits of the matrix factorization perspective have yet to really reveal themselves.

So what are the benefits of the Vandermonde decompostions? A couple of nice observations related to the Vandermonde decomposition and the “Hankelization” of the signals H have already been lurking in the background. For instance, the rank of the Hankel matrix H is the number of frequency components z_k needed to describe the samples and the representation of the samples as a mixture of exponentials is uniquely determined only if the matrix H does not have full rank; I have a little more to say about this at the very end. There are also benefits to certain computational problems; one can use Vandermonde decompositions to compute super high accuracy singular value decompositions of Hankel matrices.

The power of the Vandermonde decomposition really starts to shine when we go beyond the basic frequency finding problem we discussed by introducing more signals. Suppose now there are three short recordings f^{(1)}(\cdot), f^{(2)}(\cdot), and f^{(3)}(\cdot). (Here, the superscript denotes an index rather than differentiation.) Each signal is a weighted mixture of three sources s^{(1)}(\cdot), s^{(2)}(\cdot), and s^{(3)}(\cdot), each of which plays a musical chord of three notes (thus representable as a sum of cosines as in (1)). One can think of the sources of being produced three different musical instruments at different places in a room and the recordings f^{(1)}(\cdot), f^{(2)}(\cdot), and f^{(3)}(\cdot) being taken from different microphones in the room.6This scenario of instruments and microphones ignores the finite propagation speed of sound, which also would introduce time delays in the sources in the recorded signals. We effectively treat the speed of sound as being instantaneous. Our goal is now not just to identify the musical notes in the recordings but also to identify how to assign those notes to reconstruct the source signals s^{(1)}(\cdot), s^{(2)}(\cdot), and s^{(3)}(\cdot).

Taking inspiration from earlier, we record samples f_0^{(\ell)},\ldots,f_{n-1}^{(\ell)} for each recording \ell = 1,2,3 and form each collection of samples into a Hankel matrix

    \[H^{(\ell)} = \begin{bmatrix} f_0^{(\ell)} & f_1^{(\ell)} & f_2^{(\ell)} & \cdots & f_{(n-1)/2}^{(\ell)} \\f_1^{(\ell)} & f_2^{(\ell)} & f_3^{(\ell)} & \cdots & f_{(n-1)/2+1}^{(\ell)} \\f_2^{(\ell)} & f_3^{(\ell)} & f_4^{(\ell)} & \cdots & f_{(n-1)/2+2}^{(\ell)} \\\vdots & \vdots & \vdots & \ddots & \vdots \\f_{(n-1)/2}^{(\ell)} & f_{(n-1)/2+1}^{(\ell)} & f_{(n-1)/2+2}^{(\ell)} & \cdots & f_{n-1}^{(\ell)} \end{bmatrix}.\]

Here comes the crazy part: Stack the Hankelized recordings H^{(1)}, H^{(2)}, and H^{(3)} as slices of a tensor \mathcal{H}. A tensor, in this context, just means a multidimensional array of numbers. Just as a vector is a one-dimensional array and a matrix is a two-dimensional array, a tensor could have any number of dimensions. In our case, we need just three. If we use a MATLAB-esque indexing notation, \mathcal{H} is a (n-1)/2\times (n-1)/2 \times 3 array given by

    \[\mathcal{H}(:,:,\ell) = H^{(\ell)} \quad \textnormal{for} \quad \ell=1,2,3.\]

The remarkable thing is that the source signals can be determined (under appropriate conditions) by computing a special kind of Vandermonde decomposition of the tensor \mathcal{H}! (Specifically, the required decomposition is a Vandermonde-structured (L_r,L_r,1)-block term decomposition of the tensor \mathcal{H}.) Even more cool, this decomposition can be computed using general-purpose software like Tensorlab.

If this sounds interesting, I would encourage you to check out my recently published paper (L_r,L_r,1)-decompositions, sparse component analysis, and the blind separation of sums of exponentials, joint work with Nithin Govindajaran and Lieven De Lathauwer and recently published in the SIAM Journal on Matrix Analysis and Applications. In the paper, we explain what this (L_r,L_r,1)-decomposition is and how applying it to \mathcal{H} can be used to separate mixtures of exponentials signals from the resulting Vandermonde structure, an idea originating in the work of De Lathauewer. A very important question for these signal separation problems is that of uniqueness. Given the three sampled recordings (comprising the tensor \mathcal{H}), is there just one way of unscrambling the mixtures into different sources or multiple? If there are multiple, then we might have possibly computed the wrong one. If there is just a single unscrambling, though, then we’ve done our job and unmixed the scrambled signals. The uniqueness of these tensor decompositions is fairly complicated math, and we survey existing results and prove new ones in this paper.7One of our main technical contributions is a new notion of uniqueness of (L_r,L_r,1)-decompositions which we believe is nicely adapted to the signal separation context. Specfically, we prove mathematized versions of the statement “if the source signals are sufficiently different from each others and the measurements of sufficiently high quality, then the signals can uniquely be separated”.

Conclusions, Loose Ends, and Extensions

The central idea that we’ve been discussing is how it can be useful to convert between a sequence of observations f_0,f_1,\ldots,f_{n-1} and a special matricization of this sequence into a Hankel matrix (either square, as in (3), or rectangular, as in (7)). By manipulating the Hankel matrix, say, by computing its Vandermonde decomposition (5), we learn something about the original signal, namely a representation of the form (2).

This is a powerful idea which appears implicitly or explicitly throughout various subfields of mathematics, engineering, and computation. As with many other useful ideas, this paradigm admits many natural generalizations and extensions. We saw one already in this post, where we extended the Vandermonde decomposition to the realm of tensors to solve signal separation problems. To end this post, I’ll place a few breadcrumbs at the beginning of a few of the trails of these generalizations for any curious to learn more, wrapping up a few loose ends on the way.

Is the Vandermonde Decomposition Unique?
A natural question is whether the Vandermonde decomposition (5) is unique. That is, is it possible that there exists two Vandermonde decompositions

    \[H = V^\top DV = \tilde{V}^\top \tilde{D} \tilde{V}\]

of the same (square) Hankel matrix H? This is equivalent to whether the frequency components z_0,z_1,\ldots can be uniquely determined from the measurements f_0,f_1,\ldots,f_{n-1}.

Fortunately, the Vandermonde decomposition is unique if (and only if) the matrix H is a rank-deficient matrix. Let’s unpack this a little bit. (For those who could use a refresher on rank, I have a blog post on precisely this topic.) Note that the Vandermonde decomposition is a rank factorization8Rank factorizations are sometimes referred to as “rank-revealing factorizations”. I discuss my dispreference for this term in my blog post on low-rank matrices. since V has \rank H rows, V has full (row) rank, and D is invertible. This means that if take enough samples f_0,\ldots,f_{n-1} of a function f(\cdot) which is a (finite) combinations of exponentials, the matrix H will be rank-deficient and the Vandermonde decomposition unique.9The uniqueness of the Vandermonde decomposition can be proven by showing that, in our construction by Prony’s method, the c‘s, z‘s, and d‘s are all uniquely determined. If too few samples are taken, then H does not contain enough information to determine the frequency components of the signal f(\cdot) and thus the Vandermonde decomposition is non-unique.

Does Every Hankel Matrix Have a Vandermonde Decomposition?
This post has exclusively focused on a situation where we are provided with sequence we know to be represented as a mixture of exponentials (i.e., taking the form (2)) from which the existence of the Vandermonde decomposition (5) follows immediately. What if we didn’t know this were the case, and we were just given a (square) Hankel matrix H. Is H guaranteed to possess a Vandermonde decomposition of the form (5)?

Unfortunately, the answer is no; there exist Hankel matrices which do not possess a Vandermonde decomposition. The issue is related to the fact that the appropriate characteristic equation (analogous to (9)) might possess repeated roots, making the solutions to the recurrence (6) not just take the form z^j but also jz^j and perhaps j^2z^j, j^3z^j, etc.

Are There Cases When the Vandermonde Decomposition is Guaranteed To Exist?
There is one natural case when a (square) Hankel matrix is guaranteed to possess a Vandermonde decomposition, namely when the matrix is nonsingular/invertible/full-rank. Despite this being a widely circulated fact, I am unaware of a simple proof for why this is the case. Unfortunately, there is not just one but infinitely many Vandermonde decompositions for a nonsingular Hankel matrix, suggesting these decompositions are not useful for the frequency finding problem that motivated this post.
What If My Hankel Matrix Does Not Possess a Vandermonde Decomposition?
As discussed above, a Hankel matrix may fail to have a Vandermonde decomposition if the characteristic equation (a la (9)) has repeated roots. This is very much analogous to the case of a non-diagonalizable matrix for which the characteristic polynomial has repeated roots. In this case, while diagonalization is not possible, one can “almost-diagonalize” the matrix by reducing it to its Jordan normal form. In total analogy, every Hankel matrix can be “almost Vandermonde decomposed” into a confluent Vandermonde decomposition (a discovery that appears to have been made independently several times). I will leave these links to discuss the exact nature of this decomposition, though I warn any potential reader that these resources introduce the decomposition first for Hankel matrices with infinitely many rows and columns before considering the finite case as we have. One is warned that while the Vandermonde decomposition is always a rank decomposition, the confluent Vandermonde decomposition is not guaranteed to be one.10Rather, the confluent Vandermonde decomposition is a rank decomposition for an infinite extension of a finite Hankel matrix. Consider the Hankel matrix H = \begin{bmatrix} 1 & 0 & \cdots & 0 & 0 \\ 0 & 0 & \cdots & 0 & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & 0 & 0 \\ 0 & 0 & \cdots & 0 & 1 \end{bmatrix}.This matrix has rank-two but no rank-two confluent Vandermonde decomposition. The issue is that when extended to an infinite Hankel matrix \begin{bmatrix} 1 & \cdots & 0 & 0 & &\cdots & 1\\ \vdots & \ddots & \vdots &\vdots\\ 0 & \cdots & 0 & 0 & 1\\ 0 & \cdots & 0 & 1 \\ & & 1 & & \ddots \\ \vdots\\ 1\end{bmatrix}, this (infinite!) matrix has a rank exceeding the size of the original Hankel matrix H.
The Toeplitz Vandermonde Decomposition
Just as it proved useful to arrange samples f_0,\ldots, f_{n-1} into a Hankel matrix, it can also be useful to form them into a Toeplitz matrix

    \[T = \begin{bmatrix} f_{(n-1)/2} & f_{(n-1)/2+1} & f_{(n-1)/2+2} & \cdots & f_{n-1} \\\\f_{(n-1)/2-1} & f_{(n-1)/2} & f_{(n-1)/2+1} & \cdots & f_{n-2} \\\\f_{(n-1)/2-2} & f_{(n-1)/2-1} & f_{(n-1)/2} & \cdots & f_{n-3} \\\\\vdots & \vdots & \vdots & \ddots & \vdots \\\\ f_0 & f_1 & f_2 & \cdots & f_{(n-1)/2} \end{bmatrix}.\]

The Toeplitz matrix T has the appealing propery that the matrix–vector product Tx computes a (discrete) convolution of the sampled signal f with the sampled signal x which has all sorts of uses in signal processing and related fields.11I discuss Toeplitz matrices and a fast algorithm to compute the product Tx using the fast Fourier transform more in a blog post I wrote about the subject.

One can interconvert between Hankel and Toeplitz matrices by reversing the order of the rows. As such, to the extent to which Hankel matrices possess Vandermonde decompositions (with all the asterisks and fine print just discussed), Toeplitz matrices do as well but with the rows of the first factor reversed:

    \[T = \operatorname{ReversedRows}(V^\top) \cdot DV.\]

There is a special and important case where more is true. If a Toeplitz matrix is (Hermitian) positive semidefinite, then T always possesses a Vandermonde decomposition of the form

    \[T = V^* D V,\]

where V is a Vandermonde matrix associated with parameters z_0,z_1,\ldots,z_{k-1} which are complex numbers of absolute value one and D is a diagonal matrix with real positive entries.12The keen-eyed reader will note that V appears conjugate transposed in this formula rather than transposed as in the Hankel Vandermonde decomposition (5). This Vandermonde decomposition is unique if and only if T is rank-deficient. Positive semidefinite Toeplitz matrices are important as they occur as autocorrelation matrices which effectively describe the similarity between a signal and different shifts of itself in time. Autocorrelation matrices appear under different names in everything from signal processing to random processes to near-term quantum algorithms (a topic near and dear to my heart). A delightfully simple and linear algebraic derivation of this result is given by Yang and Xie (see Theorem 1).13Unfortunately, Yang and Xie incorrectly claim that every Toeplitz matrix possesses a rank factorization Vandermonde decomposition of the form T = V^* D V where V is a Vandermonde matrix populated with entries on the unit circle and D is a diagonal matrix of possibly *complex* entries. This claim is disproven by the example \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}. This decomposition can be generalized to infinite positive semidefinite Toeplitz matrices (appropriately defined).14Specifically, one can show that an infinite positive semidefinite Toeplitz matrix (appropriately defined) also has a “Vandermonde decomposition” (appropriately defined). This result is often known as Herglotz’s theorem and is generalized by the Bochner–Weil theorem.

Minimal Rank Completions

I’m delighted to share that my first applied mathematics paper, Minimal Rank Completions for Overlapping Blocks, coauthored with Nithin Govindarajan and Shivkumar Chandrasekaran, has been accepted for publication in Linear Algebra and its Applications. (We also have a preprint on arXiv.) In this post, I wanted to share what our paper is about and share an alternate derivation which ended up being cut from the final paper.

Our paper is concerned with the following question: given a collection of matrices which overlap in a particular way, can we choose the region of simultaneous overlap so as to minimize the rank of each of the individual matrices? This is a multi-objective optimization problem, so a priori it is not clear it has a simultaneous solution. So it was of great surprise to us when we discovered that it does. In fact, we were able to find a tractable algorithm to find all solutions to this multi-objective problem.

Our enormous thanks go to the anonymous reviewer, who went above and beyond by discovering and sketching to us a dramatically shorter and more revealing proof1In particular, our original proof only produced some solutions to this problem, where the reviewer’s improved proof strategy allowed us to characterize all solutions. of our main theorem. We are truly indebted to this extraordinary reviewer for the significantly streamlined and more understandable presentation in the current iteration of our paper.

For the remainder of this post, I want to discuss minimal rank completions in a little more generality and present a solution to a special case of a minimal rank completion problem which can be tractably solved. I will present a solution to this problem we discovered which I have not seen elsewhere in the literature. I hope that this perspective will be complementary to existing solutions to this problem published in the literature including the summary of this result in our paper.

Minimal Rank Completion Problems

Minimal rank completions have achieved a lot of buzz in statistics and data science. Given a matrix with unknown entries, it is often very natural to impute the missing entries as those minimizing the rank of the matrix. This is justified by a kind of Occam’s razor argument: big data matrices are very often (approximately) low-rank, so when data is missing, we may assume it takes whatever values are necessary to minimize the rank.

Finding the rank-minimizing choice of missing entries is, in general, NP-hard. However, under certain assumptions, semidefinite programming relaxations can sometimes exactly recover the missing entries.

Alternately, if the missing entries belong to a special pattern, it may be possible to find the rank-minimizing choice of the unknown entries using linear algebraic tools like matrix factorizations. These approaches have significantly more limited domains of applicability than optimization-based approaches, but have advantages in that they

  • can be more tractable as they involve matrix factorizations rather than solving optimization problems;
  • work over arbitrary fields of numbers, such as finite fields appearing in computer science;
  • can find all ways of choosing the entries to minimize the rank, in particular certifying or dis-certifying the uniqueness of the rank-minimizing choice; and
  • are guaranteed to find the rank-minimizer, without any statistical assumptions.

These algebraic methods have their roots in systems theory, integral operators, and rank-structured matrices. The last of these was the application which motivated our interest in the subject.

Our paper concerns an overlapping variant of this problem, where we simultaneously minimize the ranks of several matrices by choosing the entries in the overlap carefully. This problem emerged to us naturally in the construction of certain matrix representations, and we hope it might prove useful in tackling other problems. As it turns out, the overlapping problem can be solved very much in the spirit of a much simpler problem, the block 2\times 2 minimal rank completion problem, which we will spend most of the remainder of this post discussing.

A Solution to the Block 2×2 Case

The block 2\times 2 minimal rank completion problem is as follows: given a partially filled block matrix

(1)   \begin{equation*} \begin{bmatrix} A & B \\ ? & C \end{bmatrix} \end{equation*}

how can the “?” be filled in to minimize the rank of this matrix?

A generalized version of this problem was originally solved by Kaashoek and Woerdeman. An alternate derivation using matrix factorizations is given by Eidelman, Gohberg, and Haimovici, though they only find some of the solutions to this problem. We seek to characterize all ways of choosing the “?” so that the rank is minimize the rank.

Here, I present the solution to this problem which my coauthors and I originally discovered, which is different than the one we summarize in the final version of our paper.2This construction does appear in Section 4 of an initial preprint of ours on rank-structured matrices. This solution is in the spirit of the one by Eidelman, Gohberg, and Haimovici but does produce all solutions.

Let’s start by recalling some facts about linear algebra. Given an m\times n real3The analysis in this discussion holds over any field, but we shall use the real numbers for concreteness. matrix K, one can define its column space, which I will denote \operatorname{Col} K, to be the vector space consist of all linear combinations of the columns of K. From this column space, or indeed any vector subspace of \mathbb{R}^m, one can extract a basis for this subspace, meaning every vector in the subspace can be uniquely decomposed as a linear combination of elements from the basis. In this discussion, we shall always arrange the basis vectors as columns of a basis matrix P. If we instead consider the row space \operatorname{Row} K of K, then we arrange the elements of a basis as rows of a matrix Q.

Before we solve the problem, let’s reason about the lowest possible rank we could possibly expect for the completed matrix. Since the rank of a matrix can be no smaller than the rank of any of its submatrices, clearly we must have that, for any assignment X of the “?”,

(2)   \begin{equation*} \operatorname{rank} \begin{bmatrix} A & B \\ X & C \end{bmatrix} \ge \operatorname{rank} \begin{bmatrix} A & B\end{bmatrix}. \end{equation*}

However, in general, the rank of the completed matrix must be higher than \operatorname{rank} \begin{bmatrix} A & B\end{bmatrix}, as is exemplified when A and B are both zero but C is not. In addition to rank of \begin{bmatrix} A & B\end{bmatrix}, we must also account for rows of \begin{bmatrix} X & C\end{bmatrix} which cannot be written as linear combinations of the rows above, no matter how X is chosen. With a little noodling, one can convince oneself that there are always at least \operatorname{rank} \begin{bmatrix} B \\ C \end{bmatrix} - \operatorname{rank} B such rows, leading to the bound

(3)   \begin{equation*} \operatorname{rank} \begin{bmatrix} A & B \\ X & C \end{bmatrix} \ge \operatorname{rank} \begin{bmatrix} A & B\end{bmatrix} + \operatorname{rank} \begin{bmatrix} B \\ C \end{bmatrix} - \operatorname{rank} B =: r_{\rm opt}. \end{equation*}

We shall show that, by judicious choice of X, Eq. (3) can always be achieved with equality, making r_{\rm opt} the minimal rank for this completion problem.

Let us now find such a rank-minimizing X. The construction begins by considering the column spaces \operatorname{Col} A and \operatorname{Col} B of the matrices A and B. The intersection of these column spaces \operatorname{Col} A \cap \operatorname{Col} B is again a vector subspace, and we choose a basis P_{AB} for it. The subspace \operatorname{Col} A \cap \operatorname{Col} B might have a smaller size than \operatorname{Col} A. Therefore, to find a basis for \operatorname{Col} A, we can extend the basis P_{AB} by adding new columns P_{\overline{A}} to it so that the enlarged matrix \begin{bmatrix} P_{AB} & P_{\overline{A}} \end{bmatrix} forms a basis for \operatorname{Col} A. Similarly, we can find new columns P_{\overline{B}} to add to P_{AB} so that the matrix \begin{bmatrix} P_{AB} & P_{\overline{B}} \end{bmatrix} forms a basis for \operatorname{Col} B.

Now comes something subtle. Since P = \begin{bmatrix} P_{AB} & P_{\overline{A}} \end{bmatrix} forms a basis for \operatorname{Col} A, every column in A can be written as a linear combination of columns in P. In matrix language, this means there exists a matrix Q^\top such that A = PQ^\top—in fact, this exact Q forms a basis for \operatorname{Row} A. Since P is divided into two collections of columns, it is only natural to similarly (and conformally) divide Q into two pieces as Q = \begin{bmatrix} Q_{AB,A} & Q_{\overline{A},A} \end{bmatrix}. Thus, doing the same with B as we did with A, we obtain two matrix factorizations

(4)   \begin{equation*} A = \begin{bmatrix} P_{AB} & P_{\overline{A}} \end{bmatrix} \begin{bmatrix} Q_{AB,A}^\top \\ Q_{\overline{A},A}^\top \end{bmatrix}, \quad B = \begin{bmatrix} P_{AB} & P_{\overline{B}} \end{bmatrix} \begin{bmatrix} Q_{AB,B}^\top \\ Q_{\overline{B},B}^\top \end{bmatrix}. \end{equation*}

Now we do the same song and dance with the row spaces of B and C. Let’s go through the construction somewhat quickly. We start with a basis Q_{BC} for the intersection of \operatorname{Row} B \cap \operatorname{Row} C. We then extend these to bases \begin{bmatrix} Q_{BC} & Q_{\overline{B}} \end{bmatrix} and \begin{bmatrix} Q_{BC} & Q_{\overline{C}} \end{bmatrix} for \operatorname{Row} B and \operatorname{Row} C. From here, we derive the existence of matrix factorizations analogous to Eq. (4):

(5)   \begin{equation*} B = \begin{bmatrix} P_{B,BC} & P_{B,\overline{B}} \end{bmatrix} \begin{bmatrix} Q_{BC}^\top \\ Q_{\overline{B}}^\top \end{bmatrix},\quad C = \begin{bmatrix} P_{C,BC} & P_{C,\overline{C}} \end{bmatrix} \begin{bmatrix} Q_{BC}^\top \\ Q_{\overline{C}}^\top \end{bmatrix}. \end{equation*}

For the next part, we shall take advantage of a powerful fact: if I have a bases P and Q for the row and column spaces of a matrix K, there exists a nonsingular matrix R for which K=PRQ^\top. Applying this fact to the bases \begin{bmatrix} P_{B,BC} & P_{B,\overline{B}} \end{bmatrix} and \begin{bmatrix} Q_{AB,B} & Q_{\overline{B},B} \end{bmatrix} for B‘s column and row spaces, we get the existence of a matrix R = \begin{bmatrix} R_{BC,AB} & R_{\overline{B},AB} \\ R_{BC,\overline{B}} & R_{\overline{B},\overline{B}} \end{bmatrix} such that

(6)   \begin{equation*} B = \begin{bmatrix} P_{B,BC} & P_{B,\overline{B}} \end{bmatrix}\begin{bmatrix} R_{BC,AB} & R_{BC,\overline{B}} \\ R_{\overline{B},AB} & R_{\overline{B},\overline{B}} \end{bmatrix} \begin{bmatrix} Q_{AB,B}^\top \\ Q_{\overline{B},B}^\top \end{bmatrix}. \end{equation*}

We now have all ingredients to describe the solution. Rather than trying to “derive” the solution in a rigorous way, let us try and discover the solution in a non-rigorous way and justify our work later. We’re hoping to find assignments X of the “?” such that \begin{bmatrix} A & B \\ X & C \end{bmatrix} achieves the minimum possible rank r_{\rm opt} defined in Eq. (3). To do so, let’s try and find a rank factorization of the completed matrix and then infer what values X will take. First off, let’s build a rank factorization for \begin{bmatrix} A & B \end{bmatrix} using the building blocks we’ve already built

(7)   \begin{equation*} \begin{bmatrix} A & B \end{bmatrix} = \begin{bmatrix} P_{AB} & P_{\overline{B}} & P_{\overline{A} \end{bmatrix} \begin{bmatrix} Q_{AB,A}^\top & Q_{AB,B}^\top \\ 0 & Q_{\overline{B},B}^\top \\ Q_{\overline{A},A}^\top & 0 \end{bmatrix}. \end{equation*}

Now we want to extend this to a rank factorization for the entire completed matrix. Let’s build up to this in stages, denoting by \star entries whose value we have yet to determine. For us to have a hope of representing the matrix C, we’ll need to somehow add Q^\top_{\overline{C}} into our rank factorization. Doing exactly this, we get the partially specified rank factorization

(8)   \begin{equation*} \begin{bmatrix} A & B \\ X & C \end{bmatrix} = \begin{bmatrix} P_{AB} & P_{\overline{B}} & P_{\overline{A} & 0 \\ \star & \star & \star & \star \end{bmatrix} \begin{bmatrix} Q_{AB,A}^\top & Q_{AB,B}^\top \\ 0 & Q_{\overline{B},B}^\top \\ Q_{\overline{A},A}^\top & 0\\ \star & Q_{\overline{C}}^\top \end{bmatrix}. \end{equation*}

Now, to fill in the second block row of the left factor, we recall that C = P_{C,BC} Q_{BC}^\top + P_{C,\overline{C}} Q_{\overline{C}}^\top. From Eqs. (5) and (6), we deduce that Q_{BC}^\top = R_{BC,AB} Q_{AB,B}^\top + R_{BC,\overline{B}} Q_{\overline{B},B}^\top. Thus, we can fill in more entries:

(9)   \begin{equation*} \begin{bmatrix} A & B \\ X & C \end{bmatrix} = \begin{bmatrix} P_{AB} & P_{\overline{B}} & P_{\overline{A} & 0 \\ P_{C,BC}R_{BC,AB} & P_{C,BC} R_{BC,\overline{B}} & \star & P_{C,\overline{C}} \end{bmatrix} \begin{bmatrix} Q_{AB,A}^\top & Q_{AB,B}^\top \\ 0 & Q_{\overline{B},B}^\top \\ Q_{\overline{A},A}^\top & 0\\ \star & Q_{\overline{C}}^\top \end{bmatrix}. \end{equation*}

With these entries filled in, the remaining \star‘s can be chosen arbitrarily. Assigning names F_{\overline{A}}^\top and F_{\overline{C}} to these free variables, we conclude that

(10)   \begin{equation*} \begin{bmatrix} A & B \\ X & C \end{bmatrix} = \begin{bmatrix} P_{AB} & P_{\overline{B}} & P_{\overline{A} & 0 \\ P_{C,BC}R_{BC,AB} & P_{C,BC} R_{BC,\overline{B}} & F_{\overline{C}} & P_{C,\overline{C}} \end{bmatrix} \begin{bmatrix} Q_{AB,A}^\top & Q_{AB,B}^\top \\ 0 & Q_{\overline{B},B}^\top \\ Q_{\overline{A},A}^\top & 0\\ F_{\overline{A}}^\top & Q_{\overline{C}}^\top \end{bmatrix} \end{equation*}

and

(11)   \begin{equation*} X = P_{C,BC}R_{BC,AB}Q_{AB,A}^\top + F_{\overline{A}} Q_{\overline{A},A}^\top + P_{C,\overline{C}} F_{\overline{C}}^\top. \end{equation*}

From all the analysis previous, we know that all X‘s of the form Eq. (11) solve the minimal rank completion problem, making the completed matrix achieve the minimal rank r_{\rm opt} defined in Eq. (3). With a little more elbow grease, one can also confirm that all such minimal completions X are of the form Eq. (11), proving that Eq. (11) indeed characterizes all solutions to the minimal rank completion problem. This completes the characterization and construction of the complete set of minimizers to the block 2\times 2 minimal rank completion problem.

The Overlapping Block Minimal Rank Completion Problem

If you found this post interesting, be sure to check out our paper (or here on arXiv) for an explanation of a different way of thinking about the solution of the block 2\times 2 minimal rank completion problem and a solution to a more general “overlapping” variant. The treatment should be reasonably self-contained, and we hope the solution to this problem could prove a useful tool in tackling open problems in systems theory and the study of rank-structured matrices.