Note to Self: Trace Estimation with Tensor Products

Let x_1,\ldots,x_\ell be random vectors and let x = x_1 \otimes \cdots \otimes x_\ell denote their tensor product. Assume the vectors x_i are isotropic, in the sense that

    \[\expect[x_i^{\vphantom{\top}} x_i^\top] = I.\]

The vector x inherits the isotropy property as well: \expect[xx^\top] = I. As a consequence, we can use the vector x to form an unbiased estimator for the matrix trace: \expect[x^\top Ax] = \tr(A). Trace estimation has been a frequent topic on this blog.
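As a quick numerical illustration (a Python sketch, with the sizes and the random-sign probe distribution chosen arbitrarily), averaging x^\top A x over many Kronecker-product probes approaches \tr(A):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (hypothetical, chosen arbitrarily).
d1, d2 = 4, 5

# A random positive semidefinite test matrix A.
G = rng.standard_normal((d1 * d2, d1 * d2))
A = G @ G.T

def kron_probe():
    # Tensor product x = x1 (x) x2 of isotropic random-sign vectors,
    # so E[x x^T] = I and hence E[x^T A x] = tr(A).
    x1 = rng.choice([-1.0, 1.0], size=d1)
    x2 = rng.choice([-1.0, 1.0], size=d2)
    return np.kron(x1, x2)

# Average many copies of the quadratic form x^T A x.
num_samples = 100_000
mean_estimate = np.mean([(x := kron_probe()) @ A @ x for _ in range(num_samples)])
print(mean_estimate, np.trace(A))  # the two numbers should be close
```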

What is the variance of the trace estimate x^\top Ax? This question was addressed by Raphael Meyer and Haim Avron. The variance of the trace estimator x^\top A x is

    \[\Var(x^\top A x) = \expect[(x^\top A x)^2] - \expect[x^\top A x]^2 = \expect[(x^\top A x)^2] - \tr(A)^2.\]

As such, bounding the variance is equivalent to bounding the second moment \expect[(x^\top A x)^2]. Suppose that the individual base vectors x_i satisfy a moment bound

(1)   \[\expect[(x_i^\top A x_i)^2] \le \alpha \tr(A)^2 \quad \text{for every psd matrix } A.\]

For instance, Gaussian, random sign, and uniformly random vectors on the sphere all satisfy this bound with \alpha \le 3. Under assumption (1), Meyer and Avron show that the tensor-product trace estimator x^\top Ax satisfies the bound

(2)   \[\expect[(x^\top A x)^2] \le \alpha^\ell \tr(A)^2 \quad \text{for any psd matrix } A.\]

Here, a matrix A is positive semidefinite (psd) if it is symmetric and satisfies v^\top A v \ge 0 for every vector v. Observe that the constant in the bound (2) is exponentially larger than the constant in the bound (1). Unfortunately, as Meyer and Avron show, this exponentially large variance is a genuine property of the tensor-structured trace estimator, at least on worst-case examples.
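For concreteness, here is a small Monte Carlo check of the second-moment bound (2) in the case \ell = 2, using Gaussian probe vectors (which satisfy (1) with \alpha \le 3); the matrix size and sample count are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical sizes; with x = x1 (x) x2 we have ell = 2 factors.
d1, d2 = 3, 3
alpha, ell = 3.0, 2

# Random psd test matrix.
G = rng.standard_normal((d1 * d2, d1 * d2))
A = G @ G.T

# Monte Carlo estimate of the second moment E[(x^T A x)^2] using
# Gaussian probes, which satisfy the moment bound (1) with alpha <= 3.
num_samples = 100_000
total = 0.0
for _ in range(num_samples):
    x = np.kron(rng.standard_normal(d1), rng.standard_normal(d2))
    total += (x @ A @ x) ** 2
second_moment = total / num_samples

# The bound (2): E[(x^T A x)^2] <= alpha^ell * tr(A)^2.
print(second_moment, alpha**ell * np.trace(A) ** 2)
```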

Meyer and Avron’s paper is really nice, and it contains many different results for Kronecker-structured trace estimation beyond the bound (2). I highly recommend checking their paper out, which just appeared in the SIAM Journal on Matrix Analysis and Applications! In this post, I’ll give an alternate, somewhat shorter proof of the Meyer–Avron bound (2).

Suppose that x = y \otimes z is a tensor product of isotropic vectors y\in \real^{d_1} and z \in \real^{n_2}, and suppose that

(3)   \[\expect[(y^\top A_1 y)^2] \le \alpha_1 \tr(A_1)^2 \quad \text{and} \quad \expect[(z^\top A_2 z)^2] \le \alpha_2 \tr(A_2)^2 \]

for any psd matrices A_1 and A_2. Now, let A be a (d_1n_2)\times(d_1n_2) psd matrix, and partition A as

(4)   \[A = \begin{bmatrix} A_{11} & \cdots & A_{1d_1} \\ \vdots & \ddots & \vdots \\ A_{d_1 1} & \cdots & A_{d_1 d_1} \end{bmatrix}. \]

Then

    \begin{align*}x^\top A x &= \begin{bmatrix} y_1 z \\ \vdots \\ y_{d_1}z\end{bmatrix}^\top\begin{bmatrix} A_{11} & \cdots & A_{1d_1} \\ \vdots & \ddots & \vdots \\ A_{d_1 1} & \cdots & A_{d_1 d_1} \end{bmatrix}\begin{bmatrix} y_1 z \\ \vdots \\ y_{d_1}z\end{bmatrix} \\ &= \underbrace{\begin{bmatrix} y_1 \\ \vdots \\ y_{d_1}\end{bmatrix}^\top}_{y^\top}\begin{bmatrix} z^\top A_{11}z & \cdots & z^\top A_{1d_1}z \\ \vdots & \ddots & \vdots \\ z^\top A_{d_1 1}z & \cdots & z^\top A_{d_1 d_1}z \end{bmatrix}\underbrace{\begin{bmatrix} y_1 \\ \vdots \\ y_{d_1}\end{bmatrix}}_y.\end{align*}
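This block identity is easy to verify numerically (a sketch with arbitrary sizes; A need only be symmetric for the algebra to hold):

```python
import numpy as np

rng = np.random.default_rng(2)

d1, n2 = 3, 4  # hypothetical block sizes matching the partition above
y = rng.standard_normal(d1)
z = rng.standard_normal(n2)
S = rng.standard_normal((d1 * n2, d1 * n2))
A = S + S.T  # symmetric test matrix

x = np.kron(y, z)  # x = y (x) z = [y_1 z; ...; y_{d1} z]
lhs = x @ A @ x

# Matrix of block quadratic forms z^T A_ij z, where A_ij is the
# (i, j) block of A, each of size n2 x n2.
M = np.array([[z @ A[i * n2:(i + 1) * n2, j * n2:(j + 1) * n2] @ z
               for j in range(d1)] for i in range(d1)])
rhs = y @ M @ y

print(np.isclose(lhs, rhs))  # True
```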

Taking the expectation over the random vector y alone and applying (3) (which is valid because the matrix of quadratic forms z^\top A_{ij} z is itself psd whenever A is), we obtain

    \begin{align*}\expect_y [(x^\top A x)^2] &\le \alpha_1 \left( \tr \begin{bmatrix} z^\top A_{11}z & \cdots & z^\top A_{1d_1}z \\ \vdots & \ddots & \vdots \\ z^\top A_{d_1 1}z & \cdots & z^\top A_{d_1 d_1}z \end{bmatrix} \right)^2 \\ &= \alpha_1 [z^\top(A_{11}+\cdots+ A_{d_1d_1})z]^2.\end{align*}

Taking the expectation over z and invoking the law of total expectation then yields

    \begin{align*}\expect [(x^\top A x)^2] &\le \alpha_1 \alpha_2 \tr(A_{11}+\cdots+ A_{d_1d_1})^2=\alpha_1 \alpha_2\tr(A)^2.\end{align*}

In the last line, we observe that the trace of A is the sum of the traces of its diagonal blocks A_{ii}. Voilà! We have deduced the bound

    \[\expect [(x^\top A x)^2] \le \alpha_1\alpha_2 \tr(A)^2,\]

which immediately yields the Meyer–Avron bound (2) by iteration.
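The block-trace identity used in the last step (the trace of A equals the sum of the traces of its diagonal blocks A_{ii}) can likewise be checked in a few lines; the sizes below are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(3)

d1, n2 = 3, 4  # hypothetical block sizes
A = rng.standard_normal((d1 * n2, d1 * n2))

# Sum the traces of the d1 diagonal blocks A_ii (each n2 x n2).
block_trace_sum = sum(np.trace(A[i * n2:(i + 1) * n2, i * n2:(i + 1) * n2])
                      for i in range(d1))
print(np.isclose(np.trace(A), block_trace_sum))  # True
```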
