Don’t Use Gaussians in Stochastic Trace Estimation

Suppose we are interested in estimating the trace $\tr(A) = \sum_{i=1}^n A_{ii}$ of an $n\times n$ matrix $A$ that can only be accessed through matrix–vector products $Ax_1,\ldots,Ax_m$. The classical method for this purpose is the Girard–Hutchinson estimator

$\hat{\tr} = \frac{1}{m} \left( x_1^\top Ax_1 + \cdots + x_m^\top Ax_m \right),$

where the vectors $x_1,\ldots,x_m$ are independent, identically distributed (iid) random vectors satisfying the isotropy condition

$\expect[x_ix_i^\top] = I.$

Examples of vectors satisfying this condition include:

Gaussian: The entries of each $x_i$ are iid standard Gaussian random variables.
Random signs: The entries of each $x_i$ are iid random numbers taking the values $\pm 1$ with equal probabilities (i.e., Rademacher random variables).
Sphere: Each $x_i$ is a uniformly random point on the (Euclidean) sphere of radius $\sqrt{n}$, $x_i \sim \operatorname{Unif} \{ y \in \mathbb{R}^n : y^\top y = n \}$.
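
As a minimal sketch of the estimator and the three test vector distributions, here is one way they might be written in NumPy. The function names `sample_test_vectors` and `girard_hutchinson` are hypothetical, and the code assumes that `matvec` can be applied to all test vectors at once, stored as the columns of an array.

```python
import numpy as np

def sample_test_vectors(n, m, dist, rng):
    """Return an n-by-m array whose columns are isotropic test vectors."""
    if dist == "gaussian":
        return rng.standard_normal((n, m))
    if dist == "signs":
        return rng.choice([-1.0, 1.0], size=(n, m))
    if dist == "sphere":
        g = rng.standard_normal((n, m))
        # A normalized Gaussian vector has a uniformly random direction;
        # rescale each column to lie on the sphere of radius sqrt(n).
        return np.sqrt(n) * g / np.linalg.norm(g, axis=0)
    raise ValueError(f"unknown distribution: {dist}")

def girard_hutchinson(matvec, n, m, dist="sphere", rng=None):
    """Average of x_i^T A x_i over m test vectors, using only products with A."""
    rng = np.random.default_rng() if rng is None else rng
    X = sample_test_vectors(n, m, dist, rng)
    AX = matvec(X)  # assumed to return A @ X, column by column
    return np.sum(X * AX) / m
```

For an explicit matrix `A`, a call might look like `girard_hutchinson(lambda X: A @ X, A.shape[0], m=100)`.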

Stochastic trace estimation has a number of applications: log-determinant computations in machine learning, partition function calculations in statistical physics, generalized cross validation for smoothing splines, and triangle counting in large networks. Several improvements to the basic Girard–Hutchinson estimator have been developed recently. I am partial to XTrace, an improved trace estimator that I developed with my collaborators.

This post addresses the question:

Which distribution should be used for the test vectors $x_i$ for stochastic trace estimation?

Since the Girard–Hutchinson estimator is unbiased, $\expect[\hat{\tr}] = \tr(A)$, the variance of $\hat{\tr}$ is equal to the mean-square error. Thus, the lowest-variance trace estimate is the most accurate. In my previous post on trace estimation, I stated formulas for the variance $\Var(\hat{\tr})$ of the Girard–Hutchinson estimator with different choices of test vectors (Gaussian, random signs, sphere) and showed how those formulas could be proven.

In this post, I will take the opportunity to editorialize on which distribution to pick. The thesis of this post is as follows:

The sphere distribution is essentially always preferable to the Gaussian distribution for trace estimation.

To explain why, let’s focus on the case when $A$ is real and symmetric.¹ Let $\lambda_1,\ldots,\lambda_n$ be the eigenvalues of $A$ and define the eigenvalue mean

$\overline{\lambda} = \frac{\lambda_1 + \cdots + \lambda_n}{n}.$

Then the variance of the Girard–Hutchinson estimator with Gaussian vectors $x_i$ is

$\Var(\hat{\tr}_{\rm Gaussian}) = \frac{1}{m} \cdot 2 \sum_{i=1}^n \lambda_i^2.$

For vectors $x_i$ drawn from the sphere, we have

$\Var(\hat{\tr}_{\rm sphere}) = \frac{1}{m} \cdot \frac{n}{n+2} \cdot 2\sum_{i=1}^n (\lambda_i - \overline{\lambda})^2.$

The sphere distribution improves on the Gaussian distribution in two ways. First, $\Var(\hat{\tr}_{\rm sphere})$ is smaller than $\Var(\hat{\tr}_{\rm Gaussian})$ by a factor of $n/(n+2) < 1$. This improvement is quite minor. Second, and more importantly, $\Var(\hat{\tr}_{\rm Gaussian})$ is proportional to the sum of $A$’s squared eigenvalues, whereas $\Var(\hat{\tr}_{\rm sphere})$ is proportional to the sum of $A$’s squared eigenvalues after they have been shifted to be mean-zero!
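
To see how much the shift removes, the centered sum of squares can be expanded using $\sum_{i=1}^n \lambda_i = n\overline{\lambda} = \tr(A)$. This is a short worked step that uses only the definitions above:

```latex
\sum_{i=1}^n (\lambda_i - \overline{\lambda})^2
  = \sum_{i=1}^n \lambda_i^2 - 2\overline{\lambda}\sum_{i=1}^n \lambda_i + n\overline{\lambda}^2
  = \sum_{i=1}^n \lambda_i^2 - n\overline{\lambda}^2
  = \sum_{i=1}^n \lambda_i^2 - \frac{\tr(A)^2}{n}.
```

So centering subtracts $\tr(A)^2/n$ from the sum of squared eigenvalues, a term that dominates when the eigenvalues cluster around a nonzero mean.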

The difference between Gaussian and sphere test vectors can be large. To see this, consider a $1000\times 1000$ matrix $A$ with eigenvalues uniformly distributed between $0.9$ and $1.1$ and a (Haar orthogonal) random matrix of eigenvectors. For simplicity, since the variance of all Girard–Hutchinson estimates is proportional to $1/m$, we take $m=1$. The table below shows the variance of the Girard–Hutchinson estimator for different distributions of the test vector. We see that the sphere distribution leads to a trace estimate with a variance 300× smaller than the Gaussian distribution. For this example, the sphere and random sign distributions are similar.
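
The Gaussian and sphere figures for this example can be checked directly from the two variance formulas, since both depend only on the eigenvalues. The sketch below assumes evenly spaced eigenvalues on $[0.9, 1.1]$ as a deterministic stand-in for uniformly distributed ones; the variable names are illustrative.

```python
import numpy as np

n, m = 1000, 1
# Assumption: evenly spaced eigenvalues stand in for "uniformly distributed".
lam = np.linspace(0.9, 1.1, n)
trace = lam.sum()

# Variance formulas for Gaussian and sphere test vectors (symmetric A).
var_gaussian = 2.0 * np.sum(lam**2) / m
var_sphere = (n / (n + 2)) * 2.0 * np.sum((lam - lam.mean())**2) / m

# Normalize by tr(A)^2 to get relative variances.
rel_gaussian = var_gaussian / trace**2
rel_sphere = var_sphere / trace**2
ratio = var_gaussian / var_sphere

print(f"Gaussian: {rel_gaussian:.1e}")
print(f"Sphere:   {rel_sphere:.1e}")
print(f"ratio:    {ratio:.0f}")
```

With these eigenvalues, the relative variances come out to roughly $2.0\times 10^{-3}$ and $6.7\times 10^{-6}$, a ratio of about 300.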

Which Distribution Should You Use: Signs vs. Sphere

The main point of this post is to argue against using the Gaussian distribution. But which distribution should you use: Random signs? The sphere distribution? The answer, for most applications, is one of those two, but exactly which depends on the properties of the matrix $A$.

The variance of the Girard–Hutchinson estimator with random sign test vectors is

$\Var(\hat{\tr}_{\rm signs}) = \frac{1}{m} \cdot 2 \sum_{i\ne j} A_{ij}^2.$

Thus, $\Var(\hat{\tr}_{\rm signs})$ depends on the size of the off-diagonal entries of $A$; $\Var(\hat{\tr}_{\rm signs})$ does not depend on the diagonal of $A$ at all! For matrices with small off-diagonal entries (such as diagonally dominant matrices), the random signs distribution is often the best.
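
For a small symmetric matrix, the random signs formula can be checked exactly by enumerating all $2^n$ sign vectors. The following is a sketch under that assumption (symmetric $A$, $m=1$ for the brute-force check); `signs_variance` and `exact_signs_variance` are hypothetical helper names.

```python
import itertools
import numpy as np

def signs_variance(A, m=1):
    """Formula: (1/m) * 2 * sum of squared off-diagonal entries (A symmetric)."""
    off = A - np.diag(np.diag(A))
    return 2.0 * np.sum(off**2) / m

def exact_signs_variance(A):
    """Exact variance of x^T A x over all 2^n equally likely sign vectors x."""
    n = A.shape[0]
    vals = []
    for s in itertools.product([-1.0, 1.0], repeat=n):
        x = np.array(s)
        vals.append(x @ A @ x)
    return np.var(vals)  # population variance over the full sample space

rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4))
A = (B + B.T) / 2  # symmetrize

# Changing the diagonal leaves the variance untouched.
A_shifted = A + 10.0 * np.eye(4)
print(signs_variance(A), exact_signs_variance(A), exact_signs_variance(A_shifted))
```

The enumeration is only feasible for tiny $n$; it is meant as a sanity check, not as a way to compute the variance in practice.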

However, for other problems, the sphere distribution is preferable to random signs. The sphere distribution is rotation-invariant, so $\Var(\hat{\tr}_{\rm sphere})$ is independent of the eigenvectors of the (symmetric) matrix $A$, depending only on $A$’s eigenvalues. By contrast, the variance of the Girard–Hutchinson estimator with the random signs distribution can depend significantly on the eigenvectors of $A$. For a given set of eigenvalues and the worst-case choice of eigenvectors, $\Var(\hat{\tr}_{\rm sphere})$ will always be smaller than $\Var(\hat{\tr}_{\rm signs})$. In fact, the sphere distribution is the minimum-variance distribution for Girard–Hutchinson trace estimation for a matrix with fixed eigenvalues and worst-case eigenvectors; see this section of my previous post for details.

In my experience, random signs and the sphere distribution are both perfectly adequate for trace estimation and either is a sensible default if you’re developing software. The Gaussian distribution on the other hand… don’t use it unless you have a good reason to.

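Table: Variance of the Girard–Hutchinson estimator with $m=1$ for the $1000\times 1000$ example above, by test vector distribution.

Distribution	Variance (divided by $\tr(A)^2$)
Gaussian	$2.0\times 10^{-3}$
Sphere	$6.7\times 10^{-6}$
Random signs	$6.7\times 10^{-6}$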
