where the vectors are independent, identically distributed (iid) random vectors satisfying the isotropy condition
Examples of vectors satisfying this condition include
- Gaussian: The entries of each are iid standard Gaussian random variables.
- Random signs: The entries of each are iid random numbers taking the values with equal probabilities (i.e., Rademacher random variables).
- Sphere: Each is a uniformly random point on the (Euclidean) sphere of radius , .
Stochastic trace estimation has a number of applications: log-determinant computations in machine learning, partition function calculations in statistical physics, generalized cross validation for smoothing splines, and triangle counting in large networks. Several improvements to the basic Girard–Hutchinson estimator have been developed recently. I am partial to XTrace, an improved trace estimator that I developed with my collaborators.
This post is addressed at the question:
Which distribution should be used for the test vectors for stochastic trace estimation?
Since the Girard–Hutchinson estimator is unbiased , the variance of is equal to the mean-square error. Thus, the lowest variance trace estimate is the most accurate. In my previous post on trace estimation, I discussed formulas for the variance of the Girard–Hutchinson estimator with different choices of test vectors. In that post, I stated the formulas for different choices of test vectors (Gaussian, random signs, sphere) and showed how those formulas could be proven.
In this post, I will take the opportunity to editorialize on which distribution to pick. The thesis of this post is as follows:
The sphere distribution is essentially always preferable to the Gaussian distribution for trace estimation.
To explain why, let’s focus on the case when is real and symmetric.1The same principles hold in the general case, but the variance formulas are more delicate to state. See my previous post for the formulas. Let be the eigenvalues of and define the eigenvalue mean
Then the variance of the Girard–Hutchinson estimator with Gaussian vectors is
For vectors drawn from the sphere, we have
The sphere distribution improves on the Gaussian distribution in two ways. First, the variance of is smaller than by a factor of . This improvement is quite minor. Second, and more importantly, is proportional to the sum of ‘s squared eigenvalues whereas is proportional to the sum of ‘s squared eigenvalues after having been shifted to be mean-zero!
The difference between Gaussian and sphere test vectors can be large. To see this, consider a matrix with eigenvalues uniformly distributed between and with a (Haar orthgonal) random matrix of eigenvectors. For simplicity, since the variance of all Girard–Hutchinson estimates is proportional to , we take . Below show the variance of Girard–Hutchinson estimator for different distributions for the test vector. We see that the sphere distribution leads to a trace estimate which has a variance 300× smaller than the Gaussian distribution. For this example, the sphere and random sign distributions are similar.
|Variance (divided by )
Which Distribution Should You Use: Signs vs. Sphere
The main point of this post is to argue against using the Gaussian distribution. But which distribution should you use: Random signs? The sphere distribution? The answer, for most applications, is one of those two, but exactly which depends on the properties of the matrix .
The variance of the Girard–Hutchinson estimator with the random signs estimator is
Thus, depends on the size of the off-diagonal entries of ; does not depend on the diagonal of at all! For matrices with small off-diagonal entries (such as diagonally dominant matrices), the random signs distribution is often the best.
However, for other problems, the sphere distribution is preferable to random signs. The sphere distribution is rotation-invariant, so is independent of the eigenvectors of the (symmetric) matrix , depending only on ‘s eigenvalues. By contrast, the variance of the Girard–Hutchinson estimator with the random signs distribution can significantly depend on the eigenvectors of the matrix . For a given set of eigenvalues and the worst-case choice of eigenvectors, will always be smaller than . In fact, is the minimum variance distribution for Girard–Hutchinson trace estimation for a matrix with fixed eigenvalues and worst-case eigenvectors; see this section of my previous post for details.
In my experience, random signs and the sphere distribution are both perfectly adequate for trace estimation and either is a sensible default if you’re developing software. The Gaussian distribution on the other hand… don’t use it unless you have a good reason to.