Gaussian Integration by Parts

Gaussian random variables are wonderful, and there are lots of clever tricks for doing computations with them. One particularly nice tool is the Gaussian integration by parts formula, which I learned from my PhD advisor Joel Tropp. Here it is:

Gaussian integration by parts. Let z be a standard Gaussian random variable, and let f be a (weakly) differentiable function for which the expectations below are finite. Then \expect[zf(z)] = \expect[f'(z)].
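
Before putting the formula to work, here is a quick numerical sanity check (a minimal Monte Carlo sketch in Python, not part of any proof). Taking f(x) = \sin(x), both sides of the formula should approach \expect[\cos z] = e^{-1/2} \approx 0.6065:

    import numpy as np

    rng = np.random.default_rng(0)
    z = rng.standard_normal(10**7)  # samples of a standard Gaussian

    lhs = np.mean(z * np.sin(z))    # Monte Carlo estimate of E[z f(z)]
    rhs = np.mean(np.cos(z))        # Monte Carlo estimate of E[f'(z)]
    print(lhs, rhs, np.exp(-0.5))   # all three approximately 0.6065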

This formula makes many basic computations effortless. For instance, to compute the second moment \expect[z^2] of a standard Gaussian random variable z, we apply the formula with f(x) = x to obtain

    \[\expect[z^2] = \expect[z f(z)] = \expect[f'(z)] = \expect[1] = 1.\]

Therefore, the second moment is one. Since z has mean zero, this also means that the variance of a standard Gaussian random variable is one.

The fourth moment is no harder to compute. Using f(x) = x^3, we compute

    \[\expect[z^4] = \expect[zf(z)] = \expect[f'(z)] = 3\expect[z^2] = 3.\]

Easy peasy. We’ve shown that the fourth moment of a standard Gaussian variable is three, with no fancy integral tricks required.

Iterating this trick, we can compute all the even moments of a standard Gaussian random variable. Indeed,

    \[\expect[z^{2p}] = \expect[z\cdot z^{2p-1}] = (2p-1) \expect[z^{2p-2}] = (2p-1)(2p-3) \expect[z^{2p-4}] = \cdots = (2p-1)(2p-3)\cdots 1.\]

We conclude that the (2p)th moment of a standard Gaussian random variable is (2p-1)!!, where !! indicates the (in)famous double factorial (2p-1)!! = (2p-1)\times(2p-3)\times\cdots \times 3 \times 1.
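
If you’d like to see the double factorial emerge numerically, here is a minimal Monte Carlo sketch comparing sample estimates of \expect[z^{2p}] against (2p-1)!! for the first few values of p:

    import numpy as np

    rng = np.random.default_rng(0)
    z = rng.standard_normal(10**7)

    double_factorial = 1
    for p in range(1, 5):
        double_factorial *= 2 * p - 1  # (2p-1)!! = (2p-1)(2p-3)...(3)(1)
        print(p, np.mean(z ** (2 * p)), double_factorial)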

As a spicier application, let us now compute \expect[|z|]. To do so, we choose f(x) = \operatorname{sign}(x) to be the sign function:

    \[\operatorname{sign}(x) = \begin{cases} 1, & x > 0, \\ 0, & x = 0, \\ -1, & x < 0.\end{cases}\]

This function is not differentiable in a “Calculus 1” sense because it is discontinuous at zero. However, it is differentiable in a “distributional” sense, and its derivative is f'(x) = 2\delta(x), where \delta denotes the famous Dirac delta “function”. (One way to see this: \operatorname{sign}(x) = 2H(x) - 1, where H is the Heaviside step function, whose distributional derivative is \delta.) Since |z| = z \operatorname{sign}(z) = zf(z), we then compute

    \[\expect[|z|] = \expect[zf(z)] = \expect[f'(z)] = 2\expect[\delta(z)].\]

To compute \expect[\delta(z)], write the integral out using the probability density function \phi of the standard Gaussian distribution:

    \[\expect[|z|] = 2\expect[\delta(z)] = 2\int_{-\infty}^\infty \phi(x) \delta(x) \, \mathrm{d} x = 2\phi(0).\]

The standard Gaussian distribution has density

    \[\phi(x) = \frac{1}{\sqrt{2\pi}} \exp \left( - \frac{x^2}{2} \right),\]

so we conclude

    \[\expect[|z|] = 2 \cdot \frac{1}{\sqrt{2 \pi}} = \sqrt{\frac{2}{\pi}}.\]

An unpleasant computation involving integrating against the Gaussian density has again been made trivial by using the Gaussian integration by parts formula.
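
A quick numerical check of this value, in the same Monte Carlo spirit as above:

    import numpy as np

    rng = np.random.default_rng(0)
    z = rng.standard_normal(10**7)
    print(np.mean(np.abs(z)), np.sqrt(2 / np.pi))  # both approximately 0.7979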

Application: Power Method from a Random Start

As an application of the Gaussian integration by parts formula, we can analyze the famous power method for eigenvalue computations with a (Gaussian) random initialization. This discussion is adapted from the tutorial of Kireeva and Tropp (2024).

Setup

Before we can get to the cool application of the Gaussian integration by parts formula, we need to set up the problem and do a bit of algebra. Let A be a matrix, which we’ll assume for simplicity to be symmetric and positive semidefinite. Let \lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n \ge 0 denote the eigenvalues of A, and assume the largest eigenvalue \lambda_1 is strictly larger than the next eigenvalue \lambda_2.

The power method computes the largest eigenvalue of A by repeating the iteration x \gets Ax / \norm{Ax}. After many iterations, x approaches a dominant eigenvector of A, and x^\top A x approaches the largest eigenvalue. Letting x^{(0)} denote the initial vector, the tth power iterate is x^{(t)} = A^t x^{(0)} / \norm{A^t x^{(0)}} and the tth eigenvalue estimate is

    \[\mu^{(t)} = \left(x^{(t)}\right)^\top Ax^{(t)}= \frac{\left(x^{(0)}\right)^\top A^{2t+1} x^{(0)}}{\left(x^{(0)}\right)^\top A^{2t} x^{(0)}}.\]

It is common to initialize the power method with a vector x^{(0)} with (independent) standard Gaussian random coordinates. In this case, the components z_1,\ldots,z_n of x^{(0)} in an eigenvector basis of A are also independent standard Gaussians, owing to the rotational invariance of the (standard multivariate) Gaussian distribution. Then the tth eigenvalue estimate is

    \[\mu^{(t)} = \frac{\sum_{i=1}^n \lambda_i^{2t+1} z_i^2}{\sum_{i=1}^n \lambda_i^{2t} z_i^2},\]

and the error of \mu^{(t)} as an approximation to the dominant eigenvalue \lambda_1 is

    \[\frac{\lambda_1 - \mu^{(t)}}{\lambda_1} = \frac{\sum_{i=1}^n (\lambda_1 - \lambda_i)/\lambda_1 \cdot \lambda_i^{2t} z_i^2}{\sum_{i=1}^n \lambda_i^{2t} z_i^2}.\]
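
To make this setup concrete, here is a minimal Python sketch (with an arbitrary symmetric positive semidefinite test matrix of my own choosing) that runs the power method from a Gaussian start and verifies that the eigenvalue estimate \mu^{(t)} matches the closed-form expression in the eigenvector coordinates:

    import numpy as np

    rng = np.random.default_rng(0)
    n, t = 50, 10

    # Build a random symmetric positive semidefinite test matrix.
    G = rng.standard_normal((n, n))
    A = G @ G.T

    # Power method from a standard Gaussian initialization.
    x0 = rng.standard_normal(n)
    x = x0.copy()
    for _ in range(t):
        x = A @ x
        x /= np.linalg.norm(x)
    mu = x @ A @ x  # tth eigenvalue estimate

    # Closed form: z holds the coordinates of x0 in an eigenvector basis.
    lam, V = np.linalg.eigh(A)  # eigenvalues in ascending order
    z = V.T @ x0
    mu_closed = np.sum(lam ** (2 * t + 1) * z**2) / np.sum(lam ** (2 * t) * z**2)

    print(mu, mu_closed, lam[-1])  # the two estimates agree; both approach lambda_1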

Analysis

Having set everything up, we can now use the Gaussian integration by parts formula to make quick work of the analysis. To begin, observe that the quantity

    \[\frac{\lambda_1 - \lambda_i}{\lambda_1}\]

is zero for i = 1 and is at most one for i > 1. Therefore, the error is bounded as

    \[\frac{\lambda_1 - \mu^{(t)}}{\lambda_1} \le \frac{\sum_{i=2}^n \lambda_i^{2t} z_i^2}{\lambda_1^{2t} z_1^2 + \sum_{i=2}^n \lambda_i^{2t} z_i^2} = \frac{c^2}{\lambda_1^{2t} z_1^2 + c^2} \quad \text{for } c^2 = \sum_{i=2}^n \lambda_i^{2t} z_i^2.\]

We have introduced a parameter c to consolidate the terms in this expression not depending on z_1. Since z_1,\ldots,z_n are independent, z_1 and c are independent as well.

Now, let us bound the expected value of the error. First, we take an expectation \expect_{z_1} with respect to only the randomness in the first Gaussian variable z_1. Here, we use Gaussian integration by parts in the reverse direction. Introduce the function

    \[f'(x) = \frac{c^2}{\lambda_1^{2t}x^2 + c^2}.\]

This function is the derivative of

    \[f(x) = \frac{c}{\lambda_1^t} \arctan \left( \frac{\lambda_1^t}{c} \cdot x \right).\]
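
To double-check, differentiate f with the chain rule:

    \[f'(x) = \frac{c}{\lambda_1^t} \cdot \frac{\lambda_1^t / c}{1 + (\lambda_1^t x / c)^2} = \frac{1}{1 + \lambda_1^{2t} x^2 / c^2} = \frac{c^2}{\lambda_1^{2t} x^2 + c^2}.\]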

Thus, by the Gaussian integration by parts formula, we have

    \[\expect_{z_1}\left[\frac{\lambda_1 - \mu^{(t)}}{\lambda_1}\right] \le \expect_{z_1} \left[z_1 \cdot \frac{c}{\lambda_1^t} \arctan \left( \frac{\lambda_1^t}{c} \cdot z_1 \right) \right] = \expect_{z_1} \left[|z_1| \cdot \frac{c}{\lambda_1^t} \left|\arctan \left( \frac{\lambda_1^t}{c} \cdot z_1 \right)\right| \right].\]

In the last equality, we observed that the bracketed quantity is nonnegative (z_1 and \arctan(\lambda_1^t z_1 / c) always share the same sign), so we are free to introduce absolute value signs. The arctangent function is at most \pi/2 in absolute value, so we can bound

    \[\expect_{z_1}\left[\frac{\lambda_1 - \mu^{(t)}}{\lambda_1}\right] \le \frac{\pi}{2} \cdot \frac{c}{\lambda_1^t} \cdot \expect_{z_1} \left[|z_1|\right] \le \sqrt{\frac{\pi}{2}} \cdot \frac{c}{\lambda_1^t}.\]

Here, we used our result from above that \expect[|z|] = \sqrt{2/\pi} for a standard Gaussian variable z.

We’re in the home stretch! We can bound c as

    \[c = \sqrt{\sum_{i=2}^n \lambda_i^{2t} z_i^2} \le \lambda_2^{t} \sqrt{\sum_{i=2}^n z_i^2} = \lambda_2^t \norm{(z_2,\ldots,z_n)}.\]

We see that c is at most \lambda_2^t times the length of a vector of n-1 standard Gaussian entries. As we’ve seen before on this blog, the expected length of a Gaussian vector with n-1 entries is at most \sqrt{n-1}. Thus, \expect[c] \le \lambda_2^t \sqrt{n-1}. We conclude that the expected error for power iteration is

    \[\expect\left[\frac{\lambda_1 - \mu^{(t)}}{\lambda_1}\right] = \expect_c\left[\expect_{z_1} \left[ \frac{\lambda_1 - \mu^{(t)}}{\lambda_1}  \right]\right] \le \sqrt{\frac{\pi}{2}} \cdot \expect\left[ \frac{c}{\lambda_1^t} \right] \le \sqrt{\frac{\pi}{2}} \cdot \left(\frac{\lambda_2}{\lambda_1}\right)^t \cdot \sqrt{n-1} .\]

We see that power iteration converges geometrically, with the expected relative error decaying at least as fast as (\lambda_2/\lambda_1)^t.
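
To close, here is a minimal numerical sketch (using a diagonal test matrix of my own choosing, so the eigenvalues are explicit) comparing the average error of power iteration over many Gaussian starts against the bound \sqrt{\pi/2} \cdot (\lambda_2/\lambda_1)^t \cdot \sqrt{n-1}:

    import numpy as np

    rng = np.random.default_rng(0)
    n, t, trials = 50, 40, 1000

    # Eigenvalues with an explicit gap: lambda_1 = 1, lambda_2 = ... = lambda_n = 0.9.
    lam = np.concatenate(([1.0], np.full(n - 1, 0.9)))

    errors = []
    for _ in range(trials):
        z = rng.standard_normal(n)  # Gaussian start, written in eigencoordinates
        mu = np.sum(lam ** (2 * t + 1) * z**2) / np.sum(lam ** (2 * t) * z**2)
        errors.append((lam[0] - mu) / lam[0])

    bound = np.sqrt(np.pi / 2) * (lam[1] / lam[0]) ** t * np.sqrt(n - 1)
    print(np.mean(errors), bound)   # observed mean error falls below the bound

Working directly with the eigenvalues is legitimate here because, as discussed above, the Gaussian initialization looks the same in any orthonormal basis.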

The first analyses of power iteration from a random start were done by Kuczyński and Woźniakowski (1992) and required pages of detailed computations involving integrals. The simplified argument presented here, due to Tropp (2020), makes the analysis effortless by comparison.
