Markov Musings 5: Poincaré Inequalities

In the previous posts, we’ve been using eigenvalues to understand the mixing of reversible Markov chains. Our main convergence result was as follows:

    \[\chi^2\left(\rho^{(n)} \, \middle|\middle| \, \pi\right) \le \left( \max \{ \lambda_2, -\lambda_m \} \right)^{2n} \chi^2\left(\rho^{(0)} \, \middle|\middle| \, \pi\right).\]

Here, \rho^{(n)} denotes the distribution of the chain at time n, \pi denotes the stationary distribution, \chi^2(\cdot \mid\mid \cdot) denotes the \chi^2 divergence, and 1 = \lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_m \ge -1 denote the decreasingly ordered eigenvalues of the Markov transition matrix P.

Bounding the rate of convergence requires an upper bound on \lambda_2 and a lower bound on \lambda_m. In this post, we will talk about techniques for bounding \lambda_2. For more on the smallest eigenvalue \lambda_m, see the previous post.
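
As a quick numerical illustration of the convergence bound above, here is a short Python sketch. The chain is built from symmetric edge weights, an illustrative construction not taken from this series; the eigenvalues are obtained from the symmetrized matrix D^{1/2} P D^{-1/2}, which shares its eigenvalues with P when the chain is reversible.

    import numpy as np

    # Small reversible chain built from symmetric weights (illustrative choice).
    rng = np.random.default_rng(0)
    W = rng.random((5, 5)); W = W + W.T
    d = W.sum(axis=1)
    P = W / d[:, None]              # transition matrix (rows sum to 1)
    pi = d / d.sum()                # stationary distribution

    def chi2(rho, pi):
        return np.sum((rho - pi) ** 2 / pi)

    # Eigenvalues of P via the symmetric matrix D^{1/2} P D^{-1/2}
    D_half = np.diag(np.sqrt(pi)); D_half_inv = np.diag(1 / np.sqrt(pi))
    lam = np.sort(np.linalg.eigvalsh(D_half @ P @ D_half_inv))[::-1]
    rate = max(lam[1], -lam[-1])    # max{lambda_2, -lambda_m}

    rho0 = np.zeros(5); rho0[0] = 1.0   # start at state 1
    chi0 = chi2(rho0, pi)
    rho = rho0.copy()
    for n in range(1, 11):
        rho = rho @ P
        assert chi2(rho, pi) <= rate ** (2 * n) * chi0 + 1e-12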

Setting

Let’s begin by establishing some notation, mostly the same as in previous posts in this series. We work with a reversible Markov chain with transition matrix P and stationary distribution \pi.

As in previous posts, we identify vectors f \in \real^m and functions f : \{1,\ldots,m\} \to \real, treating them as one and the same: f_i = f(i).

For a vector/function f, \expect_\pi[f] and \Var_\pi(f) denote the mean and variance with respect to the stationary distribution \pi:

    \[\expect_\pi[f] = \sum_{i=1}^m f(i) \pi_i, \quad \Var_\pi(f) \coloneqq \expect_\pi[(f-\expect_\pi[f])^2].\]

We will make frequent use of the \pi-inner product

    \[\langle f, g\rangle \coloneqq \expect_\pi[f\cdot g] = \sum_{i=1}^m f(i) g(i) \pi_i.\]


We shall also use expressions such as \expect_{x \sim \sigma, y\sim \tau} [f(x,y)] to denote the expectation of f(x,y) where x is drawn from distribution \sigma and y is drawn from \tau.

We denote the eigenvalues of the transition matrix by 1 = \lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_m \ge -1. The associated eigenvectors (eigenfunctions) \varphi_1,\ldots,\varphi_m are orthonormal in the \pi-inner product:

    \[\langle \varphi_i ,\varphi_j\rangle = \begin{cases}1, & i = j, \\0, & i \ne j.\end{cases}\]
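
For concreteness, here is a Python sketch that computes these eigenfunctions for a small reversible chain (built from symmetric edge weights, an illustrative choice) and confirms their orthonormality in the \pi-inner product.

    import numpy as np

    # Small reversible chain built from symmetric weights (illustrative choice).
    rng = np.random.default_rng(0)
    W = rng.random((5, 5)); W = W + W.T
    d = W.sum(axis=1)
    P = W / d[:, None]
    pi = d / d.sum()

    # The symmetric matrix S = D^{1/2} P D^{-1/2} shares eigenvalues with P.
    D_half = np.diag(np.sqrt(pi)); D_half_inv = np.diag(1 / np.sqrt(pi))
    lam, V = np.linalg.eigh(D_half @ P @ D_half_inv)
    lam, V = lam[::-1], V[:, ::-1]          # sort decreasingly
    Phi = D_half_inv @ V                    # eigenfunctions of P itself

    # pi-inner product: <f, g> = sum_i f(i) g(i) pi_i
    gram = Phi.T @ np.diag(pi) @ Phi
    assert np.allclose(gram, np.eye(5))     # orthonormal in <.,.>_pi
    assert np.allclose(P @ Phi, Phi * lam)  # P phi_i = lambda_i phi_i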

Variance and Local Variance

To discover methods for bounding \lambda_2, we begin by investigating a seemingly simple question:

How variable is the output of a function f : \{1,\ldots,m\} \to \real?

There are two natural quantities which provide answers to this question: the variance and the local variance. Poincaré inequalities—the main subject of this post—establish a relation between these two numbers. As a consequence, Poincaré inequalities will provide a bound on \lambda_2.

Variance

We begin with the first of our two main characters, the variance \Var_\pi(f). The variance is a very familiar measure of variation, as it is defined for any random variable. It measures the average squared deviation of f(x) from its mean, where x is drawn from the stationary distribution \pi.

Another helpful formula for the variance is the exchangeable pairs formula:

    \[\Var_\pi(f) = \frac{1}{2} \expect_{x,y \sim \pi} [(f(x) - f(y))^2].\]

The exchangeable pairs formula states that the variance of f is proportional to the average square difference of f's values when measured at locations x and y sampled (independently) from the stationary distribution \pi.
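
The exchangeable pairs formula is easy to confirm numerically. Here is a tiny Python sketch, with a distribution \pi and function f chosen arbitrarily for illustration.

    import numpy as np

    # Check Var_pi(f) = 0.5 * E_{x,y ~ pi}[(f(x) - f(y))^2] on a small example.
    pi = np.array([0.1, 0.2, 0.3, 0.25, 0.15])   # illustrative distribution
    f = np.array([1.0, -2.0, 0.5, 3.0, 0.0])     # illustrative function

    mean = np.sum(f * pi)
    var = np.sum((f - mean) ** 2 * pi)

    # Average of (f(x) - f(y))^2 over independent x, y ~ pi
    diff_sq = (f[:, None] - f[None, :]) ** 2
    exch = 0.5 * np.sum(pi[:, None] * pi[None, :] * diff_sq)

    assert np.isclose(var, exch)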

Local Variance

The exchangeable pairs formula shows that variance is a measure of the global variability of the function: It measures the amount f varies across locations x and y sampled randomly from the entire set of possible states \{1,\ldots,m\}.

The local variance measures how much f varies between points x and y which are separated by just one step of the Markov chain, thus providing a more local measure of variability. Let x_0 \sim \pi be sampled from the stationary distribution, and let x_1 denote one step of the Markov chain after x_0. The local variance is

    \[\mathcal{E}(f) = \frac{1}{2} \expect [(f(x_0) - f(x_1))^2].\]

Other names for the local variance include the Dirichlet form and the quadratic form of the Markov chain.

An important note: The variance of a function f depends only on the stationary distribution \pi. By contrast, the local variance depends on the Markov transition matrix P.
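
Since the pair (x_0, x_1) with x_0 \sim \pi takes the value (i,j) with probability \pi_i P_{ij}, the local variance can be computed as a double sum over pairs of states. Here is a short Python sketch, again using an illustrative reversible chain built from symmetric weights.

    import numpy as np

    # Small reversible chain built from symmetric weights (illustrative choice).
    rng = np.random.default_rng(0)
    W = rng.random((5, 5)); W = W + W.T
    d = W.sum(axis=1)
    P = W / d[:, None]
    pi = d / d.sum()
    f = np.array([1.0, -2.0, 0.5, 3.0, 0.0])     # illustrative function

    def local_variance(f, P, pi):
        # E(f) = 0.5 * sum_{i,j} pi_i P_ij (f(i) - f(j))^2
        diff_sq = (f[:, None] - f[None, :]) ** 2
        return 0.5 * np.sum(pi[:, None] * P * diff_sq)

    print(local_variance(f, P, pi))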

Poincaré Inequalities

If f does not vary much over a single step of the Markov chain, then it seems reasonable to expect that it doesn’t vary much globally. This intuition is made quantitative using Poincaré inequalities.

Definition (Poincaré inequality). A Markov chain is said to satisfy a Poincaré inequality with constant \alpha if

(1)   \[\Var_\pi(f)\le \alpha \cdot \mathcal{E}(f) \quad \text{for every function } f.\]

Poincaré Inequalities and Mixing

Poincaré inequalities are intimately related with the speed of mixing for a Markov chain.

To see why, consider a function f with small local variance. Because f has small local variance, f(x_0) is close to f(x_1), f(x_1) is close to f(x_2), etc.; the function f does not change much over a single step of the Markov chain. Does this mean that the (global) variance of f will also be small? Not necessarily. If the Markov chain takes a long time to mix, the small local variance can accumulate to a large global variance over many steps of the Markov chain. Thus, a slowly mixing chain has a large Poincaré constant \alpha. Conversely, if the chain mixes rapidly, the Poincaré constant \alpha is small.

This relation between mixing and Poincaré inequalities is quantified by the following theorem:

Theorem (Poincaré inequalities from eigenvalues). The Markov chain satisfies a Poincaré inequality with constant

    \[\alpha= \frac{1}{1-\lambda_2}.\]

This is the smallest possible Poincaré constant for the Markov chain.

One way to interpret this result is that the eigenvalue \lambda_2 gives you a Poincaré inequality (1). But we can flip this result around: Poincaré inequalities (1) establish bounds on the eigenvalue \lambda_2.

Corollary (Eigenvalue bounds from Poincaré inequalities). If the Markov chain satisfies a Poincaré inequality (1) for a certain constant \alpha, then

    \[\lambda_2 \le 1 - \frac{1}{\alpha}.\]
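
Here is a Python sketch checking both statements on a small illustrative chain (built from symmetric weights): the Poincaré inequality with \alpha = 1/(1-\lambda_2) holds for randomly chosen functions, and rearranging the best constant recovers \lambda_2 = 1 - 1/\alpha.

    import numpy as np

    # Small reversible chain built from symmetric weights (illustrative choice).
    rng = np.random.default_rng(0)
    W = rng.random((5, 5)); W = W + W.T
    d = W.sum(axis=1)
    P = W / d[:, None]
    pi = d / d.sum()

    def variance(f):
        return np.sum((f - np.sum(f * pi)) ** 2 * pi)

    def local_variance(f):
        return 0.5 * np.sum(pi[:, None] * P * (f[:, None] - f[None, :]) ** 2)

    # lambda_2 via the symmetric matrix D^{1/2} P D^{-1/2}
    D_half = np.diag(np.sqrt(pi)); D_half_inv = np.diag(1 / np.sqrt(pi))
    lam = np.sort(np.linalg.eigvalsh(D_half @ P @ D_half_inv))[::-1]
    alpha = 1 / (1 - lam[1])

    # Theorem: Var_pi(f) <= alpha * E(f) for every f
    for _ in range(1000):
        f = rng.standard_normal(5)
        assert variance(f) <= alpha * local_variance(f) + 1e-12

    # Corollary, with equality since alpha is the best constant:
    print(lam[1], 1 - 1 / alpha)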

A View to the Continuous Setting

For a particularly vivid example of a Poincaré inequality, it will be helpful to take a brief detour to the world of continuous Markov processes. This series has—to this point—exclusively focused on Markov chains x_0,x_1,x_2,\ldots that have finitely many possible states and are indexed by discrete times 0,1,2,\ldots. We can generalize Markov chains by lifting both of these restrictions, considering Markov processes (x_t)_{t\ge 0} which take values in a continuous space (such as the real line \real) and are indexed by continuous times t\ge 0.

The mathematical details for Markov processes are a lot more complicated than for their Markov chain siblings, so we will keep it light on details.

For this example, our Markov process will be the Ornstein–Uhlenbeck process. This process has the somewhat mysterious form

    \[x_t = e^{-t}x_0 + e^{-t} B_{e^{2t}-1},\]

where (B_s)_{s\ge 0} denotes a (standard) Brownian motion, independent of the starting state x_0. At time s, the Brownian motion B_s has a Gaussian distribution with variance s. Thus,

Conditional on its starting value x_0, the Ornstein–Uhlenbeck process x_t has a Gaussian distribution with mean e^{-t}x_0 and variance 1-e^{-2t}.

From this observation, it appears that the stationary distribution of the Ornstein–Uhlenbeck process is the standard Gaussian distribution. Indeed, this is the case, and the Ornstein–Uhlenbeck process converges to stationarity exponentially fast.
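
A quick Monte Carlo sketch in Python makes this concrete: starting the process from x_0 \sim N(0,1) and applying the formula above, the distribution at time t is again (approximately) standard Gaussian, and the conditional variance matches 1 - e^{-2t}. The time t = 0.7 and the sample size are arbitrary illustrative choices.

    import numpy as np

    # Monte Carlo sketch of the Ornstein-Uhlenbeck update
    # x_t = e^{-t} x_0 + e^{-t} B_{e^{2t} - 1}.
    rng = np.random.default_rng(0)
    t = 0.7
    x0 = rng.standard_normal(10**6)                              # x_0 ~ N(0, 1)
    B = rng.normal(scale=np.sqrt(np.exp(2*t) - 1), size=10**6)   # B_{e^{2t}-1}
    xt = np.exp(-t) * (x0 + B)

    print(xt.mean(), xt.var())    # approximately 0 and 1: N(0,1) is stationary
    # Conditional variance: e^{-2t} (e^{2t} - 1) = 1 - e^{-2t}
    print(np.exp(-2*t) * (np.exp(2*t) - 1), 1 - np.exp(-2*t))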

Since we have exponential convergence to stationarity (and, as can be checked, the Ornstein–Uhlenbeck process is reversible, in the appropriate sense), there’s a Poincaré inequality lurking in the background, known as the Gaussian Poincaré inequality. Letting Z denote a standard Gaussian random variable, the Gaussian Poincaré inequality states that

(2)   \[\Var(f(Z)) \le \expect \big[(f'(Z))^2\big].\]

The right-hand side of this inequality is the local variance of the Ornstein–Uhlenbeck process, equal to the expected squared derivative:

    \[\mathcal{E}(f) = \expect \big[(f'(Z))^2\big].\]

The Gaussian Poincaré inequality presents a very clear demonstration of what a Poincaré inequality is: The global variance of the function f(Z) is controlled by its local variability, here quantified by the expected squared derivative \mathcal{E}(f).

For general Markov chains or processes, it can remain helpful to think of the local variance \mathcal{E}(f) as a generalization of the “expected squared derivative” of the function f.

Our main interest in Poincaré inequalities in this post is instrumental: we seek to use Poincaré inequalities to understand the mixing properties of Markov chains. But the Gaussian Poincaré inequality demonstrates that Poincaré inequalities are also interesting on their own terms. The inequality (2) is a useful inequality for bounding the variance of a function of a Gaussian random variable. As an immediate example, observe that the function f(x) = \tanh(x) has derivative bounded by 1: |f'(x)| \le 1. Thus,

    \[\Var(\tanh Z) \le \expect[(f'(Z))^2] \le 1.\]

This inequality is not too difficult to prove directly (use the fact that \tanh is 1-Lipschitz continuous together with the exchangeable pairs formula for variance: \Var(\tanh Z) = 0.5\expect[(\tanh Z - \tanh Z')^2] \le 0.5 \expect[(Z-Z')^2] = \Var(Z) = 1, where Z' is an independent copy of Z), but the point stands that the Gaussian Poincaré inequality—and Poincaré inequalities in general—can be useful on their own terms. The Gaussian Poincaré inequality’s multidimensional generalization has especially interesting consequences; see the bottom of this post for an example.
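
Here is a Monte Carlo sketch of this example in Python; the sample size is an arbitrary choice. Since f(x) = \tanh(x) has f'(x) = \operatorname{sech}^2(x), the right-hand side of (2) is \expect[\operatorname{sech}^4 Z].

    import numpy as np

    # Monte Carlo check of the Gaussian Poincare inequality (2) for f = tanh.
    rng = np.random.default_rng(0)
    Z = rng.standard_normal(10**6)

    lhs = np.tanh(Z).var()                 # Var(tanh Z)
    rhs = np.mean(1 / np.cosh(Z) ** 4)     # E[(f'(Z))^2] = E[sech^4 Z]

    print(lhs, rhs)     # lhs is visibly smaller than rhs, and both are at most 1
    assert lhs <= rhs <= 1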

Poincaré Inequalities and Eigenvalues

For the remainder of this post, we will develop the connection between Poincaré inequalities and eigenvalues, leading to a proof of our main theorem:

Theorem (Poincaré inequalities from eigenvalues). The Markov chain satisfies a Poincaré inequality with constant

    \[\alpha= \frac{1}{1-\lambda_2}.\]

That is,

(3)   \[\Var_\pi(f)\le \frac{1}{1-\lambda_2}\cdot\mathcal{E}(f) \quad \text{for all } f\in\real^m.\]

There exists a function f for which equality is attained.

We begin by showing that it suffices to consider mean-zero functions \expect[f] = 0 to prove (3). Next, we derive formulas for \Var_\pi(f) and \mathcal{E}(f) using the \pi-inner product \langle\cdot,\cdot\rangle. We conclude by expanding f in eigenvectors of P and deriving the Poincaré inequality (3).

Shift to Mean-Zero

To prove the Poincaré inequality (3), we are free to assume that f has mean zero, \expect_\pi[f]=0. Indeed, neither the variance \Var_\pi(f) nor the local variance \mathcal{E}(f) changes if we shift f by a constant c. That is, letting \mathbb{1} denote the function

    \[\mathbb{1}(i) = 1 \quad\text{for }i =1,\ldots,m,\]

then

    \[\Var_\pi(f+c\mathbb{1})=\Var_\pi(f)\quad\text{and}\quad\mathcal{E}(f+c\mathbb{1})=\mathcal{E}(f)\]

for every function f and constant c. Therefore, for proving our Poincaré inequality, we can always shift f so that it is mean-zero:

    \[\expect_\pi[f] = 0.\]
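
This shift invariance is easy to check numerically; the chain, function, and constant below are arbitrary illustrative choices.

    import numpy as np

    # Check that Var_pi(f) and E(f) are unchanged when a constant is added to f.
    rng = np.random.default_rng(0)
    W = rng.random((5, 5)); W = W + W.T    # illustrative reversible chain
    d = W.sum(axis=1)
    P = W / d[:, None]
    pi = d / d.sum()

    def variance(f):
        return np.sum((f - np.sum(f * pi)) ** 2 * pi)

    def local_variance(f):
        return 0.5 * np.sum(pi[:, None] * P * (f[:, None] - f[None, :]) ** 2)

    f = rng.standard_normal(5)
    c = 3.7
    assert np.isclose(variance(f + c), variance(f))
    assert np.isclose(local_variance(f + c), local_variance(f))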

Variance

Our strategy for proving the main theorem will be to develop a more linear algebraic formula for the variance and local variance. Let’s begin with the variance.

Assume \expect_\pi[f] = 0. Then the variance is

    \[\Var_\pi(f)=\expect_\pi[f^2]=\sum_{i=1}^m f(i)f(i)\pi_i.\]

Using the definition of the \pi-inner product, we have shown that

    \[\Var_\pi(f)=\langle f,f\rangle.\]

Local Variance

Now we derive a formula for the local variance:

    \[\mathcal{E}(f) = \frac{1}{2} \expect[(f(x_0)-f(x_1))^2]\quad \text{where }x_0\sim\pi.\]

The probability that x_0=i and x_1=j is \pi_iP_{ij}. Thus,

    \[\mathcal{E}(f) = \frac{1}{2} \sum_{i,j=1}^m (f(i)-f(j))^2 \pi_iP_{ij}.\]

Expanding the parentheses and regrouping strategically, we obtain

    \[\mathcal{E}(f) = {\rm A} + {\rm B} + {\rm C}\]

where

    \begin{align*}{\rm A} &= \frac{1}{2} \sum_{i=1}^m (f(i))^2 \pi_i \left( \sum_{j=1}^m P_{ij}\right),\\{\rm B} &= \frac{1}{2} \sum_{j=1}^m (f(j))^2 \left( \sum_{i=1}^m \pi_i P_{ij} \right), \\{\rm C} &= -\sum_{i=1}^m f(i)\left( \sum_{j=1}^m P_{ij} f(j) \right) \pi_i.\end{align*}

Let’s take each of these terms one-by-one. For {\rm A}, recognize that \sum_{j=1}^m P_{ij} = 1. Thus,

    \[{\rm A} = \frac{1}{2}\sum_{i=1}^m (f(i))^2 \pi_i = \frac{1}{2}\langle f, f\rangle.\]

For \rm B, use detailed balance \pi_i P_{ij} = \pi_j P_{ji}. Then, using the condition \sum_{i=1}^m P_{ji} = 1, we obtain

    \[{\rm B} = \frac{1}{2} \sum_{j=1}^m (f(j))^2 \left(\sum_{i=1}^m \pi_j P_{ji} \right) = \frac{1}{2} \sum_{j=1}^m (f(j))^2 \pi_j = \frac{1}{2} \langle f, f\rangle.\]

Finally, for {\rm C}, recognize that \sum_{j=1}^m P_{ij} f(j) is the ith entry of the matrix–vector product Pf. Thus,

    \[{\rm C} = - \sum_{i=1}^m f(i)\, (Pf)(i) \,\pi_i = -\langle f, Pf\rangle.\]

Thus, we conclude

    \[\mathcal{E}(f) = \langle f, (I-P)f\rangle,\]

where I denotes the identity matrix.
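
As a sanity check, here is a Python sketch confirming that this inner-product formula matches the definition of the local variance on a small illustrative reversible chain (built from symmetric weights).

    import numpy as np

    # Check E(f) = <f, (I - P) f>_pi against the definition
    # E(f) = 0.5 * sum_{i,j} pi_i P_ij (f(i) - f(j))^2.
    rng = np.random.default_rng(0)
    W = rng.random((5, 5)); W = W + W.T    # illustrative reversible chain
    d = W.sum(axis=1)
    P = W / d[:, None]
    pi = d / d.sum()
    f = rng.standard_normal(5)

    direct = 0.5 * np.sum(pi[:, None] * P * (f[:, None] - f[None, :]) ** 2)
    inner = np.sum(f * ((np.eye(5) - P) @ f) * pi)   # <f, (I-P)f>_pi

    assert np.isclose(direct, inner)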

Conclusion

The Poincaré inequality

    \[\Var_\pi(f) \le \frac{1}{1-\lambda_2} \cdot\mathcal{E}(f) \quad \text{for all $f$ with $\expect_\pi[f] = 0$}.\]

is equivalent to showing

    \[\frac{\mathcal{E}(f)}{\Var_\pi(f)}\ge 1-\lambda_2 \quad \text{for all $f$ with $\expect_\pi[f] = 0$}.\]

Using our newly derived formulas, this in turn is equivalent to showing

    \[\frac{\langle f, (I-P)f\rangle}{\langle f, f\rangle} \ge 1-\lambda_2 \quad \text{for all $f$ with $\expect_\pi[f] = 0$}.\]

We shall prove this version of the Poincaré inequality by expanding f as a linear combination of eigenvectors.

Consider a decomposition of f as a linear combination of P's eigenvectors:

    \[f = c_1 \varphi_1 + c_2 \varphi_2 + \cdots + c_m\varphi_m.\]

As we showed in the last post, the condition \expect_\pi[f] = 0 is equivalent to saying that c_1 = 0.

Using the orthonormality of \varphi_1,\ldots,\varphi_m under the \pi-inner product and the eigenvalue relation P\varphi_i = \lambda_i \, \varphi_i, we have that

    \begin{align*}\langle f, f\rangle &= c_2^2 + \cdots + c_m^2, \\\langle f, (I-P)f\rangle &= (1-\lambda_2)c_2^2 + \cdots + (1-\lambda_m)c_m^2.\end{align*}



Thus,

(4)   \begin{align*}\frac{\langle f, (I-P)f\rangle}{\langle f, f\rangle} &= \frac{(1-\lambda_2)c_2^2 + \cdots + (1-\lambda_m)c_m^2}{c_2^2 + \cdots + c_m^2} \\&= (1-\lambda_2)a_2 + \cdots + (1-\lambda_m)a_m,\end{align*}

where

    \[a_i = \frac{c_i^2}{c_2^2 + \cdots + c_m^2}.\]

The coefficients a_i are nonnegative and add to 1:

    \[a_2+\cdots+a_m = \frac{c_2^2+\cdots+c_m^2}{c_2^2+\cdots+c_m^2} = 1.\]

Therefore, the smallest possible value for (4) is achieved by setting a_2 = 1 and a_3 = \cdots = a_m = 0 (equivalently, setting c_3 = \cdots=c_m = 0). Thus, we conclude

    \[\frac{\langle f, (I-P)f\rangle}{\langle f, f\rangle}\ge 1-\lambda_2,\]

with equality when f is a multiple of \varphi_2.
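
Finally, here is a Python sketch of this last step on a small illustrative reversible chain: the Rayleigh quotient \langle f, (I-P)f\rangle / \langle f,f\rangle is at least 1-\lambda_2 for mean-zero f, with equality at f = \varphi_2.

    import numpy as np

    # Small reversible chain built from symmetric weights (illustrative choice).
    rng = np.random.default_rng(0)
    W = rng.random((5, 5)); W = W + W.T
    d = W.sum(axis=1)
    P = W / d[:, None]
    pi = d / d.sum()

    # pi-orthonormal eigenfunctions via the symmetric matrix D^{1/2} P D^{-1/2}
    D_half = np.diag(np.sqrt(pi)); D_half_inv = np.diag(1 / np.sqrt(pi))
    lam, V = np.linalg.eigh(D_half @ P @ D_half_inv)
    lam, V = lam[::-1], V[:, ::-1]
    Phi = D_half_inv @ V

    def rayleigh(f):
        return np.sum(f * ((np.eye(5) - P) @ f) * pi) / np.sum(f * f * pi)

    assert np.isclose(rayleigh(Phi[:, 1]), 1 - lam[1])   # equality at phi_2
    for _ in range(1000):
        f = rng.standard_normal(5)
        f -= np.sum(f * pi)                # shift to mean zero
        assert rayleigh(f) >= 1 - lam[1] - 1e-10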
