Vandermonde Matrices are Merely Exponentially Ill-Conditioned

I am excited to share that my paper Does block size matter in randomized block Krylov low-rank approximation? has recently been released on arXiv. In that paper, we study the randomized block Krylov iteration (RBKI) algorithm for low-rank approximation. Existing results show that RBKI is efficient at producing rank-k approximations with either a large block size b = k or a small block size b = 1, but these results degrade for intermediate block sizes 1\ll b \ll k. Yet these intermediate block sizes are often the most efficient in practice. In our paper, we close this theoretical gap, showing that RBKI is efficient for any block size 1\le b \le k. Check out the paper for details!

In our paper, the core technical challenge is understanding the condition number of a random block Krylov matrix of the form

    \[K = \begin{bmatrix} G & AG & \cdots & A^{t-1}G \end{bmatrix},\]

where A is an n\times n real symmetric positive semidefinite matrix and G is an n\times b random Gaussian matrix. Proving this result was not easy, and our proof required several ingredients. In this post, I want to talk about just one of them: Gautschi’s bound on the conditioning of Vandermonde matrices. (Check out Gautschi’s original paper here.)

Evaluating a Polynomial and Vandermonde Matrices

Let us begin with one of the humblest but most important characters in mathematics, the univariate polynomial

    \[p_a(u) = a_1 + a_2 u+ \cdots + a_t u^{t-1}.\]

Here, we have set the degree to be t-1 and have written the polynomial p_a(\cdot) parametrically in terms of its vector of coefficients a = (a_1,\ldots,a_t). The polynomials of degree at most t-1 form a linear space of dimension t, and the monomials 1,u,\ldots,u^{t-1} provide a basis for this space. In this post, we will permit the coefficients a \in \complex^t to be complex numbers.

Given a polynomial, we often wish to evaluate it at a set of inputs. Specifically, let \lambda_1,\ldots,\lambda_s \in \complex be s distinct input locations. If we evaluate p_a at each location, we obtain a list of (output) values f = (p_a(\lambda_1),\ldots,p_a(\lambda_s)), each of which is given by the formula

    \[p_a(\lambda_i) = \sum_{j=1}^t \lambda_i^{j-1} a_j.\]

Observe that the outputs f are a nonlinear function of the input locations \lambda but a linear function of the coefficients a. We call the mapping a \mapsto f the coefficients-to-values map.

Every linear transformation between finite-dimensional vector spaces can be realized as a matrix–vector product, and the matrix for the coefficients-to-values map is called a Vandermonde matrix V. It is given by the formula

    \[V= \begin{bmatrix} 1 & \lambda_1 & \lambda_1^2 & \cdots & \lambda_1^{t-1} \\ 1 & \lambda_2 & \lambda_2^2 & \cdots & \lambda_2^{t-1} \\ 1 & \lambda_3 & \lambda_3^2 & \cdots & \lambda_3^{t-1} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & \lambda_s & \lambda_s^2 & \cdots & \lambda_s^{t-1} \end{bmatrix} \in \complex^{s\times t}.\]

The Vandermonde matrix defines the coefficients-to-values map in the sense that f = Va.
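As a concrete check, here is a minimal sketch in Python/NumPy (the locations and coefficients are assumed example data, not from the paper) that builds V with numpy.vander and confirms that Va reproduces the polynomial’s values:

    import numpy as np

    # Minimal sketch (assumed example data): the coefficients-to-values map f = V a.
    t = 4
    lam = np.array([0.5, 1.0, 2.0, 3.0])       # locations lambda_1, ..., lambda_t
    a = np.array([1.0, -2.0, 0.0, 3.0])        # coefficients a_1, ..., a_t

    # V[i, j] = lam[i]**j, i.e., columns 1, u, u^2, ..., u^{t-1}
    V = np.vander(lam, N=t, increasing=True)

    f = V @ a                                  # values p_a(lambda_i)
    assert np.allclose(f, np.polyval(a[::-1], lam))   # same values via direct evaluation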

Interpolating by a Polynomial and Inverting a Vandermonde Matrix

Going forward, let us set s = t so the number of locations \lambda = (\lambda_1,\ldots,\lambda_t) equals the number of coefficients a = (a_1,\ldots,a_t). The Vandermonde matrix maps the vector of coefficients a to the vector of values f = (f_1,\ldots,f_t) = (p_a(\lambda_1),\ldots,p_a(\lambda_t)). Its inverse V^{-1} maps a set of values f to a set of coefficients a defining a polynomial p_a which interpolates the values f:

    \[p_a(\lambda_i) = f_i \quad \text{for } i =1,\ldots,t .\]

More concisely, multiplying the inverse of a Vandermonde matrix solves the problem of polynomial interpolation.

To solve the polynomial interpolation problem with Vandermonde matrices, we can do the following. Given values f, we first solve the linear system of equations Va = f, obtaining a vector of coefficients a = V^{-1}f. Then, define the interpolating polynomial

(1)   \[q(u) = a_1 + a_2 u + \cdots + a_t u^{t-1} \quad \text{with } a = V^{-1} f. \]

The polynomial q interpolates the values f at the locations \lambda_i: q(\lambda_i) = p_a(\lambda_i) = f_i for each i.
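In code, this route amounts to a single linear solve. Here is a sketch (again with assumed example data; solving the system is preferable to forming V^{-1} explicitly):

    import numpy as np

    # Sketch of the Vandermonde route (1) to interpolation (assumed example data).
    t = 4
    lam = np.array([0.5, 1.0, 2.0, 3.0])       # interpolation locations
    f = np.array([1.0, 0.0, -2.0, 5.0])        # prescribed values f_1, ..., f_t

    V = np.vander(lam, N=t, increasing=True)
    a = np.linalg.solve(V, f)                  # coefficients a = V^{-1} f

    # q(u) = a_1 + a_2 u + ... + a_t u^{t-1} interpolates the values f
    assert np.allclose(np.polyval(a[::-1], lam), f)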

Lagrange Interpolation

Inverting the Vandermonde matrix is one way to solve the polynomial interpolation problem, but the problem can also be solved directly. To do so, first notice that we can construct a special polynomial (u - \lambda_2)(u - \lambda_3)\cdots(u-\lambda_t) that is zero at the locations \lambda_2,\ldots,\lambda_t but nonzero at the first location \lambda_1. (Remember that we have assumed that \lambda_1,\ldots,\lambda_t are distinct.) Further, by rescaling this polynomial to

    \[\ell_1(u) = \frac{(u - \lambda_2)(u - \lambda_3)\cdots(u-\lambda_t)}{(\lambda_1 - \lambda_2)(\lambda_1 - \lambda_3)\cdots(\lambda_1-\lambda_t)},\]

we obtain a polynomial whose value at \lambda_1 is 1. Likewise, for each i, we can define a similar polynomial

(2)   \[\ell_i(u) = \frac{(u - \lambda_1)\cdots(u-\lambda_{i-1}) (u-\lambda_{i+1})\cdots(u-\lambda_t)}{(\lambda_i - \lambda_1)\cdots(\lambda_i-\lambda_{i-1}) (\lambda_i-\lambda_{i+1})\cdots(\lambda_i-\lambda_t)}, \]

which is 1 at \lambda_i and zero at \lambda_j for j\ne i. Using the Kronecker delta symbol, we may write

    \[\ell_i(\lambda_j) = \delta_{ij} = \begin{cases} 1, & i = j, \\ 0, & i\ne j. \end{cases}\]

The polynomials \ell_i are called the Lagrange polynomials of the locations \lambda_1,\ldots,\lambda_t. As a concrete example, consider the second Lagrange polynomial \ell_2 associated with the points \lambda_i = i (with t = 5); a small numerical sketch is given below.
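This sketch (Python/NumPy, illustrative only; the helper name lagrange is my own) builds \ell_2 from the product formula (2) and confirms its Kronecker delta property at these points:

    import numpy as np

    # Lagrange polynomials for the points lambda_i = i with t = 5, built from formula (2).
    lam = np.arange(1, 6, dtype=float)          # locations 1, 2, 3, 4, 5

    def lagrange(i, u):
        """Evaluate the Lagrange polynomial ell_i (0-indexed) at u."""
        others = np.delete(lam, i)
        return np.prod((u - others) / (lam[i] - others))

    # ell_2 equals 1 at lambda_2 = 2 and 0 at the other locations (Kronecker delta).
    vals = np.array([lagrange(1, x) for x in lam])    # index 1 is the second polynomial
    print(np.round(vals, 12))                   # approximately [0, 1, 0, 0, 0]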

With the Lagrange polynomials in hand, the polynomial interpolation problem is easy. To obtain a polynomial whose values are f, simply multiply each Lagrange polynomial \ell_i by the value f_i and sum up, obtaining

    \[q(u) = \sum_{i=1}^t f_i \ell_i(u).\]

The polynomial q interpolates the values f. Indeed,

(3)   \[q(\lambda_j) = \sum_{i=1}^t f_i \ell_i(\lambda_j) = \sum_{i=1}^t f_i \delta_{ij} = f_j. \]

The resulting interpolating polynomial can be computed directly in code, as in the sketch below.
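Here is a minimal sketch (Python/NumPy, with assumed example values f) that forms q from formula (3) and checks that it reproduces those values:

    import numpy as np

    # Lagrange interpolation (3): q(u) = sum_i f_i * ell_i(u).
    lam = np.arange(1, 6, dtype=float)          # locations 1, 2, 3, 4, 5
    f = np.array([2.0, -1.0, 0.5, 3.0, 1.0])    # values to interpolate (assumed data)

    def q(u):
        total = 0.0
        for i in range(len(lam)):
            others = np.delete(lam, i)
            ell_i = np.prod((u - others) / (lam[i] - others))   # formula (2)
            total += f[i] * ell_i
        return total

    # q reproduces the prescribed values: q(lambda_j) = f_j
    assert np.allclose([q(x) for x in lam], f)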

From Lagrange to Vandermonde via Elementary Symmetric Polynomials

We now have two ways of solving the polynomial interpolation problem, the Vandermonde way (1) and the Lagrange way (3). Ultimately, the difference between these formulas is one of basis: The Vandermonde formula (1) expresses the interpolating polynomial q as a linear combination of the monomials 1,u,\ldots,u^{t-1}, and the Lagrange formula (3) expresses q as a linear combination of the Lagrange polynomials \ell_1,\ldots,\ell_t. To convert between these formulas, we just need to express the Lagrange polynomial basis in the monomial basis.

To do so, let us examine the Lagrange polynomials more closely. Consider first the case t = 4 and the fourth unnormalized Lagrange polynomial

    \[(u - \lambda_1) (u - \lambda_2) (u - \lambda_3).\]

Expanding this polynomial in the monomials u^i, we obtain the expression

    \[(u - \lambda_1) (u - \lambda_2) (u - \lambda_3) = u^3 - (\lambda_1 + \lambda_2 + \lambda_3) u^2 + (\lambda_1\lambda_2 + \lambda_1\lambda_3 + \lambda_2\lambda_3)u - \lambda_1\lambda_2\lambda_3.\]

Looking at the coefficients of this polynomial in u, we recognize some pretty distinctive expressions involving the \lambda_i's:

    \begin{align*}\text{coefficient of $u^2$} &= - (\lambda_1 + \lambda_2 + \lambda_3), \\\text{coefficient of $u$} &= \lambda_1\lambda_2 + \lambda_1\lambda_3 + \lambda_2\lambda_3, \\\text{coefficient of $1$} &=  -\lambda_1\lambda_2\lambda_3.\end{align*}

Indeed, these expressions are special. Up to a plus-or-minus sign, they are called the elementary symmetric polynomials of the locations \lambda_i. Specifically, given numbers \mu_1,\ldots,\mu_s, the kth elementary symmetric polynomial e_k(\mu_1,\ldots,\mu_s) is defined as the sum of all products \mu_{i_1} \mu_{i_2} \cdots \mu_{i_k} of k values, i.e.,

    \[e_k(\mu_1,\ldots,\mu_s) = \sum_{1 \le i_1 < i_2 < \cdots < i_k \le s} \mu_{i_1}\mu_{i_2}\cdots \mu_{i_k}.\]

The zeroth elementary symmetric polynomial is 1 by convention.

The elementary symmetric polynomials appear all the time in mathematics. In particular, up to sign, they give the coefficients of the characteristic polynomial of a matrix (as functions of its eigenvalues), and they feature heavily in the theory of determinantal point processes. For our purposes, the key observation will be that the elementary symmetric polynomials appear whenever one expands out an expression like (u + \mu_1) \cdots (u + \mu_k):

Lemma 1 (Expanding a product of linear functions). It holds that

    \[(u + \mu_1) (u + \mu_2) \cdots (u+\mu_k) = \sum_{j=0}^k e_{k-j}(\mu_1,\ldots,\mu_k) u^j.\]
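As a numerical sanity check of Lemma 1 (my own sketch; the helper esp and the values mu are assumptions for illustration), we can expand the product with numpy.poly and compare against elementary symmetric polynomials computed from their definition:

    import numpy as np
    from itertools import combinations
    from math import prod

    def esp(k, mus):
        """k-th elementary symmetric polynomial e_k(mu_1, ..., mu_s)."""
        return sum(prod(c) for c in combinations(mus, k))

    mu = [2.0, -1.0, 0.5, 3.0]
    k = len(mu)

    # np.poly returns the coefficients of prod_i (u - r_i), highest degree first;
    # with roots -mu_i, this is the left-hand side of Lemma 1.
    lhs = np.poly([-m for m in mu])
    rhs = [esp(m, mu) for m in range(k + 1)]    # e_0, e_1, ..., e_k (coefficients of u^k, ..., u^0)
    assert np.allclose(lhs, rhs)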

Using this fact, we obtain an expression for the Lagrange polynomials in the monomial basis. Let \lambda_{-i} = (\lambda_1,\ldots, \lambda_{i-1},\lambda_{i+1},\ldots,\lambda_t) denote the list of locations without \lambda_i. Then the ith Lagrange polynomial is given by

    \[\ell_i(u) = \frac{\sum_{j=1}^t e_{t-j}(-\lambda_{-i})u^{j-1}}{\prod_{k\ne i} (\lambda_i - \lambda_k)}.\]

Not exactly a beautiful expression, but it will get the job done.

Indeed, we can write the interpolating polynomial as

    \[q(u) = \sum_{i=1}^t f_i\ell_i(u) = \sum_{i=1}^t \sum_{j=1}^t f_i \frac{e_{t-j}(-\lambda_{-i})}{\prod_{k\ne i} (\lambda_i - \lambda_k)} u^{j-1}.\]

To make progress, we interchange the order of summation and regroup:

    \[q(u) = \sum_{j=1}^t \sum_{i=1}^t f_i \frac{e_{t-j}(-\lambda_{-i})}{\prod_{k\ne i} (\lambda_i - \lambda_k)} u^{j-1} = \sum_{j=1}^t \left(\sum_{i=1}^t \frac{e_{t-j}(-\lambda_{-i})}{\prod_{k\ne i} (\lambda_i - \lambda_k)}\cdot f_i \right) u^{j-1}.\]

We see that the coefficients of the interpolating polynomial (in the monomial basis) are

    \[a_j = \sum_{i=1}^t \frac{e_{t-j}(-\lambda_{-i})}{\prod_{k\ne i} (\lambda_i - \lambda_k)}\cdot f_i.\]

But we also know that the coefficients are given by a = V^{-1}f. Therefore, we conclude that the entries of the inverse-Vandermonde matrix V^{-1} are

(4)   \[(V^{-1})_{ji} = \frac{e_{t-j}(-\lambda_{-i})}{\prod_{k\ne i} (\lambda_i - \lambda_k)}. \]
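Formula (4) is easy to check numerically. Here is a sketch (assumed example locations; the helper esp is my own) that builds V^{-1} entry by entry and compares it with a direct inverse:

    import numpy as np
    from itertools import combinations
    from math import prod

    def esp(k, mus):
        """k-th elementary symmetric polynomial."""
        return sum(prod(c) for c in combinations(mus, k))

    lam = np.array([0.5, 1.0, 2.0, 3.0])        # distinct locations (assumed data)
    t = len(lam)
    V = np.vander(lam, N=t, increasing=True)

    Vinv = np.empty((t, t))
    for i in range(t):                          # column index i in formula (4)
        lam_minus_i = np.delete(lam, i)
        denom = np.prod(lam[i] - lam_minus_i)   # prod_{k != i} (lambda_i - lambda_k)
        for j in range(t):                      # row index j in formula (4)
            Vinv[j, i] = esp(t - 1 - j, -lam_minus_i) / denom

    assert np.allclose(Vinv, np.linalg.inv(V))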

Vandermonde Matrices are Merely Exponentially Ill-Conditioned

Vandermonde matrices are notoriously ill-conditioned, meaning that small changes to the values f can cause large changes to the coefficients a. On its face, this might suggest that the problem of polynomial interpolation itself is ill-conditioned, but this is too hasty a conclusion. After all, it is only the mapping from values f to coefficients a in the monomial basis that is ill-conditioned. Fortunately, there are much better, more numerically stable bases for representing a polynomial, such as the Chebyshev polynomials.

But these more stable methods of polynomial interpolation and approximation are not the subject of this post: Here, our task will be to characterize just how ill-conditioned the computation of a = V^{-1}f is. To characterize this ill-conditioning, we will use the condition number of the matrix V. Given a norm \uinorm{\cdot}, the condition number of V is defined to be

    \[\kappa_{\uinorm{\cdot}}(V) = \uinorm{V} \uinorm{\smash{V^{-1}}}.\]

For this post, we will focus on the case where \uinorm{\cdot} is chosen to be the (operator) 1-norm, defined as

    \[\norm{A}_1= \max_{x \ne 0} \frac{\norm{Ax}_1}{\norm{x}_1} \quad \text{where } \norm{x}_1 = \sum_i |x_i|.\]

The 1-norm has a simple characterization: It is the maximum sum of the absolute values of the entries in any column,

(5)   \[\norm{A}_1 = \max_j \sum_i |A_{ij}|. \]

We shall denote the 1-norm condition number \kappa_1(\cdot).

Bounding \norm{V}_1 is straightforward. Indeed, setting M = \max_{1\le i \le t} |\lambda_i| and using (5), we compute

    \[\norm{V}_1= \max_{1\le j \le t} \sum_{i=1}^t |\lambda_i|^{j-1} \le \max_{1\le j \le t} tM^{j-1} = t\max\{1,M^{t-1}\}.\]

An even weaker, but still useful, bound is

    \[\norm{V}_1\le t(1+M)^{t-1}.\]

The harder task is bounding \norm{\smash{V^{-1}}}_1. Fortunately, we have already done most of the hard work needed to bound this quantity. Using our expression (4) for the entries of V^{-1} and using the formula (5) for the 1-norm, we have

    \[\norm{\smash{V^{-1}}}_1= \max_{1\le j \le t} \sum_{i=1}^t |(V^{-1})_{ij}| = \max_{1\le j \le t}\frac{ \sum_{i=1}^t |e_{t-i}(-\lambda_{-j})|}{\prod_{k\ne j} |\lambda_j- \lambda_k|}.\]

To bound this expression, we make use of the following “triangle inequality” for elementary symmetric polynomials

    \[|e_k(\mu_1,\ldots,\mu_s)| \le e_k(|\mu_1|,\ldots,|\mu_s|).\]

Using this bound and defining |\lambda_{-j}| = (|\lambda_1|,\ldots,|\lambda_{j-1}|,|\lambda_{j+1}|,\ldots,|\lambda_t|), we obtain

    \[\norm{\smash{V^{-1}}}_1\le \max_{1\le j \le t}\frac{ \sum_{i=1}^t e_{t-i}(|\lambda_{-j}|)}{\prod_{k\ne j} |\lambda_j- \lambda_k|}.\]

We now use Lemma 1 with u = 1, which gives \sum_{i=1}^t e_{t-i}(|\lambda_{-j}|) = \sum_{m=0}^{t-1} e_m(|\lambda_{-j}|) = \prod_{k\ne j}(1+|\lambda_k|), to obtain the expression

(6)   \[\norm{\smash{V^{-1}}}_1\le \max_{1\le j \le t}\frac{ \prod_{k\ne j} (1 + |\lambda_k|)}{\prod_{k\ne j} |\lambda_j- \lambda_k|}. \]

Equation (6) is Gautschi’s bound on the norm of the inverse of a Vandermonde matrix, the result we were working towards proving.
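Gautschi’s bound (6) is straightforward to test numerically. The following sketch (with assumed example locations) compares the exact 1-norm of V^{-1} against the right-hand side of (6):

    import numpy as np

    lam = np.array([0.3, 0.9, 1.7, 2.4, 3.1])   # distinct locations (assumed data)
    t = len(lam)
    V = np.vander(lam, N=t, increasing=True)

    # Exact 1-norm (maximum column sum) of the inverse
    exact = np.linalg.norm(np.linalg.inv(V), 1)

    # Right-hand side of Gautschi's bound (6)
    bound = max(
        np.prod(1 + np.abs(np.delete(lam, j))) / np.prod(np.abs(lam[j] - np.delete(lam, j)))
        for j in range(t)
    )

    print(exact, bound)                         # the bound dominates the exact norm
    assert exact <= bound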

Often, it is helpful to simplify Gautschi’s bound a bit. Setting M = \max_{1\le i \le t} |\lambda_i| as above, the numerator is bounded by (1+M)^{t-1}. To bound the denominator, let \mathrm{gap} \coloneqq \min_{k < j} |\lambda_j - \lambda_k| be the smallest distance between two locations, so the denominator is at least \mathrm{gap}^{t-1}. Using M and \mathrm{gap}, we can weaken the bound (6) to obtain

    \[\norm{\smash{V^{-1}}}_1\le \left(\frac{1+M}{\mathrm{gap}}\right)^{t-1}.\]

Combining this with our bound on \norm{V}_1 from above, we obtain a bound on the condition number

    \[\kappa_1(V) \le t \left( \frac{(1+M)^2}{\mathrm{gap}} \right)^{t-1}.\]

We record these results:

Theorem 2 (Gautschi’s bound, simplified). Introduce M = \max_{1\le i \le t} |\lambda_i| and \mathrm{gap} \coloneqq \min_{k < j} |\lambda_j - \lambda_k|. Then

    \[\norm{\smash{V^{-1}}}_1\le \left(\frac{1+M}{\mathrm{gap}}\right)^{t-1}\]

and

    \[\kappa_1(V) \le t \left( \frac{(1+M)^2}{\mathrm{gap}} \right)^{t-1}.\]

Gautschi’s bound suggests that Vandermonde matrices can be very ill-conditioned, which is disappointing. But Gautschi’s bound also shows that Vandermonde matrices are merely exponentially ill-conditioned—that is, they are no worse than exponentially ill-conditioned. The fact that Vandermonde matrices are only exponentially ill-conditioned plays a crucial role in our analysis of randomized block Krylov iteration.
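To see this exponential growth, and Theorem 2’s bound on it, in action, here is a small sketch (my own illustration) using equispaced points on [0, 1]:

    import numpy as np

    for t in [4, 8, 12]:
        lam = np.linspace(0.0, 1.0, t)          # equispaced locations
        V = np.vander(lam, N=t, increasing=True)
        kappa = np.linalg.norm(V, 1) * np.linalg.norm(np.linalg.inv(V), 1)

        M = np.max(np.abs(lam))
        diffs = np.abs(lam[:, None] - lam[None, :])
        gap = np.min(diffs[np.triu_indices(t, k=1)])
        bound = t * ((1 + M) ** 2 / gap) ** (t - 1)

        # kappa grows exponentially with t but stays below Theorem 2's bound
        print(t, kappa, bound)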

Gautschi’s Bound as a Robust Version of the Fundamental Theorem of Algebra

The fundamental theorem of algebra implies that a nonzero polynomial of degree at most t-1 has at most t-1 roots. Consequently, among any t distinct locations, such a polynomial must be nonzero at at least one of them. But how nonzero must the polynomial be at that location? How large must it be? On this question, the fundamental theorem of algebra is silent. However, Gautschi’s bound provides an answer.

To answer this question, we ask: What is the minimum possible size \norm{f}_1 of the values f = Va = (p_a(\lambda_1),\ldots,p_a(\lambda_t))? Well, if we set all the coefficients a = (0,\ldots,0) to zero, then f = 0 as well. So to avoid this trivial case, we should enforce a normalization condition on the coefficient vector a, say \norm{a}_1 = 1. With this setting, we are ready to compute. Begin by observing that

    \[\min_{\norm{a}_1 = 1} \norm{Va}_1 = \min_{a\ne 0} \frac{\norm{Va}_1}{\norm{a}_1}.\]

Now, we make the change of variables a = V^{-1}f (equivalently, f = Va), obtaining

    \[\min_{\norm{a}_1 = 1} \norm{Va}_1 = \min_{a\ne 0} \frac{\norm{Va}_1}{\norm{a}_1} = \min_{f\ne 0} \frac{\norm{f}_1}{\norm{\smash{V^{-1}f}}_1}.\]

Now, take the reciprocal of both sides, converting the minimum into a maximum, to obtain

    \[\left(\min_{\norm{a}_1 = 1} \norm{Va}_1\right)^{-1} = \max_{f\ne 0} \frac{\norm{\smash{V^{-1}f}}_1}{\norm{f}_1} = \norm{\smash{V^{-1}}}_1.\]

Ergo, we conclude

    \[\min_{\norm{a}_1 = 1} \norm{Va}_1 = \norm{\smash{V^{-1}}}_1^{-1}.\]

Indeed, this relation holds for all operator norms:

Proposition 3 (Minimum stretch). For a vector norm \uinorm{\cdot} and its induced operator norm, it holds for any square invertible matrix A that

    \[\min_{\uinorm{v} = 1} \uinorm{Av} = \min_{v\ne 0} \frac{\uinorm{Av}}{\uinorm{v}} = \uinorm{\smash{A^{-1}}}^{-1}.\]
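As a quick numerical illustration of Proposition 3 in the 1-norm (a sketch with assumed random data), no vector is shrunk by more than a factor of 1/\norm{\smash{A^{-1}}}_1:

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((5, 5))             # a random invertible matrix (assumed data)
    floor = 1.0 / np.linalg.norm(np.linalg.inv(A), 1)

    # Every ratio ||A v||_1 / ||v||_1 stays above the floor 1 / ||A^{-1}||_1
    ratios = [
        np.linalg.norm(A @ v, 1) / np.linalg.norm(v, 1)
        for v in rng.standard_normal((1000, 5))
    ]
    assert min(ratios) >= floor
    print(min(ratios), floor)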

Using this result, we obtain the lower bound \norm{f}_1\ge \norm{a}_1/\norm{\smash{V^{-1}}}_1 on the values f of a polynomial with coefficients a. Combining with Gautschi’s bound gives the following robust version of the fundamental theorem of algebra:

Theorem 4 (Robust fundamental theorem of algebra). Fix a polynomial p(u) = a_1 + a_2 u + \cdots + a_t u^{t-1} and distinct locations \lambda_1,\ldots,\lambda_t. Define M = \max_{1\le i \le t} |\lambda_i| and \mathrm{gap} \coloneqq \min_{k < j} |\lambda_j - \lambda_k|. Then

    \[|p(\lambda_1)| + \cdots + |p(\lambda_t)| \ge \left(\frac{\mathrm{gap}}{1+M}\right)^{t-1} (|a_1| + \cdots + |a_t|).\]

Thus, at t distinct locations, a degree-(t-1) polynomial must be nonzero at at least one point. In fact, the sum of the absolute values of the polynomial at these t locations can be no smaller than an exponentially small (in t) multiple of the sum of the absolute values of its coefficients.
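Here is a direct numerical check of Theorem 4 on a random polynomial (a sketch with assumed data):

    import numpy as np

    rng = np.random.default_rng(1)
    lam = np.array([0.2, 0.7, 1.1, 1.8, 2.5, 3.3])    # distinct locations (assumed data)
    t = len(lam)
    a = rng.standard_normal(t)                  # random coefficients a_1, ..., a_t

    values = np.vander(lam, N=t, increasing=True) @ a    # p(lambda_1), ..., p(lambda_t)

    M = np.max(np.abs(lam))
    gap = np.min(np.abs(lam[:, None] - lam[None, :])[np.triu_indices(t, k=1)])
    lower = (gap / (1 + M)) ** (t - 1) * np.sum(np.abs(a))

    assert np.sum(np.abs(values)) >= lower      # the robust fundamental theorem of algebra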
