# The Vandermonde Decomposition

In this post, I want to discuss a beautiful and somewhat subtle matrix factorization known as the Vandermonde decomposition that appears frequently in signal processing and control theory. We’ll begin from the very basics, introducing the controls-and-signals context, how the Vandermonde decomposition comes about, and why it’s useful. By the end, I’ll briefly share how we can push the Vandermonde decomposition beyond matrices to the realm of tensors, which will allow us to separate mixed signals from multiple measurements. This tensorial generalization plays an important role in my paper $(L_r,L_r,1)$-decompositions, sparse component analysis, and the blind separation of sums of exponentials, joint work with Nithin Govindarajan and Lieven De Lathauwer, which recently appeared in the SIAM Journal on Matrix Analysis and Applications.

## Finding the Frequencies

Suppose I give you a short recording of a musical chord consisting of three notes. How could you determine which three notes they were? Mathematically, we can represent such a three-note chord as a combination of scaled and shifted cosine functions

$$f(t) = d_1 \cos(\omega_1 t + \phi_1) + d_2 \cos(\omega_2 t + \phi_2) + d_3 \cos(\omega_3 t + \phi_3). \tag{1}$$

We are interested in obtaining the (angular) frequencies $\omega_1$, $\omega_2$, and $\omega_3$.

In the extreme limit, when we are given the values of the signal $f(t)$ for all $t$, both positive and negative, the frequencies are immediately given by taking a Fourier transform of the function $f(t)$. In practice, we only have access to the function at certain times, which we assume are equally spaced:

$$t_j = j\,\Delta t \quad \text{for } j = 0, 1, 2, \ldots, n-1.$$

Given the samples

$$y_j = f(t_j) = f(j\,\Delta t) \quad \text{for } j = 0, 1, 2, \ldots, n-1,$$

we could try to identify $\omega_1$, $\omega_2$, and $\omega_3$ using a discrete Fourier transform.1The discrete Fourier transform can be computed very quickly using the fast Fourier transform, as I discussed in a previous post. Unfortunately, this generally requires a large number of samples to identify $\omega_1$, $\omega_2$, and $\omega_3$ accurately. (The accuracy scales roughly like $n^{-1}$, where $n$ is the number of samples.) We are interested in finding a better way to identify the frequencies.
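Here is a minimal numpy sketch of the discrete Fourier transform approach; all numbers are made up for illustration, and the frequencies are deliberately chosen to lie exactly on the DFT frequency grid so the peaks are exact (off-grid frequencies "leak" across bins, which is exactly the accuracy limitation mentioned above):

```python
import numpy as np

# Sample a three-note "chord" and read off the three frequencies from the
# largest peaks of the discrete Fourier transform.
n, dt = 200, 0.05                                       # samples and spacing
w_true = 2 * np.pi * np.array([5, 16, 40]) / (n * dt)   # on-grid frequencies
t = dt * np.arange(n)
f = sum(np.cos(w * t) for w in w_true)                  # unit amplitudes, zero phases

spectrum = np.abs(np.fft.rfft(f))
w_grid = 2 * np.pi * np.fft.rfftfreq(n, d=dt)           # angular-frequency grid
w_est = np.sort(w_grid[np.argsort(spectrum)[-3:]])      # three largest peaks
print(w_est)                                            # recovers w_true here
```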

Now that we’ve moved from the function $f(t)$, defined for any real input $t$, to a set of samples $y_0, y_1, \ldots, y_{n-1}$, it will be helpful to rewrite our formula (1) for $f(t)$ in a different way. By Euler’s identity, the cosines can be rewritten as

$$\cos \theta = \frac{e^{i\theta} + e^{-i\theta}}{2}.$$

As a consequence, we can rewrite one of the frequency components in (1) as

$$d_k \cos(\omega_k t + \phi_k) = c_k e^{i\omega_k t} + \overline{c_k}\, e^{-i\omega_k t}.$$

Here, $c_k = \tfrac{1}{2} d_k e^{i\phi_k}$ and $\overline{c_k} = \tfrac{1}{2} d_k e^{-i\phi_k}$ are complex coefficients which contain the same information as the original parameters $d_k$ and $\phi_k$. Now notice that we are only interested in values of $t$ which are multiples of the spacing $\Delta t$, that is, $t = j\,\Delta t$. Thus, our frequency component can be further rewritten as

$$d_k \cos(\omega_k\, j\,\Delta t + \phi_k) = c_k z_k^j + \overline{c_k}\, \overline{z_k}^{\,j},$$

where $z_k = e^{i\omega_k \Delta t}$ and $\overline{z_k} = e^{-i\omega_k \Delta t}$. Performing these reductions for each of the three frequency components and relabeling the resulting six coefficients and exponentials as $c_1, \ldots, c_6$ and $z_1, \ldots, z_6$, our samples take the form

$$y_j = \sum_{k=1}^{6} c_k z_k^j \quad \text{for } j = 0, 1, 2, \ldots, n-1. \tag{2}$$

We’ve now reformulated our frequency problem as one of identifying the parameters $z_1, \ldots, z_6$ and $c_1, \ldots, c_6$ in the relation (2) from a small number of measurements $y_0, y_1, \ldots, y_{n-1}$.

## Frequency Finding as a Matrix Factorization

We will return to the algorithmic problem of identifying the parameters in the relation (2) from measurements in a little bit. First, we will see that (2) can actually be written as a matrix factorization. Understanding computations by matrix factorization has been an extremely successful paradigm in applied mathematics, and we will see in this post how viewing (2) as a matrix factorization can be very useful.

While it may seem odd at first,2As pointed out to me on Math Stack Exchange, one reason forming the Hankel matrix is sensible is because it effectively augments the sequence of numbers $y_0, y_1, \ldots, y_{n-1}$ into a sequence of vectors given by the columns of $H$. This can reveal patterns in the sequence which are less obvious when it is represented just as a sequence of numbers. For instance, any seven columns of $H$ are linearly dependent, a surprising fact since the columns of $H$ have length $(n+1)/2$, which can be much larger than seven. In addition, as we will effectively exploit later, vectors in the null space of $H$ (or related Hankel matrices derived from the sequence) give recurrence relations obeyed by the sequence. This speaks to a general phenomenon where properties of a sequence (say, arising from snapshots of a dynamical system) can sometimes become more clear by this procedure of delay embedding. it will be illuminating to repackage the measurements $y_0, y_1, \ldots, y_{n-1}$ as a matrix:

$$H = \begin{bmatrix} y_0 & y_1 & y_2 & \cdots & y_{(n-1)/2} \\ y_1 & y_2 & y_3 & \cdots & y_{(n+1)/2} \\ y_2 & y_3 & y_4 & \cdots & y_{(n+3)/2} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ y_{(n-1)/2} & y_{(n+1)/2} & y_{(n+3)/2} & \cdots & y_{n-1} \end{bmatrix}. \tag{3}$$

Here, we have assumed $n$ is odd. The matrix $H$ is known as the Hankel matrix associated with the sequence $y_0, y_1, \ldots, y_{n-1}$. Observe that the entry in position $(j,\ell)$ of $H$ depends only on the sum of the indices $j$ and $\ell$: $H_{j\ell} = y_{j+\ell}$. (We use a zero-indexing system to label the rows and columns of $H$ where, for instance, the first row of $H$ is row $0$.)
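A short sketch of the Hankelization step (the helper name is mine, and the toy samples are just to show the shape):

```python
import numpy as np

# Build the square Hankel matrix of (3) from samples y_0, ..., y_{n-1}
# with n odd; entry (j, l) of H is y_{j+l}, zero-indexed.
def hankel_from_samples(y):
    n = len(y)
    assert n % 2 == 1, "we assume an odd number of samples"
    m = (n + 1) // 2
    return np.array([[y[j + l] for l in range(m)] for j in range(m)])

y = np.arange(7.0)                 # toy samples y_j = j
H = hankel_from_samples(y)
assert H[1, 2] == y[3]             # the entry depends only on j + l
assert np.all(H == H.T)            # Hankel matrices are symmetric
```

(The same matrix can also be produced by `scipy.linalg.hankel`, which takes the first column and last row.)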

Let’s see how we can interpret the frequency decomposition (2) as a factorization of the Hankel matrix $H$. We first write out $H_{j\ell}$ using (2):

$$H_{j\ell} = y_{j+\ell} = \sum_{k=1}^{6} c_k z_k^{j+\ell} = \sum_{k=1}^{6} z_k^j \cdot c_k \cdot z_k^{\ell}. \tag{4}$$

The power $z_k^{j+\ell}$ was just begging to be factorized as $z_k^j \cdot z_k^\ell$, which we did. Equation (4) almost looks like the formula for the product of two matrices with entries $z_k^j$, so it makes sense to introduce the matrix $V$ with $(k,j)$ entry $V_{kj} = z_k^j$. This is a so-called Vandermonde matrix associated with $z_1, z_2, \ldots, z_6$ and has the form

$$V = \begin{bmatrix} 1 & z_1 & z_1^2 & \cdots & z_1^{(n-1)/2} \\ 1 & z_2 & z_2^2 & \cdots & z_2^{(n-1)/2} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & z_6 & z_6^2 & \cdots & z_6^{(n-1)/2} \end{bmatrix}.$$
If we also introduce the diagonal matrix $D = \operatorname{diag}(c_1, c_2, \ldots, c_6)$, the formula (4) for $H$ can be written as the matrix factorization3In the Vandermonde decomposition (5), the factor $V$ appears transposed even when $V$ is populated with complex numbers! This differs from the usual case in linear algebra, where we use the conjugate transpose rather than the ordinary transpose when working with complex matrices. As a related issue, observe that if at least one of the measurements $y_j$ is a (non-real) complex number, the Hankel matrix $H$ is symmetric but not Hermitian.

$$H = V^\top D V. \tag{5}$$

This is the Vandermonde decomposition of the Hankel matrix $H$: a factorization of $H$ as a product of the transpose of a Vandermonde matrix, a diagonal matrix, and that same Vandermonde matrix.

The Vandermonde decomposition immediately tells us all the information $z_1, \ldots, z_6$ and $c_1, \ldots, c_6$ describing our sampled recording via (2). Thus, the problem of determining the $z_k$’s and $c_k$’s is equivalent to finding the Vandermonde decomposition (5) of the Hankel matrix $H$.
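The decomposition is easy to verify numerically. A sketch with made-up parameters (note the plain transpose, not the conjugate transpose, even though everything is complex):

```python
import numpy as np

# Build samples y_j = sum_k c_k z_k^j from made-up nodes and coefficients,
# then check the Vandermonde decomposition H = V^T D V of the Hankel matrix.
rng = np.random.default_rng(0)
z = np.exp(1j * rng.uniform(-np.pi, np.pi, size=6))     # six nodes z_k
c = rng.standard_normal(6) + 1j * rng.standard_normal(6)

n = 13                                                  # n odd, so H is 7 x 7
m = (n + 1) // 2
y = (c[:, None] * z[:, None] ** np.arange(n)).sum(axis=0)   # samples (2)

H = np.array([[y[j + l] for l in range(m)] for j in range(m)])
V = z[:, None] ** np.arange(m)                          # 6 x m, V[k, j] = z_k^j
D = np.diag(c)
assert np.allclose(H, V.T @ D @ V)                      # decomposition (5)
```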

## Computing the Vandermonde Decomposition: Prony’s Method

Computing the Vandermonde decomposition accurately can be a surprisingly hard task, particularly if the measurements $y_0, y_1, \ldots, y_{n-1}$ are corrupted by even a small amount of measurement error. In view of this, I want to present a very classical way of computing this decomposition (dating back to 1795!) known as Prony’s method. This method is conceptually simple and will be a vehicle to continue exploring frequency finding and its connection with Hankel matrices. It remains in use, though its accuracy can be significantly worse than that of other methods.

As a first step to deriving Prony’s method, let’s reformulate the frequency finding problem in a different way. Sums of cosines like the ones in our expression (1) for the function $f(t)$ often appear as solutions to (linear) ordinary differential equations (ODEs). This means that one way we could find the frequencies comprising $f(t)$ would be to find a differential equation which $f(t)$ satisfies. Together with initial conditions, determining all the frequencies would be very straightforward.

Since we only have access to samples of $f(t)$ at regular time intervals, we will instead look for the “discrete-time” analog of a linear ODE, a linear recurrence relation. This is an expression of the form

$$y_{j+d} = a_{d-1}\, y_{j+d-1} + a_{d-2}\, y_{j+d-2} + \cdots + a_0\, y_j \quad \text{for every } j. \tag{6}$$

In our case, we’ll have $d = 6$ because there are six terms in the formula (2) for $y_j$. Together with the initial conditions $y_0, y_1, \ldots, y_5$, such a recurrence will allow us to determine the parameters $z_k$ and $c_k$ in our formula (2) for our sampled recordings and hence also allow us to compute the Vandermonde decomposition (5).

Observe that the recurrence (6) is a linear equation in the unknown coefficients $a_0, a_1, \ldots, a_5$. A very good rule of thumb in applied mathematics is to always write down linear equations in matrix–vector notation to see how they look. Doing this, we obtain

$$\begin{bmatrix} y_6 \\ y_7 \\ \vdots \\ y_{n-1} \end{bmatrix} = \begin{bmatrix} y_0 & y_1 & \cdots & y_5 \\ y_1 & y_2 & \cdots & y_6 \\ \vdots & \vdots & \ddots & \vdots \\ y_{n-7} & y_{n-6} & \cdots & y_{n-2} \end{bmatrix} \begin{bmatrix} a_0 \\ a_1 \\ \vdots \\ a_5 \end{bmatrix}. \tag{7}$$

Observe that the matrix on the right-hand side of this equation is also a Hankel matrix (like $H$ in (3)) formed from the samples $y_0, y_1, \ldots, y_{n-2}$. Call this Hankel matrix $H'$. Unlike $H$ in (3), $H'$ is rectangular. If $n$ is much larger than $12$, $H'$ will be tall, possessing many more rows than columns. We assume $n > 12$ going forward.4$n = 12$ would also be fine for our purposes, but we assume $n > 12$ to illustrate this highly typical case.

Let’s write (7) a little more compactly as

$$b = H' a, \tag{8}$$

where we’ve introduced $b$ for the vector on the left-hand side of (7) and collected the recurrence coefficients into a vector $a = (a_0, a_1, \ldots, a_5)$. For a typical system of linear equations like (8), we would predict the system to have no solution $a$: Because $H'$ has more rows than columns (since $n > 12$), the system (8) has more equations than unknowns. Fortunately, we are not in the typical case. Despite the fact that we have more equations than unknowns, the linear equations (8) have a unique solution $a$.5This solution can be computed by solving the system of linear equations $(H')^\top H' a = (H')^\top b$. In particular, the matrix $(H')^\top H'$ in this equation is guaranteed to be nonsingular under our assumptions. Using the Vandermonde decomposition, can you see why? The existence of a unique solution is a consequence of the fact that the samples $y_j$ satisfy the formula (2). As a fun exercise, you might want to verify the existence of a unique $a$ satisfying (8)!

As a quick aside, if the measurements $y_j$ are corrupted by small measurement errors, then the equations (8) will usually not possess an exact solution. In this case, it would be appropriate to find the least squares solution to equation (8) as a way of mitigating these errors.
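This first step of Prony's method can be sketched in a few lines of numpy (the helper name is mine; least squares reduces to the exact solution when the measurements are noiseless):

```python
import numpy as np

# Prony's method, step 1: form the rectangular Hankel matrix of (7) from
# samples y_0, ..., y_{n-1} and solve (8) for the recurrence coefficients
# a_0, ..., a_5, in the least squares sense in case of noise.
def recurrence_coefficients(y, d=6):
    y = np.asarray(y)
    Hp = np.array([[y[j + k] for k in range(d)] for j in range(len(y) - d)])
    b = y[d:]
    a, *_ = np.linalg.lstsq(Hp, b, rcond=None)
    return a   # y_{j+6} ≈ a_0 y_j + a_1 y_{j+1} + ... + a_5 y_{j+5}
```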

Hurrah! We’ve found the coefficients $a_0, \ldots, a_5$ providing a recurrence relation (6) for our measurements $y_0, y_1, \ldots, y_{n-1}$. All that is left is to find the parameters $z_k$ and $c_k$ in our signal formula (2) and the Vandermonde decomposition (5). Fortunately, this is just a standard computation for linear recurrence relations, nicely paralleling the solution of (homogeneous) linear ODEs by means of the so-called “characteristic equation”. I’ll go through it fairly quickly since this material is well explained elsewhere on the internet (like Wikipedia). Let’s guess that our recurrence (6) has a solution of the form $y_j = z^j$; we seek to find all complex numbers $z$ for which this is a bona fide solution. Plugging this guess into the recurrence (6) and dividing through by $z^j$ gives

$$z^6 = a_5 z^5 + a_4 z^4 + a_3 z^3 + a_2 z^2 + a_1 z + a_0. \tag{9}$$

This is the so-called characteristic equation for the recurrence (6). As a single-variable polynomial equation of degree six, it has six complex solutions $z_1, z_2, \ldots, z_6$. These numbers $z_1, \ldots, z_6$ are precisely the quantities which appear in the sequence formula (2) and the Vandermonde decomposition (5).
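In code, this root-finding step might look as follows (a sketch; `np.roots` finds polynomial roots as eigenvalues of a companion matrix, and the helper name is mine):

```python
import numpy as np

# Prony's method, step 2: the z_k are the six roots of the characteristic
# equation (9), rewritten as z^6 - a_5 z^5 - ... - a_1 z - a_0 = 0.
def characteristic_roots(a):
    # np.roots expects coefficients ordered from highest degree to lowest
    return np.roots(np.concatenate(([1.0], -np.asarray(a)[::-1])))
```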

Finally, we need to compute the coefficients $c_1, \ldots, c_6$. But this is easy. Observe that the formula (2) provides the following system of linear equations for $c_1, \ldots, c_6$:

$$\begin{bmatrix} 1 & 1 & \cdots & 1 \\ z_1 & z_2 & \cdots & z_6 \\ \vdots & \vdots & \ddots & \vdots \\ z_1^{n-1} & z_2^{n-1} & \cdots & z_6^{n-1} \end{bmatrix} \begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_6 \end{bmatrix} = \begin{bmatrix} y_0 \\ y_1 \\ \vdots \\ y_{n-1} \end{bmatrix}. \tag{10}$$

Again, this system of equations will have a unique solution if the measurements $y_j$ are uncorrupted by errors (and can be solved in the least squares sense if corrupted). This gives $c_1, \ldots, c_6$, completing our goal of computing the parameters in the formula (2) or, equivalently, finding the Vandermonde decomposition (5).
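The final step can be sketched as follows, with made-up nodes and coefficients standing in for the output of the previous steps:

```python
import numpy as np

# Prony's method, step 3: given the roots z_k, solve the (transposed)
# Vandermonde system (10) for the coefficients c_k by least squares.
z = np.exp(1j * np.array([0.4, -0.4, 1.1, -1.1, 2.3, -2.3]))   # made-up z_k
c_true = np.array([1.0, 1.0, 0.5, 0.5, 2.0, 2.0]) + 0j         # made-up c_k
n = 25
y = (c_true[:, None] * z[:, None] ** np.arange(n)).sum(axis=0)

W = z[None, :] ** np.arange(n)[:, None]     # n x 6 matrix, W[j, k] = z_k^j
c, *_ = np.linalg.lstsq(W, y, rcond=None)
assert np.allclose(c, c_true)               # coefficients of (2), recovered
```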

We have accomplished our goal of computing the Vandermonde decomposition. The approach by which we did so is known as Prony’s method, as mentioned in the introduction to this section. As suggested, this method may not always give high-accuracy results. Two obvious culprits jump out: Prony’s method requires solving for the roots of the polynomial equation (9) expressed “in the monomial basis” and solving a system of linear equations (10) with a (transposed) Vandermonde matrix. Both of these problems can be notoriously ill-conditioned, and thus challenging to solve accurately, and may require the measurements $y_j$ to be taken to very high accuracy. Notwithstanding this, Prony’s method does give useful results in some cases and forms the basis for potentially more accurate methods, such as those involving generalized eigenvalue problems.

## Separating Signals: Extending the Vandermonde Decomposition to Tensors

In our discussion of the frequency identification problem, the Vandermonde decomposition (5) has effectively been an equivalent way of showing the samples $y_j$ are a combination of exponentials $z_k^j$. So far, the benefits of the matrix factorization perspective have yet to really reveal themselves.

So what are the benefits of the Vandermonde decomposition? A couple of nice observations related to the Vandermonde decomposition and the “Hankelization” of the signals have already been lurking in the background. For instance, the rank of the Hankel matrix $H$ is the number of frequency components $z_k$ needed to describe the samples, and the representation of the samples as a mixture of exponentials is uniquely determined only if the matrix $H$ does not have full rank; I have a little more to say about this at the very end. There are also benefits for certain computational problems; for instance, one can use Vandermonde decompositions to compute singular value decompositions of Hankel matrices to very high accuracy.
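The rank observation is easy to check numerically; here is a sketch with made-up nodes (and all coefficients set to one):

```python
import numpy as np

# A Hankel matrix built from a sum of six exponentials z_k^j has rank six
# once it is larger than 6 x 6, and is therefore rank-deficient.
z = np.exp(1j * np.array([0.4, -0.4, 1.1, -1.1, 2.3, -2.3]))   # six nodes
n = 19                                                         # H is 10 x 10
m = (n + 1) // 2
y = (z[:, None] ** np.arange(n)).sum(axis=0)                   # samples, c_k = 1
H = np.array([[y[j + l] for l in range(m)] for j in range(m)])
assert np.linalg.matrix_rank(H) == 6
```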

The power of the Vandermonde decomposition really starts to shine when we go beyond the basic frequency finding problem we discussed by introducing more signals. Suppose now there are three short recordings $y^{(1)}(t)$, $y^{(2)}(t)$, and $y^{(3)}(t)$. (Here, the superscript denotes an index rather than differentiation.) Each recording is a weighted mixture of three source signals, each of which plays a musical chord of three notes (thus representable as a sum of cosines as in (1)). One can think of the sources as being produced by three different musical instruments at different places in a room and the recordings $y^{(1)}(t)$, $y^{(2)}(t)$, and $y^{(3)}(t)$ as being taken from three different microphones in the room.6This scenario of instruments and microphones ignores the finite propagation speed of sound, which would also introduce time delays between the sources and the recorded signals. We effectively treat the speed of sound as being instantaneous. Our goal is now not just to identify the musical notes in the recordings but also to identify how to assign those notes to reconstruct the three source signals.

Taking inspiration from earlier, we record $n$ samples of each recording and form each collection of samples into a Hankel matrix: $H^{(k)}$ denotes the Hankel matrix, built exactly as in (3), from the samples of the $k$th recording.

Here comes the crazy part: Stack the Hankelized recordings $H^{(1)}$, $H^{(2)}$, and $H^{(3)}$ as the slices of a tensor $\mathcal{H}$. A tensor, in this context, just means a multidimensional array of numbers. Just as a vector is a one-dimensional array and a matrix is a two-dimensional array, a tensor could have any number of dimensions. In our case, we need just three. If we use a MATLAB-esque indexing notation, $\mathcal{H}$ is a $\tfrac{n+1}{2} \times \tfrac{n+1}{2} \times 3$ array given by

$$\mathcal{H}(:,:,k) = H^{(k)} \quad \text{for } k = 1, 2, 3.$$
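The stacking step itself is straightforward; here is a numpy sketch in which the three "recordings" are stand-in signals and the helper name is mine:

```python
import numpy as np

# Hankelize each of the three sampled recordings and stack the results
# as the slices of a three-dimensional array.
def hankelize(y):
    m = (len(y) + 1) // 2
    return np.array([[y[j + l] for l in range(m)] for j in range(m)])

recordings = [np.sin(np.arange(13) * w) for w in (0.3, 0.7, 1.1)]  # stand-ins
T = np.stack([hankelize(y) for y in recordings], axis=2)
assert T.shape == (7, 7, 3)        # T[:, :, k] is the Hankel matrix H^(k)
```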

The remarkable thing is that the source signals can be determined (under appropriate conditions) by computing a special kind of Vandermonde decomposition of the tensor $\mathcal{H}$! (Specifically, the required decomposition is a Vandermonde-structured $(L_r,L_r,1)$-block term decomposition of the tensor $\mathcal{H}$.) Even more cool, this decomposition can be computed using general-purpose software like Tensorlab.

If this sounds interesting, I would encourage you to check out my recently published paper $(L_r,L_r,1)$-decompositions, sparse component analysis, and the blind separation of sums of exponentials, joint work with Nithin Govindarajan and Lieven De Lathauwer, recently published in the SIAM Journal on Matrix Analysis and Applications. In the paper, we explain what this $(L_r,L_r,1)$-decomposition is and how applying it to $\mathcal{H}$ can be used to separate mixtures of exponential signals from the resulting Vandermonde structure, an idea originating in the work of De Lathauwer. A very important question for these signal separation problems is that of uniqueness. Given the three sampled recordings (comprising the tensor $\mathcal{H}$), is there just one way of unscrambling the mixtures into different sources, or are there multiple ways? If there are multiple, then we might have computed the wrong one. If there is just a single unscrambling, though, then we’ve done our job and unmixed the scrambled signals. The uniqueness of these tensor decompositions is fairly complicated math, and we survey existing results and prove new ones in this paper.7One of our main technical contributions is a new notion of uniqueness of $(L_r,L_r,1)$-decompositions which we believe is nicely adapted to the signal separation context. Specifically, we prove mathematized versions of the statement “if the source signals are sufficiently different from each other and the measurements are of sufficiently high quality, then the signals can be uniquely separated”.

## Conclusions, Loose Ends, and Extensions

The central idea that we’ve been discussing is how it can be useful to convert between a sequence of observations and a special matricization of this sequence into a Hankel matrix (either square, as in (3), or rectangular, as in (7)). By manipulating the Hankel matrix, say, by computing its Vandermonde decomposition (5), we learn something about the original signal, namely a representation of the form (2).

This is a powerful idea which appears implicitly or explicitly throughout various subfields of mathematics, engineering, and computation. As with many other useful ideas, this paradigm admits many natural generalizations and extensions. We saw one already in this post, where we extended the Vandermonde decomposition to the realm of tensors to solve signal separation problems. To end this post, I’ll place a few breadcrumbs at the beginning of a few of the trails of these generalizations for anyone curious to learn more, wrapping up a few loose ends on the way.

Is the Vandermonde Decomposition Unique?
A natural question is whether the Vandermonde decomposition (5) is unique. That is, is it possible that there exist two different Vandermonde decompositions

$$H = V^\top D V = \widetilde{V}^\top \widetilde{D}\, \widetilde{V}$$

of the same (square) Hankel matrix $H$? This is equivalent to asking whether the frequency components $z_1, \ldots, z_6$ can be uniquely determined from the measurements $y_0, y_1, \ldots, y_{n-1}$.

Fortunately, the Vandermonde decomposition is unique if (and only if) the matrix $H$ is rank-deficient. Let’s unpack this a little bit. (For those who could use a refresher on rank, I have a blog post on precisely this topic.) Note that the Vandermonde decomposition $H = V^\top D V$ is a rank factorization8Rank factorizations are sometimes referred to as “rank-revealing factorizations”. I discuss my dispreference for this term in my blog post on low-rank matrices. since $V$ has six rows and full (row) rank and $D$ is invertible. This means that if we take enough samples of a function which is a (finite) combination of exponentials, the matrix $H$ will be rank-deficient and the Vandermonde decomposition unique.9The uniqueness of the Vandermonde decomposition can be proven by showing that, in our construction by Prony’s method, the $a_k$’s, $z_k$’s, and $c_k$’s are all uniquely determined. If too few samples are taken, then $H$ does not contain enough information to determine the frequency components of the signal and thus the Vandermonde decomposition is non-unique.

Does Every Hankel Matrix Have a Vandermonde Decomposition?
This post has exclusively focused on a situation where we are provided with a sequence we know to be representable as a mixture of exponentials (i.e., taking the form (2)), from which the existence of the Vandermonde decomposition (5) follows immediately. What if we didn’t know this were the case, and we were just given a (square) Hankel matrix $H$? Is $H$ guaranteed to possess a Vandermonde decomposition of the form (5)?

Unfortunately, the answer is no; there exist Hankel matrices which do not possess a Vandermonde decomposition. The issue is related to the fact that the appropriate characteristic equation (analogous to (9)) might possess repeated roots, making the solutions to the recurrence (6) take not just the form $z^j$ but also $j z^j$ and perhaps $j^2 z^j$, $j^3 z^j$, etc.

Are There Cases When the Vandermonde Decomposition is Guaranteed To Exist?
There is one natural case when a (square) Hankel matrix is guaranteed to possess a Vandermonde decomposition, namely when the matrix is nonsingular/invertible/full-rank. Despite this being a widely circulated fact, I am unaware of a simple proof for why this is the case. Unfortunately, there are not just one but infinitely many Vandermonde decompositions for a nonsingular Hankel matrix, suggesting these decompositions are not useful for the frequency finding problem that motivated this post.
What If My Hankel Matrix Does Not Possess a Vandermonde Decomposition?
As discussed above, a Hankel matrix may fail to have a Vandermonde decomposition if the characteristic equation (à la (9)) has repeated roots. This is very much analogous to the case of a non-diagonalizable matrix, for which the characteristic polynomial has repeated roots. In this case, while diagonalization is not possible, one can “almost-diagonalize” the matrix by reducing it to its Jordan normal form. In total analogy, every Hankel matrix can be “almost Vandermonde decomposed” into a confluent Vandermonde decomposition (a discovery that appears to have been made independently several times). I will leave these links to discuss the exact nature of this decomposition, though I warn any potential reader that these resources introduce the decomposition first for Hankel matrices with infinitely many rows and columns before considering the finite case as we have. One is warned that while the Vandermonde decomposition is always a rank decomposition, the confluent Vandermonde decomposition is not guaranteed to be one.10Rather, the confluent Vandermonde decomposition is a rank decomposition for an infinite extension of a finite Hankel matrix. One can exhibit a small Hankel matrix which has rank two but possesses no rank-two confluent Vandermonde decomposition; the issue is that, when extended to an infinite Hankel matrix, the resulting (infinite!) matrix has a rank exceeding the size of the original Hankel matrix.
The Toeplitz Vandermonde Decomposition
Just as it proved useful to arrange the samples $y_0, y_1, \ldots, y_{n-1}$ into a Hankel matrix, it can also be useful to form them into a Toeplitz matrix

$$T = \begin{bmatrix} y_{(n-1)/2} & y_{(n+1)/2} & \cdots & y_{n-1} \\ y_{(n-3)/2} & y_{(n-1)/2} & \cdots & y_{n-2} \\ \vdots & \vdots & \ddots & \vdots \\ y_0 & y_1 & \cdots & y_{(n-1)/2} \end{bmatrix}.$$
The Toeplitz matrix $T$ has the appealing property that the matrix–vector product $Tx$ computes a (discrete) convolution of the sampled signal $y$ with the sampled signal $x$, which has all sorts of uses in signal processing and related fields.11I discuss Toeplitz matrices and a fast algorithm to compute the product $Tx$ using the fast Fourier transform more in a blog post I wrote about the subject.
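The convolution property is easiest to see with the lower-triangular Toeplitz variant, a sketch of which follows (the signals here are made up; the product with a vector reproduces the first $n$ terms of the discrete convolution):

```python
import numpy as np

# The lower-triangular Toeplitz matrix built from samples y has
# T[j, k] = y[j - k] for j >= k and zero above the diagonal, so the
# product T @ x gives the first n terms of the convolution y * x.
y = np.array([1.0, 2.0, 3.0, 4.0])
x = np.array([1.0, -1.0, 2.0, 0.5])
n = len(y)
T = np.array([[y[j - k] if j >= k else 0.0 for k in range(n)]
              for j in range(n)])
assert np.allclose(T @ x, np.convolve(y, x)[:n])
```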

One can interconvert between Hankel and Toeplitz matrices by reversing the order of the rows. As such, to the extent to which Hankel matrices possess Vandermonde decompositions (with all the asterisks and fine print just discussed), Toeplitz matrices do as well, but with the rows of the first factor reversed:

$$T = \bigl(J V^\top\bigr) D V,$$

where $J$ denotes the permutation matrix which reverses the order of the rows of the matrix it multiplies.
There is a special and important case where more is true. If a Toeplitz matrix $T$ is (Hermitian) positive semidefinite, then $T$ always possesses a Vandermonde decomposition of the form

$$T = V^* D V,$$

where $V$ is a Vandermonde matrix associated with parameters $z_1, z_2, \ldots, z_k$ which are complex numbers of absolute value one and $D$ is a diagonal matrix with real positive entries.12The keen-eyed reader will note that $V$ appears conjugate transposed in this formula rather than transposed as in the Hankel Vandermonde decomposition (5). This Vandermonde decomposition is unique if and only if $T$ is rank-deficient. Positive semidefinite Toeplitz matrices are important as they occur as autocorrelation matrices, which effectively describe the similarity between a signal and different shifts of itself in time. Autocorrelation matrices appear under different names in everything from signal processing to random processes to near-term quantum algorithms (a topic near and dear to my heart). A delightfully simple and linear algebraic derivation of this result is given by Yang and Xie (see Theorem 1).13Unfortunately, Yang and Xie incorrectly claim that every Toeplitz matrix possesses a rank factorization Vandermonde decomposition of the form $T = V^* D V$, where $V$ is a Vandermonde matrix populated with entries on the unit circle and $D$ is a diagonal matrix of possibly *complex* entries; this claim is disproven by a simple counterexample. This decomposition can be generalized to infinite positive semidefinite Toeplitz matrices (appropriately defined).14Specifically, one can show that an infinite positive semidefinite Toeplitz matrix (appropriately defined) also has a “Vandermonde decomposition” (appropriately defined). This result is often known as Herglotz’s theorem and is generalized by the Bochner–Weil theorem.