In this post, I want to discuss a beautiful and somewhat subtle matrix factorization known as the Vandermonde decomposition that appears frequently in signal processing and control theory. We'll begin from the very basics, introducing the controls-and-signals context, how the Vandermonde decomposition comes about, and why it's useful. By the end, I'll briefly share how we can push the Vandermonde decomposition beyond matrices to the realm of tensors, which will allow us to separate mixed signals from multiple measurements. This tensorial generalization plays an important role in my paper $(L_r,L_r,1)$-decompositions, sparse component analysis, and the blind separation of sums of exponentials, joint work with Nithin Govindarajan and Lieven De Lathauwer, which recently appeared in the SIAM Journal on Matrix Analysis and Applications.
Finding the Frequencies
Suppose I give you a short recording of a musical chord consisting of three notes. How could you determine which three notes they were? Mathematically, we can represent such a three-note chord as a combination of scaled and shifted cosine functions:

(1) $f(t) = a_1 \cos(\omega_1 t + \phi_1) + a_2 \cos(\omega_2 t + \phi_2) + a_3 \cos(\omega_3 t + \phi_3).$

We are interested in obtaining the (angular) frequencies $\omega_1$, $\omega_2$, and $\omega_3$.
In the extreme limit, when we are given the values of the signal $f(t)$ for all $t$, both positive and negative, the frequencies are immediately given by taking a Fourier transform of the function $f$. In practice, we only have access to the function at certain times $t_0, t_1, \ldots, t_{n-1}$, which we assume are equally spaced:

$t_j = j\,\Delta t \quad \text{for } j = 0, 1, 2, \ldots, n-1.$

Given the samples

$f_j = f(t_j) = f(j\,\Delta t) \quad \text{for } j = 0, 1, 2, \ldots, n-1,$

we could try to identify $\omega_1$, $\omega_2$, and $\omega_3$ using a discrete Fourier transform. (The discrete Fourier transform can be computed very quickly using the fast Fourier transform, as I discussed in a previous post.) Unfortunately, this generally requires a large number of samples to identify $\omega_1$, $\omega_2$, and $\omega_3$ accurately. (The accuracy scales roughly like $1/n$, where $n$ is the number of samples.) We are interested in finding a better way to identify the frequencies.
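To make this baseline concrete, here's a quick numpy sketch of the DFT approach. (All the signal parameters below are made up for illustration; the roughly $1/n$ resolution of the frequency grid is visible in the output.)

```python
import numpy as np

# Made-up example: three notes with angular frequencies 2, 5, and 11 rad/s,
# sampled n times at spacing dt.
omegas, amps, phases = [2.0, 5.0, 11.0], [1.0, 0.9, 0.8], [0.3, 1.1, 2.0]
n, dt = 200, 0.1
t = dt * np.arange(n)
f = sum(a * np.cos(w * t + p) for w, a, p in zip(omegas, amps, phases))

# Peaks of the DFT magnitude give frequency estimates on a grid with spacing
# 2*pi/(n*dt), which is the ~1/n resolution limit mentioned above.
fhat = np.fft.rfft(f)
grid = 2 * np.pi * np.fft.rfftfreq(n, d=dt)   # angular frequency grid
peaks = grid[np.argsort(np.abs(fhat))[-3:]]   # bins of the three largest peaks
print(np.sort(peaks))                         # rough estimates of 2, 5, 11
```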
Now that we've moved from the function $f(t)$, defined for any real input $t$, to a set of samples $f_0, f_1, \ldots, f_{n-1}$, it will be helpful to rewrite our formula (1) for $f$ in a different way. By Euler's identity, the cosines can be rewritten as

$\cos\theta = \frac{e^{i\theta} + e^{-i\theta}}{2}.$

As a consequence, we can rewrite one of the frequency components in (1) as

$a \cos(\omega t + \phi) = c\, e^{i\omega t} + \overline{c}\, e^{-i\omega t}.$

Here, $c = \tfrac{a}{2} e^{i\phi}$ and $\overline{c} = \tfrac{a}{2} e^{-i\phi}$ are complex coefficients which contain the same information as the original parameters $a$ and $\phi$. Now notice that we are only interested in values $t = j\,\Delta t$ which are multiples of the spacing $\Delta t$. Thus, our frequency component can be further rewritten as

$c\, e^{i\omega j \Delta t} + \overline{c}\, e^{-i\omega j \Delta t} = c z^j + \overline{c}\, \overline{z}^j,$

where $z = e^{i\omega\,\Delta t}$ and $\overline{z} = e^{-i\omega\,\Delta t}$. Performing these reductions, our samples take the form

(2) $f_j = d_1 z_1^j + d_2 z_2^j + d_3 z_3^j + d_4 z_4^j + d_5 z_5^j + d_6 z_6^j.$
We've now reformulated our frequency finding problem as identifying the parameters $d_1, \ldots, d_6$ and $z_1, \ldots, z_6$ in the relation (2) from a small number of measurements $f_0, f_1, \ldots, f_{n-1}$.
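It's easy to sanity-check this reduction numerically. Here's a small numpy sketch, with made-up values for $a$, $\omega$, $\phi$, and $\Delta t$:

```python
import numpy as np

# Sanity check of the reduction above: the sampled cosine a*cos(omega*j*dt + phi)
# should equal c*z**j + conj(c)*conj(z)**j with c = (a/2)*exp(1j*phi) and
# z = exp(1j*omega*dt). The parameter values are arbitrary.
a, omega, phi, dt = 1.3, 5.0, 0.7, 0.1
j = np.arange(10)
c, z = 0.5 * a * np.exp(1j * phi), np.exp(1j * omega * dt)
lhs = a * np.cos(omega * j * dt + phi)
rhs = c * z**j + np.conj(c) * np.conj(z)**j
print(np.allclose(lhs, rhs))  # True: the imaginary parts cancel exactly
```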
Frequency Finding as a Matrix Factorization
We will return to the algorithmic problem of identifying the parameters $z_\ell$ and $d_\ell$ in the relation (2) from the measurements $f_j$ in a little bit. First, we will see that (2) can actually be written as a matrix factorization. Understanding computations by matrix factorization has been an extremely successful paradigm in applied mathematics, and we will see in this post how viewing (2) as a matrix factorization can be very useful.
While it may seem odd at first, it will be illuminating to repackage the measurements $f_0, f_1, \ldots, f_{n-1}$ as a matrix. (As pointed out to me on Math Stack Exchange, one reason forming a Hankel matrix is sensible is that it effectively augments the sequence of numbers $f_0, f_1, \ldots, f_{n-1}$ into a sequence of vectors given by the columns of $H$. This can reveal patterns in the sequence which are less obvious when it is presented just as a list of numbers. For instance, any seven columns of $H$ are linearly dependent, a surprising fact since the columns of $H$ have length $k+1$, which can be much larger than seven. In addition, as we will exploit later, vectors in the nullspace of $H$ (or of related Hankel matrices derived from the sequence) give recurrence relations obeyed by the sequence. This speaks to a general phenomenon where properties of a sequence, say, arising from snapshots of a dynamical system, can sometimes become clearer through this procedure of delay embedding.) The matrix in question is
(3) $H = \begin{bmatrix} f_0 & f_1 & f_2 & \cdots & f_k \\ f_1 & f_2 & f_3 & \cdots & f_{k+1} \\ f_2 & f_3 & f_4 & \cdots & f_{k+2} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ f_k & f_{k+1} & f_{k+2} & \cdots & f_{2k} \end{bmatrix}.$
Here, we have assumed that the number of samples $n$ is odd, writing $n = 2k+1$. The matrix $H$ is known as the Hankel matrix associated with the sequence $f_0, f_1, \ldots, f_{n-1}$. Observe that the entry in position $(i,j)$ of $H$ depends only on the sum of the indices $i$ and $j$: $H_{ij} = f_{i+j}$. (We use a zero-indexing system to label the rows and columns of $H$ where, for instance, the first row of $H$ is row $0$.)
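In code, forming $H$ is a one-liner around scipy's hankel constructor, which takes the first column and the last row. (The helper name hankelize and the placeholder samples are my own.)

```python
import numpy as np
from scipy.linalg import hankel

# Build the (k+1) x (k+1) Hankel matrix H of (3) from n = 2k+1 samples.
def hankelize(f):
    f = np.asarray(f)
    n = len(f)
    assert n % 2 == 1, "expects an odd number of samples"
    k = (n - 1) // 2
    return hankel(f[: k + 1], f[k:])   # first column f_0..f_k, last row f_k..f_2k

f = np.arange(9.0)   # placeholder samples f_0, ..., f_8, so k = 4
H = hankelize(f)
print(H.shape)       # (5, 5); H[i, j] == f[i + j]
```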
Let's see how we can interpret the frequency decomposition (2) as a factorization of the Hankel matrix $H$. We first write out $H_{ij}$ using (2):

(4) $H_{ij} = f_{i+j} = \sum_{\ell=1}^{6} d_\ell z_\ell^{i+j} = \sum_{\ell=1}^{6} z_\ell^{i} \cdot d_\ell \cdot z_\ell^{j}.$

The power $z_\ell^{i+j}$ was just begging to be factorized as $z_\ell^i \cdot z_\ell^j$, which we did. Equation (4) almost looks like the formula for the product of two matrices with entries $z_\ell^i$ and $z_\ell^j$, so it makes sense to introduce the $6 \times (k+1)$ matrix $V$ with entry $V_{\ell j} = z_\ell^j$. This is a so-called Vandermonde matrix associated with $z_1, z_2, \ldots, z_6$ and has the form

$V = \begin{bmatrix} 1 & z_1 & z_1^2 & \cdots & z_1^{k} \\ 1 & z_2 & z_2^2 & \cdots & z_2^{k} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & z_6 & z_6^2 & \cdots & z_6^{k} \end{bmatrix}.$
If we also introduce the $6 \times 6$ diagonal matrix $D = \operatorname{diag}(d_1, d_2, \ldots, d_6)$, the formula (4) for $H$ can be written as the matrix factorization

(5) $H = V^\top D V.$

This is the Vandermonde decomposition of the Hankel matrix $H$, a factorization of $H$ as a product of the transpose of a Vandermonde matrix, a diagonal matrix, and that same Vandermonde matrix. (In the Vandermonde decomposition (5), the factor $V^\top$ appears transposed even when $V$ is populated with complex numbers! This differs from the usual case in linear algebra, where we use the conjugate transpose rather than the ordinary transpose when working with complex matrices. As a related issue, observe that if at least one of the measurements $f_j$ is a (non-real) complex number, the Hankel matrix $H$ is symmetric but not Hermitian.)
The Vandermonde decomposition immediately tells us all the information $z_\ell$ and $d_\ell$ describing our sampled recording $f_0, f_1, \ldots, f_{n-1}$ via (2). Thus, the problem of determining $z_\ell$ and $d_\ell$ is equivalent to finding the Vandermonde decomposition (5) of the Hankel matrix $H$.
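As a quick numerical confirmation of (5), the following sketch generates samples from six made-up exponential components, Hankelizes them, and checks the factorization $H = V^\top D V$:

```python
import numpy as np
from scipy.linalg import hankel

# Numerical check of the Vandermonde decomposition (5); the nodes z_l,
# coefficients d_l, and size k below are all arbitrary choices.
rng = np.random.default_rng(0)
z = np.exp(1j * rng.uniform(-np.pi, np.pi, 6))   # six nodes on the unit circle
d = rng.standard_normal(6) + 1j * rng.standard_normal(6)
k = 10
j = np.arange(2 * k + 1)
f = (z[None, :] ** j[:, None]) @ d               # samples f_j = sum_l d_l * z_l**j
H = hankel(f[: k + 1], f[k:])                    # (k+1) x (k+1) Hankel matrix
V = z[:, None] ** np.arange(k + 1)[None, :]      # 6 x (k+1) Vandermonde matrix
print(np.allclose(H, V.T @ np.diag(d) @ V))      # True
```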
Computing the Vandermonde Decomposition: Prony’s Method
Computing the Vandermonde decomposition accurately can be a surprisingly hard task, particularly if the measurements $f_j$ are corrupted by even a small amount of measurement error. In view of this, I want to present a very classical way of computing this decomposition (dating back to 1795!) known as Prony's method. This method is conceptually simple and will be a vehicle to continue exploring frequency finding and its connection with Hankel matrices. It remains in use, though its accuracy may be significantly worse than that of other methods.
As a first step to deriving Prony's method, let's reformulate the frequency finding problem in a different way. Sums of cosines like the ones in our expression (1) for the function $f(t)$ often appear as the solution to a (linear) ordinary differential equation (ODE). This means that one way we could find the frequencies comprising $f(t)$ would be to find a differential equation which $f(t)$ satisfies. Together with initial conditions, determining all the frequencies would be very straightforward.
Since we only have access to samples of $f(t)$ at regular time intervals, we will instead look for the "discrete-time" analog of a linear ODE, a linear recurrence relation. This is an expression of the form

(6) $f_{j+m} = c_0 f_j + c_1 f_{j+1} + \cdots + c_{m-1} f_{j+m-1} \quad \text{for } j = 0, 1, 2, \ldots.$

In our case, we'll have $m = 6$ because there are six terms in the formula (2) for $f_j$. Together with the initial conditions $f_0, f_1, \ldots, f_5$, such a recurrence will allow us to determine the parameters $z_\ell$ and $d_\ell$ in our formula (2) for our sampled recordings and hence also allow us to compute the Vandermonde decomposition (5).
Observe that the recurrence (6) is a linear equation in the variables $c_0, c_1, \ldots, c_5$. A very good rule of thumb in applied mathematics is to always write down linear equations in matrix–vector notation to see how they look. Doing this, we obtain

(7) $\begin{bmatrix} f_6 \\ f_7 \\ \vdots \\ f_{n-1} \end{bmatrix} = \begin{bmatrix} f_0 & f_1 & \cdots & f_5 \\ f_1 & f_2 & \cdots & f_6 \\ \vdots & \vdots & & \vdots \\ f_{n-7} & f_{n-6} & \cdots & f_{n-2} \end{bmatrix} \begin{bmatrix} c_0 \\ c_1 \\ \vdots \\ c_5 \end{bmatrix}.$

Observe that the matrix on the right-hand side of this equation is also a Hankel matrix (like $H$ in (3)) formed from the samples $f_0, f_1, \ldots, f_{n-2}$. Call this Hankel matrix $F$. Unlike $H$ in (3), $F$ is rectangular. If $n$ is much larger than $12$, $F$ will be tall, possessing many more rows than columns. We assume $n > 12$ going forward. ($n = 12$ would also be fine for our purposes, but we assume $n > 12$ to illustrate this highly typical case.)
Let’s write (7) a little more compactly as
(8) $\mathbf{f} = F \mathbf{c},$
where we've introduced $\mathbf{f}$ for the vector on the left-hand side of (7) and collected the recurrence coefficients into a vector $\mathbf{c} = (c_0, c_1, \ldots, c_5)$. For a typical system of linear equations like (8), we would predict the system to have no solution $\mathbf{c}$: because $F$ has more rows than columns (since $n > 12$), the system of equations (8) has more equations than unknowns. Fortunately, we are not in the typical case. Despite the fact that we have more equations than unknowns, the linear equations (8) have a unique solution $\mathbf{c}$. (This solution can be computed by solving the normal equations $F^* \mathbf{f} = (F^* F)\, \mathbf{c}$. In particular, the matrix $F^*F$ on the right-hand side of this equation is guaranteed to be nonsingular under our assumptions. Using the Vandermonde decomposition, can you see why?) The existence of a unique solution is a consequence of the fact that the samples $f_j$ satisfy the formula (2). As a fun exercise, you might want to verify the existence of a unique $\mathbf{c}$ satisfying (8)!
As a quick aside, if the measurements $f_j$ are corrupted by small measurement errors, then the equations (8) will usually not possess an exact solution. In this case, it would be appropriate to find the least squares solution to equation (8) as a way of mitigating these errors.
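In numpy, the least squares route is a few lines. This sketch (the helper name and the order argument are my own labels) forms the rectangular Hankel matrix $F$ from (7) and solves (8):

```python
import numpy as np

# Solve (8) in the least squares sense for the recurrence coefficients c_0..c_5.
# f is a 1D array of (possibly noisy) samples f_0, ..., f_{n-1} with n > 12.
def recurrence_coefficients(f, order=6):
    f = np.asarray(f)
    n = len(f)
    # Rectangular Hankel matrix F of (7): F[j, i] = f[j + i].
    F = np.column_stack([f[i : n - order + i] for i in range(order)])
    rhs = f[order:]
    c, *_ = np.linalg.lstsq(F, rhs, rcond=None)
    return c
```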
Hurrah! We've found the coefficients $c_0, c_1, \ldots, c_5$ providing a recurrence relation (6) for our measurements $f_0, f_1, \ldots, f_{n-1}$. All that is left is to find the parameters $z_\ell$ and $d_\ell$ in our signal formula (2) and the Vandermonde decomposition (5). Fortunately, this is just a standard computation for linear recurrence relations, nicely paralleling the solution of (homogeneous) linear ODEs by means of the so-called "characteristic equation". I'll go through it fairly quickly since this material is well explained elsewhere on the internet (like Wikipedia). Let's guess that our recurrence (6) has a solution of the form $f_j = z^j$; we seek to find all complex numbers $z$ for which this is a bona fide solution. Plugging this guess into the recurrence (6) and dividing by $z^j$ gives

(9) $z^6 = c_0 + c_1 z + c_2 z^2 + c_3 z^3 + c_4 z^4 + c_5 z^5.$

This is the so-called characteristic equation for the recurrence (6). As a single-variable polynomial equation of degree six, it has six complex solutions $z_1, z_2, \ldots, z_6$. These numbers $z_1, \ldots, z_6$ are precisely those which appear in the sequence formula (2) and the Vandermonde decomposition (5).
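Numerically, the standard way to get these six roots is an eigenvalue computation on the companion matrix, which is what numpy's np.roots does under the hood. A sketch, using the coefficients $c_0, \ldots, c_5$ from the previous step:

```python
import numpy as np

# Solve the characteristic equation (9), rewritten as
# z^6 - c_5 z^5 - ... - c_1 z - c_0 = 0; np.roots expects the coefficients
# in order of decreasing degree.
def characteristic_roots(c):
    c = np.asarray(c)
    return np.roots(np.concatenate(([1.0], -c[::-1])))
```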
Finally, we need to compute the coefficients $d_1, d_2, \ldots, d_6$. But this is easy. Observe that the formula (2) provides the following system of linear equations for $d_1, \ldots, d_6$:

(10) $\begin{bmatrix} 1 & 1 & \cdots & 1 \\ z_1 & z_2 & \cdots & z_6 \\ \vdots & \vdots & & \vdots \\ z_1^{n-1} & z_2^{n-1} & \cdots & z_6^{n-1} \end{bmatrix} \begin{bmatrix} d_1 \\ d_2 \\ \vdots \\ d_6 \end{bmatrix} = \begin{bmatrix} f_0 \\ f_1 \\ \vdots \\ f_{n-1} \end{bmatrix}.$

Again, this system of equations will have a unique solution if the measurements $f_j$ are uncorrupted by errors (and can be solved in the least squares sense if corrupted). This gives $d_1, \ldots, d_6$, completing our goal of computing the parameters in the formula (2) or, equivalently, finding the Vandermonde decomposition (5).
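A sketch of this last step: solve (10) by least squares and convert each root back to an angular frequency via $z_\ell = e^{i\omega_\ell \Delta t}$. (The helper name is mine; the recovered frequencies come in $\pm\omega$ pairs.)

```python
import numpy as np

# Solve the (transposed) Vandermonde system (10) for d_1..d_6, then read the
# angular frequencies off the roots. f: samples f_0..f_{n-1}; z: the six
# roots of (9); dt: the sample spacing.
def coefficients_and_frequencies(f, z, dt):
    f, z = np.asarray(f), np.asarray(z)
    W = z[None, :] ** np.arange(len(f))[:, None]   # W[j, l] = z_l ** j
    d, *_ = np.linalg.lstsq(W, f, rcond=None)
    omegas = np.angle(z) / dt                      # since z_l = exp(1j*omega_l*dt)
    return d, omegas
```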
We have accomplished our goal of computing the Vandermonde decomposition. The approach by which we did so is known as Prony's method, as mentioned in the introduction to this section. As suggested, this method may not always give high-accuracy results. There are two obvious culprits that jump out as to why this is the case: Prony's method requires finding the roots of the polynomial equation (9) expressed "in the monomial basis" and solving a system of linear equations (10) with a (transposed) Vandermonde matrix. Both of these problems can be notoriously ill-conditioned and thus challenging to solve accurately, and they may require the measurements to be taken to very high accuracy. Notwithstanding this, Prony's method does produce useful results in some cases and forms the basis for potentially more accurate methods, such as those involving generalized eigenvalue problems.
Separating Signals: Extending the Vandermonde Decomposition to Tensors
In our discussion of the frequency identification problem, the Vandermonde decomposition (5) has effectively been an equivalent way of showing that the samples $f_j$ are a combination of the exponentials $z_\ell^j$. So far, the benefits of the matrix factorization perspective have yet to really reveal themselves.
So what are the benefits of the Vandermonde decomposition? A couple of nice observations related to the Vandermonde decomposition and the "Hankelization" of the signals have already been lurking in the background. For instance, the rank of the Hankel matrix $H$ is the number of frequency components $z_\ell$ needed to describe the samples, and the representation of the samples as a mixture of exponentials is uniquely determined only if the matrix $H$ does not have full rank; I have a little more to say about this at the very end. There are also benefits for certain computational problems; for instance, one can use Vandermonde decompositions to compute super-high-accuracy singular value decompositions of Hankel matrices.
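The rank observation is easy to see in action. In this sketch (all parameters made up), a $21 \times 21$ Hankel matrix built from six exponential components has numerical rank exactly six:

```python
import numpy as np
from scipy.linalg import hankel

# With six exponential components and no noise, the Hankel matrix has rank 6
# even though it is much larger than 6 x 6.
rng = np.random.default_rng(1)
z = np.exp(1j * rng.uniform(-np.pi, np.pi, 6))
d = rng.standard_normal(6) + 1j * rng.standard_normal(6)
j = np.arange(41)                      # n = 41 samples, so k = 20
f = (z[None, :] ** j[:, None]) @ d
H = hankel(f[:21], f[20:])             # 21 x 21 Hankel matrix
print(np.linalg.matrix_rank(H))        # 6
```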
The power of the Vandermonde decomposition really starts to shine when we go beyond the basic frequency finding problem we discussed by introducing more signals. Suppose now there are three short recordings $f^{(1)}(t)$, $f^{(2)}(t)$, and $f^{(3)}(t)$. (Here, the superscript denotes an index rather than differentiation.) Each recording is a weighted mixture of three sources $g^{(1)}(t)$, $g^{(2)}(t)$, and $g^{(3)}(t)$, each of which plays a musical chord of three notes (thus representable as a sum of cosines as in (1)). One can think of the sources as being produced by three different musical instruments at different places in a room and the recordings $f^{(1)}(t)$, $f^{(2)}(t)$, and $f^{(3)}(t)$ as being taken from three different microphones in the room. (This scenario of instruments and microphones ignores the finite propagation speed of sound, which would also introduce time delays of the sources in the recorded signals. We effectively treat the speed of sound as being instantaneous.) Our goal is now not just to identify the musical notes in the recordings but also to identify how to assign those notes to reconstruct the source signals $g^{(1)}(t)$, $g^{(2)}(t)$, and $g^{(3)}(t)$.
Taking inspiration from earlier, we record $n$ samples $f^{(i)}_j = f^{(i)}(j\,\Delta t)$ of each recording and form each collection of samples into a Hankel matrix:

$H^{(i)} = \begin{bmatrix} f^{(i)}_0 & f^{(i)}_1 & \cdots & f^{(i)}_k \\ f^{(i)}_1 & f^{(i)}_2 & \cdots & f^{(i)}_{k+1} \\ \vdots & \vdots & \ddots & \vdots \\ f^{(i)}_k & f^{(i)}_{k+1} & \cdots & f^{(i)}_{2k} \end{bmatrix} \quad \text{for } i = 1, 2, 3.$
Here comes the crazy part: Stack the Hankelized recordings $H^{(1)}$, $H^{(2)}$, and $H^{(3)}$ as slices of a tensor $\mathcal{H}$. A tensor, in this context, just means a multidimensional array of numbers. Just as a vector is a one-dimensional array and a matrix is a two-dimensional array, a tensor could have any number of dimensions. In our case, we need just three. If we use a MATLAB-esque indexing notation, $\mathcal{H}$ is a $(k+1) \times (k+1) \times 3$ array given by

$\mathcal{H}(:,:,i) = H^{(i)} \quad \text{for } i = 1, 2, 3.$
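In numpy terms, this stacking is a single call; the placeholder slices below stand in for the Hankelized recordings $H^{(1)}$, $H^{(2)}$, $H^{(3)}$:

```python
import numpy as np

# Stack three Hankelized recordings into a (k+1) x (k+1) x 3 tensor, matching
# the MATLAB-style slicing T(:, :, i) = H_i described above.
k = 4
H1, H2, H3 = (np.ones((k + 1, k + 1)) * i for i in (1, 2, 3))  # placeholders
T = np.stack([H1, H2, H3], axis=-1)
print(T.shape)        # (5, 5, 3); T[:, :, 0] is H1
```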
The remarkable thing is that the source signals can be determined (under appropriate conditions) by computing a special kind of Vandermonde decomposition of the tensor $\mathcal{H}$! (Specifically, the required decomposition is a Vandermonde-structured $(L_r, L_r, 1)$-block term decomposition of the tensor $\mathcal{H}$.) Even more cool, this decomposition can be computed using general-purpose software like Tensorlab.
If this sounds interesting, I would encourage you to check out my paper $(L_r, L_r, 1)$-decompositions, sparse component analysis, and the blind separation of sums of exponentials, joint work with Nithin Govindarajan and Lieven De Lathauwer, recently published in the SIAM Journal on Matrix Analysis and Applications. In the paper, we explain what this $(L_r, L_r, 1)$-decomposition is and how applying it to $\mathcal{H}$ can be used to separate mixtures of exponential signals from the resulting Vandermonde structure, an idea originating in the work of De Lathauwer. A very important question for these signal separation problems is that of uniqueness. Given the three sampled recordings (comprising the tensor $\mathcal{H}$), is there just one way of unscrambling the mixtures into different sources, or are there multiple? If there are multiple, then we might have computed the wrong one. If there is just a single unscrambling, though, then we've done our job and unmixed the scrambled signals. The uniqueness of these tensor decompositions is fairly complicated math, and we survey existing results and prove new ones in this paper. (One of our main technical contributions is a new notion of uniqueness of $(L_r, L_r, 1)$-decompositions which we believe is nicely adapted to the signal separation context. Specifically, we prove mathematized versions of the statement "if the source signals are sufficiently different from each other and the measurements are of sufficiently high quality, then the signals can be uniquely separated".)
Conclusions, Loose Ends, and Extensions
The central idea that we've been discussing is how it can be useful to convert between a sequence of observations $f_0, f_1, \ldots, f_{n-1}$ and a special matricization of this sequence into a Hankel matrix (either square, as in (3), or rectangular, as in (7)). By manipulating the Hankel matrix, say, by computing its Vandermonde decomposition (5), we learn something about the original signal, namely a representation of the form (2).
This is a powerful idea which appears implicitly or explicitly throughout various subfields of mathematics, engineering, and computation. As with many other useful ideas, this paradigm admits many natural generalizations and extensions. We saw one already in this post, where we extended the Vandermonde decomposition to the realm of tensors to solve signal separation problems. To end this post, I'll place a few breadcrumbs at the trailheads of these generalizations for anyone curious to learn more, wrapping up a few loose ends on the way.