Big Ideas in Applied Math: The Fast Fourier Transform

The famous law of the instrument states that “when all you have is a hammer, every problem looks like a nail.” In general, this tendency is undesirable: most problems in life are not nails and could better be addressed by a more appropriate tool. However, one can also view the law of the instrument in a more positive framing: when presented with a powerful new tool, it is worth checking how many problems it can solve. The fast Fourier transform (FFT) is one of the most important hammers in an applied mathematician’s toolkit. And it has made many seemingly unrelated problems look like nails.

In this article, I want to consider three related questions:

  1. What is the FFT—what problem is it solving and how does it solve it fast?
  2. How can the ideas behind the FFT be used to solve other problems?
  3. How can the FFT be used as a building block in solving a seemingly unrelated problem?

The FFT is widely considered one of the most important numerical algorithms, and as such every sub-community of applied mathematics is inclined to see the most interesting applications of the FFT as those in their particular area. I am unapologetically victim to this tendency myself, and thus will discuss an application of the FFT that I find particularly beautiful and surprising. In particular, this article won’t focus on the manifold applications of the FFT in signal processing, which I think has been far better covered by authors more familiar with that field.

The Discrete Fourier Transform

At its core, the FFT is a fast algorithm to compute n complex numbers \hat{f}_0,\ldots,\hat{f}_{n-1} given n real or complex numbers f_0,\ldots,f_{n-1}, defined by the formula below. (A note on conventions: the factor of \frac{1}{\sqrt{n}} is not universal. It is common to omit the factor in Eq. (1) and replace the \tfrac{1}{\sqrt{n}} in Eq. (2) with a \tfrac{1}{n}. We prefer this convention as it makes the DFT a unitary transformation. Whichever convention one chooses, it is important that the formulas for the (discrete) Fourier transform and the inverse (discrete) Fourier transform form a pair, in the sense that they are inverses of each other.)

(1)   \begin{equation*} \hat{f}_k = \frac{1}{\sqrt{n}} \sum_{j=0}^{n-1} f_j e^{-(2\pi i/n)jk} \quad \mbox{for } k = 0,1,2,\ldots,n-1. \end{equation*}

The output \hat{f} = (\hat{f}_0,\ldots,\hat{f}_{n-1}) is called the discrete Fourier transform (DFT) of f = (f_0,\ldots,f_{n-1}). The FFT is just one possible algorithm to evaluate the DFT.

The DFT has the following interpretation. Suppose that f is a periodic function defined on the integers with period n, that is, f(j + n) = f(j) for every integer j. Choose f_0,\ldots,f_{n-1} to be the values of f given by f_j = f(j) for j=0,1,2,\ldots,n-1. Then \hat{f}_0,\ldots,\hat{f}_{n-1} give an expression for f as a so-called trigonometric polynomial. (The name “trigonometric polynomial” is motivated by Euler’s formula, which shows that e^{(2\pi i/n)jk} = (\cos (\tfrac{2\pi}{n} j) + i\sin(\tfrac{2\pi}{n} j))^k, so the right-hand side of Eq. (2) is indeed a “polynomial” in the “variables” \sin(\tfrac{2\pi}{n} j) and \cos(\tfrac{2\pi}{n} j).) Specifically,

(2)   \begin{equation*} f(j) = \frac{1}{\sqrt{n}} \sum_{k=0}^{n-1} \hat{f}_k e^{(2\pi i/n)jk} \mbox{ for every integer } j. \end{equation*}

This shows that (1) converts function values f_0,f_1,\ldots,f_{n-1} of a periodic function f to coefficients \hat{f}_0,\hat{f}_1,\ldots,\hat{f}_{n-1} of a trigonometric polynomial representation of f, which can be called the Fourier series of f. Eq. (2), referred to as the inverse discrete Fourier transform, inverts this, converting coefficients \hat{f}_0,\hat{f}_1,\ldots,\hat{f}_{n-1} to function values f_0,f_1,\ldots,f_{n-1}.
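As a quick sanity check of the pairing between Eqs. (1) and (2), here is a minimal NumPy sketch; it assumes the unitary normalization used in this article, which NumPy exposes through the norm="ortho" option:

```python
import numpy as np

rng = np.random.default_rng(0)
f = rng.standard_normal(8)                 # sample values f_0, ..., f_{n-1}
f_hat = np.fft.fft(f, norm="ortho")        # Eq. (1): DFT with the 1/sqrt(n) factor
f_back = np.fft.ifft(f_hat, norm="ortho")  # Eq. (2): inverse DFT
print(np.allclose(f, f_back))              # True: the two formulas are inverses
```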

Fourier series are an immensely powerful tool in applied mathematics. For example, if f represents a sound wave produced by a chord on a piano, its Fourier coefficients \hat{f} represent the intensity of each pitch comprising the chord. An audio engineer could, for example, compute a Fourier series for a piece of music and zero out small Fourier coefficients, thus reducing the amount of data needed to store the piece. This idea is indeed part of the way audio compression standards like MP3 work. In addition to many more related applications in signal processing, the Fourier series is also a natural way to solve differential equations, either by pencil and paper or by computer via so-called Fourier spectral methods. As these applications (and more to follow) show, the DFT is a very useful computation to perform. The FFT allows us to perform this calculation fast.

The Fast Fourier Transform

The first observation to make is that Eq. (1) is a linear transformation: if we think of Eq. (1) as describing a transformation f \mapsto \hat{f}, then we have that \widehat{\alpha f + \beta g} = \alpha \hat{f} + \beta \hat{g}. Recall the crucial fact from linear algebra that every linear transformation can be represented by a matrix-vector multiplication (at least in finite dimensions; the story for infinite-dimensional vector spaces is more complicated). In my experience, one of the most effective algorithm design strategies in applied mathematics is, when presented with a linear transformation, to write its matrix down and poke and prod it to see if there are any patterns in the numbers which can be exploited to give a fast algorithm. Let’s try to do this with the DFT.

We have that \hat{f} = F_n f for some n\times n matrix F_n. (We will omit the subscript n when its value isn’t particularly important to the discussion.) Let us make the somewhat non-standard choice of describing rows and columns of F by zero-indexing, so that the first row of F is row 0 and the last is row n-1. Then we have that \hat{f}_k = \sum_{j=0}^{n-1} F_{kj} f_j. Comparing with Eq. (1), we see that F_{kj} = \tfrac{1}{\sqrt{n}} e^{(-2\pi i/n) jk}. Let us define \omega_n = e^{-2\pi i/n}. Thus, we can write the matrix F out as

(3)   \begin{equation*} F_n = \frac{1}{\sqrt{n}} \begin{bmatrix}\omega_n^0 & \omega_n^0 & \omega_n^0 & \cdots & \omega_n^0 \\\omega_n^{0} & \omega_n^1 & \omega_n^2 & \cdots & \omega_n^{n-1} \\\omega_n^0 & \omega_n^2 & \omega_n^4 & \cdots & \omega_n^{2(n-1)} \\ \omega_n^0 & \omega_n^3 & \omega_n^6 & \cdots & \omega_n^{3(n-1)}\\ \vdots & \vdots & \vdots & \ddots & \vdots \\\omega_n^0 & \omega_n^{n-1} & \omega_n^{2(n-1)} & \cdots & \omega_n^{(n-1)(n-1)}\end{bmatrix} \end{equation*}

This is a highly structured matrix. The patterns in this matrix are more easily seen for a particular value of n. We shall focus on n = 8 in this discussion, but what will follow will generalize in a straightforward way to n any power of two (and in less straightforward ways to arbitrary n—we will return to this point at the end).
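Following this advice, one way to look for patterns is simply to form the matrix numerically and stare at it. A minimal sketch, assuming NumPy (the helper name dft_matrix is ours, and the comparison uses NumPy's norm="ortho" convention for the unitary DFT):

```python
import numpy as np

def dft_matrix(n):
    """Write the DFT matrix of Eq. (3) down explicitly so we can look for patterns."""
    j = np.arange(n)
    omega = np.exp(-2j * np.pi / n)
    return omega ** np.outer(j, j) / np.sqrt(n)   # entry (k, j) is omega_n^{jk} / sqrt(n)

F8 = dft_matrix(8)
# Sanity check: transforming the columns of the identity with the library DFT gives the same matrix
print(np.allclose(F8, np.fft.fft(np.eye(8), axis=0, norm="ortho")))  # True
```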

Instantiating Eq. (3) with n = 8 (and writing \omega = \omega_8), we have

(4)   \begin{equation*} F_8 = \frac{1}{\sqrt{8}} \begin{bmatrix}\omega^{0} &\omega^{0} &\omega^{0} &\omega^{0} &\omega^{0} &\omega^{0} &\omega^{0} &\omega^{0} \\ \omega^{0} &\omega^{1} &\omega^{2} &\omega^{3} &\omega^{4} &\omega^{5} &\omega^{6} &\omega^{7} \\ \omega^{0} &\omega^{2} &\omega^{4} &\omega^{6} &\omega^{8} &\omega^{10} &\omega^{12} &\omega^{14} \\ \omega^{0} &\omega^{3} &\omega^{6} &\omega^{9} &\omega^{12} &\omega^{15} &\omega^{18} &\omega^{21} \\ \omega^{0} &\omega^{4} &\omega^{8} &\omega^{12} &\omega^{16} &\omega^{20} &\omega^{24} &\omega^{28} \\ \omega^{0} &\omega^{5} &\omega^{10} &\omega^{15} &\omega^{20} &\omega^{25} &\omega^{30} &\omega^{35} \\ \omega^{0} &\omega^{6} &\omega^{12} &\omega^{18} &\omega^{24} &\omega^{30} &\omega^{36} &\omega^{42} \\ \omega^{0} &\omega^{7} &\omega^{14} &\omega^{21} &\omega^{28} &\omega^{35} &\omega^{42} &\omega^{49} \\ \end{bmatrix} \end{equation*}

To fully exploit the patterns in this matrix, we note that \omega represents a clockwise rotation of the complex plane by an eighth of the way around the circle. So, for example, \omega^{21} is twenty-one eighths of a turn, which is the same as 21-16 = 5 eighths of a turn. Thus \omega^{21} = \omega^5 and, more generally, \omega^m = \omega^{m \operatorname{mod} 8}. This allows us to simplify as follows:

(5)   \begin{equation*} F_8 = \frac{1}{\sqrt{8}}\begin{bmatrix}1 &1 &1 &1 &1 &1 &1 &1 \\ 1 &\omega^{1} &\omega^{2} &\omega^{3} &\omega^{4} &\omega^{5} &\omega^{6} &\omega^{7} \\ 1 &\omega^{2} &\omega^{4} &\omega^{6} &1 &\omega^{2} &\omega^{4} &\omega^{6} \\ 1 &\omega^{3} &\omega^{6} &\omega^{1} &\omega^{4} &\omega^{7} &\omega^{2} &\omega^{5} \\ 1 &\omega^{4} &1 &\omega^{4} &1 &\omega^{4} &1 &\omega^{4} \\ 1 &\omega^{5} &\omega^{2} &\omega^{7} &\omega^{4} &\omega^{1} &\omega^{6} &\omega^{3} \\ 1 &\omega^{6} &\omega^{4} &\omega^{2} &1 &\omega^{6} &\omega^{4} &\omega^{2} \\ 1 &\omega^{7} &\omega^{6} &\omega^{5} &\omega^{4} &\omega^{3} &\omega^{2} &\omega^{1} \\ \end{bmatrix} \end{equation*}

Now notice that, since \omega represents a clockwise rotation of an eighth of the way around the circle, \omega^2 represents a quarter turn of the circle. This fact leads to the surprising observation that we can actually find the DFT matrix F_4 for n = 4 hidden inside the DFT matrix F_8 for n = 8!

To see this, rearrange the columns of F_8 so that the even-indexed columns come first, followed by the odd-indexed columns. In matrix language, this is represented by right-multiplying by an appropriate permutation matrix \Pi (in fact, this permutation has a special name: the perfect shuffle):

(6)   \begin{equation*} F_8 \Pi = \frac{1}{\sqrt{8}}\begin{bmatrix} 1 &1 &1 &1 &1 &1 &1 &1 \\ 1 &\omega^{2} &\omega^{4} &\omega^{6} &\omega^{1} &\omega^{3} &\omega^{5} &\omega^{7} \\ 1 &\omega^{4} &1 &\omega^{4} &\omega^2 &\omega^{6} &\omega^{2} &\omega^{6} \\ 1 &\omega^{6} &\omega^{4} &\omega^{2} &\omega^{3} &\omega^{1} &\omega^{7} &\omega^{5} \\ 1 &1 &1 &1 &\omega^4 &\omega^{4} &\omega^4 &\omega^{4} \\ 1 &\omega^{2} &\omega^{4} &\omega^{6} &\omega^{5} &\omega^{7} &\omega^{1} &\omega^{3} \\ 1 &\omega^{4} &1 &\omega^{4} &\omega^{6} &\omega^{2} & \omega^6 &\omega^{2} \\ 1 &\omega^{6} &\omega^{4} &\omega^2 & \omega^7 & \omega^5 & \omega^3 & \omega^1 \end{bmatrix} \end{equation*}

The top-left 4\times 4 sub-block is precisely F_4 (up to scaling). In fact, defining the diagonal matrix \Omega_4 = \operatorname{diag}(1,\omega,\omega^2,\omega^3) (called the twiddle factor) and noting that \omega^4 = -1, we have

(7)   \begin{equation*} F_8 \Pi = \frac{1}{\sqrt{2}} \begin{bmatrix} F_4 & \Omega_4 F_4 \\ F_4 & -\Omega_4 F_4 \end{bmatrix}. \end{equation*}

The matrix F_8 is entirely built up of simple scalings of the smaller DFT matrix F_4! This suggests the following decomposition to compute F_8x:

(8)   \begin{equation*} F_8 x = (F_8 \Pi)(\Pi^\top x) = \frac{1}{\sqrt{2}} \begin{bmatrix} F_4 & \Omega_4 F_4 \\ F_4 & -\Omega_4 F_4 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} \frac{F_4 x_1 + \Omega_4 (F_4 x_2)}{\sqrt{2}} \\ \frac{F_4 x_1 - \Omega_4 (F_4 x_2)}{\sqrt{2}}\end{bmatrix}. \end{equation*}

Here x_1 represents the even-indexed entries of x and x_2 the odd-indexed entries. Thus, we see that we can evaluate F_8 x by evaluating the two expressions F_4x_1 and F_4x_2. We have broken our problem into two smaller problems, which we then recombine into a solution of our original problem.
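To make the recombination step concrete, here is a minimal numerical check of Eq. (8), assuming NumPy and the unitary normalization (the DFT matrices are obtained by transforming the columns of the identity):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(8) + 1j * rng.standard_normal(8)
F8 = np.fft.fft(np.eye(8), axis=0, norm="ortho")            # unitary DFT matrices
F4 = np.fft.fft(np.eye(4), axis=0, norm="ortho")
Omega4 = np.diag(np.exp(-2j * np.pi / 8) ** np.arange(4))   # twiddle factor
x1, x2 = x[0::2], x[1::2]                                   # even- and odd-indexed entries
top = (F4 @ x1 + Omega4 @ (F4 @ x2)) / np.sqrt(2)
bottom = (F4 @ x1 - Omega4 @ (F4 @ x2)) / np.sqrt(2)
print(np.allclose(np.concatenate([top, bottom]), F8 @ x))   # True: Eq. (8) holds
```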

How then, do we compute the smaller DFTs F_4x_1 and F_4x_2? We just use the same trick again, breaking, for example, the product F_4x_1 into further subcomputations F_2x_{11} and F_2x_{12}. Performing this process one more time, we need to evaluate expressions of the form F_1x_{111}, which are simply given by F_1 x_{111}= x_{111} since the matrix F_1 is just a 1\times 1 matrix whose single entry is 1.

This procedure is an example of a recursive algorithm: we designed an algorithm which solves a problem by breaking it down into one or more smaller problems, solves each of the smaller problems using this same algorithm, and then reassembles the solutions of the smaller problems into a solution of the original problem. Eventually, we break our problems into such small pieces that they can be solved directly; this is referred to as the base case of the recursion. (In our case, the base case is multiplication by F_1.) Algorithms using recursion in this way are referred to as divide-and-conquer algorithms.

Let us summarize the recursive procedure we’ve developed. We want to compute the DFT y = F_n x where n is a power of two. First, we recursively compute F_{n/2}x_1 and F_{n/2}x_2 using this same procedure. Next, we combine these computations to evaluate y = F_n x by the formula

(9)   \begin{equation*} F_n x = \begin{bmatrix} \frac{F_{n/2} x_1 + \Omega_{n/2} (F_{n/2} x_2)}{\sqrt{2}} \\ \frac{F_{n/2} x_1 - \Omega_{n/2} (F_{n/2} x_2)}{\sqrt{2}}\end{bmatrix}. \end{equation*}

This procedure is the famous fast Fourier transform (FFT), whose modern incarnation was presented by Cooley and Tukey in 1965 with lineage that can be traced back to work by Gauss in the early 1800s. There are many variants of the FFT using similar ideas.
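As a minimal sketch (not a production implementation), the recursion of Eq. (9) can be written in a few lines of Python; it assumes the length of x is a power of two and uses the unitary normalization, so it agrees with np.fft.fft(x, norm="ortho"):

```python
import numpy as np

def fft_radix2(x):
    """Recursive radix-2 FFT following Eq. (9), for len(x) a power of two."""
    x = np.asarray(x, dtype=complex)
    n = len(x)
    if n == 1:
        return x                                   # base case: F_1 x = x
    y1 = fft_radix2(x[0::2])                       # F_{n/2} x_1 (even-indexed entries)
    y2 = fft_radix2(x[1::2])                       # F_{n/2} x_2 (odd-indexed entries)
    twiddle = np.exp(-2j * np.pi * np.arange(n // 2) / n)   # diagonal of Omega_{n/2}
    return np.concatenate([(y1 + twiddle * y2) / np.sqrt(2),
                           (y1 - twiddle * y2) / np.sqrt(2)])

x = np.random.default_rng(2).standard_normal(16)
print(np.allclose(fft_radix2(x), np.fft.fft(x, norm="ortho")))  # True
```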

Let us see why the FFT is considered “fast” by analyzing its operation count. As is common for divide-and-conquer algorithms, the number of operations for computing F_n x using the FFT can be determined by solving a certain recurrence relation. Let T(n) be the number of operations required by the FFT. Then the cost of computing F_n x consists of

  • proportional-to-n operations (or \mathcal{O}(n) operations, in computer science language; \mathcal{O}(\cdot) refers to big-O notation, and saying an algorithm takes \mathcal{O}(n\log n) operations means, more or less, that the algorithm takes less than some multiple of n\log n operations to complete) to:
    • add, subtract, and scale vectors and
    • multiply by the diagonal matrix \Omega_{n/2} = \operatorname{diag}(1,\omega_{n},\ldots,\omega_{n}^{n/2-1}), and
  • two recursive computations of F_{n/2}x_j for j= 1,2, each of which requires T(n/2) operations.

This gives us the recurrence relation

(10)   \begin{equation*} T(n) = 2T(n/2) + \mathcal{O}(n). \end{equation*}

Solving recurrences is a delicate art in general, but a wide class of recurrences are immediately solved by the flexible master theorem for recurrences. Appealing to this result, we deduce that the FFT requires T(n) = \mathcal{O}(n\log n) operations. This is a dramatic improvement over the \mathcal{O}(n^2) operations needed to compute F_n x directly using Eq. (1). This improvement in speed is what makes the FFT “fast”.
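For readers who prefer not to invoke the master theorem, this particular recurrence can also be unrolled directly. Writing the \mathcal{O}(n) term as cn and assuming n is a power of two,

\begin{equation*} T(n) = 2T(n/2) + cn = 4T(n/4) + 2cn = \cdots = 2^k T(n/2^k) + kcn = n\,T(1) + cn\log_2 n, \end{equation*}

where we have unrolled through k = \log_2 n levels until reaching the base case. Since T(1) = \mathcal{O}(1), this gives T(n) = \mathcal{O}(n\log n).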

Extending the FFT Idea

The FFT is a brilliant algorithm. It exploits the structure of the discrete Fourier transform problem Eq. (1) to achieve dramatically lower operation counts. And, as we shall get a taste of below, the FFT is useful in a surprisingly broad range of applications. Given the success of the FFT, we are naturally led to the question: can we learn from our success with the FFT to develop fast algorithms for other problems?

I think the FFT speaks to the power of a simple problem-solving strategy for numerical algorithm design: whenever you have a linear transformation, write it as a matrix-vector product; whenever you have a matrix, write it down and see if there are any patterns. (Often, rearranging the matrix will be necessary to see any patterns. As mentioned earlier, the FFT also exemplifies a typical design pattern in general, not necessarily numerical, algorithm design: the divide-and-conquer strategy. In the divide-and-conquer strategy, one finds a clever way of dividing a problem into multiple subproblems, conquers (solves) each, and then recombines the solutions to the subproblems into a solution of the larger problem. The challenge with such algorithms is often finding a way of doing the recombination step, which usually relies on some clever insight. Other instances of divide-and-conquer algorithms include merge sort and Karatsuba’s integer multiplication algorithm.) We often like to present mathematics as if each step of a derivation follows almost effortlessly from the last, from a firm basis of elegant mathematical intuition. Often, however, noticing patterns by staring at symbols on a page can be more effective than reasoning grounded in intuition. Once the pattern has been discovered, intuition and elegance sometimes follow quickly behind.

The most natural generalization of the FFT is the fast inverse discrete Fourier transform, a fast algorithm to compute the inverse discrete Fourier transform Eq. (2). The inverse FFT is quite an easy generalization of the FFT presented in the previous section; it is a good exercise to see if you can mimic the development above to come up with this generalization yourself. The FFT can also be generalized to other discrete trigonometric transforms and to 2D and 3D discrete Fourier transforms.

I want to consider a problem more tangentially related to the FFT: the evaluation of expressions of the form y = (A \otimes B)x, where A is an m_1\times n_1 matrix, B is an m_2\times n_2 matrix, x is a vector of length n_1n_2, and \otimes denotes the Kronecker product. For the uninitiated, the Kronecker product of A and B is an m_1m_2\times n_1n_2 matrix defined as the block matrix

(11)   \begin{equation*} A\otimes B = \begin{bmatrix} A_{11} B & A_{12} B & \cdots & A_{1n_1} B\\ A_{21}B & A_{22} B & \cdots & A_{2n_1} B \\ \vdots & \vdots & \ddots & \vdots \\ A_{m_1 1}  B & A_{m_1 2} B & \cdots & A_{m_1 n_1} B\end{bmatrix}. \end{equation*}

We could just form this m_1 m_2\times n_1n_2 matrix and compute the matrix-vector product y = (A\otimes B)x directly, but this takes a hefty \mathcal{O}(m_1m_2n_1n_2) operations. (Equally or perhaps more problematically, forming the matrix also takes \mathcal{O}(m_1m_2n_1n_2) space.) We can do better.

The insight is much the same as with the FFT: scaled copies of the matrix B are embedded in A\otimes B. In the FFT, we needed to rearrange the columns of the DFT matrix F_n to see this; for the Kronecker product, the pattern is evident in the natural ordering. To exploit this fact, chunk the vectors x and y into pieces x_1,x_2,\ldots,x_{n_1} and y_1,y_2,\ldots,y_{m_1} of lengths n_2 and m_2, respectively, so that our matrix-vector product can be written as a conformal partitioning (so called because one can multiply the block matrices using the ordinary matrix product formula, treating the block entries as if they were simple numbers):

(12)   \begin{equation*} \underbrace{\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_{m_1} \end{bmatrix}}_{=y} = \underbrace{\begin{bmatrix} A_{11} B & A_{12} B & \cdots & A_{1n_1} B\\ A_{21}B & A_{22} B & \cdots & A_{2n_1} B \\ \vdots & \vdots & \ddots & \vdots \\ A_{m_1 1}  B & A_{m_1 2} B & \cdots & A_{m_1 n_1} B\end{bmatrix}}_{=A\otimes B} \underbrace{\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_{n_1} \end{bmatrix}}_{=x}. \end{equation*}

To compute this product efficiently, we proceed in two steps. First, we compute the products Bx_1,Bx_2,\ldots,Bx_{n_1} which takes time \mathcal{O}(m_2n_2\cdot n_1) in total. Next, we compute each component y_1,\ldots,y_{m_1} by using the formula

(13)   \begin{equation*} y_j = A_{j1} (Bx_1) + A_{j2} (Bx_2) + \cdots + A_{jn_1} (Bx_{n_1}), \end{equation*}

which takes a total of \mathcal{O}(m_1 n_1 \cdot m_2) operations to compute all of the y_j’s. This leads to a total operation count of \mathcal{O}(m_2n_1(m_1+n_2)) for computing the matrix-vector product y = (A\otimes B)x, much better than our earlier operation count of \mathcal{O}(m_1m_2n_1n_2). (There is another way of interpreting this algorithm. If we interpret x and y as the vectorizations of n_2\times n_1 and m_2 \times m_1 matrices X and Y, then we have Y = BXA^\top. The algorithm we presented is equivalent to evaluating this matrix triple product in the order Y = (BX)A^\top. This also shows that the algorithm could be further accelerated using Strassen-style fast matrix multiplication algorithms.)
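Here is a minimal sketch of this fast Kronecker-product multiply, assuming NumPy (the helper name kron_matvec is ours); it is exactly the Y = (BX)A^\top interpretation described in the aside above:

```python
import numpy as np

def kron_matvec(A, B, x):
    """Evaluate y = (A kron B) x without forming the Kronecker product."""
    m1, n1 = A.shape
    m2, n2 = B.shape
    X = x.reshape(n1, n2).T        # n2 x n1 matrix whose columns are the chunks x_1, ..., x_{n_1}
    Y = B @ X @ A.T                # costs O(m2*n2*n1 + m1*n1*m2) operations
    return Y.T.reshape(-1)         # stack the columns y_1, ..., y_{m_1}

rng = np.random.default_rng(3)
A, B = rng.standard_normal((3, 4)), rng.standard_normal((5, 6))
x = rng.standard_normal(4 * 6)
print(np.allclose(kron_matvec(A, B, x), np.kron(A, B) @ x))  # True
```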

While this idea might seem quite far from the FFT, applying it iteratively allows one to rapidly evaluate a close cousin of the DFT called the Hadamard-Walsh transform. Using the Kronecker product, the Hadamard-Walsh transform \hat{f} of a vector f is defined to be

(14)   \begin{equation*} \hat{f} = \left( \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \otimes \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \otimes \cdots \otimes \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \right) f. \end{equation*}

If one applies the Kronecker product trick we developed repeatedly, this gives an algorithm to evaluate the Hadamard-Walsh transform of a vector f of length n in \mathcal{O}(n \log n) operations, just like the FFT.
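A minimal sketch of this \mathcal{O}(n\log n) procedure, assuming NumPy and a vector length that is a power of two (the in-place butterfly below is one standard way of organizing the repeated Kronecker-product trick, and the helper name fast_walsh_hadamard is ours):

```python
import numpy as np
from functools import reduce

def fast_walsh_hadamard(f):
    """Hadamard-Walsh transform with the 1/sqrt(2) normalization of Eq. (14)."""
    f = np.asarray(f, dtype=float).copy()
    n, h = len(f), 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = f[j], f[j + h]
                f[j], f[j + h] = (a + b) / np.sqrt(2), (a - b) / np.sqrt(2)
        h *= 2
    return f

H = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)
f = np.random.default_rng(4).standard_normal(8)
print(np.allclose(fast_walsh_hadamard(f), reduce(np.kron, [H, H, H]) @ f))  # True
```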

The Hadamard-Walsh transform can be thought of as a generalization of the discrete Fourier transform to Boolean functions, which play an integral role in computer science. The applications of the Hadamard-Walsh transform are numerous and varied, from voting systems to quantum computing. And this is really just the tip of the iceberg: the ideas behind the FFT (and related ideas from the fast multipole method) allow for the rapid evaluation of a large number of transformations, some of which are connected by deep and general theories.

Resisting the temptation to delve into these interesting subjects in any more depth, we return to our main idea: when presented with a linear transformation, write it as a matrix-vector product; whenever you have a matrix, write it down and see if there are any patterns. The FFT exploits one such pattern, noticing that (after a reordering) the DFT matrix contains many scaled copies of a smaller matrix. Rapidly evaluating expressions of the form y = (A\otimes B)x involves an even simpler application of the same idea. But there are many other patterns that can be exploited: sparsity, (approximate) low rank, off-diagonal blocks of (approximately) low rank, and displacement structure are other examples. Very often in applied math, our problems have additional structure that can be exploited to solve them much faster, and sometimes finding that structure is as easy as just trying to look for it.

An Application of the FFT

A discussion of the FFT would be incomplete without exploring at least one reason why you’d want to compute the discrete Fourier transform. To focus our attention, let us consider another linear algebraic calculation which appears to have no relation to the FFT on its face: computing a matrix-vector product with a Toeplitz matrix. A matrix T is said to be Toeplitz if it has the following structure:

(15)   \begin{equation*} T = \begin{bmatrix} t_0 & t_1 & t_2 & \cdots & t_{n-1} \\ t_{-1} & t_0 & t_1 & \cdots & t_{n-2} \\ t_{-2} & t_{-1} & t_0 & \cdots & t_{n-3} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ t_{-(n-1)} & t_{-(n-2)} & t_{-(n-3)} & \cdots & t_0 \end{bmatrix}. \end{equation*}

Toeplitz matrices and their relatives appear widely across applications of applied mathematics including control and systems theory, time series, numerical partial differential equations, and signal processing.

We seek to compute the matrix-vector product y = Tx. Let us begin by considering a special case of a Toeplitz matrix, a circulant matrix. A circulant matrix C has the form

(16)   \begin{equation*} \begin{bmatrix} c_0 & c_{n-1} & c_{n-2} & \cdots & c_1 \\ c_1 & c_0 & c_{n-1} & \cdots & c_2\\ c_2 & c_1 & c_0 & \cdots & c_3 \\ \vdots & \vdots & \vdots &\ddots & \vdots \\ c_{n-1} & c_{n-2} & c_{n-3} & \cdots & c_0 \end{bmatrix}. \end{equation*}

By direct computation, the matrix-vector product y = Cx is given by

(17)   \begin{equation*} y_k = \sum_{j=0}^{n-1}x_j c_{\operatorname{mod}(k-j,n)}. \end{equation*}

A surprising and non-obvious fact is that the circulant matrix C is diagonalized by the discrete Fourier transform. Specifically, we have C = F_n^* \operatorname{diag}(\sqrt{n} F_n c) F_n where c=(c_0,\ldots,c_{n-1}). This gives a fast algorithm to compute y = Cx in time \mathcal{O}(n\log n): compute the DFTs of x and c and multiply them together entrywise, take the inverse Fourier transform, and scale by \sqrt{n}.
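A minimal sketch in NumPy (the helper name circulant_matvec is ours): with NumPy's default non-unitary FFT normalization, the \sqrt{n} scalings cancel and the algorithm is simply a pointwise product of DFTs followed by an inverse DFT.

```python
import numpy as np

def circulant_matvec(c, x):
    """Evaluate y = Cx for the circulant matrix C of Eq. (16) in O(n log n) time."""
    return np.fft.ifft(np.fft.fft(c) * np.fft.fft(x))

n = 6
rng = np.random.default_rng(5)
c, x = rng.standard_normal(n), rng.standard_normal(n)
C = np.array([[c[(k - j) % n] for j in range(n)] for k in range(n)])  # build Eq. (16) directly
print(np.allclose(circulant_matvec(c, x), C @ x))  # True
```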

There is a connection with signal processing and differential equations that may help shed light on why this technique works, for those familiar with those areas. In the signal processing context, the matrix-vector product y = Cx can be interpreted as the discrete convolution of x with c (see Eq. (17)), which is a natural extension of the convolution f* g of two functions f and g on the real line. It is an important fact that the Fourier transform of a convolution is the same as multiplication of the Fourier transforms: \widehat{f * g} = \hat{f} \cdot \hat{g} (up to a possible normalizing constant); a related identity also holds for the Laplace transform. The fact that the DFT diagonalizes a circulant matrix is just the analog of this fact for the discrete Fourier transform and the discrete convolution.

This fast algorithm for circulant matrix-vector products is already extremely useful. One can naturally reframe the problems of multiplying integers and polynomials as discrete convolutions, which can then be computed rapidly with the \mathcal{O}(n\log n) algorithm for circulant matrix-vector products. This application is a popular motivating example in introductions to the FFT.

Let’s summarize where we’re at. We are interested in computing the Toeplitz matrix-vector product y = Tx. We don’t know how to do this for a general Toeplitz matrix yet, but we can do it for a special Toeplitz matrix called a circulant matrix C. By use of the FFT, we can compute the circulant matrix-vector product y = Cx in \mathcal{O}(n\log n) operations.

We can now leverage what we’ve done with circulant matrices to accelerate Toeplitz matrix-vector products. The trick is very simple: embedding. We construct a big circulant matrix which contains the Toeplitz matrix as a sub-matrix and then use multiplications by the bigger matrix to compute multiplications by the smaller matrix.

Consider the following circulant matrix, which contains T, as defined in Eq. (15), as a sub-matrix in its top-left corner:

(18)   \begin{equation*} C_T = \begin{bmatrix} t_0 & t_1  & \cdots & t_{n-1} & 0 & 0 & \cdots \\ t_{-1} & t_0 & \cdots & t_{n-2} & t_{n-1} & 0 & \cdots\\ \vdots & \vdots & \ddots & \vdots & \vdots &\vdots &\ddots\\ t_{-(n-1)} & t_{-(n-2)}  & \cdots & t_0 & t_1 & t_2&\cdots\\ 0 & t_{-(n-1)}&\cdots&t_{-1} & t_0 & t_1& \cdots\\ 0 & 0 & \cdots & t_{-2} & t_{-1} & t_0& \cdots\\ \vdots& \vdots & \ddots & \vdots & \vdots & \vdots &\ddots\\ 0 & 0 &\cdots&&&& \cdots\\ t_{n-1} & 0 & \cdots&&&&\cdots\\ t_{n-2} & t_{n-1} & \cdots &&&&\cdots\\ \vdots & \vdots & \ddots & \vdots & \vdots & \vdots &\ddots\\ t_1 & t_2 & \cdots & 0 & 0 & 0 & \cdots \end{bmatrix} \end{equation*}

This matrix is hard to write out, but essentially we pad the Toeplitz matrix with extra zeros to embed it into a circulant matrix. The “c” vector for this larger circulant matrix is obtained from the parameters t_0,t_{\pm 1},\ldots,t_{\pm (n-1)} of the Toeplitz matrix Eq. (15) by c = (t_0,t_{-1},\ldots,t_{-(n-1)},0,\ldots,0,t_{n-1},t_{n-2},\ldots,t_1).

Here comes another clever observation: we can choose the number of padding zeros so that the size of C_T is exactly a power of two. This is useful because it allows us to compute matrix-vector products w=C_Tz with the power-of-two FFT described above, which we know is fast.

Finally, let’s close the loop and use fast multiplications with C_T to compute fast multiplications with T. We wish to compute the product y = Tx fast. To do this, we extend the vector x into a larger vector z by padding with zeros to get

(19)   \begin{equation*} C_T z = \begin{bmatrix} T & \star \\ \star & \star \end{bmatrix} \begin{bmatrix} x \\ 0 \end{bmatrix} = \begin{bmatrix} y \\ \star \end{bmatrix}, \end{equation*}

where we use \star to denote matrix or vector entries which are immaterial to us. We compute Tx by using our fast algorithm to compute C_T z = w and then discarding everything but the first n entries of w to obtain y. If one is careful to analyze how much padding is needed to make this work, one sees that this algorithm also takes only \mathcal{O}(n \log n) operations. Thus, we’ve achieved our goal: we can compute Toeplitz matrix-vector products in a fast \mathcal{O}(n\log n) operations.
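A minimal sketch of this embedding, assuming NumPy (the helper name toeplitz_matvec and its calling convention are ours): col holds the first column (t_0, t_{-1}, \ldots, t_{-(n-1)}) of T and row its first row (t_0, t_1, \ldots, t_{n-1}).

```python
import numpy as np

def toeplitz_matvec(col, row, x):
    """Evaluate y = Tx for the Toeplitz matrix of Eq. (15) by circulant embedding."""
    n = len(x)
    N = 1 << (2 * n - 1).bit_length()                 # pad C_T up to a power-of-two size
    c = np.zeros(N, dtype=complex)
    c[:n] = col                                       # t_0, t_{-1}, ..., t_{-(n-1)}
    c[N - n + 1:] = row[:0:-1]                        # t_{n-1}, ..., t_1 wrap around the end
    z = np.zeros(N, dtype=complex)
    z[:n] = x                                         # zero-padded right-hand side
    w = np.fft.ifft(np.fft.fft(c) * np.fft.fft(z))    # circulant product C_T z via the FFT
    return w[:n]                                      # keep only the first n entries

n = 5
rng = np.random.default_rng(6)
col, row = rng.standard_normal(n), rng.standard_normal(n)
row[0] = col[0]                                       # the diagonal entry t_0 must agree
T = np.array([[row[j - k] if j >= k else col[k - j] for j in range(n)] for k in range(n)])
x = rng.standard_normal(n)
print(np.allclose(toeplitz_matvec(col, row, x), T @ x))  # True
```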

Finally, let us bring this full circle and see a delightfully self-referential use of this algorithm: we can use the FFT-accelerated Toeplitz matrix-vector multiply to compute the DFT itself. Recall that the FFT algorithm we presented above was particularized to n which are powers of 2. There are natural generalizations of the FFT, along the lines of what we did above, to more general n which are highly composite and possess many small prime factors. But what if we want to evaluate the DFT for n which is a large prime?

Recall that the DFT matrix F_n has jkth entry (F_n)_{jk} = \tfrac{1}{\sqrt{n}} \omega_n^{jk}. We now employ a clever trick. Let D be a diagonal matrix with the jjth entry equal to \omega_n^{-\tfrac{1}{2} j^2}. Then, defining T = D F_n D, we have that T_{jk} = \tfrac{1}{\sqrt{n}} \omega_n^{-\tfrac{1}{2} j^2 + jk - \tfrac{1}{2} k^2} = \tfrac{1}{\sqrt{n}} \omega_n^{-\tfrac{1}{2}(j-k)^2}, which means T is a Toeplitz matrix! (Writing out the matrix T entrywise may be helpful to see this.)

Thus, we can compute the DFT \hat{f} = F_n f for any size n by evaluating the DFT as \hat{f} = D^{-1}(T(D^{-1} f)), where the product Tx is computed using the fast Toeplitz matrix-vector product. Since our fast Toeplitz matrix-vector product only requires us to evaluate power-of-two DFTs, this technique allows us to evaluate DFTs of arbitrary size n in only \mathcal{O}(n\log n) operations.
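Putting the pieces together, here is a minimal sketch of this chirp trick (often attributed to Bluestein), assuming NumPy; it evaluates the unitary DFT of any length, prime or not, using only power-of-two FFTs via the same circulant embedding as above. The helper name dft_any_size is ours.

```python
import numpy as np

def dft_any_size(f):
    """Evaluate the unitary DFT of arbitrary length n via F_n = D^{-1} T D^{-1}."""
    f = np.asarray(f, dtype=complex)
    n = len(f)
    j = np.arange(n)
    d_inv = np.exp(-1j * np.pi * j**2 / n)            # diagonal of D^{-1}, i.e. omega_n^{j^2/2}
    t = np.exp(1j * np.pi * j**2 / n) / np.sqrt(n)    # t_m = omega_n^{-m^2/2}/sqrt(n); T is symmetric
    N = 1 << (2 * n - 1).bit_length()                 # power-of-two circulant size
    c = np.zeros(N, dtype=complex)
    c[:n] = t
    c[N - n + 1:] = t[:0:-1]                          # wrap t_{n-1}, ..., t_1 around the end
    z = np.zeros(N, dtype=complex)
    z[:n] = d_inv * f                                 # D^{-1} f, zero-padded
    y = np.fft.ifft(np.fft.fft(c) * np.fft.fft(z))[:n]   # Toeplitz product T (D^{-1} f)
    return d_inv * y                                  # hat{f} = D^{-1} T (D^{-1} f)

f = np.random.default_rng(7).standard_normal(17)      # length 17 is prime
print(np.allclose(dft_any_size(f), np.fft.fft(f, norm="ortho")))  # True
```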

Upshot: The discrete Fourier transform (DFT) is an important computation which occurs all across applied mathematics. The fast Fourier transform (FFT) reduces the operation count of evaluating the DFT of a vector of length n to proportional to n \log n, down from proportional to n^2 for direct evaluation. The FFT is an example of a broader matrix algorithm design strategy of looking for patterns in the numbers in a matrix and exploiting these patterns to reduce computation. The FFT can often have surprising applications, such as allowing for rapid computations with Toeplitz matrices.
