Big Ideas in Applied Math: The Fast Fourier Transform

The famous law of the instrument states that “when all you have is a hammer, every problem looks like a nail.” In general, this tendency is undesirable: most problems in life are not nails and could better be addressed by a more appropriate tool. However, one can also view the law of the instrument in a more positive framing: when presented with a powerful new tool, it is worth checking how many problems it can solve. The fast Fourier transform (FFT) is one of the most important hammers in an applied mathematician’s toolkit. And it has made many seemingly unrelated problems look like nails.

In this article, I want to consider three related questions:

  1. What is the FFT—what problem is it solving and how does it solve it fast?
  2. How can the ideas behind the FFT be used to solve other problems?
  3. How can the FFT be used as a building block in solving a seemingly unrelated problem?

The FFT is widely considered one of the most important numerical algorithms, and as such every sub-community of applied mathematics is inclined to see the most interesting applications of the FFT as those in their particular area. I am unapologetically victim to this tendency myself, and thus will discuss an application of the FFT that I find particularly beautiful and surprising. In particular, this article won’t focus on the manifold applications of the FFT in signal processing, which I think has been far better covered by authors more familiar with that field.

The Discrete Fourier Transform

At its core, the FFT is a fast algorithm to compute n complex numbers \hat{f}_0,\ldots,\hat{f}_{n-1} given n real or complex numbers f_0,\ldots,f_{n-1}, defined by the formula below. (A note on conventions: the factor of \frac{1}{\sqrt{n}} is not universal. It is common to omit the factor in Eq. (1) and replace the \tfrac{1}{\sqrt{n}} in Eq. (2) with a \tfrac{1}{n}. We prefer this convention as it makes the DFT a unitary transformation. Whichever convention one chooses, it is important that the formulas for the (discrete) Fourier transform and the inverse (discrete) Fourier transform form a pair, in the sense that they are inverses of each other.)

(1)   \begin{equation*} \hat{f}_k = \frac{1}{\sqrt{n}} \sum_{j=0}^{n-1} f_j e^{-(2\pi i/n)jk} \quad \mbox{for } k = 0,1,2,\ldots,n-1. \end{equation*}

The output \hat{f} = (\hat{f}_0,\ldots,\hat{f}_{n-1}) is called the discrete Fourier transform (DFT) of f = (f_0,\ldots,f_{n-1}). The FFT is just one possible algorithm to evaluate the DFT.

The DFT has the following interpretation. Suppose that f is a periodic function defined on the integers with period n, that is, f(j + n) = f(j) for every integer j. Choose f_0,\ldots,f_{n-1} to be the values of f given by f_j = f(j) for j=0,1,2,\ldots,n-1. Then \hat{f}_0,\ldots,\hat{f}_{n-1} give an expression for f as a so-called trigonometric polynomial. (The name “trigonometric polynomial” is motivated by Euler’s formula, which shows that e^{(2\pi i/n)jk} = (\cos (\tfrac{2\pi}{n} j) + i\sin(\tfrac{2\pi}{n} j))^k, so the right-hand side of Eq. (2) is indeed a “polynomial” in the “variables” \sin(\tfrac{2\pi}{n} j) and \cos(\tfrac{2\pi}{n} j).) Specifically,

(2)   \begin{equation*} f(j) = \frac{1}{\sqrt{n}} \sum_{k=0}^{n-1} \hat{f}_k e^{(2\pi i/n)jk} \mbox{ for every integer } j. \end{equation*}

This shows that (1) converts function values f_0,f_1,\ldots,f_{n-1} of a periodic function f to coefficients \hat{f}_0,\hat{f}_1,\ldots,\hat{f}_{n-1} of a trigonometric polynomial representation of f, which can be called the Fourier series of f. Eq. (2), referred to as the inverse discrete Fourier transform, inverts this, converting coefficients \hat{f}_0,\hat{f}_1,\ldots,\hat{f}_{n-1} to function values f_0,f_1,\ldots,f_{n-1}.
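As a quick sanity check of the pairing between Eqs. (1) and (2), here is a minimal NumPy sketch; it assumes the unitary normalization used in this article, which NumPy exposes through the norm="ortho" option:

```python
import numpy as np

rng = np.random.default_rng(0)
f = rng.standard_normal(8)                 # sample values f_0, ..., f_{n-1}
f_hat = np.fft.fft(f, norm="ortho")        # Eq. (1): DFT with the 1/sqrt(n) factor
f_back = np.fft.ifft(f_hat, norm="ortho")  # Eq. (2): inverse DFT
print(np.allclose(f, f_back))              # True: the two formulas are inverses
```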

Fourier series are an immensely powerful tool in applied mathematics. For example, if f represents a sound wave produced by a chord on a piano, its Fourier coefficients \hat{f} represent the intensity of each pitch comprising the chord. An audio engineer could, for example, compute a Fourier series for a piece of music and zero out small Fourier coefficients, thus reducing the amount of data needed to store the piece. This idea is indeed part of the way audio compression standards like MP3 work. In addition to many more related applications in signal processing, the Fourier series is also a natural way to solve differential equations, either by pencil and paper or by computer via so-called Fourier spectral methods. As these applications (and more to follow) show, the DFT is a very useful computation to perform. The FFT allows us to perform this calculation fast.

The Fast Fourier Transform

The first observation to make is that Eq. (1) is a linear transformation: if we think of Eq. (1) as describing a transformation f \mapsto \hat{f}, then we have that \widehat{\alpha f + \beta g} = \alpha \hat{f} + \beta \hat{g}. Recall the crucial fact from linear algebra that every linear transformation can be represented by a matrix-vector multiplication (at least in finite dimensions; the story for infinite-dimensional vector spaces is more complicated). In my experience, one of the most effective algorithm design strategies in applied mathematics is, when presented with a linear transformation, to write its matrix down and poke and prod it to see if there are any patterns in the numbers which can be exploited to give a fast algorithm. Let’s try to do this with the DFT.

We have that \hat{f} = F_n f for some n\times n matrix F_n. (We will omit the subscript n when its value isn’t particularly important to the discussion.) Let us make the somewhat non-standard choice of describing rows and columns of F by zero-indexing, so that the first row of F is row 0 and the last is row n-1. Then we have that \hat{f}_k = \sum_{j=0}^{n-1} F_{kj} f_j. Comparing with Eq. (1), we see that F_{kj} = \tfrac{1}{\sqrt{n}} e^{(-2\pi i/n) jk}. Let us define \omega_n = e^{-2\pi i/n}. Thus, we can write the matrix F out as

(3)   \begin{equation*} F_n = \frac{1}{\sqrt{n}} \begin{bmatrix}\omega_n^0 & \omega_n^0 & \omega_n^0 & \cdots & \omega_n^0 \\\omega_n^{0} & \omega_n^1 & \omega_n^2 & \cdots & \omega_n^{n-1} \\\omega_n^0 & \omega_n^2 & \omega_n^4 & \cdots & \omega_n^{2(n-1)} \\ \omega_n^0 & \omega_n^3 & \omega_n^6 & \cdots & \omega_n^{3(n-1)}\\ \vdots & \vdots & \vdots & \ddots & \vdots \\\omega_n^0 & \omega_n^{n-1} & \omega_n^{2(n-1)} & \cdots & \omega_n^{(n-1)(n-1)}\end{bmatrix} \end{equation*}

This is a highly structured matrix. The patterns in this matrix are more easily seen for a particular value of n. We shall focus on n = 8 in this discussion, but what will follow will generalize in a straightforward way to n any power of two (and in less straightforward ways to arbitrary n—we will return to this point at the end).
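Following this advice, one way to look for patterns is simply to form the matrix numerically and stare at it. A minimal sketch, assuming NumPy (the helper name dft_matrix is ours, and the comparison uses NumPy's norm="ortho" convention for the unitary DFT):

```python
import numpy as np

def dft_matrix(n):
    """Write the DFT matrix of Eq. (3) down explicitly so we can look for patterns."""
    j = np.arange(n)
    omega = np.exp(-2j * np.pi / n)
    return omega ** np.outer(j, j) / np.sqrt(n)   # entry (k, j) is omega_n^{jk} / sqrt(n)

F8 = dft_matrix(8)
# Sanity check: transforming the columns of the identity with the library DFT gives the same matrix
print(np.allclose(F8, np.fft.fft(np.eye(8), axis=0, norm="ortho")))  # True
```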

Instantiating Eq. (3) with n = 8 (and writing \omega = \omega_8), we have

(4)   \begin{equation*} F_8 = \frac{1}{\sqrt{8}} \begin{bmatrix}\omega^{0} &\omega^{0} &\omega^{0} &\omega^{0} &\omega^{0} &\omega^{0} &\omega^{0} &\omega^{0} \\ \omega^{0} &\omega^{1} &\omega^{2} &\omega^{3} &\omega^{4} &\omega^{5} &\omega^{6} &\omega^{7} \\ \omega^{0} &\omega^{2} &\omega^{4} &\omega^{6} &\omega^{8} &\omega^{10} &\omega^{12} &\omega^{14} \\ \omega^{0} &\omega^{3} &\omega^{6} &\omega^{9} &\omega^{12} &\omega^{15} &\omega^{18} &\omega^{21} \\ \omega^{0} &\omega^{4} &\omega^{8} &\omega^{12} &\omega^{16} &\omega^{20} &\omega^{24} &\omega^{28} \\ \omega^{0} &\omega^{5} &\omega^{10} &\omega^{15} &\omega^{20} &\omega^{25} &\omega^{30} &\omega^{35} \\ \omega^{0} &\omega^{6} &\omega^{12} &\omega^{18} &\omega^{24} &\omega^{30} &\omega^{36} &\omega^{42} \\ \omega^{0} &\omega^{7} &\omega^{14} &\omega^{21} &\omega^{28} &\omega^{35} &\omega^{42} &\omega^{49} \\ \end{bmatrix} \end{equation*}

To fully exploit the patterns in this matrix, we note that \omega represents a clockwise rotation of the complex plane by an eighth of the way around the circle. So, for example, \omega^{21} is twenty-one eighths of a turn, which is the same as 21-16 = 5 eighths of a turn. Thus \omega^{21} = \omega^5 and, more generally, \omega^m = \omega^{m \operatorname{mod} 8}. This allows us to simplify as follows:

(5)   \begin{equation*} F_8 = \frac{1}{\sqrt{8}}\begin{bmatrix}1 &1 &1 &1 &1 &1 &1 &1 \\ 1 &\omega^{1} &\omega^{2} &\omega^{3} &\omega^{4} &\omega^{5} &\omega^{6} &\omega^{7} \\ 1 &\omega^{2} &\omega^{4} &\omega^{6} &1 &\omega^{2} &\omega^{4} &\omega^{6} \\ 1 &\omega^{3} &\omega^{6} &\omega^{1} &\omega^{4} &\omega^{7} &\omega^{2} &\omega^{5} \\ 1 &\omega^{4} &1 &\omega^{4} &1 &\omega^{4} &1 &\omega^{4} \\ 1 &\omega^{5} &\omega^{2} &\omega^{7} &\omega^{4} &\omega^{1} &\omega^{6} &\omega^{3} \\ 1 &\omega^{6} &\omega^{4} &\omega^{2} &1 &\omega^{6} &\omega^{4} &\omega^{2} \\ 1 &\omega^{7} &\omega^{6} &\omega^{5} &\omega^{4} &\omega^{3} &\omega^{2} &\omega^{1} \\ \end{bmatrix} \end{equation*}

Now notice that, since \omega represents a clockwise rotation of an eighth of the way around the circle, \omega^2 represents a quarter turn of the circle. This fact leads to the surprising observation that we can actually find the DFT matrix F_4 for n = 4 hidden inside the DFT matrix F_8 for n = 8!

To see this, rearrange the columns of F_8 so that the even-indexed columns come first, followed by the odd-indexed columns. In matrix language, this is represented by right-multiplying by an appropriate permutation matrix \Pi (in fact, this permutation has a special name: the perfect shuffle):

(6)   \begin{equation*} F_8 \Pi = \frac{1}{\sqrt{8}}\begin{bmatrix} 1 &1 &1 &1 &1 &1 &1 &1 \\ 1 &\omega^{2} &\omega^{4} &\omega^{6} &\omega^{1} &\omega^{3} &\omega^{5} &\omega^{7} \\ 1 &\omega^{4} &1 &\omega^{4} &\omega^2 &\omega^{6} &\omega^{2} &\omega^{6} \\ 1 &\omega^{6} &\omega^{4} &\omega^{2} &\omega^{3} &\omega^{1} &\omega^{7} &\omega^{5} \\ 1 &1 &1 &1 &\omega^4 &\omega^{4} &\omega^4 &\omega^{4} \\ 1 &\omega^{2} &\omega^{4} &\omega^{6} &\omega^{5} &\omega^{7} &\omega^{1} &\omega^{3} \\ 1 &\omega^{4} &1 &\omega^{4} &\omega^{6} &\omega^{2} & \omega^6 &\omega^{2} \\ 1 &\omega^{6} &\omega^{4} &\omega^2 & \omega^7 & \omega^5 & \omega^3 & \omega^1 \end{bmatrix} \end{equation*}

The top-left 4\times 4 sub-block is precisely F_4 (up to scaling). In fact, defining the diagonal matrix \Omega_4 = \operatorname{diag}(1,\omega,\omega^2,\omega^3) (called the twiddle factor) and noting that \omega^4 = -1, we have

(7)   \begin{equation*} F_8 \Pi = \frac{1}{\sqrt{2}} \begin{bmatrix} F_4 & \Omega_4 F_4 \\ F_4 & -\Omega_4 F_4 \end{bmatrix}. \end{equation*}

The matrix F_8 is entirely built up of simple scalings of the smaller DFT matrix F_4! This suggests the following decomposition to compute F_8x:

(8)   \begin{equation*} F_8 x = (F_8 \Pi)(\Pi^\top x) = \frac{1}{\sqrt{2}} \begin{bmatrix} F_4 & \Omega_4 F_4 \\ F_4 & -\Omega_4 F_4 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} \frac{F_4 x_1 + \Omega_4 (F_4 x_2)}{\sqrt{2}} \\ \frac{F_4 x_1 - \Omega_4 (F_4 x_2)}{\sqrt{2}}\end{bmatrix}. \end{equation*}

Here x_1 represents the even-indexed entries of x and x_2 the odd-indexed entries. Thus, we see that we can evaluate F_8 x by evaluating the two expressions F_4x_1 and F_4x_2. We have broken our problem into two smaller problems, which we then recombine into a solution of our original problem.
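To make the recombination step concrete, here is a minimal numerical check of Eq. (8), assuming NumPy and the unitary normalization (the DFT matrices are obtained by transforming the columns of the identity):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(8) + 1j * rng.standard_normal(8)
F8 = np.fft.fft(np.eye(8), axis=0, norm="ortho")            # unitary DFT matrices
F4 = np.fft.fft(np.eye(4), axis=0, norm="ortho")
Omega4 = np.diag(np.exp(-2j * np.pi / 8) ** np.arange(4))   # twiddle factor
x1, x2 = x[0::2], x[1::2]                                   # even- and odd-indexed entries
top = (F4 @ x1 + Omega4 @ (F4 @ x2)) / np.sqrt(2)
bottom = (F4 @ x1 - Omega4 @ (F4 @ x2)) / np.sqrt(2)
print(np.allclose(np.concatenate([top, bottom]), F8 @ x))   # True: Eq. (8) holds
```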

How then, do we compute the smaller DFTs F_4x_1 and F_4x_2? We just use the same trick again, breaking, for example, the product F_4x_1 into further subcomputations F_2x_{11} and F_2x_{12}. Performing this process one more time, we need to evaluate expressions of the form F_1x_{111}, which are simply given by F_1 x_{111}= x_{111} since the matrix F_1 is just a 1\times 1 matrix whose single entry is 1.

This procedure is an example of a recursive algorithm: we designed an algorithm which solves a problem by breaking it down into one or more smaller problems, solves each of the smaller problems using this same algorithm, and then reassembles the solutions of the smaller problems into a solution of the original problem. Eventually, we break our problems into such small pieces that they can be solved directly; this is referred to as the base case of the recursion. (In our case, the base case is multiplication by F_1.) Algorithms using recursion in this way are referred to as divide-and-conquer algorithms.

Let us summarize the recursive procedure we’ve developed. We want to compute the DFT y = F_n x where n is a power of two. First, we recursively compute F_{n/2}x_1 and F_{n/2}x_2 using this same procedure. Next, we combine these computations to evaluate y = F_n x by the formula

(9)   \begin{equation*} F_n x = \begin{bmatrix} \frac{F_{n/2} x_1 + \Omega_{n/2} (F_{n/2} x_2)}{\sqrt{2}} \\ \frac{F_{n/2} x_1 - \Omega_{n/2} (F_{n/2} x_2)}{\sqrt{2}}\end{bmatrix}. \end{equation*}

This procedure is the famous fast Fourier transform (FFT), whose modern incarnation was presented by Cooley and Tukey in 1965 with lineage that can be traced back to work by Gauss in the early 1800s. There are many variants of the FFT using similar ideas.
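As a minimal sketch (not a production implementation), the recursion of Eq. (9) can be written in a few lines of Python; it assumes the length of x is a power of two and uses the unitary normalization, so it agrees with np.fft.fft(x, norm="ortho"):

```python
import numpy as np

def fft_radix2(x):
    """Recursive radix-2 FFT following Eq. (9), for len(x) a power of two."""
    x = np.asarray(x, dtype=complex)
    n = len(x)
    if n == 1:
        return x                                   # base case: F_1 x = x
    y1 = fft_radix2(x[0::2])                       # F_{n/2} x_1 (even-indexed entries)
    y2 = fft_radix2(x[1::2])                       # F_{n/2} x_2 (odd-indexed entries)
    twiddle = np.exp(-2j * np.pi * np.arange(n // 2) / n)   # diagonal of Omega_{n/2}
    return np.concatenate([(y1 + twiddle * y2) / np.sqrt(2),
                           (y1 - twiddle * y2) / np.sqrt(2)])

x = np.random.default_rng(2).standard_normal(16)
print(np.allclose(fft_radix2(x), np.fft.fft(x, norm="ortho")))  # True
```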

Let us see why the FFT is considered “fast” by analyzing its operation count. As is common for divide-and-conquer algorithms, the number of operations for computing F_n x using the FFT can be determined by solving a certain recurrence relation. Let T(n) be the number of operations required by the FFT. Then the cost of computing F_n x consists of

  • proportional-to-n operations (or \mathcal{O}(n) operations, in computer science language; \mathcal{O}(\cdot) refers to big-O notation, and saying an algorithm takes \mathcal{O}(n\log n) operations means, more or less, that the algorithm takes less than some multiple of n\log n operations to complete) to:
    • add, subtract, and scale vectors and
    • multiply by the diagonal matrix \Omega_{n/2} = \operatorname{diag}(1,\omega_{n},\ldots,\omega_{n}^{n/2-1}), and
  • two recursive computations of F_{n/2}x_j for j= 1,2, each of which requires T(n/2) operations.

This gives us the recurrence relation

(10)   \begin{equation*} T(n) = 2T(n/2) + \mathcal{O}(n). \end{equation*}

Solving recurrences is a delicate art in general, but a wide class of recurrences are immediately solved by the flexible master theorem for recurrences. Appealing to this result, we deduce that the FFT requires T(n) = \mathcal{O}(n\log n) operations. This is a dramatic improvement over the \mathcal{O}(n^2) operations needed to compute F_n x directly using Eq. (1). This improvement in speed is what makes the FFT “fast”.
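For readers who prefer not to invoke the master theorem, this particular recurrence can also be unrolled directly. Writing the \mathcal{O}(n) term as cn and assuming n is a power of two,

\begin{equation*} T(n) = 2T(n/2) + cn = 4T(n/4) + 2cn = \cdots = 2^k T(n/2^k) + kcn = n\,T(1) + cn\log_2 n, \end{equation*}

where we have unrolled through k = \log_2 n levels until reaching the base case. Since T(1) = \mathcal{O}(1), this gives T(n) = \mathcal{O}(n\log n).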

Extending the FFT Idea

The FFT is a brilliant algorithm. It exploits the structure of the discrete Fourier transform problem Eq. (1) to achieve dramatically lower operation counts. And, as we shall get a taste of below, the FFT is useful in a surprisingly broad range of applications. Given the success of the FFT, we are naturally led to the question: can we learn from our success with the FFT to develop fast algorithms for other problems?

I think the FFT speaks to the power of a simple problem-solving strategy for numerical algorithm design: whenever you have a linear transformation, write it as a matrix-vector product; whenever you have a matrix, write it down and see if there are any patterns. (Often, rearranging the matrix will be necessary to see any patterns. As mentioned earlier, the FFT also exemplifies a typical design pattern in general, not necessarily numerical, algorithm design: the divide-and-conquer strategy. In the divide-and-conquer strategy, one finds a clever way of dividing a problem into multiple subproblems, conquers (solves) each, and then recombines the solutions to the subproblems into a solution of the larger problem. The challenge with such algorithms is often finding a way of doing the recombination step, which usually relies on some clever insight. Other instances of divide-and-conquer algorithms include merge sort and Karatsuba’s integer multiplication algorithm.) We often like to present mathematics as if each step of a derivation follows almost effortlessly from the last, from a firm basis of elegant mathematical intuition. Often, however, noticing patterns by staring at symbols on a page can be more effective than reasoning grounded in intuition. Once the pattern has been discovered, intuition and elegance sometimes follow quickly behind.

The most natural generalization of the FFT is the fast inverse discrete Fourier transform, a fast algorithm to compute the inverse discrete Fourier transform Eq. (2). The inverse FFT is quite an easy generalization of the FFT presented in the previous section; it is a good exercise to see if you can mimic the development above to come up with this generalization yourself. The FFT can also be generalized to other discrete trigonometric transforms and to 2D and 3D discrete Fourier transforms.

I want to consider a problem more tangentially related to the FFT: the evaluation of expressions of the form y = (A \otimes B)x, where A is an m_1\times n_1 matrix, B is an m_2\times n_2 matrix, x is a vector of length n_1n_2, and \otimes denotes the Kronecker product. For the uninitiated, the Kronecker product of A and B is an m_1m_2\times n_1n_2 matrix defined as the block matrix

(11)   \begin{equation*} A\otimes B = \begin{bmatrix} A_{11} B & A_{12} B & \cdots & A_{1n_1} B\\ A_{21}B & A_{22} B & \cdots & A_{2n_1} B \\ \vdots & \vdots & \ddots & \vdots \\ A_{m_1 1}  B & A_{m_1 2} B & \cdots & A_{m_1 n_1} B\end{bmatrix}. \end{equation*}

We could just form this m_1 m_2\times n_1n_2 matrix and compute the matrix-vector product y = (A\otimes B)x directly, but this takes a hefty \mathcal{O}(m_1m_2n_1n_2) operations. (Equally or perhaps more problematically, forming the matrix also takes \mathcal{O}(m_1m_2n_1n_2) space.) We can do better.

The insight is much the same as with the FFT: scaled copies of the matrix B are embedded in A\otimes B. In the FFT, we needed to rearrange the columns of the DFT matrix F_n to see this; for the Kronecker product, the pattern is evident in the natural ordering. To exploit this fact, chunk the vectors x and y into pieces x_1,x_2,\ldots,x_{n_1} and y_1,y_2,\ldots,y_{m_1} of lengths n_2 and m_2, respectively, so that our matrix-vector product can be written as a conformal partitioning (so called because one can multiply the block matrices using the ordinary matrix product formula, treating the block entries as if they were simple numbers):

(12)   \begin{equation*} \underbrace{\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_{m_1} \end{bmatrix}}_{=y} = \underbrace{\begin{bmatrix} A_{11} B & A_{12} B & \cdots & A_{1n_1} B\\ A_{21}B & A_{22} B & \cdots & A_{2n_1} B \\ \vdots & \vdots & \ddots & \vdots \\ A_{m_1 1}  B & A_{m_1 2} B & \cdots & A_{m_1 n_1} B\end{bmatrix}}_{=A\otimes B} \underbrace{\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_{n_1} \end{bmatrix}}_{=x}. \end{equation*}

To compute this product efficiently, we proceed in two steps. First, we compute the products Bx_1,Bx_2,\ldots,Bx_{n_1} which takes time \mathcal{O}(m_2n_2\cdot n_1) in total. Next, we compute each component y_1,\ldots,y_{m_1} by using the formula

(13)   \begin{equation*} y_j = A_{j1} (Bx_1) + A_{j2} (Bx_2) + \cdots + A_{jn_1} (Bx_{n_1}), \end{equation*}

which takes a total of \mathcal{O}(m_1 n_1 \cdot m_2) operations to compute all of the y_j’s. This leads to a total operation count of \mathcal{O}(m_2n_1(m_1+n_2)) for computing the matrix-vector product y = (A\otimes B)x, much better than our earlier operation count of \mathcal{O}(m_1m_2n_1n_2). (There is another way of interpreting this algorithm. If we interpret x and y as the vectorizations of n_2\times n_1 and m_2 \times m_1 matrices X and Y, then we have Y = BXA^\top. The algorithm we presented is equivalent to evaluating this matrix triple product in the order Y = (BX)A^\top. This also shows that the algorithm could be further accelerated using Strassen-style fast matrix multiplication algorithms.)
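Here is a minimal sketch of this fast Kronecker-product multiply, assuming NumPy (the helper name kron_matvec is ours); it is exactly the Y = (BX)A^\top interpretation described in the aside above:

```python
import numpy as np

def kron_matvec(A, B, x):
    """Evaluate y = (A kron B) x without forming the Kronecker product."""
    m1, n1 = A.shape
    m2, n2 = B.shape
    X = x.reshape(n1, n2).T        # n2 x n1 matrix whose columns are the chunks x_1, ..., x_{n_1}
    Y = B @ X @ A.T                # costs O(m2*n2*n1 + m1*n1*m2) operations
    return Y.T.reshape(-1)         # stack the columns y_1, ..., y_{m_1}

rng = np.random.default_rng(3)
A, B = rng.standard_normal((3, 4)), rng.standard_normal((5, 6))
x = rng.standard_normal(4 * 6)
print(np.allclose(kron_matvec(A, B, x), np.kron(A, B) @ x))  # True
```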

While this idea might seem quite far from the FFT, applying it iteratively allows one to rapidly evaluate a close cousin of the DFT called the Hadamard-Walsh transform. Using the Kronecker product, the Hadamard-Walsh transform \hat{f} of a vector f is defined to be

(14)   \begin{equation*} \hat{f} = \left( \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \otimes \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \otimes \cdots \otimes \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \right) f. \end{equation*}

If one applies the Kronecker product trick we developed repeatedly, this gives an algorithm to evaluate the Hadamard-Walsh transform of a vector f of length n in \mathcal{O}(n \log n) operations, just like the FFT.
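A minimal sketch of this \mathcal{O}(n\log n) procedure, assuming NumPy and a vector length that is a power of two (the in-place butterfly below is one standard way of organizing the repeated Kronecker-product trick, and the helper name fast_walsh_hadamard is ours):

```python
import numpy as np
from functools import reduce

def fast_walsh_hadamard(f):
    """Hadamard-Walsh transform with the 1/sqrt(2) normalization of Eq. (14)."""
    f = np.asarray(f, dtype=float).copy()
    n, h = len(f), 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = f[j], f[j + h]
                f[j], f[j + h] = (a + b) / np.sqrt(2), (a - b) / np.sqrt(2)
        h *= 2
    return f

H = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)
f = np.random.default_rng(4).standard_normal(8)
print(np.allclose(fast_walsh_hadamard(f), reduce(np.kron, [H, H, H]) @ f))  # True
```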

The Hadamard-Walsh transform can be thought of as a generalization of the discrete Fourier transform to Boolean functions, which play an integral role in computer science. The applications of the Hadamard-Walsh transform are numerous and varied, from voting systems to quantum computing. And this is really just the tip of the iceberg: the ideas behind the FFT (and related ideas from the fast multipole method) allow for the rapid evaluation of a large number of transformations, some of which are connected by deep and general theories.

Resisting the temptation to delve into these interesting subjects in any more depth, we return to our main idea: when presented with a linear transformation, write it as a matrix-vector product; whenever you have a matrix, write it down and see if there are any patterns. The FFT exploits one such pattern, noticing that (after a reordering) the DFT matrix contains many scaled copies of a smaller matrix. Rapidly evaluating expressions of the form y = (A\otimes B)x involves an even simpler application of the same idea. But there are many other patterns that can be exploited: sparsity, (approximate) low rank, off-diagonal blocks of (approximately) low rank, and displacement structure are other examples. Very often in applied math, our problems have additional structure that can be exploited to solve them much faster, and sometimes finding that structure is as easy as just trying to look for it.

An Application of the FFT

A discussion of the FFT would be incomplete without exploring at least one reason why you’d want to compute the discrete Fourier transform. To focus our attention, let us consider another linear algebraic calculation which appears to have no relation to the FFT on its face: computing a matrix-vector product with a Toeplitz matrix. A matrix T is said to be Toeplitz if it has the following structure:

(15)   \begin{equation*} T = \begin{bmatrix} t_0 & t_1 & t_2 & \cdots & t_{n-1} \\ t_{-1} & t_0 & t_1 & \cdots & t_{n-2} \\ t_{-2} & t_{-1} & t_0 & \cdots & t_{n-3} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ t_{-(n-1)} & t_{-(n-2)} & t_{-(n-3)} & \cdots & t_0 \end{bmatrix}. \end{equation*}

Toeplitz matrices and their relatives appear widely across applications of applied mathematics including control and systems theory, time series, numerical partial differential equations, and signal processing.

We seek to compute the matrix-vector product y = Tx. Let us begin by considering a special case of a Toeplitz matrix, a circulant matrix. A circulant matrix C has the form

(16)   \begin{equation*} \begin{bmatrix} c_0 & c_{n-1} & c_{n-2} & \cdots & c_1 \\ c_1 & c_0 & c_{n-1} & \cdots & c_2\\ c_2 & c_1 & c_0 & \cdots & c_3 \\ \vdots & \vdots & \vdots &\ddots & \vdots \\ c_{n-1} & c_{n-2} & c_{n-3} & \cdots & c_0 \end{bmatrix}. \end{equation*}

By direct computation, the matrix-vector product y = Cx is given by

(17)   \begin{equation*} y_k = \sum_{j=0}^{n-1}x_j c_{\operatorname{mod}(k-j,n)}. \end{equation*}

A surprising and non-obvious fact is that the circulant matrix C is diagonalized by the discrete Fourier transform. Specifically, we have C = F_n^* \operatorname{diag}(\sqrt{n} F_n c) F_n where c=(c_0,\ldots,c_{n-1}). This gives a fast algorithm to compute y = Cx in time \mathcal{O}(n\log n): compute the DFTs of x and c and multiply them together entrywise, take the inverse Fourier transform, and scale by \sqrt{n}.
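A minimal sketch in NumPy (the helper name circulant_matvec is ours): with NumPy's default non-unitary FFT normalization, the \sqrt{n} scalings cancel and the algorithm is simply a pointwise product of DFTs followed by an inverse DFT.

```python
import numpy as np

def circulant_matvec(c, x):
    """Evaluate y = Cx for the circulant matrix C of Eq. (16) in O(n log n) time."""
    return np.fft.ifft(np.fft.fft(c) * np.fft.fft(x))

n = 6
rng = np.random.default_rng(5)
c, x = rng.standard_normal(n), rng.standard_normal(n)
C = np.array([[c[(k - j) % n] for j in range(n)] for k in range(n)])  # build Eq. (16) directly
print(np.allclose(circulant_matvec(c, x), C @ x))  # True
```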

There is a connection with signal processing and differential equations that may help shed light on why this technique works, for those familiar with those areas. In the signal processing context, the matrix-vector product y = Cx can be interpreted as the discrete convolution of x with c (see Eq. (17)), which is a natural extension of the convolution f* g of two functions f and g on the real line. It is an important fact that the Fourier transform of a convolution is the same as multiplication of the Fourier transforms: \widehat{f * g} = \hat{f} \cdot \hat{g} (up to a possible normalizing constant); a related identity also holds for the Laplace transform. The fact that the DFT diagonalizes a circulant matrix is just the analog of this fact for the discrete Fourier transform and the discrete convolution.

This fast algorithm for circulant matrix-vector products is already extremely useful. One can naturally reframe the problems of multiplying integers and polynomials as discrete convolutions, which can then be computed rapidly with the \mathcal{O}(n\log n) algorithm for circulant matrix-vector products. This application is a popular motivating example in introductions to the FFT.

Let’s summarize where we’re at. We are interested in computing the Toeplitz matrix-vector product y = Tx. We don’t know how to do this for a general Toeplitz matrix yet, but we can do it for a special Toeplitz matrix called a circulant matrix C. By use of the FFT, we can compute the circulant matrix-vector product y = Cx in \mathcal{O}(n\log n) operations.

We can now leverage what we’ve done with circulant matrices to accelerate Toeplitz matrix-vector products. The trick is very simple: embedding. We construct a big circulant matrix which contains the Toeplitz matrix as a sub-matrix and then use multiplications by the bigger matrix to compute multiplications by the smaller matrix.

Consider the following circulant matrix, which contains T, as defined in Eq. (15), as a sub-matrix in its top-left corner:

(18)   \begin{equation*} C_T = \begin{bmatrix} t_0 & t_1  & \cdots & t_{n-1} & 0 & 0 & \cdots \\ t_{-1} & t_0 & \cdots & t_{n-2} & t_{n-1} & 0 & \cdots\\ \vdots & \vdots & \ddots & \vdots & \vdots &\vdots &\ddots\\ t_{-(n-1)} & t_{-(n-2)}  & \cdots & t_0 & t_1 & t_2&\cdots\\ 0 & t_{-(n-1)}&\cdots&t_{-1} & t_0 & t_1& \cdots\\ 0 & 0 & \cdots & t_{-2} & t_{-1} & t_0& \cdots\\ \vdots& \vdots & \ddots & \vdots & \vdots & \vdots &\ddots\\ 0 & 0 &\cdots&&&& \cdots\\ t_{n-1} & 0 & \cdots&&&&\cdots\\ t_{n-2} & t_{n-1} & \cdots &&&&\cdots\\ \vdots & \vdots & \ddots & \vdots & \vdots & \vdots &\ddots\\ t_1 & t_2 & \cdots & 0 & 0 & 0 & \cdots \end{bmatrix} \end{equation*}

This matrix is hard to write out, but essentially we pad the Toeplitz matrix with extra zeros to embed it into a circulant matrix. The “c” vector for this larger circulant matrix is obtained from the parameters t_0,t_{\pm 1},\ldots,t_{\pm (n-1)} of the Toeplitz matrix Eq. (15) by c = (t_0,t_{-1},\ldots,t_{-(n-1)},0,\ldots,0,t_{n-1},t_{n-2},\ldots,t_1).

Here comes another clever observation: we can choose the number of padding zeros so that the size of C_T is exactly a power of two. This is useful because it allows us to compute matrix-vector products w=C_Tz with the power-of-two FFT described above, which we know is fast.

Finally, let’s close the loop and use fast multiplications with C_T to compute fast multiplications with T. We wish to compute the product y = Tx fast. To do this, we extend the vector x into a larger vector z by padding with zeros to get

(19)   \begin{equation*} C_T z = \begin{bmatrix} T & \star \\ \star & \star \end{bmatrix} \begin{bmatrix} x \\ 0 \end{bmatrix} = \begin{bmatrix} y \\ \star \end{bmatrix}, \end{equation*}

where we use \star to denote matrix or vector entries which are immaterial to us. We compute Tx by using our fast algorithm to compute C_T z = w and then discarding everything but the first n entries of w to obtain y. If one is careful to analyze how much padding is needed to make this work, one sees that this algorithm also takes only \mathcal{O}(n \log n) operations. Thus, we’ve achieved our goal: we can compute Toeplitz matrix-vector products in a fast \mathcal{O}(n\log n) operations.
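A minimal sketch of this embedding, assuming NumPy (the helper name toeplitz_matvec and its calling convention are ours): col holds the first column (t_0, t_{-1}, \ldots, t_{-(n-1)}) of T and row its first row (t_0, t_1, \ldots, t_{n-1}).

```python
import numpy as np

def toeplitz_matvec(col, row, x):
    """Evaluate y = Tx for the Toeplitz matrix of Eq. (15) by circulant embedding."""
    n = len(x)
    N = 1 << (2 * n - 1).bit_length()                 # pad C_T up to a power-of-two size
    c = np.zeros(N, dtype=complex)
    c[:n] = col                                       # t_0, t_{-1}, ..., t_{-(n-1)}
    c[N - n + 1:] = row[:0:-1]                        # t_{n-1}, ..., t_1 wrap around the end
    z = np.zeros(N, dtype=complex)
    z[:n] = x                                         # zero-padded right-hand side
    w = np.fft.ifft(np.fft.fft(c) * np.fft.fft(z))    # circulant product C_T z via the FFT
    return w[:n]                                      # keep only the first n entries

n = 5
rng = np.random.default_rng(6)
col, row = rng.standard_normal(n), rng.standard_normal(n)
row[0] = col[0]                                       # the diagonal entry t_0 must agree
T = np.array([[row[j - k] if j >= k else col[k - j] for j in range(n)] for k in range(n)])
x = rng.standard_normal(n)
print(np.allclose(toeplitz_matvec(col, row, x), T @ x))  # True
```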

Finally, let us bring this full circle and see a delightfully self-referential use of this algorithm: we can use the FFT-accelerated Toeplitz matrix-vector multiply to compute the DFT itself. Recall that the FFT algorithm we presented above was particularized to n which are powers of 2. There are natural generalizations of the FFT, along the lines of what we did above, to more general n which are highly composite and possess many small prime factors. But what if we want to evaluate the DFT for n which is a large prime?

Recall that the DFT matrix F_n has jkth entry (F_n)_{jk} = \tfrac{1}{\sqrt{n}} \omega_n^{jk}. We now employ a clever trick. Let D be a diagonal matrix with the jjth entry equal to \omega_n^{-\tfrac{1}{2} j^2}. Then, defining T = D F_n D, we have that T_{jk} = \tfrac{1}{\sqrt{n}} \omega_n^{-\tfrac{1}{2} j^2 + jk - \tfrac{1}{2} k^2} = \tfrac{1}{\sqrt{n}} \omega_n^{-\tfrac{1}{2}(j-k)^2}, which means T is a Toeplitz matrix! (Writing out the matrix T entrywise may be helpful to see this.)

Thus, we can compute the DFT \hat{f} = F_n f for any size n by evaluating the DFT as \hat{f} = D^{-1}(T(D^{-1} f)), where the product Tx is computed using the fast Toeplitz matrix-vector product. Since our fast Toeplitz matrix-vector product only requires us to evaluate power-of-two DFTs, this technique allows us to evaluate DFTs of arbitrary size n in only \mathcal{O}(n\log n) operations.
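Putting the pieces together, here is a minimal sketch of this chirp trick (often attributed to Bluestein), assuming NumPy; it evaluates the unitary DFT of any length, prime or not, using only power-of-two FFTs via the same circulant embedding as above. The helper name dft_any_size is ours.

```python
import numpy as np

def dft_any_size(f):
    """Evaluate the unitary DFT of arbitrary length n via F_n = D^{-1} T D^{-1}."""
    f = np.asarray(f, dtype=complex)
    n = len(f)
    j = np.arange(n)
    d_inv = np.exp(-1j * np.pi * j**2 / n)            # diagonal of D^{-1}, i.e. omega_n^{j^2/2}
    t = np.exp(1j * np.pi * j**2 / n) / np.sqrt(n)    # t_m = omega_n^{-m^2/2}/sqrt(n); T is symmetric
    N = 1 << (2 * n - 1).bit_length()                 # power-of-two circulant size
    c = np.zeros(N, dtype=complex)
    c[:n] = t
    c[N - n + 1:] = t[:0:-1]                          # wrap t_{n-1}, ..., t_1 around the end
    z = np.zeros(N, dtype=complex)
    z[:n] = d_inv * f                                 # D^{-1} f, zero-padded
    y = np.fft.ifft(np.fft.fft(c) * np.fft.fft(z))[:n]   # Toeplitz product T (D^{-1} f)
    return d_inv * y                                  # hat{f} = D^{-1} T (D^{-1} f)

f = np.random.default_rng(7).standard_normal(17)      # length 17 is prime
print(np.allclose(dft_any_size(f), np.fft.fft(f, norm="ortho")))  # True
```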

Upshot: The discrete Fourier transform (DFT) is an important computation which occurs all across applied mathematics. The fast Fourier transform (FFT) reduces the operation count of evaluating the DFT of a vector of length n to proportional to n \log n, down from proportional to n^2 for direct evaluation. The FFT is an example of a broader matrix algorithm design strategy of looking for patterns in the numbers in a matrix and exploiting these patterns to reduce computation. The FFT can often have surprising applications, such as allowing for rapid computations with Toeplitz matrices.
