The famous law of the instrument states that “when all you have is a hammer, every problem looks like a nail.” In general, this tendency is undesirable: most problems in life are not nails and could better be addressed by a more appropriate tool. However, one can also review the law of the instrument in a more positive framing: when presented with a powerful new tool, it is worth checking how many problems it can solve. The fast Fourier transform (FFT) is one of the most important hammers in an applied mathematician’s toolkit. And it has made many seemingly unrelated problems look like nails.

In this article, I want to consider three related questions:

- What is the FFT—what problem is it solving and how does it solve it fast?
- How can the
*idea*s behind the FFT be used to solve other problems? - How can the FFT be used as a building block in solving a seemingly unrelated problem?

The FFT is widely considered one of the most important numerical algorithms, and as such every sub-community of applied mathematics is inclined to see the most interesting applications of the FFT as those in their particular area. I am unapologetically victim to this tendency myself, and thus will discuss an application of the FFT that I find particularly beautiful and surprising. In particular, this article won’t focus on the manifold applications of the FFT in signal processing, which I think has been far better covered by authors more familiar with that field.

## The Discrete Fourier Transform

At its core, the FFT is a fast algorithm to compute complex numbers given real or complex numbers defined by the formula^{1}The factor of is not universal. It is common to omit the factor in (1) and replace the in Eq. (2) with a . We prefer this convention as it makes the DFT a unitary transformation. When working with Fourier analysis, it is important to choose formulas for the (discrete) Fourier transform and the inverse (discrete) Fourier transform which form a pair in the sense they are inverses of each other.

(1)

The outputs is called the **discrete Fourier transform** (DFT) of . The FFT is just one possible algorithm to evaluate the DFT.

The DFT has the following interpretation. Suppose that is a periodic function defined on the integers with period —that is, for every integer . Choose to be the values of given by for . Then, in fact, gives an expression for as a so-called trigonometric polynomial^{2}The name “trigonometric polynomial” is motivated by Euler’s formula which shows that , so indeed the right-hand side of Eq. (2) is indeed a “polynomial” in the “variables” and :

(2)

This shows that (1) converts function values of a periodic function to coefficients of a trigonometric polynomial representation of , which can be called the **Fourier series** of . Eq. (2), referred to as the **inverse discrete Fourier transform**, inverts this, converting coefficients to function values .

Fourier series are an immensely powerful tool in applied mathematics. For example again, if represents a sound wave produced by a chord on a piano, its Fourier coefficients represents the intensity of each pitch comprising the chord. An audio engineer could, for example, compute a Fourier series for a piece of music and zero out Fourier coefficients, thus reducing the amount of data needed to store a piece of music. This idea is indeed part of the way audio compression standards like MP3 work. In addition to many more related applications in signal processing, the Fourier series is also a natural way to solve differential equations, either by pencil and paper or by computer via so-called Fourier spectral methods. As these applications (and more to follow) show, the DFT is a very useful computation to perform. The FFT allows us to perform this calculation fast.

## The Fast Fourier Transform

The first observation to make is that Eq. (1) is a linear transformation: if we think of Eq. (1) as describing a transformation , then we have that . Recall the crucial fact from linear algebra that every linear transformation can be represented by a matrix-vector muliplication.^{3}At least in finite dimensions; the story for infinite-dimensional vector spaces is more complicated. In my experience, one of the most effective algorithm design strategies in applied mathematics is, when presented with a linear transformation, to write its matrix down and poke and prod it to see if there are any patterns in the numbers which can be exploited to give a fast algorithm. Let’s try to do this with the DFT.

We have that for some matrix . (We will omit the subscript when its value isn’t particularly important to the discussion.) Let us make the somewhat non-standard choice of describing rows and columns of by zero-indexing, so that the first row of is row and the last is row . Then we have that . Comparing with Eq. (1), we see that . Let us define . Thus, we can write the matrix out as

(3)

This is a highly structured matrix. The patterns in this matrix are more easily seen for a particular value of . We shall focus on in this discussion, but what will follow will generalize in a straightforward way to any power of two (and in less straightforward ways to arbitrary —we will return to this point at the end).

Instantiating Eq. (3) with (and writing ), we have

(4)

To fully exploit the patterns in this matrix, we note that represents a clockwise rotation of the complex plane by an eighth of the way around the circle. So, for example is twenty-one eighths of a turn or simply just turns. Thus and more generally . This allows us to simplify as follows:

(5)

Now notice that, since represents a clockwise rotation of an eighth of the way around the circle, represents a quarter turn of the circle. This fact leads to the surprising observation we can actually find the DFT matrix for hidden inside the DFT matrix for !

To see this, rearrange the columns of to interleave every other column. In matrix language this is represented by right-multiplying with an appropriate^{4}In fact, this permutation has a special name: the perfect shuffle. permutation matrix :

(6)

The top-left sub-block is precisely (up to scaling). In fact, defining the diagonal matrix (called the twiddle factor) and noting that , we have

(7)

The matrix is entirely built up of simple scalings of the smaller DFT matrix ! This suggests the following decomposition to compute :

(8)

Here represent the even-indexed entries of and the odd-indexed entries. Thus, we see that we can evaluate by evaluating the two expressions and . We have broken our problem into two smaller problems, which we then recombine into a solution of our original problem.

How then, do we compute the smaller DFTs and ? We just use the same trick again, breaking, for example, the product into further subcomputations and . Performing this process one more time, we need to evaluate expressions of the form , which are simply given by since the matrix is just a matrix whose single entry is .

This procedure is an example of a recursive algorithm: we designed an algorithm which solves a problem by breaking it down into one or more smaller problems, solve each of the smaller problems by using this same algorithm, and then reassemble the solutions of the smaller problems to solve our original problem. Eventually, we will break our problems into such small pieces that they can be solved directly, which is referred to as the base case of our recursion. (In our case, the base case is multiplication by ). Algorithms using this recursion in this way are referred to as divide-and-conquer algorithms.

Let us summarize this recursive procedure we’ve developed. We want to compute the DFT where is a power of two. First, we use the DFT to recursively compute and . Next, we combine these computations to evaluate by the formula

(9)

This procedure is the famous fast Fourier transform (FFT), whose modern incarnation was presented by Cooley and Tukey in 1965 with lineage that can be traced back to work by Gauss in the early 1800s. There are many variants of the FFT using similar ideas.

Let us see why the FFT is considered “fast” by analyzing its operation count. As is common for divide-and-conquer algorithms, the number of operations for computing using the FFT can be determined by solving a certain recurrence relation. Let be the number of operations required by the FFT. Then the cost of computing consists of

- proportional-to- operations (or operations, in computer science language
^{5}refers to big-O notation. Saying an algorithm takes operations is stating that, more or less, the algorithm takes less than some multiple of operations to complete.) to:- add, subtract, and scale vectors and
- multiply by the diagonal matrix and

- two recursive computations of for , each of which requires operations.

This gives us the recurrence relation

(10)

Solving recurrences is a delicate art in general, but a wide class of recurrences are immediately solved by the flexible master theorem for recurrences. Appealing to this result, we deduce that the FFT requires operations. This is a dramatic improvement of the operations to compute directly using Eq. (1). This dramatic improvement in speed is what makes the FFT “fast”.

## Extending the FFT Idea

The FFT is a brilliant algorithm. It exploits the structure of the discrete Fourier transform problem Eq. (1) for dramatically lower operation counts. And as we shall see a taste of, the FFT is useful in a surprisingly broad range of applications. Given the success of the FFT, we are naturally led to the question: can we learn from our success with the FFT to develop fast algorithms for other problems?

I think the FFT speaks to the power of a simple problem-solving strategy for numerical algorithm design^{6}As mentioned earlier, the FFT also exemplifies a typical design pattern in general (not necessarily numerical) algorithm design, the divide-and-conquer strategy. In the divide-and-conquer strategy, find a clever way of *dividing* a problem into multiple subproblems, *conquering* (solving) each, and then *recombining* the solutions to the subproblems into a solution of the larger problem. The challenge with such problems is often finding a way of doing the recombination step, which usually relies on some clever insight. Other instances of divide-and-conquer algorithms include merge sort and Karatsuba’s integer multiplication algorithm.: *whenever you have a linear transformation, write it as a matrix-vector product; whenever you have a matrix, write it down and see if there are any patterns.*^{7}Often, rearranging the matrix will be necessary to see any patterns. We often like to present mathematics with each step of a derivation follows almost effortlessly from the last from a firm basis of elegant mathematical intuition. Often, however, noticing patterns by staring at symbols on a page can be more effective than reasoning grounded in intuition. Once the pattern has been discovered, intuition and elegance sometimes will follow quickly behind.

The most natural generalization of the FFT is the fast *inverse* discrete Fourier transform, providing a fast algorithm to compute the inverse discrete Fourier transform Eq. (2). The inverse FFT is quite an easy generalization of the FFT presented in the past section; it is a good exercise to see if you can mimic the development in the previous section to come up with this generalization yourself. The FFT can also be generalized to other discrete trigonometric transforms and 2D and 3D discrete Fourier transforms.

I want to consider a problem more tangentially related to the FFT, the evaluation of expressions of the form , where is an matrix, is an matrix, is a vector of length , and denotes the Kronecker product. For the unitiated, the Kronecker product of and is a matrix defined as the block matrix

(11)

We could just form this matrix and compute the matrix-vector product directly, but this takes a hefty operations.^{8}Equally or perhaps more problematically, this also takes *space*. We can do better.

The insight is much the same as with the FFT: scaled copies of the matrix are embedded in . In the FFT, we needed to rearrange the columns of the DFT matrix to see this; for the Kronecker product, this pattern is evident in the natural ordering. To exploit this fact, chunk the vectors and into pieces and of length and respectively so that our matrix vector product can be written as^{9}This way of writing this expression is referred to as a conformal partitioning to indicate that one can multiply the block matrices using the ordinary matrix product formula treating the block entries as if they were simple numbers.

(12)

To compute this product efficiently, we proceed in two steps. First, we compute the products which takes time in total. Next, we compute each component by using the formula

(13)

which takes a total of operations to compute all the ‘s. This leads to a total operation count of for computing the matrix-vector product , much better than our earlier operation count of .^{10}There is another way of interpreting this algorithm. If we interpret and as the vectorization of and matrices and , then we have . The algorithm we presented is equivalent to evaluating this matrix triple product in the order . This shows that this algorithm could be further accelerated using Strassen–style fast matrix multiplication algorithms.

While this idea might seem quite far from the FFT, if one applies this idea iteratively, one can use this approach to rapidly evaluate a close cousin of the DFT called the Hadamard-Walsh transform. Using the Kronecker product, the Hadamard-Walsh transform of a vector is defined to be

(14)

If one applies the Kronecker product trick we developed repeatedly, this gives an algorithm to evaluate the Hadamard-Walsh transform of a vector of length in operations, just like the FFT.

The Hadamard-Walsh transform can be thought of as a generalization of the discrete Fourier transform to Boolean functions, which play an integral role in computer science. The applications of the Hadamard-Walsh transform are numerous and varied, from everything to voting systems to quantum computing. This is really just the tip of the iceberg. The ideas behind the FFT (and related ideas from the fast multipole method) allow for the rapid evaluation of a large number of transformations, some of which are connected by deep and general theories.

Resisting the temptation to delve into these interesting subjects in any more depth, we return to our main idea: *when presented with a linear transformation, write it as a matrix-vector product; whenever you have a matrix, write it down and see if there are any patterns.* The FFT exploits one such pattern, noticing that (after a reordering) a matrix contains many scaled copies of the same matrix. Rapidly evaluation expressions of the form involves an even simpler application of the same idea. But there are many other patterns that can be exploited: sparsity, (approximate) low rank, off-diagonal blocks approximately of low rank, and displacement structure are other examples. Very often in applied math, our problems have additional structure that can be exploited to solve problems much faster, and sometimes finding that structure is as easy as just trying to look for it.

## An Application of the FFT

A discussion of the FFT would be incomplete without exploring at least one reason *why* you’d want to compute the discrete Fourier transform. To focus our attention, let us consider another linear algebraic calculation which appears to have no relation to the FFT on its face: computing a matrix-vector product with a Toeplitz matrix. A matrix is said to be Toeplitz if it has the following structure:

(15)

Toeplitz matrices and their relatives appear widely across applications of applied mathematics including control and systems theory, time series, numerical partial differential equations, and signal processing.

We seek to compute the matrix-vector product . Let us by considering a special case of a Toeplitz matrix, a circulant matrix. A circulant matrix has the form

(16)

By direct computation, the matrix-vector product is given by

(17)

A surprising and non-obvious fact is that the circulant matrix is diagonalized by the discrete Fourier transform. Specifically, we have where . This gives a fast algorithm to compute in time : compute the DFTs of and and multiply them together entrywise, take the inverse Fourier transform, and scale by .

There is a connection with signal processing and differential equations that may help to shed light on why technique works for those familiar with those areas. In the signal processing context, the matrix-vector product can be interpreted as the discrete convolution of with (see Eq. (17)) which is a natural extension of the convolution of two functions and on the real line. It is an important fact that *the Fourier transform of a convolution is the same as multiplication of the Fourier transforms*: (up to a possible normalizing constant).^{11}A related identity also holds for the Laplace transform. The fact that the DFT diagonalizes a circulant matrix is just the analog of this fact for the discrete Fourier transform and the discrete convolution.

This fast algorithm for circulant matrix-vector products is already extremely useful. One can naturally reframe the problems of multiplying integers and polynomials as discrete convolutions, which can then be computed rapidly by applying the algorithm for fast circulant matrix-vector products. This video gives a great introduction to the FFT with this as its motivating application.

Let’s summarize where we’re at. We are interested in computing the Toeplitz matrix-vector product . We don’t know how to do this for a general Toeplitz matrix yet, but we can do it for a special Toeplitz matrix called a circulant matrix . By use of the FFT, we can compute the circulant matrix-vector product in operations.

We can now leverage what we’ve done with circulant matrices to accelerate Toeplitz matrix-vector product. The trick is very simple: *embedding*. We construct a big circulant matrix which contains the Toeplitz matrix as a sub-matrix and then use multiplications by the bigger matrix to compute multiplications by the smaller matrix.

Consider the following circulant matrix, which contains as as defined in Eq. (15) a sub-matrix in its top-left corner:

(18)

This matrix is hard to write out, but essentially we pad the Toeplitz matrix with extra zeros to embed it into a circulant matrix. The “” vector for this larger circulant matrix is obtained from the parameters of the Toeplitz matrix Eq. (15) by .

Here comes another clever observation: we can choose the number of padding zeros used cleverly to make the size of exactly equal to a power of two. This is useful because it allows us to compute matrix-vector products with the power-of-two FFT described above, which we know is fast.

Finally, let’s close the loop and use fast multiplications with to compute fast multiplications with . We wish to compute the product fast. To do this, vector into a larger vector by padding with zeros to get

(19)

where we use to denote matrix or vector entries which are immaterial to us. We compute by using our fast algorithm to compute and then discarding everything but the first entries of to obtain . If you’re careful to analyze how much padding we need to make this work, we see that this algorithm also takes only operations. Thus, we’ve completed our goal: we can compute Toeplitz matrix-vector products in a fast operations.

Finally, let us bring this full circle and see a delightfully self-referential use of this algorithm: we can use the FFT-accelerated fast Toeplitz matrix-vector multiply to compute DFT itself. Recall that the FFT algorithm we presented above was particularized to which were powers of . There are natural generalizations of the along the lines of what we did above to more general which are highly composite and possess many small prime factors. But what if we want to evaluate the DFT for which is a large prime?

Recall that the DFT matrix has th entry . We now employ a clever trick. Let be a diagonal matrix with the th entry equal to . Then, defining , we have that , which means is a Toeplitz matrix! (Writing out the matrix entrywise may be helpful to see this.)

Thus, we can compute the DFT for any size by evaluating the DFT as , where the product is computed using the fast Toeplitz matrix-vector product. Since our fast Toeplitz matrix-vector product only requires us to evaluate power-of-two DFTs, this technique allows us to evaluate DFTs of arbitrary size in only operations.

**Upshot:** The discrete Fourier transform (DFT) is an important computation which occurs all across applied mathematics. The fast Fourier transform (FFT) reduces the operation count of evaluating the DFT of a vector of length to proportional to , down from proportional to for direct evaluation. The FFT is an example of a broader matrix algorithm design strategy of looking for patterns in the numbers in a matrix and exploiting these patterns to reduce computation. The FFT can often have surprising applications, such as allowing for rapid computations with Toeplitz matrices.

Fantastic write up!