Fourier transform — why does it even work?

A geometrical interpretation of its inner workings

Anirban Chakraborty
10 min read · Jan 5, 2021
Image source: Google images and http://firsttimeprogrammer.blogspot.com/2015/04/fourier-series-and-square-wave.html

What is this post about? What do you need to know?

In this post I will try to explain the intuition behind the Fourier series expansion: why it works at all, and what the motivation behind it is. I therefore assume some familiarity with the basics of the Fourier expansion, at least its principle, as a prerequisite. A little exposure to vectors and their projections will also help (particularly the kind found in linear algebra courses, although linear algebra itself is not needed). The motivation for writing this post comes from a portion of the book “Linear Algebra and Its Applications” by Gilbert Strang.

A little context

The Fourier series expansion is one of the ubiquitous tools of science, both pure and applied. Many engineering students first encounter it as a tool in signal processing. Its power lies in its ability to express any function as a sum of simple components, thereby approximating it to any desired level of accuracy. The most widespread decomposition of f(x) into such simple components via the Fourier expansion is as follows:

f(x) = a_0 + \sum_{n=1}^{\infty} \left[ a_n \sin(nx) + b_n \cos(nx) \right]

The expansion of the Fourier series in terms of sin(nx) and cos(nx)

The power of this scheme comes from the fact that the a_i and b_i are very simple to calculate. For example, a_1 can be found via

a_1 = \frac{1}{\pi} \int_0^{2\pi} f(x) \sin(x) \, dx

This is because all combinations of sin(mx)sin(nx) (with m not equal to n) and cos(mx)sin(nx) vanish when integrated from 0 to 2π. Many introductory treatments of Fourier series refer to this special property as orthogonality: sin(2x), sin(3x) and cos(x) are all mutually orthogonal. I mention this point here because we will discover a meaning for it.
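
If you want to convince yourself of these vanishing integrals numerically, here is a minimal sketch (my own illustration, not part of the original argument) that approximates them with simple Riemann sums:

```python
import numpy as np

# Approximate the integrals over [0, 2*pi] with Riemann sums;
# the step size and function choices here are my own.
x = np.linspace(0.0, 2 * np.pi, 100_000, endpoint=False)
dx = x[1] - x[0]

def integral(values):
    """Riemann-sum approximation of the integral over [0, 2*pi]."""
    return np.sum(values) * dx

print(integral(np.sin(2 * x) * np.sin(3 * x)))  # ~0  (sines with m != n)
print(integral(np.cos(1 * x) * np.sin(2 * x)))  # ~0  (a cosine against a sine)
print(integral(np.sin(2 * x) * np.sin(2 * x)))  # ~pi (a sine against itself)
```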

I will not elaborate on this any further and hope that the above serves as a good enough context for the questions I will put forth next —

Isn’t it strange that such a scheme is capable of approximating any function, however strange, with more terms leading to better and better approximations? Is there something deeper going on behind those equations and steps? Is the Fourier series expansion something more than just a coincidence, where a series of calculations happens to lead to something useful?

These are the questions I will be tackling here by providing a geometrical interpretation for Fourier expansion. It will also give us a deep understanding of the strange yet effective Fourier expansion scheme.

Functions as vectors

Before we move any further, I will first shed light on a very abstract yet useful concept: how functions might be seen as vectors. A function f(x) is just a height constructed at each and every x. That is one way of looking at a function; at least “functionally” that is what a function is. To be more precise, instead of constructing heights we construct thin rectangles around each x, and all these rectangles together approximate the function. The more numerous and thinner the rectangles, the better the approximation.

Now the area of the rectangle around any x, i.e., f(x)·dx, can be thought of as the contribution of the function at x. It is by giving specific contributions at different x that we get our function. Thus a function can be written as an ordered series of “its contributions along the points of the x axis”.

So x_1, x_2, …, x_n might denote n independent components (or, more abstractly, directions) along which different contributions are recorded in an ordered fashion. This vectorial representation is also a proper representation of the function f(x). The closer together x_1, x_2, …, x_n are, the better this representation becomes. In the figure above, x_1, x_2, …, x_n are the various directions along which the vector f(x) has its components.
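
As a concrete and entirely illustrative sketch of this idea, here is how one might record a function as an ordered vector of its contributions at sample points; the function, range and number of samples are my own choices:

```python
import numpy as np

# Represent f(x) = x * exp(-x) on [0, 4] as an ordered vector of samples.
# Each entry is the "contribution" of f at one of the chosen points x_i;
# more, finer samples give a better stand-in for the function.
x_initial, x_final, n = 0.0, 4.0, 8
x_i = np.linspace(x_initial, x_final, n, endpoint=False)
dx = (x_final - x_initial) / n

def f(x):
    return x * np.exp(-x)

f_vector = f(x_i)   # the function viewed as an n-dimensional vector

print(x_i)       # the "directions" x_1, ..., x_n
print(f_vector)  # the component of f along each direction
```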

Well, that might look fancy, but does the bridge between vectors and functions remain intact? When we apply our pre-established knowledge from one domain, does it give results that are in sync with the other domain? We will now put our new-found interpretation to that test.

Take functions f(x) and g(x) as per our scheme above, and let h(x) = f(x) + g(x). At any point x the height h(x) = f(x) + g(x) is known (and that knowledge itself defines h(x)). Now take the vector representations of f(x) and g(x) and add them up as vectors (component-wise addition).

Vectorially adding up the vector representations of f(x) and g(x) using the fact that f(x)+g(x) for any x is h(x)

Isn’t our final output an equivalent vector representation of h(x)? A similar exercise can be performed to show that c·f(x), i.e., scalar multiplication, when approached through either path (the ordinary interpretation of functions or the vectorial treatment of functions), does not lead to any contradiction. Without testing a whole lot of other cases, we temporarily put our trust in this vector representation of a function and move ahead. If it is incorrect, it will presumably lead to contradictions down the line anyway.
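
A quick numerical sanity check of this consistency, again purely my own illustration, might look like this:

```python
import numpy as np

# Sample two functions on the same grid and check that adding their
# vector representations matches the vector representation of their sum.
x = np.linspace(0.0, 2 * np.pi, 16, endpoint=False)
f_vec = np.sin(x)
g_vec = np.cos(2 * x)
h_vec = np.sin(x) + np.cos(2 * x)   # h(x) = f(x) + g(x), sampled directly

print(np.allclose(f_vec + g_vec, h_vec))      # True: component-wise addition agrees
print(np.allclose(3 * f_vec, 3 * np.sin(x)))  # True: scalar multiplication agrees
```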

Length of function vectors

Now that we have functions as vectors, what is the length of those vectors? For a vector a, |a|² is the sum of the squares of the components of a. For the function vector f(x), with the component along each x_i taken as f(x_i)·(Δx)^(1/2), it is

|f|^2 = \sum_i \left[ f(x_i) \sqrt{\Delta x} \right]^2 = \sum_i f(x_i)^2 \, \Delta x

It very beautifully transforms to an integral:

|f|^2 \to \int f(x)^2 \, dx

(I hope you now understand why I deliberately took the square root of Δx: so that in the length formula only Δx appears. In fact, I must admit I cheat here a bit; the jump to the integral is not entirely sound. For anyone who objects to that jump, I devote a section at the end to clarifying the point. For now I would prefer the reader to set that doubt aside and move on.)

But this length also has a definite chance of running off to infinity for many f(x). So we constrain our range of study of f(x) to lie between x_initial and x_final, so that we have a finite length. That way we can use our function vectors in various calculations while avoiding infinities. Our integral becomes

|f|^2 = \int_{x_\text{initial}}^{x_\text{final}} f(x)^2 \, dx

That’s manageable and finite. Limiting our scope (domain) of study of f(x) is a small price to pay for that.
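
As an illustrative check (my own example, not from the article), the discrete sum of squares times Δx does approach the integral of f(x)² over the chosen range:

```python
import numpy as np

# Squared "length" of f(x) = x on the range [0, 1]:
# the discrete sum_i f(x_i)^2 * dx approaches the exact integral, 1/3.
x_initial, x_final = 0.0, 1.0
for n in (10, 100, 10_000):
    x_i = np.linspace(x_initial, x_final, n, endpoint=False)
    dx = (x_final - x_initial) / n
    length_sq = np.sum(x_i ** 2) * dx
    print(n, length_sq)   # 0.285, 0.32835, 0.33328... -> 1/3
```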

The dense nature of the vector space

The dimension of a vector is the number of components it has. By limiting the range to [x_initial, x_final] we limit the spectrum of components. But within that finite range we can still have an infinite number of components: infinitesimally separated values of x within [x_initial, x_final]. Thus it is like an infinitely dense space, with infinite dimensions crammed into a finite scope. Not that this has any particular significance here, but it felt like a point worth bringing to attention.

The dot product of function vectors

Just like the dot product of ordinary vectors, we have the dot product of two function vectors f(x) and g(x) as

f \cdot g = \sum_i \left[ f(x_i) \sqrt{\Delta x} \right] \left[ g(x_i) \sqrt{\Delta x} \right] = \sum_i f(x_i) \, g(x_i) \, \Delta x \;\to\; \int_{x_\text{initial}}^{x_\text{final}} f(x) \, g(x) \, dx

Just as a vector dot product gives a sense of how much two vectors are aligned, the function-vector dot product does the same. If f(x) and g(x) are similar at every x, the dot product (the integral) comes out strongly positive; if they are very much oppositely aligned, the integral tends to be negative.
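
A small illustrative computation of this alignment idea (the functions are my own choices) is sketched below:

```python
import numpy as np

# Dot product of function vectors as sum_i f(x_i) g(x_i) dx over [0, 2*pi].
x = np.linspace(0.0, 2 * np.pi, 10_000, endpoint=False)
dx = x[1] - x[0]

def dot(f_vals, g_vals):
    return np.sum(f_vals * g_vals) * dx

print(dot(np.sin(x),  np.sin(x)))   # ~ +pi : perfectly aligned
print(dot(np.sin(x), -np.sin(x)))   # ~ -pi : oppositely aligned
print(dot(np.sin(x),  np.cos(x)))   # ~  0  : orthogonal
```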

As mentioned in the section “Length of function vectors”, here too the transition from the summation to the integral is not entirely proper. The end section will clarify that point. For now, let us carry on.

Vector projections

In the context of vector projections, for ordinary vectors a and b,

\text{projection of } a \text{ on } b = \frac{a \cdot b}{b \cdot b} \, b

This theory about projections remains intact if we extrapolate from what we have learned so far. For a function f(x) projected on g(x),

\text{projection of } f \text{ on } g = \frac{f \cdot g}{g \cdot g} \, g(x) = \frac{\int f(x) \, g(x) \, dx}{\int g(x)^2 \, dx} \, g(x)

Hence the method of projecting one vector on another can be carried over to function-vectors as well.
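
Here is an illustrative sketch of projecting one function onto another numerically (the functions f(x) = x and g(x) = sin(x) are my own picks):

```python
import numpy as np

# Project f(x) = x onto g(x) = sin(x) over [0, 2*pi].
# The projection coefficient is (f . g) / (g . g) = integral(f*g) / integral(g*g).
x = np.linspace(0.0, 2 * np.pi, 10_000, endpoint=False)
dx = x[1] - x[0]

f_vals, g_vals = x, np.sin(x)
coeff = np.sum(f_vals * g_vals) * dx / (np.sum(g_vals ** 2) * dx)
projection = coeff * g_vals   # the component of f "along" sin(x)

print(coeff)   # ~ -2.0, since integral(x*sin(x)) = -2*pi and integral(sin(x)^2) = pi
```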

Orthogonal vectors

We call two vectors orthogonal if their dot product is 0, or equivalently if the projection of one along the other is 0. Remember that we cannot stick to the geometrical picture of a 90° angle, because that is limited to two- and three-dimensional vectors. For n-dimensional vectors, the dot product or projection being 0 is the more general perspective.

For function-vectors the same definition applies; it is just that here the dot product involves an integral.

The specialty of sines and cosines

Now let us look again at an important feature of the sines and cosines that we saw towards the beginning:

\int_0^{2\pi} \sin(mx) \cos(nx) \, dx = 0 \quad \text{for all } m, n

\int_0^{2\pi} \sin(mx) \sin(nx) \, dx = 0 \quad \text{for } m \neq n

\int_0^{2\pi} \sin(mx) \sin(mx) \, dx = \pi

These three properties form the bread and butter of the Fourier expansion. The first line says that sin(mx) and cos(nx) are orthogonal functions. The next line says that sin(mx) and sin(nx), for m not equal to n, are orthogonal functions. The last line says that each sin(mx) (and likewise each cos(mx)) has squared length π, so dividing it by the square root of π turns it into a unit-length vector.

So sin(mx) and cos(nx), for any m and n, produce a system of orthogonal (every vector is orthogonal to all the others), unit-length vectors once each is scaled by 1/√π. We will see how this special interpretation becomes helpful for the all-too-familiar Fourier expansion.
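
To make the “orthogonal unit vectors” claim tangible, here is an illustrative sketch (my own choice of basis size) that builds the matrix of mutual dot products for a few of these functions, normalized by 1/√π, and checks that it is close to the identity:

```python
import numpy as np

# Dot products among the normalized functions sin(mx)/sqrt(pi), cos(mx)/sqrt(pi).
x = np.linspace(0.0, 2 * np.pi, 100_000, endpoint=False)
dx = x[1] - x[0]

basis = []
for m in range(1, 4):
    basis.append(np.sin(m * x) / np.sqrt(np.pi))
    basis.append(np.cos(m * x) / np.sqrt(np.pi))
basis = np.array(basis)

gram = basis @ basis.T * dx   # entry (i, j) = dot product of basis i and basis j
print(np.round(gram, 3))      # close to the 6x6 identity matrix
```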

The pièce de résistance

Now with all these interpretations we are ready for the final magic to unfold.

Suppose we have a 3-dimensional vector. If we project it along one direction, the projection gives us some information about it. If we study its projections along two independent vectors or directions, we capture more information about it, i.e., we approximate it better with these two projections. If we have its projections along all three independent directions, we can capture the whole vector without any approximation error.

The same goes for n-dimensional vectors. As we capture projections along more and more independent directions, our approximation of the vector gets better and better. With n such projections we know the vector perfectly.

For a function, we might know its value at a certain number of x points. As we learn its value at more and more x points within a given range, we know the function better and better; our approximation improves. To get an exact representation of the function within a range we need to know it at all points in that range, i.e., the infinite number of projections or components that we saw in the vector representation.

Now all these projections are easier to calculate and handle if the vectors along which we project are perpendicular to each other. Better still if they are of unit length. The Fourier series expansion is just that: an attempt to use a system of orthogonal, unit-length vectors (or directions), the sines and cosines, and to take projections along them to approximate the function f(x). The more independent directions we use, the better the approximation. To reduce the approximation error to 0 we need an infinite number of projections. That is what the infinite nature of the expansion signifies.
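
As an illustrative demonstration (my own example, not from the article), one can project a square wave onto the first few sines and watch the approximation error shrink as more directions are included:

```python
import numpy as np

# Approximate a square wave on [0, 2*pi] by projecting it onto sin(x), sin(2x), ...
x = np.linspace(0.0, 2 * np.pi, 100_000, endpoint=False)
dx = x[1] - x[0]
square = np.where(x < np.pi, 1.0, -1.0)

approx = np.zeros_like(x)
for n in range(1, 22):
    g = np.sin(n * x)
    coeff = np.sum(square * g) * dx / (np.sum(g ** 2) * dx)  # projection of f on sin(nx)
    approx += coeff * g
    if n in (1, 5, 21):
        error = np.sqrt(np.sum((square - approx) ** 2) * dx)  # L2 "length" of the residual
        print(n, error)   # the error keeps decreasing as more terms are added
```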

When we found a_1 at the beginning (in the context portion) by multiplying both sides of the expansion by sin(x), we were relying on the fact that most terms on the RHS would vanish upon integration because they are orthogonal to sin(x). Only the sin(x)·sin(x) term would remain. This is the ease of dealing with orthogonal vectors (even when they are function-vectors).

So you see that the Fourier expansion approximation scheme quite naturally gets better and better as more terms are included; and with an infinite number of terms it should represent f(x) exactly. That is the truth, abstract yet beautiful and ingenious, behind the FOURIER EXPANSION.

Transition from summation to integrals in the dot product of function vectors

While explaining the dot product of two function-vectors, I claimed that

f \cdot g = \sum_i \left[ f(x_i) \sqrt{\Delta x} \right] \left[ g(x_i) \sqrt{\Delta x} \right] = \sum_i f(x_i) \, g(x_i) \, \Delta x \;\to\; \int f(x) \, g(x) \, dx

However, this transition to the integral in the final line is incorrect. In fact, if we use Δx in place of (Δx)^(1/2) as the component weight, the inconsistency is glaring: every term in the sum then carries an extra factor of Δx, and the whole dot product shrinks to 0 as Δx goes to 0.

To keep this confusing point from leading the reader astray, I resorted to a small trick earlier. But now the time has come to really address the issue. Honestly speaking, if this point cannot be resolved and the transition from summations to integrals cannot be established, then everything we discussed above becomes futile: the use of orthogonal projections to explain the Fourier expansion loses its truth and remains only a fancy idea. Projections! Voilà: projections are exactly where the solution lies.

The dot product by itself does not mean a lot, at least not here. It is only in the context of projections that it enters any calculation. It is true that if we ask for the dot product alone, it comes out as 0 in our form of treatment. But we postpone studying the dot product until we get to the projection of f(x) along g(x). For the projection of f(x) along g(x),

\frac{f \cdot g}{g \cdot g} \, g = \frac{\sum_i f(x_i) \, g(x_i) \, (\Delta x)^2}{\sum_i g(x_i)^2 \, (\Delta x)^2} \, g(x) = \frac{\sum_i f(x_i) \, g(x_i) \, \Delta x}{\sum_i g(x_i)^2 \, \Delta x} \, g(x) \;\to\; \frac{\int f(x) \, g(x) \, dx}{\int g(x)^2 \, dx} \, g(x)

So now you see that although the extra Δx did create problems in the dot product, when it comes to projections it cancels out, and the transitions to integrals hold true. So in the end, the explanation of the Fourier expansion in terms of orthogonal projections stands very much true.
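
As a final illustrative check (again a numerical example of my own), the projection ratio indeed settles down as Δx shrinks, even though the raw dot product built from f(x_i)·Δx components collapses to 0:

```python
import numpy as np

# Project f(x) = x onto g(x) = sin(x) over [0, 2*pi] using components f(x_i)*dx.
# The raw dot product sum_i f*g*dx^2 dies out, but the projection ratio converges.
for n in (100, 1_000, 100_000):
    x = np.linspace(0.0, 2 * np.pi, n, endpoint=False)
    dx = x[1] - x[0]
    f_vals, g_vals = x, np.sin(x)
    raw_dot = np.sum(f_vals * g_vals) * dx ** 2             # -> 0 as dx -> 0
    ratio = np.sum(f_vals * g_vals) / np.sum(g_vals ** 2)   # the dx^2 factors cancel
    print(n, raw_dot, ratio)   # ratio approaches -2, the integral ratio from before
```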

