Prerequisites
- chapter-19-orthogonal-projection
- chapter-20-gram-schmidt-and-qr
Learning Objectives
- Treat functions as vectors in an inner product space with the inner product $\langle f, g\rangle = \int f(x)\,g(x)\,dx$ (the integral of $f$ times $g$ over one period), recalling the vector-space abstraction of Chapter 5.
- State precisely the orthogonality relations among the sines, cosines, and the constant function on a period, and verify them by integration.
- Compute a Fourier coefficient as the orthogonal projection of a function onto one orthonormal basis function, the identical formula from Chapters 19 and 20 in a new space.
- Reconstruct and approximate a signal by truncating its Fourier series, and explain why truncation gives the best least-squares fit with the given frequency content.
- Explain why orthogonality makes the Fourier coefficients independent, so that adding a new frequency never disturbs the coefficients already computed.
- Recognize the Gibbs phenomenon and other convergence subtleties, and implement fourier_coeffs by numerical projection, verifying the square-wave series against the known result.
In This Chapter
- 22.1 What does it mean to break a signal into frequencies?
- 22.2 How can a function be a vector?
- 22.3 Why are the sines and cosines an orthogonal basis?
- 22.4 What is a Fourier coefficient, and why is it a projection?
- 22.5 How do we decompose a square wave?
- 22.6 How does the reconstruction converge, and what is the Gibbs phenomenon?
- 22.7 Where does this show up — compression, audio, and beyond?
- 22.8 How does a computer compute these — sampling and the DFT?
- 22.9 What does the complex-exponential form add, and what about other signals?
- 22.10 Why is this the same projection, all the way down?
- 22.11 Summary
Application: Fourier Series as Projection onto Orthogonal Basis Functions
Learning paths. Math majors: read everything, especially the orthogonality relations stated precisely in §22.3, the projection derivation in §22.4, and the Math-Major Sidebar on completeness and mean-square convergence in §22.9. CS / Data Science: focus on the Geometric Intuition callouts, the square-wave reconstruction in §22.5–22.6, the numpy verifications, and the compression and spectrum applications in §22.7. Physics / Engineering: focus on the picture of a signal as a sum of pure tones, the orthogonality that separates them, and the convergence and Gibbs discussion in §22.6. This chapter is the payoff of Part IV: it takes the orthogonal projection of Chapter 19 and the orthonormal-basis coordinates of Chapter 20 and applies them, unchanged, in a space whose "vectors" are functions.
22.1 What does it mean to break a signal into frequencies?
Play a single note on a flute and a single note on a violin at the same pitch, and you can still tell them apart instantly. The pitch is the same — the air is vibrating the same number of times per second — yet the two sounds are unmistakably different. The difference is the shape of the wave. A flute's pressure wave is close to a pure sine; a violin's is a jagged, repeating curve. Somewhere inside that jagged curve is information about every overtone the instrument produces, and your ear, astonishingly, performs the decomposition automatically. This chapter is about doing that decomposition with mathematics, and the punchline is that it is nothing more than the orthogonal projection you have practiced for the last four chapters, carried out in a new space.
Here is the central claim, stated plainly so you can hold it in mind through everything that follows. Any reasonable periodic signal can be written as a sum of pure sine and cosine waves of different frequencies, and the amount of each frequency present is computed by projecting the signal onto that frequency — the very same projection formula from Chapter 19. The sines and cosines play the role of an orthogonal basis, exactly like the perpendicular axes you built with Gram-Schmidt in Chapter 20, except that now the "vectors" are functions and the basis is infinite. The list of how much of each frequency a signal contains is called its spectrum, and the recipe for building the signal back up out of those frequencies is called its Fourier series.
This is the boldest reuse of the vector-space abstraction in the book so far, and it is worth pausing to feel how strange and how natural it is at the same time. Back in Chapter 5 we insisted that a "vector" is anything that obeys the vector-space axioms — you can add two of them and scale one of them, and the usual rules hold. We noted then, almost in passing, that functions qualify: you can add two functions, you can scale a function, and the result is another function. We are about to cash in that observation completely. The functions on an interval form a vector space; we will equip it with an inner product; and once a space has an inner product, every tool of Part IV — length, angle, orthogonality, projection — becomes available, no modification required.
The phrase to keep at the front of your mind, the spine of this entire chapter, is Fourier coefficients as projection. A Fourier coefficient is a projection. Not "analogous to" a projection, not "reminiscent of" one — it is the orthogonal projection of a function onto a single basis function, computed by the identical formula you have used since Chapter 19. If you understand projection onto an orthonormal basis, you already understand Fourier analysis; what remains is to recognize the costume it is wearing.
The Key Insight — Decomposing a signal into frequencies is orthogonal projection onto a basis of sines and cosines. The signal is the vector; each pure frequency is a basis direction; the Fourier coefficient measuring "how much of this frequency is present" is the projection of the signal onto that direction. This is the book's recurring theme #4 wearing a new hat: the same projection idea that fits regressions and builds orthonormal bases now extracts frequencies.
We will build the whole story from the geometry outward. First we make precise what it means for two functions to be perpendicular (§22.2–22.3). Then we show that the Fourier coefficient is a projection (§22.4). Then we decompose and rebuild a concrete signal, a square wave, and watch the reconstruction converge (§22.5–22.6), meeting the famous Gibbs overshoot along the way. We close with applications in audio and image compression (§22.7), how a computer computes the coefficients by sampling and the DFT (§22.8), the complex-exponential form and other signals (§22.9), and the deeper view that ties it all to the rest of the book (§22.10).
FAQ: Why phrase frequency analysis as linear algebra at all?
Because it buys you everything Part IV proved, for free. Once you recognize the Fourier coefficient as a projection onto an orthonormal basis, you immediately know that the truncated series is the best possible approximation using those frequencies (projection gives the closest point, Chapter 19), that the coefficients are independent of one another (orthogonal directions do not contaminate each other, the theme of Part IV), and that the total energy of the signal is the sum of the energies in each frequency (the Pythagorean theorem in an orthonormal basis, Chapter 18). None of these facts has to be re-proved for Fourier series; they are corollaries of linear algebra you already own. That is what it means for linear algebra to be the most applied branch of pure mathematics.
22.2 How can a function be a vector?
Let's slow down on the move that makes everything possible: treating a function as a vector. We met this idea in Chapter 5, but here it does real work, so we rebuild it carefully.
Fix an interval — to be concrete and to match the audio picture, take all functions defined on $[-\pi, \pi]$, which we think of as one period of a repeating signal. Call a typical function $f$. We claim the set of all such functions is a vector space. The checks are exactly the ones from Chapter 5. Given two functions $f$ and $g$, their sum $f + g$ is the function whose value at $x$ is $f(x) + g(x)$ — pointwise addition, the only natural choice. Given a scalar $c$, the scaled function $cf$ has value $cf(x)$ at each $x$. The zero "vector" is the function that is zero everywhere. Addition is commutative and associative because addition of real numbers is; scaling distributes for the same reason. Every axiom holds because, at each point $x$, we are just doing ordinary arithmetic with real numbers.
Geometric Intuition — Picture a finite-dimensional vector as a list of numbers, one per coordinate: $\mathbf{v} = (v_1, v_2, \dots, v_n)$. Now imagine the list growing longer and longer until it has one entry for every point $x$ in the interval — the "$x$-th coordinate" of the function $f$ is simply its value $f(x)$. A function is a vector with a continuum of components. Adding two functions adds them component-by-component, exactly as you add two arrows in $\mathbb{R}^n$; the only change is that there are now infinitely many components, indexed by $x$ instead of by $1, 2, \dots, n$.
This picture also tells you how to build an inner product. In $\mathbb{R}^n$ the dot product sums the products of matching components: $\mathbf{u}\cdot\mathbf{v} = \sum_i u_i v_i$. For functions, "matching components" means "the same value of $x$," and "sum over all components" becomes "integrate over all $x$," because integration is the continuous limit of summation. So the natural inner product of two functions is
$$\langle f, g\rangle = \int_{-\pi}^{\pi} f(x)\, g(x)\, dx.$$
This is the same notation $\langle f, g\rangle$ used for the abstract inner product introduced in Chapter 18. Every property that made the dot product useful carries over, and for the same reasons. It is symmetric: $\langle f, g\rangle = \langle g, f\rangle$, since $f(x)g(x) = g(x)f(x)$ inside the integral. It is linear in each slot: $\langle f_1 + f_2, g\rangle = \langle f_1, g\rangle + \langle f_2, g\rangle$ and $\langle cf, g\rangle = c\langle f, g\rangle$, because the integral of a sum is the sum of integrals and constants pull out. And it is positive: $\langle f, f\rangle = \int_{-\pi}^{\pi} f(x)^2\, dx \ge 0$, an integral of something never negative, equal to zero only when $f$ is zero (essentially) everywhere. Those three properties (symmetry, linearity, positivity) are the definition of an inner product (Chapter 18), so we have a genuine inner product space.
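The claim that integration is the continuous limit of the dot product can be checked directly. The following is a minimal sketch, in which the helper name `sampled_inner`, the midpoint sampling, and the grid sizes are illustrative choices. A function sampled at $n$ points is just an ordinary numpy array, and the dot product of two such arrays, multiplied by the spacing $dx$, approaches the integral. For $f(x) = g(x) = x$ the exact value is $\int_{-\pi}^{\pi} x^2\,dx = 2\pi^3/3 \approx 20.67$.

```python
import numpy as np

def sampled_inner(f, g, n):
    """Hypothetical helper: dot product of two sample vectors, weighted by dx."""
    dx = 2 * np.pi / n
    x = -np.pi + (np.arange(n) + 0.5) * dx   # midpoints of n equal cells
    u, v = f(x), g(x)                        # a sampled function is an ordinary vector
    return np.dot(u, v) * dx                 # sum of matching components, times dx

exact = 2 * np.pi**3 / 3                     # <x, x> = integral of x^2 over [-pi, pi]
errors = []
for n in (10, 100, 1_000, 10_000):
    approx = sampled_inner(lambda t: t, lambda t: t, n)
    errors.append(abs(approx - exact))
    print(f"n = {n:>6}: dot * dx = {approx:.6f}   (exact {exact:.6f})")

# Symmetry is inherited from the dot product: <f, g> equals <g, f>.
print(np.isclose(sampled_inner(np.sin, np.exp, 1_000),
                 sampled_inner(np.exp, np.sin, 1_000)))
```

The error shrinks by roughly a factor of 100 each time $n$ grows by a factor of 10, as expected for the midpoint rule.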
With an inner product in hand, length and angle come for free, exactly by the Chapter 18 definitions. The norm of a function — its "length" — is
$$\lVert f\rVert = \sqrt{\langle f, f\rangle} = \left(\int_{-\pi}^{\pi} f(x)^2\, dx\right)^{1/2},$$
written with the double-bar notation $\lVert\cdot\rVert$. This number measures the overall "size" of the signal; in engineering, $\lVert f\rVert^2$ is proportional to the signal's energy, which is why the squared norm will keep reappearing. And two functions are orthogonal (perpendicular) when their inner product is zero, $\langle f, g\rangle = 0$, the direct generalization of $\mathbf{u}\cdot\mathbf{v} = 0$. The right angle you learned about as a child, the one Part IV is built on, now applies to functions.
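As a quick numerical check of the norm, the sketch below estimates two values by quadrature. The first is $\lVert \sin 2x\rVert$, which should be $\sqrt{\pi} \approx 1.7725$. The second is the squared norm (energy) of the square wave $\operatorname{sign}(\sin x)$ studied in §22.5, which should be $\int_{-\pi}^{\pi} 1\,dx = 2\pi$. The grid size is an arbitrary choice, and the `trapezoid` fallback line assumes only that one of the two numpy spellings exists.

```python
import numpy as np

# np.trapezoid was named np.trapz before numpy 2.0; use whichever exists.
trapezoid = getattr(np, "trapezoid", None) or np.trapz

x = np.linspace(-np.pi, np.pi, 200_001)        # one period, both endpoints included

def norm(values):
    """||f|| = sqrt(<f, f>) = sqrt(integral of f^2), estimated by quadrature."""
    return np.sqrt(trapezoid(values**2, x))

norm_sin2 = norm(np.sin(2 * x))
energy_square = norm(np.sign(np.sin(x))) ** 2   # squared norm = energy

print(f"||sin 2x||        = {norm_sin2:.4f}   sqrt(pi) = {np.sqrt(np.pi):.4f}")
print(f"||square wave||^2 = {energy_square:.4f}   2*pi     = {2 * np.pi:.4f}")
```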
Common Pitfall — Forgetting that "orthogonal" for functions means an integral is zero, not that the graphs look perpendicular. Two functions are orthogonal when $\int f g\, dx = 0$, which is a statement about cancellation: wherever the product $fg$ is positive it is balanced by regions where it is negative, so the signed area is zero. The graphs of $\sin x$ and $\cos x$ do not "cross at right angles" in any visual sense — yet they are orthogonal, because $\int_{-\pi}^{\pi}\sin x\cos x\, dx = 0$. Orthogonality lives in the inner product, never in the visual appearance of the curves.
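The cancellation behind $\int_{-\pi}^{\pi}\sin x\cos x\,dx = 0$ becomes visible if the period is split where the product changes sign. Since $\sin x\cos x = \tfrac12\sin 2x$, it is positive on $(-\pi,-\pi/2)$ and $(0,\pi/2)$ and negative on the other two quarters. The illustrative sketch below integrates over each quarter separately (the grid size is arbitrary) and shows the four signed areas, each of size $\tfrac12$, summing to zero.

```python
import numpy as np

trapezoid = getattr(np, "trapezoid", None) or np.trapz   # np.trapz before numpy 2.0

edges = [-np.pi, -np.pi / 2, 0.0, np.pi / 2, np.pi]
pieces = []
for a, b in zip(edges[:-1], edges[1:]):
    x = np.linspace(a, b, 10_001)
    area = trapezoid(np.sin(x) * np.cos(x), x)   # signed area on this quarter
    pieces.append(area)
    print(f"[{a:+.3f}, {b:+.3f}]: {area:+.4f}")

print("total:", round(sum(pieces), 10))          # the four signed areas cancel
```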
FAQ: Is this inner product space finite-dimensional like the ones I know?
No, and that is the one genuinely new feature. In $\mathbb{R}^n$ a basis has $n$ vectors and every vector is a finite combination of them. The space of functions on an interval is infinite-dimensional: no finite list of functions can span it. As a result, the Fourier "basis" of sines and cosines is infinite, and writing a function in that basis gives an infinite sum, the Fourier series. This raises a convergence question that never arose in finite dimensions: when does an infinite sum of basis functions actually equal the original function? We will state the answer carefully in §22.6 and §22.9. For now, treat the infinite sum the way you treat any infinite series from calculus: as a limit of finite partial sums, which we can compute and watch converge. Everything algebraic about projection works exactly as in finite dimensions; only the convergence of the full sum needs the extra care that infinity always demands.
22.3 Why are the sines and cosines an orthogonal basis?
Now the central fact, and the reason Fourier analysis is so clean: the sine and cosine functions, taken at integer frequencies, are mutually orthogonal under this inner product. They are the perpendicular axes of function space, handed to us ready-made — we do not even need Gram-Schmidt to build them, because they arrive orthogonal already.
Consider the family of functions on $[-\pi, \pi]$:
$$1, \quad \cos x, \ \sin x, \quad \cos 2x, \ \sin 2x, \quad \cos 3x, \ \sin 3x, \quad \dots$$
The constant function $1$, and for each positive integer $k$ the pair $\cos kx$ and $\sin kx$. The integer $k$ is the harmonic number: $k=1$ is the fundamental frequency, the slowest oscillation that still fits a whole number of periods in $[-\pi, \pi]$, and higher $k$ are its overtones, oscillating $k$ times as fast. These are exactly the pure tones an idealized instrument produces. Here are the orthogonality relations, stated precisely — these are the load-bearing facts of the entire chapter, so we write them with full conditions.
The orthogonality relations. For positive integers $k$ and $m$:
$$\int_{-\pi}^{\pi} \sin kx \, \sin mx\, dx = \begin{cases} \pi & k = m,\\ 0 & k \neq m,\end{cases} \qquad \int_{-\pi}^{\pi} \cos kx \, \cos mx\, dx = \begin{cases} \pi & k = m,\\ 0 & k \neq m,\end{cases}$$
$$\int_{-\pi}^{\pi} \sin kx \, \cos mx\, dx = 0 \quad\text{for all } k, m, \qquad \int_{-\pi}^{\pi} 1 \cdot \cos kx\, dx = \int_{-\pi}^{\pi} 1\cdot \sin kx\, dx = 0,$$
and for the constant function, $\int_{-\pi}^{\pi} 1\cdot 1\, dx = 2\pi$. In inner-product notation: $\langle \sin kx, \sin mx\rangle$ and $\langle \cos kx, \cos mx\rangle$ are $\pi$ when $k=m$ and $0$ otherwise; every sine is orthogonal to every cosine; and the constant function $1$ is orthogonal to every sine and cosine, with $\lVert 1\rVert^2 = 2\pi$.
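All of these relations can be checked at once by collecting the inner products of the first few basis functions into a Gram matrix, whose $(i,j)$ entry is $\langle \phi_i, \phi_j\rangle$. The relations predict a diagonal matrix: $2\pi$ for the constant and $\pi$ for every sine and cosine. The minimal numpy sketch below uses only harmonics $k \le 2$ and an arbitrary grid size.

```python
import numpy as np

trapezoid = getattr(np, "trapezoid", None) or np.trapz   # np.trapz before numpy 2.0

x = np.linspace(-np.pi, np.pi, 20_001)                  # one period, both endpoints
basis = [np.ones_like(x), np.cos(x), np.sin(x), np.cos(2 * x), np.sin(2 * x)]

# Gram matrix: G[i, j] = <phi_i, phi_j> = integral of phi_i * phi_j over the period.
G = np.array([[trapezoid(p * q, x) for q in basis] for p in basis])

np.set_printoptions(precision=4, suppress=True)
print(G / np.pi)   # relations predict diag(2, 1, 1, 1, 1): <1,1> = 2*pi, the rest pi
```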
Let's prove the representative case, that $\sin kx$ and $\sin mx$ are orthogonal for $k \neq m$, in four parts (why we care, the key idea, the proof, and what it means), because seeing why the integral vanishes makes the whole structure believable rather than magical.
Why we care. This single computation, repeated for the other pairs, is the entire foundation. If the basis functions were not orthogonal, the Fourier coefficients would be coupled — finding one would require solving a giant linear system, exactly the awkwardness Gram-Schmidt rescued us from in Chapter 20. Orthogonality is what makes each coefficient computable on its own, by a single integral. No orthogonality, no clean Fourier theory.
Key idea. A product of two sines is a sum of two cosines (a product-to-sum identity), and a cosine of a nonzero integer frequency integrates to zero over a full period, because it spends as much time positive as negative.
Proof. Start from the product-to-sum identity $$\sin kx \, \sin mx = \tfrac{1}{2}\big[\cos\!\big((k-m)x\big) - \cos\!\big((k+m)x\big)\big],$$ which you can verify by expanding the right-hand side with the cosine difference and sum formulas. Integrate both sides over $[-\pi, \pi]$: $$\int_{-\pi}^{\pi}\sin kx\,\sin mx\, dx = \tfrac{1}{2}\int_{-\pi}^{\pi}\cos\!\big((k-m)x\big)\, dx - \tfrac{1}{2}\int_{-\pi}^{\pi}\cos\!\big((k+m)x\big)\, dx.$$ Now use the elementary fact that for any nonzero integer $n$, $$\int_{-\pi}^{\pi}\cos(nx)\, dx = \left[\frac{\sin(nx)}{n}\right]_{-\pi}^{\pi} = \frac{\sin(n\pi) - \sin(-n\pi)}{n} = 0,$$ since $\sin(n\pi) = 0$ for every integer $n$. When $k \neq m$, both $k-m$ and $k+m$ are nonzero integers, so both integrals on the right vanish, and the whole expression is $0$. When $k = m$, the first term has $\cos(0\cdot x) = 1$, whose integral over $[-\pi, \pi]$ is $2\pi$, giving $\tfrac12(2\pi) - 0 = \pi$. That establishes both branches of the sine relation. $\blacksquare$
What this means. Geometrically, $\sin kx$ and $\sin mx$ point in perpendicular directions in function space whenever $k \neq m$: their "shadows" on each other are exactly zero. Two distinct pure frequencies are completely independent as far as the inner product can tell — knowing how much of the $k$-frequency a signal contains tells you nothing about how much of the $m$-frequency it contains. That independence, the central gift of orthogonality emphasized all through Part IV, is what we are about to exploit.
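To see the mechanism on one concrete pair, take the illustrative choice $k = 2$, $m = 3$. Apply the product-to-sum identity, using $\cos(-x) = \cos x$, and integrate term by term. Both resulting cosines have nonzero integer frequency, so both integrals vanish. For comparison, the case $k = m = 2$ keeps a $\cos 0 = 1$ term and returns $\pi$:

$$\begin{aligned}
\int_{-\pi}^{\pi}\sin 2x\,\sin 3x\,dx
  &= \tfrac12\int_{-\pi}^{\pi}\cos x\,dx - \tfrac12\int_{-\pi}^{\pi}\cos 5x\,dx
   = \tfrac12\Big[\sin x\Big]_{-\pi}^{\pi} - \tfrac12\Big[\tfrac{\sin 5x}{5}\Big]_{-\pi}^{\pi} = 0,\\
\int_{-\pi}^{\pi}\sin 2x\,\sin 2x\,dx
  &= \tfrac12\int_{-\pi}^{\pi}1\,dx - \tfrac12\int_{-\pi}^{\pi}\cos 4x\,dx
   = \tfrac12(2\pi) - 0 = \pi.
\end{aligned}$$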
Geometric Intuition — Think of each frequency $\sin kx$, $\cos kx$ as one perpendicular axis in an infinite-dimensional space, just like $\mathbf{e}_1, \mathbf{e}_2, \dots$ in $\mathbb{R}^n$ but never-ending. Because they are mutually orthogonal, they form a clean right-angled coordinate frame — the dream coordinate system of Chapter 20, arriving pre-built. A signal is a single point in this space, and its Fourier coefficients are its coordinates along each frequency-axis. The whole of Fourier analysis is reading off coordinates in a perpendicular frame.
Two quick observations close this section. First, the norms are not $1$: $\lVert \sin kx\rVert = \lVert \cos kx\rVert = \sqrt{\pi}$ and $\lVert 1\rVert = \sqrt{2\pi}$. So this is an orthogonal basis, not yet an orthonormal one (Chapter 20's distinction). We can normalize by dividing each function by its norm, and we will, because the projection formula is cleanest for unit vectors. Second, we have not proved that these functions span the whole space — that every reasonable periodic function is some combination of them. That is the deep completeness theorem, which we discuss in §22.9; for this chapter we take it as the established fact it is and concentrate on the projection mechanics, which are pure Part IV.
FAQ: Do I have to verify all those integrals by hand every time?
No. You verify them once — the representative computation above generalizes mechanically to every pair via the same product-to-sum identities — and then you cite them as known, exactly as you cite that the standard basis $\mathbf{e}_1, \mathbf{e}_2$ is orthonormal without re-checking it each time. In practice you also confirm them numerically in seconds, which we do in §22.4. The point of stating the relations precisely, with their $k=m$ versus $k\neq m$ conditions, is that those conditions are what the projection formula relies on; get them right once and the rest of the chapter follows.
22.4 What is a Fourier coefficient, and why is it a projection?
Now we earn the chapter's title. Recall the master formula from Chapters 19 and 20: to find the coordinate of a vector $\mathbf{v}$ along a single orthonormal basis vector $\mathbf{q}$, you take the inner product:
$$\text{(coordinate of } \mathbf{v} \text{ along } \mathbf{q}) = \langle \mathbf{v}, \mathbf{q}\rangle, \qquad \text{and the projection of } \mathbf{v} \text{ onto } \mathbf{q} \text{ is } \langle \mathbf{v}, \mathbf{q}\rangle\,\mathbf{q}.$$
For an orthonormal basis $\mathbf{q}_1, \dots, \mathbf{q}_n$, you reconstruct the whole vector by summing its projections onto each axis: $\mathbf{v} = \sum_i \langle \mathbf{v}, \mathbf{q}_i\rangle\, \mathbf{q}_i$. This is the single most important formula of Part IV. We are now going to apply it verbatim — same symbols, same meaning — with $\mathbf{v}$ a signal $f$ and the $\mathbf{q}_i$ the normalized sines and cosines.
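Before moving to functions, here is the finite-dimensional formula at work, as a reminder of exactly what will be reused. In this sketch the data are random (seeded for repeatability), and the orthonormal basis comes from numpy's QR factorization, as in Chapter 20. The code computes the coordinates $\langle \mathbf{v}, \mathbf{q}_i\rangle$ and rebuilds $\mathbf{v}$ as the sum of its projections.

```python
import numpy as np

rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))   # columns q_1..q_4: orthonormal basis of R^4
v = rng.standard_normal(4)

coords = Q.T @ v                                   # coords[i] = <v, q_i>
v_rebuilt = sum(c * q for c, q in zip(coords, Q.T))  # sum of projections <v, q_i> q_i

print(np.allclose(v_rebuilt, v))                   # the projections add back up to v
```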
Normalize the basis first, so the formula is the clean unit-vector version. Dividing each function by its norm gives the orthonormal functions $$\frac{1}{\sqrt{2\pi}}, \qquad \frac{\cos kx}{\sqrt{\pi}}, \quad \frac{\sin kx}{\sqrt{\pi}} \quad (k = 1, 2, 3, \dots).$$ Each now has norm $1$, and any two distinct ones still have inner product $0$. This is a genuine orthonormal basis of function space, the infinite-dimensional sibling of the orthonormal $\mathbf{q}$'s you built with Gram-Schmidt in Chapter 20.
Apply the projection formula. The coordinate of $f$ along the normalized cosine $\cos kx / \sqrt{\pi}$ is the inner product $\langle f,\ \cos kx/\sqrt{\pi}\rangle$. It is traditional to package the $1/\sqrt{\pi}$ factors so the reconstruction reads cleanly, which gives the standard Fourier coefficients:
$$a_k = \frac{1}{\pi}\int_{-\pi}^{\pi} f(x)\cos kx\, dx = \frac{1}{\pi}\,\langle f, \cos kx\rangle, \qquad b_k = \frac{1}{\pi}\int_{-\pi}^{\pi} f(x)\sin kx\, dx = \frac{1}{\pi}\,\langle f, \sin kx\rangle,$$
for $k = 1, 2, 3, \dots$, together with the constant (average) term
$$a_0 = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(x)\, dx = \frac{1}{2\pi}\,\langle f, 1\rangle.$$
With these, the Fourier series of $f$ is the reconstruction — the sum of projections onto every axis:
$$f(x) = a_0 + \sum_{k=1}^{\infty}\Big(a_k \cos kx + b_k \sin kx\Big).$$
Look closely at where the coefficients come from. Each $a_k$ and $b_k$ is, up to the normalizing constant $1/\pi$ that converts the orthogonal basis to an orthonormal one, exactly the inner product $\langle f, \text{(basis function)}\rangle$ — the projection coordinate. The $1/\pi$ is precisely the $1/\lVert \cdot\rVert^2$ factor that appears in the Chapter 19 projection formula when you project onto a vector that is orthogonal but not unit length: $\operatorname{proj}_{\mathbf{q}}(\mathbf{v}) = \frac{\langle \mathbf{v}, \mathbf{q}\rangle}{\langle \mathbf{q},\mathbf{q}\rangle}\mathbf{q}$, and here $\langle \cos kx, \cos kx\rangle = \pi$. There is nothing new in these formulas. The Fourier coefficient is the orthogonal projection of the signal onto one basis frequency, written out.
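The three coefficient formulas translate line for line into code. The sketch below is one possible version of the fourier_coeffs routine named in the learning objectives. Its signature (a callable, a harmonic count `K`, a grid size `n`) and its return layout are assumptions made for illustration. Note the index gap: harmonic $k$ is stored at array position $k - 1$. As a test, it uses a signal whose coefficients are known by inspection, $f(x) = 3 + 2\cos x - \sin 4x$, which should give $a_0 = 3$, $a_1 = 2$, $b_4 = -1$, and zeros elsewhere.

```python
import numpy as np

trapezoid = getattr(np, "trapezoid", None) or np.trapz   # np.trapz before numpy 2.0

def fourier_coeffs(f, K, n=20_001):
    """Hypothetical helper: project f onto 1, cos kx, sin kx for k = 1..K.

    Returns (a0, a, b) with a[k-1] = a_k and b[k-1] = b_k
    (math index k, array index k - 1).
    """
    x = np.linspace(-np.pi, np.pi, n)
    fx = f(x)
    a0 = trapezoid(fx, x) / (2 * np.pi)            # <f, 1> / <1, 1>
    a = np.array([trapezoid(fx * np.cos(k * x), x) / np.pi for k in range(1, K + 1)])
    b = np.array([trapezoid(fx * np.sin(k * x), x) / np.pi for k in range(1, K + 1)])
    return a0, a, b

a0, a, b = fourier_coeffs(lambda t: 3 + 2 * np.cos(t) - np.sin(4 * t), K=5)
print(round(a0, 6), np.round(a, 6), np.round(b, 6))
```

Up to rounding noise, the output should be $a_0 = 3$, $a \approx [2, 0, 0, 0, 0]$, and $b \approx [0, 0, 0, -1, 0]$.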
Geometric Intuition — To find "how much $\cos 3x$ is in the signal $f$," you project $f$ onto the $\cos 3x$ axis: you compute $\langle f, \cos 3x\rangle$ and divide by $\langle \cos 3x, \cos 3x\rangle = \pi$. That is the shadow $f$ casts on that one frequency-direction, the identical move as dropping a perpendicular from a point onto a line in Chapter 19. The Fourier series then re-assembles $f$ by stacking up all of these one-dimensional shadows — exactly as an orthonormal basis reconstructs a vector by summing its projections onto each axis.
Now the deepest payoff of orthogonality, and the reason the costume matters. Because the basis functions are orthogonal, each coefficient is computed completely independently of all the others. When you compute $b_3$ by integrating $f(x)\sin 3x$, every other basis function — every other sine, every cosine, the constant — contributes zero to that integral, because it is orthogonal to $\sin 3x$. The presence of $\cos 5x$ in the signal does not leak into your measurement of $\sin 3x$. This is why you can compute the coefficients one at a time, by single integrals, rather than by solving a coupled system. In a non-orthogonal basis you could not do this: every coefficient would depend on every other, and extracting one would mean inverting a large matrix. Orthogonality decouples the frequencies. This is the same independence that made least-squares coordinates clean in Chapter 19 and orthonormal-basis coordinates trivial in Chapter 20 — recurring theme #4, the same idea in a new space.
The Key Insight — Orthogonality is what makes the Fourier coefficients independent. Each $a_k, b_k$ is found by its own integral and is blind to every other frequency. Add a new harmonic to the analysis and the coefficients you already computed do not change one bit — you simply append one more projection. That "no looking back, no recomputation" property is impossible without orthogonality, and it is the entire reason the sines and cosines, rather than some arbitrary basis, are the right coordinate system for periodic signals.
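The "no recomputation" property can be tested by fitting the square wave of §22.5 by sampled least squares in two bases, then enlarging each basis. In this illustrative sketch, the grid size is arbitrary and the comparison basis of monomials $1, x, x^2, x^3$ is our own choice. Adding $\sin 3x$ should leave the $\sin x$ coefficient unchanged. Adding $x^2$ and $x^3$ to the monomial fit, by contrast, should change the coefficient of $x$, because $x$ and $x^3$ are not orthogonal.

```python
import numpy as np

x = np.linspace(-np.pi, np.pi, 20_000, endpoint=False)   # uniform grid over one period
f = np.sign(np.sin(x))                                   # square wave (see Sec. 22.5)

def lsq(columns):
    """Least-squares coefficients of f in the span of the given sampled columns."""
    A = np.column_stack(columns)
    return np.linalg.lstsq(A, f, rcond=None)[0]

# Orthogonal basis: the sin x coefficient is unaffected by adding sin 3x.
b_small = lsq([np.sin(x)])
b_big = lsq([np.sin(x), np.sin(3 * x)])

# Non-orthogonal basis: the x coefficient shifts once x^2 and x^3 join the fit.
c_small = lsq([np.ones_like(x), x])
c_big = lsq([np.ones_like(x), x, x**2, x**3])

print(f"sin x coeff: {b_small[0]:.6f} -> {b_big[0]:.6f}")
print(f"x coeff:     {c_small[1]:.6f} -> {c_big[1]:.6f}")
```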
Let's confirm both the orthogonality relations and the projection formula numerically before we put them to work. The mathematics indexes harmonics from $1$ ($a_1, b_1, \dots$); numpy arrays index from $0$, so the $k$-th harmonic lives at array index $k-1$ — the first place that gap bites in this chapter.
# Verify orthogonality of the Fourier basis by numerical integration (np.trapezoid).
import numpy as np
x = np.linspace(-np.pi, np.pi, 400_000, endpoint=False) # one period, fine grid
def inner(g, h): # <g, h> = integral of g*h over one period
    return np.trapezoid(g * h, x)
print("<sin x, sin 2x> =", round(inner(np.sin(x), np.sin(2*x)), 4)) # 0.0 (k != m)
print("<sin x, cos x> =", round(inner(np.sin(x), np.cos(x)), 4)) # 0.0 (sine vs cosine)
print("<sin 3x, sin 3x> =", round(inner(np.sin(3*x), np.sin(3*x)), 4)) # 3.1416 (= pi)
print("<1, 1> =", round(inner(np.ones_like(x), np.ones_like(x)), 4)) # 6.2832 (= 2*pi)
The printed values are 0.0, 0.0, 3.1416, and 6.2832 — precisely $0$, $0$, $\pi$, and $2\pi$, matching the orthogonality relations of §22.3 to four decimals. The basis is orthogonal, distinct frequencies are perpendicular, and the squared norms are $\pi$ and $2\pi$ exactly as stated.
Computational Note — We use np.trapezoid (the trapezoidal rule; in numpy releases before 2.0 it was spelled np.trapz) to approximate $\int_{-\pi}^{\pi} g\,h\, dx$ on a fine grid. With $400{,}000$ sample points the approximation is accurate to several decimals, which is why 3.1416 agrees with $\pi$. Numerical integration is how a computer evaluates the projection: it estimates the inner product the only way it can, by sampling and summing. A coarser grid would show small residues (we will see one in §22.6); refining the grid drives them to zero.
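The claim that refining the grid drives the residue to zero can be checked on a coefficient whose exact value is known. This sketch is our own illustration and the grid sizes are arbitrary. It recomputes $b_1 = 4/\pi$ for the square wave of §22.5 at several resolutions, using the same `endpoint=False` grid and trapezoidal rule as the code above.

```python
import numpy as np

trapezoid = getattr(np, "trapezoid", None) or np.trapz   # np.trapz before numpy 2.0

exact_b1 = 4 / np.pi
errors = []
for n in (100, 1_000, 10_000, 100_000):
    x = np.linspace(-np.pi, np.pi, n, endpoint=False)
    f = np.sign(np.sin(x))                          # square wave
    b1 = trapezoid(f * np.sin(x), x) / np.pi        # projection onto sin x
    errors.append(abs(b1 - exact_b1))
    print(f"n = {n:>7}: b_1 = {b1:.8f}   |error| = {errors[-1]:.2e}")
```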
FAQ: Why divide by pi (and 2pi for the constant) in the coefficient formulas?
Because the sines and cosines are orthogonal but not unit length — their squared norm is $\pi$, and the constant's is $2\pi$. The general projection formula from Chapter 19, $\frac{\langle f, g\rangle}{\langle g, g\rangle}$, divides the inner product by the squared norm of the direction you project onto. For $\cos kx$ that squared norm is $\pi$, giving the $1/\pi$; for the constant $1$ it is $2\pi$, giving the $1/(2\pi)$ in $a_0$. If you instead use the normalized basis functions $\cos kx/\sqrt{\pi}$, the division is folded into the basis vector and the coefficient is the bare inner product. Either bookkeeping is correct; the $1/\pi$ convention just keeps the reconstruction formula tidy.
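Both bookkeeping conventions can be compared directly. In the sketch below, the test signal $f(x) = x$ on $(-\pi, \pi)$ and the cutoff $K = 6$ are arbitrary illustrative choices. The bare inner product with the unit-norm function $\sin kx/\sqrt{\pi}$ comes out as $\sqrt{\pi}\,b_k$, and both conventions rebuild the same partial sum.

```python
import numpy as np

trapezoid = getattr(np, "trapezoid", None) or np.trapz   # np.trapz before numpy 2.0

x = np.linspace(-np.pi, np.pi, 20_001)
f = x.copy()                                   # test signal f(x) = x (odd: only sines)
K = 6
S_conventional = np.zeros_like(x)
S_normalized = np.zeros_like(x)
for k in range(1, K + 1):
    s = np.sin(k * x)
    b_k = trapezoid(f * s, x) / np.pi          # divide by ||sin kx||^2 = pi
    q_k = s / np.sqrt(np.pi)                   # unit-norm basis function
    c_k = trapezoid(f * q_k, x)                # bare inner product <f, q_k>
    assert np.isclose(c_k, np.sqrt(np.pi) * b_k)
    S_conventional += b_k * s
    S_normalized += c_k * q_k

print(np.allclose(S_conventional, S_normalized))   # same projection either way
```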
22.5 How do we decompose a square wave?
Time to do it for real, on the signal that has launched a thousand signal-processing courses: the square wave. Define $f$ on one period by $$f(x) = \begin{cases} +1 & 0 < x < \pi,\\ -1 & -\pi < x < 0,\end{cases}$$ and repeat it with period $2\pi$. This is a switch that flips between $+1$ and $-1$ — a clock signal, the on/off pulse train at the heart of every digital circuit. It is also delightfully nasty: it has sharp jumps, so it is the perfect stress test for our smooth sine-and-cosine basis. Can a sum of gentle, infinitely-smooth waves really build a function with vertical cliffs in it? Watch.
First, exploit symmetry to save work — symmetry arguments are the experienced analyst's first move, and they fall straight out of the orthogonality picture. The square wave is an odd function: $f(-x) = -f(x)$. Cosines are even and the constant is even, while $f$ is odd, and the integral of (odd)$\times$(even) over a symmetric interval is zero. So every cosine coefficient and the constant term vanish before we compute anything: $$a_0 = 0, \qquad a_k = \frac{1}{\pi}\int_{-\pi}^{\pi} \underbrace{f(x)}_{\text{odd}}\underbrace{\cos kx}_{\text{even}}\, dx = 0 \quad\text{for all } k.$$ Geometrically: the square wave is orthogonal to every cosine and to the constant, so its projection onto each of those axes is zero. It lives entirely in the "sine subspace." This is the independence of orthogonal coordinates doing our work for us — we have eliminated half the basis with a one-line parity argument.
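The parity argument can be confirmed numerically. The sketch below uses the same grid and trapezoidal rule as the square-wave code later in this section, and projects the square wave onto the constant and the first five cosines. Each coefficient should come out at the level of quadrature error rather than exactly zero.

```python
import numpy as np

trapezoid = getattr(np, "trapezoid", None) or np.trapz   # np.trapz before numpy 2.0

x = np.linspace(-np.pi, np.pi, 400_000, endpoint=False)
f = np.sign(np.sin(x))                                   # odd square wave

a0 = trapezoid(f, x) / (2 * np.pi)                       # projection onto the constant
a = [trapezoid(f * np.cos(k * x), x) / np.pi for k in range(1, 6)]   # a_1 .. a_5
print(f"a_0 = {a0:.1e}")
print("largest |a_k| for k = 1..5:", f"{max(abs(v) for v in a):.1e}")
```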
Now the sine coefficients. Using oddness again, $f(x)\sin kx$ is even (odd times odd), so the integral over $[-\pi,\pi]$ is twice the integral over $[0,\pi]$, where $f = +1$: $$b_k = \frac{1}{\pi}\int_{-\pi}^{\pi} f(x)\sin kx\, dx = \frac{2}{\pi}\int_{0}^{\pi}\sin kx\, dx = \frac{2}{\pi}\left[\frac{-\cos kx}{k}\right]_0^{\pi} = \frac{2}{\pi}\cdot\frac{1 - \cos k\pi}{k}.$$ Now $\cos k\pi = (-1)^k$, so $1 - \cos k\pi$ is $0$ when $k$ is even and $2$ when $k$ is odd. Therefore $$b_k = \begin{cases} \dfrac{4}{\pi k} & k \text{ odd},\\[2mm] 0 & k \text{ even}.\end{cases}$$ Only the odd harmonics survive, and each odd coefficient is $4/(\pi k)$. Plugging in: $b_1 = 4/\pi \approx 1.2732$, $b_3 = 4/(3\pi) \approx 0.4244$, $b_5 = 4/(5\pi) \approx 0.2546$, $b_7 = 4/(7\pi) \approx 0.1819$, and so on, the coefficients shrinking like $1/k$. The Fourier series of the square wave is the clean infinite sum
$$f(x) = \frac{4}{\pi}\left(\sin x + \frac{\sin 3x}{3} + \frac{\sin 5x}{5} + \frac{\sin 7x}{7} + \cdots\right) = \frac{4}{\pi}\sum_{k\ \text{odd}}\frac{\sin kx}{k}.$$
The signal is a weighted stack of odd-harmonic sine waves, with the higher harmonics contributing less and less. That is the spectrum of a square wave: only odd frequencies, decaying as $1/k$.
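The spectrum also accounts for the signal's energy, as the Pythagorean argument in the §22.1 FAQ promised. With the $1/\pi$ convention, $\lVert f\rVert^2 = 2\pi a_0^2 + \pi\sum_k (a_k^2 + b_k^2)$. Using the full sum to recover the energy relies on the completeness result deferred to §22.9. The sketch below sums $\pi b_k^2$ over the odd harmonics, using the closed form $b_k = 4/(\pi k)$ (the cutoff of about $2\times 10^5$ is arbitrary), and compares the total with $\lVert f\rVert^2 = \int_{-\pi}^{\pi} 1\,dx = 2\pi$.

```python
import numpy as np

k = np.arange(1, 200_001, 2)                    # odd harmonics 1, 3, 5, ..., 199_999
b = 4 / (np.pi * k)                             # closed-form square-wave coefficients
energy_in_frequencies = np.pi * np.sum(b**2)    # sum of pi * b_k^2, since ||sin kx||^2 = pi
energy_of_signal = 2 * np.pi                    # ||f||^2: f^2 = 1 on the whole period

for cutoff in (1, 3, 11, 101):                  # partial sums over harmonics k <= cutoff
    partial = np.pi * np.sum(b[k <= cutoff] ** 2)
    print(f"k <= {cutoff:>3}: {partial / energy_of_signal:.4f} of the energy")
print(f"all listed: {energy_in_frequencies:.6f}   2*pi = {energy_of_signal:.6f}")
```

The fundamental alone carries $8/\pi^2 \approx 81\%$ of the energy, and the odd overtones supply the rest.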
Real-World Application — Digital electronics and signal integrity. A clock signal in a CPU is an idealized square wave, and this Fourier decomposition explains a real engineering headache. To transmit a clean square edge you must carry not just the fundamental but all the odd harmonics; a wire or circuit that attenuates high frequencies (every real wire does) drops the $\sin 5x$, $\sin 7x$ terms, and the once-sharp edges round off into mush. Engineers literally reason about clock signals in terms of "how many harmonics survive the channel." The shape of a digital pulse is its Fourier series, truncated by physics.
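A rough way to see edges rounding off is to model the channel as keeping only the first few odd harmonics and discarding the rest. This is an idealization of our own: real channels attenuate gradually rather than cutting off sharply. The sketch below measures how far in $x$ the truncated signal needs to climb from $-0.8$ to $+0.8$ (the central 80% of the $-1 \to +1$ swing) across the edge at $x = 0$. Fewer surviving harmonics should give a slower edge.

```python
import numpy as np

x = np.linspace(0, np.pi / 2, 200_001)      # right-hand side of the edge at x = 0

def truncated_square(x, n_terms):
    """Square-wave series keeping only the first n_terms odd harmonics."""
    k = 2 * np.arange(1, n_terms + 1) - 1
    return (4 / np.pi) * np.sin(np.outer(x, k)) @ (1 / k)

rise_widths = []
for n_terms in (1, 2, 5, 20):
    s = truncated_square(x, n_terms)
    x80 = x[np.argmax(s >= 0.8)]            # first x where the edge reaches +0.8
    rise_widths.append(2 * x80)             # odd symmetry: -0.8 is reached at -x80
    print(f"{n_terms:>2} harmonics: -0.8 to +0.8 across a width of {2 * x80:.4f}")
```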
Let's confirm the coefficients by direct numerical projection — computing each $b_k = \frac1\pi\langle f, \sin kx\rangle$ as an integral, not by the formula, to verify the formula independently.
# Square-wave Fourier coefficients by numerical projection (compare to 4/(pi*k)).
import numpy as np
x = np.linspace(-np.pi, np.pi, 400_000, endpoint=False)
f = np.sign(np.sin(x)) # +1 on (0,pi), -1 on (-pi,0): the square wave
for k in range(1, 8):
    b_k = np.trapezoid(f * np.sin(k*x), x) / np.pi # projection coefficient
    print(f"k={k}: b_k = {b_k:+.5f} 4/(pi*k) = {4/(np.pi*k):.5f}")
# k=1: b_k = +1.27324 4/(pi*k) = 1.27324
# k=2: b_k = +0.00000 4/(pi*k) = 0.63662 <- even harmonic vanishes (4/(pi*k) applies to odd k only)
# k=3: b_k = +0.42441 4/(pi*k) = 0.42441
# k=5: b_k = +0.25465 4/(pi*k) = 0.25465
# (rows for k = 4, 6, 7 omitted; the even ones vanish like k = 2)
The numerically projected coefficients match $4/(\pi k)$ for the odd harmonics to five decimals, and the even harmonics come out as $0.00000$ — exactly the hand result. The projection (numerical integration) and the closed-form formula agree, as they must.
FAQ: Why do only the odd harmonics appear?
It is a consequence of orthogonality plus a second symmetry. The square wave is odd, which kills all cosines and the constant. Among the sines, it also has half-wave symmetry: shifting by half a period flips its sign. A sine $\sin kx$ shares that flip-under-half-period property only when $k$ is odd; for even $k$ the sine is unchanged by the half-period shift and therefore cannot help build an antisymmetric-under-shift signal. Concretely, the integral $\int_0^\pi \sin kx\, dx$ evaluates to $\frac{1-(-1)^k}{k}$, which is zero for even $k$. So the even harmonics are orthogonal to this particular signal and drop out. Different signals have different spectra; the sawtooth in §22.9, for instance, keeps all its harmonics.
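Both symmetry facts can be checked on a sampled grid. In the sketch below (our own illustration), the grid is offset by half a step so that no sample lands on a zero of $\sin x$, where `np.sign` would return $0$. With an even number of samples, rolling the array by half its length is exactly a shift of $x$ by $\pi$ (using periodicity).

```python
import numpy as np

n = 1_000                                   # even, so half a period is exactly n // 2 samples
h = 2 * np.pi / n
x = -np.pi + (np.arange(n) + 0.5) * h       # half-step offset: no sample where sin x == 0
f = np.sign(np.sin(x))                      # square wave, values +1 / -1 only

half = n // 2
f_shifted = np.roll(f, -half)               # f_shifted[j] = f(x[j] + pi), using periodicity
half_wave_antisymmetric = np.array_equal(f_shifted, -f)
print("f(x + pi) == -f(x):", half_wave_antisymmetric)

flips = {}
for k in range(1, 5):
    s = np.sin(k * x)
    flips[k] = np.allclose(np.roll(s, -half), -s)   # does sin(k(x + pi)) equal -sin(kx)?
    print(f"sin {k}x flips sign under the half-period shift: {flips[k]}")
```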
22.6 How does the reconstruction converge, and what is the Gibbs phenomenon?
We have the recipe. Now watch the signal rebuild itself as we add frequencies one at a time. The partial sum $S_N$ keeps only the first $N$ nonzero terms — for the square wave, the first $N$ odd harmonics:
$$S_N(x) = \frac{4}{\pi}\sum_{n=1}^{N}\frac{\sin\big((2n-1)x\big)}{2n-1} = \frac{4}{\pi}\left(\sin x + \frac{\sin 3x}{3} + \cdots + \frac{\sin\big((2N-1)x\big)}{2N-1}\right).$$
This is truncation: we keep the low-frequency content and discard the high-frequency tail. Here is a fact that should feel familiar from Chapter 19: the truncated series is the best possible approximation of $f$ using those frequencies. Because the basis is orthogonal, $S_N$ is precisely the orthogonal projection of $f$ onto the finite-dimensional subspace spanned by the first $N$ odd harmonics. By the closest-point property of projection (Chapter 19), no other combination of those same sines comes closer to $f$ in the norm $\lVert \cdot\rVert$. Truncating a Fourier series is least-squares approximation, the very same optimality you proved for regression. It is also the function-space cousin of the low-rank approximation you will meet fully with the SVD in Chapter 30: keep the most important components, discard the rest, and you have the best approximation that can be built from the components you kept.
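To make the closest-point claim concrete, we can fit the first $N$ odd-harmonic sines to samples of the square wave by ordinary least squares with `np.linalg.lstsq` and compare the fitted weights with $4/(\pi k)$. The sketch assumes a uniform grid, on which discrete least squares closely tracks the continuous norm. It illustrates the claim rather than proving it.

```python
import numpy as np

M = 20_000                                   # number of samples on one period
x = -np.pi + 2 * np.pi * np.arange(M) / M    # uniform grid on [-pi, pi)
f = np.sign(np.sin(x))                       # square-wave samples

N = 4                                        # number of odd harmonics to fit
ks = np.arange(1, 2 * N, 2)                  # 1, 3, 5, 7
A = np.sin(np.outer(x, ks))                  # column for each sin(kx), sampled on the grid

# Least-squares weights: minimize ||A w - f|| over all combinations of these sines
w, *_ = np.linalg.lstsq(A, f, rcond=None)
print("least-squares weights:", np.round(w, 4))
print("projection 4/(pi*k):  ", np.round(4 / (np.pi * ks), 4))

# Moving any weight away from the least-squares optimum can only increase the error
base_err = np.linalg.norm(A @ w - f)
worse_err = np.linalg.norm(A @ (w + 0.05 * np.eye(N)[0]) - f)
print("perturbed fit is worse:", base_err < worse_err)
```

The least-squares weights and the projection coefficients agree. On this grid the sampled sines are mutually orthogonal, so the normal equations decouple into one independent projection per sine.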
Let's draw the partial sums and watch them converge. Using matplotlib, we overlay $S_1$, $S_3$, $S_5$, and $S_{25}$ on the true square wave.
# Figure 22.1: partial sums of the square-wave Fourier series converging to f.
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(-np.pi, np.pi, 2000)
def partial_sum(x, N): # sum of first N odd-harmonic sine terms
    s = np.zeros_like(x)
    for n in range(1, N + 1):
        k = 2*n - 1 # odd harmonics: 1, 3, 5, ...
        s += (4/np.pi) * np.sin(k*x) / k
    return s
fig, ax = plt.subplots(figsize=(8, 4.5))
ax.plot(x, np.sign(np.sin(x)), "k-", lw=2, label="square wave f(x)")
for N, style in [(1, "C0--"), (3, "C1-."), (5, "C2:"), (25, "C3-")]:
    ax.plot(x, partial_sum(x, N), style, lw=1.5, label=f"$S_{{{N}}}$ ({N} terms)")
ax.axhline(0, color="gray", lw=0.5); ax.legend(loc="upper right", fontsize=8)
ax.set_title("Figure 22.1 — Fourier partial sums converging to a square wave")
ax.set_xlabel("x"); ax.set_ylabel("amplitude")
plt.tight_layout(); plt.show() # display the figure when run as a script
Figure 22.1. Alt-text: the black square wave with four colored approximations laid over it; the one-term sine $S_1$ is a single gentle hump that badly misses the corners, while each successive partial sum hugs the flat $\pm 1$ stretches more tightly and the $25$-term curve is nearly indistinguishable from the square wave except for a thin spike of overshoot right at each jump. As $N$ grows, the partial sum flattens out along the constant $\pm 1$ regions and sharpens at the transitions. One sine is a crude blob; three terms already show the flat tops forming; the twenty-five-term sum looks like a square wave drawn with a slightly nervous hand. The infinite sum of perfectly smooth sines reproduces the discontinuous square wave, almost.
That "almost" is the famous Gibbs phenomenon, and it is the chapter's Common Pitfall, because it traps everyone who first sees it. Look closely at the jumps in Figure 22.1: right beside each vertical cliff, every partial sum overshoots the target, poking above $+1$ and below $-1$ in a little spike. Your instinct says "add more terms and the spike will shrink to nothing." It does not. The spike gets narrower as $N$ grows — it crowds closer to the jump — but its height refuses to vanish. Let's measure it.
# The Gibbs overshoot: peak of the partial sum near the jump, vs. number of terms.
import numpy as np
x = np.linspace(0, np.pi, 200_000) # look just to the right of the jump at x=0
def peak(N):
    s = sum((4/np.pi) * np.sin((2*n-1)*x) / (2*n-1) for n in range(1, N+1))
    return s.max()
for N in [1, 3, 5, 25, 100]:
    print(f"{N:4d} terms: peak = {peak(N):.5f}")
# 1 terms: peak = 1.27324
# 3 terms: peak = 1.18836
# 5 terms: peak = 1.18233
# 25 terms: peak = 1.17911
# 100 terms: peak = 1.17899
The peak settles toward about $1.179$ and stays there no matter how many terms you add. More precisely, the peak approaches $\tfrac{2}{\pi}\!\int_0^\pi \tfrac{\sin t}{t}\, dt \approx 1.17898$. That exceeds the target level $+1$ by about $0.179$, an overshoot of $8.949\%$ (roughly $9\%$) of the total jump height of $2$. The ringing never disappears; it only migrates ever closer to the discontinuity. This is not a bug in our code, and it does not contradict convergence in the mean-square sense. It shows that convergence near a jump is not uniform, which is a genuine and unavoidable feature of approximating a jump with smooth waves.
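The limiting constant can be checked independently by evaluating $\tfrac{2}{\pi}\int_0^\pi \tfrac{\sin t}{t}\,dt$ with the trapezoid rule. The sketch below uses `np.sinc`, which NumPy defines as $\sin(\pi u)/(\pi u)$. Writing $\sin t / t$ as `np.sinc(t / np.pi)` therefore avoids dividing by zero at $t = 0$. The grid size is an arbitrary choice.

```python
import numpy as np

# np.trapezoid is the NumPy >= 2.0 name; fall back to np.trapz on older versions
trapezoid = getattr(np, "trapezoid", None) or np.trapz

t = np.linspace(0, np.pi, 1_000_001)
integrand = np.sinc(t / np.pi)                # np.sinc(u) = sin(pi u)/(pi u), so this is sin(t)/t
gibbs_peak = (2 / np.pi) * trapezoid(integrand, t)
overshoot_fraction = (gibbs_peak - 1) / 2     # overshoot above +1, relative to the jump height 2
print(f"limiting peak = {gibbs_peak:.5f}")
print(f"overshoot     = {100 * overshoot_fraction:.3f}% of the jump")
```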
Common Pitfall — Expecting the Gibbs overshoot to disappear with more terms. It does not. The famous $\approx 9\%$ overshoot at a jump discontinuity persists for every finite partial sum, however many terms you take. Adding terms narrows the ringing and pushes it toward the jump, but the peak never falls below about $1.179$. The resolution of the apparent paradox is the mode of convergence. The Fourier series converges to $f$ in the mean-square (energy) sense, $\lVert f - S_N\rVert \to 0$. For a piecewise smooth $f$ such as the square wave, it also converges pointwise to $f$ at every point where $f$ is continuous. At the jump itself it converges to the midpoint of the jump (here, $0$). The overshoot lives in an ever-thinner sliver beside the jump, whose contribution to the energy $\lVert f - S_N\rVert^2$ shrinks to zero even though its height does not. Mean-square, pointwise, and uniform convergence are different things. The square-wave series converges in the first two senses but not uniformly, and Gibbs is exactly that missing uniformity.
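The two behaviors can be measured side by side. The sketch below computes the energy of the error, $\lVert f - S_N\rVert^2$, and the height of the overshoot for growing $N$. By odd symmetry, the error energy on $[-\pi, \pi]$ is twice the energy on $(0, \pi)$, where $f = +1$. The grid and the values of $N$ are arbitrary choices.

```python
import numpy as np

# np.trapezoid is the NumPy >= 2.0 name; fall back to np.trapz on older versions
trapezoid = getattr(np, "trapezoid", None) or np.trapz

x = np.linspace(0, np.pi, 200_001)       # (0, pi) carries half of the error energy

def partial_sum(x, N):
    """First N odd-harmonic terms of the square-wave series."""
    s = np.zeros_like(x)
    for n in range(1, N + 1):
        k = 2 * n - 1
        s += (4 / np.pi) * np.sin(k * x) / k
    return s

results = []
for N in [5, 25, 100, 400]:
    s = partial_sum(x, N)
    energy = 2 * trapezoid((1.0 - s) ** 2, x)   # ||f - S_N||^2 over [-pi, pi]
    overshoot = s.max() - 1.0                    # height of the Gibbs spike above +1
    results.append((N, energy, overshoot))
    print(f"N={N:4d}: ||f - S_N||^2 = {energy:.5f}, overshoot = {overshoot:.4f}")
```

The error energy keeps falling while the overshoot stays near $0.179$, which is mean-square convergence without uniform convergence.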
Warning — The clean statement "the Fourier series equals $f$" carries a condition you must respect: for a piecewise smooth $f$ such as the square wave, it holds where $f$ is continuous. At a jump discontinuity the series converges to the average of the left and right limits, $\tfrac12\big(f(x^-)+f(x^+)\big)$, not to either one-sided value. For the square wave that average is $0$ at each jump, regardless of whether you defined $f$ to be $+1$, $-1$, or anything else there. Never assert pointwise equality at a discontinuity. Instead, state mean-square convergence (always valid for square-integrable $f$) or pointwise convergence on the continuity set of a piecewise smooth $f$.
Energy convergence is worth seeing as a number, because it is the sense in which the approximation is genuinely excellent. By Parseval's identity (the Pythagorean theorem in function space, §22.10), the fraction of the signal's total energy $\lVert f\rVert^2$ captured by the first $N$ terms is the sum of those terms' energies divided by the total.
# Fraction of square-wave energy captured by the first N (odd-harmonic) terms.
import numpy as np
total = 2*np.pi # ||f||^2 = integral of 1 over [-pi,pi] = 2*pi
for N in [1, 3, 5, 10, 50]:
    captured = np.pi * sum((4/(np.pi*(2*n-1)))**2 for n in range(1, N+1))
    print(f"{N:3d} terms: {100*captured/total:.3f}% of energy")
# 1 terms: 81.057% of energy
# 3 terms: 93.306% of energy
# 5 terms: 95.960% of energy
# 10 terms: 97.975% of energy
# 50 terms: 99.595% of energy
A single sine already captures $81\%$ of the square wave's energy; five terms reach $96\%$; fifty terms exceed $99.5\%$. That is the sense in which the reconstruction converges — overwhelmingly, in energy — even while the Gibbs spike at the jump stubbornly refuses to flatten. The two facts are not in conflict; they are answers to two different questions.
FAQ: If the series overshoots at the jump, in what sense is it correct?
In the sense that matters for signals: energy. The mismatch between $f$ and its partial sum $S_N$, measured by $\lVert f - S_N\rVert^2$ — the integrated squared error, proportional to the energy of the error signal — goes to zero as $N \to \infty$. The Gibbs overshoot occupies a region whose width shrinks like $1/N$, so although its height stays near $9\%$, the area (and hence energy) it contributes vanishes. For audio, image, and data applications, mean-square (energy) error is almost always the right measure, so a Fourier reconstruction that is excellent in energy is excellent in practice, Gibbs ringing notwithstanding. The overshoot is visible to the eye but nearly inaudible to the ear and negligible to the integral.
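The $1/N$ width claim can be probed by locating the first peak of $S_N$ to the right of the jump. Setting $S_N'(x) = \tfrac{4}{\pi}\sum_{n=1}^{N}\cos\big((2n-1)x\big) = \tfrac{2}{\pi}\,\tfrac{\sin 2Nx}{\sin x}$ to zero puts that peak at $x = \pi/(2N)$. The sketch below finds the maximum numerically on $(0, \pi/2]$ and compares it with that formula. The grid and the values of $N$ are arbitrary choices.

```python
import numpy as np

x = np.linspace(1e-6, np.pi / 2, 400_000)   # start just right of the jump at x = 0

def partial_sum(x, N):
    """First N odd-harmonic terms of the square-wave series."""
    s = np.zeros_like(x)
    for n in range(1, N + 1):
        k = 2 * n - 1
        s += (4 / np.pi) * np.sin(k * x) / k
    return s

locations = {}
for N in [10, 40, 160]:
    s = partial_sum(x, N)
    locations[N] = x[s.argmax()]             # the Gibbs peak is the largest value on (0, pi/2]
    print(f"N={N:4d}: peak at x = {locations[N]:.5f}, pi/(2N) = {np.pi / (2 * N):.5f}")
```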
22.7 Where does this show up — compression, audio, and beyond?
The reason Fourier analysis is one of the most consequential ideas in applied mathematics is that truncating the series compresses the signal. If a signal's energy is concentrated in a few low frequencies — as it is for most natural sounds and images — then a handful of Fourier coefficients reconstruct it almost perfectly, and you can throw the rest away. Storing or transmitting those few numbers instead of the full signal is compression, and it is the same low-rank-approximation philosophy you will meet with the SVD in Chapter 30: represent the data by its most important components and discard the negligible tail.
Real-World Application — Audio compression (MP3, AAC). Lossy audio codecs cut a sound into short windows and compute the frequency content of each window — a Fourier-style transform. Most of the energy sits in relatively few frequency bands, so the encoder keeps the strong coefficients at full precision, stores the weak ones coarsely, and discards frequencies a psychoacoustic model predicts you cannot hear (a loud tone masks nearby quiet ones). The decoder rebuilds the waveform by summing the retained frequencies back up — the reconstruction step of §22.6. The whole pipeline is "project onto a frequency basis, keep the big coefficients, reconstruct." Case Study 1 develops this in detail.
The same idea governs images, with a two-dimensional cousin of the Fourier basis. JPEG splits an image into $8\times 8$ blocks and expands each block in a basis of two-dimensional cosine patterns: the discrete cosine transform, a close relative of the Fourier cosine series. The low-frequency patterns (slowly varying brightness) carry most of a block's energy; the high-frequency patterns (fine texture) carry little, and the eye barely misses them. JPEG quantizes the low-frequency coefficients finely and the high-frequency ones coarsely, often to zero, which is truncation again. Case Study 2 walks through the DCT picture. The thread is identical to the square wave. A signal is a vector, an orthogonal basis of frequencies supplies the axes, the coefficients are projections, and discarding small coefficients is a controlled, least-squares-optimal approximation.
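The following is a minimal sketch of the DCT idea, not of JPEG itself. It builds the orthonormal 8-point DCT-II matrix by hand and checks that its rows are orthonormal. It then transforms a smooth synthetic $8\times 8$ block, keeps only the lowest $3\times 3$ corner of coefficients, and reports how much of the block's non-constant energy that corner holds. The block values and the $3\times 3$ cutoff are illustrative assumptions. Real JPEG quantizes every coefficient using a table rather than simply truncating.

```python
import numpy as np

N = 8
n = np.arange(N)
k = n[:, None]
# Orthonormal DCT-II matrix: row k samples cos(pi (2n+1) k / (2N)), scaled to unit length
C = np.sqrt(2 / N) * np.cos(np.pi * (2 * n + 1) * k / (2 * N))
C[0, :] = np.sqrt(1 / N)
print("rows orthonormal:", np.allclose(C @ C.T, np.eye(N)))

# A smooth synthetic 8x8 "image block" (illustrative data, not a real photo)
i, j = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
block = 100 + 10 * i + 5 * j + 8 * np.cos(np.pi * (i + j) / 14)

coeffs = C @ block @ C.T          # 2-D DCT: C on the left transforms columns, C.T on the right rows
kept = np.zeros_like(coeffs)
kept[:3, :3] = coeffs[:3, :3]     # keep only the lowest 3x3 frequency patterns
approx = C.T @ kept @ C           # inverse 2-D DCT (C is orthogonal, so its inverse is C.T)

# Share of the non-constant (AC) energy held by the kept low-frequency coefficients
dc = coeffs[0, 0]
ac_fraction = (np.sum(kept**2) - dc**2) / (np.sum(coeffs**2) - dc**2)
print(f"AC energy in the low 3x3 corner: {100 * ac_fraction:.2f}%")
print(f"max pixel error after truncation: {np.abs(approx - block).max():.3f}")
```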
Real-World Application — Solving differential equations and the heat equation. Fourier's original motivation in 1807 was heat flow, not audio. Sines and cosines are the basis for so much of physics because they are eigenfunctions of the second derivative: $\frac{d^2}{dx^2}\sin kx = -k^2\sin kx$. Expanding a temperature profile in this basis turns a hard partial differential equation into a separate, simple equation for each coefficient, because the basis diagonalizes the operator. That is a direct preview of Part V: an orthogonal basis of eigenvectors decouples a transformation into independent one-dimensional pieces, exactly as orthogonality decoupled our Fourier coefficients here.
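Here is a minimal sketch of that decoupling for the periodic heat equation $u_t = u_{xx}$ on $[0, 2\pi)$. In the exponential basis, each coefficient evolves independently as $c_k(t) = c_k(0)\,e^{-k^2 t}$. The solver is therefore an FFT, a per-coefficient multiplication, and an inverse FFT. The initial profile and the time $t$ are illustrative choices. The check compares against the exact solution $e^{-t}\sin x + 0.5\,e^{-9t}\cos 3x$.

```python
import numpy as np

N = 64
x = 2 * np.pi * np.arange(N) / N            # periodic grid on [0, 2*pi)
u0 = np.sin(x) + 0.5 * np.cos(3 * x)        # initial temperature profile
t = 0.1

k = np.fft.fftfreq(N, d=1.0 / N)            # integer wavenumbers 0, 1, ..., -2, -1
c0 = np.fft.fft(u0)                         # coefficients of the initial profile
ct = c0 * np.exp(-k**2 * t)                 # each mode decays on its own: no coupling
u_spectral = np.fft.ifft(ct).real

u_exact = np.exp(-t) * np.sin(x) + 0.5 * np.exp(-9 * t) * np.cos(3 * x)
print("max error vs exact solution:", np.abs(u_spectral - u_exact).max())
```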
There is even a direct line from this chapter to your daily life through the signals and plotting tools that visualize spectra: every spectrum analyzer, every equalizer bar bouncing on a music player, every "bass/treble" control is displaying or manipulating Fourier coefficients. When you slide a graphic-equalizer band, you are scaling the projection of the audio onto a range of frequency-axes — boosting or cutting the amount of those basis functions in the reconstruction. The abstract projection of Chapter 19 is, quite literally, in your pocket.
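A toy version of one equalizer band follows. It assumes a synthetic signal with a 50 Hz and a 440 Hz component, sampled at 8 kHz for one second. It takes the real FFT, multiplies the coefficients in a hypothetical "bass" band (here 20 to 200 Hz) by a gain of 2 (about +6 dB), and transforms back. Real equalizers apply filters in real time, so treat this only as an illustration of scaling the projection onto a band of frequencies.

```python
import numpy as np

fs = 8000                                   # sample rate in Hz (illustrative)
t = np.arange(fs) / fs                      # one second of samples
signal = np.sin(2 * np.pi * 50 * t) + 0.5 * np.sin(2 * np.pi * 440 * t)

coeffs = np.fft.rfft(signal)                # projections onto the frequency axes
freqs = np.fft.rfftfreq(signal.size, d=1 / fs)
band = (freqs >= 20) & (freqs <= 200)       # hypothetical "bass" band
gain = 2.0                                  # scale the projection onto that band
coeffs[band] *= gain
boosted = np.fft.irfft(coeffs, n=signal.size)

expected = 2.0 * np.sin(2 * np.pi * 50 * t) + 0.5 * np.sin(2 * np.pi * 440 * t)
print("bass doubled, 440 Hz untouched:", np.allclose(boosted, expected))
```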
Historical Note — Joseph Fourier introduced these series around 1807 in his study of heat conduction, claiming that any periodic function could be written as a sum of sines and cosines. Leading mathematicians of the day, including Lagrange, were skeptical that smooth waves could represent functions with corners and jumps, and Fourier's memoir was initially not published in full. The skeptics were right to worry about the convergence subtleties (Gibbs would quantify the overshoot decades later), but Fourier's core claim, suitably interpreted as mean-square convergence, was correct and became one of the most fertile ideas in all of mathematics.
FAQ: Is compression really just "keep the big coefficients"?
At its heart, yes, and that is the beautiful part. Real codecs add sophistication — perceptual models, entropy coding, block transforms with overlap to avoid edge artifacts — but the load-bearing idea is exactly the projection-and-truncate move of this chapter: express the signal in an orthogonal frequency basis, where its energy concentrates into a few coefficients, then store only those. Orthogonality is what makes "keep the big ones" sound, because in an orthogonal basis each coefficient's contribution to the total energy is independent and additive (Parseval). Drop a small coefficient and you lose exactly its small energy and nothing else — no hidden coupling spreads the damage. That clean, controllable trade-off is the whole reason orthogonal bases dominate compression.
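The "lose exactly its energy" claim can be checked with the DFT. For NumPy's unnormalized `np.fft.fft`, Parseval's identity reads $\sum_j |f_j|^2 = \tfrac1N \sum_k |\hat f_k|^2$. The sketch zeroes every coefficient of a noisy test signal whose magnitude falls below an arbitrary threshold. It then compares the energy of the reconstruction error with the energy of the discarded coefficients. The signal and the threshold are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 512
x = 2 * np.pi * np.arange(N) / N
f = np.sin(3 * x) + 0.3 * np.cos(10 * x) + 0.05 * rng.standard_normal(N)  # test signal

F = np.fft.fft(f)
keep = np.abs(F) > 0.1 * np.abs(F).max()            # "keep the big ones" (arbitrary cutoff)
F_kept = np.where(keep, F, 0)
f_approx = np.fft.ifft(F_kept).real                 # mask is conjugate-symmetric, so result is real

error_energy = np.sum((f - f_approx) ** 2)
dropped_energy = np.sum(np.abs(F[~keep]) ** 2) / N  # Parseval, with np.fft.fft's scaling
print(f"kept {keep.sum()} of {N} coefficients")
print(f"error energy = {error_energy:.6f}, dropped-coefficient energy = {dropped_energy:.6f}")
```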
22.8 How does a computer compute these — sampling and the DFT?
Everything so far has used integrals — inner products of continuous functions. But a computer never sees a continuous function; it sees a finite list of samples, the signal measured at $N$ evenly spaced instants. So how does the projection survive the jump to a finite machine? The answer is a beautiful collapse: the continuous projection becomes a finite matrix-vector product, and the orthogonal basis becomes a finite orthogonal matrix. This is the discrete Fourier transform (DFT), and it is pure Part IV — orthogonality, one last time, in finite dimensions where it all began.
Sample the signal at the $N$ points $x_j = 2\pi j / N$ for $j = 0, 1, \dots, N-1$, collecting the values into a vector $\mathbf{f} = (f_0, f_1, \dots, f_{N-1}) \in \mathbb{C}^N$. The continuous inner product $\langle f, g\rangle = \int f\,\overline{g}$ is approximated by its Riemann sum: sample the integrand, add up, and multiply by the spacing $2\pi/N$. Dropping that constant factor, the projection of $\mathbf{f}$ onto the discrete frequency $\mathbf{w}_k$ with components $(\mathbf{w}_k)_j = e^{i k x_j}$ becomes a finite dot product: $$\hat f_k = \sum_{j=0}^{N-1} f_j\, e^{-i k x_j} = \langle \mathbf{f}, \mathbf{w}_k\rangle.$$ That is the DFT coefficient. It is literally the complex inner product of Chapter 18 between the sample vector and a sampled exponential: projection by dot product, the finite-dimensional original of the integral we have been computing. The integral is the limit of these sums, scaled by $2\pi/N$, as $N \to \infty$; the DFT is what you get by stopping at finite $N$.
And here is the orthogonality that makes it clean, now a statement about vectors in $\mathbb{C}^N$ rather than functions. The sampled exponentials satisfy $$\langle \mathbf{w}_k, \mathbf{w}_m\rangle = \sum_{j=0}^{N-1} e^{i(k-m)x_j} = \begin{cases} N & k \equiv m \!\!\pmod N,\\ 0 & \text{otherwise},\end{cases}$$ a finite geometric series that sums to zero unless every term is $1$. So the $N$ discrete frequency vectors $\mathbf{w}_0, \dots, \mathbf{w}_{N-1}$ are mutually orthogonal: they form an orthogonal basis of $\mathbb{C}^N$, the finite cousin of the orthogonal sines and cosines. Stack their conjugates $\mathbf{w}_k^{*}$ as the rows of a matrix $F$, the DFT matrix; after scaling by $1/\sqrt{N}$ it is a genuine unitary matrix. Computing all the coefficients at once is then a single matrix-vector multiplication $\hat{\mathbf{f}} = F\mathbf{f}$. Because $F/\sqrt{N}$ is unitary, inverting it (reconstructing the signal from its coefficients) is just multiplying by the conjugate transpose: $\mathbf{f} = \tfrac{1}{N}F^{*}\hat{\mathbf{f}}$. This is exactly the gift orthonormal columns gave us in Chapter 21: $Q^{*}Q = I$, so the inverse is free.
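These claims can be checked in a few lines. The sketch builds the matrix whose $k$-th row is the conjugated sampled exponential and confirms that $F/\sqrt{N}$ is unitary. It also confirms that $F\mathbf{f}$ agrees with `np.fft.fft`, which uses the same $e^{-ikx_j}$ sign convention with no normalization, and that $F^{*}$ undoes $F$ up to the factor $1/N$. The size $N = 16$ is arbitrary. Building $F$ explicitly costs $N^2$ entries, so this is for checking, not for computing.

```python
import numpy as np

N = 16
x = 2 * np.pi * np.arange(N) / N               # sample points x_j
W = np.exp(1j * np.outer(x, np.arange(N)))     # column k is w_k, with entries e^{i k x_j}
F = W.conj().T                                 # row k is w_k^*, so (F f)_k = <f, w_k>

U = F / np.sqrt(N)
print("F/sqrt(N) unitary:", np.allclose(U.conj().T @ U, np.eye(N)))

rng = np.random.default_rng(1)
f = rng.standard_normal(N) + 1j * rng.standard_normal(N)
print("F f equals np.fft.fft(f):", np.allclose(F @ f, np.fft.fft(f)))
print("F* (F f) / N recovers f:", np.allclose(F.conj().T @ (F @ f) / N, f))
```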
Geometric Intuition — The DFT is change of basis (Chapter 16) into an orthogonal frequency frame. A sampled signal $\mathbf{f} \in \mathbb{C}^N$ is one point; the DFT rewrites it in the coordinate system whose axes are the $N$ pure sampled frequencies. Because those axes are orthogonal, the change of basis is a rigid rotation/reflection (an orthogonal matrix, Chapter 21), and reading off a frequency coordinate is a single dot product. The continuous Fourier series and the finite DFT are the same idea — project onto an orthogonal frequency basis — separated only by whether the basis is infinite (functions) or finite (sampled vectors).
Computational Note — Computed naively, $F\mathbf{f}$ costs $N^2$ multiplications: one length-$N$ dot product per frequency, $N$ of them. The Fast Fourier Transform (FFT) reorganizes the arithmetic to exploit symmetries in the $e^{-ikx_j}$ and computes the same result in roughly $N\log_2 N$ operations. That is a stupendous saving: for $N = 10^6$, it is the difference between $10^{12}$ and about $2\times 10^7$ operations. The FFT does not change the mathematics one bit. The output is the identical orthogonal projection, just computed cleverly. np.fft.fft implements this algorithm, and it is one of the most-run pieces of numerical code on Earth.
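To make "reorganizes the arithmetic" concrete, here is a textbook recursive radix-2 Cooley-Tukey sketch. It assumes the length is a power of two and is written for clarity, not speed; `np.fft.fft` uses a more elaborate implementation. Each level splits the sum into even- and odd-indexed samples, computes two half-length DFTs, and recombines them with the factors $e^{-2\pi i k/N}$. The $\log_2 N$ levels of $O(N)$ work each give the $N\log N$ count.

```python
import numpy as np

def fft_radix2(f):
    """Recursive radix-2 Cooley-Tukey FFT (sketch). len(f) must be a power of two."""
    f = np.asarray(f, dtype=complex)
    N = f.shape[0]
    if N == 1:
        return f.copy()
    if N % 2:
        raise ValueError("length must be a power of two")
    even = fft_radix2(f[0::2])                 # DFT of the even-indexed samples
    odd = fft_radix2(f[1::2])                  # DFT of the odd-indexed samples
    twiddle = np.exp(-2j * np.pi * np.arange(N // 2) / N) * odd
    # X_k = E_k + e^{-2 pi i k/N} O_k  and  X_{k+N/2} = E_k - e^{-2 pi i k/N} O_k
    return np.concatenate([even + twiddle, even - twiddle])

rng = np.random.default_rng(2)
signal = rng.standard_normal(1024)
print("agrees with np.fft.fft:", np.allclose(fft_radix2(signal), np.fft.fft(signal)))
```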
A quick numerical confirmation that the DFT recovers our square-wave spectrum, tying the finite computation back to the continuous coefficients of §22.5:
# The DFT recovers the square-wave sine coefficients (np.fft as orthogonal projection).
import numpy as np
N = 1024
j = np.arange(N)
x = 2*np.pi*j/N # N samples over one period
f = np.sign(np.sin(x)) # sampled square wave
F = np.fft.fft(f) / N # DFT coefficients c_k = projection / N
# b_k = -2 * Im(c_k); compare to 4/(pi*k) for odd k
for k in [1, 3, 5]:
    b_k = -2 * F[k].imag
    print(f"k={k}: b_k from FFT = {b_k:+.5f} 4/(pi*k) = {4/(np.pi*k):.5f}")
# k=1: b_k from FFT = +1.27324 4/(pi*k) = 1.27324
# k=3: b_k from FFT = +0.42440 4/(pi*k) = 0.42441
# k=5: b_k from FFT = +0.25463 4/(pi*k) = 0.25465
The FFT-derived coefficients reproduce $4/(\pi k)$ to four decimals — any tiny discrepancy is the sampling error of using $N = 1024$ points instead of a continuum, and it shrinks as $N$ grows. The finite DFT, the infinite Fourier series, and our hand computation all agree, because all three are the same orthogonal projection at different resolutions.
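For this sampled square wave the discrepancy has a closed form. Summing the finite geometric series gives $b_k^{\text{DFT}} = \tfrac{4}{N}\cot\tfrac{\pi k}{N}$ for odd $k$. This tends to $4/(\pi k)$ as $N \to \infty$, with an error of order $k/N^2$. The sketch below checks the FFT values against that expression and prints the error against $4/(\pi k)$ for growing $N$. The chosen values of $N$ are arbitrary.

```python
import numpy as np

ks = np.array([1, 3, 5])
errors, matches = {}, {}
for N in [64, 256, 1024, 4096]:
    x = 2 * np.pi * np.arange(N) / N
    f = np.sign(np.sin(x))
    b_fft = -2 * (np.fft.fft(f) / N)[ks].imag       # same recipe as the block above
    b_formula = (4 / N) / np.tan(np.pi * ks / N)    # (4/N) cot(pi k / N)
    matches[N] = np.allclose(b_fft, b_formula)
    errors[N] = np.abs(b_fft - 4 / (np.pi * ks)).max()
    print(f"N={N:5d}: matches cot formula: {matches[N]}, "
          f"max |b_k - 4/(pi k)| = {errors[N]:.2e}")
```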
FAQ: Is the DFT different mathematics from the Fourier series?
No — it is the same projection, discretized. The Fourier series projects a continuous function onto an infinite orthogonal basis of sines and cosines using integrals. The DFT projects a sampled vector onto a finite orthogonal basis of sampled exponentials using dot products. Replace the integral by a Riemann sum and the function by its samples, and the series formula becomes the DFT formula. The FFT is merely a fast algorithm for the DFT; it changes the running time, not the answer. So when you call np.fft.fft, you are computing orthogonal projections onto a frequency basis — the exact subject of this chapter, at the resolution your data actually has.
22.9 What does the complex-exponential form add, and what about other signals?
Two extensions deepen the picture and round out your intuition: a more compact basis using complex exponentials, and a second worked signal so the square wave does not feel like a special case.
The complex-exponential basis. Euler's formula $e^{i\theta} = \cos\theta + i\sin\theta$ lets us fuse each sine-cosine pair into a single complex exponential. Define, for every integer $k$ (now ranging over negative integers too), the basis functions $e^{ikx}$. Under the Hermitian inner product for complex-valued functions — which conjugates the second factor, $\langle f, g\rangle = \int_{-\pi}^{\pi} f(x)\,\overline{g(x)}\, dx$, exactly the complex inner product of Chapter 18 — these are orthogonal: $$\int_{-\pi}^{\pi} e^{ikx}\,\overline{e^{imx}}\, dx = \int_{-\pi}^{\pi} e^{i(k-m)x}\, dx = \begin{cases} 2\pi & k = m,\\ 0 & k \neq m.\end{cases}$$ So the $\{e^{ikx}\}$ are another orthogonal basis, and the Fourier coefficient is — once again — a projection: $$c_k = \frac{1}{2\pi}\langle f, e^{ikx}\rangle = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(x)\,e^{-ikx}\, dx, \qquad f(x) = \sum_{k=-\infty}^{\infty} c_k\, e^{ikx}.$$ This is the same machine — orthogonal basis, coefficient by projection — in tidier packaging, and it is the form used almost everywhere in engineering and physics because one formula handles both sines and cosines at once. For our real square wave the complex coefficients come out purely imaginary, $c_k = -\,i\,\tfrac{2}{\pi k}$ for odd $k$ and zero for even $k$, consistent with the sine-only real series we found ($b_k = 4/(\pi k)$). Let's confirm two of them.
# Square-wave complex Fourier coefficients c_k by projection (Hermitian inner product).
import numpy as np
x = np.linspace(-np.pi, np.pi, 400_000, endpoint=False)
f = np.sign(np.sin(x))
for k in [1, 3]:
    c_k = np.trapezoid(f * np.exp(-1j*k*x), x) / (2*np.pi)
    print(f"k={k}: c_k = {c_k.real:+.4f}{c_k.imag:+.4f}i -2/(pi*k) = {-2/(np.pi*k):+.4f}")
# k=1: c_k = +0.0000-0.6366i -2/(pi*k) = -0.6366
# k=3: c_k = -0.0000-0.2122i -2/(pi*k) = -0.2122
The imaginary parts come out $-0.6366$ and $-0.2122$, exactly $-2/(\pi k)$ for $k = 1, 3$, with negligible real parts — the projection formula again, now in the complex basis. The complex and real forms carry identical information; they are two coordinate systems for the same point in function space, related by Euler's formula.
Math-Major Sidebar — The orthonormal complex exponential basis is $\{e_k(x) = e^{ikx}/\sqrt{2\pi} : k \in \mathbb{Z}\}$, and the space it spans (the closure of its finite combinations) is the Hilbert space $L^2[-\pi,\pi]$ of square-integrable functions — the infinite-dimensional inner product space we are really working in. Completeness of this basis is the deep theorem: every $f \in L^2$ is the mean-square limit of its Fourier partial sums, equivalently $\lVert f - S_N\rVert \to 0$. Completeness is what distinguishes an orthonormal basis from a merely orthonormal set; the sines and cosines are not just mutually perpendicular, they leave no direction unaccounted for. We meet $L^2$ and abstract inner product spaces properly in Chapter 34; for now, completeness is the rigorous content behind the informal claim "every periodic signal is a sum of sines and cosines."
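Here is a finite glimpse of the orthonormality, though not of completeness, which no finite computation can demonstrate. The Gram matrix of $e_k(x) = e^{ikx}/\sqrt{2\pi}$ for $k = -3, \dots, 3$, computed with the Hermitian inner product, should be the $7\times 7$ identity. The sketch uses the rectangle rule on a uniform periodic grid, which is exact for these low frequencies. The grid size and the range of $k$ are arbitrary choices.

```python
import numpy as np

M = 64                                                  # grid points on one period
x = -np.pi + 2 * np.pi * np.arange(M) / M
h = 2 * np.pi / M
ks = np.arange(-3, 4)
E = np.exp(1j * np.outer(x, ks)) / np.sqrt(2 * np.pi)   # column for each e_k

# Hermitian inner product <e_k, e_m> = integral of e_k * conj(e_m), by the rectangle rule
gram = h * (E.T @ E.conj())
print("Gram matrix is the identity:", np.allclose(gram, np.eye(len(ks))))
```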
A second signal: the sawtooth. To see that the square wave is not a fluke, take the sawtooth $f(x) = x$ on $(-\pi, \pi)$, repeated. It too is odd, so again only sines appear, but now every harmonic survives because the sawtooth lacks the square wave's half-wave symmetry. The coefficients work out (by integration by parts) to $b_k = \tfrac{2(-1)^{k+1}}{k}$, giving $$x = 2\left(\sin x - \frac{\sin 2x}{2} + \frac{\sin 3x}{3} - \frac{\sin 4x}{4} + \cdots\right).$$
# Sawtooth f(x)=x on (-pi,pi): coefficients by projection vs. formula 2(-1)^(k+1)/k.
import numpy as np
x = np.linspace(-np.pi, np.pi, 400_000, endpoint=False)
f = x.copy()
for k in range(1, 6):
    b_k = np.trapezoid(f * np.sin(k*x), x) / np.pi
    print(f"k={k}: b_k = {b_k:+.5f} 2(-1)^(k+1)/k = {2*(-1)**(k+1)/k:+.5f}")
# k=1: b_k = +2.00000 2(-1)^(k+1)/k = +2.00000
# k=2: b_k = -1.00000 2(-1)^(k+1)/k = -1.00000
# k=3: b_k = +0.66667 2(-1)^(k+1)/k = +0.66667
The projected coefficients match the formula to five decimals: $b_1 = 2$, $b_2 = -1$, $b_3 = \tfrac23$, with alternating signs. Two different signals, two different spectra, one identical method: project onto each frequency, read off the coordinate. The sawtooth also has a jump (at $x = \pm\pi$, where it leaps from $\pi$ down to $-\pi$), so it too exhibits Gibbs overshoot there. This confirms that Gibbs is about jumps in general, not about the square wave in particular.
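The sawtooth's Gibbs behavior can be checked the same way. The jump at $x = \pi$ has height $2\pi$. If the square-wave constant carries over, the peak of $S_N$ just left of $\pi$ should approach $\pi \cdot 1.17898 \approx 3.7039$, that is, the level $\pi$ plus $8.949\%$ of $2\pi$. The sketch evaluates the partial sums on a window just left of the jump. The window width and the values of $N$ are arbitrary choices, and the approach to the limit is slow, of order $1/N$.

```python
import numpy as np

def sawtooth_partial_sum(x, N):
    """First N terms of 2 * sum (-1)^(k+1) sin(kx) / k, the series for f(x) = x."""
    s = np.zeros_like(x)
    for k in range(1, N + 1):
        s += 2 * (-1) ** (k + 1) * np.sin(k * x) / k
    return s

x = np.linspace(np.pi - 0.2, np.pi, 20_001)[:-1]   # window just left of the jump at x = pi
ratios = {}
for N in [50, 200, 1000]:
    peak = sawtooth_partial_sum(x, N).max()
    ratios[N] = peak / np.pi
    print(f"N={N:5d}: peak = {peak:.4f}, peak/pi = {ratios[N]:.4f}")
```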
FAQ: Should I use the real (sine/cosine) form or the complex form?
Use whichever fits the problem; they are equivalent. The real form is more intuitive for a first encounter and for real-valued signals where you want to see "how much $\cos kx$ and how much $\sin kx$" explicitly. The complex exponential form is more compact — one coefficient $c_k$ per frequency instead of a pair — and it is the form that generalizes cleanly to the discrete Fourier transform and the FFT that actually runs on your computer, so it dominates in engineering and computing. The conversion is just Euler's formula: $c_k = \tfrac12(a_k - i b_k)$ for $k > 0$. Both are projections onto an orthogonal basis; the choice is bookkeeping, not mathematics.
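The following small sketch of the conversion assumes a test signal with known real coefficients: $a_0 = 0.5$, $a_2 = 3$, $b_1 = -1$, $b_4 = 2$, and all others zero. It reads the complex coefficients off the FFT of the samples ($c_k = \hat f_k / N$, exact here because the signal is a low-degree trigonometric polynomial). It then checks $c_0 = a_0$, $c_k = \tfrac12(a_k - i b_k)$, and $c_{-k} = \tfrac12(a_k + i b_k)$, which is the complex conjugate for a real signal. The function name `real_to_complex` is hypothetical.

```python
import numpy as np

N = 32
x = 2 * np.pi * np.arange(N) / N
a = {0: 0.5, 2: 3.0}            # cosine coefficients a_k (a_0 is the mean)
b = {1: -1.0, 4: 2.0}           # sine coefficients b_k
f = a[0] + a[2] * np.cos(2 * x) + b[1] * np.sin(x) + b[4] * np.sin(4 * x)

c = np.fft.fft(f) / N           # c[k] is c_k; c[N - k] is c_{-k}

def real_to_complex(k):
    """Hypothetical helper: c_k from (a_k, b_k) for k > 0, via Euler's formula."""
    return 0.5 * (a.get(k, 0.0) - 1j * b.get(k, 0.0))

checks = [np.isclose(c[0], a[0])]
for k in range(1, 6):
    checks.append(np.isclose(c[k], real_to_complex(k)))              # c_k
    checks.append(np.isclose(c[N - k], np.conj(real_to_complex(k))))  # c_{-k}
print("all conversions agree:", all(checks))
```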
22.10 Why is this the same projection, all the way down?
Step back and see the whole arc, because the unity is the point of the chapter. Everything we did is one move from Part IV, applied in a new space.
In Chapter 18 we defined an inner product abstractly, by its three properties — symmetry, linearity, positivity — precisely so it could be reused beyond arrows in $\mathbb{R}^n$. We have now cashed that in: $\langle f, g\rangle = \int fg$ is an inner product, so function space is an inner product space, and length, angle, and orthogonality mean exactly what they always meant. In Chapter 19 we proved that the orthogonal projection of a vector onto a subspace is its closest point, and that the coordinate along an orthonormal direction is an inner product. The Fourier coefficient is that coordinate; the truncated Fourier series is that closest point. In Chapter 20 we learned that an orthonormal basis makes coordinates trivial — one inner product each, no system to solve — because orthogonal directions are independent. The sines and cosines are exactly such a basis (handed to us pre-orthogonal), which is why each Fourier coefficient is one independent integral, blind to all the others.
Geometric Intuition — Hold the two pictures side by side. In $\mathbb{R}^3$ with the standard axes, a vector $\mathbf{v}$ has coordinates $\langle \mathbf{v}, \mathbf{e}_1\rangle, \langle \mathbf{v}, \mathbf{e}_2\rangle, \langle \mathbf{v}, \mathbf{e}_3\rangle$, and $\mathbf{v} = \sum_i \langle \mathbf{v}, \mathbf{e}_i\rangle \mathbf{e}_i$. In function space with the Fourier axes, a signal $f$ has coordinates $a_k = \tfrac1\pi\langle f, \cos kx\rangle$ and $b_k = \tfrac1\pi\langle f, \sin kx\rangle$, and $f = a_0 + \sum_k (a_k\cos kx + b_k \sin kx)$. The formulas are the same sentence in two languages. Replace the finite sum by an infinite one, the arrows by functions, the dot product by an integral, and finite-dimensional projection becomes Fourier analysis. Nothing else changed.
One more inheritance closes the loop: the Pythagorean theorem. In an orthonormal basis, the squared length of a vector is the sum of the squares of its coordinates (Chapter 18). In function space this is Parseval's identity: $$\lVert f\rVert^2 = \int_{-\pi}^{\pi} f(x)^2\, dx = 2\pi a_0^2 + \pi\sum_{k=1}^{\infty}\big(a_k^2 + b_k^2\big),$$ the statement that the total energy of a signal equals the sum of the energies in its frequencies. We used it in §22.6 to compute how much energy each partial sum captures, and it is the exact reason "keep the big coefficients" is a sound compression strategy: in an orthogonal basis, energy is additive across coordinates, so discarding a small coefficient costs exactly its small energy and nothing more. Parseval is the Pythagorean theorem of Chapter 18, in function space.
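As a check on the identity using a signal from this chapter, take the sawtooth $f(x) = x$. The left side is $\int_{-\pi}^{\pi} x^2\,dx = \tfrac{2\pi^3}{3}$. With $a_0 = a_k = 0$ and $b_k = 2(-1)^{k+1}/k$, the right side is $\pi\sum 4/k^2 = 4\pi\cdot\tfrac{\pi^2}{6}$, which is the same number. The sketch sums the series numerically; the truncation points are arbitrary.

```python
import numpy as np

lhs = 2 * np.pi**3 / 3                          # ||f||^2 = integral of x^2 over [-pi, pi]
fractions = []
for K in [10, 1_000, 100_000]:
    k = np.arange(1, K + 1)
    b = 2 * (-1.0) ** (k + 1) / k               # sawtooth sine coefficients b_k
    rhs = np.pi * np.sum(b**2)                  # pi * sum of b_k^2, truncated at K terms
    fractions.append(rhs / lhs)
    print(f"K={K:6d}: series captures {100 * rhs / lhs:.4f}% of ||f||^2")
```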
Build Your Toolkit — Implement fourier_coeffs(f, n) and a companion reconstruct in toolkit/fourier.py. Your fourier_coeffs takes a Python function f and an integer n and returns the coefficient $a_0$ and the arrays $a_1, \dots, a_n$ and $b_1, \dots, b_n$. Compute each by numerical projection: approximate the inner products $\langle f, \cos kx\rangle$ and $\langle f, \sin kx\rangle$ with np.trapezoid on a fine grid over one period, then divide by the appropriate squared norm ($\pi$, or $2\pi$ for $a_0$). Your reconstruct(a0, a, b, x) evaluates the partial sum $a_0 + \sum_k (a_k\cos kx + b_k\sin kx)$. Verify on the square wave f = lambda x: np.sign(np.sin(x)). Your coefficients should reproduce $b_k \approx 4/(\pi k)$ for odd $k$ and $\approx 0$ for even $k$ and all $a_k$, and your reconstruction should visibly converge to the square wave (Gibbs spike and all). This is the projection of Chapter 19 carried out by quadrature: the chapter's entire idea, in twelve lines of code. (Cross-check the closed-form square-wave coefficients against the numpy outputs in §22.5.)
Here is the shape of the implementation and its verification, so you know your target output.
# toolkit/fourier.py (sketch) — Fourier coefficients by numerical projection.
import numpy as np
def fourier_coeffs(f, n, L=np.pi, num=4000):
    # Sample one period [-L, L); basis functions are cos(k*w*x), sin(k*w*x) with w = pi/L
    x = np.linspace(-L, L, num, endpoint=False)
    w = np.pi / L # fundamental angular frequency (w = 1 when L = pi)
    fx = f(x)
    a0 = np.trapezoid(fx, x) / (2*L) # projection onto the constant 1
    a = np.array([np.trapezoid(fx*np.cos(k*w*x), x)/L for k in range(1, n+1)]) # onto cos kwx
    b = np.array([np.trapezoid(fx*np.sin(k*w*x), x)/L for k in range(1, n+1)]) # onto sin kwx
    return a0, a, b
a0, a, b = fourier_coeffs(lambda x: np.sign(np.sin(x)), 5)
print("a0 =", round(a0, 4)) # a0 = -0.0003 (~ 0)
print("b =", np.round(b, 4)) # b = [1.2732 0. 0.4244 0. 0.2546]
print("4/(pi*k) =", [round(4/(np.pi*k), 4) for k in range(1, 6)]) # [1.2732, 0.6366, 0.4244, 0.3183, 0.2546]
The output confirms it: $a_0 \approx -0.0003$ (numerically zero, as the square wave has no constant part — the tiny residue is the grid's finite resolution, per the Computational Note in §22.4), the odd $b_k$ reproduce $4/(\pi k)$ to four decimals, and the even $b_k$ vanish. The cosine coefficients (not printed) are likewise $\approx 0$. Your from-scratch projection has recovered the exact spectrum we derived by hand in §22.5.
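The toolkit box also asks for reconstruct(a0, a, b, x). Below is one possible shape for it, a minimal sketch under the same conventions: period $2\pi$, a0 the mean, and a and b indexed from $k = 1$. Because it takes coefficients as arguments, the demonstration feeds it the closed-form square-wave values $b_k = 4/(\pi k)$ for odd $k$ rather than recomputing them. It then checks that the RMS error shrinks and that $S(\pi/2)$ approaches $1$.

```python
import numpy as np

def reconstruct(a0, a, b, x):
    """Evaluate a0 + sum_{k=1}^{n} (a_k cos kx + b_k sin kx) at the points x (period 2*pi)."""
    x = np.asarray(x, dtype=float)
    k = np.arange(1, len(a) + 1)
    kx = np.outer(x, k)                          # shape (len(x), n): every k*x at once
    return a0 + np.cos(kx) @ np.asarray(a) + np.sin(kx) @ np.asarray(b)

x = np.linspace(-np.pi, np.pi, 2001)
f = np.sign(np.sin(x))
rms_errors, midpoints = [], []
for n in [5, 25, 125]:
    k = np.arange(1, n + 1)
    b = np.where(k % 2 == 1, 4 / (np.pi * k), 0.0)   # closed-form square-wave b_k
    a = np.zeros(n)                                   # the square wave has no cosine terms
    S = reconstruct(0.0, a, b, x)
    rms_errors.append(np.sqrt(np.mean((f - S) ** 2)))
    midpoints.append(reconstruct(0.0, a, b, [np.pi / 2])[0])
    print(f"n={n:4d}: rms error = {rms_errors[-1]:.4f}, S(pi/2) = {midpoints[-1]:.4f}")
```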
Check Your Understanding — A signal $g$ on $[-\pi,\pi]$ is built as $g(x) = 5 + 3\cos 2x - 4\sin 7x$. Without integrating anything, write down all of its nonzero Fourier coefficients.
Answer
Read them straight off, because $g$ is already written in the orthogonal basis — its Fourier coefficients are just the coefficients you see. The constant term is $a_0 = 5$ (the average value). The cosine coefficient $a_2 = 3$. The sine coefficient $b_7 = -4$. Every other $a_k$ and $b_k$ is $0$. No integration is needed precisely because the basis is orthogonal: each coefficient is the independent projection onto its own axis, and $g$ already displays its coordinates. This is the whole point of §22.4 — in an orthogonal basis, a coefficient is a coordinate, and coordinates of a vector already expressed in the basis require no work to extract.
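To confirm the answer numerically, run the projection recipe, which recovers the same three coefficients. This sketch uses the rectangle rule on a uniform periodic grid rather than np.trapezoid; the rectangle rule is exact for trigonometric polynomials of low degree. The grid size, the range $k \le 10$, and the $10^{-9}$ cutoff for "nonzero" are arbitrary choices.

```python
import numpy as np

M = 256
x = -np.pi + 2 * np.pi * np.arange(M) / M        # uniform grid on [-pi, pi)
h = 2 * np.pi / M
g = 5 + 3 * np.cos(2 * x) - 4 * np.sin(7 * x)

a0 = h * np.sum(g) / (2 * np.pi)                 # projection onto the constant 1
a = {k: h * np.sum(g * np.cos(k * x)) / np.pi for k in range(1, 11)}
b = {k: h * np.sum(g * np.sin(k * x)) / np.pi for k in range(1, 11)}

nonzero = {"a0": float(a0)}
nonzero.update({f"a{k}": float(v) for k, v in a.items() if abs(v) > 1e-9})
nonzero.update({f"b{k}": float(v) for k, v in b.items() if abs(v) > 1e-9})
print({name: round(val, 6) for name, val in nonzero.items()})
```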
FAQ: How does this connect to the eigenvalues coming in Part V?
Tightly, and it is the perfect bridge. We noted in §22.7 that sines and cosines are the eigenfunctions of the second-derivative operator: $\frac{d^2}{dx^2}\sin kx = -k^2 \sin kx$, an eigen-equation $A\mathbf{v} = \lambda\mathbf{v}$ with the operator playing the role of $A$, the function $\sin kx$ the eigenvector, and $-k^2$ the eigenvalue. Expanding a signal in the Fourier basis is therefore expanding it in the eigenvectors of differentiation — and that is exactly the move that will dominate Part V, where an orthogonal basis of eigenvectors diagonalizes a transformation into independent one-dimensional stretches. The orthogonality that decoupled our Fourier coefficients is the same orthogonality that, in Chapter 27's Spectral Theorem, makes a symmetric matrix orthogonally diagonalizable. And the idea that a signal is well approximated by keeping its largest-energy components is the seed of the SVD's low-rank approximation in Chapter 30. Fourier series is, in a real sense, your first eigen-decomposition — projection onto an orthogonal basis of eigenfunctions. Part V makes that view the center of the book.
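Here is a finite-dimensional echo of that eigen-equation, as a sketch. On $N$ periodic grid points, the wrap-around second-difference matrix $D$ is a standard discretization of $\frac{d^2}{dx^2}$. Every sampled exponential $e^{ikx_j}$ is an eigenvector of $D$, with eigenvalue $-\tfrac{4}{h^2}\sin^2\tfrac{kh}{2}$, which approaches $-k^2$ as the spacing $h$ shrinks. As a consequence, the DFT diagonalizes $D$. The grid size and the tested values of $k$ are arbitrary choices.

```python
import numpy as np

N = 64
h = 2 * np.pi / N
x = h * np.arange(N)
I = np.eye(N)
# Periodic second difference: (D v)_j = (v_{j+1} - 2 v_j + v_{j-1}) / h^2, indices mod N
D = (np.roll(I, 1, axis=1) - 2 * I + np.roll(I, -1, axis=1)) / h**2

eig_checks = []
for k in [1, 2, 5]:
    v = np.exp(1j * k * x)                       # sampled exponential e^{ikx_j}
    lam = -(4 / h**2) * np.sin(k * h / 2) ** 2
    eig_checks.append(np.allclose(D @ v, lam * v))
    print(f"k={k}: eigenvector: {eig_checks[-1]}, eigenvalue = {lam:.4f}, -k^2 = {-k**2}")

# The DFT matrix F (rows = conjugated exponentials) diagonalizes D: F D F^{-1} with F^{-1} = F*/N
F = np.fft.fft(I, axis=0)
Dhat = F @ D @ F.conj().T / N
off_diag_max = np.abs(Dhat - np.diag(np.diag(Dhat))).max()
print("largest off-diagonal entry:", off_diag_max)
```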
22.11 Summary
A Fourier coefficient is an orthogonal projection. That one sentence is the chapter. We treated functions as vectors in an inner product space (Chapter 5's abstraction, made to work hard), with the inner product $\langle f, g\rangle = \int_{-\pi}^{\pi} fg\, dx$ that supplies length, angle, and orthogonality exactly as in $\mathbb{R}^n$. We stated and proved the orthogonality relations that make the sines, cosines, and the constant a mutually perpendicular basis. For positive integers $k, m$, $\langle \sin kx, \sin mx\rangle = \pi$ when $k = m$ and $0$ otherwise, and every sine is orthogonal to every cosine. We recognized each Fourier coefficient as the projection of a signal onto one basis frequency, the identical formula from Chapters 19 and 20. We saw that orthogonality is precisely what makes the coefficients independent: each is found by its own integral, undisturbed by all the rest. We decomposed a square wave into odd-harmonic sines with $b_k = 4/(\pi k)$ and watched the truncated series converge to it in energy (a single sine captures $81\%$ of the energy; fifty terms exceed $99.5\%$). We also met the Gibbs overshoot that persists beside the jump, a failure of uniform convergence even though the series converges in mean square and pointwise. We saw the same projection-and-truncate idea power audio and image compression and glimpsed the complex-exponential form. Finally, we traced the through-line forward to the eigenvalues of Part V and the SVD of Chapter 30, where this very picture, an orthogonal basis that decouples a transformation, becomes the heart of the book.
Real-World Application — The everyday spectrum. The bouncing bars on a music visualizer, the noise-cancellation in your headphones, the lossy compression of every streamed song and shared photo, the radio that tunes to one station and rejects the rest — all of it is projection onto an orthogonal basis of frequencies, keeping the coefficients you want and discarding the ones you do not. The right angle you learned as a child organizes the spectrum of a sound, and a Fourier coefficient is nothing but a shadow cast on a single pure tone.