Prerequisites
- chapter-20-gram-schmidt-and-qr
Learning Objectives
- Define an orthogonal matrix by the condition $Q^{\mathsf{T}}Q = I$ and explain why this makes $Q^{-1} = Q^{\mathsf{T}}$.
- Prove that orthogonal matrices preserve lengths, and deduce that they preserve dot products and angles.
- Show that an orthogonal matrix has $\det = \pm 1$, and classify $+1$ as a rotation and $-1$ as a reflection.
- Write and apply 2D and 3D rotation matrices and a Householder reflection, and verify each is orthogonal.
- Explain the group structure: a product of orthogonal matrices is orthogonal, and the inverse is too.
- State the complex analogue — a unitary matrix satisfies $U^{*}U = I$ — and connect it to quantum logic gates.
In This Chapter
- 21.1 What does it mean for a matrix to preserve distance?
- 21.2 What is an orthogonal matrix, exactly?
- 21.3 Why do orthogonal matrices preserve length? (the central proof)
- 21.4 Why does preserving length also preserve angles?
- 21.5 Why must the determinant of an orthogonal matrix be ±1?
- 21.6 What does a rotation matrix look like in 2D?
- 21.7 What does a rotation look like in 3D, and how do reflections fit in?
- 21.8 The visualizer returns: a rotation versus a reflection
- 21.9 Why is a product of orthogonal matrices orthogonal? (the group structure)
- 21.10 What is the complex analogue? Unitary matrices and the qubit
- 21.11 Where do orthogonal and unitary matrices show up in signal processing?
- 21.12 How do you check orthogonality, and build a rotation, in code?
- 21.13 What have we built, and where does it lead?
Orthogonal Matrices and Rotations: Transformations That Preserve Distance
Learning paths. Math majors — read everything, especially the length-preservation proof, the determinant argument, and the Math-Major Sidebar on the orthogonal group. CS / Data Science — focus on the Geometric Intuition, the numpy, and the rotation/whitening applications; the group-theory sidebar is optional. Physics / Engineering — focus on the geometry of rigid motion, the rotation matrices in 2D and 3D, and the unitary preview that powers quantum gates.
Pick up a book off your desk and turn it in the air. Flip it over. Slide it across the table. In every one of those motions the book itself does not change — no edge stretches, no corner bends, the cover and the spine stay exactly as far apart as they always were. The position changes; the shape does not. Your hand has just performed what mathematicians call a rigid motion, and the matrices that describe such motions about a fixed point are the subject of this chapter. They are the transformations that preserve distance, and they turn out to be precisely the orthogonal matrices — the most well-behaved, most computationally pleasant family of matrices you will ever meet.
We have spent all of Part IV building the geometry of right angles. Chapter 18 gave us length and angle through the dot product; Chapter 19 made "closest point" precise through projection; Chapter 20 taught us to manufacture orthonormal bases with Gram–Schmidt and to package them as the $Q$ in a QR factorization. That $Q$ was our first orthogonal matrix, and we noted in passing that it had a magical property: $Q^{\mathsf{T}}Q = I$. This chapter is where we stop and ask the geometric question we have been circling — what kind of transformation has orthonormal columns, and what does it do to space? The answer is rotations and reflections, and it is one of the most satisfying payoffs in the book.
The central claim, which we will earn rather than assert, is this: the single algebraic equation $Q^{\mathsf{T}}Q = I$ is equivalent to the geometric statement "this transformation preserves every length." Once a transformation preserves lengths it automatically preserves angles, areas, and volumes too — it moves the whole space rigidly. That equivalence between a clean piece of algebra and a vivid piece of geometry is exactly the kind of two-views-of-one-object harmony the book keeps returning to. And at the end of the chapter we will meet the complex cousin of the orthogonal matrix — the unitary matrix, satisfying $U^{*}U = I$ — and finally see why the qubit we teased back in Chapter 1 is governed by exactly this mathematics.
21.1 What does it mean for a matrix to preserve distance?
Before any formula, let us fix the picture. We have a transformation $T(\mathbf{x}) = Q\mathbf{x}$ acting on $\mathbb{R}^n$. We want $T$ to be an isometry: a map that never changes the distance between two points. If $\mathbf{x}$ and $\mathbf{y}$ are two points, the distance between them is $\lVert \mathbf{x} - \mathbf{y}\rVert$, and we demand that after the transformation this distance is unchanged: $$\lVert Q\mathbf{x} - Q\mathbf{y}\rVert = \lVert \mathbf{x} - \mathbf{y}\rVert \quad\text{for all } \mathbf{x}, \mathbf{y}.$$
Because $Q$ is linear, $Q\mathbf{x} - Q\mathbf{y} = Q(\mathbf{x} - \mathbf{y})$, so this is the same as saying $\lVert Q\mathbf{v}\rVert = \lVert \mathbf{v}\rVert$ for every vector $\mathbf{v}$. In words: a linear isometry is a transformation that never changes the length of any vector. That is the whole geometric idea, and everything algebraic in this chapter flows from it.
Geometric Intuition — Picture the unit square from our recurring visualizer. A shear slants it into a parallelogram; a scaling stretches it; a general matrix warps it however it likes. A length-preserving transformation refuses all of that. The unit square can be spun to a new angle or flipped over like a pancake, but its sides stay length 1, its corners stay right angles, and its area stays exactly 1. Rigidity is the absence of distortion.
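The isometry condition can be tried out numerically. The sketch below is only an illustration: the helper name `rotation`, the two sample points, the 40° angle, and the particular shear are all choices made here, not taken from the text. It measures the distance between two points before and after a rotation and before and after a shear.

```python
import numpy as np

def rotation(theta):
    """2x2 counterclockwise rotation by theta radians (illustrative helper)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s],
                     [s,  c]])

# Two arbitrary sample points in the plane.
x = np.array([2.0, 1.0])
y = np.array([-1.0, 3.0])

Q = rotation(np.deg2rad(40))       # a rigid motion
S = np.array([[1.0, 1.0],
              [0.0, 1.0]])         # a shear, which is allowed to distort

d_before = np.linalg.norm(x - y)
d_rotated = np.linalg.norm(Q @ x - Q @ y)
d_sheared = np.linalg.norm(S @ x - S @ y)

print("distance before  :", d_before)
print("after rotation   :", d_rotated)   # matches d_before up to rounding
print("after shear      :", d_sheared)   # differs from d_before
```

The rotation leaves the distance alone while the shear changes it, which is the difference between a rigid motion and a general linear map.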
Why insist on length preservation? Because so much of applied mathematics is about moving data into a better coordinate system without corrupting it. When a robot reorients its gripper, the gripper must not grow. When you whiten a dataset or change to an orthonormal basis, you want to repackage the same information, not distort the relationships inside it. When a quantum computer applies a gate, the total probability must stay equal to 1 — which, we will see, is exactly a length-preservation requirement. Distance-preserving maps are the transformations you can apply without losing or fabricating information.
21.1.1 The two motions that preserve distance
Intuitively there are only two kinds of rigid motion about a fixed origin in the plane. You can rotate the plane about the origin by some angle, or you can reflect it across some line through the origin. (A glide or a slide would move the origin, so it is not a linear map; translations live in Chapter 12's homogeneous coordinates, not here.) Everything else — say, rotate by 30° then flip — is just a combination of these two. By the end of the chapter we will have proven that these really are the only possibilities: every orthogonal $2\times 2$ matrix is a rotation or a reflection, no exceptions. The determinant will tell us which.
The Key Insight — A linear transformation preserves distance if and only if its matrix $Q$ satisfies $Q^{\mathsf{T}}Q = I$. Geometry (rigid motion) and algebra (orthonormal columns) are two descriptions of the very same object.
21.2 What is an orthogonal matrix, exactly?
Now the precise definition. We have hinted at it since Chapter 20; here it is in full, with its conditions stated carefully.
Definition (orthogonal matrix). A real square matrix $Q \in \mathbb{R}^{n\times n}$ is orthogonal if its transpose is its inverse: $$Q^{\mathsf{T}}Q = I \quad\Longleftrightarrow\quad Q^{-1} = Q^{\mathsf{T}}.$$ Equivalently, the columns of $Q$ form an orthonormal set: they are mutually perpendicular unit vectors.
Let us unpack why those two phrasings are the same statement, because the equivalence is the source of all the power. Write $Q$ in terms of its columns, $Q = [\,\mathbf{q}_1 \mid \mathbf{q}_2 \mid \cdots \mid \mathbf{q}_n\,]$. When you form $Q^{\mathsf{T}}Q$, the entry in row $i$, column $j$ is the dot product of the $i$-th column with the $j$-th column: $$\bigl(Q^{\mathsf{T}}Q\bigr)_{ij} = \mathbf{q}_i \cdot \mathbf{q}_j.$$ Setting $Q^{\mathsf{T}}Q = I$ means every such dot product equals the corresponding entry of the identity: $\mathbf{q}_i\cdot\mathbf{q}_j = 1$ when $i = j$ (each column is a unit vector) and $\mathbf{q}_i\cdot\mathbf{q}_j = 0$ when $i \ne j$ (distinct columns are perpendicular). That is precisely the definition of orthonormal columns. So "$Q^{\mathsf{T}}Q = I$" and "orthonormal columns" are literally the same sentence written two ways.
Common Pitfall — The name is a historical misnomer that traps everyone once. An "orthogonal matrix" does not merely have orthogonal columns — it has orthonormal columns: orthogonal and unit length. A matrix whose columns are perpendicular but not unit length, like $\begin{psmallmatrix}2 & 0\\ 0 & 3\end{psmallmatrix}$, is not orthogonal, because $Q^{\mathsf{T}}Q = \begin{psmallmatrix}4 & 0\\ 0 & 9\end{psmallmatrix} \ne I$. The normalization is not optional; it is half the definition. (Many authors wish the term were "orthonormal matrix," but "orthogonal" is the entrenched standard.)
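To see the entry-by-entry statement $(Q^{\mathsf{T}}Q)_{ij} = \mathbf{q}_i\cdot\mathbf{q}_j$ in action, the following sketch builds the matrix of column dot products by hand and compares it with `Q.T @ Q`. The helper `gram` and the two test matrices are illustrative choices; the second matrix is the perpendicular-but-not-unit example from the pitfall above.

```python
import numpy as np

def gram(M):
    """Matrix of column dot products: entry (i, j) is q_i . q_j."""
    n = M.shape[1]
    return np.array([[M[:, i] @ M[:, j] for j in range(n)]
                     for i in range(n)])

Q = np.array([[1.0, -1.0],
              [1.0,  1.0]]) / np.sqrt(2)   # orthonormal columns
D = np.array([[2.0, 0.0],
              [0.0, 3.0]])                 # perpendicular columns, wrong lengths

print(gram(Q))                             # the identity, up to rounding
print(np.allclose(gram(Q), Q.T @ Q))       # same matrix as Q^T Q
print(gram(D))                             # diag(4, 9), not the identity
```

The diagonal of the second result records the squared column lengths 4 and 9, which is exactly where the normalization requirement fails.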
21.2.1 Rows are orthonormal too
For a square matrix, $Q^{\mathsf{T}}Q = I$ forces $QQ^{\mathsf{T}} = I$ as well. The reason is the basic fact from Chapter 9 that a one-sided inverse of a square matrix is automatically two-sided: if $Q^{\mathsf{T}}Q = I$, then $Q^{\mathsf{T}}$ is a left inverse of $Q$, and for square matrices a left inverse must also be a right inverse, so $QQ^{\mathsf{T}} = I$ too. Reading $QQ^{\mathsf{T}} = I$ the same way we read $Q^{\mathsf{T}}Q = I$, we learn that the rows of an orthogonal matrix are also orthonormal. So an orthogonal matrix is orthonormal both across its columns and down its rows — a strikingly rigid structure.
Warning — The square-ness condition matters. For a non-square matrix with orthonormal columns — a "tall" $m\times n$ matrix with $m > n$, like the $Q$ from a reduced QR factorization in Chapter 20 — we still have $Q^{\mathsf{T}}Q = I_n$, but $QQ^{\mathsf{T}} \ne I_m$. Such a matrix has orthonormal columns but is not invertible and is not called orthogonal; the term "orthogonal matrix" is reserved for square matrices. We will lean on the tall-$Q$ case again when projections reappear, but keep the distinction sharp: only square orthonormal-column matrices earn the name orthogonal.
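The tall case is easy to check numerically. This sketch assumes a random $4\times 2$ matrix as input (the seed and the shape are arbitrary choices) and uses numpy's reduced QR to produce a tall $Q$ with orthonormal columns.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 2))            # arbitrary tall matrix
Q, R = np.linalg.qr(A, mode="reduced")     # Q is 4x2 with orthonormal columns

print(Q.shape)
print(np.allclose(Q.T @ Q, np.eye(2)))     # True: columns are orthonormal
print(np.allclose(Q @ Q.T, np.eye(4)))     # False: Q Q^T is not I_4
print(np.linalg.matrix_rank(Q @ Q.T))      # rank 2, so it cannot be the 4x4 identity
```

The product $QQ^{\mathsf{T}}$ has rank 2 here, so it cannot equal the $4\times 4$ identity, whose rank is 4.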
21.2.2 The free inverse — the single most useful property
Pause on the equivalence $Q^{-1} = Q^{\mathsf{T}}$, because it is the property that working mathematicians and engineers reach for daily. Inverting a general $n\times n$ matrix is expensive: Gaussian elimination (Chapter 9) costs on the order of $n^3$ arithmetic operations, and the result can be numerically delicate if the matrix is near-singular. For an orthogonal matrix, inversion costs nothing — you just transpose, which is a free relabeling of entries with no arithmetic at all. There is no elimination, no determinant, no division, and no possibility of blowing up, since an orthogonal matrix is never close to singular (its determinant is exactly $\pm 1$).
Concretely, suppose you need to solve $Q\mathbf{x} = \mathbf{b}$ for a rotation $Q$. Instead of running elimination, you read off the answer instantly: $\mathbf{x} = Q^{-1}\mathbf{b} = Q^{\mathsf{T}}\mathbf{b}$. Take $Q$ to be the rotation by $60°$ and $\mathbf{b} = (1, 0)$: $$Q = \begin{bmatrix} \tfrac12 & -\tfrac{\sqrt3}{2} \\ \tfrac{\sqrt3}{2} & \tfrac12 \end{bmatrix}, \qquad \mathbf{x} = Q^{\mathsf{T}}\mathbf{b} = \begin{bmatrix} \tfrac12 & \tfrac{\sqrt3}{2} \\ -\tfrac{\sqrt3}{2} & \tfrac12 \end{bmatrix}\begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} \tfrac12 \\ -\tfrac{\sqrt3}{2} \end{bmatrix} \approx \begin{bmatrix} 0.5 \\ -0.866 \end{bmatrix}.$$ The transpose of a rotation by $60°$ is the rotation by $-60°$, as it must be: undoing a rotation by $60°$ means rotating back by $60°$ the other way. The geometry ("rotate back") and the algebra ("transpose") are the same operation. This is why orthogonal factors are prized in numerical linear algebra: every orthogonal piece of a decomposition is free to invert and perfectly stable, which is the engine behind the QR-based algorithms of Chapter 20 and the SVD of Chapter 30.
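The $60°$ example can be replayed in numpy. The sketch below compares the transpose shortcut with a general-purpose solver; the variable names are illustrative, and `np.linalg.solve` stands in for "running elimination".

```python
import numpy as np

theta = np.deg2rad(60)
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
b = np.array([1.0, 0.0])

x_transpose = Q.T @ b               # the free inverse: just transpose
x_solve = np.linalg.solve(Q, b)     # general solver, for comparison

print(np.round(x_transpose, 3))
print(np.allclose(x_transpose, x_solve))   # both routes give the same x
print(np.allclose(Q @ x_transpose, b))     # and x really solves Q x = b
```

Both routes agree, but the first needs only a matrix-vector product.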
Check Your Understanding — Is the matrix $A = \begin{psmallmatrix}\tfrac35 & -\tfrac45 \\ \tfrac45 & \tfrac35\end{psmallmatrix}$ orthogonal? If so, what is $A^{-1}$, and is it a rotation or a reflection?
Answer
Check the columns. First column $(\tfrac35, \tfrac45)$ has length $\sqrt{\tfrac{9}{25} + \tfrac{16}{25}} = \sqrt{\tfrac{25}{25}} = 1$ ✓; second column $(-\tfrac45, \tfrac35)$ likewise has length 1 ✓; their dot product is $\tfrac35\cdot(-\tfrac45) + \tfrac45\cdot\tfrac35 = -\tfrac{12}{25} + \tfrac{12}{25} = 0$ ✓. So $A$ is orthogonal. Its inverse is just the transpose, $A^{-1} = A^{\mathsf{T}} = \begin{psmallmatrix}\tfrac35 & \tfrac45 \\ -\tfrac45 & \tfrac35\end{psmallmatrix}$ — no elimination needed. And $\det(A) = \tfrac{9}{25} + \tfrac{16}{25} = 1$, so it is a rotation (by the angle $\theta$ with $\cos\theta = \tfrac35$, the famous 3-4-5 triangle's angle, about $53.13°$).
21.3 Why do orthogonal matrices preserve length? (the central proof)
This is the theorem the whole chapter rests on, so it gets the full four-part treatment: why we care, the key idea, the proof, and what it means.
1. Why we care. Everything geometric about orthogonal matrices — that they are rotations and reflections, that they preserve angles and areas, that they are the transformations safe to apply to data — follows from one fact: they preserve length. If we can prove that $Q^{\mathsf{T}}Q = I$ forces $\lVert Q\mathbf{x}\rVert = \lVert\mathbf{x}\rVert$, we have connected the algebra to the geometry once and for all.
2. Key idea. Length is built from the dot product ($\lVert\mathbf{x}\rVert^2 = \mathbf{x}\cdot\mathbf{x}$), and the dot product is built from the transpose ($\mathbf{x}\cdot\mathbf{y} = \mathbf{x}^{\mathsf{T}}\mathbf{y}$). So the moment $Q^{\mathsf{T}}$ meets $Q$ inside a dot product, the condition $Q^{\mathsf{T}}Q = I$ makes them cancel.
3. Proof. Let $Q$ be orthogonal, so $Q^{\mathsf{T}}Q = I$, and let $\mathbf{x}$ be any vector in $\mathbb{R}^n$. We compute the squared length of $Q\mathbf{x}$ using the identity $\lVert\mathbf{w}\rVert^2 = \mathbf{w}^{\mathsf{T}}\mathbf{w}$ from Chapter 18: $$\lVert Q\mathbf{x}\rVert^2 = (Q\mathbf{x})^{\mathsf{T}}(Q\mathbf{x}).$$ Apply the transpose-of-a-product rule from Chapter 8, $(Q\mathbf{x})^{\mathsf{T}} = \mathbf{x}^{\mathsf{T}}Q^{\mathsf{T}}$: $$\lVert Q\mathbf{x}\rVert^2 = \mathbf{x}^{\mathsf{T}}Q^{\mathsf{T}}Q\,\mathbf{x}.$$ Now use the defining property $Q^{\mathsf{T}}Q = I$ to collapse the middle: $$\lVert Q\mathbf{x}\rVert^2 = \mathbf{x}^{\mathsf{T}} I\,\mathbf{x} = \mathbf{x}^{\mathsf{T}}\mathbf{x} = \lVert\mathbf{x}\rVert^2.$$ Lengths are non-negative, so taking the (positive) square root of both sides gives $\lVert Q\mathbf{x}\rVert = \lVert\mathbf{x}\rVert$. Since $\mathbf{x}$ was arbitrary, $Q$ preserves the length of every vector. $\blacksquare$
4. What this means. Three short lines of algebra — transpose, regroup, cancel — turned "$Q^{\mathsf{T}}Q = I$" into "$Q$ never changes a length." Notice that the proof also runs backwards: if $\lVert Q\mathbf{x}\rVert = \lVert\mathbf{x}\rVert$ for all $\mathbf{x}$, then $\mathbf{x}^{\mathsf{T}}Q^{\mathsf{T}}Q\mathbf{x} = \mathbf{x}^{\mathsf{T}}\mathbf{x}$ for all $\mathbf{x}$, which forces $Q^{\mathsf{T}}Q = I$ (a symmetric matrix that produces the same quadratic form as $I$ must equal $I$ — we make this airtight in Chapter 28). So the implication is an equivalence: orthogonal $\iff$ length-preserving. The algebraic definition and the geometric property are genuinely the same thing.
Math-Major Sidebar — The backward direction uses a small lemma: if $\mathbf{x}^{\mathsf{T}}M\mathbf{x} = 0$ for all $\mathbf{x}$ and $M$ is symmetric, then $M = 0$. Here $M = Q^{\mathsf{T}}Q - I$, which is symmetric because $(Q^{\mathsf{T}}Q)^{\mathsf{T}} = Q^{\mathsf{T}}Q$. (The symmetry hypothesis is essential: the non-symmetric matrix $\begin{psmallmatrix}0 & 1\\ -1 & 0\end{psmallmatrix}$ satisfies $\mathbf{x}^{\mathsf{T}}M\mathbf{x} = 0$ for every real $\mathbf{x}$ yet is far from zero.) The cleanest proof of the lemma uses the polarization identity to recover the full bilinear form $\mathbf{x}^{\mathsf{T}}M\mathbf{y}$ from the quadratic form $\mathbf{x}^{\mathsf{T}}M\mathbf{x}$, then sets $\mathbf{x}=\mathbf{e}_i,\mathbf{y}=\mathbf{e}_j$ to read off each entry $m_{ij}=0$. This is the same polarization trick that lets an inner product be reconstructed from its norm, foreshadowing the inner-product spaces of Chapter 34.
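For readers who want the lemma written out, here is one way the polarization step can go. It is a sketch in the sidebar's notation, assuming $M$ is symmetric and $\mathbf{x}^{\mathsf{T}}M\mathbf{x} = 0$ for every $\mathbf{x}$. Expand the quadratic form on a sum: $$(\mathbf{x}+\mathbf{y})^{\mathsf{T}}M(\mathbf{x}+\mathbf{y}) = \mathbf{x}^{\mathsf{T}}M\mathbf{x} + \mathbf{x}^{\mathsf{T}}M\mathbf{y} + \mathbf{y}^{\mathsf{T}}M\mathbf{x} + \mathbf{y}^{\mathsf{T}}M\mathbf{y}.$$ Symmetry gives $\mathbf{y}^{\mathsf{T}}M\mathbf{x} = (\mathbf{y}^{\mathsf{T}}M\mathbf{x})^{\mathsf{T}} = \mathbf{x}^{\mathsf{T}}M^{\mathsf{T}}\mathbf{y} = \mathbf{x}^{\mathsf{T}}M\mathbf{y}$, so the two cross terms are equal and $$\mathbf{x}^{\mathsf{T}}M\mathbf{y} = \tfrac12\Bigl[(\mathbf{x}+\mathbf{y})^{\mathsf{T}}M(\mathbf{x}+\mathbf{y}) - \mathbf{x}^{\mathsf{T}}M\mathbf{x} - \mathbf{y}^{\mathsf{T}}M\mathbf{y}\Bigr] = \tfrac12\,[\,0 - 0 - 0\,] = 0.$$ Choosing $\mathbf{x} = \mathbf{e}_i$ and $\mathbf{y} = \mathbf{e}_j$ picks out a single entry, $\mathbf{e}_i^{\mathsf{T}}M\mathbf{e}_j = m_{ij} = 0$, for every $i$ and $j$, hence $M = 0$. Applied to $M = Q^{\mathsf{T}}Q - I$, this gives $Q^{\mathsf{T}}Q = I$.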
21.4 Why does preserving length also preserve angles?
Here is a small geometric miracle: a transformation that only promises to preserve lengths turns out, for free, to preserve angles as well. A linear map cannot shear or skew space without stretching or shrinking some vector, so once nothing stretches, nothing shears either. Algebraically, the reason is that the dot product, which encodes both length and angle, is itself preserved.
Claim. If $Q$ is orthogonal, then $(Q\mathbf{x})\cdot(Q\mathbf{y}) = \mathbf{x}\cdot\mathbf{y}$ for all $\mathbf{x}, \mathbf{y}$.
The proof is one line, the same cancellation as before: $$(Q\mathbf{x})\cdot(Q\mathbf{y}) = (Q\mathbf{x})^{\mathsf{T}}(Q\mathbf{y}) = \mathbf{x}^{\mathsf{T}}Q^{\mathsf{T}}Q\,\mathbf{y} = \mathbf{x}^{\mathsf{T}}I\,\mathbf{y} = \mathbf{x}^{\mathsf{T}}\mathbf{y} = \mathbf{x}\cdot\mathbf{y}.$$
Now recall the angle formula from Chapter 18: $\cos\theta = \dfrac{\mathbf{x}\cdot\mathbf{y}}{\lVert\mathbf{x}\rVert\,\lVert\mathbf{y}\rVert}$. After applying $Q$, the numerator $\mathbf{x}\cdot\mathbf{y}$ is unchanged (just shown), and both factors in the denominator are unchanged (length preservation, §21.3). So $\cos\theta$ is identical before and after — the angle between any two vectors is exactly preserved. A transformation that keeps all lengths is forced to keep all angles too. That is precisely what we mean by rigid: the shape of the whole configuration is carried along untouched.
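A quick numerical check of both claims, dot product and angle, can look like the sketch below. It assumes a random $3\times 3$ orthogonal matrix obtained from the QR factorization of a random matrix (the seed and the sample vectors are arbitrary), and `angle_deg` is an illustrative helper built from the cosine formula.

```python
import numpy as np

def angle_deg(u, v):
    """Angle between u and v in degrees, from cos(theta) = u.v / (|u||v|)."""
    cos_t = (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.degrees(np.arccos(np.clip(cos_t, -1.0, 1.0)))

rng = np.random.default_rng(1)
Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))   # a random orthogonal matrix

x = np.array([1.0, 2.0, 2.0])
y = np.array([3.0, 0.0, -4.0])

print("dot before :", x @ y)
print("dot after  :", (Q @ x) @ (Q @ y))
print("angle before:", angle_deg(x, y))
print("angle after :", angle_deg(Q @ x, Q @ y))
```

The two dot products and the two angles agree up to floating-point rounding.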
Real-World Application — Data whitening and PCA preprocessing. In statistics and machine learning, the decorrelation step of "whitening" rotates a dataset so that its features become uncorrelated, and the rotation is performed by an orthogonal matrix built from eigenvectors (Chapter 27). (Full whitening then rescales each new axis to unit variance; that rescaling is not orthogonal, so the guarantee described here belongs to the rotation alone.) Because the rotation preserves all pairwise distances and angles, it reorganizes the coordinates of the data without changing the relationships between data points: two records that were similar stay equally similar. This is why an orthogonal change of basis is the statistician's safe move: you can re-express the data in friendlier axes without corrupting it. We will build the full machinery in Chapters 27 and 32, but the guarantee comes from this chapter.
Check Your Understanding — A transformation $T$ doubles the length of every vector: $\lVert T\mathbf{x}\rVert = 2\lVert\mathbf{x}\rVert$. Is its matrix orthogonal? Does it preserve angles?
Answer
Its matrix is not orthogonal: orthogonal matrices preserve length exactly ($\lVert Q\mathbf{x}\rVert = \lVert\mathbf{x}\rVert$), and this one doubles it. The simplest such matrix is $2I$, with $(2I)^{\mathsf{T}}(2I) = 4I \ne I$; in general $T = 2Q$ for some orthogonal $Q$ (because $\tfrac12 T$ preserves length), and again $T^{\mathsf{T}}T = 4I \ne I$. It does preserve angles, however: every dot product is multiplied by 4, so $\cos\theta = \frac{(T\mathbf{x})\cdot(T\mathbf{y})}{\lVert T\mathbf{x}\rVert\,\lVert T\mathbf{y}\rVert} = \frac{4(\mathbf{x}\cdot\mathbf{y})}{4\lVert\mathbf{x}\rVert\lVert\mathbf{y}\rVert}$ and the factors of 4 cancel. So angle preservation is weaker than orthogonality: every orthogonal map preserves angles, but a uniform scaling preserves angles without being orthogonal. Orthogonality is the combination of angle preservation and length preservation.
21.5 Why must the determinant of an orthogonal matrix be ±1?
We have a transformation that preserves lengths, angles, and therefore areas and volumes. Chapter 11 taught us that the determinant measures exactly the factor by which a transformation scales (signed) volume. If areas and volumes are preserved, the magnitude of that scaling factor must be 1. Let us prove it crisply from the algebra and then read off the geometry.
Claim. If $Q$ is orthogonal, then $\det(Q) = \pm 1$.
Proof. Take determinants of both sides of $Q^{\mathsf{T}}Q = I$. Using two facts from Chapter 11 — that the determinant is multiplicative, $\det(AB) = \det(A)\det(B)$, and that a matrix and its transpose share a determinant, $\det(Q^{\mathsf{T}}) = \det(Q)$ — we get $$\det(Q^{\mathsf{T}}Q) = \det(Q^{\mathsf{T}})\det(Q) = \det(Q)^2, \qquad \det(I) = 1,$$ so $\det(Q)^2 = 1$, which gives $\det(Q) = +1$ or $\det(Q) = -1$. $\blacksquare$
The two signs are not a technicality — they are the two kinds of rigid motion, and the determinant is the label that tells them apart.
- $\det(Q) = +1$: a rotation. The transformation preserves orientation. A right-handed coordinate frame stays right-handed; the unit square spins to a new angle but is never flipped. These are the proper rigid motions. The set of all $n\times n$ orthogonal matrices with determinant $+1$ is called the special orthogonal group, written $\mathrm{SO}(n)$.
- $\det(Q) = -1$: a reflection. The transformation reverses orientation. A right-handed frame becomes left-handed; the unit square is flipped over, as if viewed in a mirror. (A composition like rotate-then-reflect also lands here, since it reverses orientation overall.)
Geometric Intuition — The sign of the determinant is the handedness of the transformation. Hold up your right hand and look at it in a mirror — the reflection looks like a left hand. No amount of rotating your real right hand will ever make it match its mirror image; rotation ($\det = +1$) can never undo a reflection ($\det = -1$), because $+1 \ne -1$ and a product of rotations always has determinant $+1$. This is the mathematical reason a right glove never fits a left hand: rotations and reflections live in genuinely separate camps, partitioned by the sign of the determinant.
Common Pitfall — $\lvert\det(Q)\rvert = 1$ is necessary but not sufficient for orthogonality. The shear $\begin{psmallmatrix}1 & 5\\ 0 & 1\end{psmallmatrix}$ has determinant $1$, yet it is the opposite of rigid — it badly distorts the unit square (you saw it slant in Chapter 1) and its columns are not orthonormal, so it is not orthogonal. Determinant $\pm 1$ tells you area is preserved; it does not tell you lengths and angles are preserved. Always check the actual condition $Q^{\mathsf{T}}Q = I$.
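The gap between "determinant $\pm 1$" and "orthogonal" is easy to see side by side. In the sketch below the 25° rotation and the mirror across the $x$-axis are illustrative choices; the shear is the one from the pitfall above.

```python
import numpy as np

theta = np.deg2rad(25)
rot = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta),  np.cos(theta)]])
refl = np.array([[1.0,  0.0],
                 [0.0, -1.0]])      # mirror across the x-axis
shear = np.array([[1.0, 5.0],
                  [0.0, 1.0]])

for name, M in [("rotation", rot), ("reflection", refl), ("shear", shear)]:
    det = round(float(np.linalg.det(M)), 6)
    orthogonal = bool(np.allclose(M.T @ M, np.eye(2)))   # the actual test
    print(f"{name:10s} det = {det:+.1f}  orthogonal: {orthogonal}")
```

The shear shares determinant $+1$ with the rotation yet fails the $M^{\mathsf{T}}M = I$ test, so the determinant alone cannot certify orthogonality.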
21.6 What does a rotation matrix look like in 2D?
Now the concrete machinery, starting in the plane. We derived the 2D rotation matrix back in Chapter 7 by asking where the basis vectors land: rotating by an angle $\theta$ counterclockwise sends $\mathbf{e}_1 = (1,0)$ to $(\cos\theta, \sin\theta)$ and $\mathbf{e}_2 = (0,1)$ to $(-\sin\theta, \cos\theta)$. Stacking those images as columns gives $$Q = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}.$$ Let us now verify, with this chapter's tools, that this really is an orthogonal matrix — that the rotation we drew by hand satisfies $Q^{\mathsf{T}}Q = I$.
The columns are $\mathbf{q}_1 = (\cos\theta, \sin\theta)$ and $\mathbf{q}_2 = (-\sin\theta, \cos\theta)$. Check the three dot products that make up $Q^{\mathsf{T}}Q$: $$\mathbf{q}_1\cdot\mathbf{q}_1 = \cos^2\theta + \sin^2\theta = 1, \quad \mathbf{q}_2\cdot\mathbf{q}_2 = \sin^2\theta + \cos^2\theta = 1,$$ $$\mathbf{q}_1\cdot\mathbf{q}_2 = -\cos\theta\sin\theta + \sin\theta\cos\theta = 0.$$ Both columns are unit length and they are perpendicular — orthonormal — so $Q^{\mathsf{T}}Q = I$. The Pythagorean identity $\cos^2\theta + \sin^2\theta = 1$ is exactly the statement that the columns are unit vectors; trigonometry's most famous identity is the orthonormality of the rotation matrix. And the determinant is $$\det(Q) = \cos^2\theta - (-\sin^2\theta) = \cos^2\theta + \sin^2\theta = 1,$$ confirming $\det = +1$, the signature of a rotation that preserves orientation. Every plane rotation is orthogonal, and the algebra agrees with the picture.
21.6.1 The 2D reflection matrix, and a complete classification
The rotation is the $\det = +1$ orthogonal matrix; what does the $\det = -1$ one look like? Reflecting the plane across the line through the origin that makes angle $\varphi$ with the $x$-axis has its own clean formula: $$F_\varphi = \begin{bmatrix} \cos 2\varphi & \sin 2\varphi \\ \sin 2\varphi & -\cos 2\varphi \end{bmatrix}.$$ The doubled angle $2\varphi$ is the famous fingerprint of a reflection: a vector at angle $\alpha$ is sent to angle $2\varphi - \alpha$ (its mirror image across the $\varphi$-line), so the sum of the input and output angles, $\alpha + (2\varphi - \alpha) = 2\varphi$, depends only on the mirror angle $\varphi$, not on $\alpha$. You can read off three quick checks: the columns are orthonormal (using $\cos^2 2\varphi + \sin^2 2\varphi = 1$), so $F_\varphi^{\mathsf{T}}F_\varphi = I$; the determinant is $-\cos^2 2\varphi - \sin^2 2\varphi = -1$; and $F_\varphi^2 = I$, because reflecting twice across the same line returns every point home. The mirror line itself is fixed (any vector pointing along it is sent to itself), which is the geometric reason a reflection always has $+1$ as an eigenvalue (the fixed direction) and $-1$ as the other (the flipped, perpendicular direction). We will name those fixed and flipped directions eigenvectors in Chapter 23; for now, notice that a reflection wears its invariant directions on its sleeve.
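The checks on $F_\varphi$ can be run directly. The sketch below assumes a mirror line at $\varphi = 30°$ (an arbitrary choice), and the helper name `reflection` is illustrative.

```python
import numpy as np

def reflection(phi):
    """Reflection across the line through the origin at angle phi (radians)."""
    c, s = np.cos(2 * phi), np.sin(2 * phi)
    return np.array([[c,  s],
                     [s, -c]])

phi = np.deg2rad(30)
F = reflection(phi)

along = np.array([np.cos(phi), np.sin(phi)])     # points along the mirror line
across = np.array([-np.sin(phi), np.cos(phi)])   # perpendicular to the mirror

print(np.allclose(F.T @ F, np.eye(2)))     # orthogonal
print(np.isclose(np.linalg.det(F), -1.0))  # orientation-reversing
print(np.allclose(F @ F, np.eye(2)))       # reflecting twice is the identity
print(np.allclose(F @ along, along))       # the mirror line is fixed
print(np.allclose(F @ across, -across))    # the perpendicular direction flips
```

All five checks print `True`, matching the orthogonality, determinant, involution, and fixed-line claims in the paragraph above.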
These two forms are not just examples — they are the whole story in 2D. Every $2\times 2$ orthogonal matrix is one of exactly these two shapes: $$\underbrace{\begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}}_{\det = +1,\ \text{rotation by }\theta} \qquad\text{or}\qquad \underbrace{\begin{bmatrix} \cos 2\varphi & \phantom{-}\sin 2\varphi \\ \sin 2\varphi & -\cos 2\varphi \end{bmatrix}}_{\det = -1,\ \text{reflection across the }\varphi\text{-line}}.$$ Here is the short argument, and it is a satisfying one. Let $Q = \begin{psmallmatrix}a & b\\ c & d\end{psmallmatrix}$ be orthogonal. Its first column $(a, c)$ is a unit vector, so it lies on the unit circle and we may write $a = \cos\theta$, $c = \sin\theta$ for some angle $\theta$. The second column $(b, d)$ is a unit vector perpendicular to the first, and in the plane there are only two unit vectors perpendicular to a given one — the two ways to turn 90°. One choice, $(b,d) = (-\sin\theta, \cos\theta)$, is the 90°-counterclockwise turn and gives the rotation (with $\det = +1$); the other, $(b,d) = (\sin\theta, -\cos\theta)$, is the clockwise turn and gives the reflection (with $\det = -1$, and matching $F_\varphi$ when $2\varphi = \theta$). There is no third option, because the perpendicularity and unit-length conditions leave exactly two solutions. So in two dimensions, "orthogonal" means precisely "rotation or reflection" — the determinant's sign is the only thing that distinguishes them, and we have now proven the classification we promised back in §21.1.1.
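The classification argument translates almost line by line into code. The function below is a hypothetical helper written for this illustration (its name, tolerance, and return format are assumptions): it reads $\theta$ off the first column, then uses the sign of the determinant to decide between the two possible second columns.

```python
import numpy as np

def classify_2x2(Q, tol=1e-10):
    """Label a 2x2 orthogonal matrix as a rotation or a reflection.

    Returns ("rotation", theta) or ("reflection", phi), angles in degrees.
    Raises ValueError if Q is not a 2x2 orthogonal matrix.
    """
    Q = np.asarray(Q, dtype=float)
    if Q.shape != (2, 2) or not np.allclose(Q.T @ Q, np.eye(2), atol=tol):
        raise ValueError("Q is not a 2x2 orthogonal matrix")
    # First column is (cos theta, sin theta), so theta comes from arctan2.
    theta = float(np.degrees(np.arctan2(Q[1, 0], Q[0, 0])))
    if np.linalg.det(Q) > 0:
        return "rotation", theta
    return "reflection", theta / 2      # mirror angle phi satisfies 2*phi = theta

print(classify_2x2([[0.6, -0.8], [0.8, 0.6]]))   # a rotation
print(classify_2x2([[0.0, 1.0], [1.0, 0.0]]))    # the swap of x and y
```

The second matrix swaps the two coordinates, which is the reflection across the line $y = x$, so the mirror angle comes out as 45°.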
Geometric Intuition — A rotation by an angle that is not a multiple of 180° has no fixed direction in the plane (spin the plane and every nonzero arrow turns to point somewhere new), while a reflection has a whole line of fixed vectors: the mirror itself. That difference is exactly why such a rotation's eigenvalues are complex (no real direction is preserved, §21.13) while a reflection's are the real pair $+1$ and $-1$. The determinant sees only the product of the eigenvalues: $(+1)(-1) = -1$ for the reflection, and (as we will compute) $e^{i\theta}e^{-i\theta} = 1$ for the rotation.
21.6.2 Hand computation: rotating a vector by 30°
Let us rotate the vector $\mathbf{v} = (3, 4)$ — chosen because its length is the friendly $\lVert\mathbf{v}\rVert = \sqrt{9+16} = 5$ — by $\theta = 30°$, and confirm the length survives. With $\cos 30° = \tfrac{\sqrt 3}{2} \approx 0.866$ and $\sin 30° = \tfrac12 = 0.5$, $$Q = \begin{bmatrix} 0.866 & -0.5 \\ 0.5 & 0.866 \end{bmatrix}, \qquad Q\mathbf{v} = \begin{bmatrix} 0.866\cdot 3 - 0.5\cdot 4 \\ 0.5\cdot 3 + 0.866\cdot 4 \end{bmatrix} = \begin{bmatrix} 0.598 \\ 4.964 \end{bmatrix}.$$ The image points in a new direction, as a rotation should. Its length: $$\lVert Q\mathbf{v}\rVert = \sqrt{0.598^2 + 4.964^2} = \sqrt{0.358 + 24.642} = \sqrt{25.000} = 5.$$ Exactly 5 — the length is untouched, just as the proof in §21.3 promised. The arrow swung to a different angle but kept its length to the last digit.
Let us do one more by hand, to feel the pattern. Rotate $\mathbf{w} = (1, 1)$ by $\theta = 45°$. With $\cos 45° = \sin 45° = \tfrac{1}{\sqrt 2} \approx 0.7071$, $$Q\mathbf{w} = \begin{bmatrix} 0.7071 & -0.7071 \\ 0.7071 & 0.7071 \end{bmatrix}\begin{bmatrix} 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 0.7071 - 0.7071 \\ 0.7071 + 0.7071 \end{bmatrix} = \begin{bmatrix} 0 \\ 1.4142 \end{bmatrix}.$$ The vector $(1,1)$ pointed at 45°; rotating it by another 45° lands it at 90°, straight up the $y$-axis — exactly $(0, \sqrt 2)$. And its length is unchanged: $\lVert\mathbf{w}\rVert = \sqrt 2 \approx 1.4142$ before, and $\lVert Q\mathbf{w}\rVert = \sqrt{0^2 + 1.4142^2} = 1.4142$ after. Rotation moves the arrow's direction (here, from 45° to 90°) while leaving its length alone — the two halves of what a rigid spin does.
21.6.3 numpy verification
Now the computational confirmation. (Recall the indexing convention from earlier chapters: the math writes $\mathbf{v} = (v_1, v_2)$ one-indexed, while numpy's v[0], v[1] are zero-indexed — same numbers, shifted labels.)
```python
# Rotation by 30 degrees is orthogonal, det = +1, and preserves length.
import numpy as np

theta = np.deg2rad(30)
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
v = np.array([3.0, 4.0])

print("Q^T Q =\n", np.round(Q.T @ Q, 10))  # the orthogonality test
print("det(Q) =", round(np.linalg.det(Q), 6))
print("||v|| =", np.linalg.norm(v))
print("Q v =", np.round(Q @ v, 4))
print("||Qv|| =", round(np.linalg.norm(Q @ v), 6))
```

```text
Q^T Q =
 [[ 1. 0.]
 [ 0. 1.]]
det(Q) = 1.0
||v|| = 5.0
Q v = [0.5981 4.9641]
||Qv|| = 5.0
```
The matrix passes the orthogonality test ($Q^{\mathsf{T}}Q = I$), its determinant is $+1$ to the six decimal places printed, and the rotated vector has the same length, $5$, as the original. The hand computation ($0.598$, $4.964$) and numpy ($0.5981$, $4.9641$) agree to every digit the hand computation kept.
Computational Note — In exact arithmetic $Q^{\mathsf{T}}Q$ is precisely $I$, but with floating-point rounding you will often see entries like 1.0000000000000002 on the diagonal or -2.2e-17 off the diagonal instead of clean 1s and 0s. That is why a software orthogonality check must compare against the identity within a tolerance (np.allclose), never with exact equality (==). One genuinely lovely numerical fact about orthogonal matrices is that they do not amplify floating-point error: because they preserve length, they have condition number exactly 1 (Chapter 38), making them the most numerically stable transformations there are. This is the deep reason QR (Chapter 20) is preferred over the normal equations for least squares.
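The difference between exact and tolerance-based comparison can be observed with a matrix that has accumulated some rounding. The sketch below is one possible setup, not a prescribed test: it composes 1000 copies of a small rotation (angle 0.1 rad, an arbitrary choice) and then inspects $Q^{\mathsf{T}}Q$ and the 2-norm condition number.

```python
import numpy as np

theta = 0.1
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
Q = np.linalg.matrix_power(R, 1000)     # 1000 small rotations composed

G = Q.T @ Q
residue = np.max(np.abs(G - np.eye(2)))

print("exact equality :", np.array_equal(G, np.eye(2)))  # fragile; can be False
print("np.allclose    :", np.allclose(G, np.eye(2)))     # the robust test
print("largest residue:", residue)                       # rounding-level number
print("cond(Q)        :", np.linalg.cond(Q))             # very close to 1
```

Whatever the exact-equality line prints on a given machine, the tolerance-based test and the condition number tell the stable story: the matrix is orthogonal up to rounding.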
21.7 What does a rotation look like in 3D, and how do reflections fit in?
In three dimensions a rotation needs an axis — a line that stays fixed while everything around it spins. The simplest cases rotate about a coordinate axis. A rotation by angle $\theta$ about the $z$-axis leaves the third coordinate alone and rotates the first two exactly as in the plane: $$R_z(\theta) = \begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix}.$$ The top-left $2\times 2$ block is precisely the plane rotation from §21.6, and the lone $1$ in the corner pins the $z$-axis in place. The same template, with the block moved to a different pair of rows and columns, rotates about the $x$- or $y$-axis. Each such matrix is orthogonal with determinant $+1$ — a member of $\mathrm{SO}(3)$, the rotation group that controls every orientation in 3D graphics, robotics, and aerospace.
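The three coordinate-axis rotations can be written as small functions. The sketch below uses the common right-handed sign convention; that convention, and the names `rot_x`, `rot_y`, `rot_z`, are assumptions of this example. Note that in this convention the sines in the $y$-axis matrix sit in the opposite corners from the other two.

```python
import numpy as np

def rot_x(t):
    """Rotation by t radians about the x-axis (right-handed convention)."""
    c, s = np.cos(t), np.sin(t)
    return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])

def rot_y(t):
    """Rotation by t radians about the y-axis (right-handed convention)."""
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])

def rot_z(t):
    """Rotation by t radians about the z-axis, as in R_z(theta) above."""
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

e1, e2, e3 = np.eye(3)   # rows of the identity are the standard basis vectors

for name, R, axis in [("x", rot_x(0.4), e1), ("y", rot_y(0.4), e2), ("z", rot_z(0.4), e3)]:
    print(name,
          np.allclose(R.T @ R, np.eye(3)),      # orthogonal
          np.isclose(np.linalg.det(R), 1.0),    # determinant +1
          np.allclose(R @ axis, axis))          # the axis stays fixed
```

Each matrix passes all three checks: it is orthogonal, has determinant $+1$, and leaves its own axis in place.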
Why does it take three numbers to specify a general 3D rotation, when a 2D rotation needs only one (the angle $\theta$)? Because in three dimensions you must first choose the axis — a direction in space, which costs two numbers (think latitude and longitude on a sphere) — and then choose the angle of spin about that axis, a third number. Those three numbers can be packaged as Euler angles (rotate about $z$, then $y$, then $x$), as an axis-plus-angle pair, or as a unit quaternion; all three are just coordinates on $\mathrm{SO}(3)$, the curved three-dimensional space of all 3D rotations. The matrix viewpoint we use here — a $3\times 3$ orthogonal matrix with $\det = +1$ — has nine entries, but the six orthonormality conditions ($Q^{\mathsf{T}}Q = I$ gives three "unit length" and three "perpendicular" equations) cut those nine numbers down to $9 - 6 = 3$ degrees of freedom, matching the count exactly. The bookkeeping is reassuring: the algebra and the geometry agree on how much information a rotation carries.
Let us watch $R_z(90°)$ act, by hand, on the standard basis. At $\theta = 90°$, $\cos\theta = 0$ and $\sin\theta = 1$, so $$R_z(90°) = \begin{bmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \qquad R_z(90°)\,\mathbf{e}_1 = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} = \mathbf{e}_2, \qquad R_z(90°)\,\mathbf{e}_3 = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} = \mathbf{e}_3.$$ The $x$-axis swings to the $y$-axis (a quarter-turn counterclockwise looking down from above), while the $z$-axis — the axis of rotation — stays put, fixed exactly as an axis should be. The columns of $R_z(90°)$ are still three mutually perpendicular unit vectors (just the basis vectors, shuffled and sign-flipped), so $Q^{\mathsf{T}}Q = I$ holds, and $\det = +1$. This is the recurring lesson in three dimensions: an orthogonal matrix carries an orthonormal basis to another orthonormal basis — it relabels the axes of space without bending them.
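The hand computation above can be mirrored numerically. The following is a minimal sketch; the helper name rot_z is an illustrative choice and not part of the chapter's toolkit.

```python
import numpy as np

def rot_z(theta):
    """Rotation by theta radians about the z-axis (the R_z pattern above)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0],
                     [s,  c, 0.0],
                     [0.0, 0.0, 1.0]])

Rz = rot_z(np.pi / 2)          # quarter turn about z
e1, e2, e3 = np.eye(3)         # standard basis vectors (rows of the identity)

print("Rz e1 =", np.round(Rz @ e1, 10))   # should coincide with e2
print("Rz e3 =", np.round(Rz @ e3, 10))   # the rotation axis stays fixed
print("orthogonal?", np.allclose(Rz.T @ Rz, np.eye(3)))
print("det =", round(float(np.linalg.det(Rz)), 6))
```

Any other angle can be passed to rot_z; the orthogonality and determinant checks should succeed for all of them, while only the axis $\mathbf{e}_3$ is left unmoved.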
Real-World Application — Spacecraft and robot orientation. The attitude of a satellite, a drone, or a robot arm's end-effector is stored as a $3\times 3$ rotation matrix in $\mathrm{SO}(3)$, or equivalently as a unit quaternion (a four-number encoding of the same rotation). When a flight computer chains "rotate about $z$, then about $y$, then about $x$" to express a maneuver in Euler angles, it is multiplying three orthogonal matrices, and the product is again orthogonal — so the result is still a genuine, distortion-free rotation. The hard part in practice is keeping it orthogonal: numerical drift slowly corrupts a stored rotation matrix, and engineers periodically "re-orthonormalize" it (often with a Gram–Schmidt or QR pass from Chapter 20) to scrub out the error before it accumulates.
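The re-orthonormalization step described above can be sketched in a few lines. This is an illustration under stated assumptions, not a flight-software recipe: the drift is simulated with small random noise, and the helper names reorthonormalize and orthogonality_error are hypothetical.

```python
import numpy as np

def rot_z(theta):
    """Rotation by theta radians about the z-axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def reorthonormalize(M):
    """Hypothetical helper: replace M by the Q factor of its QR factorization."""
    Q, R = np.linalg.qr(M)
    signs = np.sign(np.diag(R))
    signs[signs == 0] = 1.0      # guard against an exactly zero pivot
    return Q * signs             # flip columns so the matching R has a positive diagonal

def orthogonality_error(M):
    """Frobenius norm of M^T M - I; zero for an exactly orthogonal M."""
    return np.linalg.norm(M.T @ M - np.eye(M.shape[0]))

rng = np.random.default_rng(0)
R_true = rot_z(np.deg2rad(40))
drifted = R_true + 1e-3 * rng.standard_normal((3, 3))   # artificial noise standing in for drift
fixed = reorthonormalize(drifted)

print("error before:", orthogonality_error(drifted))
print("error after: ", orthogonality_error(fixed))
print("det after:   ", round(float(np.linalg.det(fixed)), 6))
```

The sign flip matters: without it the Q factor can come back with negated columns, which is still orthogonal but no longer close to the stored rotation.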
21.7.1 Reflections and the Householder matrix
Reflections are the other half of the orthogonal family. The most elegant way to build one in any dimension is the Householder reflection. Pick a unit vector $\mathbf{u}$ (the normal to the mirror), and reflect every vector across the hyperplane perpendicular to $\mathbf{u}$ using $$H = I - 2\,\mathbf{u}\mathbf{u}^{\mathsf{T}}, \qquad \lVert\mathbf{u}\rVert = 1.$$ The geometry is exactly the projection picture from Chapter 19 run twice: $\mathbf{u}\mathbf{u}^{\mathsf{T}}\mathbf{x}$ is the component of $\mathbf{x}$ along $\mathbf{u}$, and subtracting twice that component flips $\mathbf{x}$ to the far side of the mirror. Vectors lying in the mirror (perpendicular to $\mathbf{u}$) are untouched; the normal $\mathbf{u}$ itself is sent to $-\mathbf{u}$. Two facts make $H$ special: it is symmetric ($H^{\mathsf{T}} = H$) and it is its own inverse ($H^2 = I$; reflecting twice returns you home), and together those give $H^{\mathsf{T}}H = H^2 = I$, so a Householder matrix is orthogonal. Its determinant is $-1$: in an orthonormal basis whose first vector is $\mathbf{u}$, the map negates that one coordinate and leaves the other $n-1$ alone, so it acts as $\operatorname{diag}(-1, 1, \dots, 1)$ and $\det(H) = -1$.
Let us see one concretely in 2D. Take $\mathbf{u} = \tfrac{1}{\sqrt 2}(1,1)$, the unit normal to the line $y = -x$. Then $$\mathbf{u}\mathbf{u}^{\mathsf{T}} = \tfrac12\begin{bmatrix}1 & 1\\ 1 & 1\end{bmatrix}, \qquad H = I - 2\,\mathbf{u}\mathbf{u}^{\mathsf{T}} = \begin{bmatrix}1 & 0\\ 0 & 1\end{bmatrix} - \begin{bmatrix}1 & 1\\ 1 & 1\end{bmatrix} = \begin{bmatrix}0 & -1\\ -1 & 0\end{bmatrix}.$$ This $H$ swaps-and-negates: it sends $(x, y)\mapsto(-y, -x)$, the reflection across the line $y=-x$. Check it: $\det(H) = (0)(0) - (-1)(-1) = -1$ (a reflection), and $H^{\mathsf{T}}H = \begin{psmallmatrix}0 & -1\\ -1 & 0\end{psmallmatrix}\begin{psmallmatrix}0 & -1\\ -1 & 0\end{psmallmatrix} = \begin{psmallmatrix}1 & 0\\ 0 & 1\end{psmallmatrix} = I$, orthogonal as promised.
# A Householder reflection H = I - 2 u u^T is orthogonal with det = -1.
import numpy as np
u = np.array([1.0, 1.0]); u = u / np.linalg.norm(u) # unit normal to the mirror
H = np.eye(2) - 2 * np.outer(u, u)
print("H =\n", np.round(H, 4))
print("det(H) =", round(np.linalg.det(H), 6))
print("H^T H =\n", np.round(H.T @ H, 10))
print("H u =", np.round(H @ u, 4), " (the normal flips to -u)")
H =
[[ 0. -1.]
[-1. 0.]]
det(H) = -1.0
H^T H =
[[ 1. -0.]
[-0. 1.]]
H u = [-0.7071 -0.7071] (the normal flips to -u)
Determinant $-1$, $H^{\mathsf{T}}H = I$, and the normal vector $\mathbf{u}$ is sent to $-\mathbf{u}$ — every prediction confirmed. Householder reflections are not just a curiosity: they are the workhorse inside the most stable algorithm for computing the QR factorization of Chapter 20, because reflecting is a numerically gentle way to zero out a column below the diagonal.
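To make the "zero out a column" remark concrete, here is a minimal sketch of a single Householder step on one made-up vector. The sign convention for alpha is the usual one for avoiding cancellation; the variable names are illustrative, and this is not presented as the Chapter 20 implementation.

```python
import numpy as np

x = np.array([3.0, 4.0, 12.0])                  # column to be reduced; its length is 13
alpha = -np.copysign(np.linalg.norm(x), x[0])   # target is alpha * e1, sign opposite to x[0]
v = x.copy()
v[0] -= alpha                                   # v = x - alpha * e1, the mirror normal (unnormalized)
u = v / np.linalg.norm(v)
H = np.eye(3) - 2.0 * np.outer(u, u)

print("H x =", np.round(H @ x, 10))             # only the first entry survives
print("orthogonal?", np.allclose(H.T @ H, np.eye(3)))
print("det(H) =", round(float(np.linalg.det(H)), 6))
```

The reflected vector has the same length as x, as any orthogonal map requires, but all of that length now sits in the first coordinate.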
Historical Note — The reflection $H = I - 2\mathbf{u}\mathbf{u}^{\mathsf{T}}$ is named for Alston Scott Householder, who introduced it for numerical computation in a 1958 paper. It became the backbone of stable QR factorization and remains a standard tool in numerical linear algebra libraries today. The broader study of orthogonal transformations as a group traces to the nineteenth-century work of figures such as Camille Jordan and, later, the Lie-theoretic viewpoint, though the specific attributions are easy to garble.
21.8 The visualizer returns: a rotation versus a reflection
It is time to bring back the recurring 2D transformation visualizer from Chapter 1 — the same tool, unchanged, that we have used to see shears, scalings, and general maps. For orthogonal matrices it tells a clean story: the unit square keeps its shape and its area no matter what. A rotation spins it; a reflection flips it. The determinant in the title says which, and it is always $\pm 1$.
# toolkit/visualizer.py — the recurring 2D transformation visualizer.
# Shows what a 2x2 matrix A does to the unit square and the basis vectors.
import numpy as np
import matplotlib.pyplot as plt
def visualize_2d(A, title="", ax=None):
    """Plot the action of 2x2 matrix A on the unit square and i-hat, j-hat."""
    A = np.asarray(A, dtype=float)
    square = np.array([[0, 1, 1, 0, 0],
                       [0, 0, 1, 1, 0]])  # unit-square corners (closed)
    out = A @ square  # transformed square
    e1, e2 = A @ np.array([1, 0]), A @ np.array([0, 1])  # images of basis vectors
    if ax is None:
        _, ax = plt.subplots(figsize=(5, 5))
    ax.plot(square[0], square[1], "b--", lw=1, label="input (unit square)")
    ax.fill(out[0], out[1], alpha=0.25, color="C1")
    ax.plot(out[0], out[1], "C1-", lw=2, label="A · (unit square)")
    ax.arrow(0, 0, *e1, color="C3", width=0.02, length_includes_head=True)  # A e1
    ax.arrow(0, 0, *e2, color="C2", width=0.02, length_includes_head=True)  # A e2
    ax.axhline(0, color="gray", lw=0.5)
    ax.axvline(0, color="gray", lw=0.5)
    ax.set_aspect("equal")
    ax.grid(True, alpha=0.3)
    ax.set_title(title or f"det = {np.linalg.det(A):.2f}")
    ax.legend(loc="best", fontsize=8)
    return ax
# Example: a horizontal shear
# visualize_2d([[1, 1], [0, 1]], title="Shear")
# plt.show()
And now the experiment that makes this chapter visible. We place a rotation and a reflection side by side:
# Rotation (det = +1) preserves shape AND orientation; reflection (det = -1) flips it.
import numpy as np, matplotlib.pyplot as plt
from visualizer import visualize_2d
theta = np.deg2rad(30)
Rot = np.array([[np.cos(theta), -np.sin(theta)],
[np.sin(theta), np.cos(theta)]]) # det = +1
Ref = np.array([[np.cos(2*theta), np.sin(2*theta)],
[np.sin(2*theta), -np.cos(2*theta)]]) # reflect across 30°-line, det = -1
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 5))
visualize_2d(Rot, title=f"Rotation 30° (det = {np.linalg.det(Rot):.0f})", ax=ax1)
visualize_2d(Ref, title=f"Reflection (det = {np.linalg.det(Ref):.0f})", ax=ax2)
plt.tight_layout(); plt.show()
Figure 21.1. Two orthogonal transformations of the unit square. On the left, a 30° rotation: the dashed input square (area 1) is carried to the solid orange square (still area 1), spun counterclockwise; the basis-vector arrows $\mathbf{e}_1$ (red) and $\mathbf{e}_2$ (green) keep their right-angle and unit lengths, and they retain their counterclockwise (red-then-green) order — orientation is preserved, $\det = +1$. On the right, a reflection across the 30° line: the square is the same shape and area, but it has been flipped, and the order of the arrows has reversed — orientation is reversed, $\det = -1$. Alt-text: side-by-side plots; left shows a unit square rotated 30° counterclockwise with two perpendicular arrows; right shows the same square mirror-flipped across a line, with the arrows' rotational order swapped.
The picture makes the algebra unforgettable. In both panels the orange square is congruent to the dashed one — same side lengths, same right angles, same area 1 — because both matrices are orthogonal and orthogonal means rigid. The only difference is handedness: the rotation keeps the arrows in their original counterclockwise order, while the reflection swaps it. That single visual difference is the entire content of $\det = +1$ versus $\det = -1$.
Geometric Intuition — Watch the area in the visualizer. For any orthogonal matrix the orange region has area exactly 1, because $\lvert\det\rvert = 1$ means the area-scaling factor is 1 (Chapter 11). Contrast this with the shear and scaling experiments from Chapters 1, 8, and 11, where the orange region grew, shrank, or slanted. Orthogonal transformations are the ones that move the square around the plane without ever resizing or distorting it — the rigid motions made visible.
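The area claim can be checked without plotting. The sketch below measures the transformed unit square with the shoelace formula; the helper polygon_area and the particular shear and scaling matrices are illustrative choices, not part of the visualizer.

```python
import numpy as np

def polygon_area(P):
    """Unsigned area of a polygon whose vertices are the columns of P (shoelace formula)."""
    x, y = P
    return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))

square = np.array([[0.0, 1.0, 1.0, 0.0],
                   [0.0, 0.0, 1.0, 1.0]])      # unit-square corners, counterclockwise

t = np.deg2rad(30)
maps = {
    "rotation":   np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]]),
    "reflection": np.array([[np.cos(2*t), np.sin(2*t)], [np.sin(2*t), -np.cos(2*t)]]),
    "shear":      np.array([[1.0, 1.0], [0.0, 1.0]]),
    "scaling":    np.array([[2.0, 0.0], [0.0, 1.5]]),
}

results = {}
for name, A in maps.items():
    area = polygon_area(A @ square)                 # area of the image of the square
    rigid = np.allclose(A.T @ A, np.eye(2))         # orthogonality test
    results[name] = (area, rigid)
    print(f"{name:10s} area = {area:.4f}  orthogonal? {rigid}")
```

The shear is a useful caution: it has determinant 1 and keeps the area at 1, yet it fails the orthogonality test because it slants the square. Area 1 is necessary for rigidity, not sufficient.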
21.9 Why is a product of orthogonal matrices orthogonal? (the group structure)
Suppose you rotate the plane and then reflect it, or chain three rotations in a row. Is the combined transformation still rigid? Geometrically it has to be — a sequence of distance-preserving moves cannot suddenly start changing distances. Let us confirm it algebraically, lightly, because the structure it reveals is important.
Claim. If $Q_1$ and $Q_2$ are orthogonal $n\times n$ matrices, then so is their product $Q_1 Q_2$.
Proof. Test the product against the definition, using the reverse-order transpose rule $(AB)^{\mathsf{T}} = B^{\mathsf{T}}A^{\mathsf{T}}$ from Chapter 8: $$(Q_1 Q_2)^{\mathsf{T}}(Q_1 Q_2) = Q_2^{\mathsf{T}}Q_1^{\mathsf{T}}Q_1 Q_2 = Q_2^{\mathsf{T}}(Q_1^{\mathsf{T}}Q_1)Q_2 = Q_2^{\mathsf{T}}\,I\,Q_2 = Q_2^{\mathsf{T}}Q_2 = I.$$ So $Q_1 Q_2$ satisfies $Q^{\mathsf{T}}Q = I$ and is orthogonal. $\blacksquare$
Three short facts complete the picture, and together they say the orthogonal matrices form a group under multiplication:
- Closure. A product of orthogonal matrices is orthogonal (just proved).
- Identity. The identity $I$ is orthogonal, since $I^{\mathsf{T}}I = I$.
- Inverses. If $Q$ is orthogonal, so is $Q^{-1} = Q^{\mathsf{T}}$, because $(Q^{\mathsf{T}})^{\mathsf{T}}Q^{\mathsf{T}} = Q Q^{\mathsf{T}} = I$. Undoing a rigid motion is itself a rigid motion.
This group is called $\mathrm{O}(n)$, the orthogonal group. The rotations alone — the determinant-$+1$ members — form the subgroup $\mathrm{SO}(n)$, the special orthogonal group. Note that determinants multiply: a rotation times a rotation has determinant $(+1)(+1) = +1$ (still a rotation), a rotation times a reflection has $(+1)(-1) = -1$ (a reflection), and a reflection times a reflection has $(-1)(-1) = +1$ — two flips make a rotation, which you can confirm in front of any mirror.
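The three group facts and the determinant rule can be spot-checked beyond the plane. This is a minimal sketch that assumes random orthogonal matrices are obtained as the Q factor of a random Gaussian matrix; random_orthogonal is a hypothetical helper name.

```python
import numpy as np

rng = np.random.default_rng(1)

def random_orthogonal(n):
    """Hypothetical helper: Q factor of a random n x n Gaussian matrix."""
    Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    return Q

Q1, Q2 = random_orthogonal(4), random_orthogonal(4)
prod = Q1 @ Q2
I4 = np.eye(4)

print("closure:  product orthogonal?", np.allclose(prod.T @ prod, I4))
print("identity: I orthogonal?      ", np.allclose(I4.T @ I4, I4))
print("inverse:  inv(Q1) == Q1^T?   ", np.allclose(np.linalg.inv(Q1), Q1.T))

d1, d2, dp = (float(np.linalg.det(M)) for M in (Q1, Q2, prod))
print("det(Q1) * det(Q2) =", round(d1 * d2, 6), "  det(Q1 Q2) =", round(dp, 6))
```

Each determinant comes out as $+1$ or $-1$ up to rounding, and the product's determinant is the product of the two.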
Math-Major Sidebar — $\mathrm{SO}(n)$ is more than an algebraic group; it is a smooth manifold — a Lie group — and rotations can be generated continuously from the identity. Differentiating a path of rotations at $t=0$ produces a skew-symmetric matrix ($A^{\mathsf{T}} = -A$), which is why the matrix exponential $e^{At}$ of a skew-symmetric $A$ is always a rotation. This is the bridge from this chapter to Chapter 37 (the matrix exponential and systems of ODEs) and to the angular-velocity formulas of classical mechanics. The reflections, by contrast, are disconnected from the identity: you cannot reach $\det = -1$ from $\det = +1$ along a continuous path of orthogonal matrices, because the determinant cannot jump from $+1$ to $-1$ without passing through forbidden values. $\mathrm{O}(n)$ has two pieces; $\mathrm{SO}(n)$ is the piece containing $I$.
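The sidebar's claim that $e^{At}$ is a rotation whenever $A$ is skew-symmetric can be tested numerically. The sketch below uses a truncated power series for the exponential, which is adequate for the small matrices here but is not a production routine; expm_taylor and the sample values of omega and t are illustrative.

```python
import numpy as np

def expm_taylor(A, terms=30):
    """Truncated power series I + A + A^2/2! + ... for e^A (illustrative helper)."""
    result = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for k in range(1, terms):
        term = term @ A / k          # term now holds A^k / k!
        result = result + term
    return result

# 2D: A = omega * [[0, -1], [1, 0]] should exponentiate to rotation by omega * t.
omega, t = 0.7, 1.5
A = np.array([[0.0, -omega], [omega, 0.0]])
Q2d = expm_taylor(A * t)
angle = omega * t
R_expected = np.array([[np.cos(angle), -np.sin(angle)],
                       [np.sin(angle),  np.cos(angle)]])
print("2D matches rotation by omega*t?", np.allclose(Q2d, R_expected))

# 3D: any skew-symmetric B = M - M^T should give an orthogonal matrix with det +1.
rng = np.random.default_rng(2)
M = 0.5 * rng.standard_normal((3, 3))
B = M - M.T
Q3 = expm_taylor(B)
print("3D orthogonal?", np.allclose(Q3.T @ Q3, np.eye(3)))
print("3D det =", round(float(np.linalg.det(Q3)), 6))
```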
# Composition of two rotations is again a rotation: R(30°)·R(45°) = R(75°).
import numpy as np
def R(deg):
    a = np.deg2rad(deg)
    return np.array([[np.cos(a), -np.sin(a)], [np.sin(a), np.cos(a)]])
prod = R(30) @ R(45)
print("R(30)·R(45) =\n", np.round(prod, 4))
print("R(75) =\n", np.round(R(75), 4))
print("det(product) =", round(np.linalg.det(prod), 6))
print("orthogonal? ", np.allclose(prod.T @ prod, np.eye(2)))
R(30)·R(45) =
[[ 0.2588 -0.9659]
[ 0.9659 0.2588]]
R(75) =
[[ 0.2588 -0.9659]
[ 0.9659 0.2588]]
det(product) = 1.0
orthogonal? True
Composing a 30° and a 45° rotation produces exactly the 75° rotation — angles simply add — and the product is orthogonal with determinant $+1$, confirming closure inside $\mathrm{SO}(2)$.
Check Your Understanding — You reflect the plane across the $x$-axis, then reflect across the $y$-axis. What single transformation results, and what is its determinant?
Answer
Reflecting across the $x$-axis is $\begin{psmallmatrix}1 & 0\\ 0 & -1\end{psmallmatrix}$ (det $-1$); across the $y$-axis is $\begin{psmallmatrix}-1 & 0\\ 0 & 1\end{psmallmatrix}$ (det $-1$). Their product is $\begin{psmallmatrix}-1 & 0\\ 0 & -1\end{psmallmatrix} = -I$, which is rotation by 180°, with determinant $(-1)(-1) = +1$. Two reflections compose to a rotation — exactly the "two flips make a rotation" rule, and the angle of rotation is twice the angle between the two mirror lines (here the axes are 90° apart, giving a 180° rotation).
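The "twice the angle between the mirrors" rule in this answer can be checked for mirrors at arbitrary angles. A minimal sketch follows; reflect and rotate are illustrative helpers, with reflect using the same reflection formula as Ref in §21.8, and the angles 20° and 65° are arbitrary.

```python
import numpy as np

def reflect(angle_deg):
    """Reflection across the line through the origin at the given angle."""
    a = np.deg2rad(2 * angle_deg)
    return np.array([[np.cos(a), np.sin(a)], [np.sin(a), -np.cos(a)]])

def rotate(angle_deg):
    """Counterclockwise rotation by the given angle."""
    a = np.deg2rad(angle_deg)
    return np.array([[np.cos(a), -np.sin(a)], [np.sin(a), np.cos(a)]])

Fx, Fy = reflect(0), reflect(90)        # mirrors along the x-axis and the y-axis
print("Fy Fx =\n", np.round(Fy @ Fx, 10))

a, b = 20, 65                            # two mirror lines, 45 degrees apart
combo = reflect(b) @ reflect(a)          # reflect across the a-line first, then the b-line
print("equals rotation by 2(b - a)?", np.allclose(combo, rotate(2 * (b - a))))
print("det =", round(float(np.linalg.det(combo)), 6))
```

Swapping the order of the two reflections gives the rotation by $-2(b-a)$ instead, a reminder that $\mathrm{O}(2)$ is not commutative.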
21.10 What is the complex analogue? Unitary matrices and the qubit
Everything so far has lived in real space $\mathbb{R}^n$, where the transpose $Q^{\mathsf{T}}$ is the right tool and orthogonal matrices are the rigid motions. But quantum mechanics — and signal processing, and large parts of pure mathematics — lives in complex space $\mathbb{C}^n$, where vectors have complex entries. There, the correct notion of length uses the conjugate transpose (also called the Hermitian transpose or adjoint), written $A^{*}$: you transpose and take the complex conjugate of every entry. The reason is that the squared length of a complex vector must be a non-negative real number, and only conjugation delivers that: $\lVert\mathbf{z}\rVert^2 = \mathbf{z}^{*}\mathbf{z} = \sum_i \lvert z_i\rvert^2 = \sum_i \bar z_i z_i \ge 0$. (If you used a plain transpose you could get a negative or complex "length," which is nonsense.)
With the conjugate transpose in hand, the complex analogue of an orthogonal matrix is immediate.
Definition (unitary matrix). A complex square matrix $U \in \mathbb{C}^{n\times n}$ is unitary if $$U^{*}U = I \quad\Longleftrightarrow\quad U^{-1} = U^{*}.$$ Its columns are orthonormal with respect to the complex inner product $\langle\mathbf{u},\mathbf{v}\rangle = \mathbf{u}^{*}\mathbf{v}$.
Every theorem of this chapter carries over verbatim with $\mathsf{T}\to{*}$. The same three-line proof as §21.3 shows a unitary matrix preserves complex length: $\lVert U\mathbf{z}\rVert^2 = (U\mathbf{z})^{*}(U\mathbf{z}) = \mathbf{z}^{*}U^{*}U\mathbf{z} = \mathbf{z}^{*}\mathbf{z} = \lVert\mathbf{z}\rVert^2$. Taking the determinant of $U^{*}U = I$ and using $\det(U^{*}) = \overline{\det(U)}$ gives $\lvert\det(U)\rvert^2 = 1$, so $\det(U)$ is a complex number of modulus 1 (any point on the unit circle, like $1$, $-1$, or $i$) — a broader set than the real $\{+1, -1\}$, because complex space has more room to rotate. And every real orthogonal matrix is automatically unitary (conjugation does nothing to a real number), so orthogonal matrices are exactly the real unitary matrices.
Warning — State the field, every time. For a real matrix the condition is $Q^{\mathsf{T}}Q = I$ (orthogonal); for a complex matrix it is $U^{*}U = I$ (unitary). Writing $U^{\mathsf{T}}U = I$ for a genuinely complex matrix is a real error: it would demand $\sum_i z_i^2 = 1$ down each column instead of $\sum_i \lvert z_i\rvert^2 = 1$, which is not length preservation at all. The conjugate is not decorative: it is what makes "length" mean length over $\mathbb{C}$.
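A small numerical experiment shows how badly the plain transpose fails over $\mathbb{C}$. The matrix U below is an illustrative example chosen for this sketch (it does not appear elsewhere in the chapter); it is unitary, yet $U^{\mathsf{T}}U$ is nowhere near the identity.

```python
import numpy as np

U = np.array([[1, 1j],
              [1j, 1]]) / np.sqrt(2)      # illustrative complex matrix with orthonormal columns

print("U* U  =\n", np.round(U.conj().T @ U, 10))   # conjugate transpose: the identity
print("U^T U =\n", np.round(U.T @ U, 10))          # plain transpose: not the identity

z = U[:, 0]                                # first column of U
print("sum z_i^2   =", np.round(z @ z, 10))        # plain dot product, no conjugation
print("sum |z_i|^2 =", np.round(np.vdot(z, z), 10))  # np.vdot conjugates its first argument
```

The first column has genuine length 1, but its unconjugated "squared length" is $\tfrac12(1 + i^2) = 0$, which is exactly the nonsense the warning describes.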
It helps to see why the conjugate is the right move geometrically. A single complex number $z = re^{i\phi}$ carries a magnitude $r$ and a phase $\phi$. When we form $\bar z z = (re^{-i\phi})(re^{i\phi}) = r^2$, the phases cancel and we are left with the squared magnitude — a real, non-negative number, exactly what a squared length should be. Without the conjugate, $z^2 = r^2 e^{2i\phi}$ would still carry a phase and could even be negative or imaginary, which is useless as a length. So the conjugate transpose is precisely the bookkeeping that strips away phase when measuring size, while preserving phase information inside the off-diagonal inner products where it matters. This phase, invisible to the magnitude, is exactly the resource that quantum algorithms exploit through interference — and it is why the complex inner product, not the real one, is the right geometry for a qubit. Unitary matrices are the maps that respect this phase-aware geometry, rotating the complex vector while preserving every magnitude.
Now the anchor. Back in Chapter 1 we teased the qubit — the quantum bit — as a two-component complex vector $\begin{psmallmatrix}\alpha\\\beta\end{psmallmatrix} \in \mathbb{C}^2$ whose entries are amplitudes, with $\lvert\alpha\rvert^2 + \lvert\beta\rvert^2 = 1$ so that the probabilities of measuring $0$ or $1$ sum to 1. A quantum logic gate is an operation on a qubit, and here is the punchline this chapter has been building toward: a quantum gate must be a unitary matrix. The reason is exactly length preservation. A gate transforms the state $\mathbf{z}\mapsto U\mathbf{z}$, and total probability — the squared length $\lVert\mathbf{z}\rVert^2$ — must remain 1, or the laws of probability break. The only linear maps that preserve complex length are the unitary ones, so quantum gates are precisely the unitary matrices. The deepest principle of quantum computation is a corollary of the theorem we proved in §21.3.
Three gates make this concrete. The Pauli-X gate $X = \begin{psmallmatrix}0 & 1\\ 1 & 0\end{psmallmatrix}$ is the quantum NOT; it has real entries, so it is both orthogonal and unitary, with $\det = -1$ (it is a reflection). The Hadamard gate $H = \tfrac{1}{\sqrt2}\begin{psmallmatrix}1 & 1\\ 1 & -1\end{psmallmatrix}$ creates "superpositions" and is also real-orthogonal (it is a reflection, $\det = -1$). The phase gate $S = \begin{psmallmatrix}1 & 0\\ 0 & i\end{psmallmatrix}$ is genuinely complex — it satisfies $S^{*}S = I$ but not $S^{\mathsf{T}}S = I$ — with $\det(S) = i$, a determinant of modulus 1 that no real orthogonal matrix could have. The phase gate is the cleanest possible example of "unitary but not orthogonal."
# Quantum gates are unitary (U* U = I). H is real-orthogonal; S is genuinely complex.
import numpy as np
H = (1/np.sqrt(2)) * np.array([[1, 1], [1, -1]], dtype=complex) # Hadamard
S = np.array([[1, 0], [0, 1j]], dtype=complex) # phase gate
for name, U in [("Hadamard H", H), ("phase S", S)]:
    print(f"{name}: U* U =\n", np.round(U.conj().T @ U, 10).real)
    print(f" det(U) = {np.round(np.linalg.det(U), 4)}, |det| = {round(abs(np.linalg.det(U)),6)}")
z = np.array([1, 1j], dtype=complex) # an unnormalized state
print("||z|| =", round(np.linalg.norm(z), 6))
print("||S z|| =", round(np.linalg.norm(S @ z), 6)) # length preserved
Hadamard H: U* U =
[[ 1. -0.]
[-0. 1.]]
det(U) = (-1+0j), |det| = 1.0
phase S: U* U =
[[1. 0.]
[0. 1.]]
det(U) = 1j, |det| = 1.0
||z|| = 1.414214
||S z|| = 1.414214
Both gates satisfy $U^{*}U = I$; the Hadamard's determinant is real ($-1$, since it is also orthogonal) while the phase gate's determinant is $i$ (modulus 1, but not real, since it is unitary-but-not-orthogonal); and applying $S$ leaves the state's length unchanged at $\sqrt 2$ — total probability is conserved, exactly as quantum mechanics requires. We will develop the qubit and Hermitian observables much further in Chapter 27 (the spectral theorem) and Chapter 34 (Hilbert spaces); this is the moment the anchor snaps into focus.
Real-World Application — Quantum computing. Every gate in a quantum circuit — the building block of quantum algorithms like Shor's and Grover's — is a unitary matrix, and a whole circuit is a product of unitary matrices, hence unitary itself (the group structure of §21.9, now over $\mathbb{C}$). Reversibility is built in for free: because $U^{-1} = U^{*}$, every quantum gate is undoable, which is why quantum computation is fundamentally reversible in a way ordinary logic gates are not. The same orthonormal-columns idea that keeps the unit square rigid keeps a quantum state's probabilities adding to 1.
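The two claims in this application, that a circuit is unitary because its gates are and that it can be undone with $U^{*}$, can be sketched with the three gates from this section. The gate order X, then H, then S is an arbitrary choice made for illustration.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)                 # Pauli-X (quantum NOT)
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)   # Hadamard
S = np.array([[1, 0], [0, 1j]], dtype=complex)                # phase gate

circuit = S @ H @ X            # matrices act right to left: X first, then H, then S
zero = np.array([1, 0], dtype=complex)                        # the state |0>

out = circuit @ zero
probs = np.abs(out) ** 2       # measurement probabilities for outcomes 0 and 1
back = circuit.conj().T @ out  # undo the whole circuit with its conjugate transpose

print("circuit unitary?", np.allclose(circuit.conj().T @ circuit, np.eye(2)))
print("probabilities:", np.round(probs, 6), " sum =", round(float(probs.sum()), 6))
print("recovered |0>?", np.allclose(back, zero))
```

Because $(SHX)^{*} = X^{*}H^{*}S^{*}$, undoing the circuit amounts to applying the inverse gates in reverse order.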
You can follow these connections further into the physics in the companion volume's treatment of unitary operators in quantum mechanics, where the same $U^{*}U=I$ condition governs the time-evolution of quantum states. And the real-rotation half of this chapter is the daily workhorse behind rotations in games, where orthogonal matrices and their quaternion cousins orient every object on screen without distorting it.
21.11 Where do orthogonal and unitary matrices show up in signal processing?
Lest you think unitary matrices belong only to quantum physics, here is a thoroughly down-to-earth example: the discrete Fourier transform (DFT), the engine behind audio analysis, image compression (JPEG), and every spectrum display you have ever watched dance to music. The DFT takes a length-$N$ signal $\mathbf{x}$ — a list of $N$ samples — and re-expresses it in terms of frequencies. In its normalized form, the DFT is multiplication by an $N\times N$ matrix $F$ whose entries are evenly spaced points on the unit circle in the complex plane: $$F_{jk} = \frac{1}{\sqrt N}\,e^{-2\pi i\,jk/N}, \qquad j, k = 0, 1, \dots, N-1.$$ And here is the beautiful fact: this matrix is unitary, $F^{*}F = I$. (We will develop the continuous cousin of this idea — Fourier series — in the very next chapter, Chapter 22, where the sines and cosines turn out to be an orthogonal basis for functions.) The columns of $F$ are the orthonormal "frequency vectors," and decomposing a signal into frequencies is nothing but a change to an orthonormal basis — a rotation in complex $N$-dimensional space.
Because $F$ is unitary, it preserves length, and that single fact is a celebrated theorem of signal processing under a different name: Parseval's theorem, $\lVert\mathbf{x}\rVert = \lVert F\mathbf{x}\rVert$, which says the energy of a signal equals the energy of its spectrum. No energy is created or destroyed by looking at a signal in the frequency domain — you are just viewing the same vector from rotated axes. And unitarity hands you the inverse transform for free: since $F^{-1} = F^{*}$, reconstructing the signal from its spectrum is just multiplication by the conjugate-transpose, the inverse DFT. The forward and inverse transforms are a rigid motion and its undo.
# The normalized DFT matrix is unitary (F* F = I) and preserves energy (Parseval).
import numpy as np
N = 4
k = np.arange(N)
F = np.exp(-2j * np.pi * np.outer(k, k) / N) / np.sqrt(N) # normalized DFT
print("F* F =\n", np.round((F.conj().T @ F).real, 6)) # the identity
print("|det(F)| =", round(abs(np.linalg.det(F)), 6)) # modulus 1 (unitary)
x = np.array([1.0, 2.0, 0.0, -1.0]) # a 4-sample signal
print("||x|| =", round(np.linalg.norm(x), 6)) # signal energy
print("||Fx|| =", round(np.linalg.norm(F @ x), 6)) # spectrum energy: equal
F* F =
[[ 1. -0. 0. 0.]
[-0. 1. -0. 0.]
[ 0. -0. 1. -0.]
[ 0. 0. -0. 1.]]
|det(F)| = 1.0
||x|| = 2.44949
||Fx|| = 2.44949
The DFT matrix passes the unitary test exactly, and the signal's energy ($\approx 2.449$) is identical before and after transforming — Parseval's theorem, which is just §21.10's length-preservation theorem wearing a signal-processing hat. This is a textbook example of the recurring theme that the same mathematics solves problems across every field: the orthogonality that keeps a robot's gripper rigid and a qubit's probabilities summing to 1 is the very same orthogonality that conserves signal energy in your phone's audio codec.
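The inverse-transform remark from earlier in this section can be checked the same way. The sketch below rebuilds the signal with $F^{*}$ and, as a cross-check, compares $F\mathbf{x}$ with numpy's FFT using its norm="ortho" option, which applies the same $1/\sqrt N$ scaling.

```python
import numpy as np

N = 4
k = np.arange(N)
F = np.exp(-2j * np.pi * np.outer(k, k) / N) / np.sqrt(N)   # normalized DFT matrix
x = np.array([1.0, 2.0, 0.0, -1.0])                          # the same 4-sample signal

spectrum = F @ x
x_back = F.conj().T @ spectrum          # inverse DFT is multiplication by F*

print("reconstructed signal:", np.round(x_back.real, 10))
print("matches np.fft.fft(norm='ortho')?",
      np.allclose(spectrum, np.fft.fft(x, norm="ortho")))
print("energy in time domain:     ", round(float(np.sum(np.abs(x) ** 2)), 6))
print("energy in frequency domain:", round(float(np.sum(np.abs(spectrum) ** 2)), 6))
```

With numpy's default (unnormalized) FFT the spectrum would be larger by a factor of $\sqrt N$, so the energies would differ by a factor of $N$; the "ortho" scaling is what makes the transform unitary.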
Real-World Application — Audio and image compression. JPEG and MP3 don't use the raw DFT but close relatives (the discrete cosine transform and the MDCT), which are likewise built from orthonormal basis vectors — real orthogonal transforms. Compression works precisely because the transform is orthogonal: it repackages the signal into frequency coefficients without losing any information (length is preserved), and only then does the codec throw away the small, perceptually unimportant coefficients. The orthogonal transform is the reversible, lossless first step; the lossy compression happens afterward, in the rotated coordinates where it is safe to discard the least significant directions. We will see the same "rotate into good coordinates, then truncate" strategy power image compression with the SVD in Chapter 31 and dimensionality reduction with PCA in Chapter 32.
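The "rotate, then truncate" strategy can be sketched with a small orthonormal DCT-II matrix built directly in numpy. This is a toy illustration, not how JPEG or MP3 are actually implemented: the 8-sample signal is made up, and keeping the 3 largest coefficients is an arbitrary choice.

```python
import numpy as np

N = 8
n = np.arange(N)
# Orthonormal DCT-II matrix: row k, column j holds sqrt(2/N) * cos(pi * k * (j + 1/2) / N),
# with row 0 rescaled by 1/sqrt(2) so that every row has unit length.
C = np.sqrt(2.0 / N) * np.cos(np.pi * np.outer(n, n + 0.5) / N)
C[0, :] /= np.sqrt(2.0)

x = np.array([4.0, 4.2, 4.1, 3.9, 4.0, 4.3, 4.1, 4.0])   # made-up, slowly varying signal
coeffs = C @ x                                            # lossless change of basis

keep = 3
drop_idx = np.argsort(np.abs(coeffs))[:N - keep]          # indices of the smallest coefficients
kept = coeffs.copy()
kept[drop_idx] = 0.0                                      # the lossy step
x_hat = C.T @ kept                                        # inverse transform is the transpose

err = np.linalg.norm(x - x_hat)
dropped = np.linalg.norm(coeffs[drop_idx])
print("C orthogonal?", np.allclose(C @ C.T, np.eye(N)))
print("||x|| =", round(float(np.linalg.norm(x)), 6),
      " ||Cx|| =", round(float(np.linalg.norm(coeffs)), 6))
print("reconstruction error =", round(float(err), 6),
      " norm of dropped coefficients =", round(float(dropped), 6))
```

Because the transform is orthogonal, the reconstruction error equals the length of the discarded coefficient vector, so the codec knows exactly how much error each truncation introduces.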
21.12 How do you check orthogonality, and build a rotation, in code?
We close with the chapter's contribution to the from-scratch toolkit you have been assembling since Chapter 2. Two small functions capture this chapter operationally: a test that decides whether a given matrix is orthogonal, and a builder that manufactures a rotation matrix. Both rest entirely on the ideas above — the test is just "$Q^{\mathsf{T}}Q = I$ within tolerance," and the builder is just the cosine-sine pattern from §21.6.
A subtlety worth stating plainly: the test must use a tolerance, not exact equality, because of the floating-point reality from §21.6.3. In exact arithmetic an orthogonal matrix gives $Q^{\mathsf{T}}Q = I$ on the nose, but the moment you build $Q$ from np.cos/np.sin, rounding sprinkles in errors around $10^{-16}$. A good is_orthogonal reports True when $Q^{\mathsf{T}}Q$ is close enough to $I$, with the closeness threshold a tunable argument.
Build Your Toolkit — Implement two functions in toolkit/orthogonal.py, in pure Python (no numpy inside the implementations; numpy only to check):
1. is_orthogonal(Q, tol=1e-9): form the matrix product $Q^{\mathsf{T}}Q$ by hand (triple loop or list comprehensions), then return True if every entry is within tol of the corresponding identity entry (1 on the diagonal, 0 off it). Equivalently, check that each pair of columns has the right dot product: $\mathbf{q}_i\cdot\mathbf{q}_j \approx \delta_{ij}$.
2. rotation_2d(theta): return the $2\times 2$ matrix $\begin{psmallmatrix}\cos\theta & -\sin\theta\\ \sin\theta & \cos\theta\end{psmallmatrix}$ as a list of lists.
Then verify length preservation numerically: generate several random vectors, multiply each by rotation_2d(theta), and confirm $\lVert Q\mathbf{v}\rVert = \lVert\mathbf{v}\rVert$ to tolerance. Cross-check is_orthogonal against np.allclose(Q.T @ Q, np.eye(n)) on a rotation (should be True), a Householder reflection (True), and a shear (False). This module joins gram_schmidt.py from Chapter 20 and will be reused when we re-orthonormalize eigenvector matrices in Chapter 27 and assemble the SVD's $U$ and $V$ in Chapter 30.
Here is the kind of check your finished functions should pass — written with numpy here so you can confirm the expected behavior before coding the from-scratch version:
# Expected behavior of is_orthogonal(Q, tol) — verify against numpy's allclose.
import numpy as np
def is_orthogonal(Q, tol=1e-9):
    Q = np.asarray(Q, dtype=float)
    return np.allclose(Q.T @ Q, np.eye(Q.shape[1]), atol=tol)
theta = np.deg2rad(30)
rot = [[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]]
print("rotation ->", is_orthogonal(rot)) # True
print("reflection->", is_orthogonal([[0, 1], [1, 0]])) # True
print("shear ->", is_orthogonal([[1, 1], [0, 1]])) # False (not rigid)
# Length preservation on random vectors:
Q = np.array(rot)
vs = np.random.default_rng(0).standard_normal((4, 2))
print("lengths preserved:",
all(np.isclose(np.linalg.norm(Q @ v), np.linalg.norm(v)) for v in vs))
rotation -> True
reflection-> True
shear -> False
lengths preserved: True
The rotation and reflection pass; the shear fails, because it distorts. And the rotation preserves the length of every random vector thrown at it, the computational shadow of the theorem in §21.3.
The Key Insight — Orthogonal matrices are the easiest matrices in linear algebra to work with, and that is no accident: their inverse is free (just transpose — no Gaussian elimination, no determinant), they are perfectly stable numerically (condition number 1), and they preserve everything geometric. Whenever a decomposition can be built out of orthogonal pieces — QR in Chapter 20, the spectral theorem in Chapter 27, the SVD in Chapter 30 — it will be, precisely because orthogonal matrices never cost anything to undo and never distort what they touch.
21.13 What have we built, and where does it lead?
We began with a book turning in your hand and ended with a quantum gate, and the same six-word idea ran through both: rigid motions preserve distance, algebraically $Q^{\mathsf{T}}Q = I$. From that single condition we proved length preservation, then angle preservation, then $\det = \pm 1$ splitting the orthogonal matrices cleanly into rotations ($+1$) and reflections ($-1$). We wrote those rotations down explicitly in 2D and 3D, built reflections with the Householder formula, watched the visualizer spin and flip the unit square without ever resizing it, and saw that the whole family forms a group — closed under composition, with transposes for inverses. Finally we crossed into complex space, where the conjugate transpose replaces the transpose, orthogonal becomes unitary, and the qubit's quantum gates revealed themselves as exactly the length-preserving maps.
This chapter is a clean illustration of the book's deepest theme — that geometry and algebra are two views of one object. "Preserves distance" is a statement about pictures: squares stay congruent, arrows keep their lengths, angles stay fixed. "$Q^{\mathsf{T}}Q = I$" is a statement about symbols: columns dot to the identity. We proved they are the same statement, and that equivalence is the threshold idea to carry forward — once you see that a single tidy equation encodes an entire family of rigid motions, orthogonal matrices stop being a definition to memorize and become a geometric fact you can picture. Keep the vocabulary close, too: orthogonal and unitary matrices, orthonormal columns, isometry, rotation versus reflection sorted by the determinant's sign, the special orthogonal group $\mathrm{SO}(n)$, the Householder reflection, and the conjugate transpose $U^{*}$ — these are the terms the rest of Part IV and all of Parts V and VI will lean on.
The forward references write themselves. The rotations whose determinant is $+1$ have complex eigenvalues on the unit circle — a fact we glimpse in the numpy below and develop fully in Chapter 26, where complex eigenvalues turn out to be rotations in disguise. The orthogonal matrices return as the stars of two of the book's most important theorems: the spectral theorem of Chapter 27, where every symmetric matrix factors as $A = QDQ^{\mathsf{T}}$ with an orthogonal $Q$ of eigenvectors, and the singular value decomposition of Chapter 30, where every matrix factors as $A = U\Sigma V^{\mathsf{T}}$ with orthogonal $U$ and $V$ — rotate, stretch, rotate. Orthogonality is not a side topic; it is the scaffolding on which the back half of this book is built.
# A rotation's eigenvalues lie on the unit circle (complex) — a preview of Chapter 26.
import numpy as np
theta = np.deg2rad(30)
Q = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
ev = np.linalg.eigvals(Q)
print("eigenvalues:", np.round(ev, 4))
print("their moduli:", np.round(np.abs(ev), 4))
eigenvalues: [0.866+0.5j 0.866-0.5j]
their moduli: [1. 1.]
The eigenvalues are $\cos 30° \pm i\sin 30° = e^{\pm i\,30°}$, complex numbers of modulus exactly 1 — a rotation has no real invariant direction in the plane (nothing stays pointing the same way under a true rotation), and the unit-modulus eigenvalues are the algebraic fingerprint of that fact. Hold onto this picture; in Chapter 23 we ask what eigenvalues mean, and a rotation will be our most instructive example of all.