Chapter 12 — Key Takeaways

The one idea

Homogeneous coordinates turn the rendering of a 3D world into a chain of matrix multiplications. Append a $1$ to every point, and the one motion a matrix supposedly cannot do — translation — becomes a matrix multiply; rotation, scaling, and translation then unify into a single matrix type that composes by ordinary multiplication, and even perspective falls out as a matrix plus a division. Every frame you have ever seen is a vertex flowing through model → world → camera → projection → screen, each arrow a matrix.
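As a one-line check of that claim in 2D, with $(x, y)$ the point and $(t_x, t_y)$ the shift:

$$
\begin{bmatrix}1&0&t_x\\0&1&t_y\\0&0&1\end{bmatrix}\begin{bmatrix}x\\y\\1\end{bmatrix}=\begin{bmatrix}x+t_x\\y+t_y\\1\end{bmatrix}
$$

The appended $1$ is what picks up the last column; without it, the product could only mix $x$ and $y$ and would leave the origin in place.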

The big ideas, in order

  1. Translation is not linear, because it moves the origin ($T(\mathbf 0) = \mathbf t \neq \mathbf 0$). No $n\times n$ matrix acting on $n$-dimensional points can translate, since $A\mathbf 0 = \mathbf 0$ for every matrix $A$; translation is affine (linear plus a shift).
  2. Homogeneous coordinates fix this. Lift $(x,y)$ to $(x,y,1)$ and use $3\times 3$ matrices (in 3D: lift to $(x,y,z,1)$ and use $4\times 4$). Translation becomes a shear in the higher dimension, living in the last column. This pays off the promise from Chapters 1 and 3.
  3. Rotation and scaling lift trivially: drop the Chapter 7 $2\times 2$ (or 3D $3\times 3$) into the top-left block and pad with the identity's last row/column. In 3D there are three coordinate-axis rotations $R_x, R_y, R_z$, each fixing its axis and rotating the other two coordinates.
  4. One model matrix $M = TRS$ encodes size, orientation, and position; multiply each vertex by it to place the object. With column vectors the product reads right to left: scale, then rotate, then translate. Order matters ($TR \neq RT$): this is Chapter 8's non-commutativity, now with screen consequences. The column-vector convention (matrix on the left, $M\mathbf v$) and the row-vector convention (matrix on the right, $\mathbf v M^{\mathsf T}$) are transposes of each other and must never be mixed.
  5. Projection flattens 3D to 2D. Orthographic: drop a coordinate; parallel lines stay parallel; lengths parallel to the image plane are preserved (CAD). Perspective: divide by depth ($x/z, y/z$); far things shrink; parallel lines converge (games, vision). Perspective is a projection matrix (bottom row copies $z$ into $w$) followed by the perspective divide (divide by $w$).
  6. The rendering pipeline is the composition $V_{port}\, P\, V\, M$ plus the perspective divide: model → world → camera → projection → screen. The view matrix $V$ is the inverse of the camera's placement (Chapter 9). The object never moves; the coordinate systems do.
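The first four ideas can be checked numerically. What follows is a minimal sketch, assuming numpy and the column-vector convention; the helper names mirror the toolkit's translation, rotation, and scaling, but the bodies and the sample values are illustrative assumptions rather than the chapter's own listing.

```python
import numpy as np

# Illustrative 3x3 homogeneous builders (column-vector convention assumed).
def translation(tx, ty):
    return np.array([[1.0, 0.0, tx],
                     [0.0, 1.0, ty],
                     [0.0, 0.0, 1.0]])

def rotation(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0],
                     [s,  c, 0.0],
                     [0.0, 0.0, 1.0]])

def scaling(sx, sy):
    return np.diag([sx, sy, 1.0])

T = translation(3.0, 1.0)
R = rotation(np.pi / 2)              # quarter turn counterclockwise
S = scaling(2.0, 2.0)

origin = np.array([0.0, 0.0, 1.0])   # (0, 0) lifted to (0, 0, 1)
p = np.array([1.0, 0.0, 1.0])        # (1, 0) lifted to (1, 0, 1)

moved_origin = T @ origin            # the lift lets a matrix move the origin
placed = (T @ R @ S) @ p             # scale, then rotate, then translate
tr = (T @ R) @ p                     # rotate first, then translate
rt = (R @ T) @ p                     # translate first, then rotate

print(np.round(moved_origin, 6))
print(np.round(placed, 6))
print(np.round(tr, 6), np.round(rt, 6))
```

With these sample values the origin lands at (3, 1), the placed point at (3, 3), and the two orderings disagree: $TR$ sends $(1, 0)$ to (3, 2) while $RT$ sends it to (-1, 4).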

The transformation toolkit (homogeneous form)

| Transform | 2D ($3\times 3$) | 3D ($4\times 4$) | Fixes origin? | In a matrix because… |
|---|---|---|---|---|
| Scaling | $\begin{bmatrix}s_x&0&0\\0&s_y&0\\0&0&1\end{bmatrix}$ | diag$(s_x,s_y,s_z,1)$ | yes | linear (Chapter 7) |
| Rotation | $\begin{bmatrix}\cos\theta&-\sin\theta&0\\\sin\theta&\cos\theta&0\\0&0&1\end{bmatrix}$ | $R_x, R_y, R_z$ | yes | linear (Chapter 7) |
| Translation | $\begin{bmatrix}1&0&t_x\\0&1&t_y\\0&0&1\end{bmatrix}$ | fourth column $(t_x,t_y,t_z)$ | no | homogeneous lift (this chapter) |
| Orthographic | drop a coordinate | drop $z$ | | singular projection (Chapter 7/11) |
| Perspective | | $w \leftarrow z$, then divide | | matrix + perspective divide |
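The 3D column of the table can be written out as code. This is a sketch under stated assumptions: numpy, column vectors, the common right-handed sign convention for $R_x, R_y, R_z$, and a bare-bones perspective matrix with no near/far planes or field of view. The names translation3, scaling3, rotation_x, rotation_y, rotation_z, ORTHO, and PERSP are hypothetical, chosen for illustration.

```python
import numpy as np

def translation3(tx, ty, tz):
    M = np.eye(4)
    M[:3, 3] = [tx, ty, tz]              # the shift lives in the fourth column
    return M

def scaling3(sx, sy, sz):
    return np.diag([sx, sy, sz, 1.0])

def rotation_x(theta):
    c, s = np.cos(theta), np.sin(theta)
    M = np.eye(4)
    M[1:3, 1:3] = [[c, -s], [s, c]]      # rotates y and z, fixes the x-axis
    return M

def rotation_y(theta):
    c, s = np.cos(theta), np.sin(theta)
    M = np.eye(4)
    M[0, 0], M[0, 2] = c, s              # rotates z and x, fixes the y-axis
    M[2, 0], M[2, 2] = -s, c
    return M

def rotation_z(theta):
    c, s = np.cos(theta), np.sin(theta)
    M = np.eye(4)
    M[:2, :2] = [[c, -s], [s, c]]        # rotates x and y, fixes the z-axis
    return M

# Orthographic: drop z. The zero on the diagonal makes the matrix singular.
ORTHO = np.diag([1.0, 1.0, 0.0, 1.0])

# Minimal perspective: the bottom row copies z into w.
PERSP = np.array([[1.0, 0.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0, 0.0],
                  [0.0, 0.0, 1.0, 0.0],
                  [0.0, 0.0, 1.0, 0.0]])

def perspective_divide(h):
    return h / h[3]                      # divide every component by w

p = np.array([2.0, 1.0, 4.0, 1.0])       # the point (2, 1, 4)
flat = ORTHO @ p                         # depth discarded
clip = PERSP @ p                         # w now holds the depth
ndc = perspective_divide(clip)           # x/z and y/z appear here
print(flat, clip, ndc)
```

For the sample point (2, 1, 4), the orthographic image keeps $x$ and $y$ as (2, 1), while the perspective image is $(2/4, 1/4) = (0.5, 0.25)$.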

Skills you gained

  • Explain why translation cannot be a plain matrix, and apply the homogeneous-coordinates fix.
  • Build 2D ($3\times 3$) and 3D ($4\times 4$) translation, rotation, and scaling matrices.
  • Compose them into one model matrix in the correct order, and predict how reordering changes the result.
  • Project a 3D scene with both orthographic and perspective projection, and perform the perspective divide.
  • Trace a point through the full pipeline (model → world → camera → projection → screen) and render a wireframe with matplotlib.
  • Recognize the row-vs-column convention pitfall and the view-matrix-as-inverse subtlety.
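The pipeline trace named in the skills above can be sketched for a single vertex. The following assumes numpy, a camera that looks down $+z$ (matching the chapter's $x/z, y/z$ divide), a bare-bones perspective matrix, and a hypothetical 800 by 600 viewport; translation3, viewport, and every numeric value are illustrative choices, not the chapter's own.

```python
import numpy as np

def translation3(tx, ty, tz):
    M = np.eye(4)
    M[:3, 3] = [tx, ty, tz]
    return M

# Bare-bones perspective: bottom row copies z into w (no near/far clipping).
P = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0, 0.0]])

def viewport(width, height):
    # Map x, y in [-1, 1] to pixels; the minus sign flips y because
    # pixel rows count downward from the top of the image.
    return np.array([[width / 2, 0.0, 0.0, width / 2],
                     [0.0, -height / 2, 0.0, height / 2],
                     [0.0, 0.0, 1.0, 0.0],
                     [0.0, 0.0, 0.0, 1.0]])

M = translation3(1.0, 0.0, 0.0)                  # model: place the object
camera_placement = translation3(0.0, 0.0, -3.0)  # where the camera sits
V = np.linalg.inv(camera_placement)              # view = inverse of placement
Vp = viewport(800, 600)

vertex = np.array([1.0, 1.0, 1.0, 1.0])          # model space

world = M @ vertex            # model  -> world
camera = V @ world            # world  -> camera
clip = P @ camera             # camera -> clip (w holds depth)
ndc = clip / clip[3]          # perspective divide
screen = Vp @ ndc             # normalized -> pixels

# Same result from one composed matrix followed by a single divide.
composite = Vp @ P @ V @ M @ vertex
composite = composite / composite[3]

print(world, camera, clip, ndc, screen, composite)
```

With these values the vertex passes through world (2, 1, 1), camera (2, 1, 4), and normalized (0.5, 0.25) before landing at pixel (600, 225). The composed matrix gives the same pixel because this viewport matrix leaves $w$ unchanged, so the divide can be done before or after it.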

Terms to know

homogeneous coordinates, affine transformation, translation matrix, model matrix, world space, camera (view) space, view matrix, clip space / normalized device coordinates, orthographic projection, perspective projection, perspective divide, rendering pipeline, vertex, wireframe, viewport transform, point at infinity, scene graph, gimbal lock.

How this connects to the recurring themes

  • Theme 1 (transformations are the point). The pipeline is nothing but a composition of transformations; the object's points never change, only the matrices acting on them. This is Part II's thesis at full scale.
  • Theme 2 (geometry = algebra). "Place the cube, tilt it, view it from here, flatten it" and "multiply by $PVM$ then divide" are the same act, stated geometrically and algebraically.
  • Theme 3 (computation validates theory). Every matrix produced numbers we confirmed in numpy; your toolkit now seeds a real renderer (render3d.py).
  • Theme 4 (most applied branch of pure math). The identical matrices serve games, film, CAD, and AR — and the choice of projection encodes the image's purpose (realism vs. measurement).

Toolkit contribution

toolkit/capstone/render3d.py: translation(tx,ty), rotation(theta), and scaling(sx,sy), each returning a $3\times 3$ homogeneous matrix (pure Python), composed with Chapter 8's matmul into a model matrix and verified against numpy. This seeds the 3D-render option for the Chapter 39 capstone, which extends them to the full 3D $4\times 4$ set plus perspective.
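A pure-Python sketch of what such a module might look like follows. The three function signatures come from the description above; the bodies, the stand-in matmul, and the apply helper are assumptions made for illustration, since the actual contents of render3d.py are not reproduced in this summary.

```python
import math

def matmul(A, B):
    """Multiply matrices stored as lists of rows (stand-in for Chapter 8's matmul)."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def translation(tx, ty):
    return [[1.0, 0.0, tx],
            [0.0, 1.0, ty],
            [0.0, 0.0, 1.0]]

def rotation(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0],
            [s,  c, 0.0],
            [0.0, 0.0, 1.0]]

def scaling(sx, sy):
    return [[sx, 0.0, 0.0],
            [0.0, sy, 0.0],
            [0.0, 0.0, 1.0]]

def apply(M, point):
    """Hypothetical helper: lift (x, y) to a column (x, y, 1), multiply, drop the 1."""
    x, y = point
    out = matmul(M, [[x], [y], [1.0]])
    return (out[0][0], out[1][0])

# Model matrix M = T R S: scale by 2, quarter turn, then shift by (3, 1).
model = matmul(translation(3.0, 1.0),
               matmul(rotation(math.pi / 2), scaling(2.0, 2.0)))

print(apply(model, (1.0, 0.0)))   # close to (3, 3), up to floating-point rounding
```

Keeping the matrices as plain lists of rows means the same matmul composes them and applies them to a lifted point, so no separate matrix-vector routine is needed.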

Forward references

  • Chapters 13–14 — Column space and null space: the orthographic projection that flattened depth is exactly a map with a nontrivial null space; "what a transformation reaches and destroys" is the next question.
  • Chapter 16 — Change of basis: the "translate to origin, rotate, translate back" sandwich (§12.6) and the camera's coordinate re-expression generalize to similarity transforms.
  • Chapter 21 — Orthogonal matrices and rotations: the rotation matrices here, generalized and characterized as the distance-preserving maps.
  • Chapters 17, 19, 32 — Projection returns as least-squares regression and PCA: collapsing onto a lower-dimensional subspace, the same operation that flattened our cube.
  • Chapter 38 — Numerical linear algebra: floating-point depth-buffer artifacts (z-fighting) are a conditioning problem, so graphics calls for the same numerical care.
  • Chapter 39 — Capstone: the 3D-render option assembles render3d.py into a rotating wireframe renderer end to end.
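Two of these forward references can already be previewed with the chapter's own matrices: the "translate to origin, rotate, translate back" sandwich, and the null space of the depth-dropping projection. The sketch below assumes numpy and column vectors; rotation_about, drop_z, and the sample pivot are hypothetical names and values chosen for illustration.

```python
import numpy as np

def translation(tx, ty):
    return np.array([[1.0, 0.0, tx],
                     [0.0, 1.0, ty],
                     [0.0, 0.0, 1.0]])

def rotation(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0],
                     [s,  c, 0.0],
                     [0.0, 0.0, 1.0]])

def rotation_about(theta, cx, cy):
    # Sandwich: move the pivot to the origin, rotate, move it back.
    return translation(cx, cy) @ rotation(theta) @ translation(-cx, -cy)

A = rotation_about(np.pi / 2, 2.0, 1.0)
pivot = np.array([2.0, 1.0, 1.0])
p = np.array([3.0, 1.0, 1.0])

print(np.round(A @ pivot, 6))     # the pivot stays where it is
print(np.round(A @ p, 6))         # (3, 1) swings a quarter turn around (2, 1)

# Orthographic "drop z" as a plain 3x3 linear map on 3D vectors.
drop_z = np.diag([1.0, 1.0, 0.0])
depth_direction = np.array([0.0, 0.0, 1.0])

print(drop_z @ depth_direction)           # the depth direction is sent to zero
print(np.linalg.matrix_rank(drop_z))      # only two dimensions survive
```

The pivot (2, 1) is fixed and (3, 1) moves to (2, 2). The depth direction maps to the zero vector and the projection has rank 2.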