Case Study 1 — How a Game Renders a Single Frame

Field: real-time computer graphics / game development. This is the chapter's anchor — the 3D rendering pipeline — followed end to end for one object in one frame. Connects to 3D math for games.

The problem

Sixty times a second, your game engine must answer one question for every visible object: given where this thing is in the world, where the camera is, and what lens we're using, which pixels does it cover? Let's follow a single object — a spinning barrel sitting on a dungeon floor — through one frame, watching its vertices flow through the four coordinate spaces of §12.8. Every step is a matrix multiply from this chapter. By the end you will have rendered a frame the way an engine does, on paper.

A barrel, like every model, is authored once in its own local coordinates, positioned relative to its own origin, in whatever units the artist chose. Our barrel is, for simplicity, a unit cube: eight corner vertices standing in for the body of the barrel. The artist never worried about where in the dungeon the barrel sits or how big it is relative to the hero. Those are the job of the model matrix, applied at render time. This separation is the whole reason the same barrel mesh can appear a hundred times across a level at a hundred sizes and angles: one mesh, many model matrices.

Step 1 — Model space to world space

The level designer places this barrel at world position $(4, 0, 2)$, scaled to $1.5\times$ its authored size, and slowly spinning — this frame it has rotated $30°$ about the vertical ($y$) axis. The model matrix composes those three, in the order §12.5 demands (scale, then rotate, then translate):

$$M = T(4, 0, 2)\, R_y(30°)\, S(1.5, 1.5, 1.5).$$

# The barrel's model matrix: scale 1.5, spin 30 deg about y, place at (4,0,2).
import numpy as np
def T3(tx,ty,tz):
    M = np.eye(4); M[0,3],M[1,3],M[2,3] = tx,ty,tz; return M
def Ry(t): c,s=np.cos(t),np.sin(t); return np.array([[c,0,s,0],[0,1,0,0],[-s,0,c,0],[0,0,0,1]],float)
def S3(sx,sy,sz): return np.diag([sx,sy,sz,1.0])

M = T3(4,0,2) @ Ry(np.radians(30)) @ S3(1.5,1.5,1.5)
print(np.round(M, 3))
print("barrel origin -> world:", (M @ np.array([0,0,0,1.0])))

[[ 1.299  0.     0.75   4.   ]
 [ 0.     1.5    0.     0.   ]
 [-0.75   0.     1.299  2.   ]
 [ 0.     0.     0.     1.   ]]
barrel origin -> world: [4. 0. 2. 1.]

Read this matrix the way Chapter 7 taught: the top-left $3\times 3$ block is the rotation-times-scale (its columns are the scaled, rotated basis directions), and the fourth column $(4, 0, 2)$ is the world position. The barrel's own origin maps cleanly to world $(4,0,2)$ — the dungeon floor spot the designer chose. Every one of the barrel's eight vertices, multiplied by this one matrix, is now expressed in the shared world coordinate system, alongside the hero, the torches, and the walls. One matrix placed the entire object.
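
As a quick check on that reading, the sketch below rebuilds the same $M$ and pulls the three ingredients back out of it. The variable names (position, scales, angle) are illustrative, not from the chapter, and the recovery assumes, as here, a rotation about $y$ only combined with a positive scale.

```python
import numpy as np

# Rebuild the barrel's model matrix from the same three ingredients as the text.
theta = np.radians(30)
c, s = np.cos(theta), np.sin(theta)
T = np.eye(4)
T[:3, 3] = [4, 0, 2]                                   # translate to (4, 0, 2)
R = np.array([[c, 0, s, 0],
              [0, 1, 0, 0],
              [-s, 0, c, 0],
              [0, 0, 0, 1]])                           # rotate 30 deg about y
S = np.diag([1.5, 1.5, 1.5, 1.0])                      # uniform scale 1.5
M = T @ R @ S

position = M[:3, 3]                                    # fourth column: world position
scales = np.linalg.norm(M[:3, :3], axis=0)             # length of each basis column
angle = np.degrees(np.arctan2(M[0, 2], M[0, 0]))       # spin about y (assumes y-only rotation)

print("position:", position)
print("scales:", np.round(scales, 3))
print("angle (deg):", round(float(angle), 3))
```

The column lengths recover the scale because a rotation never changes the length of a vector, so whatever stretch remains in the $3\times 3$ block came from $S$.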

Step 2 — World space to camera space

The player's camera floats at world position $(0, 1, 10)$, slightly above the floor and ten units back, looking down the $-z$ axis into the scene. The view matrix $V$ must re-express the whole world relative to that camera, putting the camera at the origin. As §12.8 stressed, the view matrix is the inverse of the camera's placement. This camera is translated but not rotated, so its placement is $T(0, 1, 10)$ and the inverse is simply the opposite translation: to bring the world in front of a camera sitting at $(0,1,10)$, you translate the entire world by $(0,-1,-10)$.

# View matrix: camera at (0,1,10) looking down -z  =>  translate world by (0,-1,-10).
V = T3(0, -1, -10)
world_pt = M @ np.array([0,0,0,1.0])      # barrel origin in world: (4,0,2)
camera_pt = V @ world_pt
print("barrel origin in camera space:", camera_pt)

barrel origin in camera space: [ 4. -1. -8.  1.]

The barrel now sits at camera-space coordinates $(4, -1, -8)$: four units to the camera's right, one unit below its eye level, and eight units in front (the negative $z$ means "into the screen," our viewing convention). Nothing about the barrel physically moved — we re-coordinated the world so that "in front of the camera" became a simple matter of sign. This is the conceptual heart of the pipeline: the object is fixed; the matrices slide the coordinate systems.
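
The claim that the view matrix is the inverse of the camera's placement can be checked numerically. This is a minimal sketch, assuming the camera has no rotation (as in this case study); the helper name translation and the variable camera_placement are introduced here for illustration.

```python
import numpy as np

def translation(tx, ty, tz):
    """4x4 homogeneous translation (plays the same role as T3 in the text)."""
    M = np.eye(4)
    M[:3, 3] = [tx, ty, tz]
    return M

camera_placement = translation(0, 1, 10)     # where the camera sits in the world
V = np.linalg.inv(camera_placement)          # view matrix = inverse of that placement

print("V equals T(0,-1,-10):", np.allclose(V, translation(0, -1, -10)))
print("camera in its own space:", V @ np.array([0, 1, 10, 1.0]))
print("barrel origin in camera space:", V @ np.array([4, 0, 2, 1.0]))
```

For a camera that is also rotated, the same np.linalg.inv call would still apply, but the result would no longer be a pure translation.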

Step 3 — Projection: camera space to the screen

Now flatten. We use perspective projection (this is a game, not a blueprint) with focal length $d = 1$, so a camera-space point $(x, y, z)$ projects to $(x / \text{depth},\ y/\text{depth})$, where the depth is the distance in front of the camera, $-z$ (positive, since the barrel's $z$ is negative). Our barrel at $(4, -1, -8)$ has depth $8$:

# Perspective project the barrel (focal length d = 1). Depth is -z (in front).
x, y, z, w = camera_pt
depth = -z                                  # 8 units in front
print("barrel on screen:", np.round((x/depth, y/depth), 3))

barrel on screen: [ 0.5   -0.125]

The barrel lands at screen coordinates $(0.5, -0.125)$ — right of center and slightly below, as expected for an object to the camera's right and below eye level. Those normalized coordinates would then be scaled by the viewport transform to actual pixels (and the $y$-axis flipped, since screens count rows top-down — the very flip from Chapter 7's Case Study 1).
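
The viewport step can be sketched as follows. Several things here are assumptions rather than details from the chapter: the visible range is taken to be $[-1, 1]$ on both axes, the window is taken to be $1920 \times 1080$ pixels, aspect-ratio correction is ignored, and to_pixels is a hypothetical helper name.

```python
import numpy as np

def to_pixels(ndc_xy, width=1920, height=1080):
    """Map normalized coords in [-1, 1] to pixel coords with the origin at the top-left.

    Assumed conventions (not from the chapter): a [-1, 1] visible range on both
    axes and a 1920x1080 window, with no aspect-ratio correction.
    """
    x, y = ndc_xy
    px = (x + 1) / 2 * width          # -1 -> left edge, +1 -> right edge
    py = (1 - y) / 2 * height         # flip y: +1 -> top row, -1 -> bottom row
    return np.array([px, py])

print("screen center:", to_pixels((0.0, 0.0)))
print("barrel:", to_pixels((0.5, -0.125)))
```

Under these assumptions the barrel lands to the right of the window's center and a little below it, which matches the sign reasoning above once the $y$ flip is applied.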

Here is the payoff that makes perspective look real. Suppose an identical barrel sits much farther back, at world $(4, 0, -6)$, putting it at camera depth $16$ — twice as far. Watch what perspective does:

# A second, identical barrel twice as far away appears half the size on screen.
M2 = T3(4,0,-6) @ Ry(np.radians(30)) @ S3(1.5,1.5,1.5)
camera_pt2 = V @ (M2 @ np.array([0,0,0,1.0]))
print("far barrel camera space:", camera_pt2)
print("far barrel on screen:", np.round((camera_pt2[0]/(-camera_pt2[2]),
                                         camera_pt2[1]/(-camera_pt2[2])), 4))

far barrel camera space: [  4.  -1. -16.   1.]
far barrel on screen: [ 0.25   -0.0625]

The far barrel projects to $(0.25, -0.0625)$ — exactly half the screen displacement of the near barrel, because it is twice as deep and perspective divides by depth. Two physically identical barrels, drawn at different sizes purely because of distance: that is the $1/z$ shrink of §12.7 producing the illusion of depth on a flat screen. Orthographic projection would have drawn them identically, betraying the flatness.
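
The divide by depth can also be written as a matrix multiply followed by a divide by $w$, which is how homogeneous coordinates turn perspective into a matrix operation. The matrix P below is a deliberately simplified sketch (focal length 1, no remapping of depth into a buffer range), not the projection matrix of any particular engine.

```python
import numpy as np

# Simplified projection matrix (assumption: focal length 1, no depth remapping).
# It leaves x, y, z alone and copies -z into the w slot.
P = np.array([[1, 0,  0, 0],
              [0, 1,  0, 0],
              [0, 0,  1, 0],
              [0, 0, -1, 0]], float)

near = np.array([4, -1,  -8, 1.0])    # near barrel origin in camera space
far  = np.array([4, -1, -16, 1.0])    # far barrel origin in camera space

for name, p in (("near", near), ("far", far)):
    clip = P @ p                      # after the multiply, w holds the depth
    screen = clip[:2] / clip[3]       # the perspective divide
    print(name, "w =", clip[3], "screen =", screen)
```

The multiply itself is linear; all of the nonlinearity of perspective is confined to the single division by $w$ at the end.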

Putting the frame together

For the full barrel we repeat steps 1–3 not for the origin alone but for all eight vertices, in one batched matrix multiply per stage, which is exactly what a GPU does, except across thousands of vertices in parallel. Stack the eight homogeneous vertices as columns of a $4\times 8$ matrix cube, and the entire transform to camera space is V @ M @ cube, one expression. (The cube used here has one corner at the model origin and spans $0$ to $1$ along each axis.) Project each resulting column, connect the edges, and the engine has the barrel's silhouette. Do this for every object in view, keep the nearest surface at each pixel so nearer surfaces draw over farther ones (the job of the depth buffer, which compares depths per pixel rather than sorting whole objects), and you have a frame.

# The whole barrel to camera space in one batched multiply (8 vertices at once).
cube = np.array([[0,1,1,0,0,1,1,0],
                 [0,0,1,1,0,0,1,1],
                 [0,0,0,0,1,1,1,1],
                 [1,1,1,1,1,1,1,1]], float)
camera_space = V @ M @ cube           # all 8 vertices, model then view
print(np.round(camera_space[:3, :4], 3))   # first four vertices (x,y,z)

[[ 4.     5.299  5.299  4.   ]
 [-1.    -1.     0.5    0.5  ]
 [-8.    -8.75  -8.75  -8.   ]]
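
The remaining part of that recipe, projecting every column and connecting the edges, might look like the sketch below. It rebuilds $M$, $V$, and the cube so it runs on its own; the edge rule (two corners share an edge when they differ in exactly one coordinate) is an illustrative choice, not something the chapter specifies.

```python
import numpy as np

# Same model and view matrices as in the text, rebuilt so this runs standalone.
theta = np.radians(30)
c, s = np.cos(theta), np.sin(theta)
T = np.eye(4)
T[:3, 3] = [4, 0, 2]
R = np.array([[c, 0, s, 0], [0, 1, 0, 0], [-s, 0, c, 0], [0, 0, 0, 1]])
S = np.diag([1.5, 1.5, 1.5, 1.0])
V = np.eye(4)
V[:3, 3] = [0, -1, -10]

cube = np.array([[0, 1, 1, 0, 0, 1, 1, 0],
                 [0, 0, 1, 1, 0, 0, 1, 1],
                 [0, 0, 0, 0, 1, 1, 1, 1],
                 [1, 1, 1, 1, 1, 1, 1, 1]], float)

camera_space = V @ T @ R @ S @ cube          # 4x8: all vertices in camera space
depth = -camera_space[2]                     # positive depth of each vertex
screen = camera_space[:2] / depth            # broadcast divide: 2x8 screen coords

# Two corners of the unit cube share an edge when they differ in exactly one coordinate.
edges = [(i, j) for i in range(8) for j in range(i + 1, 8)
         if np.sum(cube[:3, i] != cube[:3, j]) == 1]

print(np.round(screen.T, 3))                 # one (x, y) row per vertex
print(len(edges), "edges to draw")
```

Drawing a line between screen[:, i] and screen[:, j] for each pair in edges gives the wireframe of the barrel as the camera sees it.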

Why this matters

This is not a simplified analogy — it is, geometrically, what every 3D engine does. The matrices are bigger in practice (real projection matrices also remap depth into a buffer range, and real scenes carry lighting and texture coordinates that are themselves linear algebra), but the spine is exactly what you just computed: a vertex is multiplied by a model matrix, a view matrix, and a projection matrix, then divided, then mapped to pixels. The skills are pure Part II — build a transform by composing rotation, scaling, and translation (Chapter 7, Chapter 8); mind the order because multiplication does not commute (Chapter 8); read the determinant's sign when orientation matters (Chapter 11); and use homogeneous coordinates so that translation and perspective both become matrix operations (this chapter).
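
The warning about order is easy to see with the barrel's own numbers. This sketch builds the model matrix in the order used above and again in the reversed order, then compares where each sends the barrel's origin; M_reversed is a name introduced here for illustration.

```python
import numpy as np

theta = np.radians(30)
c, s = np.cos(theta), np.sin(theta)
T = np.eye(4)
T[:3, 3] = [4, 0, 2]
R = np.array([[c, 0, s, 0], [0, 1, 0, 0], [-s, 0, c, 0], [0, 0, 0, 1]])
S = np.diag([1.5, 1.5, 1.5, 1.0])

M = T @ R @ S              # scale, then rotate, then translate (the order used above)
M_reversed = S @ R @ T     # translate first, then rotate, then scale

origin = np.array([0, 0, 0, 1.0])
print("scale, rotate, translate:", np.round(M @ origin, 3))
print("translate, rotate, scale:", np.round(M_reversed @ origin, 3))
```

In the reversed order the translation is applied first, so the later rotation and scale act on the offset $(4, 0, 2)$ as well, and the barrel no longer lands on the spot the designer chose.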

The reason a modern graphics processor exists at all is to do this small computation enormously in parallel: millions of vertices, each multiplied by a handful of $4\times 4$ matrices, sixty or more times a second. When you see a vast, fluid 3D world, you are watching the composition $P V M$ evaluated for every point of every object, frame after frame. A matrix is a function that transforms space — and a game is that idea, running as fast as silicon allows.

Try it yourself

Re-render the near barrel but spin it to $60°$ about $y$ instead of $30°$. How does its model matrix change? Does its world position (the fourth column) change? (It should not — rotation and position are independent slots.)
Move the camera up to $(0, 5, 10)$ and rebuild $V$. Where does the barrel now appear on screen, and why has its vertical position dropped?
Place three barrels at depths $4$, $8$, and $16$ and project all three. Confirm their on-screen sizes are in the ratio $1 : \tfrac12 : \tfrac14$ — the signature of perspective's $1/z$.
Swap perspective for orthographic (drop $z$ instead of dividing by it) and re-plot the three barrels. Confirm they now appear the same size regardless of depth, and explain why that looks wrong for a game but right for a blueprint.