Case Study 2 — Rotated Frames in Robotics and Graphics

DataField.Dev

Field: robotics / computer graphics. A robot arm, a camera, and a game engine all live or die by change of basis: every object carries its own coordinate frame, and getting anything done means converting between them. This case study works a rotated frame in detail and shows the similarity formula $B = P^{-1}AP$ doing real work.

The setup: a world frame and a camera frame

A mobile robot rolls across a warehouse floor. Bolted to it is a camera, and the camera sees a landmark — a barcode on a shelf. There are (at least) two natural coordinate systems in play. The world frame is fixed to the building: $x$ points east, $y$ points north, origin at the loading dock. The camera frame is fixed to the robot: $x'$ points "to the camera's right," $y'$ points "straight ahead," origin at the lens. As the robot turns, the camera frame rotates relative to the world frame, even though the warehouse — and the barcode — stay put.

This is the everyday reality of robotics and graphics: there is no single privileged coordinate system. The barcode has world coordinates and camera coordinates, and they are different lists of numbers for the same physical point — precisely the "same vector, different coordinates" theme of this chapter. A robot that wants to drive toward the barcode must convert the camera's measurement into world coordinates to plan a path; a renderer that wants to draw a 3-D scene must convert every object from its own local frame into the camera frame to figure out where it lands on screen. Both operations are changes of basis. Getting the direction of the conversion right — $P$ versus $P^{-1}$ — is the difference between a robot that approaches the shelf and one that drives into a wall.

Rotation as a change-of-basis matrix

Suppose the robot has turned so that its camera frame is rotated $60°$ counterclockwise from the world frame. The camera-frame basis vectors, expressed in world coordinates, are the world-frame axes rotated by $60°$: $$\mathbf{b}_1 = (\cos 60°, \sin 60°), \qquad \mathbf{b}_2 = (-\sin 60°, \cos 60°).$$ Stacking them as columns gives the change-of-basis matrix $P$, which here is just the rotation matrix $R$ from Chapter 7: $$P = R = \begin{bmatrix} \cos 60° & -\sin 60° \\ \sin 60° & \cos 60° \end{bmatrix} \approx \begin{bmatrix} 0.5 & -0.866 \\ 0.866 & 0.5 \end{bmatrix}.$$ By the chapter's rule, the columns of $P$ are the new (camera) basis vectors in the old (world) coordinates, so $P$ converts camera coordinates to world coordinates: $[\mathbf{p}]_{\text{world}} = R\,[\mathbf{p}]_{\text{camera}}$. Converting a camera measurement into world coordinates is therefore a single multiplication by $R$. The other direction, turning a known world point into the coordinates the camera would report, requires $R^{-1}$.
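
As a quick numerical check of the column rule, the sketch below stacks the two camera basis vectors as columns with `np.column_stack`, confirms the result matches the rotation matrix $R$, and converts a camera-frame point to world coordinates. The test point $(1, -\sqrt{3})$ is chosen only for illustration; it should land at world $(2, 0)$.

```python
# Build the change-of-basis matrix P from the camera basis vectors (given in world coordinates).
import numpy as np

th = np.deg2rad(60)
b1 = np.array([np.cos(th), np.sin(th)])     # camera x' axis ("right"), in world coordinates
b2 = np.array([-np.sin(th), np.cos(th)])    # camera y' axis ("ahead"), in world coordinates

P = np.column_stack([b1, b2])               # columns = new basis vectors in old coordinates
R = np.array([[np.cos(th), -np.sin(th)],
              [np.sin(th),  np.cos(th)]])   # rotation by 60 degrees counterclockwise
print(np.allclose(P, R))                    # True: P is the rotation matrix

# P maps camera coordinates to world coordinates: [p]_world = P [p]_camera
p_cam = np.array([1.0, -np.sqrt(3)])        # 1 m right of the camera, sqrt(3) m behind it
p_world = P @ p_cam
print(np.allclose(p_world, [2.0, 0.0]))     # True: two meters due east in the world
```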

Rotations carry a gift that makes this cheap. A rotation matrix is orthogonal: its inverse is simply its transpose, $R^{-1} = R^{\mathsf{T}}$ (Chapter 21). So converting world $\to$ camera costs no matrix inversion at all — you just transpose. Let's find the camera-frame coordinates of a landmark that sits at world coordinates $(2, 0)$ (two meters due east of the dock):

```python
# Convert a world-frame point into the rotated camera frame. R^-1 = R^T for a rotation.
import numpy as np
np.set_printoptions(suppress=True, precision=4)
th = np.deg2rad(60)
R = np.array([[np.cos(th), -np.sin(th)],
              [np.sin(th),  np.cos(th)]])   # camera-frame -> world-frame
p_world = np.array([2., 0.])                # landmark, in world coordinates
p_cam = R.T @ p_world                       # world -> camera  (R^-1 = R^T)
print("landmark in world frame :", p_world)         # [2. 0.]
print("landmark in camera frame:", p_cam)           # [ 1.     -1.7321]
```

The landmark, two meters east in the world, sits at camera coordinates $(1, -1.732)$ — one meter to the camera's right and about $1.73$ meters behind it (negative "ahead"). That makes sense: the camera turned $60°$ to the left, so a point dead-east is now off to its right and toward its back. The same physical barcode, two addresses. The robot's vision system reports the camera-frame numbers; the path planner needs the world-frame numbers; $R$ and $R^{\mathsf{T}}$ shuttle between them.
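
One way to see why $R^{\mathsf{T}}$ is the right converter: the rows of $R^{\mathsf{T}}$ are the camera axes $\mathbf{b}_1, \mathbf{b}_2$, so each camera coordinate is the dot product of the world point with one camera axis, that is, its projection onto that axis. A minimal sketch, reusing the $60°$ setup:

```python
# Camera coordinates as projections onto the camera axes (valid because the axes are orthonormal).
import numpy as np

th = np.deg2rad(60)
b1 = np.array([np.cos(th), np.sin(th)])     # camera "right" axis, in world coordinates
b2 = np.array([-np.sin(th), np.cos(th)])    # camera "ahead" axis, in world coordinates
R = np.column_stack([b1, b2])               # camera -> world

p_world = np.array([2.0, 0.0])
right = b1 @ p_world                        # component along the camera's right axis
ahead = b2 @ p_world                        # component along the camera's forward axis
p_cam = R.T @ p_world                       # the same numbers via the transpose

print(round(right, 4), round(ahead, 4))     # 1.0 -1.7321
print(np.allclose(p_cam, [right, ahead]))   # True
print(np.allclose(R @ p_cam, p_world))      # True: converting back recovers the world point
```

The projection shortcut relies on the camera axes being orthonormal; for a general, non-orthogonal basis the dot products would not give the coordinates, and the true inverse $P^{-1}$ is needed.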

When the transformation itself must change frames: similarity

Coordinates of points are only half the story. Often a transformation is defined in one frame and must be applied in another — and that is where the similarity formula $B = P^{-1}AP$ earns its place on the factory floor.

Concretely: suppose the vision pipeline must account for an anisotropic stretch that, in the world frame, doubles east–west distances and leaves north–south distances unchanged (a simplified linear stand-in for something like lens distortion, or a calibrated scaling the pipeline must undo). In the world frame this stretch is the clean diagonal matrix $$S = \begin{bmatrix} 2 & 0 \\ 0 & 1 \end{bmatrix}.$$ But the robot's image-processing code operates in the camera frame. What matrix represents the same stretch in camera coordinates? It cannot be $S$: the camera's axes are rotated, so "east–west" is no longer "the camera's $x'$." We need $S$ re-expressed in the camera frame, which is exactly the similarity transform with $P = R$: $$B = R^{-1} S R = R^{\mathsf{T}} S R.$$
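
Working the product by hand, with $c = \cos\theta$ and $s = \sin\theta$, shows where the cross-terms come from and why the trace and determinant survive for any rotation angle: $$SR = \begin{bmatrix} 2 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} c & -s \\ s & c \end{bmatrix} = \begin{bmatrix} 2c & -2s \\ s & c \end{bmatrix}, \qquad B = R^{\mathsf{T}}(SR) = \begin{bmatrix} c & s \\ -s & c \end{bmatrix}\begin{bmatrix} 2c & -2s \\ s & c \end{bmatrix} = \begin{bmatrix} 2c^2 + s^2 & -cs \\ -cs & c^2 + 2s^2 \end{bmatrix}.$$ Hence $$\operatorname{tr} B = 3(c^2 + s^2) = 3, \qquad \det B = (2c^2 + s^2)(c^2 + 2s^2) - c^2 s^2 = 2(c^2 + s^2)^2 = 2.$$ At $\theta = 60°$, where $c^2 = \tfrac14$, $s^2 = \tfrac34$, and $cs = \tfrac{\sqrt{3}}{4}$: $$B = \begin{bmatrix} 5/4 & -\sqrt{3}/4 \\ -\sqrt{3}/4 & 7/4 \end{bmatrix} \approx \begin{bmatrix} 1.25 & -0.433 \\ -0.433 & 1.75 \end{bmatrix}.$$ The off-diagonal entry $-cs$ vanishes only when $\theta$ is a multiple of $90°$, that is, when the camera axes line up with the world axes.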

```python
# Re-express a world-frame scaling in the camera frame via similarity B = R^-1 S R.
import numpy as np
np.set_printoptions(suppress=True, precision=4)
th = np.deg2rad(60)
R = np.array([[np.cos(th), -np.sin(th)], [np.sin(th), np.cos(th)]])
S = np.array([[2., 0.], [0., 1.]])          # the stretch, in WORLD coordinates
B = R.T @ S @ R                              # the SAME stretch, in CAMERA coordinates
print("B = R^-1 S R =\n", B)                 # [[ 1.25  -0.433] [-0.433  1.75 ]]
print("trace:  S =", np.trace(S), " B =", round(np.trace(B), 4))   # 3.0  3.0
print("det:    S =", round(float(np.linalg.det(S)), 4),
      " B =", round(float(np.linalg.det(B)), 4))                   # 2.0  2.0
```

The camera-frame matrix is $B \approx \begin{bmatrix} 1.25 & -0.433 \\ -0.433 & 1.75 \end{bmatrix}$, full of off-diagonal cross-terms and nothing like the tidy diagonal $S$. Yet it is the identical physical stretch, merely described from the rotated camera's point of view. The proof that it is the same transformation is the commuting-diagram argument of §16.7; the check that we got the bookkeeping right is the invariants. The trace is $3$ for both $S$ and $B$, and the determinant is $2$ for both: the stretch doubles every east–west extent and leaves north–south extents alone, so it scales every area by a factor of $2$, whichever frame measures it. Area scaling cannot depend on the camera's orientation, so $\det$ must be frame-independent; that is the chapter's invariance result, confirmed numerically.
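
The commuting-diagram claim can also be spot-checked numerically: stretching a point with $S$ in world coordinates and then converting to the camera frame should give the same camera-frame vector as converting first and then applying $B$. A minimal sketch; the random test point (seed 0) and the eigenvalue check are illustrative additions, not part of the original example.

```python
# Spot-check that B = R^T S R is the same stretch as S, seen from the camera frame.
import numpy as np

th = np.deg2rad(60)
R = np.array([[np.cos(th), -np.sin(th)],
              [np.sin(th),  np.cos(th)]])   # camera -> world
S = np.diag([2.0, 1.0])                     # stretch, world coordinates
B = R.T @ S @ R                             # same stretch, camera coordinates

rng = np.random.default_rng(0)
x_world = rng.normal(size=2)                # arbitrary test point, world coordinates
x_cam = R.T @ x_world                       # the same point, camera coordinates

path_1 = R.T @ (S @ x_world)                # stretch in the world frame, then convert to camera
path_2 = B @ x_cam                          # convert to camera, then stretch with B
print(np.allclose(path_1, path_2))          # True: both routes around the diagram agree

# Similar matrices share eigenvalues, not just trace and determinant.
print(np.linalg.eigvalsh(B))                # approximately [1. 2.] (eigvalsh returns ascending order)
```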

The direction of $P$ is where robotics bugs live. The matrix $R$ here has the camera basis vectors as columns, so $R$ maps camera $\to$ world and $R^{-1}=R^{\mathsf{T}}$ maps world $\to$ camera. Many real defects, such as a robot arm that moves the wrong way or a rendered object that appears rotated backwards, come down to a transposed or inverted change-of-basis matrix: someone used $R$ where they needed $R^{\mathsf{T}}$, exactly the Common Pitfall of §16.3. A simple discipline catches it: test on a basis vector. The camera's own forward axis $(0,1)$ in camera coordinates must convert to $R(0,1) = (-\sin 60°, \cos 60°)$ in world coordinates; if your code disagrees, the direction is flipped.
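
The basis-vector test is easy to automate. The helper below, `check_camera_to_world`, is a hypothetical name used only for illustration: it feeds the camera's forward axis $(0,1)$ through a candidate camera-to-world matrix and compares the result with $(-\sin 60°, \cos 60°)$. The correct $R$ passes; the transposed matrix, the classic bug, fails.

```python
# Hypothetical sanity check for the direction of a camera -> world matrix.
import numpy as np

def check_camera_to_world(M, theta):
    """Return True if M sends the camera's forward axis (0, 1) to its world direction."""
    forward_cam = np.array([0.0, 1.0])                           # "straight ahead", camera coords
    expected_world = np.array([-np.sin(theta), np.cos(theta)])   # camera y' axis, world coords
    return np.allclose(M @ forward_cam, expected_world)

th = np.deg2rad(60)
R = np.array([[np.cos(th), -np.sin(th)],
              [np.sin(th),  np.cos(th)]])

print(check_camera_to_world(R, th))     # True:  correct direction
print(check_camera_to_world(R.T, th))   # False: transposed (world -> camera) matrix
```

Testing the right axis $(1,0)$ as well would also catch a matrix that gets the forward axis right but flips or swaps the other axis.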

Chaining frames: the rendering and kinematics pipeline

Real systems stack many frames. A point on a robot's gripper has gripper coordinates; to find its world coordinates you change basis from gripper to forearm to upper-arm to base to world — a chain of rotation (and translation) matrices multiplied together, each one a change-of-basis matrix in the sense of §16.3.1. A graphics engine does the same: model space $\to$ world space $\to$ camera (view) space $\to$ clip space, each arrow a matrix, the whole pipeline one long composition. Because change-of-basis matrices compose by multiplication (transitivity of similarity, §16.7), the entire chain collapses into a single matrix that the GPU applies to every vertex. The mathematics that lets a game draw a million triangles per frame, or a robot arm know where its fingertip is, is precisely the change of basis of this chapter, iterated.
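
A small 2-D sketch of chaining, using made-up joint angles for a hypothetical arm (rotation only, no translation): each link frame is rotated relative to its parent, the chain collapses into one product, and that single matrix converts many gripper-frame points at once. For planar rotations the composite is just a rotation by the summed angle, which gives an easy check.

```python
# Compose a chain of planar rotation frames into one change-of-basis matrix.
import numpy as np

def rot(deg):
    """2-D rotation matrix: child-frame coordinates -> parent-frame coordinates."""
    t = np.deg2rad(deg)
    return np.array([[np.cos(t), -np.sin(t)],
                     [np.sin(t),  np.cos(t)]])

# Hypothetical joint angles, each relative to the parent frame.
base_in_world = rot(30)
upper_in_base = rot(45)
fore_in_upper = rot(-20)
grip_in_fore = rot(10)

# gripper -> forearm -> upper arm -> base -> world, applied right to left
M = base_in_world @ upper_in_base @ fore_in_upper @ grip_in_fore

pts_grip = np.array([[1.0, 0.0],
                     [0.0, 1.0],
                     [0.5, 0.5]]).T            # three points as columns, gripper coordinates
pts_world = M @ pts_grip                       # one matrix multiply converts every point

# Same answer as converting one frame at a time
step = grip_in_fore @ pts_grip
step = fore_in_upper @ step
step = upper_in_base @ step
step = base_in_world @ step
print(np.allclose(pts_world, step))            # True
print(np.allclose(M, rot(30 + 45 - 20 + 10)))  # True: planar rotations add their angles
```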

A note on translation, and why frames are usually "almost" change of basis

A careful reader will object that a real camera frame differs from the world frame not only by a rotation but by a translation — the lens is not at the loading dock. Pure change of basis, as developed in this chapter, handles only the rotation: it relates two bases sharing the same origin, because a basis is a set of directions through the origin and linear maps fix the origin. Translation slides the origin, which is not a linear operation (it sends $\mathbf{0}$ somewhere nonzero), so it falls outside the strict $B = P^{-1}AP$ story.

Robotics and graphics solve this with a clever trick from Chapter 12: homogeneous coordinates. You append a $1$ to every point, lifting a 2-D point $(x, y)$ to the 3-D vector $(x, y, 1)$, and translation becomes a linear map: a $3\times 3$ matrix whose upper-left $2\times 2$ block holds the rotation and whose last column holds the offset in its top two entries. In this enlarged space the full frame transform (rotate and translate) is once again a single matrix, and the apparatus of this chapter carries over: the camera-to-world transform is a $3\times 3$ change-of-basis matrix, its inverse converts world to camera, and chains of frames compose by multiplication. The conceptual content is identical to what we built here; homogeneous coordinates merely buy the extra dimension that turns the affine frame change (rotation plus translation) back into an honest linear change of basis. So even translations, which seem to escape the chapter, are folded back in by the right choice of space, itself a kind of "change of basis" on the problem.
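
A minimal homogeneous-coordinates sketch, assuming (for illustration only) that the camera lens sits at world position $\mathbf{t} = (3, 1)$ with the same $60°$ rotation. Because $R$ is orthogonal, the inverse frame transform has the closed form $\begin{bmatrix} R^{\mathsf{T}} & -R^{\mathsf{T}}\mathbf{t} \\ \mathbf{0}^{\mathsf{T}} & 1 \end{bmatrix}$, so no general matrix inversion is needed; the code compares it with `np.linalg.inv` as a check. The helper names `frame_to_world` and `world_to_frame` are hypothetical.

```python
# Rotation + translation as one 3x3 matrix in homogeneous coordinates.
import numpy as np

def frame_to_world(R, t):
    """3x3 homogeneous matrix: frame coordinates (x, y, 1) -> world coordinates."""
    T = np.eye(3)
    T[:2, :2] = R          # rotational part
    T[:2, 2] = t           # translation: the frame's origin in world coordinates
    return T

def world_to_frame(R, t):
    """Inverse of frame_to_world, using R^-1 = R^T for a rotation."""
    T = np.eye(3)
    T[:2, :2] = R.T
    T[:2, 2] = -R.T @ t
    return T

th = np.deg2rad(60)
R = np.array([[np.cos(th), -np.sin(th)],
              [np.sin(th),  np.cos(th)]])
t = np.array([3.0, 1.0])                     # assumed camera position in the world frame

cam_to_world = frame_to_world(R, t)
world_to_cam = world_to_frame(R, t)
print(np.allclose(world_to_cam, np.linalg.inv(cam_to_world)))   # True

origin_cam = np.array([0.0, 0.0, 1.0])       # the lens itself, in camera coordinates
print(cam_to_world @ origin_cam)             # [3. 1. 1.]: the lens sits at t
```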

The rotational part we worked above carries the orientation information, and it is where the similarity formula bites hardest: operators defined in a frame (a stretch, a blur, an inertia tensor) re-express by $B = P^{-1}AP$ using only the rotational $P$, while positions of points re-express by the full homogeneous matrix. Keeping straight which objects are points (transform by the matrix) and which are operators (transform by similarity) is the daily discipline of a graphics or robotics engineer, and it is exactly the point/transformation distinction this chapter draws.
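
The point/operator split can be checked directly, again assuming an illustrative camera offset $\mathbf{t} = (3, 1)$. Points go through the full rigid transform, but a displacement (the difference of two points) loses the translation, which is why an operator such as the stretch $S$ re-expresses with the rotational part alone, $B = R^{\mathsf{T}} S R$. The helper `world_to_cam` is a hypothetical name.

```python
# Points use the full rigid transform; operators use only its rotational part.
import numpy as np

th = np.deg2rad(60)
R = np.array([[np.cos(th), -np.sin(th)],
              [np.sin(th),  np.cos(th)]])
t = np.array([3.0, 1.0])                    # assumed camera origin in world coordinates
S = np.diag([2.0, 1.0])                     # stretch, world coordinates
B = R.T @ S @ R                             # stretch, camera coordinates (rotation only)

def world_to_cam(p):
    """Convert a world POINT to camera coordinates: undo the translation, then rotate."""
    return R.T @ (p - t)

p, q = np.array([2.0, 0.0]), np.array([5.0, 4.0])   # two world points
d_world = q - p                                     # displacement between them

# The translation cancels in a difference of points...
d_cam = world_to_cam(q) - world_to_cam(p)
print(np.allclose(d_cam, R.T @ d_world))            # True

# ...so stretching the displacement agrees in both frames using B alone.
print(np.allclose(B @ d_cam, R.T @ (S @ d_world)))  # True
```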

The lesson

In robotics and graphics there is never one true coordinate system; there is a web of frames, and competence is the fluency to convert among them without losing track of direction. A point keeps its physical identity while its coordinates change with the frame ($R$ and $R^{\mathsf{T}}$); a transformation keeps its physical action while its matrix changes with the frame ($B = R^{-1}SR$); and the trace and determinant stand as invariants that let you check that the conversion has not corrupted the transformation. The chapter's abstract slogan (the matrix is the shadow, the transformation is the object) is, for a robot, the concrete difference between reaching the shelf and crashing into it.