Prerequisites
- chapter-01-what-is-linear-algebra
Learning Objectives
- Define a vector in both of its equivalent guises — a directed arrow (direction + magnitude) and an ordered list of components — and explain why both views matter.
- Add vectors geometrically (tip-to-tail and the parallelogram rule) and algebraically (componentwise), and recognize the two procedures as the same operation.
- Multiply a vector by a scalar and predict the geometric effect (stretching, shrinking, flipping) from the sign and size of the scalar.
- Read and write a vector's components and coordinates, translating fluently between the picture and the column of numbers, in both 1-indexed math notation and 0-indexed numpy.
- Estimate and compute the magnitude (length) of a vector via the Pythagorean theorem, and form a linear combination of several vectors.
- Implement add, scale, and magnitude from scratch in toolkit/vectors.py and verify them against numpy.
In This Chapter
- 2.1 What is a vector in math?
- 2.2 How do you add two vectors?
- 2.3 What does it mean to multiply a vector by a scalar?
- 2.4 Why are addition and scaling the only operations that matter?
- 2.5 How long is a vector? Magnitude and the Pythagorean theorem
- 2.6 What is the midpoint of two vectors, and how do we blend them?
- 2.7 Components, coordinates, and the 1-indexed/0-indexed gap
- 2.8 Seeing vectors move: scaling and adding with the visualizer
- 2.9 Vectors as data: when there is no arrow to draw
- 2.10 How do velocities and displacements combine? A second application
- 2.11 What color is that, as a vector? RGB and the geometry of blending
- 2.12 Build your toolkit: vectors.py
- 2.13 Summary and the road ahead
Vectors: Direction, Magnitude, and the Language of Space
Learning paths. Math majors — read everything; pay special attention to the careful statement of the eight closure-and-arithmetic properties at the end of §2.4 (they are the seed of the vector-space axioms in Chapter 5) and to the Math-Major Sidebar on what "the same object" really means. CS / Data Science — focus on the Geometric Intuition callouts, the numpy snippets, and the applications; you will spend the rest of your career storing data as vectors, so the two-views idea in §2.1 is the one to internalize. Physics / Engineering — focus on the geometry of addition and scaling, the displacement/velocity framing, and keep the picture of arrows sliding tip-to-tail in your head. This chapter assumes only Chapter 1: the idea that a matrix is a function that transforms space, and the recurring 2D visualizer.
In Chapter 1 we made a promise: that linear algebra is the study of how space can be moved and reshaped, and that a matrix is just the bookkeeping for one of those motions. But before we can transform space, we need the things that live in space — the objects that get moved. Those objects are vectors. A vector is the raw material of everything that follows: the input and output of every transformation, the rows and columns of every matrix, the data point in every dataset, the state of every quantum system. Get vectors right, deeply and geometrically, and the rest of the book has somewhere solid to stand.
This chapter answers the most basic question in the subject — what is a vector in math? — and it answers it twice, because there are two answers and the whole power of linear algebra comes from holding both at once. Then we learn the only two operations vectors really have, addition and scaling, in pictures first and formulas second, exactly as the book's rhythm demands.
2.1 What is a vector in math?
Picture an arrow. Not anchored to any particular spot on the page — just an arrow with a certain length, pointing in a certain direction. Slide it around without rotating or stretching it, and it is still the same arrow, because the only two things that define it are how long it is and which way it points. That is the first answer to our question.
Geometric Intuition — A vector is a directed length: a magnitude (how far) together with a direction (which way), and nothing else. Position is not part of its identity. The arrow from the corner of your desk to your coffee cup and an identical arrow drawn across the room represent the same vector, because they have the same length and the same heading. This is why we are free to slide vectors around — a freedom we will cash in immediately when we add them.
Now here is the second answer, and it looks completely different. A vector is a list of numbers in a definite order. The arrow pointing 3 units east and 1 unit north is the list $(3, 1)$. The arrow pointing 3 units east and 1 unit south is the different list $(3, -1)$. In three dimensions a vector is a list of three numbers; in $n$ dimensions, a list of $n$ numbers. We write a vector in bold lowercase, and by default we stack its numbers vertically as a column:
$$ \mathbf{v} = \begin{bmatrix} 3 \\ 1 \end{bmatrix}, \qquad \mathbf{w} = \begin{bmatrix} 4 \\ 5 \\ 6 \end{bmatrix}, \qquad \mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}. $$
The individual numbers are the components (or coordinates) of the vector. We name them with the same letter, italic and not bold, with a subscript telling us which one: $\mathbf{v}$ has components $v_1 = 3$ and $v_2 = 1$. A vector with $n$ components is an element of $\mathbb{R}^n$, "real $n$-space," and we write $\mathbf{v} \in \mathbb{R}^n$. (The numbers are real for now; in later chapters they may be complex, and the symbol becomes $\mathbb{C}^n$.)
These two answers — arrow and list — are not rivals. They are the same object seen from two sides, and the bridge between them is a coordinate system. Lay down a horizontal axis and a vertical axis (the $x$- and $y$-axes), agree on a unit of length, and any arrow gets a list: walk from the tail to the tip, and record how far you went horizontally ($v_1$) and how far vertically ($v_2$). Run the bridge the other way and any list becomes an arrow: from the origin, go $v_1$ to the right and $v_2$ up, and draw the arrow to where you land.
The Key Insight — A vector is both an arrow and a list. The arrow view tells you what a vector means (direction and magnitude, motion through space); the list view tells you how to compute with it (just numbers). Every idea in this book has a picture and a formula, and a fluent linear algebraist flips between them without friction.
Why insist on both? Because each view does work the other cannot. The arrow makes operations visible: you will literally see why $\mathbf{u} + \mathbf{v} = \mathbf{v} + \mathbf{u}$, no algebra required. The list makes operations scalable: a computer cannot draw a 784-dimensional arrow (a small grayscale image, flattened), but it can add two lists of 784 numbers in microseconds. Geometry gives understanding; coordinates give power. We refuse to give up either.
Common Pitfall — Many students hear "a vector is a list of numbers" and conclude that the list is the vector, full stop — that $(3,1)$ and the arrow are the same thing in the same way that "5" and "five" are. Not quite. The list depends on the coordinate system you chose; rotate the axes and the same arrow gets a different list. The arrow is the invariant object; the list is its shadow in one particular coordinate frame. This distinction looks pedantic now, but it is the entire content of Chapter 16 (change of basis), where one vector wears many lists.
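To make the pitfall concrete, here is a minimal sketch, assuming we measure the arrow $(3,1)$ against a second pair of axes obtained by turning the standard ones counterclockwise by some angle. The helper `components_in_rotated_frame` is a hypothetical name introduced only for this illustration, and the trigonometric formula inside it is borrowed ahead of its proper derivation in the later chapters on transformations and change of basis.

```python
import math

def components_in_rotated_frame(v, theta):
    """Components of the 2D arrow v measured along axes that have been
    rotated counterclockwise by theta radians (hypothetical helper)."""
    x, y = v
    new_x = x * math.cos(theta) + y * math.sin(theta)
    new_y = -x * math.sin(theta) + y * math.cos(theta)
    return (new_x, new_y)

v = (3.0, 1.0)                                    # the list in the standard frame
w = components_in_rotated_frame(v, math.pi / 2)   # axes turned by 90 degrees

print(w)                               # approximately (1.0, -3.0)
print(math.hypot(*v), math.hypot(*w))  # the two lengths agree up to rounding
```

The arrow never moved, yet its list changed from roughly $(3, 1)$ to roughly $(1, -3)$. Its length is the same in both frames, which is one sign that length belongs to the arrow and not to the list.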
Historical Note. The word vector comes from the Latin vehere, "to carry" — a vector "carries" you from one point to another, which is exactly the displacement reading. The modern arrow-and-components concept crystallized in the 1800s out of two streams: William Rowan Hamilton's quaternions (1843), whose imaginary part behaved like a 3D directed quantity, and Hermann Grassmann's more abstract Ausdehnungslehre ("theory of extension," 1844). The clean vector algebra we use today — separate vectors, dot and cross products, the bold-arrow notation — was distilled later by Josiah Willard Gibbs and Oliver Heaviside in the 1880s, largely for physics and electromagnetism. (Exact dates and the precise division of credit between Hamilton, Grassmann, Gibbs, and Heaviside are debated by historians; treat the decade-level story as reliable and the fine attribution as approximate.)
A note on points versus arrows
You will sometimes see $(3, 1)$ called a point and sometimes a vector, and the slippage is deliberate but worth naming. A point is a location; a vector is a displacement. The point $(3,1)$ sits at a spot in the plane. The vector $\begin{bmatrix} 3 \\ 1 \end{bmatrix}$ is the instruction "move 3 right and 1 up." They share the same two numbers because we conventionally draw a vector starting at the origin, so its tip lands exactly on the point with the same coordinates. A vector drawn from the origin like this is called a position vector — it encodes a point as the arrow you'd travel to reach it. A vector drawn between two other points is a displacement vector. Same arithmetic, different story; we will lean on the displacement reading constantly in the case studies.
Check Your Understanding — An arrow starts at the point $(1, 2)$ and ends at the point $(4, 4)$. What is the vector it represents, as a column of components?
Answer
Subtract tail from tip, componentwise: the horizontal change is $4 - 1 = 3$ and the vertical change is $4 - 2 = 2$, so the vector is $\begin{bmatrix} 3 \\ 2 \end{bmatrix}$. Notice that an arrow from $(0,0)$ to $(3,2)$ — or from $(10, 10)$ to $(13, 12)$ — represents the same vector. Only the difference tip $-$ tail matters, which is exactly the "position is not part of a vector's identity" principle made concrete.
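The tip-minus-tail rule is easy to check numerically. The sketch below assumes numpy and uses a hypothetical helper named `displacement` (not part of any toolkit defined so far) to confirm that the three arrows mentioned in the answer all represent the same vector.

```python
import numpy as np

def displacement(tail, tip):
    """Vector represented by the arrow from tail to tip (tip minus tail)."""
    return np.asarray(tip) - np.asarray(tail)

d1 = displacement((1, 2), (4, 4))
d2 = displacement((0, 0), (3, 2))
d3 = displacement((10, 10), (13, 12))

print(d1, d2, d3)   # [3 2] [3 2] [3 2]
same = np.array_equal(d1, d2) and np.array_equal(d2, d3)
print(same)         # True
```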
2.2 How do you add two vectors?
Addition is the first of the two operations a vector has, and it is best met as a motion before it is met as a formula. So here is the motion.
Suppose you take a walk. First you go along the arrow $\mathbf{u}$ — say, 3 blocks east and 1 block north. Then, from wherever you ended up, you walk along the arrow $\mathbf{v}$ — say, 1 block east and 2 blocks north. Where do you end up relative to where you started? The single arrow from your starting point to your final point is the sum $\mathbf{u} + \mathbf{v}$.
Geometric Intuition — To add two vectors, place them tip to tail: draw $\mathbf{u}$, then start $\mathbf{v}$ at the tip of $\mathbf{u}$. The sum $\mathbf{u} + \mathbf{v}$ is the arrow from the tail of $\mathbf{u}$ to the tip of $\mathbf{v}$ — the net journey. Equivalently, if you draw both $\mathbf{u}$ and $\mathbf{v}$ starting from the same origin, they span a parallelogram, and $\mathbf{u} + \mathbf{v}$ is its diagonal. This is the parallelogram rule, and it is the picture worth burning into memory: addition is composing two displacements into one.
Now watch the picture hand us a fact for free. If you walk $\mathbf{u}$ then $\mathbf{v}$, you reach the same final corner as walking $\mathbf{v}$ then $\mathbf{u}$ — the parallelogram has the same far corner either way. So vector addition is commutative: $\mathbf{u} + \mathbf{v} = \mathbf{v} + \mathbf{u}$. We did not manipulate a single symbol; we read it off the diagram. That is the geometry-first method paying its first dividend.
With the picture in hand, the formula is almost anticlimactic. To add two vectors, add their corresponding components:
$$ \mathbf{u} + \mathbf{v} = \begin{bmatrix} u_1 \\ u_2 \end{bmatrix} + \begin{bmatrix} v_1 \\ v_2 \end{bmatrix} = \begin{bmatrix} u_1 + v_1 \\ u_2 + v_2 \end{bmatrix}. $$
In general, for vectors in $\mathbb{R}^n$, the $i$-th component of the sum is $u_i + v_i$. Why does adding components match the tip-to-tail picture? Because horizontal and vertical motions don't interfere. Your total eastward travel is your eastward travel on leg one plus your eastward travel on leg two; likewise for northward. The components add independently, which is exactly what the formula says.
Warning — You can only add two vectors that live in the same space — same number of components. The sum $\begin{bmatrix} 1 \\ 2 \end{bmatrix} + \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}$ is undefined; there is no rule for adding a 2-vector to a 3-vector, and no sensible picture either (an arrow in the plane plus an arrow in space is not anything). Dimension-matching is a real condition, not a formality. In code this surfaces as a shape-mismatch error, and you will see it constantly when wiring datasets together; check dimensions first.
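Here is what the shape-mismatch error looks like in practice, as a small sketch assuming numpy. The function `checked_add` is a hypothetical helper written for this illustration; it is stricter than numpy's own `+` on purpose.

```python
import numpy as np

u = np.array([1, 2])       # lives in R^2
w = np.array([1, 2, 3])    # lives in R^3

try:
    u + w
    mismatch_caught = False
except ValueError as err:
    mismatch_caught = True
    print("cannot add:", err)

def checked_add(a, b):
    """Add two 1-D arrays only if they have the same number of components."""
    a, b = np.asarray(a), np.asarray(b)
    if a.ndim != 1 or a.shape != b.shape:
        raise ValueError(f"dimension mismatch: {a.shape} vs {b.shape}")
    return a + b

print(checked_add([3, 1], [1, 2]))   # [4 3]
```

The explicit check is a design choice, not a necessity: numpy's broadcasting will quietly add a one-component array to a vector of any length (treating it like a scalar), so a mismatch does not always raise an error by itself.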
Hand computation
Let $\mathbf{u} = \begin{bmatrix} 3 \\ 1 \end{bmatrix}$ and $\mathbf{v} = \begin{bmatrix} 1 \\ 2 \end{bmatrix}$. Then
$$ \mathbf{u} + \mathbf{v} = \begin{bmatrix} 3 + 1 \\ 1 + 2 \end{bmatrix} = \begin{bmatrix} 4 \\ 3 \end{bmatrix}. $$
Sketch it to confirm: from the origin go to $(3,1)$; from there go 1 right and 2 up, landing at $(4,3)$. The arrow straight from the origin to $(4,3)$ is the sum. The picture and the formula agree, as they always will.
numpy verification
# Vector addition is componentwise — numpy does it with the + operator.
import numpy as np
u = np.array([3, 1])
v = np.array([1, 2])
print(u + v) # [4 3]
print(v + u) # [4 3] -> addition is commutative
The output [4 3] matches our hand calculation, and printing v + u confirms the commutativity we read off the parallelogram. One small but important habit: numpy's + on arrays is componentwise and is exactly vector addition, but + on Python lists means concatenation ([3,1] + [1,2] gives [3, 1, 1, 2]). Always wrap your data in np.array before doing vector arithmetic.
What is vector subtraction, geometrically? Subtraction is just adding the reverse arrow: $\mathbf{u} - \mathbf{v} = \mathbf{u} + (-\mathbf{v})$, where $-\mathbf{v}$ is $\mathbf{v}$ flipped to point the opposite way (we make that precise in §2.3). There is an even more useful reading: $\mathbf{u} - \mathbf{v}$ is the arrow that points from the tip of $\mathbf{v}$ to the tip of $\mathbf{u}$ when both start at the origin. That is precisely why, back in §2.1, the vector between two points was tip $-$ tail: the displacement from point $P$ to point $Q$ is the position vector of $Q$ minus the position vector of $P$. Subtraction is "how do I get from here to there."
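Both readings of subtraction can be checked in a few lines. This sketch assumes numpy and reuses the vectors $\mathbf{u} = (3,1)$ and $\mathbf{v} = (1,2)$ from the addition example.

```python
import numpy as np

u = np.array([3, 1])
v = np.array([1, 2])

diff = u - v
print(diff)                              # [ 2 -1]

# Reading 1: subtracting v is the same as adding the reversed arrow -v.
print(np.array_equal(diff, u + (-v)))    # True

# Reading 2: starting at the tip of v and walking along u - v lands on the tip of u.
print(np.array_equal(v + diff, u))       # True
```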
2.3 What does it mean to multiply a vector by a scalar?
The second operation is scalar multiplication — multiplying a vector by an ordinary number. In linear algebra an ordinary number is called a scalar, precisely because its job is to scale vectors. Once again, picture first.
Take the vector $\mathbf{v} = \begin{bmatrix} 2 \\ 1 \end{bmatrix}$ and multiply it by $3$. The result $3\mathbf{v}$ is the same arrow pointing the same direction, but three times as long. Multiply by $\tfrac{1}{2}$ and you get an arrow pointing the same way but half as long. Multiply by $0$ and the arrow collapses to a single point at the origin — the zero vector $\mathbf{0}$, the one vector with no direction at all.
What about negative scalars? Multiplying by $-1$ keeps the length the same but reverses the direction — the arrow now points exactly backward. Multiplying by $-2$ both reverses and doubles. So the sign controls direction (forward or backward along the same line) and the magnitude controls length.
Geometric Intuition — Scalar multiplication slides a vector along the line through the origin that it defines. A positive scalar $c$ stretches it (if $c > 1$) or shrinks it (if $0 < c < 1$) without turning it; a negative scalar flips it to the opposite ray and scales it by $|c|$; $c = 0$ crushes it to the origin. Every scalar multiple of $\mathbf{v}$ lies on one infinite straight line through the origin — and that line is exactly the set $\{c\mathbf{v} : c \in \mathbb{R}\}$. Hold onto this: an entire line is the scalar multiples of a single nonzero vector. It is your first glimpse of span, the star of Chapter 6.
The formula is again the obvious one: to multiply a vector by a scalar, multiply every component by that scalar.
$$ c\,\mathbf{v} = c\begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix} = \begin{bmatrix} c\,v_1 \\ c\,v_2 \\ \vdots \\ c\,v_n \end{bmatrix}. $$
Why does scaling every component scale the arrow's length by the same factor and leave its direction alone (or reverse it)? Because stretching both the horizontal and the vertical legs of the arrow by $c$ produces a similar right triangle — same shape, $|c|$ times the size — so the hypotenuse (the arrow) grows by $|c|$ too, and points the same way (or the opposite way if $c<0$). We will make "length" precise in §2.5, but the intuition is already correct.
Hand computation
Let $\mathbf{v} = \begin{bmatrix} 2 \\ 1 \end{bmatrix}$.
$$ 3\mathbf{v} = \begin{bmatrix} 6 \\ 3 \end{bmatrix}, \qquad -1\,\mathbf{v} = \begin{bmatrix} -2 \\ -1 \end{bmatrix}, \qquad \tfrac{1}{2}\mathbf{v} = \begin{bmatrix} 1 \\ 0.5 \end{bmatrix}. $$
The first is three times as long, same direction; the second is the same length, reversed; the third is half as long, same direction. Each lies on the single line through the origin and $(2,1)$.
numpy verification
# Scalar multiplication scales every component (numpy: scalar * array).
import numpy as np
v = np.array([2.0, 1.0])
print(3 * v) # [6. 3.]
print(-1 * v) # [-2. -1.]
print(0.5 * v) # [1. 0.5]
Outputs are [6. 3.], [-2. -1.], and [1. 0.5], matching the hand results. (The decimal points appear because we declared the array as floats with 2.0, 1.0; with integer input np.array([2,1]), 0.5 * v would still promote to floats automatically.)
Common Pitfall — Scalar multiplication does not rotate a vector to a new direction (except the trivial flip by a negative scalar). No real scalar will turn $\begin{bmatrix} 1 \\ 0 \end{bmatrix}$ into $\begin{bmatrix} 0 \\ 1 \end{bmatrix}$, because those arrows point along different lines through the origin, and scaling keeps you on one line. Turning vectors to genuinely new directions is the job of matrices, not scalars — which is the whole reason we need matrices at all. If you ever find yourself wishing a scalar could rotate something, that wish is Chapter 7 knocking.
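The claim that no scalar can turn one direction into a genuinely different one can be tested directly. The sketch below assumes numpy; `scalar_multiple_of` is a hypothetical helper that looks for the only candidate scalar and reports `None` when it does not work.

```python
import numpy as np

def scalar_multiple_of(w, v):
    """Return c with w == c*v if such a real c exists, else None.

    Assumes v is not the zero vector.
    """
    w, v = np.asarray(w, dtype=float), np.asarray(v, dtype=float)
    i = int(np.argmax(np.abs(v)))   # index of a nonzero component of v
    c = w[i] / v[i]                 # the only scalar that could possibly work
    return float(c) if np.allclose(c * v, w) else None

v = np.array([2.0, 1.0])
print(scalar_multiple_of([6, 3], v))         # 3.0
print(scalar_multiple_of([-2, -1], v))       # -1.0
print(scalar_multiple_of([0, 1], [1, 0]))    # None
```

The last line is the pitfall in miniature: the single component that fixes $c$ forces $c = 0$, and $0 \cdot (1, 0)$ is the zero vector, not $(0, 1)$.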
2.4 Why are addition and scaling the only operations that matter?
We now have two operations: add two vectors, and scale one vector by a number. That is a suspiciously short list. Where is vector multiplication? Where is division? The honest and important answer is that these two operations — combined — are the entire foundation of linear algebra, and almost everything else is built from them.
When you scale several vectors and add the results, you form a linear combination:
$$ c_1 \mathbf{v}_1 + c_2 \mathbf{v}_2 + \cdots + c_k \mathbf{v}_k, $$
where the $c_i$ are scalars and the $\mathbf{v}_i$ are vectors. The whole subject is, in a real sense, the study of linear combinations: which vectors you can reach by combining a given set (that's span, Chapter 6), whether any vector in your set is redundant (that's linear independence, Chapter 6), and how transformations act on combinations (that's linearity, the thread from Chapter 1). When we said in Chapter 1 that a transformation $T$ is linear exactly when it respects addition and scaling — $T(\mathbf{u} + \mathbf{v}) = T(\mathbf{u}) + T(\mathbf{v})$ and $T(c\mathbf{v}) = c\,T(\mathbf{v})$ — we were saying it respects linear combinations. These two operations are the operations linear algebra is named for.
Geometric Intuition — A linear combination is a recipe for reaching a point by "spending" some amount of each ingredient vector. With ingredients $\mathbf{e}_1 = \begin{bmatrix}1\\0\end{bmatrix}$ (one step east) and $\mathbf{e}_2 = \begin{bmatrix}0\\1\end{bmatrix}$ (one step north), the combination $3\mathbf{e}_1 + 1\mathbf{e}_2$ takes 3 steps east and 1 north to land at $(3,1)$. Every vector in the plane is some linear combination of $\mathbf{e}_1$ and $\mathbf{e}_2$ — which is why those two are called the standard basis vectors of $\mathbb{R}^2$. The components of a vector are nothing more than the amounts of each basis vector in its recipe.
Let's compute one. With $\mathbf{v}_1 = \begin{bmatrix} 2 \\ 0 \end{bmatrix}$ and $\mathbf{v}_2 = \begin{bmatrix} 0 \\ 3 \end{bmatrix}$, the combination with weights $2$ and $-1$ is
$$ 2\mathbf{v}_1 + (-1)\mathbf{v}_2 = \begin{bmatrix} 4 \\ 0 \end{bmatrix} + \begin{bmatrix} 0 \\ -3 \end{bmatrix} = \begin{bmatrix} 4 \\ -3 \end{bmatrix}. $$
# A linear combination is "scale each, then add."
import numpy as np
v1 = np.array([2, 0])
v2 = np.array([0, 3])
print(2*v1 + (-1)*v2) # [ 4 -3]
The output [ 4 -3] matches. This short snippet is, quietly, the most important computation in the book: scale each vector, add the results. Matrix–vector multiplication (Chapter 7) will turn out to be exactly a linear combination of the matrix's columns. Solving $A\mathbf{x} = \mathbf{b}$ (Chapter 3) will turn out to be asking which linear combination of the columns of $A$ equals $\mathbf{b}$. You are looking at the engine of the whole machine.
Worked example: reaching a target with the right weights
Linear combinations become genuinely interesting when the ingredient vectors do not point along the axes, because then finding the weights is a small puzzle — and that puzzle is exactly what solving a linear system will be. Take the two ingredients $\mathbf{a} = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$ (northeast) and $\mathbf{b} = \begin{bmatrix} 1 \\ -1 \end{bmatrix}$ (southeast). Can we reach the target $\begin{bmatrix} 4 \\ 2 \end{bmatrix}$ as a combination $c_1 \mathbf{a} + c_2 \mathbf{b}$, and if so, with which weights?
Write the combination componentwise. The first component must satisfy $c_1 + c_2 = 4$ and the second $c_1 - c_2 = 2$. Add the two equations: $2c_1 = 6$, so $c_1 = 3$; substitute back to get $c_2 = 1$. So $3\mathbf{a} + 1\mathbf{b}$ should land on $(4,2)$.
$$ 3\begin{bmatrix} 1 \\ 1 \end{bmatrix} + 1\begin{bmatrix} 1 \\ -1 \end{bmatrix} = \begin{bmatrix} 3 \\ 3 \end{bmatrix} + \begin{bmatrix} 1 \\ -1 \end{bmatrix} = \begin{bmatrix} 4 \\ 2 \end{bmatrix}. \checkmark $$
# Which combination of a and b reaches the target [4, 2]?
import numpy as np
a = np.array([1.0, 1.0])
b = np.array([1.0, -1.0])
print(3*a + 1*b) # [4. 2.] -> weights c1=3, c2=1 hit the target
The output [4. 2.] confirms the weights. Two things deserve a flag. First, we just solved a system of two linear equations ($c_1 + c_2 = 4$, $c_1 - c_2 = 2$) by hand — that is the entire subject of Chapters 3 and 4, met here in miniature. Second, because $\mathbf{a}$ and $\mathbf{b}$ point in genuinely different directions (neither is a scalar multiple of the other), every target in the plane is reachable by exactly one choice of weights; the two arrows are enough to build all of $\mathbb{R}^2$. When two vectors fail this — when one is just a scaled copy of the other — their combinations fill only a single line, and most targets become unreachable. That fork (build everything vs. build only a line) is the seed of linear independence and span in Chapter 6.
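The hand solution above can be handed to the computer. This is a preview sketch, assuming numpy's `np.linalg.solve` and `np.linalg.matrix_rank`; the idea of stacking the ingredient vectors as the columns of a matrix is developed properly in later chapters and is used here only as a black box.

```python
import numpy as np

a = np.array([1.0, 1.0])
b = np.array([1.0, -1.0])
target = np.array([4.0, 2.0])

# Put the ingredients side by side as columns, so that solving M c = target
# asks for the weights c1, c2 with c1*a + c2*b = target.
M = np.column_stack([a, b])
weights = np.linalg.solve(M, target)
print(weights)                                  # approximately [3. 1.]

c1, c2 = weights
print(np.allclose(c1 * a + c2 * b, target))     # True

# A scaled copy adds no new direction: rank 1 means only a line is reachable.
print(np.linalg.matrix_rank(np.column_stack([a, 2 * a])))   # 1
```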
Geometric Intuition — Picture the weights as dials. Turning the $c_1$ dial slides you along the $\mathbf{a}$ direction; turning $c_2$ slides you along $\mathbf{b}$. With two dials pointing in independent directions, you can drive to any point in the plane — there is exactly one setting of the dials for each destination. The components of a vector in the basis $\{\mathbf{a}, \mathbf{b}\}$ are precisely those dial settings. We met this for the standard basis $\{\mathbf{e}_1, \mathbf{e}_2\}$ already; the surprise of Chapter 16 is that many different pairs of dials describe the same plane, and the same point reads as different weights on different dials.
The arithmetic of vectors (a preview of Chapter 5)
Addition and scaling obey a handful of rules so natural you have been assuming them all along. For any vectors $\mathbf{u}, \mathbf{v}, \mathbf{w}$ in $\mathbb{R}^n$ and any scalars $c, d$:
1. Commutativity of addition: $\mathbf{u} + \mathbf{v} = \mathbf{v} + \mathbf{u}$.
2. Associativity of addition: $(\mathbf{u} + \mathbf{v}) + \mathbf{w} = \mathbf{u} + (\mathbf{v} + \mathbf{w})$.
3. Additive identity: there is a zero vector $\mathbf{0}$ with $\mathbf{v} + \mathbf{0} = \mathbf{v}$.
4. Additive inverse: every $\mathbf{v}$ has a negative $-\mathbf{v}$ with $\mathbf{v} + (-\mathbf{v}) = \mathbf{0}$.
5. Distributivity over vector sums: $c(\mathbf{u} + \mathbf{v}) = c\mathbf{u} + c\mathbf{v}$.
6. Distributivity over scalar sums: $(c + d)\mathbf{v} = c\mathbf{v} + d\mathbf{v}$.
7. Compatibility of scalar multiplication: $c(d\mathbf{v}) = (cd)\mathbf{v}$.
8. Scalar identity: $1\,\mathbf{v} = \mathbf{v}$.
Each is one line to prove from the componentwise definitions, because each just restates an ordinary arithmetic fact about real numbers, applied component by component. (For instance, rule 5 holds because $c(u_i + v_i) = c u_i + c v_i$ for each $i$, by the distributive law for real numbers.)
Math-Major Sidebar. These eight rules are not a random list — they are precisely the axioms of a vector space, the abstract structure we meet in Chapter 5. The punchline of that chapter is that any set whose elements you can add and scale subject to these eight rules behaves exactly like arrows in $\mathbb{R}^n$, even when the "vectors" are polynomials, continuous functions, or matrices. We verify the rules here for $\mathbb{R}^n$ so that, when the abstraction arrives, you already have a concrete model in hand and the axioms feel like old friends rather than arbitrary demands. Notice also rule 8's quiet necessity: without it, the scalar $1$ could do something other than nothing, and the structure would break in strange ways.
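A numerical spot-check is not a proof, but it is a quick way to see all eight rules hold on concrete vectors. The sketch below assumes numpy; the random vectors, the seed, and the scalars $c = 3$, $d = -2$ are arbitrary choices made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# Integer entries keep every comparison exact (no floating-point rounding).
u, v, w = rng.integers(-9, 10, size=(3, 4))   # three random vectors in R^4
c, d = 3, -2                                  # two arbitrary scalars
zero = np.zeros(4, dtype=int)

checks = {
    "1 commutativity":        np.array_equal(u + v, v + u),
    "2 associativity":        np.array_equal((u + v) + w, u + (v + w)),
    "3 additive identity":    np.array_equal(v + zero, v),
    "4 additive inverse":     np.array_equal(v + (-v), zero),
    "5 distributivity (vec)": np.array_equal(c * (u + v), c * u + c * v),
    "6 distributivity (sca)": np.array_equal((c + d) * v, c * v + d * v),
    "7 compatibility":        np.array_equal(c * (d * v), (c * d) * v),
    "8 scalar identity":      np.array_equal(1 * v, v),
}
for name, ok in checks.items():
    print(name, ok)   # every line should end in True
```

With floating-point entries some of these equalities would hold only up to rounding, which is why the sketch sticks to integers; `np.allclose` would be the comparison to reach for otherwise.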
2.5 How long is a vector? Magnitude and the Pythagorean theorem
A vector has a direction and a magnitude — its length — and we have been using the word "length" informally. Let's pin it down, at least in two and three dimensions, where geometry makes it unavoidable. (The full, $n$-dimensional, abstract treatment of length is the norm, and it gets its own chapter, Chapter 18; here we want just enough to talk about how big a vector is.)
Geometric Intuition — A vector in the plane is the hypotenuse of a right triangle whose legs are its horizontal and vertical components. So its length is exactly what the Pythagorean theorem says a hypotenuse is. The vector $\begin{bmatrix} 3 \\ 4 \end{bmatrix}$ is the hypotenuse of a 3-by-4 right triangle, so its length is $\sqrt{3^2 + 4^2} = \sqrt{25} = 5$. No new idea — just Pythagoras, wearing vector notation.
We write the magnitude of $\mathbf{v}$ with double bars, $\lVert \mathbf{v} \rVert$, and in $\mathbb{R}^2$ and $\mathbb{R}^3$ it is
$$ \left\lVert \begin{bmatrix} v_1 \\ v_2 \end{bmatrix} \right\rVert = \sqrt{v_1^2 + v_2^2}, \qquad \left\lVert \begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix} \right\rVert = \sqrt{v_1^2 + v_2^2 + v_3^2}. $$
The 3D formula is the 2D one with another squared term — apply Pythagoras twice, once in the base plane and once for the vertical rise. The pattern extends to $\mathbb{R}^n$ by the same logic, giving $\lVert \mathbf{v} \rVert = \sqrt{v_1^2 + \cdots + v_n^2}$; we will justify that carefully in Chapter 18 using the dot product, but the formula is the one your intuition already expects.
Warning — Notation collision ahead. The double-bar $\lVert \mathbf{v} \rVert$ means magnitude/length and is always written with double bars; single bars $|x|$ mean the absolute value of a scalar. Writing $|\mathbf{v}|$ for a vector's length is a habit worth breaking now, because in later chapters $|\cdot|$ on a matrix will mean the determinant — a completely different quantity. Use $\lVert \cdot \rVert$ for vectors, always.
Hand computation
$$ \left\lVert \begin{bmatrix} 3 \\ 4 \end{bmatrix} \right\rVert = \sqrt{9 + 16} = \sqrt{25} = 5, \qquad \left\lVert \begin{bmatrix} 1 \\ 2 \\ 2 \end{bmatrix} \right\rVert = \sqrt{1 + 4 + 4} = \sqrt{9} = 3. $$
Both come out to whole numbers because $(3,4,5)$ is a Pythagorean triple and $(1,2,2,3)$ is a Pythagorean quadruple, chosen so you can check the arithmetic in your head. Most vectors have irrational lengths; for instance, $\lVert (1,1) \rVert = \sqrt{2} \approx 1.414$.
numpy verification
# Magnitude (Euclidean length) via numpy's linear-algebra module.
import numpy as np
print(np.linalg.norm(np.array([3, 4]))) # 5.0
print(np.linalg.norm(np.array([1, 2, 2]))) # 3.0
print(np.linalg.norm(np.array([1, 1]))) # 1.4142135623730951
Outputs 5.0, 3.0, and 1.4142135623730951 (which is $\sqrt{2}$) confirm the hand results. The function is np.linalg.norm — "norm" is the grown-up name for length, and we will earn that name in Chapter 18.
Computational Note — np.linalg.norm computes the Euclidean ($\ell^2$) length by default, exactly the Pythagorean formula above. It accepts an ord argument to compute other notions of length (for example, ord=1 sums absolute values, giving the "taxicab" length, useful in optimization and compressed sensing). You don't need those yet, but it's worth knowing the default is one choice among several. Floating-point caveat: squaring very large or very small components can overflow or underflow even when the true length is representable. Do not count on np.linalg.norm to rescale for you; when magnitudes are extreme, divide the vector by its largest absolute component, take the norm, and multiply the result back.
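The two points in this note, the `ord` argument and the overflow risk, can be sketched as follows. This assumes numpy and the standard-library `math.hypot`; `scaled_norm` is a hypothetical helper that shows the rescaling idea by hand and is not a claim about how any library implements its norm.

```python
import math
import numpy as np

v = np.array([3.0, -4.0])
print(np.linalg.norm(v))          # 5.0  (default: Euclidean length)
print(np.linalg.norm(v, ord=1))   # 7.0  (|3| + |-4|, the taxicab length)

def scaled_norm(x):
    """Euclidean length computed after dividing by the largest |component|,
    so that no intermediate square can overflow (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    biggest = np.max(np.abs(x))
    if biggest == 0.0:
        return 0.0
    return float(biggest * np.sqrt(np.sum((x / biggest) ** 2)))

huge = np.array([1e200, 1e200])   # each square, 1e400, exceeds the float64 range
print(scaled_norm(huge))          # about 1.414e200, still finite
print(math.hypot(1e200, 1e200))   # the standard library's overflow-safe hypotenuse
```

After rescaling, every component has absolute value at most 1, so the squares are harmless and the large factor is multiplied back in only at the end.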
A 3D worked example: how far did the drone fly?
Magnitude earns its keep the moment a problem leaves the plane. A delivery drone lifts off from a launch pad and ends up at the position $\begin{bmatrix} 2 \\ 3 \\ 6 \end{bmatrix}$ (in units of, say, 100 meters east, north, and up). How far is it from the pad — the straight-line distance through the air? That distance is the magnitude of the displacement vector:
$$ \left\lVert \begin{bmatrix} 2 \\ 3 \\ 6 \end{bmatrix} \right\rVert = \sqrt{2^2 + 3^2 + 6^2} = \sqrt{4 + 9 + 36} = \sqrt{49} = 7. $$
# Straight-line distance in 3D is the magnitude of the displacement.
import numpy as np
launch = np.array([0.0, 0.0, 0.0])
drone = np.array([2.0, 3.0, 6.0])
print(np.linalg.norm(drone - launch)) # 7.0
The output 7.0 matches: the drone is 7 units from the pad even though no single coordinate is that large, because the three perpendicular legs combine through Pythagoras. The distance between two points is exactly the magnitude of their difference vector, $\lVert \mathbf{q} - \mathbf{p} \rVert$ — distance is subtraction (§2.2) followed by length, in any number of dimensions. This is the workhorse formula behind nearest-neighbor search in machine learning, collision proximity in 3D games, and clustering in data science, all of which boil down to "which points are close?"
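To show how "which points are close?" reduces to subtraction followed by length, here is a toy nearest-neighbor search. It assumes numpy and uses the `axis` argument of `np.linalg.norm`; the three stored points and the query are made-up values for illustration only.

```python
import numpy as np

points = np.array([[0.0, 0.0, 0.0],
                   [2.0, 3.0, 6.0],
                   [1.0, 1.0, 1.0]])
query = np.array([1.0, 2.0, 2.0])

# Subtract the query from every row, then take the length of each difference.
distances = np.linalg.norm(points - query, axis=1)
print(distances)                   # roughly 3.0, 4.243, 1.414

nearest = int(np.argmin(distances))
print(nearest, points[nearest])    # 2 [1. 1. 1.]
```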
Scaling and length
Now we can confirm the claim from §2.3 that scaling by $c$ scales the length by $|c|$. If $\mathbf{w} = c\mathbf{v}$, then each component is $c v_i$, so
$$ \lVert c\mathbf{v} \rVert = \sqrt{(cv_1)^2 + \cdots + (cv_n)^2} = \sqrt{c^2\,(v_1^2 + \cdots + v_n^2)} = |c|\,\lVert \mathbf{v} \rVert. $$
The $|c|$ (not $c$) is the careful part: a length is never negative, and $\sqrt{c^2} = |c|$, so scaling by $-2$ multiplies the length by $2$, not $-2$ — exactly matching the picture of a flip-and-double. A vector scaled to length 1 (divide by its own magnitude, $\mathbf{v}/\lVert\mathbf{v}\rVert$, when $\mathbf{v}\neq\mathbf{0}$) is called a unit vector, and it carries pure direction with the length factored out; we will use unit vectors constantly from Chapter 18 onward.
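Both facts in this paragraph, the $|c|$ scaling law and the unit vector, can be confirmed numerically. The sketch assumes numpy; `normalize` is a hypothetical helper name, and refusing the zero vector is a choice made here because the zero vector has no direction to keep.

```python
import numpy as np

def normalize(v):
    """Return the unit vector v / ||v||; the zero vector is rejected."""
    v = np.asarray(v, dtype=float)
    length = np.linalg.norm(v)
    if length == 0.0:
        raise ValueError("cannot normalize the zero vector")
    return v / length

v = np.array([3.0, 4.0])
c = -2.0

print(np.linalg.norm(c * v))                    # 10.0
print(abs(c) * np.linalg.norm(v))               # 10.0

unit = normalize(v)
print(unit)                                     # [0.6 0.8]
print(np.isclose(np.linalg.norm(unit), 1.0))    # True
```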
Check Your Understanding — Without computing a square root, is the vector $2\begin{bmatrix} 3 \\ 4 \end{bmatrix}$ longer or shorter than $\begin{bmatrix} 3 \\ 4 \end{bmatrix}$, and by exactly what factor?
Answer
Exactly twice as long. Scaling by $c = 2$ scales the length by $|c| = 2$, so since $\lVert (3,4) \rVert = 5$, we have $\lVert (6,8) \rVert = 10$ — no square root needed. You can confirm with Pythagoras: $\sqrt{36 + 64} = \sqrt{100} = 10$. This is why factoring scalars out of a magnitude is such a useful move: it turns a messy length into a known length times a clean number.
2.6 What is the midpoint of two vectors, and how do we blend them?
We promised in §2.2 that subtraction is "how do I get from here to there," and we can now cash that out into one of the most-used computations in all of graphics and data: blending two vectors. The tool is a special family of linear combinations whose weights sum to 1.
Geometric Intuition — Imagine the straight segment from the tip of $\mathbf{a}$ to the tip of $\mathbf{b}$. The combination $(1-t)\mathbf{a} + t\,\mathbf{b}$, as the single dial $t$ slides from $0$ to $1$, traces exactly that segment: at $t=0$ you sit on $\mathbf{a}$, at $t=1$ on $\mathbf{b}$, and at $t=\tfrac12$ on the midpoint halfway between. The weights $1-t$ and $t$ always sum to $1$, which is what keeps you on the segment rather than flying off it. This is the geometric meaning of an average: the midpoint is the place where you've spent equal amounts of each endpoint.
The midpoint of $\mathbf{a}$ and $\mathbf{b}$ is the case $t = \tfrac12$, the plain average $\tfrac12\mathbf{a} + \tfrac12\mathbf{b} = \tfrac{1}{2}(\mathbf{a} + \mathbf{b})$. To see why this is a blend and not magic, rewrite it with subtraction: $(1-t)\mathbf{a} + t\mathbf{b} = \mathbf{a} + t(\mathbf{b} - \mathbf{a})$. Read that aloud: "start at $\mathbf{a}$, then walk a fraction $t$ of the way along the displacement $\mathbf{b} - \mathbf{a}$ from $\mathbf{a}$ to $\mathbf{b}$." The subtraction $\mathbf{b} - \mathbf{a}$ is the direction-and-distance from one tip to the other — exactly the reading we built in §2.2 — and $t$ controls how far along you travel.
Hand computation: midpoint and a quarter-blend
Let $\mathbf{a} = \begin{bmatrix} 1 \\ 2 \end{bmatrix}$ and $\mathbf{b} = \begin{bmatrix} 5 \\ 4 \end{bmatrix}$. The midpoint is
$$ \tfrac{1}{2}(\mathbf{a} + \mathbf{b}) = \tfrac{1}{2}\begin{bmatrix} 6 \\ 6 \end{bmatrix} = \begin{bmatrix} 3 \\ 3 \end{bmatrix}, $$
and the quarter-of-the-way point ($t = \tfrac14$) is
$$ \tfrac{3}{4}\mathbf{a} + \tfrac{1}{4}\mathbf{b} = \mathbf{a} + \tfrac14(\mathbf{b}-\mathbf{a}) = \begin{bmatrix} 1 \\ 2 \end{bmatrix} + \tfrac14\begin{bmatrix} 4 \\ 2 \end{bmatrix} = \begin{bmatrix} 2 \\ 2.5 \end{bmatrix}. $$
Both points lie on the segment from $(1,2)$ to $(5,4)$, the midpoint exactly in the middle and the quarter-point nearer to $\mathbf{a}$, as the weights ($\tfrac34$ on $\mathbf{a}$) demand.
numpy verification
# Linear interpolation (lerp): (1-t)*a + t*b traces the segment a -> b.
import numpy as np
a = np.array([1.0, 2.0])
b = np.array([5.0, 4.0])
print((a + b) / 2) # [3. 3.] -> midpoint (t = 0.5)
print(0.75*a + 0.25*b) # [2. 2.5] -> quarter blend (t = 0.25)
print((1-0.5)*a + 0.5*b) # [3. 3.] -> same midpoint, written as a lerp
The outputs [3. 3.], [2. 2.5], and [3. 3.] match the hand computations. Graphics programmers know $(1-t)\mathbf{a} + t\mathbf{b}$ by the name lerp (linear interpolation), and it is everywhere: smoothly animating a sprite from one position to another, fading one color into another (a color is a vector of red/green/blue components), morphing one shape's vertices into another's. Sweeping $t$ from 0 to 1 over many frames is the animation.
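To make "sweeping $t$ over many frames" concrete, here is a minimal sketch of a lerp helper producing one position per frame. The function name lerp and the frame count are assumptions made for illustration; a real engine would typically derive $t$ from elapsed time.

```python
def lerp(a, b, t):
    """Blend two equal-length vectors: (1 - t) * a + t * b, componentwise."""
    return [(1 - t) * ai + t * bi for ai, bi in zip(a, b)]

start, end = [1.0, 2.0], [5.0, 4.0]
num_frames = 5   # hypothetical frame count, kept small so the output is readable

# t takes the evenly spaced values 0, 0.25, 0.5, 0.75, 1.
frames = [lerp(start, end, k / (num_frames - 1)) for k in range(num_frames)]
for position in frames:
    print(position)
# [1.0, 2.0]
# [2.0, 2.5]
# [3.0, 3.0]
# [4.0, 3.5]
# [5.0, 4.0]
```

The second and third frames are the quarter-point and midpoint computed by hand above; the animation is just those blends played in order.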
The midpoint generalizes from two vectors to many: the average (or centroid) of $k$ vectors is the linear combination with all weights equal to $\tfrac1k$. For three points it is the triangle's center of mass.
# Centroid: the equal-weight average of several position vectors.
import numpy as np
pts = np.array([[0, 0], [4, 0], [2, 3]], dtype=float) # three corners
print(pts.mean(axis=0)) # [2. 1.] -> the centroid (average corner)
The centroid [2. 1.] is $\tfrac13$ of each corner summed. In data science this same operation is the mean of a cluster of data points (each point a vector), and it is the heart of the $k$-means clustering algorithm: repeatedly average the points in each group to find each group's center. Averaging data vectors and finding the midpoint of two arrows are the same linear combination — geometry and data, one operation, exactly the book's second theme.
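The "repeatedly average" idea can be sketched as a single update step. The code below is a simplified illustration, not a full $k$-means implementation: the points and starting centers are made up, the helper names are hypothetical, and math.dist assumes Python 3.8 or later.

```python
import math

def centroid(points):
    """Equal-weight average of a non-empty list of equal-length vectors."""
    k = len(points)
    return [sum(coords) / k for coords in zip(*points)]

def kmeans_step(points, centers):
    """One update: assign each point to its nearest center, then re-average."""
    groups = [[] for _ in centers]
    for p in points:
        nearest = min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
        groups[nearest].append(p)
    # A center that attracted no points is left where it was.
    return [centroid(g) if g else c for g, c in zip(groups, centers)]

# Made-up data: two loose clumps, near (0, 0) and near (8, 8).
points = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [8.0, 8.0], [9.0, 8.0], [8.0, 9.0]]
centers = [[0.0, 0.0], [9.0, 9.0]]   # hypothetical starting guesses

new_centers = kmeans_step(points, centers)
print(new_centers)   # roughly [[0.33, 0.33], [8.33, 8.33]]
```

Each new center is the centroid of the points assigned to it, so the only vector operations involved are subtraction and length (to find the nearest center) and the equal-weight linear combination (to average).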
Check Your Understanding — You want the point one-third of the way from $\mathbf{a} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$ to $\mathbf{b} = \begin{bmatrix} 9 \\ 6 \end{bmatrix}$. What weights do you use, and where do you land?
Answer
One-third of the way means $t = \tfrac13$, so the weights are $1 - t = \tfrac23$ on $\mathbf{a}$ and $t = \tfrac13$ on $\mathbf{b}$. You land at $\tfrac23\mathbf{a} + \tfrac13\mathbf{b} = \mathbf{a} + \tfrac13(\mathbf{b}-\mathbf{a}) = \tfrac13\begin{bmatrix} 9 \\ 6 \end{bmatrix} = \begin{bmatrix} 3 \\ 2 \end{bmatrix}$ (since $\mathbf{a}$ is the origin here). The larger weight sits on $\mathbf{a}$ because you're staying closer to $\mathbf{a}$ — a good sanity check whenever you blend.
Real-World Application — portfolio returns (economics/finance). An investor's portfolio is a linear combination of assets, with the weights being the fractions of money in each asset — and those weights sum to 1, exactly like a blend. If a stock's returns across three economic scenarios are $\mathbf{s} = \begin{bmatrix} 0.10 \\ -0.05 \\ 0.20 \end{bmatrix}$ and a bond's are $\mathbf{r} = \begin{bmatrix} 0.03 \\ 0.04 \\ 0.02 \end{bmatrix}$, then a 60/40 portfolio has returns $0.6\,\mathbf{s} + 0.4\,\mathbf{r} = \begin{bmatrix} 0.072 \\ -0.014 \\ 0.128 \end{bmatrix}$ in those scenarios. Tilting the weights tilts the whole return vector — more stock raises the upside and deepens the downside. Modern portfolio theory is, at its core, choosing the weights of a linear combination of asset vectors to balance expected return against risk; the operation is the one you just learned.
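The portfolio blend can be checked the same way as any other lerp. This sketch reuses the stock and bond return vectors from the application above; the function name portfolio and the extra weightings 0.4 and 0.8 are assumptions added to show the "tilt".

```python
stock = [0.10, -0.05, 0.20]   # returns in the three scenarios (from the text)
bond = [0.03, 0.04, 0.02]

def portfolio(w_stock):
    """Blend the two return vectors with weights w_stock and 1 - w_stock."""
    w_bond = 1 - w_stock
    return [w_stock * s + w_bond * b for s, b in zip(stock, bond)]

for w in (0.4, 0.6, 0.8):
    # Round for display only; the raw floats carry tiny rounding noise.
    print(w, [round(x, 3) for x in portfolio(w)])
# 0.4 [0.058, 0.004, 0.092]
# 0.6 [0.072, -0.014, 0.128]
# 0.8 [0.086, -0.032, 0.164]
```

The 0.6 row reproduces the 60/40 vector computed above. Moving from 0.4 to 0.8 raises the best-scenario return and pushes the middle scenario from a small gain to a loss, which is the trade-off described in the text.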
2.7 Components, coordinates, and the 1-indexed/0-indexed gap
We have been writing vectors as columns of numbers, but it pays to be explicit about how those numbers are addressed, because mathematics and code disagree, and the disagreement causes real bugs.
In mathematics, components are 1-indexed: the first component of $\mathbf{v}$ is $v_1$, the second is $v_2$, and the last in $\mathbb{R}^n$ is $v_n$. This matches ordinary counting — first, second, third.
In numpy (and Python, C, and most programming languages), array entries are 0-indexed: the first entry is v[0], the second is v[1], and the last in a length-$n$ array is v[n-1]. So the mathematician's $v_1$ is the programmer's v[0], $v_2$ is v[1], and in general $v_i$ is v[i-1].
Common Pitfall — The off-by-one between math's $v_1$ and numpy's v[0] is one of the most reliable sources of bugs when you first translate formulas into code. A formula that sums $v_1 + v_2 + \cdots + v_n$ becomes a loop for i in range(n): total += v[i] — note range(n) runs 0, 1, ..., n-1, not 1 through n. When in doubt, write out a tiny example by hand and check that v[0] really is the number you call $v_1$. We will flag this gap again whenever it bites; this is its first appearance.
# Math is 1-indexed (v_1, v_2, ...); numpy is 0-indexed (v[0], v[1], ...).
import numpy as np
v = np.array([3, 1])
print(v[0]) # 3 -> this is what we call v_1
print(v[1]) # 1 -> this is what we call v_2
print(len(v)) # 2 -> the dimension n
The output 3, 1, 2 shows v[0] holding the value we name $v_1$. Throughout this book, when prose says "the $i$-th component $v_i$," the corresponding code is v[i-1]; we will write code in 0-indexed numpy and math in 1-indexed notation, and you should expect to translate.
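The summation loop from the pitfall is worth seeing next to its most common mistranslation. The sketch below uses a plain Python list and made-up values; the variable names are illustrative.

```python
v = [3, 1, 4]          # v_1 = 3, v_2 = 1, v_3 = 4 in math notation
n = len(v)

# Correct translation of v_1 + ... + v_n: indices 0 .. n-1.
total = 0
for i in range(n):
    total += v[i]
print(total)           # 8

# Literal (wrong) translation: indices 1 .. n skip v[0] and run past the end.
try:
    bad_total = sum(v[i] for i in range(1, n + 1))
except IndexError as err:
    print("off-by-one:", err)

# To keep the math index i in the loop, shift it when touching the list.
total_shifted = sum(v[i - 1] for i in range(1, n + 1))
print(total_shifted)   # 8
```

Here the mistake fails loudly with an IndexError. The more dangerous variant is a loop over range(1, n), which silently drops the first component and returns a wrong number with no error at all.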
Why do we stack components in a column instead of a row? Pure convention, but a load-bearing one. By default in this book (and in Strang, and in most of physics) a vector is a column, an $n \times 1$ array. The reason is forward-looking: in Chapter 7, multiplying a matrix $A$ by a vector $\mathbf{x}$ is written $A\mathbf{x}$ with the vector on the right as a column, and the shapes only line up if $\mathbf{x}$ is a column. A row vector is the transpose, $\mathbf{v}^{\mathsf{T}}$, and it plays a different role (it eats column vectors to produce numbers, via the dot product of Chapter 18). numpy blurs this distinction — a 1-D array np.array([3,1]) is neither row nor column until you make it 2-D — which is convenient but occasionally confusing; we'll be explicit about shapes whenever it matters.
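A quick look at array shapes shows what "neither row nor column" means in practice. This is a small sketch using standard numpy calls (shape, reshape, the @ operator); the matrix A is just a convenient example.

```python
import numpy as np

v = np.array([3, 1])
print(v.shape)                     # (2,)   -> 1-D: neither row nor column

col = v.reshape(-1, 1)             # explicit column, n x 1
row = v.reshape(1, -1)             # explicit row, 1 x n
print(col.shape, row.shape)        # (2, 1) (1, 2)

A = np.array([[2, 0], [0, 3]])
print((A @ col).shape)             # (2, 1) -> matrix times column is a column
print((A @ v).shape)               # (2,)   -> numpy also accepts the 1-D array
print(np.array_equal(col.T, row))  # True: the row is the transpose of the column
```

The last two products hold the same numbers; only the shape differs. That is the convenience and the source of confusion at once, and printing .shape is the fastest way to find out which kind of object you are holding.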
2.8 Seeing vectors move: scaling and adding with the visualizer
Vectors are the things matrices move, so it is worth a moment to see one being scaled and added, using the recurring 2D visualizer from Chapter 1. The visualizer's main job is to show what a $2\times 2$ matrix does to the whole unit square, and we will use it that way constantly starting in Chapter 7. But scalar multiplication is the very simplest transformation, so it makes a gentle first reuse.
Consider the matrix that scales the $x$-direction by 2 and the $y$-direction by 3:
$$ A = \begin{bmatrix} 2 & 0 \\ 0 & 3 \end{bmatrix}. $$
Applying $A$ to the standard basis vector $\mathbf{e}_1 = \begin{bmatrix}1\\0\end{bmatrix}$ gives $\begin{bmatrix}2\\0\end{bmatrix}$ — that's $\mathbf{e}_1$ scaled by 2 — and applying it to $\mathbf{e}_2 = \begin{bmatrix}0\\1\end{bmatrix}$ gives $\begin{bmatrix}0\\3\end{bmatrix}$, which is $\mathbf{e}_2$ scaled by 3. (We are previewing Chapter 7's central fact: the columns of a matrix are the images of the standard basis vectors.) So this matrix performs scalar multiplication on each basis direction independently, and the visualizer will show the unit square stretched into a $2\times 3$ rectangle.
# Using the visualizer from Chapter 1 to SEE scalar multiplication on basis vectors.
import numpy as np
import matplotlib.pyplot as plt
from toolkit.visualizer import visualize_2d
A = np.array([[2, 0], [0, 3]]) # scale x by 2, y by 3
visualize_2d(A, title="Scaling: e1 -> 2*e1, e2 -> 3*e2")
plt.show()
print(np.linalg.det(A)) # 6.0 -> the area scale factor (Chapter 11)
Figure 2.1. The unit square (blue dashed) stretched into a $2 \times 3$ rectangle (orange) by the scaling matrix $A = \mathrm{diag}(2,3)$. The red arrow is $A\mathbf{e}_1 = (2,0)$ and the green arrow is $A\mathbf{e}_2 = (0,3)$ — each basis vector simply scaled. Alt-text: a small dashed unit square at the origin and a larger solid orange rectangle two units wide and three units tall, with a horizontal red arrow of length 2 and a vertical green arrow of length 3.
The printed determinant 6.0 is the area of the output rectangle ($2 \times 3 = 6$), and it equals the product of the two scale factors. We will not unpack determinants until Chapter 11; for now just notice that the visualizer is already whispering the connection between scaling vectors and scaling area. When the matrix is more interesting than a pure scaling — a shear, a rotation — the same tool will show genuine reshaping, but the principle is identical: a matrix acts on space by acting on vectors.
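If the visualizer is not at hand, the claim that a matrix's columns are the images of the basis vectors can still be checked directly. The sketch below uses only numpy; the shear matrix S is an example chosen here to illustrate a "more interesting" case and is not taken from the text.

```python
import numpy as np

e1 = np.array([1, 0])
e2 = np.array([0, 1])

A = np.array([[2, 0], [0, 3]])   # the scaling matrix from the text
print(A @ e1, A @ e2)            # [2 0] [0 3] -> the columns of A

S = np.array([[1, 1], [0, 1]])   # a shear, chosen here for illustration
print(S @ e1, S @ e2)            # [1 0] [1 1] -> the columns of S
```

Under the shear, $\mathbf{e}_1$ stays put while $\mathbf{e}_2$ tips over to $(1,1)$, so the unit square leans into a parallelogram rather than stretching into a rectangle.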
Geometric Intuition — Why use the visualizer for something as simple as scaling? Because it plants a habit: every time you meet a new matrix, ask "what does it do to the unit square and the basis arrows?" The answer is a picture, and the picture is the meaning. Here the meaning is "stretch horizontally by 2, vertically by 3," and you can read it off the figure without computing anything. This reflex — matrix as motion — is the single most valuable thing this book is trying to give you, and it starts with watching one vector get longer.
2.9 Vectors as data: when there is no arrow to draw
Everything so far has had a geometric picture, but the list view of a vector is powerful precisely because it keeps working when the picture runs out. This is where CS and data science live, so it deserves its own section.
Consider a house for sale, described by four numbers: square footage, bedrooms, bathrooms, and year built. Stack them and you have a vector in $\mathbb{R}^4$:
$$ \mathbf{h} = \begin{bmatrix} 1500 \\ 3 \\ 2 \\ 1995 \end{bmatrix}. $$
There is no arrow to draw — $\mathbb{R}^4$ has no faithful picture on a page — but every operation we defined still works, and now means something about data. Adding two house-vectors and dividing by 2 (a linear combination with weights $\tfrac12, \tfrac12$) gives the componentwise average house: the mean square footage, mean bedroom count, and so on. Subtracting one house-vector from another gives a difference vector whose components say how the two houses differ feature by feature. Scaling a feature vector rescales its units. The geometry is gone, but the algebra is unchanged — and that invariance is exactly why "a vector is a list of numbers" is worth taking seriously even when you can't draw it.
# Vectors as data records: arithmetic still works in R^4 (no picture needed).
import numpy as np
house_a = np.array([1500, 3, 2, 1995])
house_b = np.array([2100, 4, 3, 2010])
print((house_a + house_b) / 2) # [1800. 3.5 2.5 2002.5] -> the "average" house
print(house_b - house_a) # [600 1 1 15] -> feature-by-feature gap
The averaged vector [1800. 3.5 2.5 2002.5] is a linear combination of two data points; the difference [600 1 1 15] reports that house B is 600 sq ft larger, has one more bed and bath, and is 15 years newer. A spreadsheet with a million rows of houses is just a million vectors in $\mathbb{R}^4$ — and a matrix, as we'll see, is exactly a stack of vectors. This is the doorway from linear algebra into machine learning: a dataset is a cloud of points in a high-dimensional space, and learning is finding structure in that cloud.
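The "stack of vectors" remark can be previewed in a few lines. In this sketch the third house is a made-up record added for illustration, and np.vstack is used to place one house per row.

```python
import numpy as np

house_a = np.array([1500, 3, 2, 1995])
house_b = np.array([2100, 4, 3, 2010])
house_c = np.array([1200, 2, 1, 1978])   # hypothetical third record

data = np.vstack([house_a, house_b, house_c])   # one row per house
print(data.shape)                               # (3, 4): 3 vectors in R^4

mean_house = data.mean(axis=0)                  # average down each column
print(mean_house)   # mean sq ft 1600, 3 beds, 2 baths, mean year about 1994.33
```

Averaging along axis 0 is the centroid of §2.6 applied to the rows, so a dataset's "mean row" is the same equal-weight linear combination as the midpoint of two arrows.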
Real-World Application — word embeddings (natural language processing). Modern language models turn each word into a vector of a few hundred numbers, a word embedding, learned so that words used in similar contexts get nearby vectors. The arrow picture is hopeless in 300 dimensions, but the list picture is exactly right: "king," "queen," "man," and "woman" become four vectors, and the famous observation that $\mathbf{king} - \mathbf{man} + \mathbf{woman}$ lands near $\mathbf{queen}$ is literally vector addition and subtraction — the same componentwise operations from §2.2, in a space too big to draw. The "direction" from man to woman encodes a gender relationship, and adding it to "king" walks you to "queen." Case Study 2 develops this in detail. The takeaway: the operations you just learned on 2D arrows are the exact same operations powering systems that read and write language — geometry gives the intuition, the list makes it computable at scale. For the calculus of how such models learn these vectors by following gradients, see vectors in calculus; to actually draw the 2D and 3D vectors of this chapter on real axes, see plotting vectors.
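The arithmetic behind the king/queen observation can be imitated with a toy example. Everything numeric below is invented: the five 2-D vectors are hand-made so that one axis loosely stands for "royalty" and the other for "femaleness", whereas real embeddings are learned from text and have hundreds of components. Real systems also usually rank candidates by cosine similarity rather than the plain distance used here. math.dist assumes Python 3.8 or later.

```python
import math

# Hand-made 2-D toy "embeddings" (axes: royalty, femaleness). These numbers are
# invented for illustration; they are not learned from any data.
words = {
    "king":  [0.9, 0.1],
    "queen": [0.9, 0.9],
    "man":   [0.1, 0.1],
    "woman": [0.1, 0.9],
    "apple": [0.0, 0.5],
}

# king - man + woman, computed componentwise.
target = [k - m + w for k, m, w in zip(words["king"], words["man"], words["woman"])]

# Nearest stored word to the target, using distance = length of the difference.
nearest = min(words, key=lambda name: math.dist(words[name], target))
print(target)    # close to [0.9, 0.9]
print(nearest)   # queen
```

The toy works because the man-to-woman difference is the same vector as the king-to-queen difference by construction. The interesting empirical fact about learned embeddings is that this pattern shows up approximately without anyone building it in.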
Is a vector the same as an array, then? Almost, with one caveat worth stating. In code, a vector is a 1-D array — that is its representation. But mathematically a vector carries more than its entries: it carries the promise that you can add it to other vectors of the same dimension and scale it by numbers, and that these operations obey the eight rules of §2.4. An arbitrary array (say, a list of names, or a list of unrelated settings) is not a vector unless adding and scaling make sense for it. The structure — addition and scaling — is what makes a list of numbers a vector. Keep the distinction in mind and you will avoid the beginner's error of treating every tuple of numbers as something you can meaningfully add.
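Python itself offers a sharp reminder that a list of numbers is not automatically a vector: the built-in + and * on lists do something else entirely. The sketch below uses only plain lists; the variable names are illustrative.

```python
u = [3, 1]
v = [1, 2]

concatenated = u + v      # list "+" concatenates; it is not vector addition
repeated = 2 * v          # list "*" repeats; it is not scalar multiplication
print(concatenated)       # [3, 1, 1, 2]
print(repeated)           # [1, 2, 1, 2]

# Vector addition and scaling have to be spelled out componentwise.
vector_sum = [ui + vi for ui, vi in zip(u, v)]
scaled = [2 * vi for vi in v]
print(vector_sum)         # [4, 3]
print(scaled)             # [2, 4]
```

The entries are identical in both halves of the example. What changes is the meaning given to "add" and "scale", which is the sense in which the operations, not the storage, make something a vector.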
2.10 How do velocities and displacements combine? A second application
Let's rotate the field once more, to motion, because the displacement reading of vectors is where the arrow picture is most vivid — and it is not physics-only, as the navigation and graphics framings show.
When a character in a video game moves, its new position is its old position plus a displacement vector. If the character then receives a second displacement — a push from an explosion, say, or a second input from the player — the net displacement is the vector sum, by exactly the tip-to-tail rule of §2.2. The engine never reasons about "first this, then that" as separate steps when it only needs the result; it just adds the vectors. Velocities combine the same way: a swimmer's velocity relative to the water plus the water's velocity relative to the ground equals the swimmer's velocity relative to the ground — vector addition again.
Here is a concrete navigation version. An airplane's airspeed vector (its velocity relative to the air) points due east at 200 km/h: $\mathbf{p} = \begin{bmatrix} 200 \\ 0 \end{bmatrix}$. A wind blows due north at 30 km/h: $\mathbf{w} = \begin{bmatrix} 0 \\ 30 \end{bmatrix}$. The plane's actual velocity over the ground is the sum, and its ground speed is the magnitude of that sum:
$$ \mathbf{g} = \mathbf{p} + \mathbf{w} = \begin{bmatrix} 200 \\ 30 \end{bmatrix}, \qquad \lVert \mathbf{g} \rVert = \sqrt{200^2 + 30^2} = \sqrt{40900} \approx 202.24 \text{ km/h}. $$
# Navigation: ground velocity = airspeed + wind; ground speed = its magnitude.
import numpy as np
plane = np.array([200.0, 0.0]) # airspeed, due east
wind = np.array([0.0, 30.0]) # wind, due north
ground = plane + wind
print(ground) # [200. 30.]
print(np.linalg.norm(ground)) # 202.23748416156684
The ground velocity [200. 30.] and ground speed 202.237... km/h match the hand computation. Notice both operations of this chapter appear: addition combines the two velocity vectors, and magnitude extracts the resulting speed from the resulting velocity. The wind barely changes the speed (202 vs. 200) but it does nudge the heading slightly north — a fact a pilot must correct for, and a fact that falls straight out of vector addition. The same arithmetic places a spaceship sprite on a screen, steers a robot, and combines forces in a simulation; learn it once, use it everywhere, just as the book's fourth theme promises.
Real-World Application — collision and steering in games. Game physics engines represent every position, velocity, and force as a vector and combine them by addition; "steering behaviors" (seek, flee, arrive) compute a desired-velocity vector and add a correction vector each frame. When two objects collide, the engine adds an impulse vector to each object's velocity. The entire feel of movement in a game — momentum, knockback, drift — is vector addition and scalar multiplication evaluated 60 times a second. The arrow you slid tip-to-tail in §2.2 is, frame by frame, how things move on screen.
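A per-frame update of this kind can be sketched in a few lines. The numbers below (starting velocity, the knockback impulse, the frame at which it lands) are hypothetical, and the update rule shown is the simplest possible one, position plus a time step times velocity; real engines often use more careful integration schemes.

```python
dt = 1 / 60                      # one frame at 60 updates per second
position = [0.0, 0.0]
velocity = [3.0, 0.0]            # hypothetical units per second, moving right
knockback = [0.0, 6.0]           # hypothetical impulse, added at frame 30

for frame in range(60):          # simulate one second
    if frame == 30:
        # Collision: add the impulse vector to the velocity vector.
        velocity = [v + k for v, k in zip(velocity, knockback)]
    # position <- position + dt * velocity  (add a scaled vector)
    position = [p + dt * v for p, v in zip(position, velocity)]

print([round(p, 6) for p in position])   # approximately [3.0, 3.0]
```

The object drifts right for the whole second and upward only for the second half, so it ends about 3 units right and 3 units up. Every line inside the loop is either a vector addition or a scalar multiplication.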
Why does a 30 km/h crosswind barely dent a 200 km/h ground speed? Because lengths combine through Pythagoras, not by simple addition. The speeds don't add to $230$; the vectors add, and the magnitude of the sum is $\sqrt{200^2 + 30^2} \approx 202.24$ — only about 2 km/h more than the airspeed. A component perpendicular to a large one contributes to the total length only through its square, so a small sideways push is nearly invisible in the magnitude (though it does change the heading). This squared-contribution effect is everywhere in linear algebra, and it is exactly why a small amount of noise added orthogonally to a strong signal changes the signal's length very little — an idea we revisit with the dot product in Chapter 18.
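Both effects of the crosswind, the small change in speed and the noticeable change in heading, can be computed together. This sketch uses only the standard math module; measuring the angle from due east toward north with atan2 is a convention chosen for this example.

```python
import math

plane = [200.0, 0.0]   # due east, km/h
wind = [0.0, 30.0]     # due north, km/h
ground = [p + w for p, w in zip(plane, wind)]

speed = math.hypot(*ground)
# Angle of the ground track, measured from due east toward north, in degrees.
drift_deg = math.degrees(math.atan2(ground[1], ground[0]))

print(round(speed, 2))           # 202.24
print(round(drift_deg, 2))       # 8.53
print(round(speed - 200.0, 2))   # 2.24 km/h gained from a 30 km/h wind
```

A 30 km/h push adds only about 2 km/h to the speed but swings the track by roughly eight and a half degrees, which is why the magnitude alone would hide what the wind is doing.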
2.11 What color is that, as a vector? RGB and the geometry of blending
Let's rotate the field one last time, into something you look at every waking second: the color on a screen. It turns out a color is a vector, and the two operations of this chapter — addition and scaling — are exactly how graphics software manufactures, dims, and crossfades color. This is the cleanest place in the whole chapter to see a 3D vector earn its keep, because here the three components are not abstract data but three things your eye can actually distinguish.
A screen makes color by mixing three lights — red, green, and blue — at independently chosen intensities. Stack those three intensities and you have a vector in $\mathbb{R}^3$, the RGB color vector. Using the common 0-to-255 byte convention for each channel,
$$ \mathbf{c}_{\text{red}} = \begin{bmatrix} 255 \\ 0 \\ 0 \end{bmatrix}, \qquad \mathbf{c}_{\text{green}} = \begin{bmatrix} 0 \\ 255 \\ 0 \end{bmatrix}, \qquad \mathbf{c}_{\text{blue}} = \begin{bmatrix} 0 \\ 0 \\ 255 \end{bmatrix}. $$
These three are exactly the scaled standard basis vectors $255\,\mathbf{e}_1$, $255\,\mathbf{e}_2$, $255\,\mathbf{e}_3$ of $\mathbb{R}^3$ — the "primary" directions of color space. Black is the zero vector $\mathbf{0} = (0,0,0)$ (no light at all), and white is $(255,255,255)$ (all three channels at full blast).
Geometric Intuition — The set of all displayable colors fills a cube in $\mathbb{R}^3$: each axis is one channel, running from 0 to 255, and every point inside is a color. Black sits at the origin corner, white at the opposite corner, and the pure red, green, and blue arrows run out along the three axes. Mixing colors is vector addition — walking along one axis and then another — and dimming a color is scalar multiplication toward the black corner. The whole of basic color manipulation is just moving around inside this cube with the two operations you already own.
Now watch the operations mean something visual. Adding red and green, $\mathbf{c}_{\text{red}} + \mathbf{c}_{\text{green}} = (255,255,0)$, gives yellow — a fact about light that surprises anyone raised on mixing paints, but it falls straight out of componentwise addition. Scaling a color by a number between 0 and 1 dims it without changing its hue: half-brightness yellow is $0.5\,(255,255,0) = (127.5, 127.5, 0)$, the same color seen in dimmer light. And the linear-combination machinery from §2.4 is exactly how graphics tools build a custom color from the three primaries: pick the weights, scale each primary, add.
Hand computation: mixing and dimming a 3D color
Let's confirm the yellow and its dimmed version by hand, treating colors as the 3-vectors they are:
$$ \mathbf{c}_{\text{red}} + \mathbf{c}_{\text{green}} = \begin{bmatrix} 255 \\ 0 \\ 0 \end{bmatrix} + \begin{bmatrix} 0 \\ 255 \\ 0 \end{bmatrix} = \begin{bmatrix} 255 \\ 255 \\ 0 \end{bmatrix} \;(\text{yellow}), \qquad 0.5\begin{bmatrix} 255 \\ 255 \\ 0 \end{bmatrix} = \begin{bmatrix} 127.5 \\ 127.5 \\ 0 \end{bmatrix}. $$
The addition runs channel by channel, with no interference between red, green, and blue — the very same independence that made tip-to-tail addition work in §2.2, now in three dimensions instead of two.
# A color is a vector in R^3; mixing is addition, dimming is scalar multiplication.
import numpy as np
red = np.array([255.0, 0.0, 0.0])
green = np.array([ 0.0, 255.0, 0.0])
yellow = red + green
print(yellow) # [255. 255. 0.] -> red + green = yellow
print(0.5 * yellow) # [127.5 127.5 0. ] -> same hue, half brightness
The outputs [255. 255. 0.] and [127.5 127.5 0.] match the hand results. Notice the third channel stays at 0 throughout: yellow has no blue in it, and dimming cannot conjure any, because scaling never changes which axes a vector touches — exactly the §2.3 lesson that scaling keeps you on your own line through the origin, now read as "dimming keeps the hue."
Blending two colors is the lerp from §2.6
Here is where the chapter's threads tie together. A color gradient — the smooth fade from one color to another across a button, a sky, or a progress bar — is nothing but the linear interpolation $(1-t)\mathbf{a} + t\mathbf{b}$ from §2.6, run on two color vectors as $t$ sweeps from 0 to 1. The midpoint of red and blue ($t = \tfrac12$) is
$$ \tfrac12\,\mathbf{c}_{\text{red}} + \tfrac12\,\mathbf{c}_{\text{blue}} = \tfrac12\begin{bmatrix} 255 \\ 0 \\ 255 \end{bmatrix} = \begin{bmatrix} 127.5 \\ 0 \\ 127.5 \end{bmatrix}, $$
a muted purple sitting exactly halfway along the segment from red to blue inside the color cube. Slide $t$ further and you trace the whole gradient.
# A color gradient is a lerp between two color vectors (the §2.6 blend).
import numpy as np
red = np.array([255.0, 0.0, 0.0])
blue = np.array([ 0.0, 0.0, 255.0])
for t in (0.0, 0.25, 0.5, 1.0):
    print(t, (1-t)*red + t*blue)
# 0.0 [255. 0. 0.] -> pure red
# 0.25 [191.25 0. 63.75] -> reddish purple
# 0.5 [127.5 0. 127.5] -> purple (the midpoint)
# 1.0 [ 0. 0. 255.] -> pure blue
The printed values match: at $t=0$ the blend is pure red, at $t=\tfrac12$ it is the purple midpoint $(127.5, 0, 127.5)$, and at $t=1$ it is pure blue, with $t=0.25$ giving the in-between $(191.25, 0, 63.75)$ that leans toward red. This single loop is a one-dimensional gradient. Every fade-to-black transition, every two-color theme, every alpha-blend that composites a semi-transparent layer over a background runs this exact arithmetic per pixel — which is why a graphics card is, at heart, a machine for evaluating linear combinations of vectors at enormous speed.
Common Pitfall — Real channels are clamped to the valid range, so vector arithmetic can leave the cube and must be pulled back in. Adding two bright colors like $(200, 100, 0) + (100, 200, 0) = (300, 300, 0)$ produces components above 255, which no display can show; the software clamps each channel back to 255 before drawing. So color addition is "add the vectors, then clamp into $[0,255]^3$," not pure vector addition — a reminder that a real application often wraps a clean linear operation in a domain constraint. The math is the linear combination; the clamp is the screen's physical limit.
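The "add, then clamp" rule is short enough to write out. The helper name add_colors is hypothetical, and the sketch assumes the 0-to-255 channel convention used in this section.

```python
def add_colors(c1, c2):
    """Componentwise sum, then clamp each channel into [0, 255]."""
    return [min(255, max(0, x + y)) for x, y in zip(c1, c2)]

bright = add_colors([200, 100, 0], [100, 200, 0])
primary = add_colors([255, 0, 0], [0, 255, 0])
print(bright)    # [255, 255, 0] -> the raw sum (300, 300, 0) pulled back in
print(primary)   # [255, 255, 0] -> red + green, which needed no clamping
```

Two different pairs of inputs land on the same output, so the clamped result no longer records how far past the cube the raw sum went. Clamped addition is therefore not a linear operation, even though the step inside it is.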
Real-World Application — color in graphics, design, and image processing (graphics/data science). Treating color as a vector in $\mathbb{R}^3$ is the foundation of nearly all digital imaging. A photograph is a grid of RGB vectors; brightening it scales every pixel vector, and a crossfade between two video frames is a per-pixel lerp. Image editors compute the "difference" between two photos as the vector difference of their pixels (the basis of change-detection and compression), and a simple first measure of the "distance" between two colors (roughly, how different they look) is a magnitude $\lVert \mathbf{c}_1 - \mathbf{c}_2 \rVert$, exactly the distance-is-length-of-a-difference idea from §2.5. Even the average color of an image is the centroid (§2.6) of all its pixel vectors. The operations you learned on 2D arrows are, pixel by pixel, how every image on every screen is made and edited. To plot these RGB vectors and color cubes on real axes, see plotting vectors.
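Several of these per-pixel operations fit in one short sketch. The 2×2 "image" below is made up for illustration, and storing it as an array of shape (rows, columns, 3) is one common layout, assumed here rather than prescribed by the text.

```python
import numpy as np

# A hypothetical 2x2 image: shape (rows, cols, 3), one RGB vector per pixel.
image = np.array([[[255, 0, 0], [0, 255, 0]],
                  [[0, 0, 255], [255, 255, 255]]], dtype=float)

dimmed = 0.5 * image                               # scale every pixel vector
average_color = image.reshape(-1, 3).mean(axis=0)  # centroid of the pixel vectors
red, green = image[0, 0], image[0, 1]
distance = np.linalg.norm(red - green)             # length of the difference

print(dimmed[1, 1])     # [127.5 127.5 127.5] -> white dimmed to mid-gray
print(average_color)    # [127.5 127.5 127.5]
print(distance)         # about 360.62
```

The reshape flattens the grid into a plain list of pixel vectors so that the mean is taken over pixels, not over rows. The image's average color here is the same mid-gray as dimmed white, which is a coincidence of this particular set of pixels.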
2.12 Build your toolkit: vectors.py
You are going to build a working linear-algebra library from scratch over the course of this book, one chapter at a time, with no numpy inside the implementations — numpy appears only to check your work. Chapter 1 gave you the display helper visualizer.py. This chapter starts the real mathematics.
Build Your Toolkit. Create toolkit/vectors.py and implement three pure-Python functions on vectors represented as plain Python lists:
- add(u, v) — return the componentwise sum; raise a ValueError if len(u) != len(v) (the dimension condition from the §2.2 Warning).
- scale(c, v) — return the vector with every component multiplied by the scalar c.
- magnitude(v) — return the Euclidean length $\sqrt{v_1^2 + \cdots + v_n^2}$ (use math.sqrt, not numpy).
Then verify against numpy: for several test vectors, check that add(u, v) equals (np.array(u) + np.array(v)).tolist(), that scale(c, v) matches c * np.array(v), and that magnitude(v) is within floating-point tolerance of np.linalg.norm(v). A reference sketch:
```python
# toolkit/vectors.py — vectors from scratch (no numpy in the implementation).
import math

def add(u, v):
    if len(u) != len(v):
        raise ValueError(f"dimension mismatch: {len(u)} vs {len(v)}")
    return [u[i] + v[i] for i in range(len(u))]

def scale(c, v):
    return [c * v[i] for i in range(len(v))]

def magnitude(v):
    return math.sqrt(sum(v[i] * v[i] for i in range(len(v))))
```
Note the 0-indexing (range(len(v)) runs 0 .. len-1) — the programmer's v[0] is the mathematician's $v_1$, exactly as in §2.7. In Chapter 18 you'll extend this module with dot, norm, and angle; the magnitude you write today is the seed of the general norm.
A quick verification run confirms the three functions agree with numpy:
# Verify the from-scratch functions against numpy.
import numpy as np
from toolkit.vectors import add, scale, magnitude
u, v = [3, 1], [1, 2]
print(add(u, v)) # [4, 3]
print(scale(3, v)) # [3, 6]
print(magnitude([3, 4])) # 5.0
print(np.allclose(add(u, v), np.array(u) + np.array(v))) # True
The outputs [4, 3], [3, 6], 5.0, and True show the pure-Python results matching numpy's. Building these by hand — even though numpy already has them — is the point: you understand an operation when you can implement it, and you will lean on that understanding when later operations (Gaussian elimination, the determinant, the SVD) are too subtle to take on faith.
Why build it from scratch when numpy already has it? Two reasons. First, understanding: a function you have implemented holds no mysteries, and the toolkit's later modules — solving systems, inverting matrices, finding eigenvalues — are exactly the operations whose inner workings you must understand to use them well and to debug them when they misbehave. Second, trust through verification: by checking each from-scratch function against numpy you learn to treat the library not as an oracle but as an independent witness, which is the right professional habit. The rule for the whole toolkit is simple — no numpy inside the implementation, numpy only in the check — and vectors.py is where you adopt it.
Common Pitfall — When you verify floating-point results, never test magnitude(v) == np.linalg.norm(v) with exact equality. Two implementations can carry out the same real-number computation with their floating-point operations in a different order, and every operation rounds, so 1.4142135623730951 from one path may differ in its last bit from the other. Always compare with a tolerance — math.isclose(...) or numpy's np.allclose(...) — which asks "are these equal up to rounding?" rather than "are these bit-for-bit identical?" Exact-equality tests on floats are a classic source of tests that fail mysteriously on a different machine.
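The difference between exact and tolerant comparison shows up even without vectors. The sketch below uses only the standard math module, with math.hypot on three arguments assuming Python 3.8 or later; the two "routes" to a length are examples chosen here.

```python
import math

exact_match = (0.1 + 0.2 == 0.3)
close_match = math.isclose(0.1 + 0.2, 0.3)
print(exact_match)   # False: the two sides differ in the last bits
print(close_match)   # True: equal up to rounding

# Two algebraically identical routes to the length of (0.1, 0.2, 0.3).
v = [0.1, 0.2, 0.3]
by_sum = math.sqrt(sum(x * x for x in v))
by_hypot = math.hypot(*v)
print(math.isclose(by_sum, by_hypot))   # True, whether or not the last bits agree
```

Whether by_sum == by_hypot happens to hold can depend on the platform and library version, which is exactly why a test should not rely on it.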
2.13 Summary and the road ahead
We answered the title question twice. A vector is an arrow — a direction and a magnitude, free to slide because position is not part of its identity — and a vector is a list of components, an element of $\mathbb{R}^n$. The bridge between them is a coordinate system, and fluency is the ability to cross that bridge without thinking.
On these objects there are exactly two operations. Addition composes displacements tip-to-tail (the parallelogram rule) and is computed componentwise; from the picture alone we saw it must be commutative. Scalar multiplication stretches, shrinks, or flips a vector along its own line through the origin, and is computed by scaling every component; it scales length by $|c|$. Together they generate linear combinations, the single most important construction in the subject and the thing every later chapter is secretly about. We also gave magnitude an informal home via the Pythagorean theorem, saving the full theory of the norm for Chapter 18, and we confronted the 1-indexed math / 0-indexed numpy gap that will trip you up in code if you let it.
So what is the single thing to remember from this chapter? That a vector is simultaneously an arrow and a list, and that add and scale are the only two operations — everything else is a linear combination of those. If you keep just that, the rest of the book has a place to attach.
Where this goes: in Chapter 3 we ask which vectors $\mathbf{b}$ can be written as linear combinations of a given set — that question, in the language of equations, is a system of linear equations, and its geometry is intersecting lines and planes. In Chapter 6 the same question becomes span and linear independence. And in Chapter 7 the linear combination $c_1 \mathbf{a}_1 + \cdots + c_n \mathbf{a}_n$ of a matrix's columns turns out to be matrix–vector multiplication — the moment matrices reveal themselves as the functions that transform space. Every one of those ideas is built from the two operations you now own. You have the raw materials; next we start combining them.
The Key Insight — Two operations — add and scale — are the entire foundation. Master them as both motion and arithmetic, and the rest of linear algebra is the study of what you can build by combining vectors and how transformations respect that combining. Everything is a linear combination.