Chapter 29 — Quiz

Twelve conceptual checks on PageRank, stochastic matrices, and power iteration. Try each before opening the answer. One-line explanations follow every solution.

1. PageRank is which part of which eigenpair, and what is the value of the eigenvalue?

Answer PageRank is the **eigenvector** (one rank per page) belonging to **eigenvalue $\lambda = 1$** of the Google matrix. *The eigenvalue is always $1$ — fixed by the stochastic structure — so all the ranking information lives in the eigenvector's entries.*

2. In the link matrix $M$, what does it mean that column $j$ sums to $1$? What does it mean if column $j$ sums to $0$?

Answer Column $j$ summing to $1$ means page $j$ gives away all of its rank, split among its outgoing links (it is column-stochastic). Summing to $0$ means page $j$ has **no outgoing links** — it is a **dangling node**, and its column must be patched. *Column sums encode conservation of rank: a normal page passes its rank along; a dead end leaks it.*

3. Why does the stationary distribution belong to eigenvalue $1$ and not to a larger or smaller eigenvalue?

Answer Because multiplying a probability distribution by a stochastic matrix returns a probability distribution — total weight is conserved. An eigenvalue $>1$ would inflate the total; $|\lambda|<1$ would bleed it away. Only $\lambda = 1$ preserves the total exactly, so the steady state lives there. *The eigenvalue is $1$ precisely because probability is conserved.*

4. What is a dangling node, and what symptom tells you the raw matrix has one when you run power iteration?

Answer A dangling node is a page with no outgoing links (an all-zero column in $M$). The symptom is **leaking probability**: the iterate's entries shrink toward zero each step (the total is no longer $1$), and power iteration converges to the useless zero vector. *A dead end swallows rank and returns none, so the total drains away.*

5. Write the Google matrix formula and say, in words, what each of its two terms means.

Answer $G = d\,S + \tfrac{1-d}{n}\mathbf{1}\mathbf{1}^{\mathsf{T}}$. The first term: with probability $d$ (about $0.85$), follow a link as described by the patched stochastic matrix $S$. The second term: with probability $1-d$, **teleport** to a uniformly random page (every entry $1/n$). *Damping blends "follow the links" with a small "jump anywhere," which is what makes $G$ positive.*

6. State the four conclusions of the Perron-Frobenius theorem for a positive stochastic matrix, and name the hypothesis the Google matrix is built to satisfy.

Answer For a **positive** (every entry $>0$) column-stochastic $G$: (i) $1$ is an eigenvalue; (ii) it is *simple* (unique stationary distribution); (iii) its eigenvector can be chosen *strictly positive*; (iv) every other $|\lambda_i| < 1$ (a *spectral gap*). The hypothesis is **positivity** (equivalently, primitivity = irreducible + aperiodic), which the teleport term guarantees. *Damping forces positivity, which is exactly what PF requires for a unique, fast-to-compute answer.*

7. Why does the damping factor make power iteration converge, and what sets the convergence rate?

Answer Damping pulls every eigenvalue except the Perron $1$ strictly inside the unit circle, creating a spectral gap. Power iteration's error shrinks like $|\lambda_2|^t$, and for the Google matrix $|\lambda_2| \le d$. So the convergence rate is (at worst) the damping factor itself — about $0.85$ per step. *No spectral gap, no convergence; the gap *is* the rate.*

8. A random surfer follows links with probability $0.85$ and teleports with probability $0.15$. After landing on a dangling page (no links), what does she do in the patched model, and why is this the same as teleporting?

Answer At a dangling page she jumps to a uniformly random page — exactly the dangling-node patch, which replaces the dead-end column with the uniform column $1/n$. It is effectively a forced teleport: with no links to follow, the only sensible move is a random jump. *Dead ends are modeled as "teleport with probability $1$."*

9. True or false: a page with no incoming links must have PageRank $0$. Explain.

Answer **False** in the damped model. Every page receives a baseline rank from the teleport term ($\tfrac{1-d}{n}$ of the surfers land there by jumping) plus any spillover from dangling-node patching. Perron-Frobenius conclusion (iii) guarantees *strictly positive* rank for every page. (It would be true for the *undamped* raw matrix.) *Teleportation gives every page a positive floor.*

10. A two-page web has links $1 \to 2$ and $2 \to 1$ only. The raw matrix is $\begin{psmallmatrix}0&1\\1&0\end{psmallmatrix}$. Why does power iteration on this raw matrix fail to converge, and what fixes it?

Answer Its eigenvalues are $+1$ and $-1$ — *tied in magnitude*. Power iteration's rate is $|\lambda_2|/|\lambda_1| = 1$, so the error never shrinks; the surfer oscillates $1 \to 2 \to 1 \to 2$ forever (a periodic trap). Damping fixes it: $G$'s subdominant eigenvalue becomes $d\cdot(-1) = -0.85$, strictly inside the unit circle, so iteration converges to $(\tfrac12,\tfrac12)$. *A magnitude tie stalls power iteration; damping breaks the tie.*

11. PageRank uses only the link structure of the web and ignores the words on a page. Is PageRank, by itself, the complete ranking a modern search engine uses?

Answer No. PageRank is one *query-independent* importance signal computed offline from links. A modern search engine combines it with hundreds of other signals — text relevance to the query, freshness, user behavior, machine-learned models — to produce final results. PageRank was the historic breakthrough, but it is one input among many today. *Link-importance is necessary context, not the whole ranking.*

12. Why is power iteration — rather than np.linalg.eig or solving $(G-I)\mathbf{r}=\mathbf{0}$ — the algorithm actually used to rank the web?

Answer Because the web has billions of pages: the dense Google matrix cannot be stored, and an $O(n^3)$ eigen-decomposition is hopeless at that scale. Power iteration needs only repeated **matrix-times-vector** products, and that product can be computed from the *sparse* list of links (work proportional to the number of links, not $n^2$). The guaranteed spectral gap means a few dozen iterations suffice. *Power iteration exploits sparsity and the spectral gap; direct solvers exploit neither.*