Chapter 29 — Quiz

Q: PageRank is which part of which eigenpair, and what is the value of the eigenvalue?

PageRank is the eigenvector (one rank per page) belonging to eigenvalue of the Google matrix. The eigenvalue is always — fixed by the stochastic structure — so all the ranking information lives in the eigenvector's entries.

Q: Write the Google matrix formula and say, in words, what each of its two terms means.

. The first term: with probability (about ), follow a link as described by the patched stochastic matrix . The second term: with probability , teleport to a uniformly random page (every entry ). Damping blends "follow the links" with a small "jump anywhere," which is what makes positive.

DataField.Dev

Chapter 29 — Quiz

Twelve conceptual checks on PageRank, stochastic matrices, and power iteration. Try each before opening the answer. One-line explanations follow every solution.

1. PageRank is which part of which eigenpair, and what is the value of the eigenvalue?

Answer

PageRank is the **eigenvector** (one rank per page) belonging to **eigenvalue $\lambda = 1$** of the Google matrix. *The eigenvalue is always $1$ — fixed by the stochastic structure — so all the ranking information lives in the eigenvector's entries.*

2. In the link matrix $M$, what does it mean that column $j$ sums to $1$? What does it mean if column $j$ sums to $0$?

Answer

Column $j$ summing to $1$ means page $j$ gives away all of its rank, split among its outgoing links (it is column-stochastic). Summing to $0$ means page $j$ has **no outgoing links** — it is a **dangling node**, and its column must be patched. *Column sums encode conservation of rank: a normal page passes its rank along; a dead end leaks it.*

3. Why does the stationary distribution belong to eigenvalue $1$ and not to a larger or smaller eigenvalue?

Answer

Because multiplying a probability distribution by a stochastic matrix returns a probability distribution — total weight is conserved. An eigenvalue $>1$ would inflate the total; $|\lambda|<1$ would bleed it away. Only $\lambda = 1$ preserves the total exactly, so the steady state lives there. *The eigenvalue is $1$ precisely because probability is conserved.*

4. What is a dangling node, and what symptom tells you the raw matrix has one when you run power iteration?

Answer

A dangling node is a page with no outgoing links (an all-zero column in $M$). The symptom is **leaking probability**: the iterate's entries shrink toward zero each step (the total is no longer $1$), and power iteration converges to the useless zero vector. *A dead end swallows rank and returns none, so the total drains away.*

5. Write the Google matrix formula and say, in words, what each of its two terms means.

Answer

$G = d\,S + \tfrac{1-d}{n}\mathbf{1}\mathbf{1}^{\mathsf{T}}$. The first term: with probability $d$ (about $0.85$), follow a link as described by the patched stochastic matrix $S$. The second term: with probability $1-d$, **teleport** to a uniformly random page (every entry $1/n$). *Damping blends "follow the links" with a small "jump anywhere," which is what makes $G$ positive.*

6. State the four conclusions of the Perron-Frobenius theorem for a positive stochastic matrix, and name the hypothesis the Google matrix is built to satisfy.

Answer

For a **positive** (every entry $>0$) column-stochastic $G$: (i) $1$ is an eigenvalue; (ii) it is *simple* (unique stationary distribution); (iii) its eigenvector can be chosen *strictly positive*; (iv) every other $|\lambda_i| < 1$ (a *spectral gap*). The hypothesis is **positivity** (equivalently, primitivity = irreducible + aperiodic), which the teleport term guarantees. *Damping forces positivity, which is exactly what PF requires for a unique, fast-to-compute answer.*

7. Why does the damping factor make power iteration converge, and what sets the convergence rate?

Answer

Damping pulls every eigenvalue except the Perron $1$ strictly inside the unit circle, creating a spectral gap. Power iteration's error shrinks like $|\lambda_2|^t$, and for the Google matrix $|\lambda_2| \le d$. So the convergence rate is (at worst) the damping factor itself — about $0.85$ per step. *No spectral gap, no convergence; the gap *is* the rate.*

8. A random surfer follows links with probability $0.85$ and teleports with probability $0.15$. After landing on a dangling page (no links), what does she do in the patched model, and why is this the same as teleporting?

Answer

At a dangling page she jumps to a uniformly random page — exactly the dangling-node patch, which replaces the dead-end column with the uniform column $1/n$. It is effectively a forced teleport: with no links to follow, the only sensible move is a random jump. *Dead ends are modeled as "teleport with probability $1$."*

9. True or false: a page with no incoming links must have PageRank $0$. Explain.

Answer

**False** in the damped model. Every page receives a baseline rank from the teleport term ($\tfrac{1-d}{n}$ of the surfers land there by jumping) plus any spillover from dangling-node patching. Perron-Frobenius conclusion (iii) guarantees *strictly positive* rank for every page. (It would be true for the *undamped* raw matrix.) *Teleportation gives every page a positive floor.*

10. A two-page web has links $1 \to 2$ and $2 \to 1$ only. The raw matrix is $\begin{psmallmatrix}0&1\\1&0\end{psmallmatrix}$. Why does power iteration on this raw matrix fail to converge, and what fixes it?

Answer

Its eigenvalues are $+1$ and $-1$ — *tied in magnitude*. Power iteration's rate is $|\lambda_2|/|\lambda_1| = 1$, so the error never shrinks; the surfer oscillates $1 \to 2 \to 1 \to 2$ forever (a periodic trap). Damping fixes it: $G$'s subdominant eigenvalue becomes $d\cdot(-1) = -0.85$, strictly inside the unit circle, so iteration converges to $(\tfrac12,\tfrac12)$. *A magnitude tie stalls power iteration; damping breaks the tie.*

11. PageRank uses only the link structure of the web and ignores the words on a page. Is PageRank, by itself, the complete ranking a modern search engine uses?

Answer

No. PageRank is one *query-independent* importance signal computed offline from links. A modern search engine combines it with hundreds of other signals — text relevance to the query, freshness, user behavior, machine-learned models — to produce final results. PageRank was the historic breakthrough, but it is one input among many today. *Link-importance is necessary context, not the whole ranking.*

12. Why is power iteration — rather than np.linalg.eig or solving $(G-I)\mathbf{r}=\mathbf{0}$ — the algorithm actually used to rank the web?

Answer

Because the web has billions of pages: the dense Google matrix cannot be stored, and an $O(n^3)$ eigen-decomposition is hopeless at that scale. Power iteration needs only repeated **matrix-times-vector** products, and that product can be computed from the *sparse* list of links (work proportional to the number of links, not $n^2$). The guaranteed spectral gap means a few dozen iterations suffice. *Power iteration exploits sparsity and the spectral gap; direct solvers exploit neither.*