Application: PageRank — How Google Ranks the Internet with Eigenvectors

DataField.Dev

52 min read

> Learning paths. Math majors — read everything, especially the Perron-Frobenius conditions in §29.5 and the convergence argument in §29.7; the Math-Major Sidebar makes the stochastic-matrix spectrum precise. CS / Data Science — this chapter is for...

Prerequisites

chapter-23-eigenvalues-and-eigenvectors

Learning Objectives

Model a set of web pages as a directed graph and turn it into a column-stochastic link/transition matrix.
Explain the random-surfer model as repeated multiplication by a stochastic matrix, and the rank vector as its stationary distribution.
Identify the PageRank vector as the dominant eigenvector of the Google matrix, belonging to eigenvalue 1.
State the Perron-Frobenius conditions and explain how the damping factor builds the Google matrix that guarantees a unique rank.
Diagnose the dangling-node and reducibility failures of the raw link matrix and explain why damping fixes both.
Run power iteration on a small web by hand and in numpy, and verify the result is the eigenvalue-1 eigenvector against np.linalg.eig.

In This Chapter

29.1 What does it mean to rank the web? The graph picture
29.2 How do you turn a web of links into a matrix?
29.3 What is a column-stochastic matrix, and why is it the right object?
29.4 Why is PageRank an eigenvector? The stationary distribution
29.5 What breaks, and what is the Google matrix? Dangling nodes, dead ends, and damping
29.6 Computing PageRank on a small web, by hand and in numpy
29.7 Why is power iteration the right algorithm? Convergence to the dominant eigenvector
29.8 What goes wrong without the fixes? Seeing the failures
29.9 Build it yourself: the pagerank toolkit function
29.10 Putting it together: the eigenvalue story, paid off
29.11 Frequently asked questions

Exercises Quiz Case Study 01 Case Study 02 Key Takeaways Further Reading

Application: PageRank — How Google Ranks the Internet with Eigenvectors

Learning paths. Math majors — read everything, especially the Perron-Frobenius conditions in §29.5 and the convergence argument in §29.7; the Math-Major Sidebar makes the stochastic-matrix spectrum precise. CS / Data Science — this chapter is for you above all: focus on the graph-to-matrix construction, the random-surfer model, the Google-matrix fix, and the power-iteration code that is the algorithm. Physics / Engineering — read PageRank as a Markov chain reaching equilibrium; the stationary-distribution and spectral-gap ideas are the same ones that govern relaxation toward steady state in any linear dynamical system.

In 1996, two Stanford graduate students faced a problem that the entire web-search industry had failed to solve: given a search query, which of the millions of matching pages should you show first? The reigning search engines ranked pages mostly by counting how often the query words appeared on a page — a method so easy to game that the top results were routinely spam, pages stuffed with invisible keywords. Larry Page and Sergey Brin had a different idea, and it was a linear-algebra idea. Forget the words on a page for a moment, they reasoned; look instead at the links between pages. A link from page X to page Y is a kind of vote — X is vouching for Y, recommending that a reader go there. The web, taken as a whole, is an enormous network of these votes, and buried inside that network is an objective measure of which pages the web itself considers important. That measure is PageRank, and computing it is exactly the eigenvector problem we have been building toward since Chapter 3.

This is the chapter the entire eigenvalue story has been pointing at. We seeded it twice — informally in Chapter 3, where the web first appeared as a system of linear equations, and again in Chapter 23, where we previewed the city-and-suburb migration model and promised that PageRank was the same calculation at planetary scale. Now we have every tool we need. We know what eigenvectors mean (the invariant directions of a transformation, Chapter 23), how to find them (Chapter 24), what it means to apply a matrix over and over (the diagonalization and matrix-power machinery of Chapter 25), and how power iteration locks onto the dominant eigenvector (Chapter 23's Build Your Toolkit). PageRank assembles all of it into a single, world-changing computation: the rank of every page on the web is one giant dominant eigenvector, and Google's original algorithm was nothing more than power iteration on a matrix with a row and a column for every page on the internet.

The promise of the chapter is to take you from a doodle of arrows between pages to a working ranking algorithm — geometrically first, as always, then algebraically, then in code that matches the math number-for-number. We will build a tiny five-page web you can compute by hand, watch its rank vector emerge as the steady state of a random walk, confront the two ways the naive version breaks (dangling pages and disconnected islands), repair them with the famous damping factor, and prove — via the Perron-Frobenius theorem — that the repaired problem has exactly one answer. By the end you will have implemented pagerank in your own toolkit, built on the power_iteration you wrote in Chapter 23, and verified that its output is the eigenvalue-1 eigenvector of the Google matrix. Let us begin where every concept in this book begins: with the picture.

29.1 What does it mean to rank the web? The graph picture

Picture the web not as pages of text but as a network of dots and arrows. Each page is a dot (mathematicians call it a node or vertex), and each hyperlink from one page to another is an arrow pointing from the linking page to the linked page. Because a link goes one way — page X links to page Y, but Y need not link back — the arrows have direction. A collection of dots joined by directed arrows is a directed graph, and the web is the largest directed graph humans have ever built: billions of nodes, hundreds of billions of arrows. Forget, for now, what any page says. PageRank's first and most radical move is to ignore the words entirely and look only at the shape of this graph.

The intuition Page and Brin started from is the one you already use when you judge reputation in the real world. Imagine you are trying to figure out who is important in a community where the only information you have is who recommends whom. A person recommended by many others is probably important. But not all recommendations weigh the same — a recommendation from someone who is themselves highly recommended counts for far more than a recommendation from a nobody. Importance, in other words, is recursive: you are important if important people point to you. PageRank takes this exact idea and applies it to web pages, where "recommend" means "link to."

Geometric Intuition — Think of importance as a fluid that flows through the graph along the arrows. Pour an equal amount of "rank fluid" onto every page to start. Now let each page pour all of its fluid out through its outgoing links, splitting it evenly among the pages it points to. A page with three outgoing links sends a third of its current fluid down each. Run this redistribution over and over and the fluid sloshes around the network, draining out of obscure corners and pooling on the pages that many well-connected pages point to. PageRank is where the fluid settles — the stable distribution that no longer changes when you run another round of pouring. Important pages are the ones where the fluid pools deep.

That image already contains the whole chapter in disguise. "Pour the fluid and redistribute" is multiplication by a matrix. "Run it over and over until it settles" is power iteration. "The stable distribution that no longer changes" is the eigenvector for eigenvalue $1$. Everything that follows is making this precise.

29.1.1 Why counting links is not enough

You might think we could rank pages by a much simpler rule: just count each page's incoming links, and call the page with the most links the most important. This is the web equivalent of counting recommendation letters. It is a reasonable first guess, and it captures part of the truth, but it misses the recursive heart of the idea — and it is trivially easy to cheat.

Suppose a spammer creates a thousand junk pages that all link to their product page. By raw link-counting, the product page now has a thousand incoming links and shoots to the top. PageRank defeats this because those thousand junk pages are themselves pointed to by nobody — they hold almost no rank fluid to begin with, so the thousand votes they cast are each nearly worthless. A single link from a genuinely important page (one that many important pages point to) outweighs a thousand links from worthless ones. The recursion is not a mathematical nicety; it is precisely what makes the measure robust. We cannot get it by counting. We have to solve for it, and solving for it is an eigenvector computation.

The recursion also explains a phenomenon you have surely noticed: a link from a single highly-trusted source — a major news outlet, a university homepage, an encyclopedia — can do more for a page's standing than thousands of links from obscure blogs. There is a deep chicken-and-egg flavor to this. To know how important page $i$ is, you need to know how important the pages linking to it are; but to know their importance, you need to know the importance of the pages linking to them; and so on, in an infinite regress around the whole web. A naive person might despair: the definition seems circular, importance defined in terms of itself. The eigenvector is the resolution of the circularity. Demanding $\mathbf{r} = M\mathbf{r}$ is demanding a set of importance scores that is self-consistent — every page's score equals the weighted sum of its donors' scores, all $n$ equations satisfied simultaneously. A self-consistent solution to a circular definition is exactly a fixed point, and a fixed point of a linear map is exactly an eigenvector with eigenvalue $1$. The apparent paradox of circular importance dissolves into a clean, solvable eigenproblem.

Historical Note — The idea of measuring importance by an eigenvector of a link-style matrix predates the web. Sociologists studying social networks in the mid-twentieth century proposed eigenvector centrality — rank a person by the eigenvector of the "who-knows-whom" matrix — and bibliometricians ranked academic journals by citation eigenvectors. [verify] In economics, Wassily Leontief's input-output models (1930s–40s) and later eigenvector-based influence measures used the same mathematics. [verify] Larry Page and Sergey Brin's 1998 contribution was not the eigenvector idea in the abstract but the random-surfer formulation with damping, applied at web scale with an algorithm (power iteration on a sparse matrix) that actually ran on the internet of the late 1990s. PageRank is named for Larry Page — a pun, since it ranks pages. [verify]

29.2 How do you turn a web of links into a matrix?

To compute anything we must turn the graph into numbers, and the natural container for "who points to whom" is a matrix. This is the same move we have made all through the book: a transformation, an abstract action, becomes a grid of numbers once we fix a coordinate system. Here the "coordinates" are simply the pages, listed in some fixed order, and the matrix will record how rank flows from each page to the others.

Let us build a concrete example — a tiny web of five pages we can carry through the whole chapter by hand. Call the pages $1, 2, 3, 4, 5$ (think of them as five blogs, or five Wikipedia articles). Here is the link structure:

Page $1$ is linked to by pages $2$, $3$, $4$, and $5$ (it is the most-pointed-to page).
Page $2$ is linked to by pages $3$, $4$, and $5$.
Page $3$ is linked to by pages $4$ and $5$.
Page $4$ is linked to by page $5$.
Page $5$ links to pages $1$, $2$, $3$, $4$.

Reading it the other way — by outgoing links, which is what we need — page $5$ links out to all four others; page $4$ links to $1, 2, 3$; page $3$ links to $1, 2$; page $2$ links to $1$; and page $1$ links to nothing (it is the oldest, most foundational page; it cites no one). This is deliberately the shape of a citation network, where newer papers cite older ones, and we will return to that reading in the case studies.

The Key Insight — The matrix we want is the link matrix (or transition matrix) $M$, where the entry $m_{ij}$ records the rank that flows from page $j$ to page $i$. The convention that matters: column $j$ describes where page $j$ sends its rank. If page $j$ has $k$ outgoing links, each linked page receives an equal share $1/k$, so every nonzero entry in column $j$ equals $1/k$. The columns describe the outgoing links; the rows collect the incoming rank.

Why split a page's outgoing rank equally among its links, each getting $1/k$? Because that is the simplest honest model of a reader on that page: with no other information, a reader is equally likely to click any of the $k$ links. Page $j$ has a fixed amount of rank to give away, and it divides that endowment evenly among the pages it endorses. A page with one outgoing link hands its entire rank to that single target; a page with twenty links gives each target a twentieth.

29.2.1 Building the link matrix by hand

Let us write $M$ for our five-page web. We order the pages $1$ through $5$, so row $i$ and column $j$ both refer to page $i$ or $j$ in that order. We fill in column by column, asking for each page "where do its outgoing links go, and how many are there?"

Column $1$ (page $1$'s outgoing links): page $1$ links to nothing. Its column is all zeros — a problem we will name and fix in §29.4.
Column $2$ (page $2$): one outgoing link, to page $1$. So $m_{1,2} = 1$, the rest zero.
Column $3$ (page $3$): two outgoing links, to pages $1$ and $2$. So $m_{1,3} = m_{2,3} = \tfrac12$.
Column $4$ (page $4$): three outgoing links, to pages $1, 2, 3$. So $m_{1,4} = m_{2,4} = m_{3,4} = \tfrac13$.
Column $5$ (page $5$): four outgoing links, to pages $1, 2, 3, 4$. So $m_{1,5} = m_{2,5} = m_{3,5} = m_{4,5} = \tfrac14$.

Assembling these columns gives

$$M = \begin{bmatrix} 0 & 1 & \tfrac12 & \tfrac13 & \tfrac14 \\[2pt] 0 & 0 & \tfrac12 & \tfrac13 & \tfrac14 \\[2pt] 0 & 0 & 0 & \tfrac13 & \tfrac14 \\[2pt] 0 & 0 & 0 & 0 & \tfrac14 \\[2pt] 0 & 0 & 0 & 0 & 0 \end{bmatrix}.$$

Read any column as a recipe for redistributing one page's rank. Read any row as a list of where one page's incoming rank arrives from. Notice that columns $2, 3, 4, 5$ each sum to exactly $1$ — page $j$ gives away all of its rank, no more and no less — while column $1$ sums to $0$, because page $1$ has nothing to give away. That zero column is the first crack in the naive model, and we will return to it. But first, let us understand the property the other four columns have, because it is the key to everything.

One more observation about $M$ will matter enormously at web scale: it is overwhelmingly sparse — almost all of its entries are zero. A typical web page links to a few dozen others, not to a billion, so each column has only a handful of nonzero entries among its billions of slots. The full Google matrix $G$ is dense (every entry is at least the teleport floor $\tfrac{1-d}{n}$), and storing it would be impossible. But we never form $G$ explicitly. The redistribution $G\mathbf{x} = d\,(S\mathbf{x}) + \tfrac{1-d}{n}\mathbf{1}\,(\mathbf{1}^{\mathsf{T}}\mathbf{x})$ splits into a sparse matrix-vector product $S\mathbf{x}$ (touching only the actual links) plus a single scalar correction (since $\mathbf{1}^{\mathsf{T}}\mathbf{x}$ is just the sum of $\mathbf{x}$'s entries). Each power-iteration step therefore costs work proportional to the number of links, not the square of the number of pages — the difference between feasible and impossible. Sparsity is the unsung hero of PageRank: the matrix is astronomically large but almost empty, and power iteration is exactly the algorithm that exploits emptiness.

29.3 What is a column-stochastic matrix, and why is it the right object?

A matrix whose columns are each nonnegative and sum to $1$ is called column-stochastic (sometimes just stochastic, or a Markov matrix). The word "stochastic" means "random," and the name is exactly right: a column-stochastic matrix is a table of probabilities. Each column is a probability distribution — a list of nonnegative numbers summing to $1$ — describing where something goes next.

The Key Insight — A column-stochastic matrix is the matrix of a random process that conserves total probability. If $\mathbf{x}$ is a probability distribution (its entries are nonnegative and sum to $1$, describing "the chance of being on each page right now"), then $M\mathbf{x}$ is again a probability distribution — the chance of being on each page one step later. Total probability is neither created nor destroyed; it only redistributes. This conservation is precisely what column sums of $1$ guarantee.

Let us see why $M\mathbf{x}$ stays a probability distribution, because the argument is short and it is the structural reason stochastic matrices behave so well. Suppose $\mathbf{x} = (x_1, \dots, x_n)$ has nonnegative entries summing to $1$. The entries of $M\mathbf{x}$ are nonnegative (sums of products of nonnegative numbers), so we only need to check that they sum to $1$. The sum of all entries of $M\mathbf{x}$ is

$$\sum_{i} (M\mathbf{x})_i = \sum_i \sum_j m_{ij} x_j = \sum_j \left(\sum_i m_{ij}\right) x_j = \sum_j 1 \cdot x_j = \sum_j x_j = 1,$$

where the crucial step is $\sum_i m_{ij} = 1$ — each column sums to one. Swapping the order of summation and collecting by column, every column contributes its full $x_j$ back, so the grand total is preserved. The column-sum-one condition is not a technical convenience; it is the algebraic encoding of "probability is conserved," and it is why repeated multiplication by $M$ will settle down rather than blow up or decay.

29.3.2 The same idea as a system of equations (the Chapter 3 connection)

It is worth seeing that the eigenvector framing and the system-of-equations framing — the two ways this book introduced PageRank — are the same statement wearing different clothes. Back in Chapter 3 we hinted that a page's rank could be written as an equation: page $i$'s rank equals the sum of the ranks it receives from the pages linking to it, each donor contributing its rank divided by its out-degree. Written out for page $i$,

$$r_i = \sum_{j \,\to\, i} \frac{r_j}{(\text{out-degree of } j)},$$

one such equation per page. That is a linear system in the unknowns $r_1, \dots, r_n$ — exactly the kind we learned to solve in Part I. But collect all $n$ equations into vector form and the right-hand side is precisely $M\mathbf{r}$ (donor $j$ contributes $m_{ij} r_j$ to page $i$), so the whole system is $\mathbf{r} = M\mathbf{r}$. The system-of-equations view of Chapter 3 and the eigenvector view of this chapter are identical: "solve the rank equations" and "find the eigenvector for eigenvalue $1$" describe the same vector $\mathbf{r}$. Chapter 3 lacked the tools to solve a system that large or to know the solution was unique; eigenvalues, Perron-Frobenius, and power iteration are precisely those missing tools. The promise of Chapter 3 is the achievement of Chapter 29.

Real-World Application — Markov chains everywhere. A column-stochastic matrix iterated on a probability vector is exactly a Markov chain — a random process where the next state depends only on the current state, not on the full history. The same mathematics models customer churn (the columns give the chance a subscriber switches plans next month), board-game movement (Monopoly square-to-square transition probabilities), weather (sunny-to-rainy chances), genetics, queueing networks, and Google's web surfer. PageRank is one Markov chain among thousands; what makes it special is its sheer size and the cleverness of the matrix it iterates. For the full theory of these chains — transient versus recurrent states, mixing times, and detailed balance — see Markov chains.

29.3.1 The random surfer: a story for the matrix

The Markov chain hiding in $M$ has a vivid story, and it is the story Page and Brin told. Imagine a random surfer turned loose on the web. She starts on some page and, at each step, picks one of that page's outgoing links uniformly at random and clicks it, arriving at a new page. Then she does it again, and again, forever — a mindless, infinitely patient reader following links with no goal and no memory of where she has been.

Where is the surfer likely to be after a very long time? If $\mathbf{x}_t$ is the probability distribution of her location at step $t$ — a vector whose $i$-th entry is the chance she is on page $i$ — then one click advances her by exactly one multiplication: $\mathbf{x}_{t+1} = M\mathbf{x}_t$. (Check the meaning: the chance she lands on page $i$ next is the sum, over all pages $j$ that link to $i$, of the chance she is on $j$ now times the chance she clicks the $j \to i$ link, which is $m_{ij}$ — exactly the $i$-th entry of $M\mathbf{x}_t$.) The surfer's location after $t$ steps is $\mathbf{x}_t = M^t \mathbf{x}_0$, and PageRank is the long-run fraction of time she spends on each page. Pages she visits often are important; pages she rarely reaches are not. This is the same recursive idea — a page is important if important (frequently-visited) pages link to it — now dressed as a random walk.

Geometric Intuition — The random surfer turns "importance" into "how much of forever do you spend here?" Picture a million surfers released at once, scattered uniformly across the web, all clicking links independently. At first they are spread evenly. But as the clicks pile up, they drain out of dead-end corners and pile up on the well-connected hubs. After enough steps the fraction of surfers on each page stops changing — as many arrive as leave each tick. That frozen, self-sustaining distribution is the PageRank vector. It is the demographic equilibrium of an endless click-stream.

29.4 Why is PageRank an eigenvector? The stationary distribution

Now comes the payoff that justifies placing this chapter at the climax of Part V. We want the surfer's long-run distribution — the one that stops changing. Call it $\mathbf{r}$ (for rank). "Stops changing" means that applying one more step of the random walk leaves it alone: a step takes $\mathbf{r}$ to $M\mathbf{r}$, and we want that to equal $\mathbf{r}$ again. In symbols,

$$\boxed{\,M\mathbf{r} = \mathbf{r}\,}.$$

Stare at that equation. It is the eigen-equation $A\mathbf{v} = \lambda\mathbf{v}$ of Chapter 23, with the matrix $M$, the eigenvector $\mathbf{r}$, and the eigenvalue $\lambda = 1$. A distribution that survives one step of the Markov chain unchanged is precisely an eigenvector of $M$ with eigenvalue $1$. The PageRank vector is not assigned, not voted on, not counted — it is solved for, as the eigenvector belonging to eigenvalue $1$ of the link matrix.

The Key Insight — A page's rank is a fixed point, not a tally. The stationary distribution $\mathbf{r}$ of the random walk satisfies $M\mathbf{r} = \mathbf{r}$, which is the eigen-equation with $\lambda = 1$. PageRank is the dominant eigenvector of the link matrix, belonging to eigenvalue $1$ — this is the "pagerank eigenvector explained" in one sentence. Everything else in the chapter is making sure that eigenvector exists, is unique, and can be computed at the scale of the web.

There is a subtle point worth pinning down, because it is where the two descriptions of PageRank — "the long-run fraction of time the surfer spends on each page" (§29.3.1) and "the distribution that is unchanged by one step" (this section) — secretly meet. These are not obviously the same thing: one is a time-average over a single surfer's infinite journey, the other is a property of a distribution at one instant. That they coincide is the content of the ergodic theorem for Markov chains: for a primitive stochastic matrix, the long-run fraction of time a single trajectory spends in each state equals the stationary distribution $\mathbf{r}$, and moreover the distribution $\mathbf{x}_t = G^t\mathbf{x}_0$ converges to $\mathbf{r}$ from any start $\mathbf{x}_0$. So whether you imagine one surfer wandering forever and tally where she lingers, or a cloud of surfers relaxing toward equilibrium, or simply the algebraic fixed point of $G$, you arrive at the identical vector. The eigenvector for $\lambda = 1$ is the single object all three pictures describe. (The "from any start" universality is conclusion (ii)+(iv) of Perron-Frobenius doing its work: a one-dimensional eigenspace at $\lambda=1$ with everything else decaying means the limit cannot depend on where you began.)

This is the same $\lambda = 1$ steady state we met in Chapter 23's city-and-suburb migration model. There, the population distribution settled at the eigenvector $(\tfrac23, \tfrac13)$ of the migration matrix, "regardless of where they started," and we found it by solving $(P - I)\mathbf{x} = \mathbf{0}$. PageRank is that exact calculation — find the eigenvector for eigenvalue $1$ of a column-stochastic matrix — with billions of states instead of two. The migration model was a two-page web in disguise; the real web is the same problem grown to span the internet.

Why is the eigenvector for $\lambda = 1$ the one we want, and not some other eigenvalue's? Because we proved in §29.3 that multiplying a probability distribution by a stochastic matrix gives back a probability distribution — total weight conserved. A direction that grew (eigenvalue $> 1$) would break conservation by inflating the total; a direction that shrank ($|\lambda| < 1$) would bleed weight away. Only $\lambda = 1$ preserves the total exactly, which is why the steady state lives there. In fact every column-stochastic matrix has $\lambda = 1$ as an eigenvalue, and no eigenvalue is larger in magnitude — a fact we now make precise, because it is the linchpin of the whole method.

29.4.1 Why eigenvalue 1 always exists for a stochastic matrix

Here is a clean reason a column-stochastic matrix must have $1$ as an eigenvalue. The condition "every column of $M$ sums to $1$" says exactly that the all-ones row vector $\mathbf{1}^{\mathsf{T}} = (1, 1, \dots, 1)$ satisfies $\mathbf{1}^{\mathsf{T}} M = \mathbf{1}^{\mathsf{T}}$ — multiplying $M$ on the left by all-ones adds up each column, returning all-ones. So $1$ is an eigenvalue of $M^{\mathsf{T}}$ (with left-eigenvector $\mathbf{1}$). But a matrix and its transpose have the same eigenvalues — they share the characteristic polynomial, since $\det(M - \lambda I) = \det((M-\lambda I)^{\mathsf{T}}) = \det(M^{\mathsf{T}} - \lambda I)$ (the determinant is unchanged by transpose, Chapter 11). Therefore $1$ is an eigenvalue of $M$ as well. The matching right-eigenvector of $M$ for eigenvalue $1$ — the one satisfying $M\mathbf{r} = \mathbf{r}$ — is the stationary distribution we are after. Existence is guaranteed; the harder questions are uniqueness and how to compute it.

Math-Major Sidebar — Why is no eigenvalue of a stochastic matrix larger than $1$ in magnitude? Let $\lambda$ be any eigenvalue of $M$. Since $M$ and $M^{\mathsf{T}}$ share eigenvalues (shown above), let $\mathbf{v}$ be an eigenvector of the row-stochastic $M^{\mathsf{T}}$, whose rows sum to $1$, with $M^{\mathsf{T}}\mathbf{v} = \lambda\mathbf{v}$. Pick the index $k$ with $|v_k|$ maximal, i.e. $|v_k| \ge |v_i|$ for all $i$. From $(M^{\mathsf{T}}\mathbf{v})_k = \lambda v_k$ we get $|\lambda|\,|v_k| = \left|\sum_i (M^{\mathsf{T}})_{ki} v_i\right| \le \sum_i (M^{\mathsf{T}})_{ki} |v_i| \le \sum_i (M^{\mathsf{T}})_{ki} |v_k| = |v_k|$, using nonnegativity of the entries, the triangle inequality, and rows summing to $1$. Dividing by $|v_k| > 0$ gives $|\lambda| \le 1$. So the entire spectrum of a stochastic matrix lies in the closed unit disk, with $\lambda = 1$ on its boundary — exactly the setup power iteration needs (largest eigenvalue on top), provided no other eigenvalue also sits on the unit circle. Ensuring that "provided" is what damping accomplishes.

29.5 What breaks, and what is the Google matrix? Dangling nodes, dead ends, and damping

The clean story — "PageRank is the eigenvalue-$1$ eigenvector of the link matrix" — has two holes that a real web tears open immediately. Both are visible in our five-page example, and both have the same elegant fix. Confronting them honestly is what separates a real algorithm from a fairy tale, and it is the heart of why the Google matrix looks the way it does.

Problem 1: dangling nodes (dead ends). Page $1$ in our web has no outgoing links — its column in $M$ is all zeros. Such a page is called a dangling node, and the web is full of them: PDFs, images, pages that link nowhere, pages whose links the crawler has not yet followed. A dangling node is a leak in the random walk. When the surfer lands on page $1$, she has nowhere to click — the model gives her no next move, so her probability simply vanishes from the system. Iterate the raw matrix $M$ and watch the total probability drain away to zero, because the dangling page swallows whatever reaches it and returns nothing. A "stochastic" matrix with a zero column is not actually stochastic, and its dominant eigenvalue is less than $1$; the steady state collapses to the zero vector, which ranks every page $0$ — useless.

Common Pitfall — Dangling nodes silently destroy the ranking, and the symptom is leaking probability. If you iterate the raw link matrix with even one zero column, the vector's entries shrink toward zero each step (its total is no longer conserved) and you converge to $\mathbf{0}$, not to a meaningful rank. The fix is to decide what the surfer does at a dead end: she gives up on links and jumps to a page chosen uniformly at random. Concretely, replace each all-zero column of $M$ with the uniform column $(\tfrac1n, \dots, \tfrac1n)$. This patched matrix $S$ is genuinely column-stochastic — every column now sums to $1$ — and models a surfer who, hitting a dead end, types a random URL and keeps going. The leak is plugged.

Problem 2: reducibility (disconnected islands and traps). Even with no dangling nodes, the web is not one connected piece. It has islands — clusters of pages that link among themselves but never out — and traps — a page or group whose links all point back inside, so the surfer who wanders in can never leave. A matrix whose graph has such disconnected or one-way-trapping structure is called reducible. Reducibility is fatal to uniqueness: if the web splits into two islands, the random walk has two different stationary distributions (one living on each island), and "the" PageRank vector is no longer well defined. The eigenvalue $1$ has more than one independent eigenvector, and the algorithm cannot say which ranking is correct.

The cure for both problems at once is one of the most consequential modeling decisions in the history of computing, and it is breathtakingly simple. We admit that real surfers do not follow links forever. Every so often a reader gets bored, abandons the link they are on, and teleports — jumps to some random page anywhere on the web, typed into the address bar. Let $d$ (the damping factor) be the probability that the surfer follows a link as usual, and $1 - d$ the probability that she teleports to a uniformly random page. Page and Brin set $d = 0.85$: about $85\%$ of the time the surfer clicks a link, about $15\%$ of the time she jumps somewhere random. The resulting transition matrix is the Google matrix:

$$\boxed{\,G = d\,S + \frac{1-d}{n}\,\mathbf{1}\mathbf{1}^{\mathsf{T}}\,}$$

where $S$ is the dangling-patched link matrix, $n$ is the number of pages, and $\mathbf{1}\mathbf{1}^{\mathsf{T}}$ is the all-ones $n \times n$ matrix (so $\tfrac{1}{n}\mathbf{1}\mathbf{1}^{\mathsf{T}}$ has every entry $1/n$, the uniform teleport distribution). In words: with probability $d$, do what the links say ($S$); with probability $1-d$, jump to a page picked uniformly at random (every page equally likely, hence every entry $1/n$).

The Key Insight — The Google matrix $G = dS + \tfrac{1-d}{n}\mathbf{1}\mathbf{1}^{\mathsf{T}}$ blends the link structure with a tiny uniform "teleport" that connects every page to every other. That teleport term, however small, makes $G$ positive (every entry strictly $> 0$): from any page the surfer can reach any other in one step. A positive stochastic matrix is the best-behaved object in this whole theory — and that is exactly what the Perron-Frobenius theorem requires to guarantee a unique PageRank.

29.5.1 The Perron-Frobenius theorem: why there is exactly one answer

The reason damping works is a theorem about positive matrices proved by Oskar Perron in 1907 and extended to nonnegative matrices by Georg Frobenius in 1912. [verify] It is the theoretical bedrock under PageRank, and it deserves a careful statement with its conditions, in keeping with this book's insistence (§10 of the style guide) that we never present a special case as the general rule.

Warning

— The Perron-Frobenius guarantee has conditions, and they are exactly what the Google matrix is engineered to satisfy. The clean version we use: Let $G$ be a column-stochastic matrix that is also positive (every entry strictly greater than $0$). Then (i) $\lambda = 1$ is an eigenvalue of $G$; (ii) it is simple — its eigenspace is one-dimensional, so there is a unique stationary distribution up to scale; (iii) its eigenvector $\mathbf{r}$ can be chosen with all entries strictly positive; and (iv) every other eigenvalue is strictly smaller in magnitude, $|\lambda_i| < 1$ — there is a genuine spectral gap below $1$. The raw link matrix $M$ satisfies none of (ii)–(iv) reliably, because it is neither positive nor even stochastic. The damping term forces every entry of $G$ above zero, which is precisely hypothesis needed; drop the damping and the guarantees evaporate. (The weaker hypothesis that suffices is irreducible and aperiodic — "primitive" — but positivity is the simplest sufficient condition, and the one damping delivers outright.)

It is worth naming the two structural properties the weaker hypothesis bundles together, because they map cleanly onto the two failures of §29.5. A stochastic matrix is irreducible when its graph is strongly connected — you can get from any page to any other page by some path of links. Irreducibility is exactly what rules out the disconnected islands and one-way traps of Problem 2: if every page reaches every other, the random walk cannot get permanently stuck in a sub-region, so there is only one place for the probability to settle, hence a unique stationary distribution. A stochastic matrix is aperiodic when the walk is not forced into a rigid cycle — there is no integer $p > 1$ such that the surfer can only return to a page at steps that are multiples of $p$. Aperiodicity rules out the oscillation that stalls power iteration (the $A \to B \to A \to B$ trap with its eigenvalue $-1$). A matrix that is both irreducible and aperiodic is called primitive, and Perron-Frobenius applies to primitive nonnegative matrices in full force. The teleport term secures both at one stroke: because $G$ is strictly positive, every page links to every page in a single step, which makes the graph trivially strongly connected (irreducible) and gives every page a self-reachable path of every length (aperiodic). Positivity is the sledgehammer; primitivity is the precise nail; either way, damping delivers the hypothesis Perron-Frobenius demands.

Each conclusion is exactly a problem solved. Conclusion (i) says a stationary rank exists. Conclusion (ii) — simplicity — says it is unique: no more disconnected-island ambiguity, because the teleport links every island to every other, fusing the web into one communicating whole. Conclusion (iii) says every page gets a positive rank, so no page is spuriously assigned zero importance. And conclusion (iv) — the spectral gap, $|\lambda_2| < 1$ — is the one that makes the algorithm fast: it is exactly the condition power iteration needs to converge, as we will see in §29.7. Damping does not just patch a bug. It transforms the link matrix into the unique kind of matrix for which the eigenvector problem is guaranteed to have one positive answer, computable by repeated multiplication. That is why every search engine since has used some descendant of it.

29.5.2 What does the damping factor do to the eigenvalues?

There is a beautiful exact relationship between the spectrum of the patched link matrix $S$ and the spectrum of the Google matrix $G$, and it shows precisely how damping controls convergence. If the eigenvalues of $S$ are $1 = \mu_1, \mu_2, \mu_3, \dots, \mu_n$ (with $\mu_1 = 1$ the stochastic Perron value), then the eigenvalues of $G = dS + \tfrac{1-d}{n}\mathbf{1}\mathbf{1}^{\mathsf{T}}$ are

$$1, \quad d\mu_2, \quad d\mu_3, \quad \dots, \quad d\mu_n.$$

The top eigenvalue stays pinned at $1$ (the stationary distribution survives), but every other eigenvalue is multiplied by $d$ and so shrinks toward zero. Since $|\mu_i| \le 1$ for the stochastic matrix $S$, the subdominant eigenvalue of $G$ has magnitude at most $d = 0.85$. The damping factor is, almost exactly, the convergence rate of power iteration — and choosing $d = 0.85$ rather than, say, $0.99$ is a deliberate trade: a smaller $d$ gives a bigger spectral gap and faster convergence, but trusts the actual link structure less. We will watch this $0.85$ ratio appear, to the digit, in the error decay of §29.7. (We verify the relationship numerically below; a proof is left to the exercises.)

29.6 Computing PageRank on a small web, by hand and in numpy

Theory in hand, let us compute the actual PageRank of our five-page web and watch the abstractions become numbers. We will build $S$ (patching the dangling page $1$), form $G$ with $d = 0.85$, and find its eigenvalue-$1$ eigenvector — first conceptually, then with numpy, with the two matching exactly as the book's verification standard demands.

Step 1: patch the dangling node. Page $1$'s column was all zeros. Replace it with the uniform column $(\tfrac15, \tfrac15, \tfrac15, \tfrac15, \tfrac15)$ — a surfer stuck on page $1$ jumps to a uniformly random page. The patched, genuinely-stochastic matrix is

$$S = \begin{bmatrix} \tfrac15 & 1 & \tfrac12 & \tfrac13 & \tfrac14 \\[2pt] \tfrac15 & 0 & \tfrac12 & \tfrac13 & \tfrac14 \\[2pt] \tfrac15 & 0 & 0 & \tfrac13 & \tfrac14 \\[2pt] \tfrac15 & 0 & 0 & 0 & \tfrac14 \\[2pt] \tfrac15 & 0 & 0 & 0 & 0 \end{bmatrix}.$$

Every column now sums to $1$.

Step 2: form the Google matrix. With $d = 0.85$ and $n = 5$, the teleport term adds $\tfrac{1-d}{n} = \tfrac{0.15}{5} = 0.03$ to every entry, and the link part is scaled by $0.85$. So $G = 0.85\,S + 0.03\,\mathbf{1}\mathbf{1}^{\mathsf{T}}$. For instance the top-left entry becomes $0.85 \cdot \tfrac15 + 0.03 = 0.17 + 0.03 = 0.20$, and the entry $g_{1,2} = 0.85 \cdot 1 + 0.03 = 0.88$. Every entry of $G$ is now strictly positive — the Perron-Frobenius hypothesis holds — and every column still sums to $1$.

Step 3: find the eigenvector for $\lambda = 1$. By hand this means solving $(G - I)\mathbf{r} = \mathbf{0}$ and normalizing so the entries sum to $1$, exactly the null-space-of-$(A - \lambda I)$ computation from Chapter 23 — tedious for a $5 \times 5$ but entirely mechanical. Rather than grind through it, we will hand it to numpy and, in §29.7, recover the same answer the way Google actually does: by power iteration. Here is the construction and the eigen-check.

# Build the five-page web, patch the dangling node, form the Google matrix.
import numpy as np
np.set_printoptions(precision=4, suppress=True)

# Raw link matrix M: column j = where page j sends its rank (1/outdeg each).
M = np.array([[0, 1, 1/2, 1/3, 1/4],
              [0, 0, 1/2, 1/3, 1/4],
              [0, 0, 0,   1/3, 1/4],
              [0, 0, 0,   0,   1/4],
              [0, 0, 0,   0,   0  ]])
n = 5
S = M.copy()
S[:, 0] = 1/n                      # patch page 1's all-zero (dangling) column
d = 0.85
G = d * S + (1 - d) / n * np.ones((n, n))   # the Google matrix
print("column sums of G:", G.sum(axis=0))   # [1. 1. 1. 1. 1.]  -> stochastic

w, V = np.linalg.eig(G)                       # full spectrum
print("eigenvalues:", np.round(w, 4))         # dominant is exactly 1; rest complex
idx = np.argmax(w.real)                        # locate the lambda = 1 eigenvector
r = np.abs(V[:, idx].real)
r = r / r.sum()                                # normalize to a probability vector
print("PageRank:", np.round(r, 4))
# eigenvalues: [ 1.+0.j  -0.1627+0.3177j  -0.1627-0.3177j  -0.1773+0.052j  -0.1773-0.052j ]
# PageRank:   [0.4066 0.2198 0.1542 0.1202 0.0991]

The dominant eigenvalue comes back as exactly $1.0$, confirming §29.4.1, and the eigenvector for it, normalized to sum to $1$, is

$$\mathbf{r} \approx (0.4066,\; 0.2198,\; 0.1542,\; 0.1202,\; 0.0991).$$

(The other four eigenvalues come back as two complex-conjugate pairs — our matrix is not symmetric, so by Chapter 26 some of its eigenvalues are complex. That is perfectly fine: only the Perron eigenvalue $1$ and its real, positive eigenvector matter for ranking, and every complex eigenvalue has magnitude strictly less than $1$, as Perron-Frobenius promised.)

Page $1$ wins decisively with rank $0.4066$ — it is pointed to by all four other pages, so the most rank fluid pools there. Page $2$ is second ($0.2198$), pointed to by three pages; then $3$, $4$, and finally $5$ ($0.0991$), which links out to everyone but is pointed to by no one and so accumulates the least rank. The order matches our intuition exactly: importance flows toward the pages the network points at. (Notice page $5$'s fate — it is generous with links but receives none, the web equivalent of someone who recommends everyone but is recommended by no one.)

Computational Note — np.linalg.eig returns the eigenvalues unsorted and the eigenvectors as the columns of V, each normalized to Euclidean length $1$ — not to sum $1$. For a probability interpretation you must renormalize by the sum ($\ell_1$ norm), as the code does with r / r.sum(). The eigenvector may also come back with all-negative entries (since $-\mathbf{r}$ is equally an eigenvector); taking np.abs before normalizing fixes the sign, which is legitimate here because Perron-Frobenius guarantees the true eigenvector has all entries of one sign. Finally, for a real-world web you would never call np.linalg.eig — it costs $O(n^3)$ and demands the whole matrix in memory, hopeless for billions of pages. That is why the real algorithm is power iteration, the subject of the next section.

29.7 Why is power iteration the right algorithm? Convergence to the dominant eigenvector

We could find PageRank by solving $(G - I)\mathbf{r} = \mathbf{0}$ or by calling an eigen-solver, and for five pages either works. But the web has billions of pages, and the Google matrix has billions of rows and columns. No computer can store that matrix densely (it would need $10^{18}$ entries), let alone run an $O(n^3)$ eigen-decomposition on it. The genius of PageRank is partly that its matrix never needs to be stored or decomposed at all: the only operation required is "multiply the matrix by a vector," and that can be done using just the list of links. The algorithm that needs nothing more than matrix-times-vector is power iteration — the very method you built in Chapter 23's Build Your Toolkit — and it is why we made you write power_iteration in the first place.

Recall the power-iteration idea from Chapter 23: to find the dominant eigenvector of a matrix, just apply the matrix over and over to any starting vector, renormalizing each step, and the iterates lock onto the eigenvector whose eigenvalue is largest in magnitude. For the Google matrix that dominant eigenvalue is $1$ (Perron-Frobenius conclusions (i) and (iv)), so power iteration converges to exactly the eigenvector we want — the PageRank vector. And because $G$ is stochastic, multiplying a probability vector by $G$ keeps it a probability vector, so we barely even need to renormalize. Each step is one round of the random surfer's redistribution: start everyone spread uniformly, apply $G$, and watch the distribution flow toward equilibrium.

Geometric Intuition — Power iteration is the random surfer, simulated. The starting vector $\mathbf{x}_0 = (\tfrac1n, \dots, \tfrac1n)$ is "a surfer equally likely to be anywhere." Each multiplication $\mathbf{x}_{t+1} = G\mathbf{x}_t$ advances the whole probability cloud by one click. The cloud sloshes toward the well-connected pages and away from the obscure ones, and after a surprisingly small number of steps it stops moving — it has reached the stationary distribution. You are not solving an equation so much as running the process until it settles, and where it settles is the answer.

Before reaching for code, let us do the first step by hand, because doing it once with a pencil makes the algorithm concrete in a way no amount of code can. Start the surfer uniform: $\mathbf{x}_0 = (\tfrac15, \tfrac15, \tfrac15, \tfrac15, \tfrac15) = (0.2, 0.2, 0.2, 0.2, 0.2)$. One step is $\mathbf{x}_1 = G\mathbf{x}_0$. Rather than build all of $G$, use the structure $G\mathbf{x} = d\,(S\mathbf{x}) + \tfrac{1-d}{n}\mathbf{1}\,(\mathbf{1}^{\mathsf{T}}\mathbf{x})$ — and since $\mathbf{x}_0$ sums to $1$, the teleport term contributes exactly $\tfrac{1-d}{n} = 0.03$ to every component. So we only need $S\mathbf{x}_0$, the link part. Reading $S$ row by row against $\mathbf{x}_0$:

Page $1$ receives from itself (patched, $\tfrac15$), from page $2$ (all of it, $1$), from page $3$ ($\tfrac12$), page $4$ ($\tfrac13$), page $5$ ($\tfrac14$): $(S\mathbf{x}_0)_1 = (\tfrac15 + 1 + \tfrac12 + \tfrac13 + \tfrac14)(0.2) = (2.2833)(0.2) = 0.4567$.
Page $2$ receives from page $1$ ($\tfrac15$), page $3$ ($\tfrac12$), page $4$ ($\tfrac13$), page $5$ ($\tfrac14$): $(\tfrac15 + \tfrac12 + \tfrac13 + \tfrac14)(0.2) = (1.2833)(0.2) = 0.2567$.
Page $3$: from page $1$ ($\tfrac15$), page $4$ ($\tfrac13$), page $5$ ($\tfrac14$): $(0.7833)(0.2) = 0.1567$.
Page $4$: from page $1$ ($\tfrac15$), page $5$ ($\tfrac14$): $(0.45)(0.2) = 0.09$.
Page $5$: from page $1$ only ($\tfrac15$): $(0.2)(0.2) = 0.04$.

Now apply $G\mathbf{x}_0 = 0.85\,(S\mathbf{x}_0) + 0.03$: multiply each by $0.85$ and add $0.03$. Page $1$: $0.85(0.4567) + 0.03 = 0.4182$. Page $2$: $0.85(0.2567) + 0.03 = 0.2482$. Page $3$: $0.85(0.1567)+0.03 = 0.1632$. Page $4$: $0.85(0.09)+0.03 = 0.1065$. Page $5$: $0.85(0.04)+0.03 = 0.064$. The result $\mathbf{x}_1 \approx (0.4182, 0.2482, 0.1632, 0.1065, 0.064)$ already sums to $1$ (no renormalization needed, because $G$ is stochastic) and already has the correct ranking order — page $1$ first, page $5$ last — after a single step. Repeating this redistribution drives the vector toward the stationary $\mathbf{r}$ of §29.6; the code below confirms it converges in a handful more steps. That hand calculation, scaled up, is literally what Google's machines did: one round of "every page hands its rank along its links," repeated until the numbers stop moving.

Let us run it on our five-page web and verify it reproduces the eigenvector from §29.6 to the digit.

# PageRank by power iteration: repeatedly apply the Google matrix to a uniform start.
import numpy as np
np.set_printoptions(precision=4, suppress=True)

# (M, S, G built exactly as in section 29.6)
M = np.array([[0,1,1/2,1/3,1/4],[0,0,1/2,1/3,1/4],
              [0,0,0,1/3,1/4],[0,0,0,0,1/4],[0,0,0,0,0]])
n = 5; S = M.copy(); S[:, 0] = 1/n; d = 0.85
G = d * S + (1 - d) / n * np.ones((n, n))

r = np.ones(n) / n                       # start: surfer equally likely anywhere
for t in range(9):
    r = G @ r                            # one round of the random surfer
    r = r / r.sum()                      # renormalize (tiny correction; G is stochastic)
    print(f"iter {t+1}: {np.round(r, 4)}")
# iter 1: [0.4182 0.2482 0.1632 0.1065 0.064 ]   (matches the by-hand step above)
# iter 8: [0.4067 0.2198 0.1542 0.1202 0.0991]   (converged)
print("numpy eigvec:", np.round(np.abs(np.linalg.eig(G)[1][:, 0].real)
                                 / np.abs(np.linalg.eig(G)[1][:, 0].real).sum(), 4))
# numpy eigvec: [0.4066 0.2198 0.1542 0.1202 0.0991]

After a single step the ranking order is already correct, and after eight steps the vector has converged to $(0.4066, 0.2198, 0.1542, 0.1202, 0.0991)$ — identical to the eigenvalue-$1$ eigenvector numpy found in §29.6. Power iteration and the direct eigen-solver agree to four decimals, as they must: they are computing the same dominant eigenvector by different routes. You have just run Google's original algorithm in eight lines.

29.7.1 How fast does it converge? The spectral gap, exactly

Why only eight steps? Because of the spectral gap that damping created. Recall from Chapter 23's convergence argument: if you expand the starting vector along the eigenvectors, $\mathbf{x}_0 = c_1\mathbf{r} + c_2\mathbf{v}_2 + \cdots$, then applying $G$ a total of $t$ times scales each piece by its eigenvalue to the $t$-th power,

$$G^t\mathbf{x}_0 = c_1\,1^t\,\mathbf{r} + c_2\,\lambda_2^t\,\mathbf{v}_2 + \cdots = \mathbf{r}\text{-direction} + (\text{terms decaying like }|\lambda_2|^t).$$

The dominant term ($\lambda_1 = 1$) holds steady while every other term decays geometrically, at a rate set by the second eigenvalue $|\lambda_2|$. The error after $t$ steps shrinks like $|\lambda_2|^t$, so each iteration multiplies the remaining error by roughly $|\lambda_2|$. For our Google matrix, §29.5.2 told us $|\lambda_2| = d \cdot |\mu_2| \le d = 0.85$ — and numerically $|\lambda_2| = 0.357$ here (well below the $0.85$ worst case, which is why a handful of steps suffices). One honest subtlety: our five-page matrix is not symmetric, and its subdominant eigenvalues are a complex-conjugate pair $\lambda_2 = -0.163 \pm 0.318i$ — exactly the "rotation in disguise" of Chapter 26. A complex $\lambda_2$ means the decaying error term spirals rather than shrinking monotonically, so the per-step error ratio oscillates around $|\lambda_2| = 0.357$ instead of equalling it exactly. The long-run rate is still governed by the magnitude $|\lambda_2|$; we just watch it average out rather than hold fixed.

# The error decays geometrically at the rate |lambda_2| -- watch it.
import numpy as np
M = np.array([[0,1,1/2,1/3,1/4],[0,0,1/2,1/3,1/4],
              [0,0,0,1/3,1/4],[0,0,0,0,1/4],[0,0,0,0,0]])
n = 5; S = M.copy(); S[:, 0] = 1/n; d = 0.85
G = d * S + (1 - d) / n * np.ones((n, n))
w = np.sort(np.abs(np.linalg.eigvals(G)))[::-1]
print("|eigenvalues| sorted:", np.round(w, 4))     # [1. 0.357 0.357 0.1848 0.1848]

r_true = np.abs(np.linalg.eig(G)[1][:, np.argmax(np.linalg.eig(G)[0].real)].real)
r_true /= r_true.sum()
r = np.ones(n) / n
errs = []
for _ in range(8):
    r = G @ r; r /= r.sum()
    errs.append(np.abs(r - r_true).sum())
print("L1 errors:", np.round(errs, 5))
# L1 errors: [0.09764 0.04096 0.01744 0.00384 0.00195 0.00079 0.00019 0.00009]
print("error ratios:", np.round([errs[i+1]/errs[i] for i in range(4)], 4))
# error ratios: [0.4195 0.4257 0.2202 0.5084]   <- oscillate around |lambda_2|=0.357

The error ratios oscillate around $|\lambda_2| = 0.357$ rather than holding fixed — the signature of a complex subdominant eigenvalue spiraling toward the limit — but they stay safely below $1$, so the error is crushed by an order of magnitude every two or three steps, reaching machine precision in under twenty iterations. The long-run convergence rate is set by the second-largest eigenvalue magnitude, exactly as the theory predicts; the oscillation is just the complex phase rotating. On the real web, with $d = 0.85$, the worst-case rate is bounded by $d$ (the spectral gap below $1$ is at least $1 - d = 0.15$), and roughly $50$–$100$ iterations suffice to rank the entire internet to working precision — a handful of sparse matrix-vector products, each touching only the actual links. That is the whole reason a problem with billions of unknowns is tractable: a guaranteed spectral gap turns an impossible eigen-decomposition into a few rounds of "follow every link once."

29.7.2 The teleport vector need not be uniform: personalized PageRank

There is a small generalization of the Google matrix that is both practically important and a clean test of whether you understand the construction. Nothing forced the teleport destination to be uniform. We chose $\tfrac{1}{n}\mathbf{1}\mathbf{1}^{\mathsf{T}}$ — jump to any page with equal probability — but we could instead pick any probability distribution $\mathbf{p}$ over pages and teleport according to it. The Google matrix becomes

$$G = d\,S + (1-d)\,\mathbf{p}\,\mathbf{1}^{\mathsf{T}},$$

where $\mathbf{p}$ is a nonnegative vector summing to $1$ (the personalization vector or teleport vector), and $\mathbf{p}\mathbf{1}^{\mathsf{T}}$ is the matrix every column of which equals $\mathbf{p}$. Setting $\mathbf{p} = \tfrac1n\mathbf{1}$ recovers ordinary PageRank. But choose $\mathbf{p}$ to concentrate on, say, sports pages, and the surfer who gets bored always restarts somewhere in the sports neighborhood — the resulting stationary vector ranks the whole web as seen from a sports-interested reader, boosting pages near the sports cluster. This is topic-sensitive or personalized PageRank, and it is how link-based ranking can be tailored to an interest, a user, or a starting context. [verify]

The mathematics is unchanged in every respect that matters. As long as $\mathbf{p}$ is strictly positive, $G$ remains a positive column-stochastic matrix, so Perron-Frobenius still guarantees a unique positive stationary eigenvector for eigenvalue $1$, and power iteration still converges at rate $\le d$. (Even when $\mathbf{p}$ has some zeros, the chain stays primitive as long as $\mathbf{p}$'s support is reachable, which it generally is.) The same algorithm, the same eigenvalue-$1$ eigenvector, the same convergence — only the teleport target moved. This flexibility is one reason the random-surfer model proved so durable: the damping/teleport term is not just a bug-fix but a tunable knob, and personalization is the knob turned toward a chosen corner of the web. It also previews a theme of recommendation systems (Chapter 33), where "rank the items as seen from this user" is exactly the personalized-teleport idea applied to a user-item graph.

Common Pitfall — Power iteration can stall if the top two eigenvalues tie in magnitude. The convergence rate is $|\lambda_2|/|\lambda_1|$, so if $|\lambda_2| = |\lambda_1|$ the error never shrinks and the iterates oscillate forever instead of settling. For a raw link matrix this genuinely happens — a periodic trap (the surfer bouncing $A \to B \to A \to B$) produces an eigenvalue $-1$ tied with $+1$ in magnitude, and the walk never converges. Damping is again the rescue: it pulls every eigenvalue except the Perron $1$ strictly inside the unit circle (§29.5.2), guaranteeing $|\lambda_2| \le d < 1$ and therefore guaranteeing convergence. The spectral gap that Perron-Frobenius promises is not a bonus; it is what makes the algorithm terminate.

29.8 What goes wrong without the fixes? Seeing the failures

It is worth seeing, with numbers, exactly how the naive algorithm fails — both because it cements why the Google matrix is built as it is, and because diagnosing these failures is a skill the exercises will test. Let us run power iteration on the raw link matrix $M$, dangling page and all, and watch it collapse.

# What happens if we skip the fixes and iterate the RAW link matrix M?
import numpy as np
M = np.array([[0,1,1/2,1/3,1/4],[0,0,1/2,1/3,1/4],
              [0,0,0,1/3,1/4],[0,0,0,0,1/4],[0,0,0,0,0]])
n = 5
r = np.ones(n) / n                        # uniform start, sums to 1
for t in range(1, 6):
    r = M @ r                             # NO renormalization, NO damping
    print(f"after {t} iters: {np.round(r, 5)}   total = {r.sum():.5f}")
# after 1 iters: [0.41667 0.21667 0.11667 0.05    0.     ]   total = 0.80000
# after 2 iters: [0.29167 0.075   0.01667 0.      0.     ]   total = 0.38333
# after 3 iters: [0.08333 0.00833 0.      0.      0.     ]   total = 0.09167
# after 4 iters: [0.00833 0.      0.      0.      0.     ]   total = 0.00833
# after 5 iters: [0.      0.      0.      0.      0.     ]   total = 0.00000

The total probability is not conserved — it collapses, $1 \to 0.8 \to 0.383 \to 0.092 \to 0.008 \to 0$, leaking out through dangling page $1$ a little more each step until nothing is left. The rank vector converges to the zero vector, which assigns every page rank $0$ and ranks nothing. (This particular raw matrix is even nilpotent — being strictly upper-triangular, $M^5 = 0$ exactly — so the collapse is total in five steps; a general dangling-laden web leaks more gradually but to the same fate.) This is the dangling-node leak of §29.5 made concrete: the raw matrix has dominant eigenvalue less than $1$, so power iteration drives everything to zero. Patch the dangling column to get the stochastic $S$, then damp to get the positive $G$, and the leak is sealed, the limit is the unique positive PageRank vector, and convergence is fast. Three lines of fix turn a broken computation into the algorithm that organized the web.

Real-World Application — How search actually uses this. PageRank is one query-independent signal: it scores every page's general importance once, offline, from the link graph alone, independent of what anyone searches for. When you type a query, the search engine first finds pages matching your words, then ranks those candidates using PageRank together with hundreds of other signals — text relevance, freshness, location, click behavior, and (today) machine-learned models. PageRank was the breakthrough that made early Google's results dramatically better than keyword-only competitors, but modern ranking is far more than PageRank alone. For the bigger picture of retrieval, indexing, and ranking, see how search works. The eigenvector you computed in this chapter is the historical seed of the entire field.

29.9 Build it yourself: the `pagerank` toolkit function

You now understand every piece: the link matrix, the stochastic patch, the damping that builds the Google matrix, and power iteration as the engine. It is time to assemble them into a single reusable function — the capstone of your eigenvalue toolkit and a direct application of the power_iteration you wrote in Chapter 23.

Build Your Toolkit — Implement pagerank(link_matrix, damping=0.85, tol=1e-10) in toolkit/capstone/pagerank.py, built on the power_iteration you wrote in Chapter 23 (import it from toolkit/eigen.py). Your function should: (1) patch dangling nodes — replace every all-zero column of link_matrix with the uniform column $1/n$; (2) form the Google matrix $G = d\,S + \tfrac{1-d}{n}\mathbf{1}\mathbf{1}^{\mathsf{T}}$; (3) call your power_iteration on $G$ starting from the uniform vector, renormalizing to sum $1$ ($\ell_1$) each step instead of unit length; and (4) return the rank vector. Write the matrix-building and patching in pure Python (no numpy in the implementation — only your own toolkit's vector and matrix operations from Chapters 2 and 7). Then verify two things against numpy on the five-page web of §29.6: that your vector matches np.linalg.eig's eigenvalue-$1$ eigenvector (normalized to sum $1$) to within tol, and that it really is an eigenvector of $G$ with eigenvalue $1$ — i.e. $G\mathbf{r} = \mathbf{r}$, so $\lVert G\mathbf{r} - \mathbf{r}\rVert \approx 0$. This seeds toolkit/capstone/pagerank.py, the runnable PageRank demo for the Chapter 39 capstone.

Here is a numpy sketch of the assembled algorithm — the real toolkit version does the matrix construction in pure Python, but the logic is identical, and the output matches §29.6 and §29.7 exactly:

# pagerank(link_matrix, damping) -- the whole algorithm, built on power iteration.
import numpy as np

def pagerank(link_matrix, damping=0.85, tol=1e-10, max_iters=1000):
    M = np.asarray(link_matrix, dtype=float)
    n = M.shape[0]
    S = M.copy()
    dangling = S.sum(axis=0) == 0                 # columns that sum to zero
    S[:, dangling] = 1.0 / n                       # patch: dead-end -> uniform jump
    G = damping * S + (1 - damping) / n * np.ones((n, n))   # Google matrix
    r = np.ones(n) / n                             # uniform start
    for _ in range(max_iters):                     # power iteration (Ch. 23)
        r_next = G @ r
        r_next /= r_next.sum()                      # L1 renormalize (probabilities)
        if np.abs(r_next - r).sum() < tol:
            return r_next
        r = r_next
    return r

M = np.array([[0,1,1/2,1/3,1/4],[0,0,1/2,1/3,1/4],
              [0,0,0,1/3,1/4],[0,0,0,0,1/4],[0,0,0,0,0]])
r = pagerank(M)
print("PageRank:", np.round(r, 4))     # [0.4066 0.2198 0.1542 0.1202 0.0991]

# VERIFY it is the eigenvalue-1 eigenvector of the Google matrix:
n = 5; S = M.copy(); S[:, 0] = 1/5
G = 0.85 * S + 0.15 / 5 * np.ones((5, 5))
print("is eigenvector (||Gr - r||):", np.linalg.norm(G @ r - r))  # ~1e-11
print("numpy lambda=1 vector:",
      np.round(np.abs(np.linalg.eig(G)[1][:, np.argmax(np.linalg.eig(G)[0].real)].real)
               / np.abs(np.linalg.eig(G)[1][:, np.argmax(np.linalg.eig(G)[0].real)].real).sum(), 4))
# is eigenvector (||Gr - r||): 1.1e-11
# numpy lambda=1 vector: [0.4066 0.2198 0.1542 0.1202 0.0991]

The function returns $(0.4066, 0.2198, 0.1542, 0.1202, 0.0991)$, the residual $\lVert G\mathbf{r} - \mathbf{r}\rVert$ is essentially zero (confirming $\mathbf{r}$ is genuinely the eigenvalue-$1$ eigenvector), and the vector matches numpy's eigen-solver to four decimals. Three checks, all green: the result is a probability distribution, it is fixed by $G$, and it agrees with the independent eigen-decomposition. That is what it means to verify an eigenvector — and what your toolkit now does for any web you hand it.

Check Your Understanding — In our five-page web, page $5$ links out to all four other pages but receives no incoming links. Yet its PageRank is not zero — it is $0.0991$. Where does page $5$'s nonzero rank come from?
Answer
Two sources, both from the fixes in §29.5. First, the damping/teleport term gives every page, including page $5$, a baseline rank: with probability $1-d = 0.15$ the surfer teleports to a uniformly random page, so page $5$ is landed on $0.15/5 = 0.03$ of the time by teleport alone, regardless of links. Second, the dangling-node patch on page $1$ sends $1/5$ of page $1$'s (substantial) rank to every page, page $5$ included. So page $5$'s rank is floor set by teleportation and topped up by the dangling page's uniform spillover. This is exactly why Perron-Frobenius conclusion (iii) guarantees strictly positive ranks for every page once you damp — no page can be ranked zero, even one nobody links to. Without damping, a page with no incoming links would indeed collapse to rank $0$.

29.10 Putting it together: the eigenvalue story, paid off

Stand back and see what this chapter assembled, because it is the culmination of the entire eigenvalue arc that began in Chapter 23. We started with a web of pages and links — a directed graph — and turned "who points to whom" into a link matrix $M$ whose columns redistribute each page's rank along its outgoing links. We recognized $M$ (once stochastic) as the matrix of a Markov chain, a random surfer clicking links forever, and we asked where she settles. The answer was a distribution unchanged by one more step: $M\mathbf{r} = \mathbf{r}$ — the eigen-equation with eigenvalue $1$, so PageRank is the dominant eigenvector of the link matrix. We confronted the two failures of the naive model — dangling dead ends that leak rank, and disconnected islands that destroy uniqueness — and repaired both with the single elegant device of the damping factor, producing the positive, stochastic Google matrix $G = dS + \tfrac{1-d}{n}\mathbf{1}\mathbf{1}^{\mathsf{T}}$. The Perron-Frobenius theorem, with its stated conditions, then guaranteed exactly one positive stationary rank and a spectral gap below $1$. And power iteration — the method you built in Chapter 23 — turned that eigenvector into something computable at the scale of the web, converging at a rate set by the damping factor.

This chapter pays off the anchor we planted twice. In Chapter 3 the web appeared as a system of linear equations; in Chapter 23 we previewed the city-and-suburb steady state and promised PageRank was the same idea at planetary scale. Now you have seen the whole machine, and the promise is kept: Google's original ranking of the internet was the computation of a single dominant eigenvector by repeated matrix-vector multiplication. The search engine that organized human knowledge is, at its mathematical core, this chapter's eigenvalue problem — the most consequential eigenvector ever computed.

Above all, PageRank is the grandest possible demonstration of recurring theme #6: eigenvalues and eigenvectors reveal what a matrix really does. The Google matrix is a billion-by-billion grid of numbers, utterly opaque as a table. But its dominant eigenvector is meaning incarnate — one number per page, the web's own collective judgment of importance, extracted by finding the one invariant direction the transformation does not change. The eigenvector is not a byproduct of the matrix; it is the matrix's essential content. And notice the geometry-algebra unity (theme #2) at its most vivid: a single eigenvector is simultaneously a root-of-a-polynomial (eigenvalue $1$), a steady state of a random walk, a fixed point of a transformation, and the importance score of every page on the internet — four faces of one object.

Where does the eigenvalue story go from here? We have now seen eigenvectors diagonalize a matrix (Chapter 25), come out orthogonal for symmetric matrices (Chapter 27), and rank the web (this chapter). But eigenvectors as we have built them have a limitation: they require a square matrix, and not every square matrix even has a full set of them (the defective shears of Chapter 24). Part VI lifts both restrictions at once. The Singular Value Decomposition of Chapter 30 generalizes diagonalization to every matrix — rectangular, rank-deficient, anything — by finding not one set of special directions but two, an input set and an output set, linked by stretch factors called singular values. It is the eigenvector idea set free, and it will compress images, denoise data, and power machine learning. Turn the page; the most universal tool in the subject is next.

29.11 Frequently asked questions

How does PageRank work, in one paragraph? PageRank models the web as a directed graph and imagines a "random surfer" who clicks links forever, occasionally jumping to a random page (the damping factor, about $15\%$ of the time). The fraction of time the surfer spends on each page, in the long run, is that page's rank. Mathematically this long-run distribution is the dominant eigenvector (eigenvalue $1$) of a column-stochastic matrix called the Google matrix, and it is computed by power iteration — repeatedly multiplying the matrix by a vector until the result stops changing. Important pages are those that important pages link to, a recursion the eigenvector resolves all at once.

Is PageRank still what Google uses? PageRank was the original breakthrough (1996–1998) and a genuine eigenvector computation, but modern search ranks pages with hundreds of signals — text relevance, freshness, user behavior, and machine-learned models — of which link-based importance is only one. The eigenvector idea remains foundational and link analysis is still used, but "PageRank" today refers more to the historical algorithm and the family of ideas than to Google's current full ranking system. The mathematics in this chapter is the real original method; the production system has evolved far beyond it. [verify]

Why exactly $0.85$ for the damping factor? The choice $d = 0.85$ was reported in Brin and Page's original work and balances two pressures: a larger $d$ trusts the actual link structure more (the surfer follows links more faithfully), while a smaller $d$ produces a bigger spectral gap and faster convergence. At $d = 0.85$, power iteration's error shrinks by at most $0.85$ per step, so on the order of $50$–$100$ iterations rank the whole web to working precision. The value is a modeling choice, not a mathematical necessity — other engines use different dampings. [verify]

Does PageRank give the eigenvector or the eigenvalue? It gives the eigenvector — one number per page, the rank scores. The relevant eigenvalue is always $1$ (the stationary distribution lives there, guaranteed by the stochastic structure and Perron-Frobenius), so the eigenvalue itself carries no ranking information; all the content is in the eigenvector's entries. This is the reverse of, say, vibration analysis (Chapter 23's second case study), where the eigenvalues (natural frequencies) are the prize. PageRank is the rare application where the eigenvalue is fixed and known in advance, and only the eigenvector carries information — a clean reminder that an eigenproblem has two outputs, and different applications care about different ones.

Could you compute PageRank without ever mentioning eigenvectors? Operationally, yes — you could just say "iterate the surfer's redistribution until it stops changing" and never utter the word eigenvector, and many engineering descriptions do exactly that. But you would lose every guarantee. The eigenvector framing is what tells you the iteration converges (Perron-Frobenius spectral gap), converges to a unique answer (simplicity of $\lambda=1$), converges to a meaningful one (positivity), and how fast (the damping factor as convergence rate). The picture you can skip; the theory you cannot. That is the difference between an algorithm that happens to work and one you can prove will.

Prerequisites

Learning Objectives

In This Chapter

Application: PageRank — How Google Ranks the Internet with Eigenvectors

29.1 What does it mean to rank the web? The graph picture

29.1.1 Why counting links is not enough

29.2 How do you turn a web of links into a matrix?

29.2.1 Building the link matrix by hand

29.3 What is a column-stochastic matrix, and why is it the right object?

29.3.2 The same idea as a system of equations (the Chapter 3 connection)

29.3.1 The random surfer: a story for the matrix

29.4 Why is PageRank an eigenvector? The stationary distribution

29.4.1 Why eigenvalue 1 always exists for a stochastic matrix

29.5 What breaks, and what is the Google matrix? Dangling nodes, dead ends, and damping

29.5.1 The Perron-Frobenius theorem: why there is exactly one answer

Warning

29.5.2 What does the damping factor do to the eigenvalues?

29.6 Computing PageRank on a small web, by hand and in numpy

29.7 Why is power iteration the right algorithm? Convergence to the dominant eigenvector

29.7.1 How fast does it converge? The spectral gap, exactly

29.7.2 The teleport vector need not be uniform: personalized PageRank

29.8 What goes wrong without the fixes? Seeing the failures

29.9 Build it yourself: the pagerank toolkit function

29.10 Putting it together: the eigenvalue story, paid off

29.11 Frequently asked questions

29.9 Build it yourself: the `pagerank` toolkit function