Chapter 29 — Further Reading

DataField.Dev

Chapter 29 — Further Reading

Annotated pointers for going deeper, mapped to the standard texts and to free resources. The throughline: PageRank is the dominant eigenvector of a stochastic matrix, found by power iteration — so the most illuminating treatments present it as linear algebra (eigenvectors, Markov chains, Perron-Frobenius) rather than as a search-engine recipe.

The original sources

Sergey Brin and Lawrence Page, "The Anatomy of a Large-Scale Hypertextual Web Search Engine," Computer Networks and ISDN Systems 30 (1998), 107–117. [verify] The paper that introduced Google and described PageRank, including the random-surfer model and the damping factor. It is surprisingly readable and worth seeing in the original: the famous recursive definition, the choice of $d = 0.85$, and the claim that the computation scales to the whole web are all here. Pair it with the companion technical report "The PageRank Citation Ranking: Bringing Order to the Web" (Page, Brin, Motwani, Winograd, 1998/1999) [verify], which treats the mathematics — the stochastic matrix, the eigenvector, and the iteration — in more detail. Read these to see the chapter's mathematics in the words of its inventors; cite carefully, as exact dates and venues for the technical report are sometimes reported inconsistently.
Amy Langville and Carl Meyer, Google's PageRank and Beyond: The Science of Search Engine Rankings (Princeton, 2006). The definitive book-length treatment, and the single best next step if PageRank captivates you. Langville and Meyer develop the stochastic-matrix model, the dangling-node and reducibility problems, the Google matrix, the full Perron-Frobenius theory, and the numerical analysis of power iteration (including the convergence-rate-equals-damping-factor result of §29.5.2) with exactly this chapter's blend of rigor and intuition. They also cover the personalized/topic-sensitive variants of §29.7.2 and the broader landscape of ranking algorithms (HITS, SALSA). Rigorous but accessible to a reader who has finished Part V.

Core linear algebra texts

Gilbert Strang, Introduction to Linear Algebra (5th/6th ed.), §10.3 ("Markov Matrices, Population, and Economics") and the eigenvalue chapters (§§6.1–6.4). Strang is the right first stop because he tells exactly this chapter's story: he treats Markov (stochastic) matrices, proves that eigenvalue $1$ always exists with all other $|\lambda| \le 1$, and presents the steady state as the eigenvector for $\lambda = 1$ reached by powers of the matrix — the migration-model and PageRank picture. His treatment of $A^k$ via eigenvalues (the diagonalization of Chapter 25) is the engine behind the convergence argument of §29.7. He also discusses PageRank explicitly as the headline application of Markov matrices. If you read one supplementary source on the mathematics, read this.
Gilbert Strang, Linear Algebra and Learning from Data (2019), the chapters on eigenvalues and on PageRank/optimization. A more application-driven companion that explicitly frames PageRank, power iteration, and the dominant eigenvector in the context of modern data science and machine learning — the bridge from this chapter to Part VI's SVD and PCA. Excellent for CS and data-science readers who want to see where the eigenvector idea goes next.
Sheldon Axler, Linear Algebra Done Right (4th ed.), Chapters 5 and 8 (eigenvalues, operators on real and complex vector spaces). Axler builds eigenvalues and eigenvectors abstractly and rigorously. He does not treat PageRank or stochastic matrices specifically, but his careful development of eigenvalues, the existence of eigenvalues over $\mathbb{C}$, and operator structure is the proper foundation for the "$\lambda = 1$ exists and is simple" claims. The natural companion for the proof track and for readers heading toward the abstract spaces of Chapter 34.

On Perron-Frobenius and nonnegative matrices

Carl Meyer, Matrix Analysis and Applied Linear Algebra (SIAM), Chapter 8 (Perron-Frobenius theory). The clearest standard reference for the theorem at the heart of §29.5.1. Meyer states the irreducible and primitive cases precisely, proves the existence of a simple dominant eigenvalue with a positive eigenvector, and connects it directly to Markov chains and to PageRank. The place to go for the full proof of the conditions this chapter only stated.
Roger Horn and Charles Johnson, Matrix Analysis (2nd ed.), Chapter 8. The encyclopedic graduate treatment of nonnegative and positive matrices and the Perron-Frobenius theorem in its full generality. More than this chapter requires, but the definitive source if spectral theory of nonnegative matrices becomes your field.

Markov chains, for the probability side

The probabilistic theory of Markov chains — stationary distributions, ergodicity (the §29.4 "time average equals stationary distribution" theorem), mixing times, and the conditions for convergence — is developed thoroughly in the data-science treatment of Markov chains. PageRank is one Markov chain among thousands, and seeing the general theory (transient vs. recurrent states, the ergodic theorem, detailed balance) makes precise why a primitive chain converges to a unique stationary distribution from any start. Read it alongside this chapter to connect the linear-algebra view (dominant eigenvector) with the probabilistic view (long-run state frequencies).

The search-engine and applications side

The broader picture of how retrieval, indexing, and ranking actually fit together — of which link-based importance is one signal among many — is surveyed in how search works. PageRank was the breakthrough that made early Google dramatically better than keyword-only competitors, but modern ranking combines it with text relevance, freshness, user behavior, and machine-learned models. This is the place to calibrate the honest claim of §29.8's Real-World Application: PageRank is foundational and historically decisive, not the whole of modern search.
Eigenvector centrality and the science of networks. For the application of the same dominant-eigenvector idea to social networks, citation networks (the Eigenfactor metric of Case Study 2), and influence ranking generally, see Mark Newman, Networks: An Introduction (Oxford), Chapter 7 (measures and metrics), which treats eigenvector centrality, Katz centrality, and PageRank as a family of related network-importance measures built from exactly this chapter's mathematics. [verify]

Free online resources

MIT OpenCourseWare 18.06 (Strang's Linear Algebra). The lectures on eigenvalues, diagonalization, and Markov matrices are the video companion to Part V and to this chapter; Strang's lecture on Markov matrices walks through the $\lambda = 1$ steady-state argument directly.
3Blue1Brown, Essence of Linear Algebra, the eigenvalue episodes. Geometric, animation-first intuition for eigenvectors as the invariant directions of a transformation — the picture underneath the whole PageRank story.
The toolkit pagerank you built (toolkit/capstone/pagerank.py). The most useful "further reading" is your own code: run it on a small web of your own design, vary the damping factor, watch the convergence rate change, and verify against np.linalg.eig that you are computing the eigenvalue-$1$ eigenvector. The Chapter 39 capstone assembles this into a runnable PageRank demo.