Case Study 1 — The Eigenvector That Predicts Market Share

DataField.Dev

Field: economics / business analytics (data science). This case study shows how a single eigenvector — the steady state of a Markov chain — forecasts the long-run market share of competing brands, and why the second eigenvalue tells you how long the forecast takes to come true. It is the same dominant-eigenvector idea that powers PageRank (Chapter 29), here at the scale of a marketing department rather than the whole web.

The problem: where does the market settle?

Imagine three streaming services — call them Brand A, Brand B, and Brand C — competing for the same subscribers. Every month, customers switch around. The marketing team at Brand A has measured the monthly switching behavior from a year of subscription data, and it looks like this:

Of Brand A's subscribers, $70\%$ stay, $20\%$ switch to B, and $10\%$ switch to C.
Of Brand B's subscribers, $60\%$ stay, $30\%$ switch to A, and $10\%$ switch to C.
Of Brand C's subscribers, $50\%$ stay, $30\%$ switch to A, and $20\%$ switch to B.

The executives want to know: if these switching rates hold steady, what share of the market will each brand command in the long run? And does that long-run answer depend on who is ahead today? These are exactly the questions an eigenvector answers.

Building the transition matrix

The natural state vector is the market share, $\mathbf{x} = (x_A, x_B, x_C)$, where the three numbers are non-negative and sum to $1$. One month of switching is a linear transformation: each component of next month's share is a weighted sum of this month's shares. For example, Brand A keeps $70\%$ of its own subscribers and gains $30\%$ of B's and $30\%$ of C's, so $x_A^{\text{next}} = 0.7\,x_A + 0.3\,x_B + 0.3\,x_C$. We assemble the transition matrix $P$ so that column $j$ records where the customers of brand $j$ go. Reading down each column from the data above:

$$P = \begin{bmatrix} 0.7 & 0.3 & 0.3 \\ 0.2 & 0.6 & 0.2 \\ 0.1 & 0.1 & 0.5 \end{bmatrix}, \qquad \mathbf{x}_{\text{next}} = P\,\mathbf{x}.$$

Each column sums to $1$ — every customer of every brand ends up somewhere next month — so $P$ is a column-stochastic matrix, the same kind of object that PageRank uses for the web. (If you prefer rows-sum-to-one, you would transpose and multiply on the other side; the column convention here matches the chapter and PageRank.) Running the model forward is repeated multiplication by $P$: two months is $P^2\mathbf{x}$, a year is $P^{12}\mathbf{x}$, and the "long run" is what happens as the exponent grows.
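As a minimal sketch of the assembly step, the snippet below builds $P$ from the switching percentages and confirms that each column sums to $1$. The dictionary name `switching`, the brand ordering, and the starting share `x` are illustrative assumptions, not part of the original data pipeline.

```python
import numpy as np

brands = ["A", "B", "C"]

# switching[src][dst] = fraction of src's subscribers who are with dst next month.
# (Hypothetical container; the numbers are the measured rates listed above.)
switching = {
    "A": {"A": 0.7, "B": 0.2, "C": 0.1},
    "B": {"A": 0.3, "B": 0.6, "C": 0.1},
    "C": {"A": 0.3, "B": 0.2, "C": 0.5},
}

# Row index = destination brand, column index = source brand,
# so column j records where brand j's customers go.
P = np.array([[switching[src][dst] for src in brands] for dst in brands])

column_sums = P.sum(axis=0)
print("column sums:", column_sums)      # every column should sum to 1

x = np.array([0.2, 0.3, 0.5])           # an arbitrary starting share (assumed)
x_next = P @ x                          # one month of switching
print("next month:", np.round(x_next, 4))
```

With this assumed starting share, next month's shares are $(0.38, 0.32, 0.30)$, which still sum to $1$ because $P$ is column-stochastic.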

The long run is an eigenvector with $\lambda = 1$

By the repeated-application principle of §23.8, the long-run behavior is governed by the eigenvalues of $P$. A steady state — a market distribution that stops changing month to month — is a vector $\mathbf{x}^\star$ with

$$P\mathbf{x}^\star = \mathbf{x}^\star,$$

which is precisely the eigen-equation $A\mathbf{v} = \lambda\mathbf{v}$ with eigenvalue $\lambda = 1$. The steady-state market share is the eigenvector of $P$ for the eigenvalue $1$, normalized so its entries sum to $1$. This is not a coincidence of this example: every column-stochastic matrix has $\lambda = 1$ as an eigenvalue (a fact we prove via Perron–Frobenius theory in Chapter 29), because the all-ones row vector satisfies $\mathbf{1}^{\mathsf{T}}P = \mathbf{1}^{\mathsf{T}}$, and a matrix and its transpose share eigenvalues (Exercise 23.21).
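Both facts in that argument can be checked numerically. The sketch below, which uses the $P$ defined earlier and illustrative variable names, confirms that the all-ones row vector is unchanged by $P$ and that $P$ and $P^{\mathsf{T}}$ report the same eigenvalues.

```python
import numpy as np

P = np.array([[0.7, 0.3, 0.3],
              [0.2, 0.6, 0.2],
              [0.1, 0.1, 0.5]])

ones = np.ones(3)
left_product = ones @ P                 # the row vector 1^T P, i.e. the column sums
print("1^T P =", left_product)

# Compare the spectra of P and its transpose (sorted so the order matches).
eig_P = np.sort(np.linalg.eigvals(P).real)
eig_PT = np.sort(np.linalg.eigvals(P.T).real)
print("eigenvalues of P:  ", np.round(eig_P, 4))
print("eigenvalues of P^T:", np.round(eig_PT, 4))
```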

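Let us compute it. By hand, we would solve $(P - I)\mathbf{x}^\star = \mathbf{0}$, find the null space, and rescale to sum to $1$. The NumPy computation below reaches the same vector by computing all eigenpairs of $P$ and selecting the one with $\lambda = 1$; you can verify the output: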

# The steady-state market share is the eigenvector of P for eigenvalue 1.
import numpy as np
P = np.array([[0.7, 0.3, 0.3],
              [0.2, 0.6, 0.2],
              [0.1, 0.1, 0.5]])
w, V = np.linalg.eig(P)
print("eigenvalues:", np.round(w, 4))            # 1, 0.4, 0.4 (order may vary)
i = np.argmin(np.abs(w - 1))                      # locate lambda = 1
x_star = V[:, i].real
x_star = x_star / x_star.sum()                    # normalize to sum 1
print("steady-state shares:", np.round(x_star, 4))  # [0.5    0.3333 0.1667]
print("check P @ x_star:    ", np.round(P @ x_star, 4))  # unchanged

The eigenvalues come back as $1$, $0.4$, and $0.4$, and the eigenvector for $\lambda = 1$, normalized, is

$$\mathbf{x}^\star = \left(\tfrac12,\ \tfrac13,\ \tfrac16\right) = (0.500,\ 0.333,\ 0.167).$$

So in the long run, Brand A captures half the market, Brand B a third, and Brand C a sixth — regardless of today's shares. The executives now have their forecast, and it came from a single eigenvector.
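The null-space route, solving $(P - I)\mathbf{x}^\star = \mathbf{0}$ together with the normalization, can also be sketched directly. One way to do it, offered here as an assumption rather than the chapter's own method, is to stack the normalization $\mathbf{1}^{\mathsf{T}}\mathbf{x} = 1$ under the homogeneous system and hand the result to a least-squares solver.

```python
import numpy as np

P = np.array([[0.7, 0.3, 0.3],
              [0.2, 0.6, 0.2],
              [0.1, 0.1, 0.5]])
n = P.shape[0]

# Stack the n equations (P - I) x = 0 on top of the single equation sum(x) = 1.
A = np.vstack([P - np.eye(n), np.ones((1, n))])
b = np.concatenate([np.zeros(n), [1.0]])

# The stacked system is consistent, so least squares returns its exact solution.
x_star, *_ = np.linalg.lstsq(A, b, rcond=None)
print("steady-state shares:", np.round(x_star, 4))
print("P @ x_star equals x_star:", np.allclose(P @ x_star, x_star))
```

This avoids choosing among eigenvectors and fixing their sign, at the cost of solving a slightly overdetermined system.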

Why the answer is independent of today's shares

The most striking claim is that the steady state does not depend on where the market starts. Eigenvalues explain why. Decompose any starting distribution $\mathbf{x}_0$ along the eigenvectors of $P$: one piece along the $\lambda = 1$ eigenvector $\mathbf{x}^\star$, and the rest along the eigenvectors for $\lambda = 0.4$. Applying $P$ a total of $k$ times multiplies each piece by its eigenvalue to the $k$-th power:

$$P^k\mathbf{x}_0 = \underbrace{1^k\,(\text{piece along } \mathbf{x}^\star)}_{\text{never shrinks}} + \underbrace{0.4^k\,(\text{pieces along the others})}_{\to\, 0}.$$

The $\lambda = 1$ piece is preserved forever; every other piece is multiplied by $0.4^k$, which races to zero. After enough months only the steady-state piece survives, no matter what $\mathbf{x}_0$ was. That is why the starting market share washes out — the non-dominant eigen-directions decay, leaving only the dominant eigenvector $\mathbf{x}^\star$. The market has a single attractor, and it is an eigenvector.
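For this particular $P$ the decomposition is easy to verify numerically: the deviation $\mathbf{x}_0 - \mathbf{x}^\star$ has entries summing to zero and is scaled by exactly $0.4$ each month, so $P^k\mathbf{x}_0 = \mathbf{x}^\star + 0.4^k(\mathbf{x}_0 - \mathbf{x}^\star)$. The sketch below checks this for an assumed starting share `x0` and an arbitrary horizon `k`.

```python
import numpy as np

P = np.array([[0.7, 0.3, 0.3],
              [0.2, 0.6, 0.2],
              [0.1, 0.1, 0.5]])
x_star = np.array([1/2, 1/3, 1/6])      # steady state computed earlier
x0 = np.array([0.1, 0.2, 0.7])          # assumed starting shares, for illustration

d = x0 - x_star                         # the part of x0 that is NOT the steady state
is_eigvec = np.allclose(P @ d, 0.4 * d) # P scales the deviation by 0.4
print("deviation is a 0.4-eigenvector:", is_eigvec)

k = 5
direct = np.linalg.matrix_power(P, k) @ x0   # brute-force P^k x0
formula = x_star + 0.4**k * d                # eigen-decomposition prediction
print("P^k x0 matches x* + 0.4^k d:", np.allclose(direct, formula))
```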

We can watch the decay happen. Starting from a market that is $100\%$ Brand A — the most lopsided start imaginable — and iterating:

x = np.array([1.0, 0.0, 0.0])                     # start: 100% Brand A
for k in [1, 3, 6, 12]:
    print(k, "months:", np.round(np.linalg.matrix_power(P, k) @ x, 4))
# 1  months: [0.7   0.2   0.1  ]
# 3  months: [0.532 0.312 0.156]
# 6  months: [0.502 0.332 0.166]
# 12 months: [0.5   0.3333 0.1667]

Within a year the shares have all but reached $(\tfrac12, \tfrac13, \tfrac16)$, even though we started from a single dominant brand. The forecast is robust.
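To see that the starting point really does wash out, the same twelve-month run can be repeated from several different starts. The three starting distributions below are illustrative choices.

```python
import numpy as np

P = np.array([[0.7, 0.3, 0.3],
              [0.2, 0.6, 0.2],
              [0.1, 0.1, 0.5]])

# Three very different starting markets (assumed for illustration).
starts = {
    "all A": [1.0, 0.0, 0.0],
    "all C": [0.0, 0.0, 1.0],
    "even split": [1/3, 1/3, 1/3],
}

P12 = np.linalg.matrix_power(P, 12)     # twelve months of switching
results = {name: P12 @ np.array(x0) for name, x0 in starts.items()}
for name, shares in results.items():
    print(f"{name:>10}: {np.round(shares, 4)}")
```

All three runs agree with $(\tfrac12, \tfrac13, \tfrac16)$ to four decimal places after twelve months.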

The second eigenvalue is the speed of convergence

There is a bonus hidden in the spectrum. The steady state is set by the $\lambda = 1$ eigenvector, but how fast the market reaches it is governed by the second-largest eigenvalue in magnitude, here $|\lambda_2| = 0.4$. Each month, the gap between the current distribution and the steady state shrinks by a factor of $0.4$, because that is the rate at which the non-dominant pieces decay. After $k$ months the leftover deviation is on the order of $0.4^k$: after $6$ months that is $0.4^6 \approx 0.004$, already under half a percent, which is why the iteration output above has essentially converged by month six.
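The decay rate can be measured rather than assumed. The sketch below tracks the distance to the steady state month by month and reports the ratio of successive errors; it also includes a small helper, `months_to_tolerance` (a hypothetical name), that inverts $|\lambda_2|^k \le \text{tol}$ to estimate how many months are needed.

```python
import math
import numpy as np

P = np.array([[0.7, 0.3, 0.3],
              [0.2, 0.6, 0.2],
              [0.1, 0.1, 0.5]])
x_star = np.array([1/2, 1/3, 1/6])
x = np.array([1.0, 0.0, 0.0])           # start: 100% Brand A

errors = []
for month in range(1, 9):
    x = P @ x
    errors.append(np.abs(x - x_star).sum())   # L1 distance to the steady state

ratios = [errors[i + 1] / errors[i] for i in range(len(errors) - 1)]
print("month-over-month error ratios:", np.round(ratios, 4))

def months_to_tolerance(lam2, tol, initial_gap=1.0):
    """Smallest k with initial_gap * lam2**k <= tol (illustrative helper)."""
    return math.ceil(math.log(tol / initial_gap) / math.log(lam2))

print("months to get within 1%:", months_to_tolerance(0.4, 0.01))
```

Each ratio comes out at $0.4$, and the helper reports $6$ months to bring a unit-sized gap under one percent, consistent with $0.4^5 \approx 0.0102$ and $0.4^6 \approx 0.0041$.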

This spectral gap, the distance between the dominant eigenvalue ($1$) and the next one in magnitude ($0.4$), here $1 - 0.4 = 0.6$, is one of the most practically important numbers in applied linear algebra. A large gap means fast convergence; a small gap means the market drifts toward equilibrium agonizingly slowly. The very same quantity determines how many iterations PageRank needs to rank the web (Chapter 29) and how quickly a randomized algorithm mixes. The marketing team can tell their executives not just where the market is heading, but how soon it will get there: the first answer is an eigenvector and the second an eigenvalue of the same matrix.

It is worth appreciating how much this changes the conversation in a boardroom. A naive analyst might simulate the chain month by month, run it for a few periods, eyeball the trend, and hope it has settled — an approach that is slow, approximate, and gives no guarantee. The eigen-analysis replaces all of that with two exact facts: the steady state is the $\lambda = 1$ eigenvector (computed once, exactly), and the time-to-converge is set by $|\lambda_2|$. If a competitor changes its pricing and the switching rates shift, you do not re-run a year of simulation; you rebuild $P$, recompute one eigenvector, and read off the new equilibrium instantly. The eigenvector is not just an answer — it is the closed-form answer that a simulation can only approximate, which is exactly why analytics teams reach for the spectral view whenever a process repeats.
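To illustrate the rebuild-and-recompute workflow, suppose (purely hypothetically) that Brand C improves retention so that $70\%$ of its subscribers stay, $20\%$ switch to A, and $10\%$ switch to B. The helper `steady_state` below is an illustrative name, not a library function; it returns both the equilibrium and $|\lambda_2|$ in a single call.

```python
import numpy as np

def steady_state(P):
    """Return (equilibrium shares, |lambda_2|) for a column-stochastic matrix P.

    Illustrative helper: assumes lambda = 1 is the unique eigenvalue of
    largest magnitude, as it is for the matrices in this case study.
    """
    w, V = np.linalg.eig(P)
    order = np.argsort(-np.abs(w))      # eigenvalue indices, largest magnitude first
    v = V[:, order[0]].real             # eigenvector for lambda = 1
    return v / v.sum(), float(np.abs(w[order[1]]))

P_old = np.array([[0.7, 0.3, 0.3],
                  [0.2, 0.6, 0.2],
                  [0.1, 0.1, 0.5]])

# Hypothetical new switching rates: only Brand C's column changes.
P_new = np.array([[0.7, 0.3, 0.2],
                  [0.2, 0.6, 0.1],
                  [0.1, 0.1, 0.7]])

old_shares, old_rate = steady_state(P_old)
new_shares, new_rate = steady_state(P_new)
print("old equilibrium:", np.round(old_shares, 4), "rate:", round(old_rate, 4))
print("new equilibrium:", np.round(new_shares, 4), "rate:", round(new_rate, 4))
```

Under these assumed rates the new equilibrium works out to $(\tfrac{11}{24}, \tfrac{7}{24}, \tfrac14)$ and $|\lambda_2|$ rises from $0.4$ to $0.6$, so Brand C would gain share but the market would settle more slowly.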

The eigenvalue dictionary for this model. $\lambda_1 = 1$: the steady state exists, and its eigenvector is the long-run market share. $|\lambda_2| = 0.4 < 1$: the steady state is stable and unique, and the market converges to it geometrically at rate $0.4$ per month. Had a second eigenvalue equaled $1$ in magnitude, the market could have multiple steady states (a repeated eigenvalue $1$) or oscillate forever (an eigenvalue such as $-1$); this is the kind of pathology Chapter 29 rules out by requiring the chain to be "regular."
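Both pathologies are easy to exhibit with two-brand toy chains. The matrices below are hypothetical extremes chosen only to make the behavior visible: in `swap`, every customer changes brand every month, and in `frozen`, nobody ever switches.

```python
import numpy as np

# Pathology 1: everyone swaps brands each month. Eigenvalues are 1 and -1.
swap = np.array([[0.0, 1.0],
                 [1.0, 0.0]])
swap_eigs = np.sort(np.linalg.eigvals(swap).real)
print("swap eigenvalues:", swap_eigs)

x = np.array([0.9, 0.1])
history = [x]
for _ in range(4):
    x = swap @ x
    history.append(x)
print("shares over time:", [h.tolist() for h in history])   # flips back and forth

# Pathology 2: nobody switches. Eigenvalue 1 is repeated,
# so every starting distribution is already a steady state.
frozen = np.eye(2)
frozen_eigs = np.linalg.eigvals(frozen).real
y = np.array([0.25, 0.75])
print("frozen eigenvalues:", frozen_eigs, "| unchanged:", np.allclose(frozen @ y, y))
```

In the first chain the shares never settle; in the second the long-run answer depends entirely on the starting point. Neither can happen when $|\lambda_2| < 1$.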

What the eigenvector bought us

Notice everything we extracted from a $3\times3$ matrix using nothing but the ideas of this chapter. We turned a business question — what share will each brand hold? — into the search for an invariant direction, recognized the steady state as the eigenvector for $\lambda = 1$, used the decay of the other eigenvalues to explain why the answer ignores the starting point, and read the convergence speed straight off the second eigenvalue. No simulation was strictly necessary; the eigenvector is the forecast.

Scale this up and you get some of the most consequential analytics in industry. Replace three brands with millions of web pages and you get PageRank. Replace brands with customer states (active, lapsed, churned) and you get the customer-lifetime-value models that subscription businesses live by. Replace them with the words a user might type next and you get a Markov language model. In every case the long-run answer is a dominant eigenvector, found by exactly the reasoning above — and, at large scale, by the power iteration you will build in this chapter's toolkit and unleash in Chapter 29. The vector a matrix doesn't rotate turns out to be the vector a market settles into.
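For a sense of what that large-scale route looks like, here is a minimal power-iteration sketch applied to the three-brand matrix. It is not the chapter's toolkit version; the function name, the tolerance, the iteration cap, and the uniform starting vector are all assumptions made for illustration.

```python
import numpy as np

def power_iteration(P, tol=1e-10, max_iter=1000):
    """Approximate the steady state of a column-stochastic P by repeated multiplication.

    Returns (shares, iterations_used). Illustrative sketch only.
    """
    n = P.shape[0]
    x = np.full(n, 1.0 / n)             # start from an even split
    for k in range(1, max_iter + 1):
        x_new = P @ x
        x_new = x_new / x_new.sum()     # guard against round-off drift in the sum
        if np.abs(x_new - x).sum() < tol:
            return x_new, k
        x = x_new
    return x, max_iter

P = np.array([[0.7, 0.3, 0.3],
              [0.2, 0.6, 0.2],
              [0.1, 0.1, 0.5]])

shares, iterations = power_iteration(P)
print("steady-state shares:", np.round(shares, 4))
print("iterations used:", iterations)
```

The loop needs only matrix-vector products, which is why the same idea remains feasible when the matrix is too large to decompose in full; the number of iterations it takes is governed by $|\lambda_2|$.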