Case Study 2 — "People Like You Also Liked": Cosine Similarity in a Recommender

Field: data science / recommender systems (collaborative filtering). Concepts used: the dot product, cosine similarity, mean-centering, the connection to the correlation coefficient, orthogonality. Anchor tie-in: the chapter's cosine-similarity anchor, now measuring similarity between users instead of documents, and revealing a subtle trap that mean-centering fixes. This is the geometry behind the recommendations on many streaming and shopping sites.

The problem: finding your taste-neighbors

When a streaming service suggests a film "because you watched X," it is often running the oldest and most intuitive recommendation algorithm: user-based collaborative filtering. The idea is disarmingly simple. Represent each user as a vector of their ratings across the catalog. To recommend something to you, find the users whose rating vectors are most similar to yours — your taste-neighbors — and suggest the things they rated highly that you have not seen. Everything hinges on one word, similar, and similarity between rating vectors is, once again, an angle: cosine similarity from §18.9.

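Let us make it concrete with five movies and four users. Three of the movies are science fiction (call them SciFi-A, SciFi-B, SciFi-C) and two are romances (Rom-A, Rom-B); in the vectors below, the two romances occupy the third and fourth coordinates and the three sci-fi films occupy the rest. Each user has rated all five on a $1$-to-$5$ scale: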

$$ \begin{aligned} \text{Alice} &= (5,5,1,1,5), &\quad \text{Bob} &= (4,5,2,1,4),\\ \text{Carol} &= (1,1,5,5,1), &\quad \text{Dave} &= (2,1,5,4,2). \end{aligned} $$

Reading the rows: Alice loves sci-fi and dislikes romance; Bob is nearly identical to Alice; Carol is the mirror image (loves romance, dislikes sci-fi); Dave is close to Carol. By eye, Alice's taste-neighbor is Bob, and her taste-opposite is Carol. A good similarity measure must recover this. Watch what happens when we try the obvious thing.

The trap: raw cosine similarity is fooled by positivity

Compute the raw cosine similarity between Alice and each other user, exactly as in the document case study — dot product over the product of norms.

# User-based collaborative filtering: raw cosine similarity between rating vectors.
import numpy as np
def cosine_similarity(u, v):
    return (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

ratings = {
    "Alice": np.array([5., 5, 1, 1, 5]),
    "Bob":   np.array([4., 5, 2, 1, 4]),   # same taste as Alice
    "Carol": np.array([1., 1, 5, 5, 1]),   # opposite taste
    "Dave":  np.array([2., 1, 5, 4, 2]),   # close to Carol
}
for name in ("Bob", "Carol", "Dave"):
    print(f"raw cossim(Alice, {name}) = {cosine_similarity(ratings['Alice'], ratings[name]):.4f}")
# raw cossim(Alice, Bob)   = 0.9842
# raw cossim(Alice, Carol) = 0.3913
# raw cossim(Alice, Dave)  = 0.5480

Bob scores $0.9842$ — correctly flagged as Alice's near-twin. But look at Carol: $0.3913$, a positive similarity, as if Carol were somewhat like Alice. She is the opposite of Alice in every preference. And Dave, also a romance-lover, scores $0.5480$ — even higher than Carol, suggesting moderate agreement with Alice. The raw cosine has been fooled.
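
These three values can be checked by hand. Writing $\mathbf{a}, \mathbf{b}, \mathbf{c}, \mathbf{d}$ for the rating vectors of Alice, Bob, Carol, and Dave (shorthand introduced only for this check), the squared norms are $\|\mathbf{a}\|^2 = 77$, $\|\mathbf{b}\|^2 = 62$, $\|\mathbf{c}\|^2 = 53$, $\|\mathbf{d}\|^2 = 50$, and the dot products with Alice are $68$, $25$, and $34$:

$$ \begin{aligned} \cos(\mathbf{a},\mathbf{b}) &= \frac{68}{\sqrt{77}\,\sqrt{62}} \approx 0.9842,\\ \cos(\mathbf{a},\mathbf{c}) &= \frac{25}{\sqrt{77}\,\sqrt{53}} \approx 0.3913,\\ \cos(\mathbf{a},\mathbf{d}) &= \frac{34}{\sqrt{77}\,\sqrt{50}} \approx 0.5480. \end{aligned} $$

Note where Carol's $25$ comes from: $\mathbf{a}\cdot\mathbf{c} = 5\cdot 1 + 5\cdot 1 + 1\cdot 5 + 1\cdot 5 + 5\cdot 1$. Every term pairs a high rating with a low one, yet every term is positive, so the sum cannot register the disagreement.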

The reason is that all the ratings are positive numbers between $1$ and $5$. Every rating vector therefore points into the "all-positive" corner of $\mathbb{R}^5$. The dot product of two such vectors is a sum of positive terms, so their cosine is strictly positive no matter how the preferences differ, and on a $1$-to-$5$ scale it stays well above zero. Raw cosine measures "do both users rate things highly in an overlapping way," but it cannot see disagreement, because no user ever supplies the negative numbers that would pull a vector into opposition. The shared baseline of positivity drowns out the signal.

The geometric diagnosis: all four rating vectors are crammed into the positive orthant of the space (every coordinate positive), so the angle between any two of them is always less than $90^\circ$. Here the widest angle, between Alice and Carol, is only about $67^\circ$, and there is no room to be truly opposite. The information about taste is not in the vectors' absolute direction but in how they deviate from each user's average. We need to move the origin to each user's mean.
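
To see the crowding directly, the following sketch converts every pairwise cosine into an angle in degrees. The helper name `angle_degrees` is introduced here for illustration only; the data are the four rating vectors from above.

```python
import itertools
import numpy as np

def angle_degrees(u, v):
    """Angle between two vectors, in degrees."""
    cos = (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))
    # Clip guards against round-off pushing the cosine slightly outside [-1, 1].
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

ratings = {
    "Alice": np.array([5., 5, 1, 1, 5]),
    "Bob":   np.array([4., 5, 2, 1, 4]),
    "Carol": np.array([1., 1, 5, 5, 1]),
    "Dave":  np.array([2., 1, 5, 4, 2]),
}

# Angle for every unordered pair of users.
angles = {
    (a, b): angle_degrees(ratings[a], ratings[b])
    for a, b in itertools.combinations(ratings, 2)
}
for (a, b), deg in angles.items():
    print(f"angle({a}, {b}) = {deg:5.1f} degrees")

widest = max(angles.values())
print(f"widest angle = {widest:.1f} degrees")
```

Every angle comes out below $90^\circ$, and the widest one (Alice versus Carol, roughly $67^\circ$) belongs to the pair whose tastes are exact opposites.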

The fix: mean-centering turns cosine into correlation

The cure is mean-centering, and §18.9 already told us what it computes: the cosine similarity of the mean-centered vectors is the Pearson correlation coefficient. Subtract each user's average rating from their vector before comparing. Centering re-references each user to their own baseline, so a rating above a user's average becomes positive and one below becomes negative — and now genuine disagreement produces genuinely opposing vectors.

# Centered cosine = Pearson correlation: the standard fix for collaborative filtering.
import numpy as np
def cosine_similarity(u, v):
    return (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))
def centered_similarity(u, v):                 # Pearson correlation
    return cosine_similarity(u - u.mean(), v - v.mean())

ratings = {
    "Alice": np.array([5., 5, 1, 1, 5]),
    "Bob":   np.array([4., 5, 2, 1, 4]),
    "Carol": np.array([1., 1, 5, 5, 1]),
    "Dave":  np.array([2., 1, 5, 4, 2]),
}
for name in ("Bob", "Carol", "Dave"):
    print(f"centered(Alice, {name}) = {centered_similarity(ratings['Alice'], ratings[name]):+.4f}")
# centered(Alice, Bob)   = +0.9444
# centered(Alice, Carol) = -1.0000
# centered(Alice, Dave)  = -0.9444

Now the geometry tells the truth. Bob stays strongly positive ($+0.9444$), a real taste-neighbor. Carol drops to $-1.0000$: a perfect anti-correlation, exactly capturing that she likes precisely what Alice dislikes (after centering, Carol's deviation vector is the exact negative of Alice's, so the angle is $180^\circ$). Dave lands at $-0.9444$, strongly negative, correctly grouping him with Carol as a romance-lover whose tastes oppose Alice's. The misleading positive scores are gone. The only change was moving each vector's origin to that user's mean (a translation) and then applying the same angle formula to the shifted vectors.
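
Two claims in this paragraph are easy to verify numerically: that Carol's deviation vector is the exact negative of Alice's, and that the centered cosine agrees with the Pearson correlation. The sketch below uses NumPy's `np.corrcoef` as an independent check; the variable names are chosen here for illustration.

```python
import numpy as np

alice = np.array([5., 5, 1, 1, 5])
bob   = np.array([4., 5, 2, 1, 4])
carol = np.array([1., 1, 5, 5, 1])

# Deviation vectors: each user's ratings relative to that user's own mean.
alice_dev = alice - alice.mean()   # Alice's mean is 3.4
carol_dev = carol - carol.mean()   # Carol's mean is 2.6
print("Alice deviations:", alice_dev)
print("Carol deviations:", carol_dev)
print("Carol is Alice's exact negative:", np.allclose(carol_dev, -alice_dev))

# Centered cosine between Alice and Bob, computed by hand ...
bob_dev = bob - bob.mean()
centered = (alice_dev @ bob_dev) / (np.linalg.norm(alice_dev) * np.linalg.norm(bob_dev))

# ... and the Pearson correlation from NumPy (off-diagonal entry of the 2x2 matrix).
pearson = np.corrcoef(alice, bob)[0, 1]
print(f"centered cosine = {centered:+.4f}, np.corrcoef = {pearson:+.4f}")
```

The two numbers agree to floating-point precision, which is the identity from §18.9 in action.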

This is why recommender systems and statistical analyses commonly use centered (Pearson) similarity rather than raw cosine whenever the data has a baseline offset, as explicit ratings almost always do. It is one of the most practically important consequences of a fact that looks purely theoretical: correlation is an angle (§18.9), and centering is what aligns the angle with the question you actually care about.

Turning similarity into a prediction

Finding taste-neighbors is the means; the goal is a prediction. Suppose Alice has not seen a sixth movie, and we want to predict her rating. The standard recipe is a similarity-weighted average of her neighbors' ratings: weight each neighbor's rating by how similar they are to Alice, so close neighbors count more and opposites count negatively (their dislikes predict her likes). Take Bob and Dave as the neighbors, with centered similarities $s_{\text{Bob}}=+0.9444$ and $s_{\text{Dave}}=-0.9444$, and let $r_{\text{Bob}}, r_{\text{Dave}}$ be their centered ratings of the new movie (each neighbor's rating minus that neighbor's own mean). A simple prediction for Alice's centered rating is

$$ \hat{r}_{\text{Alice}} = \frac{s_{\text{Bob}}\,r_{\text{Bob}} + s_{\text{Dave}}\,r_{\text{Dave}}}{|s_{\text{Bob}}| + |s_{\text{Dave}}|}, $$

then add Alice's mean back to return to the $1$–$5$ scale. If Bob loved the new film and Dave hated it, both pieces of evidence push Alice's predicted rating up — Bob because he agrees with her, Dave because he reliably disagrees, so his dislike is positive evidence for her. The negative similarity is not noise to be discarded; it is signal with a sign, and the dot-product geometry handles the sign automatically. This weighting-by-cosine is the computational heart of memory-based collaborative filtering.
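
A minimal sketch of this recipe follows. The sixth-movie ratings (Bob gives it 5, Dave gives it 1) are hypothetical values made up for the illustration, each neighbor's mean is assumed to be taken over the five movies already rated, and the final clip to the $1$–$5$ scale is an added convenience rather than part of the formula.

```python
import numpy as np

def centered_similarity(u, v):
    """Cosine of the mean-centered vectors (Pearson correlation)."""
    u_dev, v_dev = u - u.mean(), v - v.mean()
    return (u_dev @ v_dev) / (np.linalg.norm(u_dev) * np.linalg.norm(v_dev))

ratings = {
    "Alice": np.array([5., 5, 1, 1, 5]),
    "Bob":   np.array([4., 5, 2, 1, 4]),
    "Dave":  np.array([2., 1, 5, 4, 2]),
}

# HYPOTHETICAL ratings of the unseen sixth movie: Bob loved it, Dave hated it.
new_movie = {"Bob": 5.0, "Dave": 1.0}

alice = ratings["Alice"]
numerator = 0.0
denominator = 0.0
for name, rating in new_movie.items():
    s = centered_similarity(alice, ratings[name])
    r_centered = rating - ratings[name].mean()   # deviation from the neighbor's own mean
    numerator += s * r_centered                  # sign of s flips an opposite's opinion
    denominator += abs(s)

predicted_centered = numerator / denominator
# Add Alice's mean back, then clip to the rating scale (assumed post-processing step).
predicted = float(np.clip(alice.mean() + predicted_centered, 1.0, 5.0))
print(f"predicted centered rating = {predicted_centered:+.2f}")
print(f"predicted rating          = {predicted:.2f}")
```

With these made-up inputs, Bob contributes $(+0.9444)(+1.8)$ and Dave contributes $(-0.9444)(-1.8)$, so both terms are positive and the centered prediction is $+1.8$. Adding Alice's mean of $3.4$ gives $5.2$, which the clip brings back to $5$.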

Item-based filtering: the same geometry, transposed

There is a dual to everything above that powers some of the most famous recommendations on the web — the "customers who bought this also bought that" strip. Instead of comparing users by their rating vectors, item-based collaborative filtering compares items by the vectors of ratings they received across all users. Two movies are similar if the people who rated one tend to rate the other the same way — that is, if their columns in the user-by-item rating matrix point in similar directions. The similarity measure is, once more, the cosine (usually centered) between two item vectors. To recommend for you, the system finds the items most similar to ones you already liked and suggests those. Item-based filtering is often preferred in practice because item-item similarities are more stable over time than user tastes and can be precomputed. But notice the geometry has not changed one bit: it is cosine similarity, the chapter's anchor, applied to the columns of the rating matrix instead of the rows. The transpose swaps "users" for "items," and the dot product does the rest — a small echo of the row-space/column-space duality from Chapter 14.
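
The transposed computation can be sketched on the same four users. Two choices here are assumptions rather than something the text prescribes: each user's row is centered on that user's mean before the columns are compared (a variant often called adjusted cosine), and "items most similar to one you liked" is implemented as a simple sort. The names `R`, `item_similarity`, and `ranked` are illustrative.

```python
import numpy as np

# Rows are users (Alice, Bob, Carol, Dave); columns are the five movies.
R = np.array([
    [5., 5, 1, 1, 5],
    [4., 5, 2, 1, 4],
    [1., 1, 5, 5, 1],
    [2., 1, 5, 4, 2],
])

# ASSUMPTION: remove each user's baseline before comparing item columns.
R_centered = R - R.mean(axis=1, keepdims=True)

# Scale every column to unit length; the Gram matrix of the unit columns
# is then the full item-item cosine table in one matrix product.
unit_columns = R_centered / np.linalg.norm(R_centered, axis=0, keepdims=True)
item_similarity = unit_columns.T @ unit_columns    # shape (5, 5)
print(np.round(item_similarity, 3))

# Items ranked by similarity to a movie the user liked (column 0 here).
liked = 0
ranked = [int(j) for j in np.argsort(-item_similarity[liked]) if j != liked]
print("most similar to movie 0:", ranked)
```

Column 4 received exactly the same ratings as column 0, so it tops the list with similarity $1$, while the two romance columns (2 and 3) come out strongly negative. Because the table depends only on the rating matrix, it can be computed once and reused for every user.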

A note on scale, sparsity, and where this goes

Two honest caveats keep this from sounding like magic. First, real rating matrices are enormously sparse — most users have rated only a tiny fraction of the catalog — so similarities are computed only over the items two users have both rated, and a similarity based on two shared ratings is far less trustworthy than one based on fifty. Second, computing the cosine between every pair of millions of users is expensive; production systems either restrict to approximate nearest neighbors or abandon memory-based filtering for matrix factorization, which we reach in Chapter 33. There, each user and each item gets a short learned embedding vector, and a predicted rating is — once again — essentially a dot product between a user vector and an item vector. The operation never leaves us: from raw counts to centered ratings to learned factors, recommendation is the dot product reading how much two vectors point the same way.
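
The first caveat can be made concrete with a small sketch. Missing ratings are encoded as `np.nan`, the similarity is the centered cosine over co-rated items only, and the function also returns the overlap count so a caller can judge how much to trust the number. The function name, the `min_overlap` parameter, the choice to center on the co-rated items, and the thinned-out version of Bob are all assumptions made for the illustration.

```python
import numpy as np

def corated_similarity(u, v, min_overlap=2):
    """Centered cosine over items both users rated (np.nan = unrated).

    Returns (similarity, overlap). Similarity is nan when the overlap is
    below min_overlap or when either user's co-rated ratings are constant.
    """
    both = ~np.isnan(u) & ~np.isnan(v)
    overlap = int(both.sum())
    if overlap < min_overlap:
        return float("nan"), overlap
    # ASSUMPTION: center on the mean of the co-rated items only.
    u_dev = u[both] - u[both].mean()
    v_dev = v[both] - v[both].mean()
    denom = np.linalg.norm(u_dev) * np.linalg.norm(v_dev)
    if denom == 0.0:
        return float("nan"), overlap
    return float((u_dev @ v_dev) / denom), overlap

nan = np.nan
alice      = np.array([5., 5, 1, 1, 5])
bob_full   = np.array([4., 5, 2, 1, 4])
bob_sparse = np.array([4., nan, 2, nan, nan])   # HYPOTHETICAL: Bob with three ratings missing

sim_full, n_full = corated_similarity(alice, bob_full)
sim_sparse, n_sparse = corated_similarity(alice, bob_sparse)
print(f"full data:   similarity = {sim_full:+.4f} from {n_full} shared ratings")
print(f"sparse data: similarity = {sim_sparse:+.4f} from {n_sparse} shared ratings")
```

With only two shared ratings the centered similarity can only be $+1$ or $-1$ (or undefined), so the sparse score looks perfect while carrying almost no evidence. That is why the overlap count needs to travel with the similarity.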

The lesson

User-based collaborative filtering recommends by finding your taste-neighbors, and "neighbor" means small angle between rating vectors — cosine similarity, the chapter's anchor, applied to people instead of documents. But raw cosine is fooled when data carries a baseline, as ratings do: shared positivity masquerades as shared taste, and a perfect opposite (Carol) reads as a mild ally. Mean-centering — which, by §18.9, converts cosine similarity into the Pearson correlation — re-references each user to their own average and restores the truth, sending genuine opposites to $-1$. The fix flows directly from understanding cosine similarity as an angle and centering as a shift of origin. Master the dot product as both alignment and (after centering) correlation, and you hold the geometry behind "people like you also liked" — and the doorway to the matrix-factorization recommenders of Chapter 33.