Case Study 2 — A Quant's Pipeline: Pricing a Thousand Scenarios with One Factorization

DataField.Dev

Case Study 2 — A Quant's Pipeline: Pricing a Thousand Scenarios with One Factorization

A risk desk at an investment bank does not ask one question of its model; it asks thousands. What happens to the portfolio if rates rise 50 basis points? If equities drop 10%? If both, plus a credit shock? Each scenario is a different right-hand side fed into the same linear model, and a regulator may demand the answer for hundreds of stress scenarios before lunch. The naive way — re-solving the model from scratch for every scenario — turns a quick report into an overnight batch job. The professional way is this chapter's idea: factor the model matrix once, then price every scenario with a cheap pair of substitutions. This case study walks through that pattern in a data pipeline, from a hand-worked small model to the design of the loop that scales.

Many systems, one shared matrix

A surprising number of quantitative-finance and data-science computations reduce to solving $A\mathbf{x} = \mathbf{b}$ for a fixed matrix $A$ and a stream of different $\mathbf{b}$'s. A few concrete instances:

Scenario analysis and stress testing. A linearized risk model relates factor shocks to portfolio profit-and-loss through a fixed sensitivity matrix; each scenario is a new shock vector $\mathbf{b}$, and the matrix $A$ (the model's structure) is held constant across the whole scenario set.
Ridge regression and cross-validation. Fitting a regularized linear model solves the normal equations $(X^{\mathsf{T}}X + \lambda I)\,\boldsymbol\beta = X^{\mathsf{T}}\mathbf{y}$. In a cross-validation loop or a multi-target fit, the matrix $A = X^{\mathsf{T}}X + \lambda I$ changes little or not at all while the right-hand side $X^{\mathsf{T}}\mathbf{y}$ varies across folds or response columns.
Calibration sweeps. Calibrating a model to market data often means solving the same linear subproblem repeatedly as an outer routine adjusts parameters, with the inner matrix fixed across many right-hand sides.

In every case the structure is identical: an expensive-to-build matrix $A$, queried with many right-hand sides. That is precisely the signature that says factor once, reuse many.

A small model, factored by hand

Let's make it concrete with a three-factor risk model whose (symmetric, positive-definite) system matrix is

$$A = \begin{bmatrix} 4 & 2 & 1 \\ 2 & 5 & 3 \\ 1 & 3 & 6 \end{bmatrix}.$$

Think of $A$ as encoding how three risk factors (say, an interest-rate factor, an equity factor, and a credit factor) interact in the model. We will solve $A\mathbf{x} = \mathbf{b}$ for several scenario vectors $\mathbf{b}$, so we factor $A = LU$ first — once — by the elimination of §10.3.

Column 1. Pivot $4$. Multipliers $\ell_{21} = 2/4 = 0.5$ and $\ell_{31} = 1/4 = 0.25$. Subtracting these multiples of row 1 from rows 2 and 3:

$$\begin{aligned} R_2 &\leftarrow R_2 - 0.5R_1 = (0,\,4,\,2.5)\\ R_3 &\leftarrow R_3 - 0.25R_1 = (0,\,2.5,\,5.75)\end{aligned} \;\Longrightarrow\; \begin{bmatrix} 4 & 2 & 1 \\ 0 & 4 & 2.5 \\ 0 & 2.5 & 5.75 \end{bmatrix}.$$

Column 2. Pivot $4$. Multiplier $\ell_{32} = 2.5/4 = 0.625$. Row 3 becomes $(0,2.5,5.75) - 0.625(0,4,2.5) = (0,0,4.1875)$:

$$U = \begin{bmatrix} 4 & 2 & 1 \\ 0 & 4 & 2.5 \\ 0 & 0 & 4.1875 \end{bmatrix}, \qquad L = \begin{bmatrix} 1 & 0 & 0 \\ 0.5 & 1 & 0 \\ 0.25 & 0.625 & 1 \end{bmatrix}.$$

The factorization is built. Every scenario from now on is two substitutions.

Pricing the first scenario by hand

Take a scenario shock $\mathbf{b}_1 = (7, 11, 13)$. Solve $A\mathbf{x} = \mathbf{b}_1$ in two stages.

Forward-substitute $L\mathbf{y} = \mathbf{b}_1$: $y_1 = 7$; then $0.5y_1 + y_2 = 11 \Rightarrow y_2 = 11 - 3.5 = 7.5$; then $0.25y_1 + 0.625y_2 + y_3 = 13 \Rightarrow y_3 = 13 - 1.75 - 4.6875 = 6.5625$. So $\mathbf{y} = (7,\,7.5,\,6.5625)$.

Back-substitute $U\mathbf{x} = \mathbf{y}$: from the bottom, $4.1875\,x_3 = 6.5625 \Rightarrow x_3 = 1.5672$; then $4x_2 + 2.5x_3 = 7.5 \Rightarrow x_2 = (7.5 - 3.918)/4 = 0.8955$; then $4x_1 + 2x_2 + x_3 = 7 \Rightarrow x_1 = (7 - 1.791 - 1.5672)/4 = 0.9104$. The scenario result is $\mathbf{x} \approx (0.9104,\,0.8955,\,1.5672)$.

# Factor the model matrix once; price many scenarios by reuse.
import numpy as np
from scipy.linalg import lu_factor, lu_solve
A = np.array([[4., 2., 1.],
              [2., 5., 3.],
              [1., 3., 6.]])
L = np.array([[1., 0., 0.], [0.5, 1., 0.], [0.25, 0.625, 1.]])
U = np.array([[4., 2., 1.], [0., 4., 2.5], [0., 0., 4.1875]])
print(np.allclose(L @ U, A))                 # True -> hand factorization exact
lu_piv = lu_factor(A)                         # FACTOR ONCE
scenarios = [np.array([7., 11., 13.]),        # scenario 1 (worked by hand)
             np.array([4., 2., 1.]),          # scenario 2
             np.array([10., 20., 30.])]       # scenario 3
for b in scenarios:
    x = lu_solve(lu_piv, b)                    # REUSE: two substitutions each
    print(np.round(x, 4), np.allclose(A @ x, b))
# [0.9104 0.8955 1.5672] True
# [1.     0.     0.    ] True
# [0.8955 1.0448 4.3284] True

The first line prints True: our hand factorization is exact. The first scenario reproduces the hand result $(0.9104, 0.8955, 1.5672)$ to the digit, and the next two scenarios are priced by reusing the same lu_piv — the factorization is computed once, before the loop, and each scenario inside the loop costs only the cheap triangular solves. Three scenarios here; a real stress test runs hundreds.

Designing the pipeline that scales

The lesson becomes a design principle the moment the scenario count grows. Here is the same idea at realistic scale, contrasting the right way with the trap.

# 500 scenarios against a fixed model matrix: factor-once vs re-solve-each-time.
import numpy as np
from scipy.linalg import lu_factor, lu_solve
rng = np.random.default_rng(7)
n = 300
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)               # fixed, well-conditioned model matrix
B = rng.standard_normal((n, 500))         # 500 scenario right-hand sides (columns)

# RIGHT: factor once, reuse for every scenario.
lu_piv = lu_factor(A)                      # one O(n^3) factorization
X_fast = np.column_stack([lu_solve(lu_piv, B[:, k]) for k in range(500)])

# TRAP: re-solve from scratch each scenario (re-factors A 500 times internally).
X_slow = np.column_stack([np.linalg.solve(A, B[:, k]) for k in range(500)])

print(np.allclose(X_fast, X_slow))         # True -> identical answers

Both blocks compute the same answers (np.allclose is True), but they do wildly different amounts of work. The "right" version factors $A$ exactly once and then does $500$ cheap solves. The "trap" version calls np.linalg.solve inside the loop, and because np.linalg.solve re-factors its matrix on every call (it does not cache), it pays the $O(n^3)$ factorization cost $500$ times over. For $n = 300$ that is hundreds of redundant cubic-cost factorizations — the kind of silent inefficiency that turns a five-minute report into an hour-long one. The fix is one line: hoist the factorization out of the loop.

It is worth pausing on a structural feature of the matrices in this story, because it points to an even better tool. Risk and regression matrices are almost always symmetric positive-definite — symmetric because a covariance or Gram matrix $X^{\mathsf{T}}X$ is symmetric by construction, and positive-definite because such matrices represent genuine (non-negative) variances and energies. Symmetric positive-definite matrices never need pivoting for stability and admit a specialized factorization, the Cholesky factorization $A = R^{\mathsf{T}}R$ (with $R$ upper-triangular), which does about half the arithmetic of a general LU because it exploits the symmetry — you only compute one triangular factor and its transpose, not two independent ones. In production quant code, the factor-once step is therefore usually scipy.linalg.cho_factor rather than lu_factor, and the reuse step is cho_solve. The pattern is identical to everything you practiced in this chapter — factor the shared matrix once, solve each right-hand side cheaply — but Cholesky is the faster specialist for the symmetric positive-definite matrices that dominate finance and statistics. We develop it properly in Chapter 28; for now, recognize that LU is the general workhorse and Cholesky is its leaner cousin for a common, important special case.

The same np.linalg.solve-in-a-loop pattern is one of the most common performance bugs in real data and finance code. It is not wrong — the answers are correct — it is wasteful, and the waste is invisible until the scenario count grows. Recognizing "fixed matrix, many right-hand sides" and reaching for lu_factor/lu_solve (or, for the symmetric positive-definite matrices common in finance, cho_factor/cho_solve) is a hallmark of someone who understands not just how to solve a system but how to solve many efficiently.

What you should take away

A risk report, a cross-validation sweep, a calibration loop — each is a fixed linear model queried with a stream of inputs, and each is the factor-once-reuse-many pattern wearing different clothes. The matrix is the expensive, reusable asset; its LU factorization is how you make it reusable; and every scenario is one cheap query. The three-factor model worked by hand and the five-hundred-scenario sweep in code are the same computation at two scales, and a production risk engine pricing a whole trading book is the same computation at a third.

This case study deliberately sits in finance and data science, far from the engineering setting of the chapter's other case study, to make the chapter's recurring point: the same factorization that speeds a circuit simulation speeds a risk pipeline. Learn the structure once — fixed matrix, many right-hand sides, factor and reuse — and you recognize it everywhere, which is exactly the book's theme that linear algebra is the most applied branch of mathematics. When you reach the normal equations of least squares in Chapter 17 and the symmetric positive-definite matrices of Chapter 28, you will see the financial matrices here in their natural mathematical home, and you will already know the efficient way to solve with them.

Forward references: the normal equations $X^{\mathsf{T}}X\boldsymbol\beta = X^{\mathsf{T}}\mathbf{y}$ and least squares (Chapter 17); symmetric positive-definite matrices and the Cholesky factorization, LU's cheaper specialized cousin (Chapter 28).