Case Study 2: James's Policy Memo on Algorithmic Bias
The Setup
Professor James Washington has spent two years analyzing the predictive policing algorithm used in Riverside County's criminal justice system. His findings are technically solid — and deeply uncomfortable.
The algorithm assigns risk scores from 1 (low risk) to 10 (high risk), intended to predict the likelihood of reoffending within two years. The county uses these scores for bail, pretrial detention, and sentencing recommendations. A score of 7 or higher is flagged as "high risk," which can mean the difference between going home and going to jail.
James's analysis shows:

- Overall accuracy: The algorithm's risk scores predict actual recidivism with $R^2 = 0.85$ — by most standards, a good predictive model
- Racial disparity: The model works much better for white defendants ($R^2 = 0.91$) than for Black defendants ($R^2 = 0.73$)
- False positive rates: At a risk score of 7, the actual recidivism rate for white defendants is 51%, but only 38% for Black defendants — meaning Black defendants are systematically over-classified as high risk
- Concrete impact: In the last two years, an estimated 340 Black defendants were detained pretrial based on risk scores that over-predicted their actual risk
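The calibration gap behind these bullet points can be tabulated directly from the observed per-score recidivism rates (the same numbers plotted later in James's key chart). A minimal sketch — the rates are from the chapter's data; the column names are our own:

```python
import pandas as pd

# Observed recidivism rates (%) at each risk score, by race
# (the numbers plotted in James's key chart)
calib = pd.DataFrame({
    "score": range(1, 11),
    "white_pct": [6, 10, 16, 22, 29, 37, 51, 60, 68, 78],
    "black_pct": [10, 15, 20, 26, 32, 38, 38, 48, 56, 65],
})

# A positive gap at or above the threshold means Black defendants with the
# same score reoffend at a lower rate, i.e. are over-classified as high risk
calib["gap_pp"] = calib["white_pct"] - calib["black_pct"]

print(calib[calib["score"] >= 7].to_string(index=False))
```

At the high-risk threshold (score 7) the gap is 13 percentage points, and it stays above 10 points for every score in the high-risk range.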
James has been invited to present these findings to the County Criminal Justice Reform Commission — a body of judges, prosecutors, defense attorneys, community advocates, and county commissioners. They have the authority to continue, modify, or discontinue the algorithm.
This is the highest-stakes communication challenge in this textbook.
The Communication Dilemma
James faces a tension that no chart redesign can resolve: his findings are politically charged, emotionally loaded, and technically complex.
If he leads with "the algorithm is racially biased," he'll get the advocates' support but lose the prosecutors and judges who rely on the tool. If he leads with "the algorithm has $R^2 = 0.85$," he'll seem to be defending it. If he buries the racial disparity in technical jargon, he's being dishonest. If he presents it without context, he risks an overcorrection that eliminates a tool that does provide some useful information.
His communication strategy must be simultaneously:

- Accurate (to the data)
- Accessible (to a mixed-expertise audience)
- Fair (to all parties affected)
- Actionable (leading to concrete next steps)
James's Approach: The Policy Memo
James decides to write a formal policy memo — a format the commission members are familiar with — supplemented by a 15-minute oral presentation.
The Memo Structure
# ============================================================
# JAMES'S POLICY MEMO: HEADER AND EXECUTIVE SUMMARY
# ============================================================
print("=" * 70)
print("POLICY MEMO")
print("=" * 70)
print()
print("TO: Riverside County Criminal Justice Reform Commission")
print("FROM: Prof. James Washington, Department of Criminal Justice")
print("DATE: March 15, 2026")
print("RE: Evaluation of the Predictive Risk Assessment Algorithm")
print()
print("=" * 70)
print("EXECUTIVE SUMMARY")
print("=" * 70)
print()
print("The county's risk assessment algorithm predicts recidivism")
print("reasonably well overall, but it does not predict equally well")
print("for all defendants. Specifically:")
print()
print(" • The algorithm explains 91% of the variation in actual")
print(" recidivism for white defendants but only 73% for Black")
print(" defendants.")
print()
print(" • At a risk score of 7 (the 'high-risk' threshold), 51%")
print(" of white defendants actually reoffend, compared to only")
print(" 38% of Black defendants. This means Black defendants are")
print(" more likely to be flagged as high-risk without actually")
print(" reoffending.")
print()
print(" • Over two years, an estimated 340 Black defendants were")
print(" detained pretrial based on risk scores that overestimated")
print(" their actual risk.")
print()
print("This memo presents three options for the Commission's")
print("consideration, each with documented tradeoffs.")
print()
print("=" * 70)
The Key Visualization
James knows that one chart will carry the weight of his argument. He needs it to show the disparity clearly without sensationalizing it — and without requiring the audience to understand regression.
import numpy as np
import matplotlib.pyplot as plt

# ============================================================
# JAMES'S KEY CHART: RISK SCORE VS. ACTUAL RECIDIVISM BY RACE
# The chart the commission will remember
# ============================================================

# Data from Chapter 22 analysis
risk_scores = np.arange(1, 11)
white_actual = [6, 10, 16, 22, 29, 37, 51, 60, 68, 78]
black_actual = [10, 15, 20, 26, 32, 38, 38, 48, 56, 65]

fig, ax = plt.subplots(figsize=(10, 6.5))

# Plot data
ax.plot(risk_scores, white_actual, 'o-', color='#2980B9',
        linewidth=2.5, markersize=8, markerfacecolor='white',
        markeredgewidth=2, label='White defendants', zorder=3)
ax.plot(risk_scores, black_actual, 's-', color='#E74C3C',
        linewidth=2.5, markersize=8, markerfacecolor='white',
        markeredgewidth=2, label='Black defendants', zorder=3)

# Reference line: a linear benchmark in which each score point
# corresponds to roughly 8 percentage points of recidivism risk
ax.plot(risk_scores, [r * 8 for r in risk_scores],
        '--', color='gray', linewidth=1, alpha=0.5,
        label='Linear reference')

# Highlight the threshold
ax.axvline(x=7, color='#F39C12', linewidth=2, linestyle=':',
           alpha=0.7, zorder=1)
ax.text(7.15, 75, 'High-risk\nthreshold', fontsize=10,
        color='#F39C12', fontweight='bold', va='top')

# Annotate the gap at score = 7
ax.annotate('',
            xy=(7, 51), xytext=(7, 38),
            arrowprops=dict(arrowstyle='<->', color='#8E44AD',
                            lw=2.5))
ax.text(7.4, 44.5, '13 percentage\npoint gap',
        fontsize=10, color='#8E44AD', fontweight='bold',
        va='center')

# Additional annotation explaining the gap
ax.annotate('At score = 7:\n51% of white defendants reoffend\n'
            '38% of Black defendants reoffend\n\n'
            'Same score, different meaning',
            xy=(7, 38),
            xytext=(2.5, 65),
            fontsize=9, color='#333333',
            bbox=dict(boxstyle='round,pad=0.5', facecolor='#FFF3E0',
                      edgecolor='#F39C12', alpha=0.9),
            arrowprops=dict(arrowstyle='->', color='#333333',
                            connectionstyle='arc3,rad=0.2'))

# Labels and title
ax.set_title('The Algorithm Over-Predicts Risk for Black Defendants\n'
             'Actual Recidivism Rates by Risk Score and Race',
             fontsize=13, color='#333333', pad=15)
ax.set_xlabel('Algorithm Risk Score', fontsize=12, color='#555555')
ax.set_ylabel('Actual Recidivism Rate (%)', fontsize=12,
              color='#555555')

# Formatting
ax.set_xlim(0.5, 10.5)
ax.set_ylim(0, 85)
ax.set_xticks(risk_scores)
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
ax.spines['left'].set_color('#CCCCCC')
ax.spines['bottom'].set_color('#CCCCCC')
ax.grid(axis='y', alpha=0.15)
ax.tick_params(colors='#555555')

# Legend
ax.legend(fontsize=10, loc='upper left', frameon=True,
          framealpha=0.9, edgecolor='#CCCCCC')

# Source
ax.text(0.99, -0.1,
        'Source: Riverside County Criminal Justice Database, 2024-2025\n'
        'Note: Recidivism defined as any new charge within 24 months',
        transform=ax.transAxes, fontsize=7.5, color='gray',
        ha='right', va='top')

plt.tight_layout()
plt.savefig('james_disparity_chart.png', dpi=300, bbox_inches='tight')
plt.show()
print("DESIGN DECISIONS:")
print(" 1. Two lines, not a regression table — instant visual comparison")
print(" 2. Different shapes (circles vs. squares) for accessibility")
print(" 3. Gap annotated with a double-arrow and magnitude")
print(" 4. Threshold line at score = 7 — where decisions are made")
print(" 5. Title states the finding: 'Over-Predicts Risk for Black")
print(" Defendants' — honest and specific")
print(" 6. Source and recidivism definition included — transparency")
The Three Options
James presents the commission with three concrete options, each with quantified tradeoffs. This is critical — he's not just diagnosing a problem; he's providing a framework for solving it.
# ============================================================
# JAMES'S OPTIONS FRAMEWORK
# ============================================================
print("=" * 70)
print("OPTIONS FOR THE COMMISSION'S CONSIDERATION")
print("=" * 70)
print()
print("OPTION A: Recalibrate the Algorithm")
print("-" * 70)
print(" What: Adjust the algorithm to use race-specific thresholds")
print(" (e.g., score 7 for white defendants = score 8 for Black")
print(" defendants)")
print()
print(" Pro: Equalizes false positive rates across racial groups")
print(" Con: Explicitly uses race in the scoring system, which may")
print(" raise legal and ethical objections")
print(" Con: May increase overall false negative rate (releasing")
print(" higher-risk individuals)")
print()
print(" Estimated impact: Reduces racial disparity in detention by")
print(" approximately 60%. Would have prevented ~200 of the 340")
print(" over-detentions over two years.")
print()
print("OPTION B: Human Review for Threshold Cases")
print("-" * 70)
print(" What: Keep the algorithm but require judicial review for")
print(" all defendants scoring 6, 7, or 8 (the 'gray zone')")
print()
print(" Pro: Preserves algorithmic efficiency for clear-cut cases")
print(" Pro: Human judgment can incorporate factors the algorithm")
print(" cannot (context, individual circumstances)")
print(" Con: Increases workload for judges")
print(" Con: Human judgment introduces its own biases (which is")
print(" why algorithms were adopted in the first place)")
print()
print(" Estimated impact: Reduces racial disparity by 30-45%.")
print(" Requires ~1,200 additional judicial reviews per year.")
print()
print("OPTION C: Discontinue Algorithmic Scoring")
print("-" * 70)
print(" What: Return to fully judge-based pretrial decisions")
print()
print(" Pro: Eliminates systematic algorithmic bias")
print(" Con: Returns to a system where judicial discretion had")
print(" its own well-documented racial disparities")
print(" Con: Increases decision time and inconsistency")
print(" Con: Loses the predictive accuracy the algorithm provides")
print(" (R² = 0.85 overall)")
print()
print(" Estimated impact: Unknown. Prior research suggests judicial")
print(" discretion produces comparable or greater racial disparities")
print(" in many jurisdictions.")
print()
print("=" * 70)
print("RECOMMENDATION")
print("=" * 70)
print()
print("The data supports Option B as a starting point: maintain the")
print("algorithm for clear cases but add human review for threshold")
print("scores (6-8). This preserves the algorithm's benefits while")
print("adding a safety net for the cases most affected by the")
print("calibration gap.")
print()
print("Additionally, the Commission should mandate annual disparity")
print("audits — including the analysis presented here — to monitor")
print("whether the gap narrows, widens, or shifts to new patterns.")
print()
print("The choice between Options A, B, and C is ultimately a VALUES")
print("decision, not a statistical one. The data tells us the gap")
print("exists and how large it is. It cannot tell us which tradeoff")
print("the community finds acceptable.")
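Option A's race-specific equivalence ("score 7 for white defendants = score 8 for Black defendants") can be sanity-checked against the chart's observed rates: find the Black-defendant score whose actual recidivism rate comes closest to the white rate at the current cutoff. A minimal sketch using only the rates already shown:

```python
# Observed recidivism rates (%) by risk score, from the key chart's data
white_actual = [6, 10, 16, 22, 29, 37, 51, 60, 68, 78]
black_actual = [10, 15, 20, 26, 32, 38, 38, 48, 56, 65]

threshold = 7
target = white_actual[threshold - 1]  # white rate at the current cutoff

# The "equivalent threshold" behind Option A: the Black-defendant score
# whose observed rate is closest to that target
equivalent_score, equivalent_rate = min(
    enumerate(black_actual, start=1),
    key=lambda item: abs(item[1] - target),
)
print(f"White rate at score {threshold}: {target}%")
print(f"Closest Black rate: {equivalent_rate}% at score {equivalent_score}")
```

With these data the closest match is 48% at score 8, which is the numerical basis for the memo's "score 7 ≈ score 8" framing.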
The Communication Principles at Work
1. Separating Facts from Values
James is meticulous about separating what the data shows (facts) from what the community should do about it (values):
| Statement | Type |
|---|---|
| "The algorithm over-predicts risk for Black defendants" | Fact (directly supported by data) |
| "A 13-percentage-point gap at the threshold score is unacceptable" | Value judgment (reasonable, but not a statistical conclusion) |
| "340 defendants were over-detained" | Fact (directly computed from data) |
| "Option B is the best path forward" | Recommendation (informed by facts, shaped by values) |
2. Choosing the Right Level of Detail
James presents different amounts of detail to different parts of his audience:
| Audience Segment | What They See | Level of Detail |
|---|---|---|
| Community advocates | Executive summary + key chart | Finding and human impact |
| Judges and prosecutors | Options framework + estimated impacts | Tradeoffs and implementation |
| The one statistician on the commission | Full regression output in appendix | Technical details and methods |
| Media (if the report becomes public) | Executive summary only | The headline finding |
3. Honest Uncertainty
James includes uncertainty throughout:
# ============================================================
# JAMES'S UNCERTAINTY COMMUNICATION
# ============================================================
print("LIMITATIONS AND CAVEATS")
print("=" * 70)
print()
print("1. OBSERVATIONAL DATA: This analysis establishes that a")
print(" disparity exists in the algorithm's performance. It does")
print(" not establish WHY the disparity exists. Possible causes")
print(" include biased training data, different base rates of")
print(" arrest (vs. actual criminal behavior) across racial groups,")
print(" or unmeasured socioeconomic factors.")
print()
print("2. SAMPLE SIZE: The analysis covers 2,450 defendants over")
print(" two years. Subgroup analyses (by race and risk score) have")
print(" smaller samples, and the estimates at extreme scores")
print(" (1-2 and 9-10) are less precise.")
print()
print("3. DEFINITION OF RECIDIVISM: 'Any new charge within 24 months'")
print(" includes minor offenses and charges that were later dropped.")
print(" A narrower definition (e.g., convictions only, or violent")
print(" offenses only) might produce different results.")
print()
print("4. THE COUNTERFACTUAL PROBLEM: We cannot observe what would")
print(" have happened to detained defendants if they had been")
print(" released. The algorithm's predictions may be partially")
print(" self-fulfilling — detention itself affects the likelihood")
print(" of future charges.")
print()
print("5. ESTIMATED IMPACTS: The estimates for Options A, B, and C")
print(" are projections based on the current data, not guarantees.")
print(" Actual impacts would depend on implementation details.")
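Limitation 2 (less precise estimates at extreme scores) can be made concrete with a confidence-interval sketch. The subgroup counts below are hypothetical, chosen only to show how the same observed rate carries very different uncertainty at different sample sizes:

```python
import math

def wilson_ci(k, n, z=1.96):
    """95% Wilson score interval for an observed proportion k/n."""
    p = k / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half

# Hypothetical subgroup counts: roughly the same observed rate (~38%)
# at a mid-range score (large n) vs. an extreme score (small n)
for k, n in [(46, 120), (5, 13)]:
    lo, hi = wilson_ci(k, n)
    print(f"{k}/{n} reoffended: {k/n:.0%}, 95% CI {lo:.0%}-{hi:.0%}")
```

The small-n interval is several times wider, which is exactly why James hedges the estimates at scores 1-2 and 9-10.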
4. The Thermometer Analogy
In his oral presentation, James uses an analogy that makes the statistical concept immediately intuitive:
"Imagine a thermometer that reads accurately for some patients but reads three degrees too high for others. The average reading across all patients might be correct — but individual patients would receive different treatments based on the same reading. That's what this algorithm does: a risk score of 7 means different things depending on who the defendant is."
This analogy:

- Requires no statistical knowledge
- Maps precisely to the actual finding (systematic over-prediction for one group)
- Creates empathy (everyone has used a thermometer)
- Avoids loaded language while still making the disparity clear
The Broader Lesson
James's case study illustrates the hardest truth in data communication: the most technically excellent analysis in the world is only as good as its communication. His regression models are rigorous. His effect sizes are precisely estimated. His limitations are honestly stated.
But what matters is whether the commission makes a good decision. That depends not on the $R^2$ value but on whether James can help twelve non-statisticians understand what the data says, what it doesn't say, and what they should do about it.
What James Got Right
| Communication Principle | How James Applied It |
|---|---|
| State the finding, not the method | "The algorithm over-predicts risk for Black defendants" — not "We ran a disaggregated regression" |
| Quantify the human impact | "340 defendants over-detained" — not "the false positive rate differential was 13 percentage points" |
| Provide options, not ultimatums | Three options with explicit tradeoffs — the decision is the commission's |
| Separate facts from values | "The data tells us the gap exists. It cannot tell us which tradeoff the community finds acceptable." |
| Use analogy | The thermometer — intuitive, precise, non-inflammatory |
| Layer the detail | Executive summary → key chart → options → appendix |
| Show uncertainty | Five specific limitations, each explaining how it affects the conclusions |
Discussion Questions
1. James recommends Option B (human review for threshold cases). Could a different analyst, using the same data, reasonably recommend Option A or Option C? What does this tell us about the relationship between data and policy?

2. The commission includes community advocates who have been fighting against the algorithm for years. How might they react to James's recommendation to keep the algorithm (with modifications)? Is James's framing fair to their perspective?

3. James uses the phrase "over-predicts risk" rather than "is racially biased." Is this the right word choice? Is it more honest, or is it euphemistic?

4. If the media reports James's findings with the headline "County Algorithm Discriminates Against Black Defendants," is that an accurate summary of his findings? How would James respond?

5. James includes a Limitations section that acknowledges the "counterfactual problem" — we can't know what would have happened if detained defendants had been released. Some advocates argue that including this limitation "gives cover" to people who want to keep the algorithm unchanged. Is this a valid concern? How should James balance scientific honesty with the practical impact of his communication?

6. Imagine you are James preparing for Q&A after the presentation. What is the hardest question you might receive, and how would you answer it?