Exercises: Conditional Probability and Bayes' Theorem

These exercises progress from concept checks through applied Bayesian reasoning. Estimated completion time: 3 hours.

Difficulty Guide:

- ⭐ Foundational (5-10 min each)
- ⭐⭐ Intermediate (10-20 min each)
- ⭐⭐⭐ Challenging (20-40 min each)
- ⭐⭐⭐⭐ Advanced/Research (40+ min each)

Part A: Conceptual Understanding ⭐

A.1. In your own words, explain the difference between $P(\text{rain})$ and $P(\text{rain} \mid \text{dark clouds})$. Which one would you expect to be larger? Why?

A.2. True or false: $P(A \mid B)$ always equals $P(B \mid A)$. If false, give a concrete example where the two differ dramatically.

A.3. A student says, "The test is 95% accurate, so if I test positive, there's a 95% chance I have the disease." Explain what's wrong with this reasoning. What additional information would you need to determine the actual probability?

A.4. Explain the difference between sensitivity and specificity in your own words. A test has 99% sensitivity and 80% specificity. Which is the bigger concern: missing sick people or alarming healthy people? Which measure addresses each concern?

A.5. What is the base rate fallacy? Give an example from everyday life (not medical testing) where someone might commit this error.

A.6. Explain why Bayes' theorem can be thought of as a "probability update machine." What are the three inputs, and what is the output?

A.7. A friend says, "If the DNA evidence has a 1-in-a-billion chance of being a coincidence, then there's a 1-in-a-billion chance the defendant is innocent." Identify the logical error and explain why this reasoning is wrong.

Part B: Conditional Probability from Tables ⭐

Use the following contingency table for problems B.1-B.5. A survey of 600 college students asked about study habits and exam performance.

| | Passed Exam | Failed Exam | Total |
|---|---|---|---|
| Studied 3+ hours | 210 | 30 | 240 |
| Studied < 3 hours | 180 | 180 | 360 |
| Total | 390 | 210 | 600 |

B.1. Calculate $P(\text{passed})$, the unconditional probability of passing.

B.2. Calculate $P(\text{passed} \mid \text{studied 3+ hours})$. Compare it to your answer in B.1. What does the comparison tell you?

B.3. Calculate $P(\text{studied 3+ hours} \mid \text{passed})$. Is this the same as your answer in B.2? Explain why or why not.

B.4. Calculate $P(\text{failed} \mid \text{studied < 3 hours})$. What does this probability tell a student who's deciding how much to study?

B.5. Are studying 3+ hours and passing the exam independent events? Show your work using the formal definition of independence: $P(A \mid B) = P(A)$.
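The Part B calculations all come down to choosing the right denominator. The following is a minimal sketch of that idea in pandas, using the counts from the survey table above; the variable and label names (`table`, `studied_3plus`, and so on) are illustrative choices, not names from the chapter.

```python
import pandas as pd

# Counts copied from the Part B survey table (600 students).
# Row and column labels are illustrative names, not from the chapter.
table = pd.DataFrame(
    {"passed": [210, 180], "failed": [30, 180]},
    index=["studied_3plus", "studied_under_3"],
)

total = table.to_numpy().sum()  # all 600 students

# Unconditional probability: divide by the grand total.
p_passed = table["passed"].sum() / total

# Conditioning on a row (study group): divide by that row's total.
p_passed_given_3plus = (
    table.loc["studied_3plus", "passed"] / table.loc["studied_3plus"].sum()
)

# Conditioning on a column (exam outcome): divide by that column's total.
p_3plus_given_passed = (
    table.loc["studied_3plus", "passed"] / table["passed"].sum()
)

print(f"P(passed)               = {p_passed:.3f}")
print(f"P(passed | 3+ hours)    = {p_passed_given_3plus:.3f}")
print(f"P(3+ hours | passed)    = {p_3plus_given_passed:.3f}")
```

The same cell count (210) appears in both conditional probabilities; only the denominator changes, which is why B.2 and B.3 give different answers.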

Part C: Conditional Probability Calculations ⭐⭐

C.1. At a tech company, 40% of employees work remotely. Among remote workers, 70% report high job satisfaction. Among on-site workers, 55% report high job satisfaction.

(a) What is $P(\text{high satisfaction} \mid \text{remote})$?

(b) What is $P(\text{high satisfaction})$ for the company overall? (Hint: use the law of total probability.)

(c) What is $P(\text{remote} \mid \text{high satisfaction})$? (Hint: use Bayes' theorem.)

C.2. In a city, 30% of drivers speed on a particular highway. A speed camera correctly identifies 90% of speeders and incorrectly flags 5% of non-speeders.

(a) Draw a tree diagram for this scenario.

(b) If a driver is flagged by the camera, what is the probability they were actually speeding?

(c) If a driver is NOT flagged, what is the probability they were actually speeding?

C.3. Alex's StreamVibe data shows:

- 25% of users are Premium subscribers
- Among Premium subscribers, 80% watch at least 5 hours per week
- Among non-Premium subscribers, 30% watch at least 5 hours per week

(a) If a user watches 5+ hours per week, what's the probability they're a Premium subscriber?

(b) Verify your answer using the natural frequency approach with 1,000 users.
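The natural frequency approach in C.3(b) can be sketched as a small helper that splits a population into expected counts. The function name `natural_frequencies` and its parameter names are assumptions made for this illustration.

```python
def natural_frequencies(n, prior, p_e_given_h, p_e_given_not_h):
    """Split n cases into expected counts and return them with the posterior.

    Hypothetical helper: 'h' is the hypothesis (e.g. Premium subscriber),
    'e' is the evidence (e.g. watches 5+ hours per week).
    """
    h = n * prior                          # cases where the hypothesis holds
    not_h = n - h                          # cases where it does not
    h_and_e = h * p_e_given_h              # hypothesis true, evidence seen
    not_h_and_e = not_h * p_e_given_not_h  # hypothesis false, evidence seen
    return {
        "h": h,
        "not_h": not_h,
        "h_and_e": h_and_e,
        "not_h_and_e": not_h_and_e,
        "posterior": h_and_e / (h_and_e + not_h_and_e),
    }


# C.3 numbers: 1,000 users, 25% Premium, 80% vs. 30% heavy watchers.
counts = natural_frequencies(1000, 0.25, 0.80, 0.30)
for name, value in counts.items():
    print(f"{name:12s} {value:.3f}")
```

Working in counts rather than probabilities makes the denominator visible: the posterior is the number of heavy-watching Premium users divided by the number of all heavy watchers.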

C.4. A factory produces widgets on two machines. Machine A produces 60% of all widgets and has a 3% defect rate. Machine B produces 40% and has a 5% defect rate.

(a) What is the overall defect rate?

(b) If a widget is defective, what is the probability it came from Machine B?

(c) Draw the tree diagram.
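The law of total probability and Bayes' theorem used throughout Part C extend naturally to any number of mutually exclusive sources. The sketch below assumes the sources form a complete partition (the priors sum to 1); `posterior_over_sources` is a hypothetical name, and the example reuses the C.4 machine figures.

```python
def posterior_over_sources(priors, likelihoods):
    """Return P(evidence) and P(source | evidence) for each source.

    priors:      {source: P(source)}, assumed to sum to 1
    likelihoods: {source: P(evidence | source)}
    """
    # Joint probability of each branch of the tree: P(source) * P(evidence | source).
    joint = {s: priors[s] * likelihoods[s] for s in priors}
    # Law of total probability: add up the branches that end in the evidence.
    p_evidence = sum(joint.values())
    # Bayes' theorem: each branch's share of the total.
    posterior = {s: j / p_evidence for s, j in joint.items()}
    return p_evidence, posterior


# C.4: Machine A makes 60% of widgets (3% defective), Machine B 40% (5% defective).
p_defect, posterior = posterior_over_sources(
    priors={"A": 0.60, "B": 0.40},
    likelihoods={"A": 0.03, "B": 0.05},
)
print(f"P(defect) = {p_defect:.3f}")
for machine, p in posterior.items():
    print(f"P({machine} | defect) = {p:.3f}")
```

Each entry of `joint` corresponds to one path through the tree diagram, so the code mirrors what the diagram shows on paper.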

Part D: Bayes' Theorem Applications ⭐⭐

D.1. A drug test has the following characteristics:

- Sensitivity: 97% (catches 97% of drug users)
- Specificity: 95% (correctly clears 95% of non-users)
- In the general workforce, approximately 5% of employees use the drug being tested for.

(a) If a randomly selected employee tests positive, what is the probability they actually use drugs?

(b) If the test is applied only to employees in safety-sensitive positions where drug use rates are estimated at 15%, how does the answer change?

(c) What does the comparison between (a) and (b) illustrate about the base rate?
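To see how the base rate drives the result in D.1, it can help to write the positive predictive value (PPV) as a function of prevalence. This is a minimal sketch; the function name `ppv` and its signature are assumptions for illustration and are not necessarily the helper used in the chapter.

```python
def ppv(sensitivity, specificity, prevalence):
    """P(condition | positive test), from Bayes' theorem."""
    true_positive = sensitivity * prevalence
    false_positive = (1 - specificity) * (1 - prevalence)
    return true_positive / (true_positive + false_positive)


# D.1: same test (97% sensitivity, 95% specificity), two base rates.
for prevalence in (0.05, 0.15):
    value = ppv(0.97, 0.95, prevalence)
    print(f"prevalence = {prevalence:.0%}  ->  PPV = {value:.3f}")
```

Holding the test fixed and varying only `prevalence` isolates the effect that part (c) asks about.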

D.2. A spam filter has been trained with the following data:

- 35% of incoming emails are spam
- The word "congratulations" appears in 40% of spam emails
- The word "congratulations" appears in 2% of legitimate emails

(a) If an email contains "congratulations," what is the probability it's spam?

(b) After learning the email also contains "winner" (which appears in 30% of spam and 0.5% of legitimate email), update the probability. (Use the posterior from part (a) as your new prior.)

(c) How does this two-step updating illustrate how Naive Bayes classifiers work?
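The two-step update in D.2 can be sketched as a loop in which each posterior becomes the next prior. The sketch assumes the two words are conditionally independent given the class (the "naive" assumption); the function name `update` is a hypothetical choice.

```python
def update(prior, p_feature_given_h, p_feature_given_not_h):
    """One Bayesian update: return P(h | feature) given P(h) and two likelihoods."""
    numerator = p_feature_given_h * prior
    return numerator / (numerator + p_feature_given_not_h * (1 - prior))


# (word, P(word | spam), P(word | legitimate)) taken from D.2.
evidence = [
    ("congratulations", 0.40, 0.02),
    ("winner", 0.30, 0.005),
]

p_spam = 0.35  # prior: 35% of incoming email is spam
for word, p_word_spam, p_word_legit in evidence:
    p_spam = update(p_spam, p_word_spam, p_word_legit)
    print(f"after '{word}': P(spam) = {p_spam:.4f}")

# Same answer in one step: multiply the likelihoods, then normalize.
spam_score = 0.35 * 0.40 * 0.30
legit_score = 0.65 * 0.02 * 0.005
one_shot = spam_score / (spam_score + legit_score)
```

Under the independence assumption, updating word by word and multiplying all the likelihoods at once give the same posterior, so the order in which the words are processed does not matter.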

D.3. (Maya's HIV Testing Scenario) In a low-prevalence setting, HIV affects approximately 0.3% of the general population. A modern HIV screening test has 99.7% sensitivity and 99.4% specificity.

(a) Calculate $P(\text{HIV} \mid \text{positive test})$ for a random person from the general population.

(b) Repeat the calculation for a person from a high-risk population where prevalence is 5%.

(c) Why do public health agencies recommend confirmatory testing after a positive screening result?

D.4. Professor Washington examines a facial recognition system used by campus security. The system has 95% sensitivity (it correctly flags 95% of persons of interest) and a 0.1% false positive rate. On any given day, approximately 1 in 10,000 people on campus is a person of interest.

(a) If the system flags someone, what is the probability they are actually a person of interest?

(b) On a campus of 30,000 people, how many false alarms would the system generate per day?

(c) Washington argues this system shouldn't be deployed. Based on your calculations, explain his reasoning.

Part E: Tree Diagrams ⭐⭐

E.1. A university admissions office classifies applicants as "in-state" (60%) or "out-of-state" (40%). In-state applicants have a 40% acceptance rate. Out-of-state applicants have a 25% acceptance rate.

(a) Draw the complete tree diagram with all branch probabilities.

(b) Calculate the overall acceptance rate.

(c) If a student was admitted, what is the probability they are in-state?

(d) Verify your answer to (c) using natural frequencies with 1,000 applicants.

E.2. A three-stage manufacturing process produces a component. At each stage, the component either passes quality control or is rejected.

- Stage 1: 90% pass
- Stage 2 (given passed Stage 1): 95% pass
- Stage 3 (given passed Stages 1 and 2): 98% pass

(a) Draw the tree diagram (you only need to show the "pass all stages" path fully, but indicate where each rejection branches off).

(b) What is the probability a component passes all three stages?

(c) If a component is rejected, at which stage is it most likely to have failed? (Hint: calculate the probability of failing at each stage, not just the stage failure rate.)
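The distinction in the hint (the probability of failing at a stage versus that stage's failure rate) can be sketched with the multiplication rule along a tree path. The example below deliberately uses hypothetical pass rates for a two-stage process, not the E.2 figures, and `stage_outcomes` is an illustrative name.

```python
def stage_outcomes(pass_rates):
    """Return P(pass every stage) and P(rejected at stage k) for each stage.

    pass_rates[k] is the pass rate at stage k, given the item reached stage k.
    """
    p_reach = 1.0          # probability an item reaches the current stage
    rejected_at = []
    for rate in pass_rates:
        # To be rejected here, an item must reach this stage AND fail it.
        rejected_at.append(p_reach * (1 - rate))
        p_reach *= rate    # survivors move on to the next stage
    return p_reach, rejected_at


# Hypothetical process: 80% pass stage 1, then 90% pass stage 2.
p_pass_all, rejected_at = stage_outcomes([0.80, 0.90])
p_rejected = sum(rejected_at)

print(f"P(pass all stages) = {p_pass_all:.3f}")
for stage, p in enumerate(rejected_at, start=1):
    print(f"P(rejected at stage {stage}) = {p:.3f}, "
          f"share of all rejections = {p / p_rejected:.3f}")
```

Later stages see fewer items, so a stage's share of all rejections depends on how many items reach it as well as on its own failure rate.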

E.3. Weather data for a particular city shows: - 20% of days are rainy - On rainy days, there's a 70% chance of a traffic jam on the highway - On non-rainy days, there's a 15% chance of a traffic jam

(a) Draw the tree diagram.

(b) What is the overall probability of a traffic jam?

(c) If there's a traffic jam, what's the probability it's raining?

(d) If there's no traffic jam, what's the probability it's raining?

Part F: The Prosecutor's Fallacy ⭐⭐⭐

F.1. In a criminal trial, a forensic analyst testifies that a shoe print found at a crime scene matches the defendant's shoes. The analyst states that only 1 in 500 people own this particular shoe model.

(a) What probability is the analyst actually reporting? Write it in conditional probability notation.

(b) The prosecutor argues: "There's only a 1-in-500 chance this match is a coincidence, so there's a 99.8% chance the defendant is guilty." Identify the fallacy.

(c) In a city of 200,000 people, how many would own this shoe model? If the crime could have been committed by anyone in the city, what should the prior probability of guilt for any specific individual be?

(d) Using Bayes' theorem, calculate $P(\text{guilty} \mid \text{shoe match})$ with appropriate priors. How does this compare to the prosecutor's claim?

F.2. A lie detector test has 85% sensitivity (detects 85% of liars) and 75% specificity (correctly clears 75% of truth-tellers). In a screening of 500 employees about a workplace theft, assume 2% are actually guilty.

(a) Calculate the PPV. If someone "fails" the lie detector, what's the probability they're actually guilty?

(b) How many innocent employees will be falsely accused?

(c) Write a paragraph arguing for or against using lie detector results in this scenario. Use your calculations as evidence.

Part G: Integration and Application ⭐⭐⭐

G.1. An AI-powered email system learns from user behavior. It starts with a 30% prior that a particular email is important (rather than routine). The system observes three features:

| Feature | $P(\text{feature} \mid \text{important})$ | $P(\text{feature} \mid \text{routine})$ |
|---|---|---|
| From a manager | 0.60 | 0.10 |
| Contains "deadline" | 0.45 | 0.05 |
| Sent before 8 AM | 0.20 | 0.15 |

Apply Bayes' theorem sequentially for each feature (using each posterior as the next prior) to determine the final probability that the email is important after observing all three features. Show each update step.

G.2. A company is deciding between two quality inspection methods:

- Method A: 99% sensitivity, 90% specificity, cost = $2 per item
- Method B: 95% sensitivity, 99% specificity, cost = $5 per item
- The defect rate is 0.5%

For each method, calculate:

(a) PPV (probability a flagged item is truly defective)

(b) Number of false alarms per 10,000 items inspected

(c) Number of defective items missed per 10,000 items inspected

(d) Based on your analysis, which method would you recommend? Under what circumstances might you switch to the other?

G.3. (Python Exercise) Using pandas, create a simulated dataset of 5,000 patients with the following characteristics:

- 8% have a particular condition
- Among those with the condition, 92% test positive
- Among those without the condition, 6% test positive

(a) Create a DataFrame with columns has_condition and test_result.

(b) Build a contingency table using pd.crosstab.

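(c) Calculate $P(\text{condition} \mid \text{positive test})$ three ways: directly from the data, using pd.crosstab with normalize='index' (put test_result on the rows so that each row is conditioned on the test outcome), and using your bayes_theorem() function. Verify that all three agree. The Bayes calculation matches the other two exactly only when you feed it the prevalence, sensitivity, and false positive rate estimated from your simulated data; with the nominal 8%, 92%, and 6% it will be close but not identical.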

(d) Create a bar chart comparing the PPV at different prevalence rates (1%, 5%, 8%, 15%, 30%).
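One possible starting point for parts (a) and (b) is sketched below. It assumes NumPy's `default_rng` for the random draws; the seed value, the string labels `"positive"`/`"negative"`, and the variable names are arbitrary choices for illustration. Parts (c) and (d) are left to you.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)  # arbitrary fixed seed, for reproducibility
n_patients = 5_000

# Step 1: assign the condition to roughly 8% of patients.
has_condition = rng.random(n_patients) < 0.08

# Step 2: the chance of a positive test depends on the true status:
# 92% for patients with the condition, 6% for those without.
p_positive = np.where(has_condition, 0.92, 0.06)
test_positive = rng.random(n_patients) < p_positive

patients = pd.DataFrame({
    "has_condition": has_condition,
    "test_result": np.where(test_positive, "positive", "negative"),
})

# Contingency table of raw counts: test outcome on rows, true status on columns.
counts = pd.crosstab(patients["test_result"], patients["has_condition"])
print(counts)
```

Because the data are random, the simulated rates will differ slightly from the nominal 8%, 92%, and 6%, and will change if you change the seed.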

Part H: Reflection and Synthesis ⭐⭐⭐⭐

H.1. (Research Exercise) The mammography debate is one of the most contested topics in preventive medicine. Research the following and write a one-page analysis:

(a) What are the current sensitivity and specificity of screening mammography?

(b) What is the prevalence of breast cancer in women aged 40-49 vs. 50-69?

(c) Using Bayes' theorem, calculate the PPV for each age group.

(d) How do your calculations inform the debate about when to start routine mammography screening?

(e) What role does the concept of "overdiagnosis" play, and how does it relate to the base rate fallacy?

H.2. (Writing Exercise) Write a 500-word essay explaining Bayes' theorem to someone who has never taken a statistics course. Use an everyday example (not medical testing or courtroom evidence; those are overused). Your essay should include:

- A statement of the problem in natural language
- The solution using natural frequencies
- An explanation of why the answer is surprising
- A connection to how this thinking applies in daily life

H.3. (Ethics Exercise) Professor Washington's predictive policing algorithm (Section 9.11) has a PPV of 46% — meaning 54% of people labeled "high risk" will not re-offend.

(a) If the base rate of re-offense differs by race (e.g., 15% for Group A vs. 25% for Group B due to differential policing and socioeconomic factors), calculate the PPV for each group assuming the same algorithm sensitivity (75%) and false positive rate (22%).

(b) What does this differential PPV imply about the fairness of applying the same algorithm to both groups?

(c) Is this a problem with the algorithm, the data, the policy, or something deeper? Write a one-paragraph argument.

Answer Key: Selected Problems

**B.1.** $P(\text{passed}) = 390/600 = 0.650$

**B.2.** $P(\text{passed} \mid \text{studied 3+ hrs}) = 210/240 = 0.875$. This is higher than the unconditional 0.650, suggesting that studying 3+ hours is associated with a higher pass rate.

**B.3.** $P(\text{studied 3+ hrs} \mid \text{passed}) = 210/390 = 0.538$. This is NOT the same as B.2 (0.875). B.2 asks "among students who studied a lot, what fraction passed?" while B.3 asks "among students who passed, what fraction studied a lot?" Different questions, different denominators, different answers.

**B.4.** $P(\text{failed} \mid \text{studied < 3 hrs}) = 180/360 = 0.500$. Half of the students who studied less than 3 hours failed, so for them passing was a coin flip.

**B.5.** $P(\text{passed}) = 0.650$ and $P(\text{passed} \mid \text{studied 3+ hrs}) = 0.875$. Since $0.875 \neq 0.650$, the events are NOT independent: knowing that a student studied 3+ hours changes the probability that they passed.

**C.1.**

(a) $P(\text{high satisfaction} \mid \text{remote}) = 0.70$

(b) $P(\text{high satisfaction}) = 0.70 \times 0.40 + 0.55 \times 0.60 = 0.28 + 0.33 = 0.61$

(c) $P(\text{remote} \mid \text{high satisfaction}) = \frac{0.70 \times 0.40}{0.61} = \frac{0.28}{0.61} \approx 0.459$

**C.2.**

(b) $P(\text{speeding} \mid \text{flagged}) = \frac{0.90 \times 0.30}{0.90 \times 0.30 + 0.05 \times 0.70} = \frac{0.27}{0.27 + 0.035} = \frac{0.27}{0.305} \approx 0.885$

(c) $P(\text{speeding} \mid \text{not flagged}) = \frac{0.10 \times 0.30}{0.10 \times 0.30 + 0.95 \times 0.70} = \frac{0.03}{0.03 + 0.665} = \frac{0.03}{0.695} \approx 0.043$

**C.3.**

(a) $P(\text{premium} \mid \text{5+ hrs}) = \frac{0.80 \times 0.25}{0.80 \times 0.25 + 0.30 \times 0.75} = \frac{0.20}{0.20 + 0.225} = \frac{0.20}{0.425} \approx 0.471$

(b) 1,000 users: 250 Premium (200 heavy, 50 light); 750 non-Premium (225 heavy, 525 light). Heavy watchers: 200 + 225 = 425. Premium among heavy: 200/425 = 0.471. ✓

**C.4.**

(a) $P(\text{defect}) = 0.60 \times 0.03 + 0.40 \times 0.05 = 0.018 + 0.020 = 0.038$ (3.8%)

(b) $P(\text{B} \mid \text{defective}) = \frac{0.05 \times 0.40}{0.038} = \frac{0.020}{0.038} \approx 0.526$

**D.1.**

(a) $PPV = \frac{0.97 \times 0.05}{0.97 \times 0.05 + 0.05 \times 0.95} = \frac{0.0485}{0.0485 + 0.0475} = \frac{0.0485}{0.096} \approx 0.505$ (about 50.5%)

(b) With 15% prevalence: $PPV = \frac{0.97 \times 0.15}{0.97 \times 0.15 + 0.05 \times 0.85} = \frac{0.1455}{0.1455 + 0.0425} = \frac{0.1455}{0.188} \approx 0.774$ (about 77.4%)

(c) Tripling the base rate from 5% to 15% raised the PPV from about 50.5% to about 77.4%, even though the test itself did not change. The higher the base rate, the more trustworthy a positive result.

**D.2.**

(a) $P(\text{spam} \mid \text{congrats}) = \frac{0.40 \times 0.35}{0.40 \times 0.35 + 0.02 \times 0.65} = \frac{0.14}{0.14 + 0.013} = \frac{0.14}{0.153} \approx 0.915$

(b) New prior = 0.915, so the prior for legitimate email is $1 - 0.915 = 0.085$. $P(\text{spam} \mid \text{congrats, winner}) = \frac{0.30 \times 0.915}{0.30 \times 0.915 + 0.005 \times 0.085} = \frac{0.2745}{0.2745 + 0.000425} = \frac{0.2745}{0.274925} \approx 0.998$

**D.4.**

(a) $PPV = \frac{0.95 \times 0.0001}{0.95 \times 0.0001 + 0.001 \times 0.9999} = \frac{0.000095}{0.000095 + 0.0009999} = \frac{0.000095}{0.001095} \approx 0.0868$ (about 8.7%)

(b) $30{,}000 \times 0.9999 \times 0.001 \approx 30$ false alarms per day.

(c) The system would flag about 30 innocent people every day but correctly identify only about 3 actual persons of interest ($30{,}000 \times 0.0001 \times 0.95 \approx 2.85$). More than 90% of all flags would be false alarms, causing unnecessary confrontations and eroding trust.

**E.1.**

(b) $P(\text{admitted}) = 0.60 \times 0.40 + 0.40 \times 0.25 = 0.24 + 0.10 = 0.34$ (34%)

(c) $P(\text{in-state} \mid \text{admitted}) = \frac{0.24}{0.34} \approx 0.706$ (about 70.6%)

**E.3.**

(b) $P(\text{jam}) = 0.20 \times 0.70 + 0.80 \times 0.15 = 0.14 + 0.12 = 0.26$

(c) $P(\text{rain} \mid \text{jam}) = \frac{0.14}{0.26} \approx 0.538$

(d) $P(\text{rain} \mid \text{no jam}) = \frac{0.20 \times 0.30}{0.74} = \frac{0.06}{0.74} \approx 0.081$

**F.1.**

(a) $P(\text{shoe match} \mid \text{innocent})$: the probability that a randomly chosen person from the general population owns this shoe model.

(c) $200{,}000 / 500 = 400$ people own this shoe model. Prior probability: $P(\text{guilty}) = 1/200{,}000 = 0.000005$.

(d) Assuming the guilty person is certain to match, so that $P(\text{match} \mid \text{guilty}) = 1.0$: $P(\text{guilty} \mid \text{match}) = \frac{1.0 \times 0.000005}{1.0 \times 0.000005 + (1/500) \times 0.999995} \approx \frac{0.000005}{0.000005 + 0.002} \approx 0.0025$ (about 0.25%), not 99.8%. Equivalently, the defendant is just one of roughly 400 shoe owners in the city.

**F.2.**

(a) $PPV = \frac{0.85 \times 0.02}{0.85 \times 0.02 + 0.25 \times 0.98} = \frac{0.017}{0.017 + 0.245} = \frac{0.017}{0.262} \approx 0.065$ (about 6.5%)

(b) $500 \times 0.98 \times 0.25 = 122.5 \approx 123$ innocent employees falsely accused.