Exercises: Probability — The Foundation of Inference


These exercises progress from concept checks to applied probability calculations and real-world reasoning. Estimated completion time: 3 hours.

Difficulty Guide:
- ⭐ Foundational (5-10 min each)
- ⭐⭐ Intermediate (10-20 min each)
- ⭐⭐⭐ Challenging (20-40 min each)
- ⭐⭐⭐⭐ Advanced/Research (40+ min each)

Part A: Conceptual Understanding ⭐

A.1. In your own words, explain the difference between the classical, relative frequency, and subjective approaches to probability. For each approach, give one example where that approach is the most appropriate choice.

A.2. A coin is flipped 20 times and lands on heads 14 times (70%). Your friend says, "This coin is unfair — the probability of heads is 0.70." What would you tell your friend about the law of large numbers? How many flips would you recommend before drawing a firm conclusion?
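As a way to explore A.2, the following is a minimal simulation sketch of the law of large numbers. It assumes a fair coin (P(heads) = 0.5); the function name `heads_proportion`, the seed, and the sample sizes are illustrative choices, not part of the exercise. It shows how widely the proportion of heads can vary in small samples compared with large ones.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only

def heads_proportion(n_flips, p_heads=0.5):
    """Proportion of heads in n_flips simulated flips (fair coin by default)."""
    flips = rng.random(n_flips) < p_heads  # True = heads
    return flips.mean()

# Repeat each experiment 1,000 times and look at how much the proportion varies
for n in (20, 200, 2000):
    props = np.array([heads_proportion(n) for _ in range(1000)])
    print(f"n={n:>5}: min={props.min():.3f}, max={props.max():.3f}, sd={props.std():.3f}")
```

Comparing the spread at n = 20 with the spread at n = 2000 gives a concrete sense of how much a single small sample can mislead.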

A.3. Explain why the gambler's fallacy is wrong. A roulette wheel has landed on black 8 times in a row. Your friend insists that red is "due." Using the concept of independence, explain why each spin is still 50/50 (ignoring the green 0/00).

A.4. What does it mean to say "the probability of rain tomorrow is 0.40"? Give an interpretation using the relative frequency approach.

A.5. True or false: If P(A) = 0.6 and P(B) = 0.5, then events A and B must overlap (they cannot be mutually exclusive). Explain your reasoning.

A.6. A student says, "If two events are mutually exclusive, they must be independent, because they have nothing to do with each other." Is this correct? Explain why or why not, using a specific example.

A.7. List the sample space for each of the following random processes:
- (a) Flipping two coins
- (b) Rolling a die and recording whether the result is even or odd
- (c) Selecting a card from a standard deck and recording only its suit
- (d) A traffic light that can be red, yellow, or green

Part B: Complement Rule ⭐

B.1. A weather forecast says there is a 0.35 probability of snow tomorrow. What is the probability it does NOT snow?

B.2. In a class of 200 students, 42 are left-handed. If you select a student at random, what is the probability the student is NOT left-handed?

B.3. A quality control inspector finds that 3.2% of items on an assembly line are defective. What is the probability that a randomly selected item is NOT defective?

B.4. Sam knows that a basketball player makes 72% of her free throws. In two independent free throw attempts, what is the probability she makes at least one? (Hint: Use the complement — "at least one" is the opposite of "none.")
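The hint in B.4 can be sketched in code. This is a minimal illustration using a different, hypothetical scenario (at least one six in two rolls of a fair die); the helper name `p_at_least_one` is an assumption made for this sketch.

```python
from fractions import Fraction

def p_at_least_one(p_success, n_trials):
    """P(at least one success in n independent trials), via the complement rule."""
    p_none = (1 - p_success) ** n_trials  # every trial fails
    return 1 - p_none

# Hypothetical example: at least one six in two rolls of a fair die
print(p_at_least_one(Fraction(1, 6), 2))  # 1 - (5/6)**2 = 11/36
```

The same function accepts ordinary decimals such as 0.72, so it can be used to check hand calculations.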

B.5. A cybersecurity system has a 0.995 probability of correctly identifying a legitimate login attempt. What is the probability it incorrectly flags a legitimate attempt as suspicious?

Part C: Addition Rule ⭐⭐

C.1. A bag contains 4 red marbles, 6 blue marbles, and 5 green marbles.

(a) Are the events "drawing a red marble" and "drawing a blue marble" mutually exclusive? Why or why not?
(b) What is P(red or blue)?
(c) What is P(red or green)?
(d) What is P(not blue)?

C.2. In a survey of 600 college students:
- 280 play a sport
- 220 are in a music group
- 65 do both

(a) What is P(sport or music)?
(b) What is P(neither sport nor music)?
(c) Draw a Venn diagram showing these numbers.

C.3. From a standard deck of 52 cards, you draw one card at random.

(a) What is P(ace or king)? Are these mutually exclusive?
(b) What is P(ace or heart)? Are these mutually exclusive?
(c) What is P(face card or red card)? (Face cards are J, Q, K.)

C.4. Alex finds that among 800 StreamVibe users surveyed:
- 320 watched at least one action movie last month
- 280 watched at least one romance movie
- 110 watched both genres

(a) What is P(action or romance)?
(b) What is P(neither action nor romance)?
(c) If Alex selects a user at random, what's the probability they watched exactly one of these two genres (but not both)?

C.5. Determine whether each pair of events is mutually exclusive. Explain.

(a) A student getting an A on an exam / getting a B on the same exam
(b) A person owning a cat / owning a dog
(c) Rolling an even number on a die / rolling a number greater than 4
(d) Drawing a red card / drawing a face card from a standard deck
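The general addition rule used throughout Part C, P(A or B) = P(A) + P(B) − P(A and B), can be checked by representing events as sets of equally likely outcomes. The sketch below uses hypothetical events on a single die that do not appear in the exercises; the helper name `prob` is an assumption of this example.

```python
from fractions import Fraction

def prob(event, sample_space):
    """Classical probability: favorable outcomes / total equally likely outcomes."""
    return Fraction(len(event & sample_space), len(sample_space))

die = set(range(1, 7))
A = {1, 2, 3}  # hypothetical event: roll is 3 or less
B = {3, 6}     # hypothetical event: roll is a multiple of 3

direct = prob(A | B, die)                                    # count the union directly
by_rule = prob(A, die) + prob(B, die) - prob(A & B, die)     # addition rule

print(direct, by_rule)              # both equal 2/3
print("mutually exclusive:", not (A & B))  # False, because 3 is in both events
```

When two events share no outcomes, `A & B` is empty and the subtraction term is zero, which is the special case for mutually exclusive events.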

Part D: Multiplication Rule ⭐⭐

D.1. You flip a fair coin three times. Assuming the flips are independent:

(a) What is P(three heads in a row)?
(b) What is P(three tails in a row)?
(c) What is P(at least one head in three flips)?

D.2. A manufacturing process has two independent quality checkpoints. The first checkpoint catches 95% of defective items, and the second catches 90% of defective items that pass the first. What is the probability a defective item passes BOTH checkpoints?

D.3. Maya's disease screening test has a 0.98 probability of correctly identifying an infected person (sensitivity). If three infected people are tested independently, what is the probability:

(a) All three test positive?
(b) At least one tests negative?

D.4. Professor Washington examines two independent predictive algorithms. Algorithm A correctly identifies high-risk individuals 78% of the time. Algorithm B correctly identifies them 82% of the time. If both algorithms evaluate the same individual independently:

(a) What is the probability both algorithms correctly identify a high-risk individual?
(b) What is the probability neither algorithm correctly identifies them?
(c) What is the probability at least one algorithm correctly identifies them?

D.5. A password requires 4 digits (0-9), and each digit is chosen randomly and independently.

(a) How many possible passwords are there?
(b) What is the probability of guessing the correct password on a single random try?
(c) What is the probability of NOT guessing correctly on any of 3 independent attempts?
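The multiplication rule for independent events that Part D relies on can be sketched as a product of probabilities and then checked by simulation. The probabilities below are hypothetical values chosen for illustration, and `p_all` is a helper name assumed for this sketch.

```python
import math
import numpy as np

def p_all(probs):
    """P(all events occur), assuming the events are independent."""
    return math.prod(probs)

probs = [0.9, 0.8, 0.5]  # hypothetical success probabilities of three independent events
exact = p_all(probs)     # 0.9 * 0.8 * 0.5

# Simulation check: each row is one trial, each column one event
rng = np.random.default_rng(1)  # arbitrary seed
n = 200_000
outcomes = rng.random((n, len(probs))) < np.array(probs)
simulated = outcomes.all(axis=1).mean()

print(f"exact={exact:.4f}, simulated={simulated:.4f}")
```

The simulated proportion should land close to the product, and the agreement depends on the independence assumption built into how `outcomes` is generated.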

Part E: Contingency Tables ⭐⭐

E.1. A university surveyed 1,000 students about their study habits and exam performance.

	Passed Exam	Failed Exam	Total
Studied > 10 hrs/week	320	30	350
Studied 5-10 hrs/week	280	70	350
Studied < 5 hrs/week	120	180	300
Total	720	280	1,000

Calculate:
(a) P(passed the exam)
(b) P(studied > 10 hrs/week)
(c) P(passed AND studied > 10 hrs/week)
(d) P(failed OR studied < 5 hrs/week)
(e) P(did NOT study < 5 hrs/week)

E.2. Maya collected data on 400 patients tested for a respiratory virus.

	Positive Test	Negative Test	Total
Vaccinated	18	232	250
Not Vaccinated	52	98	150
Total	70	330	400

Calculate:
(a) P(vaccinated)
(b) P(positive test)
(c) P(vaccinated AND positive test)
(d) P(vaccinated OR positive test)
(e) P(not vaccinated AND negative test)
(f) Is being vaccinated independent of testing positive? To check: does P(vaccinated AND positive) = P(vaccinated) × P(positive)?

E.3. Create a contingency table from the following information about 500 job applicants at a tech company:
- 300 applicants had a computer science degree; 200 did not
- 180 applicants were hired; 320 were not hired
- 140 applicants had a CS degree AND were hired

Using your table, calculate:
(a) P(hired)
(b) P(CS degree AND not hired)
(c) P(hired OR CS degree)
(d) P(not hired AND no CS degree)
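The calculations in Part E (marginal, joint, and "or" probabilities, plus the independence check) can be sketched with a small NumPy array. The counts below are hypothetical and do not come from any exercise; the variable names are illustrative.

```python
import numpy as np

# Hypothetical 2x2 table of counts: rows = group, columns = outcome
#                  yes  no
table = np.array([[30, 20],   # group 1
                  [10, 40]])  # group 2
total = table.sum()

p_row = table.sum(axis=1) / total  # marginal probability of each group
p_col = table.sum(axis=0) / total  # marginal probability of each outcome
p_joint = table / total            # joint probability of each cell

p_g1 = p_row[0]                    # P(group 1)
p_yes = p_col[0]                   # P(yes)
p_g1_and_yes = p_joint[0, 0]       # P(group 1 AND yes)
p_g1_or_yes = p_g1 + p_yes - p_g1_and_yes  # addition rule

# Independence check: does the joint probability equal the product of the marginals?
independent = bool(np.isclose(p_g1_and_yes, p_g1 * p_yes))
print(p_g1, p_yes, p_g1_and_yes, round(p_g1_or_yes, 4), independent)
```

In this made-up table the joint probability (0.3) differs from the product of the marginals (0.2), so the two variables are not independent.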

Part F: Mixed Application ⭐⭐⭐

F.1. Sam is analyzing three basketball players' free throw percentages:
- Player A: 85% free throw rate
- Player B: 72% free throw rate
- Player C: 68% free throw rate

Each player takes one free throw, and their shots are independent.

(a) What is the probability all three make their free throws?
(b) What is the probability none of them make their free throws?
(c) What is the probability at least one makes their free throw?
(d) What is the probability exactly one makes their free throw? (Hint: List the three scenarios where exactly one makes it.)
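The "list the scenarios" strategy from the hint in F.1(d) can be sketched by enumerating every success/failure combination and adding up the ones with the required number of successes. The probabilities here are hypothetical fractions, not the players' rates, and `p_exactly_k` is a name assumed for this example.

```python
from fractions import Fraction
from itertools import product

def p_exactly_k(probs, k):
    """P(exactly k successes) among independent events with the given probabilities."""
    total = Fraction(0)
    # Each outcome is a tuple such as (True, False, False): who succeeds, who fails
    for outcome in product([True, False], repeat=len(probs)):
        if sum(outcome) != k:
            continue
        p = Fraction(1)
        for success, p_i in zip(outcome, probs):
            p *= p_i if success else 1 - p_i  # multiplication rule within one scenario
        total += p                            # scenarios are mutually exclusive, so add
    return total

probs = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 4)]  # hypothetical values
print(p_exactly_k(probs, 1))  # 6/24 + 3/24 + 2/24 = 11/24
```

Summing `p_exactly_k(probs, k)` over k = 0, 1, 2, 3 gives 1, which is a useful sanity check on any hand calculation.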

F.2. An airline estimates the following probabilities for a typical flight:
- P(flight delayed) = 0.22
- P(luggage lost) = 0.03
- P(flight delayed AND luggage lost) = 0.01

(a) Are "flight delayed" and "luggage lost" mutually exclusive? How do you know?
(b) What is P(flight delayed OR luggage lost)?
(c) Are these events independent? Check whether P(delayed AND lost) = P(delayed) × P(lost).
(d) What is P(flight on time AND luggage NOT lost)?

F.3. A medical test for a rare disease has the following characteristics:
- The disease affects 1% of the population (prevalence = 0.01)
- Among people WITH the disease, the test is positive 95% of the time (sensitivity = 0.95)
- Among people WITHOUT the disease, the test is positive 3% of the time (false positive rate = 0.03)

In a population of 10,000 people:

(a) How many people actually have the disease? How many don't?
(b) Of those WITH the disease, how many test positive?
(c) Of those WITHOUT the disease, how many test positive?
(d) Create a two-way table (test result vs. disease status) from your answers.
(e) Of all people who test positive, what fraction actually have the disease? (This question previews Chapter 9, but try it with the table you've built!)
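The counting approach in F.3 (turning rates into whole-number counts and then reading a fraction off the table) can be sketched as a small function. The inputs below are hypothetical and deliberately different from the exercise; the function and key names are assumptions made for this illustration.

```python
def natural_frequency_table(population, prevalence, sensitivity, false_positive_rate):
    """Convert rates into expected counts for a test-result vs. condition table."""
    with_condition = round(population * prevalence)
    without_condition = population - with_condition
    true_pos = round(with_condition * sensitivity)             # has condition, tests positive
    false_neg = with_condition - true_pos                      # has condition, tests negative
    false_pos = round(without_condition * false_positive_rate) # no condition, tests positive
    true_neg = without_condition - false_pos                   # no condition, tests negative
    return {"true_pos": true_pos, "false_neg": false_neg,
            "false_pos": false_pos, "true_neg": true_neg}

# Hypothetical inputs: 1,000 people, 10% prevalence, 80% sensitivity, 10% false positive rate
counts = natural_frequency_table(1000, 0.10, 0.80, 0.10)
share_truly_positive = counts["true_pos"] / (counts["true_pos"] + counts["false_pos"])
print(counts)
print(f"{share_truly_positive:.3f}")  # 80 / (80 + 90)
```

Even with these made-up numbers, fewer than half of the positive results belong to people who have the condition, which is the pattern the exercise asks you to uncover.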

F.4. Alex is designing an A/B test at StreamVibe. He wants to test whether changing the thumbnail image increases click-through rates. Currently, the click-through probability is 0.08 (8%). He's testing the change on 3 users as a pilot.

Assuming the current probability holds and clicks are independent:

(a) What's the probability all 3 users click?
(b) What's the probability none of the 3 users click?
(c) What's the probability at least one user clicks?
(d) Alex wonders: "If I run this test on 3 users and none click, does that prove the new thumbnail is worse?" Explain why 3 observations aren't enough to draw conclusions, using the law of large numbers.

Part G: Python and Simulation ⭐⭐⭐

G.1. Write a Python simulation to verify the addition rule. Simulate rolling a single die 100,000 times.
- Calculate the simulated proportion for P(rolling a 2 or a 5)
- Compare it to the theoretical value of 2/6

# Starter code
import numpy as np
np.random.seed(42)

n = 100000
rolls = np.random.randint(1, 7, size=n)

# Your code here: calculate the proportion of rolls that are 2 or 5

G.2. Write a Python simulation to verify that the probability of getting at least one 6 in four rolls of a die is approximately 1 − (5/6)⁴ ≈ 0.5177.

# Starter code
import numpy as np
np.random.seed(42)

n_experiments = 100000
# Each experiment: roll 4 dice, check if at least one is a 6
# Your code here

G.3. Simulate the birthday problem. For group sizes from 2 to 60, estimate P(at least one shared birthday) using 10,000 simulations per group size. Plot your results and compare to the analytical solution from Section 8.9.

# Starter code
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(42)

def simulate_birthday(group_size, n_simulations=10000):
    """Estimate P(shared birthday) for a given group size."""
    matches = 0
    for _ in range(n_simulations):
        birthdays = np.random.randint(1, 366, size=group_size)
        if len(set(birthdays)) < group_size:  # A repeat exists
            matches += 1
    return matches / n_simulations

# Your code here: loop over group sizes, plot results
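For the comparison step in G.3, the sketch below computes the exact probability under the usual assumptions (365 equally likely birthdays, no leap years, independent people). Section 8.9 may present the formula differently; the function name `p_shared_birthday` is an assumption of this example.

```python
def p_shared_birthday(group_size, days=365):
    """Exact P(at least one shared birthday), assuming equally likely birthdays."""
    p_all_distinct = 1.0
    for k in range(group_size):
        # Person k+1 must avoid the k birthdays already taken
        p_all_distinct *= (days - k) / days
    return 1 - p_all_distinct  # complement rule

print(round(p_shared_birthday(23), 4))  # 0.5073
```

Plotting this function over the same group sizes as the simulation gives a smooth reference curve to compare against the simulated estimates.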

Part H: Critical Thinking and Real-World Application ⭐⭐⭐⭐

H.1. The Prosecutor's Fallacy (Preview of Chapter 9)

In a criminal trial, a forensic expert testifies: "The probability that a randomly selected innocent person would match the DNA evidence is 1 in 1,000,000." The prosecutor argues: "Therefore, there is only a 1 in 1,000,000 chance the defendant is innocent."

(a) Explain why the prosecutor's argument is wrong. What probability did the expert actually state?
(b) If a city has 5,000,000 residents, approximately how many innocent people would match the DNA evidence?
(c) How does this change your interpretation of the evidence?
(d) What additional information would you need to properly assess the probability of the defendant's guilt?

H.2. AI and Probability

A social media platform uses an algorithm to flag potentially harmful content. The algorithm's performance:
- P(flagged | actually harmful) = 0.92 (it catches 92% of truly harmful content)
- P(flagged | not harmful) = 0.05 (it incorrectly flags 5% of harmless content)
- Only 0.1% of all posts are actually harmful

(a) In 1,000,000 posts, how many are actually harmful? How many are not?
(b) How many harmful posts get flagged? How many harmless posts get flagged?
(c) Create a contingency table from your answers.
(d) Of all flagged posts, what percentage are actually harmful? What does this tell you about the reliability of the algorithm's flags?
(e) Connect this to Theme 3 (AI uses probability): Why is it important for platform users to understand these probabilities? How might misunderstanding them lead to overconfidence in AI moderation?

H.3. Election Forecasting and Probability

A news organization publishes an election forecast: "Candidate A has a 72% chance of winning."

(a) Using the subjective approach to probability, explain what this number means. Does it mean Candidate A will get 72% of the vote?
(b) What is the probability Candidate B wins?
(c) If similar forecasts are made for 100 different elections, and each time the favored candidate is given a 72% chance, how many of those elections would you expect the favored candidate to lose?
(d) Why did many people feel the 2016 U.S. presidential election result was "impossible" when most forecasts gave Clinton 65-85% odds? What does this reveal about public understanding of probability?

H.4. Ethical Probability

Professor Washington discovers that a criminal justice risk algorithm uses the following data to estimate the probability of re-offense:

- Prior arrests
- Age at first offense
- Neighborhood zip code
- Employment status

(a) Which of these variables might serve as proxies for race, potentially introducing bias? Explain.
(b) If the algorithm assigns P(re-offense) = 0.35 to a defendant, what does this mean? Is it appropriate to make an individual sentencing decision based on a group-level probability?
(c) Using the concept of relative frequency, explain what a "35% probability of re-offense" means in practice. Over what group of people was this estimate calculated?
(d) This connects to the difference between group-level probabilities and individual predictions. Write a paragraph explaining why it would be problematic to tell an individual defendant, "You have a 35% chance of committing another crime."