Chapter 3 Exercises: Python Fundamentals I — Variables, Data Types, and Expressions

Contributors to Introduction to Data Science

How to use these exercises: Work through the sections in order. Parts A and B check your understanding of concepts and basic skills. Part C is all about debugging — finding and fixing broken code. Part D applies your skills to realistic data scenarios. Part E pushes you to combine ideas. Part M mixes in review questions from Chapters 1 and 2.

For every "predict the output" question: Write your prediction before running the code. The learning happens in the predicting, not the running.

Difficulty key: ⭐ Foundational | ⭐⭐ Intermediate | ⭐⭐⭐ Advanced | ⭐⭐⭐⭐ Extension

Part A: Conceptual Understanding ⭐

These questions check whether you've internalized the core ideas. Try to answer from memory before checking the chapter.

Exercise 3.1 — Variables as labels

The chapter introduced a threshold concept: variables are labels pointing to values, not boxes containing values. In your own words, explain the difference. Then explain what happens in memory when you execute these two lines:

a = 100
b = a

How many copies of the number 100 exist? How many labels point to it?

Guidance

One copy of `100` exists in memory. Two labels (`a` and `b`) point to it. If variables were boxes, `b = a` would create a copy of 100 and put it in a second box. But since variables are labels, `b = a` just sticks a second label on the same value. For simple types like integers, this distinction doesn't have practical consequences yet — but it becomes critical when you work with lists and dictionaries in [Chapter 5](../chapter-05-data-structures/index.md), where multiple labels pointing to the same object means changes through one label are visible through the other.
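The label model can be checked directly. The sketch below uses the `is` operator and the built-in `id()` function, neither of which the exercise requires; the list example is a preview of the Chapter 5 behavior, and the names `scores` and `alias` are illustrative.

```python
a = 100
b = a

# Both names refer to the same object, so their identities match.
same_object = a is b
print(same_object)        # True
print(id(a) == id(b))     # True

# Rebinding b moves that label to a different value; a is unaffected.
b = 200
print(a)                  # 100

# Preview of Chapter 5: with a mutable list, a change made through one
# label is visible through the other, because there is only one list.
scores = [90, 85]
alias = scores
alias.append(70)
print(scores)             # [90, 85, 70]
```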

Exercise 3.2 — Type identification

Without running any code, state the type (int, float, str, or bool) of each value:

| Value | Type |
|-------|------|
| `42` | |
| `42.0` | |
| `"42"` | |
| `True` | |
| `0` | |
| `""` | |
| `3.14` | |
| `"False"` | |

Answers

| Value | Type |
|-------|------|
| `42` | `int` |
| `42.0` | `float` |
| `"42"` | `str` |
| `True` | `bool` |
| `0` | `int` |
| `""` | `str` (empty string, but still a string) |
| `3.14` | `float` |
| `"False"` | `str` (it's in quotes, so it is a string that happens to spell a boolean keyword) |

The tricky ones: `42.0` is a float (the decimal point makes it so, even though it's a whole number). `"False"` is a string, not a boolean; the quotes make it text. And `""` is a string (an empty one), not nothing.
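Once you have written down your predictions, one way to check them is to ask Python for each type name. This is only a verification sketch; it uses `type(...).__name__`, the same idiom that appears later in Exercise 3.11.

```python
print(type(42).__name__)        # int
print(type(42.0).__name__)      # float
print(type("42").__name__)      # str
print(type(True).__name__)      # bool
print(type(0).__name__)         # int
print(type("").__name__)        # str
print(type(3.14).__name__)      # float
print(type("False").__name__)   # str
```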

Exercise 3.3 — Operator precedence

Write the result of each expression. Show your work by indicating which operation Python performs first.

2 + 3 * 4
(2 + 3) * 4
10 - 6 / 2
2 ** 3 + 1
15 // 4 + 15 % 4
10 > 5 and 3 + 2 == 5

Answers

1. `2 + (3 * 4)` = `2 + 12` = `14`: multiplication before addition
2. `(2 + 3) * 4` = `5 * 4` = `20`: parentheses first
3. `10 - (6 / 2)` = `10 - 3.0` = `7.0`: division before subtraction; note the result is a float because `/` always returns a float
4. `(2 ** 3) + 1` = `8 + 1` = `9`: exponentiation before addition
5. `(15 // 4) + (15 % 4)` = `3 + 3` = `6`: floor division and modulo share the precedence of multiplication and division, so both are evaluated before the addition
6. `(10 > 5) and ((3 + 2) == 5)` = `True and (5 == 5)` = `True and True` = `True`: arithmetic first, then comparisons, then `and`
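The same precedence rules produce a few results that often surprise people. The three expressions below are not part of the exercise; they are offered as an optional sketch of how `**`, unary minus, and the logical operators rank against each other, and the variable names are illustrative.

```python
# Unary minus binds less tightly than **, so this is -(2 ** 2).
negated_square = -2 ** 2
print(negated_square)        # -4

# ** groups right to left, so this is 2 ** (3 ** 2) = 2 ** 9.
stacked_powers = 2 ** 3 ** 2
print(stacked_powers)        # 512

# not binds tighter than and, which binds tighter than or:
# ((not False) and False) or True
mixed_logic = not False and False or True
print(mixed_logic)           # True
```

When in doubt, adding parentheses costs nothing and makes the intended order explicit.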

Exercise 3.4 — Assignment vs. comparison

Explain the difference between = and == in Python. For each of the following, state whether it's assignment or comparison, and what the result is:

x = 10
x == 10
y = x
y == x

Guidance

1. **Assignment.** Gives the name `x` the value `10`. No output.
2. **Comparison.** Returns `True` if `x` equals `10`, `False` otherwise. (After line 1, this would return `True`.)
3. **Assignment.** Gives the name `y` the same value that `x` points to.
4. **Comparison.** Returns `True` if `y` and `x` have the same value.

The key: `=` stores a value. `==` asks a question ("are these equal?").

Exercise 3.5 — Truthiness

Without running code, predict what bool() returns for each value:

bool(1)
bool(0)
bool(-1)
bool("")
bool(" ")
bool("0")
bool(0.0)
bool(None)

Answers

1. `True`: any nonzero number is truthy
2. `False`: zero is falsy
3. `True`: negative numbers are nonzero, therefore truthy
4. `False`: an empty string is falsy
5. `True`: a space is a character, so the string isn't empty
6. `True`: the string `"0"` is not empty (it contains one character)
7. `False`: zero as a float is still falsy
8. `False`: `None` is falsy

Exercise 3.6 — Immutability of strings

What is wrong with the following code? What does the programmer probably intend, and how would you fix it?

name = "elena"
name.upper()
print(name)

Guidance

The programmer expects `name` to be `"ELENA"` after calling `.upper()`. But string methods return a *new* string — they don't modify the original. `name.upper()` creates `"ELENA"` and then it's immediately discarded because it isn't saved to any variable. The fix:

name = "elena"
name = name.upper()
print(name)    # ELENA

Or, if you want to keep the original: `upper_name = name.upper()`.
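To see that the original string really is untouched, the sketch below keeps both versions and then tries to change a character in place. The `try`/`except` construct has not been introduced at this point in the book; it is used here only so the example runs to the end instead of stopping at the error. The name `shouted` is illustrative.

```python
name = "elena"
shouted = name.upper()    # .upper() returns a new string

print(name)               # elena  (unchanged)
print(shouted)            # ELENA

# Strings cannot be modified in place: item assignment raises TypeError.
try:
    name[0] = "E"
    in_place_edit_worked = True
except TypeError:
    in_place_edit_worked = False

print(in_place_edit_worked)   # False
```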

Part B: Applied Skills ⭐⭐

These exercises ask you to write code. Type every answer into a Jupyter notebook and run it.

Exercise 3.7 — Variable creation

Create variables to store the following information about a dataset. Use descriptive snake_case names. Then print a formatted summary using an f-string.

The dataset is called "Global Health Observatory"
It was last updated in 2023
It has 1,284 rows
It covers 195 countries
The average life expectancy across all countries is 73.4 years

Your f-string output should look something like:

Dataset: Global Health Observatory (updated 2023)
Rows: 1,284 | Countries: 195
Average life expectancy: 73.4 years

Guidance

dataset_name = "Global Health Observatory"
last_updated = 2023
row_count = 1284
country_count = 195
avg_life_expectancy = 73.4

print(f"Dataset: {dataset_name} (updated {last_updated})")
print(f"Rows: {row_count:,} | Countries: {country_count}")
print(f"Average life expectancy: {avg_life_expectancy} years")

Note the `:,` format specifier in `{row_count:,}` to add the comma separator.

Exercise 3.8 — Arithmetic with data

A basketball player attempted 82 three-point shots and made 31 of them. Write code to:

Store the attempts and makes in variables
Calculate the three-point shooting percentage (makes / attempts)
Print the result as a percentage with one decimal place using an f-string

Expected output: Three-point percentage: 37.8%

Guidance

three_pt_attempts = 82
three_pt_makes = 31

three_pt_pct = three_pt_makes / three_pt_attempts * 100
print(f"Three-point percentage: {three_pt_pct:.1f}%")

The `:.1f` format specifier means "one decimal place, float format."

Exercise 3.9 — String methods practice

Given the following messy data values (simulating what you might read from a file), clean each one using string methods:

country_raw = "   United States   "
temp_raw = "98.6 degrees"
code_raw = "us"

Remove the extra whitespace from country_raw
Extract just the number part from temp_raw (hint: use .split() and indexing)
Convert code_raw to uppercase

Guidance

country_raw = "   United States   "
temp_raw = "98.6 degrees"
code_raw = "us"

country_clean = country_raw.strip()
temp_number = temp_raw.split(" ")[0]   # Splits into ["98.6", "degrees"], takes first
code_upper = code_raw.upper()

print(f"Country: '{country_clean}'")
print(f"Temperature: {temp_number}")
print(f"Code: {code_upper}")

Output:

Country: 'United States'
Temperature: 98.6
Code: US

Note: `temp_number` is still a string (`"98.6"`). If you wanted to do math with it, you'd need `float(temp_number)`.
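As a short follow-on sketch, here is the conversion that note describes, using the Fahrenheit-to-Celsius formula from Exercise 3.22. The names `temp_value` and `temp_celsius` are illustrative.

```python
temp_raw = "98.6 degrees"
temp_number = temp_raw.split(" ")[0]    # still the string "98.6"

temp_value = float(temp_number)         # now a float, so arithmetic works
temp_celsius = (temp_value - 32) * 5 / 9

print(type(temp_number).__name__)       # str
print(type(temp_value).__name__)        # float
print(f"{temp_celsius:.1f}")            # 37.0
```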

Exercise 3.10 — String slicing

A dataset uses patient IDs in the format "HOSP-YYYY-NNNNN" where HOSP is a hospital code, YYYY is the year, and NNNNN is a sequence number. Given:

patient_id = "MGH-2024-00142"

Use slicing to extract:

1. The hospital code ("MGH")
2. The year ("2024")
3. The sequence number ("00142")
4. Convert the year to an integer and add 1 to it

Guidance

patient_id = "MGH-2024-00142"

hospital = patient_id[:3]
year_str = patient_id[4:8]
sequence = patient_id[9:]

print(f"Hospital: {hospital}")
print(f"Year: {year_str}")
print(f"Sequence: {sequence}")

year_int = int(year_str) + 1
print(f"Next year: {year_int}")

Alternative approach using `.split("-")`:

parts = patient_id.split("-")
hospital = parts[0]    # "MGH"
year_str = parts[1]    # "2024"
sequence = parts[2]    # "00142"

Exercise 3.11 — Type conversion chain

Start with the string "3.14159" and perform the following conversions, printing the result and type at each step:

Convert to a float
Convert the float to an int
Convert the int back to a string
Convert the string to a bool

What value and type do you have at each step?

Guidance

step0 = "3.14159"
print(f"Step 0: {step0} ({type(step0).__name__})")

step1 = float(step0)
print(f"Step 1: {step1} ({type(step1).__name__})")

step2 = int(step1)
print(f"Step 2: {step2} ({type(step2).__name__})")

step3 = str(step2)
print(f"Step 3: {step3} ({type(step3).__name__})")

step4 = bool(step3)
print(f"Step 4: {step4} ({type(step4).__name__})")

Output:

Step 0: 3.14159 (str)
Step 1: 3.14159 (float)
Step 2: 3 (int)     ← truncated, not rounded!
Step 3: 3 (str)
Step 4: True (bool)  ← "3" is a non-empty string, so it's truthy

Key insights: `int()` truncates rather than rounds. It simply drops the fractional part, so 3.14159 becomes 3, and even `int(3.99)` would be 3, not 4. And `bool("3")` is `True` because any non-empty string is truthy: even `bool("0")` and `bool("False")` would be `True`!
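A brief sketch contrasting truncation with rounding, and showing why the chain converts to a float first. The `try`/`except` block is only there so the example runs to the end; the flag name `direct_conversion_worked` is illustrative.

```python
print(int(3.99))      # 3   (fraction dropped, not rounded up)
print(int(-3.99))     # -3  (truncates toward zero, not down to -4)
print(round(3.99))    # 4

# int() cannot parse a decimal string directly, which is why the chain
# goes through float() before int().
try:
    int("3.14159")
    direct_conversion_worked = True
except ValueError:
    direct_conversion_worked = False

print(direct_conversion_worked)        # False
print(int(float("3.14159")))           # 3
```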

Exercise 3.12 — Comparison expressions

Given the following variables, predict whether each comparison returns True or False. Then verify in Python.

a = 10
b = 3.0
c = "10"
d = True

a == 10
a == c
a == int(c)
b > 2 and b < 4
a != b
d == 1
type(a) == type(c)

Answers

1. `True`: `a` is 10, and 10 equals 10
2. `False`: `a` is an int, `c` is a string. `10 == "10"` is `False` in Python (no automatic type coercion)
3. `True`: `int("10")` is 10, and `10 == 10` is `True`
4. `True`: 3.0 is greater than 2 and less than 4
5. `True`: `10 != 3.0` is `True` (they're different numbers)
6. `True`: `True` is equal to `1` in Python (booleans are a subtype of integers: `True == 1`, `False == 0`)
7. `False`: `type(a)` is `<class 'int'>`, `type(c)` is `<class 'str'>`

Exercise 3.13 — f-string formatting

Write f-strings that produce the following outputs, given the variables below:

population = 8045311
growth_rate = 0.02847
city = "new york"
pi = 3.14159265358979

Target outputs:

1. Population: 8,045,311
2. Growth rate: 2.85%
3. City: New York
4. Pi to 4 decimals: 3.1416

Guidance

print(f"Population: {population:,}")
print(f"Growth rate: {growth_rate * 100:.2f}%")
print(f"City: {city.title()}")
print(f"Pi to 4 decimals: {pi:.4f}")

Notes:

- `:,` adds comma separators
- `:.2f` formats as float with 2 decimal places
- `.title()` capitalizes the first letter of each word
- `:.4f` formats with 4 decimal places (and rounds the last digit)
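Two further format specifiers are worth knowing, although the exercise does not require them. The `%` presentation type multiplies by 100 and appends the percent sign for you, and a width with `>` right-aligns a value, which combines with `,`. This is an optional sketch; the names `percent_text` and `padded_text` are illustrative.

```python
population = 8045311
growth_rate = 0.02847

# :.2% multiplies by 100, keeps 2 decimals, and adds the % sign.
percent_text = f"Growth rate: {growth_rate:.2%}"
print(percent_text)             # Growth rate: 2.85%

# :>12, right-aligns in a 12-character field with comma separators.
padded_text = f"{population:>12,}"
print(len(padded_text))         # 12
print(padded_text.strip())      # 8,045,311
```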

Exercise 3.14 — Augmented assignment

What is the value of x after each line executes? Track the value step by step.

x = 10
x += 5
x *= 2
x -= 7
x //= 4
x %= 3

Answer

x = 10     → x is 10
x += 5     → x is 15   (10 + 5)
x *= 2     → x is 30   (15 * 2)
x -= 7     → x is 23   (30 - 7)
x //= 4    → x is 5    (23 // 4 = 5, remainder discarded)
x %= 3     → x is 2    (5 % 3 = 2, the remainder)

Part C: Debugging ⭐⭐

Every exercise in this section contains buggy code. Find the error, identify the error type (such as NameError, TypeError, SyntaxError, ValueError, or ZeroDivisionError), and fix it.

Exercise 3.15 — Debug this

city_name = "Chicago"
print(City_name)

Answer

- **Error:** `NameError: name 'City_name' is not defined`
- **Cause:** Python is case-sensitive. The variable was defined as `city_name` (lowercase c) but referenced as `City_name` (uppercase C).
- **Fix:** `print(city_name)`

Exercise 3.16 — Debug this

score = "95"
curved_score = score + 5
print(curved_score)

Answer

- **Error:** `TypeError: can only concatenate str (not "int") to str`
- **Cause:** `score` is a string (`"95"`), not a number. You can't add an integer to a string.
- **Fix:** `curved_score = int(score) + 5`
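The wording of the error message hints at what `+` means for strings: it joins them. The sketch below contrasts joining with numeric addition; the names `joined` and `curved_score` follow the exercise loosely and are otherwise illustrative.

```python
score = "95"

# str + str joins the text end to end.
joined = score + "5"
print(joined)             # 955

# Converting first gives numeric addition.
curved_score = int(score) + 5
print(curved_score)       # 100
```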

Exercise 3.17 — Debug this

print("The temperature is 72 degrees)

Answer

- **Error:** `SyntaxError: EOL while scanning string literal` (Python 3.10 and later word this as `SyntaxError: unterminated string literal`)
- **Cause:** Missing closing quotation mark before the closing parenthesis.
- **Fix:** `print("The temperature is 72 degrees")`

Exercise 3.18 — Debug this

vaccination rate = 0.73

Answer

- **Error:** `SyntaxError: invalid syntax`
- **Cause:** Variable names cannot contain spaces. Python sees `vaccination` as a variable and then doesn't know what to do with `rate = 0.73`.
- **Fix:** `vaccination_rate = 0.73`

Exercise 3.19 — Debug this

total = 100
average = total / 0

Answer

- **Error:** `ZeroDivisionError: division by zero`
- **Cause:** You can't divide by zero: not in Python, not in math. This often happens when a count variable that's supposed to be the denominator hasn't been properly populated.
- **Fix:** This depends on the context. You might add a check: `if denominator != 0: average = total / denominator`. Or you might need to trace back to figure out why the denominator is zero.
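Written out in full, a guard of that kind could look like the sketch below. The variable `count` and the choice to fall back to `None` are assumptions made for illustration; what the right fallback is depends on your data.

```python
total = 100
count = 0    # hypothetical denominator that was never populated

if count != 0:
    average = total / count
else:
    # No sensible average exists, so record that explicitly.
    average = None

print(average)    # None
```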

Exercise 3.20 — Debug this: multiple errors

This code has three separate errors. Find and fix all of them.

Patient_count = 450
vacc_rate = "0.82"
city = seattle

result = Patient_count * vacc_rate
print(f"In {city}, approximately {result} patients were vaccinated")

Answer

- **Error 1 (line 3):** `NameError: name 'seattle' is not defined`. `seattle` needs quotes: `city = "Seattle"`.
- **Error 2 (line 5):** `vacc_rate` is a string. Once the NameError is fixed, this line raises no exception: multiplying an int by a string repeats the string, so `result` silently becomes `"0.82"` repeated 450 times instead of a number. (A `TypeError: can't multiply sequence by non-int of type 'float'` would appear only if `Patient_count` were a float.) Fix: `vacc_rate = 0.82` (remove the quotes) or convert with `float(vacc_rate)`.
- **Error 3 (minor):** The variable naming is inconsistent: `Patient_count` uses different casing than the other variables. While not a Python error, convention says use `patient_count`.

Fixed code:

patient_count = 450
vacc_rate = 0.82
city = "Seattle"

result = patient_count * vacc_rate
print(f"In {city}, approximately {result:.0f} patients were vaccinated")
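One detail about the original line `result = Patient_count * vacc_rate` is worth checking for yourself: in Python, an integer times a string is legal and repeats the string, so the bug produces a wrong value, not a crash. The sketch below demonstrates this; `repeated`, `wrong_result`, and `right_result` are illustrative names.

```python
patient_count = 450
vacc_rate = "0.82"

# int * str repeats the string.
repeated = 3 * vacc_rate
print(repeated)                    # 0.820.820.82

# With the exercise's values, the "result" is a 1,800-character string.
wrong_result = patient_count * vacc_rate
print(type(wrong_result).__name__) # str
print(len(wrong_result))           # 1800

# Converting the rate to a float gives the intended arithmetic.
right_result = patient_count * float(vacc_rate)
print(f"{right_result:.0f}")       # 369
```

Silent bugs like this one are harder to spot than exceptions, which is a good reason to print intermediate values and their types while debugging.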

Part D: Real-World Application ⭐⭐⭐

These exercises simulate tasks you'd encounter in actual data work.

Exercise 3.21 — BMI calculator

Body Mass Index (BMI) is calculated as weight in kilograms divided by height in meters squared. Write code to:

Store a weight of 70 kg and a height of 1.75 m in variables
Calculate the BMI
Print the result formatted to one decimal place
Create a boolean variable indicating whether the BMI is in the "normal" range (18.5 to 24.9)

Guidance

weight_kg = 70
height_m = 1.75

bmi = weight_kg / (height_m ** 2)
print(f"BMI: {bmi:.1f}")

is_normal = bmi >= 18.5 and bmi <= 24.9
print(f"Normal range: {is_normal}")

Output:

BMI: 22.9
Normal range: True
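Python also allows comparisons to be chained, which reads closer to the mathematical notation for a range. This is an equivalent alternative to the `and` version, offered as a sketch in case the chapter has not covered chaining.

```python
weight_kg = 70
height_m = 1.75

bmi = weight_kg / (height_m ** 2)

# Equivalent to: bmi >= 18.5 and bmi <= 24.9
is_normal = 18.5 <= bmi <= 24.9

print(f"BMI: {bmi:.1f}")              # BMI: 22.9
print(f"Normal range: {is_normal}")   # Normal range: True
```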

Exercise 3.22 — Temperature conversion

Write code that converts a temperature from Fahrenheit to Celsius using the formula: C = (F - 32) * 5/9. Use the temperature 98.6 F. Print the result with two decimal places. Then verify your answer by converting back to Fahrenheit: F = C * 9/5 + 32.

Guidance

temp_f = 98.6
temp_c = (temp_f - 32) * 5 / 9
print(f"{temp_f}°F = {temp_c:.2f}°C")

# Verify by converting back
verify_f = temp_c * 9 / 5 + 32
print(f"Verification: {verify_f:.2f}°F")

Output:

98.6°F = 37.00°C
Verification: 98.60°F

Exercise 3.23 — Data summary report

You have the following raw data about a survey. Write code that stores each value, performs calculations, and prints a formatted report.

Survey name: "Public Transit Satisfaction Survey"
Total respondents: 2,847
Satisfied: 1,891
Unsatisfied: 814
No response: 142
Survey start date: "2024-01-15"
Survey end date: "2024-02-28"

Your report should calculate and display:

- The satisfaction rate as a percentage
- The response rate (respondents who gave an answer / total)
- The start year and month extracted from the date string

Guidance

survey_name = "Public Transit Satisfaction Survey"
total = 2847
satisfied = 1891
unsatisfied = 814
no_response = 142
start_date = "2024-01-15"
end_date = "2024-02-28"

responded = satisfied + unsatisfied
satisfaction_rate = satisfied / responded * 100
response_rate = responded / total * 100

start_year = start_date[:4]
start_month = start_date[5:7]

print(f"=== {survey_name} ===")
print(f"Total respondents: {total:,}")
print(f"Satisfaction rate: {satisfaction_rate:.1f}%")
print(f"Response rate: {response_rate:.1f}%")
print(f"Survey period: {start_year}, month {start_month}")

Exercise 3.24 — Course grade calculation

A student's grade is computed as: homework 30%, midterm 30%, final 40%. Given scores of homework=88, midterm=76, final=91, compute the weighted grade. Then determine whether the student passed (grade >= 60) and whether they earned honors (grade >= 90).

Guidance

homework = 88
midterm = 76
final = 91

grade = homework * 0.30 + midterm * 0.30 + final * 0.40
passed = grade >= 60
honors = grade >= 90

print(f"Weighted grade: {grade:.1f}")
print(f"Passed: {passed}")
print(f"Honors: {honors}")

Output:

Weighted grade: 85.6
Passed: True
Honors: False

Exercise 3.25 — Cleaning messy strings

Imagine you've read these values from a badly formatted spreadsheet. Use string methods to clean each one:

name = "  dr. elena RODRIGUEZ  "
email = "Elena.Rodriguez@Hospital.ORG"
phone = "555-867-5309"
department = "infectious diseases"

Transform them to produce:

- Name as title case with no extra spaces: "Dr. Elena Rodriguez"
- Email as all lowercase: "elena.rodriguez@hospital.org"
- Phone with no dashes: "5558675309"
- Department capitalized: "Infectious Diseases"

Guidance

name = "  dr. elena RODRIGUEZ  "
email = "Elena.Rodriguez@Hospital.ORG"
phone = "555-867-5309"
department = "infectious diseases"

name_clean = name.strip().title()
email_clean = email.lower()
phone_clean = phone.replace("-", "")
dept_clean = department.title()

print(f"Name: {name_clean}")
print(f"Email: {email_clean}")
print(f"Phone: {phone_clean}")
print(f"Department: {dept_clean}")

Part E: Synthesis and Extension ⭐⭐⭐⭐

These problems require combining multiple concepts.

Exercise 3.26 — Data type detective

Without using type(), write expressions that test whether a variable contains a specific type. For example, to check if x is an integer, you could use x == int(x) — but be careful, that doesn't always work.

For each of the following variables, write a boolean expression that evaluates to True:

a = 42
b = 42.0
c = "42"
d = True

Hint: use isinstance() — Python's built-in function for type checking. Look up how it works, or try isinstance(a, int).

Guidance

print(isinstance(a, int))    # True
print(isinstance(b, float))  # True
print(isinstance(c, str))    # True
print(isinstance(d, bool))   # True

# Interesting edge case:
print(isinstance(d, int))    # Also True! bool is a subclass of int

The fact that `isinstance(True, int)` returns `True` is a Python quirk — booleans are technically integers (`True == 1`, `False == 0`).
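This relationship has practical consequences. The sketch below shows the subclass link with `issubclass()`, shows that booleans take part in arithmetic, and shows one way to test for an integer that is not a boolean. The name `is_strictly_int` is illustrative.

```python
d = True

print(issubclass(bool, int))    # True
print(True + True)              # 2

# To accept 42 but reject True, rule out bool explicitly.
is_strictly_int = isinstance(d, int) and not isinstance(d, bool)
print(is_strictly_int)          # False

a = 42
print(isinstance(a, int) and not isinstance(a, bool))   # True
```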

Exercise 3.27 — Building a data dictionary

A "data dictionary" is a description of every column in a dataset. Using only variables and f-strings (no lists or dictionaries yet — those come in Chapter 5), create a printed data dictionary for a small dataset with three columns. For each column, store and display:

Column name
Data type (as a descriptive string like "numeric" or "text")
Description
Example value

Format the output neatly. This is practice for the kind of documentation you'll write alongside every data science project.

Guidance

col1_name = "country"
col1_type = "text"
col1_desc = "Full country name"
col1_example = "Brazil"

col2_name = "year"
col2_type = "numeric (integer)"
col2_desc = "Year of observation"
col2_example = "2023"

col3_name = "vaccination_rate"
col3_type = "numeric (float)"
col3_desc = "Percentage of population vaccinated"
col3_example = "0.73"

print("=== DATA DICTIONARY ===")
print(f"\n{'Column':<20} {'Type':<20} {'Description':<35} {'Example'}")
print("-" * 85)
print(f"{col1_name:<20} {col1_type:<20} {col1_desc:<35} {col1_example}")
print(f"{col2_name:<20} {col2_type:<20} {col2_desc:<35} {col2_example}")
print(f"{col3_name:<20} {col3_type:<20} {col3_desc:<35} {col3_example}")

The `:<20` format specifier left-aligns text in a field 20 characters wide.

Exercise 3.28 — Floating-point exploration

Investigate floating-point precision by answering these questions with code:

What does 0.1 + 0.2 equal in Python? Is it exactly 0.3?
What does 0.1 + 0.2 == 0.3 return?
What does round(0.1 + 0.2, 1) == round(0.3, 1) return?
Try 0.1 + 0.1 + 0.1 - 0.3. Is the result exactly zero?
In one or two sentences, explain why this happens and whether it matters for data science.

Guidance

print(0.1 + 0.2)                              # 0.30000000000000004
print(0.1 + 0.2 == 0.3)                       # False
print(round(0.1 + 0.2, 1) == round(0.3, 1))   # True
print(0.1 + 0.1 + 0.1 - 0.3)                  # 5.551115123125783e-17

This happens because computers store floats in binary (base 2), and some decimal fractions (like 0.1) can't be represented exactly in binary — similar to how 1/3 can't be represented exactly in decimal. For data science, this rarely matters because real-world measurements already have far more uncertainty than one quadrillionth. But it's a gotcha when comparing floats with `==`.
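When you do need to compare floats, a common approach is to test whether they are close enough instead of exactly equal. The sketch below uses `math.isclose()` from the standard library and, as an alternative, a manual tolerance check; the tolerance `1e-9` is an arbitrary choice for illustration.

```python
import math

total = 0.1 + 0.2

print(total == 0.3)                 # False
print(math.isclose(total, 0.3))     # True

# The same idea written by hand with an explicit tolerance.
tolerance = 1e-9
print(abs(total - 0.3) < tolerance) # True
```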

Part M: Mixed Review (Chapters 1-2) ⭐

These questions revisit earlier chapters. If you struggle with any of them, revisit the relevant chapter section.

Exercise 3.29 — Data science lifecycle revisited (from Chapter 1)

For each of the Python operations below, identify which stage of the data science lifecycle it most closely corresponds to:

vaccination_rate = vaccinated / total_population
source_url = "https://data.who.int/"
print(f"Vaccination rates range from {min_rate} to {max_rate}")
country = country_raw.strip().lower()

Lifecycle stages: Ask, Acquire, Clean, Explore, Model, Communicate

Answers

1. **Explore** (or Model, depending on context): computing a summary statistic from data
2. **Acquire**: recording the source of data
3. **Communicate**: presenting findings in a readable format
4. **Clean**: standardizing text data by removing whitespace and converting to consistent case

Exercise 3.30 — Jupyter workflow (from Chapter 2)

You're working in a Jupyter notebook and encounter this situation: you defined patient_count = 4521 in cell 3, used it in cell 7, and then accidentally deleted cell 3.

Does cell 7 still work if you run it right now?
What happens if you restart the kernel and try to run cell 7?
How would you prevent this kind of problem in the future?

Guidance

1. **Yes.** The variable is still in memory from when you previously ran cell 3. Python doesn't "know" that the cell was deleted.
2. **NameError.** After a kernel restart, all variables are cleared. Cell 3 no longer exists to recreate `patient_count`.
3. Best practices: (a) periodically do Kernel → Restart & Run All to make sure your notebook works top to bottom, (b) keep your cells in logical order so the notebook tells a linear story, (c) don't delete cells that define important variables; if you want to hide them, move them to an "initialization" cell at the top.