Case Study: Excel vs. Python — When Each Tool Shines
The Scenario
Alex Rivera has been at StreamVibe for three months. She's settled into a rhythm: her team's weekly reports are built in Google Sheets, but the data science team down the hall works exclusively in Python. Both groups analyze the same data. Both produce insights. And both are a little suspicious of each other's tools.
"Why do you need code for something I can do with a pivot table?" asks Alex's manager during a team meeting.
"Why are you manually updating a spreadsheet every Monday when a script could do it automatically?" replies the lead data scientist.
They're both right. And they're both wrong. Let's see why.
Round 1: Quick Data Entry and Calculation
The task: Alex needs to enter survey results from 15 users who tested a new feature. She wants the average rating, the highest rating, and the lowest rating.
The Spreadsheet Way
Alex opens Google Sheets, types the 15 ratings into column A, and writes three formulas:
| Cell | Formula | Result |
|---|---|---|
| B1 | =AVERAGE(A1:A15) | 4.0 |
| B2 | =MAX(A1:A15) | 5 |
| B3 | =MIN(A1:A15) | 2 |
Time: 2 minutes. She can see all 15 values at once, spot-check them visually, and share the sheet with her manager via a link.
The Python Way
```python
import pandas as pd

ratings = [4, 5, 3, 4, 5, 4, 3, 5, 2, 4, 5, 4, 3, 5, 4]
s = pd.Series(ratings)
print(f"Average: {s.mean():.1f}")
print(f"Max: {s.max()}")
print(f"Min: {s.min()}")
```
Time: 3-4 minutes (including opening the notebook and importing pandas).
Verdict: Spreadsheet wins.
For 15 numbers and three calculations, a spreadsheet is genuinely faster and more natural. Opening a Jupyter notebook, importing pandas, and typing a list of numbers takes more time than it's worth. Don't use a chainsaw to cut a sandwich.
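That said, if Alex already had a Python session open, even pandas is more than this task needs. The standard library's `statistics` module handles all three numbers with no third-party imports, though the spreadsheet still wins on visibility and sharing:

```python
import statistics

# The same 15 ratings, no pandas required
ratings = [4, 5, 3, 4, 5, 4, 3, 5, 2, 4, 5, 4, 3, 5, 4]

print(f"Average: {statistics.mean(ratings):.1f}")
print(f"Max: {max(ratings)}")
print(f"Min: {min(ratings)}")
```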
Round 2: Weekly Report on 50,000 Users
The task: Every Monday, Alex receives a CSV file with the previous week's user activity: 50,000 rows, 12 columns. She needs to calculate average watch time by subscription tier, count how many users churned, and flag users whose activity dropped by more than 50% compared to the previous week.
The Spreadsheet Way
Alex opens the CSV in Google Sheets. It takes 30 seconds to load — Sheets struggles with 50,000 rows. She creates a pivot table for the subscription tier averages. That works, but it's slow to update. For the churn count, she adds a COUNTIF formula. For the 50%-drop flag, she needs a helper column comparing each user's current week to their previous week, which means adding a VLOOKUP or INDEX/MATCH formula and dragging it down 50,000 rows.
After 45 minutes of clicking, dragging, and waiting for the spreadsheet to recalculate, she has her report. Then her manager says, "Can you also break it down by age group?" Alex sighs and starts adding more formulas.
Next Monday, she does it all again from scratch.
The Python Way
```python
import pandas as pd

# Load this week's data
current = pd.read_csv("weekly_activity_2024-01-15.csv")

# Average watch time by subscription tier
print(current.groupby('subscription')['watch_hours'].mean().round(1))

# Count churned users
churned = current[current['status'] == 'churned']
print(f"Churned: {len(churned)}")

# Flag users with >50% activity drop
# (assumes the CSV includes a week-over-week pct_change column)
current['drop_flag'] = current['pct_change'] < -50
high_drop = current[current['drop_flag']]
print(f"High-drop users: {len(high_drop)}")

# Manager's extra request — one additional line
print(current.groupby('age_group')['watch_hours'].mean().round(1))
```
Time: 10 minutes the first time. Time next Monday: 30 seconds — just change the filename and re-run. When the manager asks for the age-group breakdown, Alex adds one line and re-runs in 5 seconds.
Verdict: Python wins. Decisively.
At this scale, Python is faster, more maintainable, and vastly more adaptable to changing requirements. The killer advantage is reproducibility: the same script works every Monday with a new file. No clicking, no dragging, no rebuilding pivot tables.
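The "change the filename and re-run" step can be pushed even further by wrapping the analysis in a function, so each Monday becomes a single call. This is a sketch, not a finished report: the column names (`subscription`, `watch_hours`, `status`, `pct_change`) follow the example above, and a real version would add error handling for malformed files.

```python
import pandas as pd

def weekly_report(csv_path: str) -> dict:
    """Run the Monday report on one week's activity file."""
    df = pd.read_csv(csv_path)
    return {
        "watch_by_tier": df.groupby("subscription")["watch_hours"].mean().round(1),
        "churned": int((df["status"] == "churned").sum()),
        "high_drop": int((df["pct_change"] < -50).sum()),
    }

# Each Monday:
# report = weekly_report("weekly_activity_2024-01-22.csv")
```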
Round 3: Sharing Results with Non-Technical Stakeholders
The task: Alex needs to share user engagement numbers with the VP of Marketing, who does not know Python and has no interest in learning.
The Spreadsheet Way
Alex creates a clean summary tab in Google Sheets with key numbers, a simple chart, and color-coded cells for trends. She shares the link. The VP opens it on her phone during a meeting, sees the numbers, and asks a follow-up question. Alex can answer by pointing to a cell.
The Python Way
Alex's analysis is in a Jupyter notebook. She could export it as a PDF or HTML file, but the VP would see code blocks alongside the results, which is confusing for someone who doesn't code. Alex could also copy the key numbers into a Slides presentation, but that adds an extra step.
Verdict: Spreadsheet wins for stakeholder communication.
When your audience doesn't code, a spreadsheet or slide deck is usually the better delivery format. Python is excellent for producing results, but for presenting them to non-technical audiences, you often need to translate the output into a more familiar format.
Pro tip: Many data analysts use Python to do the analysis and then copy key results into a spreadsheet or dashboard for sharing. The best approach is often a hybrid.
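In pandas, that hand-off is often a single `to_csv` call: do the analysis in Python, then export a tidy summary table that opens cleanly in Sheets or Excel. A sketch using made-up data in the shape of the Round 2 example:

```python
import pandas as pd

# Hypothetical weekly data standing in for the real CSV
current = pd.DataFrame({
    "subscription": ["free", "free", "premium", "premium"],
    "watch_hours": [3.2, 4.8, 9.1, 10.5],
})

# Analysis stays in Python...
summary = current.groupby("subscription")["watch_hours"].mean().round(1)

# ...and the result is exported for the spreadsheet audience
summary.to_csv("tier_summary.csv", header=["avg_watch_hours"])
```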
Round 4: Reproducing Last Quarter's Analysis
The task: The CEO saw a number in last quarter's report and wants to know exactly how it was calculated. Alex needs to reproduce her analysis from three months ago.
The Spreadsheet Way
Alex opens the old spreadsheet. She finds the cell with the number, but it references three other cells, each of which references more cells, some of which use VLOOKUP to other sheets. Tracing the full calculation chain takes 20 minutes. She discovers that one referenced sheet has been edited since the original report — someone updated a formula for a different purpose. The original calculation may no longer be exactly reproducible.
The Python Way
Alex opens her notebook from three months ago. Every step is there: the data loading, the filtering, the calculation, and the result. She re-runs it on the original data file (which she saved) and gets the exact same number. Total time: 2 minutes.
Verdict: Python wins. This is its superpower.
Code-based analysis creates a permanent, readable record of every decision you made. Three months later, three years later, a different analyst on a different continent can open your notebook, read your code, and reproduce your results exactly. This isn't just convenient — in regulated industries (healthcare, finance, government), it's often legally required.
Round 5: Statistical Testing
The task: Alex wants to test whether premium subscribers have significantly higher satisfaction scores than free users — not just in this sample, but statistically. She needs a proper hypothesis test.
The Spreadsheet Way
Google Sheets has basic functions like AVERAGE, STDEV, and TTEST. Excel adds the Analysis ToolPak with more tests. But the options are limited, the output is minimal, and customization is difficult. For anything beyond a basic t-test, you're stuck.
The Python Way
```python
from scipy import stats

# `stream` is a DataFrame with 'subscription' and 'satisfaction' columns
premium = stream[stream['subscription'] == 'premium']['satisfaction']
free = stream[stream['subscription'] == 'free']['satisfaction']

t_stat, p_value = stats.ttest_ind(premium, free)
print(f"t-statistic: {t_stat:.3f}")
print(f"p-value: {p_value:.4f}")
```
Python offers every statistical test covered in this course (and far beyond), with detailed output and full customization. You'll use scipy.stats starting in Chapter 8 and throughout Parts 4-6.
Verdict: Python wins for anything beyond basic calculations.
Spreadsheets can do simple statistics. Python can do all the statistics. For this course, you'll need Python.
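Part of that "full customization" is control over the test itself. For instance, `ttest_ind` assumes equal variances by default; passing `equal_var=False` runs Welch's t-test instead, an option spreadsheets bury behind a numeric argument. A sketch with made-up satisfaction scores (the cutoff of 0.05 is the conventional significance level, not a scipy requirement):

```python
from scipy import stats

# Hypothetical satisfaction scores for each tier
premium = [8, 9, 7, 9, 8, 9, 7, 8, 9, 8]
free = [6, 7, 5, 7, 6, 8, 6, 5, 7, 6]

# Welch's t-test: does not assume equal variances
t_stat, p_value = stats.ttest_ind(premium, free, equal_var=False)

alpha = 0.05
if p_value < alpha:
    print(f"Significant difference (p = {p_value:.4f})")
else:
    print(f"No significant difference (p = {p_value:.4f})")
```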
The Scoreboard
| Round | Task | Winner |
|---|---|---|
| 1 | Quick small-data calculations | Spreadsheet |
| 2 | Large-scale weekly analysis | Python |
| 3 | Sharing with non-technical audience | Spreadsheet |
| 4 | Reproducing past analysis | Python |
| 5 | Statistical testing | Python |
The Real Lesson
The debate between spreadsheets and code isn't about which tool is "better." It's about which tool fits the task.
Professional data analysts almost always use both. Alex's ideal workflow might look like this:
- Receive data as a CSV file
- Analyze it in a Jupyter notebook (Python)
- Export key results to a Google Sheet or dashboard for stakeholders
- Archive the notebook for reproducibility
The sign of a sophisticated analyst isn't tool loyalty — it's tool fluency. Know what each tool does well, know what it does poorly, and pick the right one for the job.
That said, this course will lean heavily on Python, for good reason. The statistical methods you'll learn in Chapters 8-24 require capabilities that spreadsheets simply don't have. Learning Python now means you'll have the right tool ready when the tasks get more complex.
Your Turn
Think about a task you do regularly that involves data — it could be tracking expenses, managing a fantasy sports team, organizing survey results for a club, or analyzing grades.
- Do you currently use a spreadsheet, paper, or some other method?
- Would Python be a better fit for any part of that task? Why or why not?
- Is there a part of the task where a spreadsheet is genuinely the right tool?
Write a short paragraph making the case for how you'd split the work between tools. There's no wrong answer — the goal is to start thinking like someone who chooses tools strategically rather than by habit.