Further Reading: Your Data Toolkit — Python, Excel, and Jupyter Notebooks
Recommended Resources for Learning Python and pandas
Interactive Tutorials (Start Here)
Google Colab Welcome Notebook
When you open Google Colab, explore the built-in "Welcome to Colaboratory" notebook under Help > "Welcome notebook." It walks through Colab-specific features like connecting to Google Drive, installing extra packages, and using keyboard shortcuts. Essential for getting comfortable in the environment you'll use throughout the course.
Kaggle's "Intro to Python" and "Pandas" Micro-Courses (kaggle.com/learn)
Free, browser-based, and designed for absolute beginners. The Python course covers variables, functions, and basic data types in about 5 hours. The Pandas course covers loading, selecting, and summarizing data in about 4 hours. Both use hands-on exercises with instant feedback. If you want more practice after this chapter, these are the best places to start.
W3Schools Python Tutorial (w3schools.com/python)
A reference-style tutorial that covers Python syntax topic by topic. Good for looking up specific concepts ("how do I use a for loop?" or "what does .strip() do?"). Less narrative than Kaggle but excellent for quick answers.
Video Courses
Corey Schafer's pandas Tutorial Series (YouTube)
A beloved set of video tutorials covering pandas from the basics to advanced operations. Each video is 15-30 minutes and focused on a single topic. Clear explanations, real datasets, and a patient teaching style. Particularly useful: the videos on reading/writing data, filtering, and groupby.
freeCodeCamp's "Data Analysis with Python" Course (freecodecamp.org)
A full, free video course (about 10 hours) that covers Python, NumPy, and pandas for data analysis. Includes certification. Good for students who want a structured path from "I've never coded" to "I can analyze real data."
StatQuest with Josh Starmer (YouTube)
While primarily focused on statistics and machine learning concepts, Josh Starmer's videos are exceptional for understanding the why behind data analysis techniques. His pandas and Python videos complement the statistical concepts in this textbook perfectly. Highly recommended throughout the course.
Books
McKinney, W. (2022). Python for Data Analysis (3rd ed.). O'Reilly Media.
Written by the creator of pandas, this is the definitive reference for the library. The print third edition was updated for pandas 1.4; the author's free online version has since been updated for pandas 2.0. Dense but comprehensive, it is more of a reference book than a read-cover-to-cover tutorial. Keep it handy for when you need to look up specific pandas operations.
Sweigart, A. (2019). Automate the Boring Stuff with Python (2nd ed.). No Starch Press.
Not specifically about data analysis, but the best book for learning Python as a complete beginner. The philosophy, "learn by automating real tasks," matches our approach of learning statistics by analyzing real data. The full text is available free online at automatetheboringstuff.com.
VanderPlas, J. (2016). Python Data Science Handbook. O'Reilly Media.
Covers NumPy, pandas, matplotlib, and scikit-learn in depth. More advanced than what you need for this chapter, but you'll reference it throughout the course, especially for visualization (Chapter 5) and machine learning concepts (Chapter 26). Available free online at jakevdp.github.io/PythonDataScienceHandbook.
Official Documentation
pandas Documentation (pandas.pydata.org/docs)
The official pandas documentation is surprisingly readable for library docs. The "Getting started" tutorials are well-written, and the API reference is essential when you need to know every option for a function like pd.read_csv() or a method like .groupby(). Bookmark this; you'll visit it often.
Jupyter Documentation (jupyter.org/documentation)
Covers both JupyterLab and classic Jupyter Notebook. Useful for learning keyboard shortcuts, understanding notebook format (.ipynb files), and troubleshooting installation issues.
Google Colab FAQ (research.google.com/colaboratory/faq.html)
Answers common questions about Colab's features, limitations, and integration with Google Drive. Good to skim once so you know what's possible.
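To give a sense of the kind of options the pandas API reference documents, here is a minimal sketch that uses two parameters of pd.read_csv() (usecols and na_values) and one of .groupby() (as_index). The rainfall figures and column names are invented for illustration and do not come from any real dataset.

```python
import io
import pandas as pd

# Made-up sample data, used only so the example is self-contained.
csv_text = """city,year,rainfall_mm,notes
Lima,2020,16,dry
Lima,2021,missing,dry
Oslo,2020,800,wet
Oslo,2021,760,wet
"""

df = pd.read_csv(
    io.StringIO(csv_text),                    # a file path or URL also works here
    usecols=["city", "year", "rainfall_mm"],  # load only these columns
    na_values=["missing"],                    # treat the text "missing" as NaN
)

# as_index=False keeps "city" as a regular column instead of the index.
# mean() skips missing values by default.
summary = df.groupby("city", as_index=False)["rainfall_mm"].mean()

print(df)
print(summary)
```

Options like these are easy to forget, which is why the API reference is worth bookmarking: each parameter is listed with its default value and a short description.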
Spreadsheet Resources
Google Sheets Function List (support.google.com/docs/table/25273)
The complete reference for every function available in Google Sheets. Searchable by category (math, statistical, text, date, etc.). Useful when you know what calculation you want but can't remember the exact function name.
Chandoo.org (chandoo.org)
The best free resource for learning Excel and Google Sheets for data analysis. Covers pivot tables, VLOOKUP/INDEX-MATCH, conditional formatting, and data visualization. The tutorials assume no prior knowledge.
ExcelJet (exceljet.net)
Clean, concise formula guides with examples. Their "Excel formulas" section is organized by task ("How to count cells with multiple conditions") rather than by function name, making it easy to find what you need.
Practice Datasets
These datasets are excellent for practicing the skills from this chapter. Each can be loaded with pd.read_csv():
Gapminder (gapminder.org/data)
Global health and economic data by country and year. Clean, well-documented, and endlessly interesting. Variables include life expectancy, GDP per capita, population, and more. Great for the progressive project.
Palmer Penguins (available via pip install palmerpenguins or CSV download)
A modern alternative to the classic Iris dataset. Contains measurements (bill length, flipper length, body mass) for three penguin species on three Antarctic islands. Small enough to explore quickly, rich enough to demonstrate filtering and grouping.
Tidy Tuesday (github.com/rfordatascience/tidytuesday)
A weekly data project that publishes a new, clean dataset every Tuesday. Though designed for R users, the CSV files work perfectly with pandas. Datasets range from coffee ratings to astronaut missions to video games. Excellent for building exploration skills on novel data.
FiveThirtyEight Data (data.fivethirtyeight.com)
The datasets behind FiveThirtyEight's data journalism articles. Each comes with context (the article it was used in), making it easy to understand what the variables mean. Topics include politics, sports, economics, and culture.
UCI Machine Learning Repository (archive.ics.uci.edu)
A classic collection of datasets used in research. Many are well-suited for introductory statistics, including the Adult Income dataset, the Wine Quality dataset, and the Heart Disease dataset. Each includes a description of variables and data collection methods.
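As a rough illustration of the load-then-explore workflow these datasets are meant for, the sketch below reads a CSV with pd.read_csv() and then filters and groups it. The rows are invented stand-ins in the style of the Palmer Penguins data (they are not real measurements), and the column names are assumptions. With a downloaded dataset you would pass its file path or URL to pd.read_csv() instead of the in-memory text.

```python
import io
import pandas as pd

# Hypothetical rows, invented for illustration only.
csv_text = """species,island,body_mass_g
Adelie,Torgersen,3700
Adelie,Dream,3900
Gentoo,Biscoe,5000
Gentoo,Biscoe,5400
Chinstrap,Dream,3600
"""

# With a real dataset, replace io.StringIO(csv_text) with a file path or URL.
penguins = pd.read_csv(io.StringIO(csv_text))

# Filtering: keep only the rows where body mass exceeds 3800 grams.
heavy = penguins[penguins["body_mass_g"] > 3800]

# Grouping: mean body mass for each species.
mean_mass = penguins.groupby("species")["body_mass_g"].mean()

print(penguins.shape)
print(heavy)
print(mean_mass)
```

The same three steps (load, filter, group) apply to any of the CSV files listed above; only the path and the column names change.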
What to Explore Next
As you progress through this textbook, here's a roadmap of what to learn next:
| When You Reach... | Learn About... | Resource |
|---|---|---|
| Chapter 5 (Graphs) | matplotlib and seaborn | VanderPlas Ch.4, Kaggle "Data Visualization" course |
| Chapter 7 (Data Wrangling) | Advanced pandas (merge, reshape, missing data) | McKinney Ch.7-8, pandas docs "Reshaping" |
| Chapter 8 (Probability) | scipy.stats basics | SciPy docs "Statistical Functions" |
| Chapter 11 (CLT) | NumPy random sampling | VanderPlas Ch.2, NumPy docs |
| Chapter 22 (Regression) | statsmodels library | statsmodels.org getting started |
You don't need to jump ahead — each chapter will introduce the tools you need when you need them. But if you're eager to explore, these resources will be waiting for you.