Further Reading: Types of Data and the Language of Statistics

Recommended Books

For Deepening Your Understanding of Data Types

Wheelan, C. (2013). Naked Statistics: Stripping the Dread from the Data. W. W. Norton. Chapter 2 ("Descriptive Statistics") gives an accessible treatment of variable types and how they determine which summaries make sense. Wheelan's conversational style complements this chapter, approaching the same ideas with different examples and angles. If you found the categorical-vs-numerical distinction interesting, his treatment of misleading averages is excellent.

Hand, D. J. (2020). Dark Data: Why What You Don't Know Matters. Princeton University Press. A fascinating exploration of how the data we don't have (missing values, unmeasured variables, invisible categories) shapes our understanding of the world. It bears directly on this chapter's theme that classification decisions determine which stories data can and cannot tell. Chapters 3-5 are the most relevant.

Cairo, A. (2019). How Charts Lie: Getting Smarter About Visual Information. W. W. Norton. Although the book focuses on visualization (a Chapter 5 topic), Cairo's discussion of how a chart that doesn't fit the data type leads to misleading conclusions connects directly to the classification skills you learned in this chapter. His examples of how the media frequently misrepresent nominal and ordinal data are eye-opening.

For Context on Data Classification and Society

D'Ignazio, C., & Klein, L. F. (2020). Data Feminism. MIT Press. A powerful analysis of how data classification systems — especially categories for race, gender, and identity — reflect and reinforce power structures. Chapters 4 ("What Gets Counted Counts") and 6 ("The Numbers Don't Speak for Themselves") connect directly to this chapter's discussion of how categories shape stories. Available as a free open-access book at data-feminism.mitpress.mit.edu.

Bowker, G. C., & Star, S. L. (2000). Sorting Things Out: Classification and Its Consequences. MIT Press. An academic but accessible exploration of how classification systems (from medical diagnoses to racial categories) are socially constructed and have real-world consequences. If the case study about racial categories in EHR systems interested you, this book provides the full historical and philosophical context.

For Reference

Agresti, A., Franklin, C., & Klingenberg, B. (2024). Statistics: The Art and Science of Learning from Data (5th ed.). Pearson. Chapter 2 provides a more formal treatment of variable types, levels of measurement, and data collection design. It is useful for additional examples and a more traditional presentation of the same material.

Triola, M. F. (2021). Elementary Statistics (14th ed.). Pearson. Section 1-2 covers data types with many worked examples. Good for additional practice classifying variables across different fields.

Online Resources

UCI Machine Learning Repository (archive.ics.uci.edu). A curated collection of over 600 datasets used in research. Each dataset includes a description and a variable list, which together amount to a data dictionary. Practice classifying variables by browsing a few datasets; those tagged "Classification" or "Multivariate" tend to have the most interesting mixes of variable types.

Data.gov (data.gov). The U.S. government's open data portal, with over 300,000 datasets. Each dataset includes metadata about its variables, which makes the site an excellent place to practice reading real-world data dictionaries and classifying variables. Try the "Health" or "Education" categories for datasets with clear connections to this chapter.

Kaggle Datasets (kaggle.com/datasets). A community-driven platform with thousands of real-world datasets. Many include data dictionaries and discussion forums where analysts debate how to classify variables. Search for "survey data" or "healthcare data" to find datasets with interesting mixes of categorical and numerical variables.

Seeing Theory (seeing-theory.brown.edu). Although the site focuses on probability (a Chapter 8 topic), its interactive visualizations of different data types and distributions provide excellent visual reinforcement of the concepts in this chapter. Start with Chapter 1 (Basic Probability); the Compound Probability chapter that follows shows how the outcomes of categorical variables combine.

Articles and Papers

Stevens, S. S. (1946). On the theory of scales of measurement. Science, 103(2684), 677-680. The original paper that introduced the four levels of measurement (nominal, ordinal, interval, ratio). It is short and still readable decades later, and its framework connects directly to Section 2.6. Available through most university library databases.

Jamieson, S. (2004). Likert scales: How to (ab)use them. Medical Education, 38(12), 1217-1218. A concise two-page article on the debate over whether Likert scale data should be treated as ordinal or interval, directly relevant to the gray area discussed in Section 2.3. Jamieson argues accessibly and clearly that Likert responses are ordinal and should be analyzed as such.

Obermeyer, Z., Powers, B., Vogeli, C., & Mullainathan, S. (2019). Dissecting racial bias in an algorithm used to manage the health of populations. Science, 366(6464), 447-453. Referenced in Chapter 1, but even more relevant here. The core issue, using healthcare spending (a continuous numerical variable) as a proxy for health needs, is fundamentally a data classification and measurement problem. Because less money was spent on Black patients than on equally sick White patients, the algorithm judged Black patients to be healthier than they were. This mismatch between what the variable actually measured and what the algorithm's designers assumed it measured produced systematic racial bias.

Podcasts

Not So Standard Deviations (nssdeviations.com). A podcast by data scientists Roger Peng and Hilary Parker discussing real-world data analysis. Episodes frequently touch on the messy reality of variable classification, from defining "active user" at a tech company to dealing with inconsistent categorical codes in medical data.

Data Skeptic (dataskeptic.com). Alternates between mini-episodes explaining statistical concepts and longer interviews with data practitioners. The mini-episodes on "Nominal, Ordinal, Interval, and Ratio" and "Data Types" are directly relevant to this chapter.

Note: All recommendations use the citation honesty system. Tier 1 (verified) sources have full citations. The descriptions represent genuine, well-known resources.