Quiz: Communicating with Data: Telling Stories with Numbers

Test your understanding of data visualization principles, misleading graph techniques, writing statistical results, report structure, reproducibility, and ethical data communication. Try to answer each question before revealing the answer.


1. Tufte's "data-ink ratio" measures:

(a) The amount of ink used in a chart (b) The proportion of ink that displays data versus total ink used (c) The number of data points per square inch (d) The contrast ratio between data and background

Answer **(b) The proportion of ink that displays data versus total ink used.** The data-ink ratio = data ink / total ink. Tufte's principle is to maximize this ratio by removing every element that doesn't encode data — gridlines, borders, background fills, decorative elements, and redundant labels. The goal is to make the data the visual focus, not the decoration.

2. Which of the following is an example of "chartjunk"?

(a) Axis labels with units (b) A 3D drop shadow on a 2D bar chart (c) A regression line on a scatterplot (d) Data labels on bars

Answer **(b) A 3D drop shadow on a 2D bar chart.** Chartjunk is any visual element that doesn't convey information. A 3D drop shadow adds nothing — it doesn't encode data, it doesn't aid comparison, and it can actually distort the perceived height of bars. Axis labels (a), regression lines (c), and data labels (d) all serve informational purposes and are not chartjunk.

3. A bar chart showing quarterly revenue uses a y-axis that starts at $90 million instead of $0. This technique:

(a) Is always acceptable because it saves space (b) Can make small differences appear dramatically larger (c) Is only misleading if the data is fabricated (d) Is recommended by Tufte for all bar charts

Answer **(b) Can make small differences appear dramatically larger.** This is the truncated axis technique. By starting at $90M instead of $0, bars for $95M and $100M are drawn with heights of 5 and 10 units, so the second bar looks twice as tall as the first even though the values differ by only about 5%. For bar charts, the y-axis should generally start at zero because bar length is the visual encoding — a bar twice as tall should represent a value twice as large. For line charts, some truncation is acceptable since you're comparing changes rather than absolute magnitudes.
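The distortion is easy to quantify: on screen, a bar's height is the value minus the axis start, so the apparent ratio between two bars depends entirely on where the axis begins. A small sketch using the $95M/$100M figures from the answer:

```python
# Apparent bar-height ratio under a truncated y-axis.
# On screen, a bar's height is (value - axis_start), so the visual
# ratio between two bars depends on where the axis starts.

def visual_ratio(a, b, axis_start=0):
    """Ratio of apparent bar heights when the y-axis starts at axis_start."""
    return (b - axis_start) / (a - axis_start)

q1, q2 = 95, 100  # quarterly revenue in $M: a ~5% real difference

print(visual_ratio(q1, q2, axis_start=0))   # ~1.05 (honest)
print(visual_ratio(q1, q2, axis_start=90))  # 2.0 (looks like double)
```

Starting the axis at $90M turns a 5% difference into a bar that appears twice as tall.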

4. Small multiples are effective because they:

(a) Allow the viewer to compare patterns across groups using a consistent visual format (b) Save space by making charts smaller (c) Are more colorful than single charts (d) Eliminate the need for axis labels

Answer **(a) Allow the viewer to compare patterns across groups using a consistent visual format.** Small multiples are a series of small charts with identical axes and design, each showing a different subset of the data (e.g., different regions, time periods, or categories). Because the visual format is consistent, the viewer's eye can quickly spot differences in patterns. This is far more effective than overlaying many groups on a single busy chart.
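A minimal matplotlib sketch of the idea (the region names and values are invented for illustration): `sharex`/`sharey` force every panel onto the same scale, which is what makes cross-panel comparison honest.

```python
# Small multiples: one subplot per group, identical axes.
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt

data = {
    "North": [3, 4, 5, 6],
    "South": [5, 5, 4, 3],
    "East":  [2, 4, 4, 5],
    "West":  [6, 5, 5, 6],
}
quarters = [1, 2, 3, 4]

# sharex/sharey give every panel the same scale, so shapes are comparable
fig, axes = plt.subplots(1, len(data), figsize=(10, 2.5),
                         sharex=True, sharey=True)
for ax, (region, values) in zip(axes, data.items()):
    ax.plot(quarters, values, marker="o")
    ax.set_title(region)
axes[0].set_ylabel("Sales")
fig.savefig("small_multiples.png")
```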

5. What is the main problem with dual y-axis charts?

(a) They use too much ink (b) They're too complicated for non-technical audiences (c) The creator can manipulate the apparent relationship between variables by adjusting the axis scales (d) They can only show two variables

Answer **(c) The creator can manipulate the apparent relationship between variables by adjusting the axis scales.** With dual y-axes, you can make any two lines appear to track each other perfectly — or appear completely unrelated — simply by adjusting the scale of one axis. This gives the chart creator enormous power to fabricate visual correlations that may not exist in the data. The better alternative is side-by-side charts (small multiples) with clearly labeled, independent axes.

6. Which chart type is generally better for comparing 10 categories?

(a) A pie chart (b) A sorted horizontal bar chart (c) A 3D bar chart (d) A dual-axis line chart

Answer **(b) A sorted horizontal bar chart.** Pie charts become unreadable with more than about 5 slices because human perception of angles and areas is poor — we're much better at comparing lengths. A sorted horizontal bar chart allows instant ranking and precise comparison. Research by Cleveland and McGill (1984) confirmed that humans judge position along a common scale (bar charts) more accurately than they judge angles (pie charts).
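A quick matplotlib sketch (the company names and revenues are made up): sorting before plotting is what turns the chart into an instant ranking.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt

revenue = {"Alpha": 30, "Beta": 22, "Gamma": 12, "Delta": 18,
           "Epsilon": 8, "Zeta": 27}

# Sort ascending so the largest bar ends up at the top of the chart
ordered = sorted(revenue.items(), key=lambda kv: kv[1])
labels = [k for k, _ in ordered]
values = [v for _, v in ordered]

fig, ax = plt.subplots()
ax.barh(labels, values)
ax.set_xlabel("Revenue ($M)")
fig.savefig("sorted_bars.png")
```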

7. When writing statistical results for a non-technical audience, you should:

(a) Include all test statistics and degrees of freedom (b) Report exact p-values to four decimal places (c) Lead with the practical meaning and use plain language (d) Avoid mentioning uncertainty because it confuses people

Answer **(c) Lead with the practical meaning and use plain language.** Non-technical audiences want to know: What did you find? How big is the effect? What should we do? They don't need test statistics, degrees of freedom, or exact p-values — those belong in the technical version. But you should absolutely mention uncertainty — just in plain language ("we're fairly confident" rather than "95% CI: [3.2, 7.8]"). Honest communication includes honest uncertainty.

8. A colleague says: "The p-value was 0.03, so there's a 97% chance our intervention works." What's wrong with this statement?

(a) Nothing — this is the correct interpretation (b) The p-value is the probability of the data given the null hypothesis, not the probability that the intervention works (c) They should have said 96% instead of 97% (d) P-values can't be interpreted without the sample size

Answer **(b) The p-value is the probability of the data given the null hypothesis, not the probability that the intervention works.** This is perhaps the most common p-value misinterpretation (Chapter 13). The p-value is P(data as extreme | H₀ is true). A p-value of 0.03 means: "If the intervention truly had no effect, we'd see results this extreme only 3% of the time." It does NOT mean "there's a 97% chance the intervention works." Making this inferential leap requires Bayesian reasoning (Chapter 9) and a prior probability of effectiveness.
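The gap between the two interpretations can be made concrete with a toy Bayesian calculation. The prior and power below are assumptions chosen purely for illustration, and "significant" stands in for "p < 0.05" rather than the exact p = 0.03:

```python
# Why p = 0.03 is not "97% chance it works": a toy application of
# Bayes' rule. The prior and power values are illustrative assumptions.

prior = 0.10   # assumed prior probability the intervention works
power = 0.80   # assumed P(significant result | it works)
alpha = 0.05   # P(significant result | it does NOT work)

# Bayes' rule: P(works | significant result)
p_sig = power * prior + alpha * (1 - prior)
posterior = power * prior / p_sig
print(round(posterior, 2))  # prints 0.64
```

Even after a significant result, the posterior probability that the intervention works is about 0.64 under these assumptions, nowhere near 0.97, and it shifts substantially as the assumed prior changes.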

9. An effective executive summary should:

(a) Include all the statistical details so the executive can evaluate the methods (b) Be at least two pages long to be thorough (c) Answer: What did we study? What did we find? Why does it matter? What should we do? (d) Avoid making recommendations because that's not the analyst's role

Answer **(c) Answer: What did we study? What did we find? Why does it matter? What should we do?** The executive summary is designed for readers who may read only this section. It should be concise (half a page or less), state the key finding, explain its significance, and provide an actionable recommendation. Including full statistical details defeats the purpose — those belong in the Methods and Results sections. And yes, analysts should make recommendations — that's the "so what?" that executives need.

10. The five standard sections of a data analysis report, in order, are:

(a) Abstract, Methods, Results, References, Appendix (b) Introduction, Methods, Results, Discussion, Limitations (c) Summary, Data, Analysis, Charts, Conclusion (d) Background, Hypothesis, Experiment, Findings, Next Steps

Answer **(b) Introduction, Methods, Results, Discussion, Limitations.** This structure — sometimes called IMRaD (Introduction, Methods, Results, and Discussion) with a Limitations section — is the standard for scientific reports, journal articles, and professional data analysis. The Introduction explains the "why," Methods explain the "how," Results present the "what," Discussion interprets the "so what," and Limitations acknowledge the "but." Some formats merge Discussion and Limitations, or add an executive summary at the top.

11. Which of the following best communicates uncertainty in a visualization?

(a) Making the chart title include a question mark (b) Using a dotted line instead of a solid line (c) Adding error bars or confidence bands to the chart (d) Including a footnote that says "Results may vary"

Answer **(c) Adding error bars or confidence bands to the chart.** Error bars (for bar charts) and confidence bands (for regression lines) are the standard visual tools for showing uncertainty. They communicate the range of plausible values directly in the visualization. A question mark in the title (a) is vague, a dotted line (b) is just a style choice, and a footnote (d) is easily overlooked. Visual uncertainty should be part of the data display, not an afterthought.
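Both tools are a single call each in matplotlib (the means, errors, and fitted values below are illustrative): `yerr=` draws error bars on a bar chart, and `fill_between` draws a shaded confidence band.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))

# Error bars on a bar chart
groups = ["Control", "Treatment"]
means = [4.2, 5.1]   # illustrative group means
errs = [0.3, 0.4]    # e.g. one standard error per group
ax1.bar(groups, means, yerr=errs, capsize=4)
ax1.set_ylabel("Outcome")

# Confidence band around a fitted line
x = [0, 1, 2, 3, 4]
fit = [1.0 + 0.5 * xi for xi in x]   # hypothetical fitted values
lo = [f - 0.4 for f in fit]          # lower band edge
hi = [f + 0.4 for f in fit]          # upper band edge
ax2.plot(x, fit)
ax2.fill_between(x, lo, hi, alpha=0.3)  # shaded confidence band
fig.savefig("uncertainty.png")
```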

12. When presenting regression results, which of the following should always be reported alongside the p-value?

(a) The sample mean (b) The effect size and confidence interval (c) The programming language used (d) The number of charts in the report

Answer **(b) The effect size and confidence interval.** The p-value tells you whether an effect is statistically significant, but it says nothing about how *big* the effect is or how *precisely* you've estimated it. The effect size (e.g., Cohen's d, $R^2$) tells you the magnitude, and the confidence interval gives you the range of plausible values. As Chapter 17 emphasized, a large sample can make any tiny effect "significant" — the effect size and CI are what tell you whether the finding matters in practice.
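A sketch of the computation with made-up group data, using only the standard library. The 1.96 multiplier is a normal approximation; with samples this small, a t critical value would be more appropriate:

```python
import statistics as st

# Illustrative outcome scores for two groups (invented data)
control = [4.1, 3.8, 4.5, 4.0, 4.3, 3.9, 4.2, 4.4]
treated = [4.9, 5.2, 4.7, 5.0, 5.3, 4.8, 5.1, 4.6]

diff = st.mean(treated) - st.mean(control)

# Pooled standard deviation (equal group sizes) and Cohen's d
pooled_sd = ((st.stdev(control) ** 2 + st.stdev(treated) ** 2) / 2) ** 0.5
d = diff / pooled_sd

# 95% CI for the mean difference (normal approximation via 1.96)
se = (st.stdev(control) ** 2 / len(control)
      + st.stdev(treated) ** 2 / len(treated)) ** 0.5
ci = (diff - 1.96 * se, diff + 1.96 * se)

print(f"difference = {diff:.2f}, d = {d:.2f}, "
      f"95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```

Reporting all three numbers tells the reader the direction, the magnitude, and the precision of the estimate, not just whether it cleared a significance threshold.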

13. Which of the following is the best title for a data chart?

(a) "Figure 3" (b) "Sales Data" (c) "Monthly Sales Increased 12% After the March Campaign" (d) "An Analysis of Sales Performance Metrics Over Time"

Answer **(c) "Monthly Sales Increased 12% After the March Campaign."** A good chart title states the *finding*, not just the variables. "Figure 3" (a) is a label, not a title. "Sales Data" (b) describes what the chart shows but not what the viewer should learn. "An Analysis of..." (d) is jargon. Option (c) tells the viewer exactly what the chart demonstrates — they know the finding before they even look at the data. This is the annotation principle: guide the viewer's attention to the insight.

14. Reproducibility in data analysis means:

(a) Getting the same result every time you run the same code (b) Being able to replicate the analysis using the same data, code, and methods (c) Finding the same patterns in different datasets (d) Using the same software as other researchers

Answer **(b) Being able to replicate the analysis using the same data, code, and methods.** Reproducibility means that someone else (or your future self) can take your code, your data, and your documentation and arrive at the exact same results. This requires: saving raw data, documenting cleaning steps, writing all analysis in code (not manual steps), setting random seeds, and recording library versions. Option (a) is a consequence of reproducibility, not the definition. Options (c) and (d) describe different concepts (generalizability and software standardization).
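A minimal reproducibility header for an analysis script might look like this (the seed value and the "provenance" dict are conventions chosen for illustration, not a standard API):

```python
# Fix random seeds and record the environment so the run can be
# repeated exactly by someone else (or by your future self).
import platform
import random
import sys

SEED = 42
random.seed(SEED)  # same seed -> same "random" draws every run
# numpy users would add: rng = np.random.default_rng(SEED)

sample = [random.gauss(0, 1) for _ in range(5)]

# Record the environment alongside the results
provenance = {
    "seed": SEED,
    "python": sys.version.split()[0],
    "platform": platform.platform(),
}
print(provenance)
```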

15. For a colorblind-accessible chart, you should:

(a) Only use shades of gray (b) Use color plus another visual encoding (shapes, patterns, or labels) (c) Avoid all color and use only text (d) Use the default matplotlib color palette

Answer **(b) Use color plus another visual encoding (shapes, patterns, or labels).** The key principle is to never rely on color *alone* to distinguish categories. About 8% of men have some form of color vision deficiency. By combining color with patterns (hatching), shapes (different markers), or direct labels, you ensure the chart is readable regardless of color perception. Shades of gray (a) can work but are limiting. Avoiding all color (c) is unnecessarily restrictive. Default palettes (d) often include red-green combinations that are problematic for colorblind viewers.
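A matplotlib sketch of "color plus another encoding": each series gets a distinct marker, line style, and a direct label at the end of the line, so the chart stays readable even if all the colors look identical. The data is invented.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt

series = {
    "A": ([1, 2, 3], [2, 3, 5]),
    "B": ([1, 2, 3], [4, 3, 2]),
    "C": ([1, 2, 3], [1, 4, 4]),
}
markers = ["o", "s", "^"]      # shape distinguishes groups
linestyles = ["-", "--", ":"]  # so does line style

fig, ax = plt.subplots()
for (name, (x, y)), m, ls in zip(series.items(), markers, linestyles):
    ax.plot(x, y, marker=m, linestyle=ls, label=name)
    # Direct label at the end of each line: no legend lookup needed
    ax.annotate(name, (x[-1], y[-1]), xytext=(5, 0),
                textcoords="offset points")
ax.legend()
fig.savefig("accessible.png")
```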

16. "We found no evidence that the treatment works" means:

(a) The treatment definitely doesn't work (b) We proved the null hypothesis (c) Our data was insufficient to detect an effect, if one exists (d) The study was a failure

Answer **(c) Our data was insufficient to detect an effect, if one exists.** "No evidence of an effect" is very different from "evidence of no effect." When we fail to reject the null hypothesis, we haven't proved anything — we simply didn't find sufficient evidence to conclude there IS an effect. The effect might be real but too small for our sample to detect (a power issue from Chapter 17). This distinction is crucial for honest communication: saying "we found no evidence" is accurate; saying "we proved it doesn't work" is not.

17. When giving an oral presentation of statistical findings, you should:

(a) Read every p-value and test statistic aloud (b) Lead with your recommendation and key finding (c) Present methodology before results to build suspense (d) Include as much text on each slide as possible so the audience can follow along

Answer **(b) Lead with your recommendation and key finding.** In oral presentations, you compete for attention from the first second. Leading with the punchline — what you found and what you recommend — hooks the audience and gives them a framework for understanding everything that follows. Reading p-values aloud (a) is meaningless to most audiences. Building suspense with methodology (c) risks losing the audience before the payoff. Text-heavy slides (d) force the audience to choose between reading and listening — they can't do both effectively.

18. Which of the following Limitations section statements is most appropriate?

(a) "There are no limitations to this study." (b) "This study has some limitations, but they don't matter." (c) "Our sample was drawn from a single university, which may limit generalizability to other populations." (d) "The data had some problems, but we ignored them."

Answer **(c) "Our sample was drawn from a single university, which may limit generalizability to other populations."** A good limitation is specific, honest, and connected to how it affects the conclusions. Option (a) is never true — every study has limitations. Option (b) is dismissive and undermines credibility. Option (d) is alarmingly honest about a bad practice. Option (c) identifies a specific limitation (single institution), names the type of threat (limited generalizability), and leaves the reader appropriately cautious without invalidating the entire analysis.

19. A chart uses icons of different-sized money bags to represent company revenues. Company A ($10M) has a bag with height 1 inch. Company B ($20M) has a bag with height 2 inches. This chart is misleading because:

(a) Money bags are not professional (b) The visual area of Company B's bag is approximately 4 times larger, not 2 times (c) Company B's bag should be exactly twice as tall (d) Icons should never be used in charts

Answer **(b) The visual area of Company B's bag is approximately 4 times larger, not 2 times.** When you double the height of a 2D icon, you typically also double the width (to maintain proportions), which quadruples the visual area. The human visual system perceives the area of shapes, not their linear dimensions. So Company B *looks* four times larger than Company A, even though its revenue is only twice as large. This is area/volume distortion — one of the most common (and sneaky) misleading techniques. The fix: use bars (which scale linearly) or scale icons by area, not height.
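The arithmetic behind the distortion fits in a two-line sketch: scaling an icon's height by k while keeping its proportions scales its area by k squared.

```python
# Scaling a 2D icon's height by k (keeping proportions) multiplies
# its area by k**2, and area is what the eye actually compares.

def icon_area_ratio(value_a, value_b):
    """Apparent area ratio when icon height is scaled with the value."""
    k = value_b / value_a  # linear scale factor
    return k ** 2          # area grows with the square

print(icon_area_ratio(10, 20))  # revenue doubled, but prints 4.0
```

So the $20M bag occupies four times the ink of the $10M bag for only twice the revenue.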

20. The single most important principle of data communication is:

(a) Use the most advanced statistical methods available (b) Make the chart as visually impressive as possible (c) Serve the audience's understanding honestly (d) Include every piece of data you have

Answer **(c) Serve the audience's understanding honestly.** Every principle in this chapter — Tufte's data-ink ratio, avoiding misleading techniques, writing for different audiences, presenting uncertainty, accessibility, reproducibility — flows from this single idea. Your job as a data communicator is not to impress, not to persuade, and not to overwhelm. It's to help your audience understand what the data says, what it doesn't say, and what they should do about it — honestly and clearly. Advanced methods (a) are meaningless if unexplained. Visual impressiveness (b) is chartjunk in disguise. Including everything (d) obscures the signal with noise. Honest understanding is the goal.