Key Takeaways: Communicating with Data: Telling Stories with Numbers

One-Sentence Summary

Effective data communication requires honest visualizations that follow Tufte's principles (maximize data-ink, minimize chartjunk), audience-appropriate writing that reports both statistical significance and effect sizes, reports structured as Introduction, Methods, Results, Discussion, and Limitations, and reproducible analysis practices, because even the most rigorous analysis is worthless if your audience doesn't understand it.

Core Concepts at a Glance

| Concept | Definition | Why It Matters |
|---|---|---|
| Data-ink ratio | Proportion of ink used to display data vs. total ink | Guides you to remove everything that doesn't serve the data |
| Chartjunk | Visual elements that don't convey information (3D effects, gradients, decorations) | Reduces clarity and distracts from the data |
| Small multiples | Series of similarly designed charts for comparison across groups | Leverages human pattern-detection across consistently formatted panels |
| Data storytelling | Translating statistical findings into narrative with context, implications, and recommendations | Bridges the gap between analysis and action |
| Reproducible analysis | Analysis that someone else can reconstruct and arrive at the same results | Ensures scientific integrity and professional credibility |

Tufte's Principles

| Principle | Application |
|---|---|
| Maximize data-ink ratio | Remove unnecessary gridlines, borders, backgrounds, and decorations |
| Eliminate chartjunk | No 3D effects, gradient fills, decorative icons, or drop shadows |
| Use small multiples | Compare groups with side-by-side panels sharing the same axes |
| Show the data | Display individual observations, not just summaries |
| Encourage comparison | Use shared axes and consistent design |
| Serve a clear purpose | Every chart should answer a specific question |
| Integrate text and data | Annotate key features; titles should state findings |
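
As a minimal sketch of the small-multiples principle, the following draws one panel per group on shared axes so the panels can be compared directly. The group names and values are made-up placeholders, not data from the chapter.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical placeholder data: six monthly values for three illustrative groups
months = [1, 2, 3, 4, 5, 6]
groups = {
    "Group A": [10, 12, 13, 15, 14, 17],
    "Group B": [8, 9, 9, 11, 12, 12],
    "Group C": [15, 14, 16, 18, 19, 21],
}

# One panel per group; sharex/sharey keep the scales identical across panels
fig, axes = plt.subplots(1, len(groups), figsize=(9, 3),
                         sharex=True, sharey=True)

for ax, (name, values) in zip(axes, groups.items()):
    ax.plot(months, values, color="steelblue", marker="o")
    ax.set_title(name)
    # Remove non-data ink, in line with the data-ink principle
    ax.spines["top"].set_visible(False)
    ax.spines["right"].set_visible(False)

axes[0].set_ylabel("Value")
fig.tight_layout()
```

Because the y-axis is shared, a difference in height between panels reflects a difference in the data rather than a difference in scale. Call `fig.savefig(...)` to export the figure.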

Misleading Techniques Checklist

| Technique | Problem | Fix |
|---|---|---|
| Truncated axis | Small differences look enormous | Start bar chart y-axis at 0; label breaks |
| Cherry-picked time window | Controls narrative through selective framing | Show longest reasonable time frame; justify window |
| Dual y-axes | Any two variables can be made to look correlated | Use small multiples instead |
| 3D effects | Distorts proportions through perspective | Use flat 2D charts |
| Too many pie slices | Comparison becomes impossible | Switch to sorted bar chart |
| Area/volume distortion | Non-linear scaling exaggerates differences | Scale by area, not diameter; prefer bars |
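
The "scale by area, not diameter" fix comes down to simple arithmetic, sketched here. The function name `radius_for_value` and the two values are illustrative assumptions.

```python
import math

def radius_for_value(value, scale=1.0):
    """Return a circle radius whose AREA is proportional to value."""
    if value < 0:
        raise ValueError("value must be non-negative")
    return math.sqrt(scale * value / math.pi)

small, large = 100, 200  # hypothetical values; the second is twice the first

# Honest encoding: the area doubles when the value doubles
r_small = radius_for_value(small)
r_large = radius_for_value(large)
area_ratio = (math.pi * r_large ** 2) / (math.pi * r_small ** 2)

# Misleading encoding: scaling the radius by the value squares the ratio
misleading_ratio = (large / small) ** 2

print(f"area ratio with area scaling:   {area_ratio:.2f}")
print(f"area ratio with radius scaling: {misleading_ratio:.2f}")
```

With radius scaling, a value that is twice as large is drawn four times as large, which is the distortion the checklist warns about. Even with correct area scaling, readers judge lengths more accurately than areas, which is why bars remain the safer choice.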

Writing Statistical Results

Template Sentences

Confidence Interval:

Technical: "The 95% CI for mean [variable] was ([lower], [upper])."

Plain: "We estimate the average [variable] is between [lower] and [upper]."

t-Test:

Technical: "t([df]) = [value], p = [value], d = [value], 95% CI: ([lower], [upper])."

Plain: "[Group 1] scored [higher/lower] by about [difference]. This is [unlikely] to be chance, and the effect is [small/medium/large]."

Regression:

Technical: "b = [slope], p = [value], $R^2$ = [value]."

Plain: "For every additional [unit of x], [y] tends to [change] by about [slope]. The model explains [R² × 100]% of the variation."
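
To show how the technical t-test template might be filled in from data, here is a hedged sketch using SciPy. It assumes an independent-samples test with equal variances and a pooled-standard-deviation Cohen's d; the function name `report_t_test` and the sample values are hypothetical.

```python
import numpy as np
from scipy import stats

def report_t_test(group1, group2, confidence=0.95):
    """Format an independent-samples t-test (equal variances assumed)."""
    a = np.asarray(group1, dtype=float)
    b = np.asarray(group2, dtype=float)
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2

    result = stats.ttest_ind(a, b, equal_var=True)

    # Pooled standard deviation, used for both Cohen's d and the CI
    pooled_var = ((n1 - 1) * a.var(ddof=1) + (n2 - 1) * b.var(ddof=1)) / df
    pooled_sd = np.sqrt(pooled_var)
    diff = a.mean() - b.mean()
    d = diff / pooled_sd

    # Confidence interval for the difference in means
    se = pooled_sd * np.sqrt(1 / n1 + 1 / n2)
    t_crit = stats.t.ppf(1 - (1 - confidence) / 2, df)
    lower, upper = diff - t_crit * se, diff + t_crit * se

    # Report very small p-values as a bound instead of a rounded zero
    if result.pvalue < 0.001:
        p_text = "p < .001"
    else:
        p_text = f"p = {result.pvalue:.3f}"

    return (f"t({df}) = {result.statistic:.2f}, {p_text}, "
            f"d = {d:.2f}, {confidence:.0%} CI: ({lower:.2f}, {upper:.2f}).")

# Hypothetical placeholder scores for two groups
treatment = [5, 6, 7, 8, 9]
control = [1, 2, 3, 4, 5]
print(report_t_test(treatment, control))
```

The sentence covers the statistic, the p-value, the effect size, and the interval, so the plain-language version can then be written from the same numbers.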

The "So What?" Checklist

Every result needs:

  1. The finding: what happened
  2. The magnitude: how big (effect size)
  3. The uncertainty: how confident (CI)
  4. The implication: so what? (recommendation)

Report Structure

The Five Sections

| Section | Purpose | Key Content |
|---|---|---|
| Introduction | The "Why" | Research question, context, hypothesis |
| Methods | The "How" | Data source, sample size, analysis methods, cleaning decisions |
| Results | The "What" | Findings with test statistics, effect sizes, CIs, and visualizations |
| Discussion | The "So What" | Interpretation, practical significance, alternative explanations |
| Limitations | The "But" | Sampling, measurement, confounding, generalizability |

Executive Summary Template

  1. What did we study? (One sentence)
  2. What did we find? (One or two sentences)
  3. Why does it matter? (One sentence)
  4. What should we do? (One sentence)

Presenting Uncertainty Honestly

| Tool | When to Use |
|---|---|
| Error bars | Bar charts comparing group means |
| Confidence bands | Regression lines and trend lines |
| Hedging language | Text descriptions of findings |
| Exact p-values | Reports (not just "p < .05") |
| Confidence intervals | Always, alongside point estimates |
| Effect sizes | Always, alongside p-values |
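
One way to put error bars on a bar chart of group means is sketched below. It assumes t-based 95% confidence intervals for each mean; the helper `mean_ci_halfwidth` and the sample values are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats

def mean_ci_halfwidth(sample, confidence=0.95):
    """Return (mean, half-width) of a t-based confidence interval for the mean."""
    x = np.asarray(sample, dtype=float)
    n = len(x)
    se = x.std(ddof=1) / np.sqrt(n)
    t_crit = stats.t.ppf(1 - (1 - confidence) / 2, df=n - 1)
    return x.mean(), t_crit * se

# Hypothetical placeholder samples for two groups
samples = {
    "Control": [52, 48, 55, 50, 47, 53],
    "Treatment": [58, 61, 55, 63, 59, 60],
}

means, halfwidths = zip(*(mean_ci_halfwidth(v) for v in samples.values()))

fig, ax = plt.subplots(figsize=(5, 4))
# yerr draws a symmetric error bar of +/- half-width around each mean
ax.bar(list(samples), means, yerr=halfwidths, capsize=6,
       color="steelblue", edgecolor="none")
ax.set_ylabel("Mean score")
ax.set_title("Treatment Group Scored Higher (bars show 95% CIs)")
ax.set_ylim(bottom=0)  # bar charts start at zero
fig.tight_layout()
```

Stating in the title or caption what the bars represent matters, because readers cannot tell a confidence interval from a standard deviation or standard error by looking.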

Hedging Language Guide

| Evidence Strength | Language |
|---|---|
| Very strong (p < .001, large effect) | "The data clearly shows..." |
| Good (p < .05, medium effect) | "The data suggests..." |
| Suggestive (p = .05–.10) | "There are hints, but further data is needed..." |
| No evidence (p > .10) | "We found no evidence that..." (NOT "We proved no effect") |
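
The guide above can be read as a simple lookup, sketched here. This is a deliberate simplification: it keys on the p-value alone, whereas the guide also considers effect size, so treat the returned phrase as a starting point. The function name `hedging_phrase` is hypothetical.

```python
def hedging_phrase(p_value):
    """Map a p-value to the opening phrase suggested in the hedging guide."""
    if not 0 <= p_value <= 1:
        raise ValueError("p_value must be between 0 and 1")
    if p_value < 0.001:
        return "The data clearly shows..."
    if p_value < 0.05:
        return "The data suggests..."
    if p_value <= 0.10:
        return "There are hints, but further data is needed..."
    return "We found no evidence that..."

# Hypothetical p-values, one per row of the guide
for p in (0.0004, 0.03, 0.06, 0.40):
    print(f"p = {p}: {hedging_phrase(p)}")
```

A very small p-value paired with a tiny effect still calls for softer language than the first phrase, which is why the effect size has to be checked separately.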

Accessibility Principles

| Principle | Implementation |
|---|---|
| Don't rely on color alone | Use shapes, patterns, AND colors |
| Use colorblind-friendly palettes | Viridis, cividis, or Wong's palette |
| Test your charts | View in grayscale |
| Use direct labels | Label data series on the chart, not just in legends |
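
As a rough sketch of the grayscale test, the code below lists Wong's palette as hex codes and flags color pairs that would be hard to tell apart once converted to gray. The brightness formula uses the Rec. 601 luma weights, and the `min_gap` threshold of 25 is an arbitrary assumption. This is a proxy for a grayscale printout, not a simulation of color vision deficiency.

```python
# Wong's colorblind-friendly palette as hex codes
WONG_PALETTE = {
    "black": "#000000",
    "orange": "#E69F00",
    "sky blue": "#56B4E9",
    "bluish green": "#009E73",
    "yellow": "#F0E442",
    "blue": "#0072B2",
    "vermillion": "#D55E00",
    "reddish purple": "#CC79A7",
}

def gray_level(hex_color):
    """Approximate brightness (0-255) using Rec. 601 luma weights."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.299 * r + 0.587 * g + 0.114 * b

def too_similar_in_gray(colors, min_gap=25):
    """Return pairs of color names whose gray levels differ by less than min_gap."""
    names = list(colors)
    return [(a, b)
            for i, a in enumerate(names)
            for b in names[i + 1:]
            if abs(gray_level(colors[a]) - gray_level(colors[b])) < min_gap]

# A classic red/green pairing collapses to nearly the same gray
print(too_similar_in_gray({"red": "#FF0000", "green": "#008000"}))

# Checking a three-color subset of the palette chosen for a chart
subset = {k: WONG_PALETTE[k] for k in ("black", "orange", "blue")}
print(too_similar_in_gray(subset))
```

Any pair the check flags is a candidate for a different marker shape, line style, or direct label, so the chart does not rely on color alone.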

Reproducibility Checklist

| Element | What to Do |
|---|---|
| Raw data | Save the original, unmodified dataset |
| Cleaning log | Document every step (deletions, transformations, imputations) |
| Code | Write all analysis in scripts or notebooks, with no manual editing |
| Random seeds | Set np.random.seed() for any simulation |
| Library versions | Record version numbers of all packages |
| Comments | Explain why you made each decision |
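
The seed and version items can be sketched in a few lines. Note one swap: this example uses NumPy's `np.random.default_rng(seed)` generator rather than the `np.random.seed()` call named in the checklist; both make a simulation repeatable. The helper names and the package list are illustrative assumptions.

```python
import platform
from importlib import metadata

import numpy as np

SEED = 42  # arbitrary; what matters is that the value is recorded

def simulate_means(seed, n_sims=1000, n=30):
    """Simulate n_sims sample means from a standard normal, reproducibly."""
    rng = np.random.default_rng(seed)
    return rng.normal(size=(n_sims, n)).mean(axis=1)

def environment_record(packages):
    """Collect Python and package versions to include with the analysis."""
    record = {"python": platform.python_version()}
    for name in packages:
        try:
            record[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            record[name] = "not installed"
    return record

# Two runs with the same seed give identical results
first = simulate_means(SEED)
second = simulate_means(SEED)
print("identical runs:", np.array_equal(first, second))

print(environment_record(["numpy", "scipy", "matplotlib", "seaborn"]))
```

Saving the printed record next to the results lets a reader rebuild the same environment before rerunning the scripts.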

Key Python Code

Professional Chart Template

import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

# Set global style
sns.set_style("whitegrid")
plt.rcParams.update({
    'font.size': 11,
    'axes.titlesize': 14,
    'figure.dpi': 150,
    'savefig.dpi': 300
})

fig, ax = plt.subplots(figsize=(8, 5))

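# Plot data (placeholder values for illustration; substitute your own)
categories = ['Before', 'After']
values = [100, 112]
ax.bar(categories, values, color='steelblue', edgecolor='none')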

# Clean styling
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
ax.spines['left'].set_color('#CCCCCC')
ax.spines['bottom'].set_color('#CCCCCC')
ax.grid(axis='y', alpha=0.3)

# Descriptive title (states the finding)
ax.set_title('Monthly Sales Increased 12% After Campaign',
             color='#333333')
ax.set_ylabel('Sales ($K)', color='#555555')

# Data labels
for bar, val in zip(ax.patches, values):
    ax.text(bar.get_x() + bar.get_width() / 2,
            bar.get_height() + 1, f'${val}K',
            ha='center', fontsize=10)

plt.tight_layout()
plt.savefig('chart.png', dpi=300, bbox_inches='tight')

Annotation Example

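# Add annotation to highlight key data point.
# Assumes `ax` from the chart template; x_val, y_val, x_offset, y_offset,
# target, and x_pos are placeholders to replace with your own values.
ax.annotate('Peak: 380 visits',
            xy=(x_val, y_val),
            xytext=(x_offset, y_offset),
            textcoords='offset points',  # treat xytext as an offset from xy
            arrowprops=dict(arrowstyle='->',
                           color='#E74C3C'),
            fontsize=10, color='#E74C3C',
            fontweight='bold')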

# Add reference line
ax.axhline(y=target, color='gray', linestyle='--',
           linewidth=1, alpha=0.5)
ax.text(x_pos, target - 3, 'Target: 120 min',
        fontsize=9, color='gray')

Excel Chart Formatting

| Task | How |
|---|---|
| Remove legend (one series) | Click legend → Delete |
| Lighten gridlines | Format → Color: light gray |
| Remove chart border | Format → No border |
| Descriptive title | Replace "Chart Title" with finding |
| Start y-axis at 0 | Format Axis → Minimum = 0 |
| Single color | Format bars → one muted color |
| Add data labels | Right-click → Add Data Labels |

Common Mistakes

| Mistake | Correction |
|---|---|
| Reporting only p-values | Always include effect sizes and CIs |
| "We proved the treatment works" | "The data provides strong evidence of an effect" |
| Charts without titles or with generic titles | State the finding in the title |
| No limitations section | Every analysis has limitations, so state them |
| Manual data editing without documentation | Script all analysis steps for reproducibility |
| Red-green color coding | Use colorblind-friendly palettes |
| "p = .06 means no effect" | "Evidence was suggestive but didn't reach conventional significance" |
| Same writing style for all audiences | Adapt detail and language to the audience |

Connections

| Connection | Details |
|---|---|
| Ch.5 (Graph types) | Graph types from Ch.5 are now polished with design principles and accessibility |
| Ch.7 (Reproducibility) | Cleaning logs from Ch.7 become the Methods section of your report |
| Ch.13 (p-values) | p-value communication is one of the hardest challenges; now you have templates |
| Ch.17 (Effect sizes) | Always report effect sizes alongside p-values; the rule from Ch.17 becomes a reporting standard |
| Ch.20 (Decomposing variability) | $R^2$ is one of the most intuitive numbers to communicate: "explains X% of the variation" |
| Ch.22 (Regression) | Regression results need careful communication: slope interpretation and $R^2$ |
| Ch.26 (Critical consumer) | The misleading techniques you learned to avoid as a producer, you'll learn to detect as a consumer |
| Ch.27 (Ethical data practice) | Honest visualization is ethical practice: the line between design and deception |