Key Takeaways: Communicating with Data: Telling Stories with Numbers

One-Sentence Summary

Effective data communication requires honest visualizations that follow Tufte's principles (maximize data-ink, minimize chartjunk), audience-appropriate writing that includes both statistical significance and effect sizes, structured reports with Introduction-Methods-Results-Discussion-Limitations, and reproducible analysis practices — because the most rigorous analysis in the world is worthless if your audience doesn't understand it.

Core Concepts at a Glance

Concept	Definition	Why It Matters
Data-ink ratio	Proportion of ink used to display data vs. total ink	Guides you to remove everything that doesn't serve the data
Chartjunk	Visual elements that don't convey information (3D effects, gradients, decorations)	Reduces clarity and distracts from the data
Small multiples	Series of similarly designed charts for comparison across groups	Leverages human pattern-detection across consistently formatted panels
Data storytelling	Translating statistical findings into narrative with context, implications, and recommendations	Bridges the gap between analysis and action
Reproducible analysis	Analysis that someone else can reconstruct and arrive at the same results	Ensures scientific integrity and professional credibility

Tufte's Principles

Principle	Application
Maximize data-ink ratio	Remove unnecessary gridlines, borders, backgrounds, and decorations
Eliminate chartjunk	No 3D effects, gradient fills, decorative icons, or drop shadows
Use small multiples	Compare groups with side-by-side panels sharing the same axes
Show the data	Display individual observations, not just summaries
Encourage comparison	Use shared axes and consistent design
Serve a clear purpose	Every chart should answer a specific question
Integrate text and data	Annotate key features; titles should state findings

Misleading Techniques Checklist

Technique	Problem	Fix
Truncated axis	Small differences look enormous	Start bar chart y-axis at 0; label breaks
Cherry-picked time window	Controls narrative through selective framing	Show longest reasonable time frame; justify window
Dual y-axes	Any two variables can be made to look correlated	Use small multiples instead
3D effects	Distorts proportions through perspective	Use flat 2D charts
Too many pie slices	Comparison becomes impossible	Switch to sorted bar chart
Area/volume distortion	Non-linear scaling exaggerates differences	Scale by area, not diameter; prefer bars
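
The "scale by area, not diameter" fix comes down to a square root. The sketch below uses made-up values and a hypothetical helper, `radius_for_value`, to show how a circle's radius should grow so that its area tracks the value, and how much the naive approach exaggerates.

```python
import math

def radius_for_value(value, reference_value, reference_radius=1.0):
    """Radius such that circle AREA is proportional to value.

    Area grows with the square of the radius, so the radius must grow
    with the square root of the value. (Hypothetical helper name.)
    """
    if value < 0 or reference_value <= 0:
        raise ValueError("value must be non-negative and reference positive")
    return reference_radius * math.sqrt(value / reference_value)

# Illustrative values only: one quantity is 4x the other.
small, large = 10, 40
r_small = radius_for_value(small, small)
r_large = radius_for_value(large, small)

# Correct scaling: the larger circle has 4x the area.
area_ratio = (math.pi * r_large ** 2) / (math.pi * r_small ** 2)

# Naive scaling (radius proportional to value) inflates the area ratio.
naive_area_ratio = (large / small) ** 2

print(f"radius ratio: {r_large / r_small:.1f}, area ratio: {area_ratio:.1f}")
print(f"naive radius scaling gives area ratio: {naive_area_ratio:.1f}")
```

With radius scaled directly, a fourfold difference in the data is drawn as a sixteenfold difference in area, which is the distortion the checklist warns about.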

Writing Statistical Results

Template Sentences

Confidence Interval:

Technical: "The 95% CI for mean [variable] was ([lower], [upper])."

Plain: "We estimate the average [variable] is between [lower] and [upper]."

t-Test:

Technical: "t([df]) = [value], p = [value], d = [value], 95% CI: ([lower], [upper])."

Plain: "[Group 1] scored [higher/lower] by about [difference]. This is [unlikely] to be chance, and the effect is [small/medium/large]."

Regression:

Technical: "b = [slope], p = [value], $R^2$ = [value]."

Plain: "For every additional [unit of x], [y] tends to [change] by about [slope]. The model explains [R² × 100]% of the variation."
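
The regression template can be filled in programmatically so the technical and plain versions never drift apart. This is a minimal sketch: the function names (`format_p`, `regression_sentences`) and the example numbers are assumptions for illustration, not results from any analysis.

```python
def format_p(p_value):
    """Format a p-value exactly, with '< .001' for very small values."""
    if p_value < 0.001:
        return "p < .001"
    return "p = " + f"{p_value:.3f}".lstrip("0")

def regression_sentences(x_unit, y_name, slope, p_value, r_squared):
    """Return (technical, plain) sentences following the regression templates."""
    technical = f"b = {slope:.2f}, {format_p(p_value)}, R^2 = {r_squared:.2f}."
    direction = "increase" if slope > 0 else "decrease"
    plain = (
        f"For every additional {x_unit}, {y_name} tends to {direction} "
        f"by about {abs(slope):.2f}. The model explains "
        f"{r_squared * 100:.0f}% of the variation."
    )
    return technical, plain

# Made-up numbers, for illustration only.
technical, plain = regression_sentences(
    x_unit="hour of study", y_name="exam score",
    slope=2.5, p_value=0.012, r_squared=0.41,
)
print(technical)
print(plain)
```

Generating both sentences from the same inputs keeps the plain-language version honest about the slope and $R^2$ that the technical version reports.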

The "So What?" Checklist

Every result needs:

1. The finding: what happened
2. The magnitude: how big (effect size)
3. The uncertainty: how confident (CI)
4. The implication: so what? (recommendation)

Report Structure

The Five Sections

Section	Purpose	Key Content
Introduction	The "Why"	Research question, context, hypothesis
Methods	The "How"	Data source, sample size, analysis methods, cleaning decisions
Results	The "What"	Findings with test statistics, effect sizes, CIs, and visualizations
Discussion	The "So What"	Interpretation, practical significance, alternative explanations
Limitations	The "But"	Sampling, measurement, confounding, generalizability

Executive Summary Template

What did we study? (One sentence)
What did we find? (One or two sentences)
Why does it matter? (One sentence)
What should we do? (One sentence)

Presenting Uncertainty Honestly

Tool	When to Use
Error bars	Bar charts comparing group means
Confidence bands	Regression lines and trend lines
Hedging language	Text descriptions of findings
Exact p-values	Reports (not just "p < .05")
Confidence intervals	Always, alongside point estimates
Effect sizes	Always, alongside p-values
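
Error bars need a number behind them. The sketch below computes a mean and an approximate 95% confidence interval using only the standard library. It assumes a normal approximation (z close to 1.96); for small samples a t critical value would be more appropriate. The helper name `mean_with_ci` and the scores are made up for illustration.

```python
from statistics import NormalDist, mean, stdev

def mean_with_ci(sample, confidence=0.95):
    """Return (mean, lower, upper) using a normal-approximation CI."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    m = mean(sample)
    se = stdev(sample) / n ** 0.5                    # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + confidence / 2)   # about 1.96 for 95%
    return m, m - z * se, m + z * se

# Made-up scores, for illustration only.
group = [72, 75, 78, 80, 69, 74, 77, 73, 76, 71]
m, lower, upper = mean_with_ci(group)

# Half the interval width is what you would pass as yerr to ax.bar().
half_width = (upper - lower) / 2
print(f"mean = {m:.1f}, 95% CI: ({lower:.1f}, {upper:.1f})")
```

Reporting the interval next to the point estimate covers two rows of the table at once: the error bar on the chart and the confidence interval in the text.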

Hedging Language Guide

Evidence Strength	Language
Very strong (p < .001, large effect)	"The data clearly shows..."
Good (p < .05, medium effect)	"The data suggests..."
Suggestive (p = .05–.10)	"There are hints, but further data is needed..."
No evidence (p > .10)	"We found no evidence that..." (NOT "We proved no effect")
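
The guide above can be encoded as a simple lookup so that report text stays consistent. This sketch is a simplification: it uses the p-value thresholds from the table only and ignores effect size, which the table also considers. The function name `hedged_opening` is hypothetical.

```python
def hedged_opening(p_value):
    """Pick a sentence opener from the Hedging Language Guide by p-value alone."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must be between 0 and 1")
    if p_value < 0.001:
        return "The data clearly shows..."
    if p_value < 0.05:
        return "The data suggests..."
    if p_value <= 0.10:
        return "There are hints, but further data is needed..."
    return "We found no evidence that..."

# Illustrative p-values, one per row of the guide.
for p in (0.0004, 0.03, 0.07, 0.40):
    print(f"p = {p}: {hedged_opening(p)}")
```

A lookup like this is a starting point, not a substitute for judgment: a tiny effect with p < .001 still calls for more cautious wording than the first row suggests.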

Accessibility Principles

Principle	Implementation
Don't rely on color alone	Use shapes, patterns, AND colors
Use colorblind-friendly palettes	Viridis, cividis, or Wong's palette
Test your charts	View in grayscale
Use direct labels	Label data series on the chart, not just in legends
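
The grayscale test can be approximated numerically before you ever render a chart. The sketch below converts the hex codes of Wong's palette to a gray level using the Rec. 601 luma weights and flags pairs that end up close together. The 20-point threshold and the helper names are assumptions chosen for illustration.

```python
from itertools import combinations

# Wong's colorblind-friendly palette (hex codes).
WONG_PALETTE = {
    "black": "#000000",
    "orange": "#E69F00",
    "sky blue": "#56B4E9",
    "bluish green": "#009E73",
    "yellow": "#F0E442",
    "blue": "#0072B2",
    "vermillion": "#D55E00",
    "reddish purple": "#CC79A7",
}

def gray_level(hex_color):
    """Approximate grayscale brightness (0-255) using Rec. 601 luma weights."""
    digits = hex_color.lstrip("#")
    r, g, b = (int(digits[i:i + 2], 16) for i in (0, 2, 4))
    return 0.299 * r + 0.587 * g + 0.114 * b

def hard_to_tell_apart(palette, min_gap=20):
    """Return name pairs whose gray levels differ by less than min_gap."""
    levels = {name: gray_level(code) for name, code in palette.items()}
    return [
        (a, b)
        for a, b in combinations(levels, 2)
        if abs(levels[a] - levels[b]) < min_gap
    ]

for a, b in hard_to_tell_apart(WONG_PALETTE):
    print(f"{a} and {b} look similar in grayscale")
```

Even a colorblind-friendly palette can contain pairs that collapse to similar grays, which is why the first principle asks for shapes or patterns in addition to color.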

Reproducibility Checklist

Element	What to Do
Raw data	Save the original, unmodified dataset
Cleaning log	Document every step (deletions, transformations, imputations)
Code	Write all analysis in scripts or notebooks — no manual editing
Random seeds	Set `np.random.seed()` for any simulation
Library versions	Record version numbers of all packages
Comments	Explain why you made each decision
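
Two rows of the checklist, random seeds and library versions, are easy to automate. The sketch below uses only the standard library; `record_environment` is a hypothetical helper, the seed value is arbitrary, and the package names passed in are examples. For NumPy-based simulations, `np.random.seed(SEED)` plays the same role as `random.seed(SEED)` here.

```python
import platform
import random
from importlib import metadata

def record_environment(packages):
    """Map each package name to its installed version (or 'not installed')."""
    versions = {"python": platform.python_version()}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions

SEED = 42  # arbitrary choice; what matters is that it is fixed and recorded

# Same seed, same draws: the simulation can be reconstructed exactly.
random.seed(SEED)
first_run = [random.random() for _ in range(3)]
random.seed(SEED)
second_run = [random.random() for _ in range(3)]

print(record_environment(["numpy", "matplotlib", "seaborn"]))
print("Reproducible draws:", first_run == second_run)
```

Saving the printed dictionary alongside your results gives a reader the version numbers they need to rerun the analysis.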

Key Python Code

Professional Chart Template

import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

# Set global style
sns.set_style("whitegrid")
plt.rcParams.update({
    'font.size': 11,
    'axes.titlesize': 14,
    'figure.dpi': 150,
    'savefig.dpi': 300
})

fig, ax = plt.subplots(figsize=(8, 5))

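# Plot data (placeholder values for illustration; replace with your own)
categories = ['Before campaign', 'After campaign']
values = [100, 112]
ax.bar(categories, values, color='steelblue', edgecolor='none')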

# Clean styling
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
ax.spines['left'].set_color('#CCCCCC')
ax.spines['bottom'].set_color('#CCCCCC')
ax.grid(axis='y', alpha=0.3)

# Descriptive title (states the finding)
ax.set_title('Monthly Sales Increased 12% After Campaign',
             color='#333333')
ax.set_ylabel('Sales ($K)', color='#555555')

# Data labels
for bar, val in zip(ax.patches, values):
    ax.text(bar.get_x() + bar.get_width() / 2,
            bar.get_height() + 1, f'${val}K',
            ha='center', fontsize=10)

plt.tight_layout()
plt.savefig('chart.png', dpi=300, bbox_inches='tight')

Annotation Example

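# Add annotation to highlight key data point
# (assumes an existing `ax`; x_val, y_val, x_offset, y_offset are placeholders)
ax.annotate('Peak: 380 visits',
            xy=(x_val, y_val),
            xytext=(x_offset, y_offset),
            textcoords='offset points',  # treat xytext as an offset from xy
            arrowprops=dict(arrowstyle='->',
                           color='#E74C3C'),
            fontsize=10, color='#E74C3C',
            fontweight='bold')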

# Add reference line
ax.axhline(y=target, color='gray', linestyle='--',
           linewidth=1, alpha=0.5)
ax.text(x_pos, target - 3, f'Target: {target} min',
        fontsize=9, color='gray')

Excel Chart Formatting

Task	How
Remove legend (one series)	Click legend → Delete
Lighten gridlines	Format → Color: light gray
Remove chart border	Format → No border
Descriptive title	Replace "Chart Title" with finding
Start y-axis at 0	Format Axis → Minimum = 0
Single color	Format bars → one muted color
Add data labels	Right-click → Add Data Labels

Common Mistakes

Mistake	Correction
Reporting only p-values	Always include effect sizes and CIs
"We proved the treatment works"	"The data provides strong evidence of an effect"
Charts without titles or with generic titles	State the finding in the title
No limitations section	Every analysis has limitations — state them
Manual data editing without documentation	Script all analysis steps for reproducibility
Red-green color coding	Use colorblind-friendly palettes
"p = .06 means no effect"	"Evidence was suggestive but didn't reach conventional significance"
Same writing style for all audiences	Adapt detail and language to the audience

Connections

Connection	Details
Ch.5 (Graph types)	Graph types from Ch.5 are now polished with design principles and accessibility
Ch.7 (Reproducibility)	Cleaning logs from Ch.7 become the Methods section of your report
Ch.13 (p-values)	p-value communication is one of the hardest challenges — now you have templates
Ch.17 (Effect sizes)	Always report effect sizes alongside p-values — the rule from Ch.17 becomes a reporting standard
Ch.20 (Decomposing variability)	$R^2$ is one of the most intuitive numbers to communicate: "explains X% of the variation"
Ch.22 (Regression)	Regression results need careful communication — slope interpretation and $R^2$
Ch.26 (Critical consumer)	The misleading techniques you learned to avoid as a producer, you'll learn to detect as a consumer
Ch.27 (Ethical data practice)	Honest visualization is ethical practice — the line between design and deception