
Correlation vs Causation: What the Difference Actually Means

"Correlation doesn't imply causation" is one of the most repeated phrases in statistics — but understanding why it matters changes how you read news, studies, and data. Here's the clear explanation.

The phrase is cited constantly but rarely explained. This article covers what each term means, the three ways two variables can be correlated without one causing the other, how scientists establish causation, and how to spot the error in the wild.

What Correlation Means

Two variables are correlated when they tend to move together — when one goes up, the other tends to go up (positive correlation) or down (negative correlation).

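The Pearson correlation coefficient (r) ranges from −1 to +1:

r = +1: perfect positive correlation (all points lie on an upward-sloping straight line; as X rises, Y always rises by the same amount per unit of X)
r = 0: no linear correlation (no straight-line relationship; X and Y may still be related in a curved way)
r = −1: perfect negative correlation (all points lie on a downward-sloping straight line; as X rises, Y always falls)

Examples:

Height and weight: r ≈ +0.7 (positive, moderate)
Exercise and body fat %: r ≈ −0.5 (negative, moderate)
Shoe size and IQ: r ≈ 0 (no meaningful correlation)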
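To make the coefficient concrete, here is a minimal sketch of the standard Pearson formula, r = Σ(x − x̄)(y − ȳ) / √[Σ(x − x̄)² · Σ(y − ȳ)²], in plain Python. The function name `pearson_r` is ours, for illustration only. The last line shows why r = 0 does not mean "unrelated": y = x² is a perfect, if curved, relationship, yet it scores exactly zero on symmetric data.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences.

    Undefined (division by zero) if either sequence is constant.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and the two sums of squared deviations.
    # The 1/n factors of covariance and variance cancel, so they are omitted.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)

xs = [-3, -2, -1, 0, 1, 2, 3]
print(pearson_r(xs, [2 * x + 1 for x in xs]))  # upward straight line → 1.0
print(pearson_r(xs, [-x for x in xs]))         # downward straight line → -1.0
print(pearson_r(xs, [x * x for x in xs]))      # y = x², clearly related → 0.0
```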

What Causation Means

Causation means that changing one variable produces a change in the other: if you could intervene to change X, and nothing else, Y would change as a result. Causation is much harder to establish than correlation. It requires evidence of a mechanism, the right temporal order (the cause comes before the effect), and ideally experimental confirmation.

Why Correlation ≠ Causation: Three Explanations

1. Reverse causation: The direction of influence is backwards from what's assumed.

Example: Countries with more hospitals have more deaths. Conclusion: hospitals cause death? No. Seriously ill people are taken to hospitals, and places with more illness need more hospital capacity. The illness drives the hospital numbers, not the other way round.

2. Confounding variable: A third variable causes both the correlated variables.

Example: Ice cream sales correlate with drowning deaths. The cause? Hot weather increases both ice cream consumption AND swimming, which increases drowning risk. Control for the confounder (for example, compare only days with the same temperature) and the correlation largely disappears.

3. Spurious correlation: Two variables correlate by chance, especially in small datasets or when many correlations are tested.

Famous examples (from Tyler Vigen's Spurious Correlations project): the number of films Nicolas Cage appeared in each year correlates with the number of people who drowned by falling into a swimming pool, and per capita cheese consumption correlates with deaths from becoming tangled in bedsheets. These are statistical noise masquerading as signal.

How Scientists Establish Causation

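The gold standard is a Randomised Controlled Trial (RCT): randomly assign participants to a treatment group or a control group, then compare outcomes. Because chance alone decides who gets the treatment, every other variable, measured or not, is balanced between the groups on average, so confounding cannot systematically explain a difference in outcomes.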
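A small simulation shows why random assignment matters. In this sketch (hypothetical numbers, Python standard library only) the treatment's true effect is 1.0, but healthier people are more likely to opt in, so a naive comparison of those who chose it with those who didn't is badly inflated. Assigning treatment by coin flip recovers the true effect.

```python
import random
import statistics

random.seed(3)
N = 20_000
TRUE_EFFECT = 1.0  # hypothetical effect of the treatment on the outcome

def outcome(treated, health):
    # Hypothetical model: baseline health also strongly affects the outcome.
    return TRUE_EFFECT * treated + 2.0 * health + random.gauss(0, 1)

def difference_in_means(pairs):
    """Mean outcome of the treated minus mean outcome of the untreated."""
    treated = [y for t, y in pairs if t]
    control = [y for t, y in pairs if not t]
    return statistics.fmean(treated) - statistics.fmean(control)

health = [random.gauss(0, 1) for _ in range(N)]

# Observational data: healthier people are more likely to opt in (self-selection).
opted_in = [h + random.gauss(0, 1) > 0 for h in health]
obs_est = difference_in_means([(t, outcome(t, h)) for t, h in zip(opted_in, health)])

# RCT: a coin flip decides, so health is balanced between groups on average.
assigned = [random.random() < 0.5 for _ in range(N)]
rct_est = difference_in_means([(t, outcome(t, h)) for t, h in zip(assigned, health)])

print(f"true effect:            {TRUE_EFFECT:.2f}")
print(f"observational estimate: {obs_est:.2f}")
print(f"randomised estimate:    {rct_est:.2f}")
```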

When RCTs are impossible (you can't randomly assign people to smoke for 30 years), epidemiologists weigh the evidence using the Bradford Hill criteria, a set of guidelines rather than a strict checklist, including: strength of association, consistency across studies, biological plausibility, dose-response relationship, and temporal order.

Practical Examples of Getting It Wrong

Correlation observed | Incorrect conclusion | Actual explanation

Organic food sales and autism diagnoses (both rising) | Organic food causes autism | Both increased over the same period; no causal link

People in countries with more TVs live longer | TVs cause longevity | Wealthier countries have more TVs AND better healthcare

Children with larger shoe sizes read better | Shoe size affects reading | Older children have larger feet AND more education

When Correlation Is Still Useful

Correlation is valuable for prediction even without causation. If shoe size predicts reading ability in children, you can use it to identify struggling readers, even though buying bigger shoes won't help them read. In medicine, correlated biomarkers can predict disease risk without being causes. In finance, correlations between assets inform portfolio diversification. The mistake is to intervene on a merely correlated variable and expect the outcome to change.
