Last changed 21 Oct 2014 ............... Length about 900 words (10,000 bytes).
(Document started on 15 Feb 2005.) This is a WWW document maintained by Steve Draper, installed at http://www.psy.gla.ac.uk/~steve/best/correlation.html. You may copy it.

Correlation and causation

By Steve Draper,   Department of Psychology,   University of Glasgow.

Correlation is not causation (but it sure is a hint).

The possible causal relationships

Correlation and causation: if A and B are correlated, any one of these different causal relationships may underlie it:
A ⇒ B
B ⇒ A
C ⇒ A, B
A ⇔ B, so doing either one will increase the other. Bi-directional causality.
A ≡ B. Tautology / identity. Non-causal.

Correlation is a big hint about causality, but it is ambiguous, and mistakes are frequently made. If A is correlated with B, then any one of these five relationships may hold: the correlation alone cannot tell you which.

  1. A causes B
  2. B causes A
  3. A third factor C causes both A and B, not necessarily at the same time (the electrical discharge of lightning causes both flash and boom, with the light and sound arriving at different times).
  4. A and B each increase (cause) the other, as in any positive feedback loop (vicious circle). For instance, two adjacent blocks of explosive: if one goes off, it will set off the other; if person A annoys B, B is likely to retaliate; if a student's motivation is high they are more likely to learn, but if they succeed at learning their motivation will rise (so motivation is often an effect, a symptom, not a prime mover); if A sees B as beautiful A is more likely to be attracted to B, but if A loves B then A is more likely to see B as beautiful.
  5. A ≡ B. Tautology / identity. A and B have to occur together because they turn out to be the same by definition. (See section below.)

Caused by the same third factor [3]

Number 3 above may be the most troublesome.

It is particularly misleading when the time delays involved are consistent with one direction of causality, but not the other; yet a third factor is actually prior to and causing both.

What is also misleading is when these cases are reported with no statement about causality made, leading to almost all readers drawing the false, or at least unwarranted, conclusion the writer wanted.

Case 1. School children who are involved with employers (e.g. in work experience) before they leave school are more likely to end up employed. (But the factors that make a child more likely to participate in these schemes may cause both participation and then success at job seeking e.g. liking work, being stimulated by environments outside the home, not having to stay home to care for relatives.)

Case 2. Big budget movies which are promoted at the Super Bowl gross about 40% more than those which are not. (But having more money for promotion predicts success; and so perhaps does appealing to the kind of audience that watches the Super Bowl.)
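
The common-cause pattern can be illustrated with a small simulation. This is only a sketch under invented assumptions: C is a random hidden factor, A and B are each C plus their own independent noise, and neither A nor B has any effect on the other. The helper names pearson and residuals are mine, not from any library.

```python
import random
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def residuals(ys, xs):
    """What is left of ys after removing its least-squares linear fit on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return [(y - my) - slope * (x - mx) for x, y in zip(xs, ys)]

rng = random.Random(0)      # fixed seed so the run is repeatable
n = 5000

# Assumed toy model: C drives both A and B; A and B never touch each other.
c = [rng.gauss(0, 1) for _ in range(n)]
a = [ci + rng.gauss(0, 1) for ci in c]
b = [ci + rng.gauss(0, 1) for ci in c]

# A and B are clearly correlated (the theoretical value for this model is 0.5).
r_ab = pearson(a, b)

# Once the part explained by C is removed from each, little correlation remains.
r_ab_given_c = pearson(residuals(a, c), residuals(b, c))

print("corr(A, B)              =", round(r_ab, 3))
print("corr(A, B) with C removed =", round(r_ab_given_c, 3))
```

The catch in real cases is that this check is only possible if you have thought of C and measured it; in the two cases above the third factors are usually not in the data at all.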

Causation in both directions [4]

In addition to the feedback cases mentioned above, a common problem in psychology theories and papers is that they talk as if showing causation in one direction settles it. An experiment, if a statistically significant result is obtained, is evidence about causation in one direction. It is no evidence at all about whether or not there is causation in the reverse direction. Very few reports of experiments discuss this, and so they are as much at fault as papers which allow correlation to imply causality. Both positive and negative feedback are common in nature, very common indeed in physiology, and must be considered not as possibilities but as a priori likelihoods in psychology.

Other possible problems [5] (The possible non-causal relationships)

A ≡ B. Tautology / identity.
E.g. If my paternal grandfather's only son is called 'Martin' then my father is called 'Martin'. If the temperature is zero Centigrade then it is 32 degrees Fahrenheit. One doesn't cause the other: it is another way of referring to or describing the same thing. They will be perfectly correlated, not because of causation, but due to another kind of determination.

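Conversely, there can be complete determination by definition, yet zero correlation, because correlation measures only linear relationships. E.g. the equation y = x × x, i.e. y = x^2: when the x values are spread symmetrically about zero, y is completely determined by x, yet the correlation between them is zero.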
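
Both points can be checked numerically. The sketch below uses a hand-written Pearson correlation (the helper name pearson is my own) on values I have chosen purely for illustration: a few temperatures expressed in both scales, and y = x^2 first over a range symmetric about zero and then over non-negative x only.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Identity: the same temperatures described in two scales.
celsius = [-10, 0, 15, 30, 100]
fahrenheit = [c * 9 / 5 + 32 for c in celsius]
r_temperature = pearson(celsius, fahrenheit)   # perfect correlation, no causation

# Complete determination, but not linear: y = x^2 over a symmetric range.
xs = list(range(-5, 6))
ys = [x * x for x in xs]
r_symmetric = pearson(xs, ys)                  # the positive and negative halves cancel

# The same rule over non-negative x only: now the correlation is strongly positive.
xs_pos = list(range(0, 6))
ys_pos = [x * x for x in xs_pos]
r_one_sided = pearson(xs_pos, ys_pos)

print(r_temperature, r_symmetric, r_one_sided)
```

The last case is a reminder that the zero depends on the range of x sampled, not just on the rule connecting x and y.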

The slogans

Correlation does not entail causation. (Or as it is more often expressed, correlation does not necessarily imply causation. Or as it is a little carelessly put even more often "correlation does not imply causation", even though in fact some of the most important scientific advances have come precisely because scientists did investigate that implication.)

As Tufte observes (following David Hume), it's more accurate to say:
Empirically observed covariation is a necessary but not sufficient condition for causality
or, colloquially:
Correlation is not causation — but it sure is a hint

Using "prediction" misleadingly

It is common practice for papers to say "X predicts Y" when they mean "X correlates with Y", but the majority of readers are misled into reading it as "X causes Y". Statisticians may say this is a technical use of the term. (But this is NOT how the term is used in areas of science and engineering other than statistics and psychology. When an engineer says an aircraft is reliable, it does not mean it crashes with a predictable frequency; it means it can be trusted for the purposes for which it was designed.)

Saying it is a technical term can only be accepted as a defence if the same person who writes "driving while drunk predicts having a car crash" will be happy to have their text corrected to read "Having a car crash predicts driving while drunk".
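
The reason the swap is legitimate under the "technical" reading is that correlation is symmetric: it has no direction built into it. A minimal sketch follows, using made-up toy data for ten hypothetical drivers (these are not real statistics) and a hand-written helper named pearson.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Invented illustration only: 1 = yes, 0 = no, one entry per hypothetical driver.
drunk = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
crash = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

r_drunk_crash = pearson(drunk, crash)   # "drunk driving predicts crashing"
r_crash_drunk = pearson(crash, drunk)   # "crashing predicts drunk driving"

# The two numbers are the same: the statistic cannot tell the sentences apart.
print(r_drunk_crash, r_crash_drunk)
```

Any sense of direction a reader takes from "predicts" therefore comes from the wording, not from the statistic.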

Meanwhile it is best to translate almost every use of "predict" in psychology papers to "correlates with" to stop yourself being misled by this "technical" use of the term.

A variation: traits (and correlations over time)

A similar tendency to faulty inference occurs around time scales and states vs. traits. Just because a property of a person (or thing) is "reliable", i.e. strongly correlated over time when you take test-retest measures, this doesn't tell you anything about how easy it might be to change it; but the temptation is to label it a trait.
  • A person may be poor for years, but one windfall can change that forever overnight.
  • If you insist people express a preference for visual over audio materials for learning, they will do so in a moderately "reliable" way. But this is not predictive of how well they learn with each kind of material, even though that inference is drawn by large numbers of published papers.
  • For a long period in the UK, few girls studied science subjects and it was assumed that strong forces were at work. When enough pressure was applied to teachers to make them change their advice, then the numbers changed in less than a year in the schools where that pressure had been applied. It turned out there were no large forces on or within the girls preventing it, despite it being a big effect, and highly reliable up to that time.

    On the other hand:

  • Many smokers believe they can stop any time, and that the predictability and stability of their habit is misleading. The evidence is against that.
  • Human body weight is extremely resistant to dietary change (past weight is a good predictor of future weight). Why? Because there are many feedback mechanisms that adjust to cancel out changes in weight from changes in food input.
  • Contrary to what was believed at one time, sexual orientation and identity can be very resistant to even extreme external social pressure.

    Reliability (i.e. correlation over time) is no predictor of how easy or likely something is to change.
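
A toy model may make the contrast concrete. It is a sketch under invented assumptions, not a model of any real system: a quantity sits at a set point, receives a single one-off push, and at each step a corrective mechanism removes some fraction of the gap back to the set point. The function name simulate and all the numbers are hypothetical.

```python
def simulate(feedback, steps=30, push_at=10, push=5.0, set_point=70.0):
    """Return the trajectory of a quantity that receives one external push.

    feedback is the fraction of the gap to the set point that is corrected
    at each step; 0.0 means there is no corrective mechanism at all.
    """
    x = set_point
    path = []
    for t in range(steps):
        if t == push_at:
            x += push                        # the one-off external change
        x -= feedback * (x - set_point)      # corrective feedback, if any
        path.append(x)
    return path

with_feedback = simulate(feedback=0.5)       # body-weight-like: pulled back
without_feedback = simulate(feedback=0.0)    # windfall-like: stays changed

# Before the push the two histories are identical and perfectly stable.
print(with_feedback[:10] == without_feedback[:10])

# After the push they part company.
print(round(with_feedback[-1], 3), without_feedback[-1])
```

Up to the moment of the push, no amount of measuring stability would distinguish the two cases; only intervening, or knowing the mechanism, does.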
