Does correlation mean cause and effect?

Most academics who have been associated with elementary statistics teaching have their stories of spurious correlation being mistaken for causality.  One graduation day, I was chatting to the father of one of my project students about this, and he suggested one from his business, managing a paper mill.

Someone had noticed that the percentage failure rate of their paper fluctuated from day to day and month to month.  The same person had noted that this rate was strongly correlated with the number of soft drinks sold from vending machines in the factory.  Both increased and decreased together (positive correlation).  Does that mean that the workers were neglecting production to drink more soft drinks?  Or were the workers seeking to cool off their annoyance at the failures (the paper was curling at the edges) by drinking more cold drinks?

As usual there is a simple explanation; take a moment to think about it before reading on.

The usual explanation for spurious correlations is that both sets of observations are linked by causality to a third set of data.  In this case it was temperature.  As the temperature rose, the paper was more liable to curl.  And the workers needed more liquid to cool down.
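For readers who like to see this in numbers, here is a minimal simulation sketch of the paper mill story. Everything in it is invented for illustration: the temperature distribution, the coefficients, the noise levels and the names `simulate` and `partial_corr` are my assumptions, not data from the mill. Temperature drives both the failure rate and the drink sales, and neither of those two affects the other. The raw correlation between them should still come out strongly positive, while the partial correlation (which removes the linear effect of temperature) should be close to zero.

```python
import math
import random


def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(xs, ys, zs):
    """Correlation of xs and ys after removing the linear effect of zs."""
    rxy, rxz, ryz = pearson(xs, ys), pearson(xs, zs), pearson(ys, zs)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


def simulate(days=2000, seed=1):
    """Hypothetical mill: temperature is the only common driver."""
    rng = random.Random(seed)
    # Daily temperature in degrees Celsius (made-up mean and spread).
    temperature = [rng.gauss(22, 5) for _ in range(days)]
    # Failure rate (%) rises with temperature, plus independent noise.
    failure_rate = [2.0 + 0.3 * t + rng.gauss(0, 1.0) for t in temperature]
    # Drinks sold rises with temperature, plus its own independent noise.
    drinks_sold = [50 + 4.0 * t + rng.gauss(0, 10.0) for t in temperature]
    return temperature, failure_rate, drinks_sold


temperature, failure_rate, drinks_sold = simulate()
raw = pearson(failure_rate, drinks_sold)
controlled = partial_corr(failure_rate, drinks_sold, temperature)
print(f"raw correlation, failures vs drinks:   {raw:.2f}")
print(f"partial correlation given temperature: {controlled:.2f}")
```

The partial correlation used here only removes a linear dependence on the third variable, which is enough for this toy model but would not be for every real data set.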

Last week, the Independent carried a short item on correlated data sets. 

Monday 21st November: news story: 
What's the secret to a contented retirement? One answer, according to a new study, is regular sex. The more often married over-65s have sex the more likely they are to be happy with their lives, researchers found.
The survey of 238 men and women, presented yesterday at the Gerontological Society of America, found 60 per cent of those who had regular sex said they were very happy compared with 40 per cent for whom it was a distant memory.
When it came to their marriages, 80 per cent of those who had regular sex said they were very happy compared with 59 per cent of those who did not.
Adrienne Jackson, from Florida University, said the findings should "spark interest" in helping older people deal with "resolvable issues" which prevent them having sex.
Previous research shows the main factor that limits sexual activity is a man's health, not a woman's. Many illnesses, such as diabetes and prostate cancer, affect a man's capacity to have and maintain an erection.
But therapists warn about the danger of raising expectations so people feel there is something wrong if they are not having sex. There isn't.

Tuesday 22nd November: Letters
Cause and effect
You report that regular sex is one of the secrets of a contented retirement (21 November). Could it be that the correlation is the other way about: those over-65s who are happy with their lives have more sex?
Tony Wood

When I looked for the online version of the story, I was amused to find that it was tagged with four labels: Biology, Schools, Sex, Higher Education.  I assume that these labels were given by some automatic process which picked up the key word "University" and created the label "Higher Education".


  1. In my presentations on Correlations between Random Variables, I often present an example very similar to your first one (in fact, it also has temperature as the latent variable): In a beach town, both the shark attacks and ice cream sales go up in summer. Of course relating these two as cause and effect would be completely wrong.

    Now I have one more example I can use :-)

    The second example is slightly different though. Here, cause and effect may have been reversed. There isn't necessarily a latent variable between "healthy sex life" and "contented retirement", don't you think?

  2. When I taught regression, I used the famous Yule data set: highly correlated time series of the mortality rate and the fraction of marriages performed by the Church of England (which I labeled the CoE market share, these being business students). Both are annual data from England and Wales. The sister of one student had been married in the CoE, so the student was able to confirm anecdotally that the ceremony involved no human sacrifice beyond the groom.

  3. Thank you Samik and Paul. Yes, there doesn't really appear to be a latent variable for retired people ... but cause and effect are not proved.

    A further example of a latent variable which I used to use was the positive correlation between the number of children living in Danish village houses and the number of stork nests on the chimneys of those houses. More storks means more children, demonstrating that the storks bring babies! Of course the latent variable was the size of the house. But (to add humour to the correlation) from the 1960s through to the 1980s, there was a decline in the number of storks nesting in Denmark, which coincided with a decreased birth rate, and then both started to increase at about the same time.

