Everything is correlated

No doubt I’ve mentioned somewhere in my blog that my core focus as an economist is development. What this means is I chiefly focus on looking at how and why a country’s economy works (or doesn’t).

Now for a range of reasons, development economists tend not to fall back on the mainstream tools of policy analysis. This is likely both because there is a clear precedent for things going wrong and because development is full of tree-huggers.

To make things that much more difficult, there is typically little information (or data) about the things we care about. After all, statistics are a low priority when you don’t have an adequate health care system.

As a result we are often trying to make new insights about stuff we know little about and we can’t directly measure. So economists have to get a little ‘touchy feely’ with their statistics, using things like ‘proxies’ and ‘correlation’ to take a ‘stab in the dark’ about what’s going on.

What essentially is meant when we say two things are ‘correlated’ is that they move together. For instance, a person’s foot size (or income) and their height are likely to be correlated (big feet, tall person etc).

Alternatively, we might think that the number of stray cats that follow us during the day is correlated with the amount of milk we spilt on our clothes in the morning. More milk means more cats.
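For the curious, here is a rough sketch of how you could put a number on ‘moving together’. It uses the Pearson correlation coefficient, which runs from -1 (move in opposite directions) through 0 (no straight-line relationship) to +1 (move in lockstep). The milk and cat figures are made up purely for illustration, and `pearson` is just my own helper name.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How much the two lists vary together around their means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # How much each list varies on its own.
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Invented numbers: millilitres of milk spilt each morning, and the
# number of stray cats counted following us that day.
milk_spilt_ml = [0, 10, 20, 30, 40, 50]
cats_following = [0, 1, 1, 3, 4, 5]

r = pearson(milk_spilt_ml, cats_following)
print(f"correlation between milk and cats: {r:.2f}")
```

A result close to +1 says that, in this invented data, more milk goes hand in hand with more cats. It says nothing about why.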

However, although spilling milk is related to crowds of cats, for two things to be correlated one does not need to cause the other. For instance, they could move together by coincidence, or as a result of something other than cats and your clumsiness.
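To see how a third thing can produce a correlation all by itself, here is a small simulation. Everything in it is an assumption for the sake of the example: I’ve invented a hidden factor (how rushed the morning is) that makes us both spill more milk and, say, forget to feed the cats. In the code, milk never feeds into the cat count, yet the two still end up strongly correlated.

```python
import random
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

random.seed(1)  # fixed seed so the run is repeatable

# Hypothetical hidden factor: how rushed each morning is
# (0 = calm, 1 = frantic). We simulate 500 mornings.
rushed = [random.random() for _ in range(500)]

# Both quantities depend only on the rush plus their own random noise.
# Milk is never used to calculate the number of cats.
milk_spilt = [10 * level + random.gauss(0, 1) for level in rushed]
cats_following = [5 * level + random.gauss(0, 0.5) for level in rushed]

r_milk_cats = pearson(milk_spilt, cats_following)
print(f"milk vs cats, with no causal link between them: {r_milk_cats:.2f}")
```

If all we could measure was milk and cats, we would see a strong relationship and have no way of knowing that the rush was behind both.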

A cool way to see how common correlations are between seemingly unrelated things is to check out Google’s Correlate tool.

For instance, the graph below shows how closely the number of times people search for ‘bottomless trap hole’ tracks the number of times people search for ‘stubbed my toe’ at the same time.

People looking for bottomless trap holes after stubbing their toe

The first thing to understand about that garbled mess above is that each of those little blue circles is a record of something happening. In this case, it’s the number of times people have searched for ‘stubbed my toe’ at the same time as people have searched for ‘bottomless trap hole’.

So on the side of the graph pointing to the sky (the Y axis) is the number of times the term ‘bottomless trap hole’ has been searched, while the number of times somebody has searched for ‘stubbed my toe’, at the same time, is on the side of the graph which looks like it’s sitting on the floor (the X axis).

Now if there was no relationship between the two things, the above graph would look like somebody had thrown a bunch of darts at the graph. That is, when people hit their toe on the side of the bed they don’t care about bottomless trap holes.

If that was the case, the graph might look something like this:

That’s random as bro

That is, knowing where one blue dot sits tells us nothing about where the others are likely to be. The two things occur independently of each other. In other words, they’re unrelated.
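If you’d rather see the difference in numbers than in pictures, the sketch below fakes both situations. The ‘darts’ are pairs of numbers drawn independently of each other, while the ‘related’ points have a Y value that follows the X value plus some noise. All of the data is simulated; none of it comes from Google.

```python
import random
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

random.seed(42)  # fixed seed so the run is repeatable
n = 1000

# Darts at a board: x and y are drawn independently of each other.
darts_x = [random.random() for _ in range(n)]
darts_y = [random.random() for _ in range(n)]

# Related points: y follows x, plus some random noise.
related_x = [random.gauss(0, 1) for _ in range(n)]
related_y = [x + random.gauss(0, 0.5) for x in related_x]

r_darts = pearson(darts_x, darts_y)
r_related = pearson(related_x, related_y)

print(f"darts:   {r_darts:+.2f}")
print(f"related: {r_related:+.2f}")
```

The darts should come out close to zero and the related points close to one, which is the numerical version of ‘random mess’ versus ‘predictable pattern’.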

However, back in the Google example, the little blue circles tend to fall in a predictable pattern, as my MS-Paint style demonstration below shows.

If I asked you how many times people searched for a bottomless trap hole, you could probably make a decent guess just by knowing how many times people searched for ‘stubbed my toe’.

But obviously, this doesn’t mean one causes the other.

So we might not want to start filling in the bottomless trap holes to avoid stubbed toes. This is both because it would be impossible (the trap hole is bottomless after all) and because it probably wouldn’t help cure your clumsiness.

Trap holes are also great for hiding stuff you don’t want your friends to know about.

But this is often how analysis begins. We have a lot of information about cats, trap holes and interest rates, and we see how they are related. Then one by one we pick the relationships which are plausible to look at in more detail.

However, lots of things are correlated and some things are even more plausibly correlated than cats and milk. Not only that, it’s rare to find something which is only related to one other thing. Cats might crowd around us not just when we spill milk on ourselves, but also when they haven’t been fed.

So how do you go about figuring out what probably causes something else? Well that’s a matter for another post. After all, too much statistics is unlikely to be a crowd pleaser.

But in the meantime I’d recommend you check out Google’s comic on their ‘Google Correlate’ website here.

Giles

I'm an economist, data geek and public speaker.
