It has long been recognised by those working with data that, given a large enough sample size, most variables will show statistically significant correlations, because at some level everything is related to everything else. The psychologist Paul Meehl famously called this the ‘Crud Factor’: it leads us to believe there are real relationships in the data when in fact the linkage is trivial.
Nate Silver made the same point when he warned that the number of ‘meaningful relationships’ is not increasing in step with the meteoric growth in the amount of data available. We simply generate a larger number of false positives, an issue so endemic in data analytics that it led John Ioannidis to suggest that two-thirds of the findings in medical journals were in fact not robust.
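The mechanics behind this flood of false positives are easy to demonstrate. The sketch below (a hypothetical illustration, not drawn from any of the studies cited above) runs a large batch of significance tests on pure noise: both groups in every test are drawn from the same distribution, so any ‘significant’ result is, by construction, a false positive. At a conventional threshold of p < 0.05, roughly 5% of the tests are flagged anyway, and the more tests we run, the more spurious ‘discoveries’ we accumulate.

```python
import math
import random

def normal_cdf(x):
    # Standard normal CDF via the error function (no external libraries needed).
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def false_positive_rate(n_tests=1000, n=500, alpha=0.05, seed=42):
    """Run n_tests two-sample z-tests on pure noise and return the
    fraction flagged 'significant'. Every hit is a false positive,
    because both samples come from the same N(0, 1) distribution."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_tests):
        a = [rng.gauss(0, 1) for _ in range(n)]
        b = [rng.gauss(0, 1) for _ in range(n)]
        # z-statistic for the difference of means, known unit variance
        z = (sum(a) / n - sum(b) / n) / math.sqrt(2 / n)
        p = 2 * (1 - normal_cdf(abs(z)))
        if p < alpha:
            hits += 1
    return hits / n_tests

rate = false_positive_rate()
print(f"Fraction of noise-only tests flagged significant: {rate:.3f}")
```

With 1,000 tests the simulation typically flags around 50 ‘significant’ relationships where none exist, which is exactly Silver's point: more data and more comparisons mean more false positives, not more meaningful relationships.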
So if we cannot always rely on statistical techniques to cut through swathes of data and find meaningful patterns, where do we turn? Naturally, we look to ourselves. Perhaps this is implicit in the discussion about the qualities of good data scientists: that they should be ‘informed sceptics’ who balance judgement and analysis, or that the key qualities are a sense of wonder, a quantitative knack, persistence and technical skills. However, as soon as we recognise that humans are involved in the analysis of data, we need to start exploring some of the frailties of our judgement, for if there is one thing that behavioural economics has taught us, it is that none of us is immune from misinterpreting data.