Big data and research design — two essential ingredients in the process of evidence-based service transformation

5th May 2017

Social Finance ran a workshop for analysts in Greater Manchester a few weeks ago. Our guest speaker, Prof. Stephen Morris, wrote a blog post about his talk on the importance of big data and research design in the process of evidence-based transformation.

By Stephen Morris, Professor of Evaluation, Policy Evaluation Research Unit, Manchester Metropolitan University & Network Member, Manchester Metropolitan University Crime & Well-Being Big Data Centre

Big data are popularly characterised as possessing ‘Volume, Velocity and Variety’ (Chan & Moses, 2016). It seems obvious to the point of hardly being worth stating, but big data are big. They are also updated extraordinarily quickly, in many cases almost in real time. Technical advances in the storage and linking of data from multiple sources mean that they possess enormous variety, at least potentially. Advances in geo-coding make it possible to link multiple sources of digital information via place or geography: for example, transportation data, mobile phone records and police incident reports.

The twin developments of big data and data science pose a challenge beyond even what these impressive features of the big data revolution imply. As Burrows & Savage (2014) suggest, the big data revolution, which has now set its sights on solving social and economic problems, has developed in a world alien to many social scientists: one of data analytics, information science, artificial intelligence and computing. So much so that some big data and data science cheerleaders, such as Anderson (2014)[1], have declared the scientific method, and thus social science, obsolete. A less extreme view might be that substantive expertise in the form of social scientific theory, and the tools and methods traditionally deployed by social scientists, are less important than they once were. Big data and data science, it is said, offer a more cost-effective means of exploring the questions that were the concern of traditional social science; some even claim that big data provide the opportunity to pose new questions that traditional social science could not identify, let alone address.

While some of this is true, I want to argue that the reality of the situation we face as analysts and social scientists is different from the picture painted by some advocates of data science. In fact, social science theory and research design have never been more important than they are today. For sure, the big data revolution offers possibilities and insights that are new and in some instances revolutionary. But as any undergraduate student who has studied quantitative social science will tell you, correlation is not the same thing as causation. For social scientists and analysts concerned primarily with the evaluation of policies, programmes and interventions, correlation or description is vitally important, but it is causation that really counts.

Let’s consider an example. Suppose we wish to know whether a training programme for unemployed adults improves their chances of obtaining work. There are clearly many factors besides the programme that determine whether an individual finds work. The key question is how far the programme improves employment probabilities independently of these other factors. To answer it, we need to move beyond correlation and attempt to estimate the causal effect of the programme on employment probabilities. The two tools social scientists have traditionally brought to bear on the problem of causal inference are theory and design. Simply amassing more data on the unemployed, their outcomes and their characteristics, and applying ever more sophisticated data routines, will not answer the causal question. Without a model or theory of behaviour, and without testing that theory against data collected or arranged according to an explicit research design, we will not get very far.
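The gap between correlation and causation in this example can be made concrete with a small simulation. The sketch below is purely illustrative: the numbers, the 0.10 "true effect", and the unobserved "motivation" trait are all invented assumptions, not estimates from any real programme. When participants self-select into the programme, the naive comparison of employed rates between participants and non-participants overstates the effect; when assignment is randomised, as an explicit research design would arrange, the same comparison recovers something close to the true effect.

```python
import random

random.seed(42)

def simulate(randomised, n=100_000, true_effect=0.10):
    """Simulate employment outcomes for n unemployed adults.

    Hypothetical model: an unobserved trait ('motivation') raises both
    the chance of joining the programme and the baseline chance of
    finding work, so it confounds the naive comparison.
    """
    treated, control = [], []
    for _ in range(n):
        motivation = random.random()  # unobserved confounder in [0, 1)
        if randomised:
            in_programme = random.random() < 0.5          # coin-flip assignment
        else:
            in_programme = random.random() < motivation   # self-selection
        # Baseline employment probability rises with motivation;
        # the programme adds a fixed true_effect on top.
        p_employed = 0.2 + 0.5 * motivation + (true_effect if in_programme else 0.0)
        employed = random.random() < p_employed
        (treated if in_programme else control).append(employed)
    # Difference in employment rates between participants and non-participants
    return sum(treated) / len(treated) - sum(control) / len(control)

naive = simulate(randomised=False)         # correlation: biased upwards
experimental = simulate(randomised=True)   # design recovers roughly true_effect
print(f"naive observational gap: {naive:.3f}")
print(f"randomised gap:          {experimental:.3f}")
```

Under this toy model the self-selected comparison mixes the programme effect with the fact that more motivated people both join and find work more often, so the naive gap lands well above 0.10, while the randomised gap hovers near it. More data would only make the biased naive estimate more precise, not less biased, which is the point of the paragraph above.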

It is not my intention to come across as too sceptical of big data. I am excited by the opportunities they present and can see many advantages for those of us concerned with evaluation. But, like all exciting new developments, there is a risk that we become entirely distracted by the possibilities and fail to recognise the limitations, ultimately undermining confidence and frustrating potential. Whilst the data pile up and the algorithms grow ever more elaborate, it is important to remember that we still need both design and theory to answer the questions that really matter. At Manchester Metropolitan University, the Policy Evaluation Research Unit and the Crime & Well-Being Big Data Centre are working together to get the best from big data, with precisely these thoughts uppermost in our minds.

[1] As quoted in Chan & Moses (2016)