Big Data: Basics and Dilemmas of Big Data Use in Policy-making

This article was originally published on Policy Hub.

Big Data has brought about a rather dramatic change in a number of areas related to governance. The idea that an ever-growing amount of data can affect the relationship between citizens and state services, present in both academic discourse and practical applications, raises theoretical, mechanical, ideological and moral questions.

The characteristics of Big Data offer a way to understand both its potential and its dangers. Big Data is defined by its unprecedented levels of volume, variety, velocity and veracity[1], and these characteristics point to its non-static nature. Increasingly, it is thought of as a mechanism governments can use to create data-driven policies. Data science is currently being used in the military, financial, commercial, and scientific sectors, among others. Generally, it has been spoken of with enthusiasm, its potential embedded within a belief in citizen-oriented democracies. However, copious amounts of potentially useful data may not automatically lead to good policy or responsible governance.


To get a rudimentary idea of the concept of Big Data, imagine an ordinary person. This person is active online and uses credit cards, cell phones, GPS devices, wireless home heaters, Bluetooth devices, bank accounts, personalized cars, smartwatches, etc. All of this activity leaves behind a changing, moving, and agile pile of recorded data, freely given away and stored in hundreds of petabytes. This pile of data effectively amounts to a person’s data double. Now imagine that each person active online has their own double. Merging all those data doubles, and their interactions, leads to the creation of enormous amounts of information, which is stored in data centers. This is where Big Data becomes Big Data, but also stops being incidental: the storage of Big Data is always intentional, raising questions about its use in policymaking.

There are many different sources of Big Data, ranging from corporation-collected data to social media-collected data. Figure 1[2] illustrates those that have been identified by the Data for Policy project, a joint research project by Technopolis Group, the Oxford Internet Institute (OII), and the Centre for European Policy Studies (CEPS). It is noticeable that a rather large amount of data is still administrative and statistical. However, the presence of other sources illustrates the changing paradigm in how governments are informed. A common characteristic may be identified for all categories in this figure: no matter what the source is, data has value only when interpreted; raw data does not indicate anything in particular.

These sources of data are employed to determine citizen satisfaction, potential consumer choices, political opinions, and other areas of concern. The Data for Policy project also identified the specific areas in policymaking that data is used for. In Figure 2[3], it is evident that these are some of the more controversial policies in democracies[4], with transparency and budget being the most statistically backed.

As Helen Margetts, director of the Oxford Internet Institute, states, Big Data “potentially offers chances to make policy development processes more citizen-focused, taking into account public needs and preferences supported with actual experiences of public services.”[5] However, these goals are complicated by the ethical concerns that come with Big Data being used intentionally, as well as by more mechanical questions about the technological capabilities of Big Data analysis.

The people who have their hands on Big Data are most frequently data scientists. Through complex computational cross-analyses and a vast number of sources of information (see Figure 1), data scientists provide options for policymakers. The practices differ from country to country, but in essence, they involve a number of experts who work across professional fields in order to deliver a specific policy option. The demand for data scientists is expected to rise sharply in the following years, with Britain, for example, being expected “to create an average of 56,000 big data jobs a year until 2020”[6]. They are most often people trained in computer science, but with a unique sensibility and understanding for human behaviour[7]. Both features are necessary, as data scientists are expected to understand how obtained data can inform various policy choices, as well as understand what these choices are. Currently, according to Data Science Central[8], the largest employers of data scientists are companies such as IBM, Microsoft, Google, and Amazon, with other international giants such as Oracle, Facebook, HP, and Stanford University making the shortlist too. A specific relationship between employers of data scientists, no matter their scale and profile, and governments has been established, as has become increasingly visible through government purchases[9].

Current examples

The use of Big Data in policy-making is, thus, already a practice. Regardless of their political orientation, governments have extensively built their capacities to make choices based on Big Data inputs. A recent example of the use of Big Data for policymaking can be found in China. In April 2015, Rogier Creemers of Oxford University published a translation of

“the regulations regarding the Chinese nationwide Social Credit System, attributing to each of its 1.3 billion citizens a score for his or her behavior. The system is to be based on various criteria, ranging from financial credibility and criminal record to social media behavior.”[10]

The social credit system, a government-tailored policy made in cooperation with Chinese purchasing giants such as Alibaba, uses Big Data to assess the perceived trustworthiness of individuals, aimed to “strengthen social sincerity, stimulate mutual trust in society, and reducing social contradictions (…)”[11] In practice, this policy may have a direct effect on government behaviour towards citizens. Ultimately, the scores are predicted to lead to social benefits such as employment, less bureaucracy, or extensively informed wealth distribution models. However, this system is not designed to scrutinize how the government actually makes choices, leaving citizens vulnerable and governments protected by data sets.

A similar problem can be identified across Western democracies. In the UK, for example, the government is currently implementing its Future Cities Catapult Initiative, which uses Big Data to “help innovators turn ingenious ideas into working prototypes that can be tested in real urban settings and once they’re proven, help spread them to cities across the world to improve quality of life, strengthen economies and protect the environment.”[12] Yet it remains unclear what the mechanisms for choosing data are, as well as what methods are used to identify the problems to solve based on that data.

Similar initiatives are becoming increasingly popular throughout the European Union, although Brussels firmly supports the Privacy Law[13] introduced last year, by which large international companies’ methods of obtaining data are put under scrutiny to some extent. The extent to which governments in the European Union use Big Data may hold potential to envisage the use of Big Data for making policies which do not give governments exclusive insight into citizens’ lives. However, these potentials are yet to be explored, as European Union concerns are largely market-based. “The concern,” argues Margrethe Vestager, European Commissioner for Competition, “is that huge data sets compiled by large Internet firms could give these companies an unfair advantage by essentially erecting barriers to new competition.”[14] Great Britain has certainly had success with Big Data analyses on the local level. However, it is questionable to what extent it is possible to preserve this as information gets more specific and oriented.

Potentials vs. Dangers

A number of questions arise from this insight into current practices in data-driven policy-making. In fact, it appears that many of the challenges for the Big Data project in reference to policy-making stem from the fact that governments react to data much more slowly than the rate at which they can obtain it. The questions illustrate the essence of the debate about Big Data’s radical potential and dangers. The first set of questions is led by the dilemma of to what extent Big Data policy-making is in accordance with the values that elected governments promote. The problem comes from the fact that it is difficult to point to the scope of the consent citizens may give to Big Data policy analysis.

In this sense, governments should be answering questions such as: if Big Data is truly analyzed for the government’s self-reflection, why does this evaluation work in similar ways across different political systems and ideologies? Moreover, citizens should be able to answer for what exact, not general, purpose their specific personal information enhances the wellbeing, quality, or evaluative potential of a specific policy.

The second set of questions can be summarized as: to what extent does Big Data policy-making answer modern democratic challenges, especially considering current debates about inequality and transparency on a global level? This set of questions is backed by dilemmas about what we want to achieve with Big Data, for whom, for what purpose, and what the outcomes of the chosen approach are. Big Data’s transformative potential is one of the most basic and important debates occurring in the public sphere.


The use of Big Data in policy-making still produces a number of dilemmas. There is a major problem in the relationship between governments and citizens which has materialized in the Big Data paradigm. This remains the case so long as governments operate on Big Data in the name of citizens, most often without their actual consent. This problem poses two sets of questions on two different levels. The first set of questions deals with the narratives and concepts which underlie the Big Data project, while the second set of questions tries to understand its practical implications. None have been answered by governments which are open about their use of Big Data. While Big Data has the potential to bring governments closer to citizens, governments are bringing citizens closer to them, often without citizens’ knowledge. There is a rather clear accent on the market exchange and profit that Big Data brings, while its social benefits are often questioned or only used as narrative enhancers. The questions posed hint that we might witness changes in policymaking, though existing practices suggest that these may not be substantial without increased scrutiny and involvement by citizens.

[1] IBM. The 3 V’s of Big Data. IBM Big Data and Analytics Hub. (page visited on 27. 2. 2016.)

[2] Bright et al. “Social reflection of new policies in the big data era”. Policy-making in the Big Data Era: Opportunities and Challenges. Data For Policy 2015. pg. 8 (page visited on 27. 3. 2016.)

[3] Ibid.

[4] Freedom in the World 2015. Freedomhouse. (page visited on 27. 2. 2016.)

[5] Margetts, H. “The promises and threats of big data for public policy-making”. The Policy and Internet Blog. (page visited on 27. 2. 2016.)

[6] Rapolu, B. “What’s a Data Scientist and How Do I Become One.” The Guardian. 2015. (page visited on 27. 2. 2016.)

[7] MIT EECS. MIT EECS Graduate Program Admissions. (page visited on 27. 2. 2016.)

[8] Krivanek, M. “100 Best Data Science Companies to Work for in 2015.” Data Science Central. 2015. (page visited on 27. 2. 2016.)

[9] Miller, S. “Big Data Analytics.” Singapore Management University. 2013. (page visited on 27. 2. 2016.)

[10] Creemers, R. “China Rates its Own Citizens- Including Online Behavior”. De Volkskrant, 2016. (page visited on 27. 2. 2016.)

[11] State Council of People’s Republic of China. “State Council Notice concerning Issuance of the Planning Outline for the Construction of a Social Credit System 2014–2020.” Beijing: State Council, 2015. para. 9. (page visited on 27. 2. 2016.)

[12] About Future Cities Catapult. 2015. (page visited on 27.3.2016)

[13] European Commission. “EU Data Protection Reform and Big Data”. EU Factsheet. March 2016. (page visited on 27.3.2016)

[14] Gibbs, S. EU agrees draft text of pan-European data privacy rules. The Guardian. 2015. (page visited on 27.3.2016)
