Understanding “Big Data” and Its Role in Global Health
Anyone who moves in professional circles has probably heard the catchphrase “big data,” but few can coherently explain what it actually means. Marketing has hyped the term to the point of reverence, so it’s important to separate myth from reality.
Origins of Big Data
Throughout recorded history, humans have accumulated and stored data on the objects and people they valued. The Egyptians did a remarkable job of capturing the life of an entire epoch in their hieroglyphics. In clever hands, data can accomplish very powerful things.
In our time, the so-called “Information Age” has brought an unprecedented amount of data, stored as “bits” (binary digits, represented as ones and zeros). Much of the world’s value now lives in this digital domain, as the multi-billion-dollar valuations of Internet companies such as Google, Amazon and Facebook attest. Our bank balances and much of our intellectual property also exist as records in databases. As our lives become increasingly digitized, this data keeps growing exponentially. We have entered the era of “big data,” but is there too much hype surrounding the term?
“Big Data” — hype or reality?
The truth is that we have always had large amounts of data on things of value, long before the term “big data” was coined. The word “big” is therefore misleading, because it raises the question: big relative to what? What counts as “big” today might not tomorrow, and therein lies the flaw in the marketing of recent years.
Very few organizations have what could rightly be called “big data” in this relative sense, the usual threshold being about five terabytes (TB) or more. Google, Facebook, and a host of other Internet companies clearly meet this criterion. Financial services firms (e.g., Goldman Sachs and Bank of America) and major national governments are also gatekeepers of such huge databanks. “Big data,” then, is a concept that captures the rapidly growing volume of information stored in the databases of large organizations. This data has the following characteristics:
- volume: large enough to require dedicated server infrastructure with ample storage capacity
- velocity: accumulated at an increasingly high rate, in the largest systems 1 TB or more per second
- veracity: mixed with a great deal of useless “noise” that must be cleaned out before analysis
A more apt term for such data would be “growing data” rather than “big data,” which wrongly implies that the growth has stopped. It hasn’t; in fact, it’s only just getting started.
Big Data and Global Health
Global health is one of the key domains that stand to benefit from all this growing data. At its core, data is information, and insightful information presented effectively to the right people can drive positive change and greater impact. For instance, electronic medical records capture patient data over time and can lead to important improvements in health planning and service delivery.
In India, for example, the government has introduced “Aadhaar,” a unique identity number linked to each resident’s biometric data, which could also be used to identify patients. A similar program could be scaled to other low- and middle-income countries. In my own country, Uganda, the recently introduced national ID cards for every citizen could be used in the same way.
Of course, we need privacy-protection frameworks to guard against the misuse of confidential data, but the reality is that “growing data” is here to stay, and the field of global health should take notice to ensure that it is used for the greatest possible good.