Graph paper, digital roads, the quirks of proteomes: the future of data

by Todd Neff

For a sense of Mona Chalabi’s work with data, you could do an author search at her employer’s site, The Guardian, where she’s the data editor. But for some truly fascinating, raw insights, go to her Instagram page. There you’ll find handwritten, hand-colored charts, graphs and other depictions of such things as American nose jobs by year (which “beaked” at nearly 4 million in 2000); keyboard usage by key (“e” is the English champ); when Americans eat pizza (snacking is the surprising winner); and the top languages by (scarily depicted) native tongues. Chalabi’s interests are far-ranging and include plenty of borderline NSFW data nuggets, too (ideal penis size, farting frequency, orgasm rates and more). While her data sources are digital, her preferred tools are decidedly analog: graph paper and colored markers.

That’s a long way from Hadoop and Apache Spark, and it’s this gap that got the team here at Brain Bar Budapest — where Chalabi will be featured as one of the masterminds at the festival this June — thinking about the future of data.

Data has always been integral to life. What’s a genome but a twisted mass of data? What’s the universe (maybe) but an enormous quantum computer? And since the dawn of the internet especially, data has been a central player in our daily lives. The big changes now afoot have to do with the sheer volume of data flung forth by just about everything we look at, listen to, watch or buy, and with the ways companies, governments and people in general are using it.

In manufacturing, retailing, logistics and many other businesses, data has long shaped tasks such as forecasting: enterprise resource planning (ERP) systems from vendors such as SAP and Oracle come with their own business intelligence databases. But data is now taking center stage in new fields, too.

Take self-driving cars and trucks. Autonomous vehicles still ride on roads, but really they ride on data. That data includes the torrents flowing in real time from vehicle-mounted lidar, radar and cameras, as well as the data that sketches out 3D maps of streets and roads, collected and aggregated by the likes of Here and Mobileye.

In health care, organizations such as Kaiser Permanente in the United States and the UK’s National Health Service are far from alone in looking to big data as an emerging pillar of effective, targeted care. With electronic health records, all sorts of information can feed into health care decisions. It ranges from demographic and employment information (a solar installer has a higher risk of melanoma than a coal miner, but a good deal less risk of black lung) all the way down to genomic and proteomic details that can steer patients to precision therapies shown to be effective only in people with specific mutations.

At Brain Bar Budapest in 2016, Alex Szalay explained how data is at the heart of the fourth major era in the history of science. Our scientific instruments produce hundreds of billions of data points: think of the Hubble Space Telescope, the Large Hadron Collider, high-speed gene sequencers and so on. Finding the hidden gems requires a new synthesis of statistics, computer science and historically separate fields such as astronomy, physics and medicine.

“The synthesis of statistics, computer science and basic sciences will become the fundamental language used by the next generation of scientists,” Szalay said.

There are risks. Privacy is the obvious one, but that’s just the start. As The Economist’s Kenneth Cukier has put it, “Big data and associated algorithms challenge white-collar knowledge workers in the 21st century in the same way that factory automation and the assembly line eroded blue-collar labor in the 19th and 20th centuries.”

Just as factory automation gave rise to new economic philosophies and political movements, Cukier continued, “It’s not much of an intellectual stretch to predict that new political philosophies and social movements will arise around big data, robots, computers and the Internet, and the effect of these technologies on the economy and representative democracy.”

There’s another, quieter risk: failing to harness the good that could come from all the data we’ve created and stored in so many places. There are already far too few Mona Chalabis to apply human intelligence and perspective to gargantuan piles of data in real time. And in autonomous vehicles, the human is by definition out of the loop.

Maybe it’s better that way. No doctor can single-handedly synthesize the combination of heredity, mutations, habits, life history and more that goes into, say, a data-driven lung cancer treatment plan. So Kaiser, the NHS and many others are working on clinical decision support systems that can calculate, for instance, a lung cancer risk score a doctor can share with a patient during a routine checkup. Think of it as Mona Chalabi in an algorithm. She can’t be everywhere, after all, as nice as that might be.
