Why data-driven science is more than just a buzzword
Forget looking through a telescope at the stars. An astronomer today is more likely to be online: digitally scheduling observations, running them remotely on a telescope in the desert, and downloading the results for analysis.
For many astronomers the first step in doing science is exploring this data computationally. It may sound like a buzzword, but data-driven science is part of a profound shift in fields like astronomy.
A 2015 report by the Australian Academy of Science found that among more than 500 professional astronomers in Australia, around one quarter of their research effort was now computational in nature. Yet many high school and university science, technology and engineering subjects still treat the necessary skills as second-class citizens.
Computation here means both modelling the world through simulations and exploring observational data. It is central not only to astronomy but to a range of sciences, including bioinformatics, computational linguistics and particle physics.
To prepare the next generation, we must develop new teaching methods that recognise data-driven and computational approaches as some of the primary tools of contemporary research.
The era of big data in science
The great empiricists of the 17th century believed that if we used our senses to collect as much data as possible, we would ultimately understand our world.
Although empirical science has a long history, there are some key differences between a traditional approach and the data-driven science we do today.
The change that has perhaps had the most impact is the sheer amount of data that computers can now collect. This has enabled a change in philosophy: data can be gathered to serve many projects rather than just one, and the way we explore and mine data allows us to “plan for serendipity”.
Take the search for new types of astronomical phenomena. Large data sets can yield unexpected results: some modern examples are the discovery of fast radio bursts by astronomer Duncan Lorimer and the discovery of plasma tubes in the Earth’s ionosphere by a former undergraduate student of mine, Cleo Loi. Both of these depended on mining of archival data sets that had been designed for a different purpose.
Many scientists now work collaboratively to design experiments that can serve many projects at once and test different hypotheses. For example, the book outlining the science case for the future Square Kilometre Array Telescope, to be built in South Africa and Australia, has 135 chapters contributed by 1,200 authors.
Our education system needs to change, too
Classic images of science include Albert Einstein writing down the equations of relativity, or Marie Curie discovering radium in her laboratory.
Our understanding of how science works is often formed in high school, where we learn about theory and experiment. We picture these twin pillars working together, with experimental scientists testing theories, and theorists developing new ways to explain empirical results.
Computation, however, is rarely mentioned, and so many key skills are left undeveloped.
To design unbiased experiments and select robust samples, for example, scientists need excellent statistical skills. But often this part of maths takes a back seat in university degrees. To ensure our data-driven experiments and explorations are rigorous, scientists need to know more than just high school statistics.
In fact, to solve problems in this era, scientists also need to develop computational thinking. It’s not just about coding, although that’s a good start. They need to think creatively about algorithms, and how to manage and mine data using sophisticated techniques such as machine learning.
Applying naive algorithms to massive data sets often doesn’t work, even when you have the power of 10,000-core supercomputers. Comparing every object in one catalogue with every object in another, for example, takes time proportional to the product of the two catalogue sizes. Switching to more sophisticated techniques from computer science, such as the kd-tree algorithm for matching astronomical objects, can speed up software by orders of magnitude.
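To make the kd-tree idea concrete, here is a minimal sketch in Python of matching objects between two catalogues. It rests on some loud assumptions: positions are treated as flat (x, y) pairs rather than true coordinates on the sky, and the function names (build_kdtree, nearest, crossmatch) are illustrative only and do not come from any astronomy library. The brute-force function shows the naive approach for comparison. In practice you would more likely reach for a tested library implementation.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two (x, y) points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def build_kdtree(points, depth=0):
    """Recursively build a 2-D kd-tree. Each node is (point, left, right)."""
    if not points:
        return None
    axis = depth % 2  # alternate between splitting on x and on y
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return (
        points[mid],
        build_kdtree(points[:mid], depth + 1),
        build_kdtree(points[mid + 1:], depth + 1),
    )


def nearest(node, target, depth=0, best=None):
    """Return the point stored in the tree that lies closest to target."""
    if node is None:
        return best
    point, left, right = node
    if best is None or sq_dist(point, target) < sq_dist(best, target):
        best = point
    axis = depth % 2
    diff = target[axis] - point[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    best = nearest(near, target, depth + 1, best)
    # Visit the far side only if the splitting line is closer than the
    # best match so far. Skipping it is where the speed-up comes from.
    if diff ** 2 < sq_dist(best, target):
        best = nearest(far, target, depth + 1, best)
    return best


def brute_force_nearest(points, target):
    """Naive approach: compare the target against every point."""
    return min(points, key=lambda p: sq_dist(p, target))


def crossmatch(catalogue_a, catalogue_b, max_sep):
    """Pair each object in A with its nearest neighbour in B,
    keeping only pairs separated by at most max_sep."""
    tree = build_kdtree(catalogue_b)
    matches = []
    for src in catalogue_a:
        candidate = nearest(tree, src)
        if candidate is not None and sq_dist(src, candidate) <= max_sep ** 2:
            matches.append((src, candidate))
    return matches


if __name__ == "__main__":
    rng = random.Random(42)  # made-up positions, purely for illustration
    cat_b = [(rng.random(), rng.random()) for _ in range(2000)]
    cat_a = [(rng.random(), rng.random()) for _ in range(200)]
    print(len(crossmatch(cat_a, cat_b, max_sep=0.01)), "matches found")
```

The brute-force search looks at every object in the second catalogue for each object in the first. The tree search can discard large regions at each step, so for reasonably spread-out data each lookup typically takes time that grows with the logarithm of the catalogue size rather than with the size itself.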
Some steps are being taken in the right direction. Many universities are introducing courses and degrees in data science, incorporating statistics and computer science combined with science or business. For example, I recently launched an online course on data-driven astronomy, which aims to teach skills like data management and machine learning in the context of astronomy.
In schools the new Australian Curriculum in Digital Technologies makes coding and computational thinking part of the syllabus from Year 2. This will develop vital skills, but the next step is to integrate modern approaches directly into science classrooms.
Computation has been an important part of science for more than half a century, and the data explosion is making it even more central. By teaching computational thinking as part of science, we can ensure our students are prepared to make the next round of great discoveries.