With advances in machine learning, data mining, and computer architecture, data-scientific methods have become the center of attention in many emerging interdisciplinary fields. At first this may seem like a wild assertion by technocrats trying to force new technology onto problems that are already streamlined. After all, why fix something that isn't broken? But looking ahead, with data expected to grow manyfold, informatics is the need of the hour in any field that captures large amounts of data. In this article, I'll focus on one particular area: data science in astronomy.
Astronomical observations gather a lot of data
Typically, studies in astronomy, even before astrophysics existed as a discipline, involved physical observation of objects in the sky. Johannes Kepler, often acclaimed as the first astrophysicist, formulated his laws of planetary motion from Tycho Brahe's data by repeatedly observing the trajectories of the planets in our solar system. In the process, it occurred to him that these trajectories did not fit the geometric model of a circle, despite the then-common belief that all planets move in circular orbits.
Thinking about this almost four centuries later, I am forced to see this process of uncovering knowledge from nature as quite data-scientific. Kepler made observations, plotted them, and tried different models to explain the shape of the trajectories. The law of gravitation had not yet been formulated, so while we knew the trajectories of the planets were elliptical, there was no sound explanation of why the heliocentric model actually worked. Almost a century later, when Isaac Newton established the law of gravitation, there was an immediate reconciliation between an idea that was very natural and could be experienced around us, and a mathematical model established decades prior.
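Kepler's process can be sketched in modern, data-scientific terms: take points sampled from an orbit, fit competing geometric models by least squares, and compare residuals. The minimal sketch below uses synthetic data and textbook fitting procedures (a Kåsa circle fit and a linear focal-conic fit), not Brahe's measurements or Kepler's actual method:

```python
import numpy as np

# Synthetic "observations": points on an elliptical orbit with the Sun at
# one focus, r = p / (1 + e*cos(theta)).  The values of p and e are invented.
e_true, p_true = 0.2, 1.0
theta = np.linspace(0, 2 * np.pi, 200, endpoint=False)
r = p_true / (1 + e_true * np.cos(theta))
x, y = r * np.cos(theta), r * np.sin(theta)

# Model 1: best-fit circle (Kasa algebraic fit).
# x^2 + y^2 = 2*cx*x + 2*cy*y + c is linear in (cx, cy, c).
A = np.column_stack([2 * x, 2 * y, np.ones_like(x)])
cx, cy, c = np.linalg.lstsq(A, x**2 + y**2, rcond=None)[0]
radius = np.sqrt(c + cx**2 + cy**2)
circle_rms = np.sqrt(np.mean((np.hypot(x - cx, y - cy) - radius) ** 2))

# Model 2: focal conic.  With the focus at the origin, r = p - e*x,
# which is linear in (p, e), so least squares recovers both directly.
B = np.column_stack([np.ones_like(x), -x])
p_fit, e_fit = np.linalg.lstsq(B, r, rcond=None)[0]
conic_rms = np.sqrt(np.mean((B @ [p_fit, e_fit] - r) ** 2))

print(f"circle RMS residual: {circle_rms:.4f}")   # clearly nonzero
print(f"conic  RMS residual: {conic_rms:.2e}")    # near machine precision
print(f"recovered eccentricity: {e_fit:.3f}")
```

The circle leaves systematic residuals that no choice of center and radius can remove, while the focal-conic model fits essentially exactly; picking the model with the smaller residual is the same model-selection logic, in miniature.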
Today, needless to say, there are far more observatories and observational astronomers. The result is a huge accumulation of many kinds of data, and the volume is only expected to grow in the years to come.
In an era where we can see this surge coming, it would be wise to develop the mathematical, numerical, and computational methods to process it.
Today, we have the technology to acquire different kinds of data
Satellites and telescopes today can observe celestial objects across different parts of the spectrum: we capture data in the optical, ultraviolet, and infrared bands, and even at radio frequencies.
Terabytes to petabytes of data are collected across all of these bands (from surveys such as SDSS, AIS, GALEX, and others). In October 2018, the James Webb Space Telescope, an infrared telescope, is scheduled for launch; it is expected to let us view the universe as never before. To effectively collate all of this data into useful information and knowledge, we need data mining methods formulated in close contact with the physics.
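A basic example of collating data across surveys is cross-matching: deciding which detections in two catalogs are the same object on the sky. Real pipelines use spherical geometry and spatial indexes (e.g. astropy's catalog-matching utilities); the flat-sky, brute-force sketch below, with invented coordinates, just illustrates the idea:

```python
import numpy as np

# Toy catalogs: each row is (RA, Dec) in degrees.  All values are
# invented for the demonstration.
optical = np.array([[150.10, 2.20],
                    [150.45, 2.31],
                    [151.02, 1.98]])
infrared = np.array([[150.4502, 2.3099],
                     [151.0199, 1.9801],
                     [152.70, 3.10]])

tolerance_deg = 1.0 / 3600.0  # 1 arcsecond matching radius

matches = []
for i, (ra, dec) in enumerate(optical):
    # Nearest infrared source to this optical source (flat-sky distance).
    d = np.hypot(infrared[:, 0] - ra, infrared[:, 1] - dec)
    j = int(np.argmin(d))
    if d[j] <= tolerance_deg:
        matches.append((i, j))

print(matches)  # pairs of (optical index, infrared index) → [(1, 0), (2, 1)]
```

Two sources pair up within the tolerance; the third optical source and the third infrared source have no counterpart, which is itself useful information (a candidate for follow-up, or an artifact).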
And we also have the hardware to process all this data!
Distributed, cluster, and cloud computing have boomed over the past two decades! In addition, many engineers are constantly working to unlock the potential of GPU processing. The organizations behind these architectural advancements, such as the Apache Software Foundation (for Spark and Hadoop), NVIDIA (for GPUs), and AWS (for cloud platforms), to name a few, are working hard to improve the efficacy of their frameworks, so that they can be easily used for compute-intensive and big-data applications.
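Underneath frameworks like Spark is a simple pattern: partition the data, process each partition independently, then combine the partial results. A stdlib-only sketch of that map-reduce pattern follows, using a thread-backed pool for portability (a process pool, or a Spark RDD, would apply the same structure at scale); the magnitudes are invented stand-ins for a column of survey measurements:

```python
from multiprocessing.dummy import Pool  # thread-backed Pool from the stdlib

def partition_stats(chunk):
    # Stand-in for real per-partition work on a slice of survey data.
    return sum(chunk), len(chunk)

# Invented stellar magnitudes, split into 4 partitions.
magnitudes = [14.2, 15.8, 13.1, 16.4, 15.0, 14.7, 13.9, 15.5]
partitions = [magnitudes[i::4] for i in range(4)]

with Pool(4) as pool:
    partials = pool.map(partition_stats, partitions)   # the "map" step

total, count = (sum(vals) for vals in zip(*partials))  # the "reduce" step
print(f"mean magnitude: {total / count:.3f}")
```

The point is the shape of the computation, not the arithmetic: because each partition is processed independently, the same code scales from a few threads to a cluster by swapping the executor.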
In the process of discovering knowledge, there are five main phases which we need to go through:
- Recognizing a need for discovering knowledge
- Creating tools to capture the required data
- Using the developed tools to acquire the needed data
- Processing the data
- Effectively inferring ideas and facts from the processed data
In any field of study, this process is cyclic: we are in a constant endeavor to uncover facts about the world around us. In this light, we are at a crucial era of science, experiencing a paradigm shift toward tools that can greatly bolster the speed and scale of computation.
Data science can greatly bolster the fourth phase, processing the data, and this should be viewed as a boon to the field of astronomy. Greater computational infrastructure can enable astronomers to learn more about our universe through the development of responsible expert systems.
All the same, it is necessary to leave enough room for human intervention. Unlike the more common uses of ML today, in assistive chatbots, object detectors, maps, and so on, systems in astroinformatics should execute their methods in tandem with the human understanding of the world around us (and this holds for informatics in any field of natural science). Hence, the focus should be on developing systems that augment human ability rather than function independently, as human imagination and the power of inference may not be programmable.