Yandex

Since 1997, we have delivered world-class, locally relevant search and information services. Additionally, we have developed market-leading on-demand transportation services, navigation products, and other mobile applications for millions of consumers across the globe.

How Yandex Helps Astrophysicists Study Red Dwarf Flares

Yandex Cloud Editorial Team
Published in Yandex · 12 min read · Feb 26, 2025

Red dwarfs are the most common type of stars in the Milky Way. These dim, small stars are much cooler and less luminous than the Sun. Surprisingly, most planetary systems we’ve discovered orbit these small red stars.

Decades of research have provided astrophysicists with a vast amount of data on how red dwarfs behave. Modern sky surveys contain tens of billions of observations for hundreds of millions of these stars. Traditional data processing methods can no longer handle this volume, so scientists increasingly rely on machine learning to analyze it.

This year, we launched a joint project with experts from the Sternberg Astronomical Institute of Lomonosov Moscow State University (MSU), the Faculty of Space Research of MSU, and Carnegie Mellon University, who are part of the international SNAD team. The Yandex Cloud Technology Center for Society has leveraged the Yandex DataSphere cloud service to help scientists build a machine learning pipeline for detecting stellar flares, ultimately creating the largest dataset of red dwarf flares based on ground-based observations.

We'll explore how machine learning revolutionizes astrophysics and drives groundbreaking discoveries with researchers Anastasia Lavrukhina, Boris Demkov, Konstantin Malanchev, and Maria Pruzhinskaya.

What Makes Red Dwarfs So Important to Astrophysicists?

Astronomers classify all stars in the Universe into types ranging from large and hot to small and cool. In scientific terms, stars are categorized by their spectral and luminosity classes, where luminosity is the amount of energy they emit per unit of time. This is best visualized using the Hertzsprung–Russell diagram.

The main sequence, represented by a diagonal line on the diagram, consists of stars in their longest evolutionary stage: hydrogen burning. Red dwarfs occupy the region where the temperature is roughly 2700–3500 kelvins and the luminosity is between 1/10 and 1/10,000 that of the Sun.
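
The luminosity range quoted above is consistent with the Stefan–Boltzmann law, which in solar units reads L/L_sun = (R/R_sun)^2 (T/T_sun)^4. Here is a minimal sketch, assuming illustrative radii and temperatures that are not taken from the article:

```python
# Stefan-Boltzmann scaling in solar units: L/L_sun = (R/R_sun)^2 * (T/T_sun)^4.
T_SUN_K = 5772.0  # nominal solar effective temperature


def luminosity_solar(radius_solar: float, temperature_k: float) -> float:
    """Return a star's luminosity in units of the Sun's luminosity."""
    return radius_solar ** 2 * (temperature_k / T_SUN_K) ** 4


# Illustrative (assumed) red dwarf parameters, not measurements from the article.
for radius, temp in [(0.5, 3500.0), (0.2, 3200.0), (0.1, 2700.0)]:
    lum = luminosity_solar(radius, temp)
    print(f"R = {radius} R_sun, T = {temp:.0f} K -> L = {lum:.5f} L_sun")
```

All three assumed stars land between 1/10 and 1/10,000 of the Sun's luminosity.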

The Sun, classified as a yellow dwarf, is far from the largest or hottest star in the Milky Way. Red dwarfs, however, are the most common type of star that we know of. Although specific numbers vary between studies, it’s generally agreed that red dwarfs outnumber all other types of stars.

Red dwarfs are also known for their longevity: they burn their hydrogen fuel for tens of billions of years. These characteristics, among others, make them a fascinating subject of study from many perspectives.

Studying Fundamental Astrophysical Processes

Given their prevalence, these stars can provide valuable insights into common astrophysical processes that have piqued scientific interest. These include flares, which are characterized by sudden, dramatic increases in a star’s luminosity, accompanied by spectral changes.

The structure and rotation of red dwarfs create the perfect conditions for magnetic reconnection, where magnetic energy is rapidly converted into kinetic energy. This results in bright flares, detectable across a broad spectrum of electromagnetic radiation, from X-ray photons to visible light and radio waves.

Such phenomena help us better understand the magnetic activity of astronomical objects, including the Sun. Given the impact of solar flares on the Earth’s magnetic field, astrophysicists strive to predict their occurrence better. By studying red dwarfs, we can expand our sample of flare stars and gain a better understanding of how and where stellar flares occur.

A solar flare captured on August 31, 2012, by the Solar Dynamics Observatory. The image shows a solar flare and its associated prominence, captured in two ultraviolet wavelengths: 304 and 171 angstroms. Image source: Flickr.

Studying flares provides fundamental insights into plasma physics and helps us unravel the mysteries of stellar structure, evolution, and magnetic fields.

The Search for Extraterrestrial Life

Many of the exoplanets we’ve found — planets orbiting stars other than the Sun — are orbiting red dwarf stars.

This histogram shows the number of discovered planets in relation to stellar mass. The red line indicates the mass boundary for red dwarf stars. Any data point to the left of this line represents an exoplanet orbiting a red dwarf. Data source: NASA Exoplanet Archive.

The low luminosity of red dwarfs allows astronomers to discover and study exoplanets using all available methods, including direct imaging.

This image captured by the James Webb Space Telescope shows an exoplanet orbiting the star Epsilon Indi A. The star at the center of the image is blocked to reveal the exoplanet to the left. July 2024. Image source: NASA, ESA, CSA, STScI, Elisabeth Matthews (MPIA).

Planets near red dwarfs may be inhospitable to life due to intense flare activity. However, two factors make such exoplanets the most likely hosts of life outside the Solar System:

  • The long lifespans of red dwarfs.
  • The large number of planets orbiting within their habitable zones.

These conditions provide enough time for life to emerge and evolve.

The red dwarf Trappist-1 might have the most famous planetary system so far. There are as many as three planets in the habitable zone! Image source: NASA/JPL-Caltech.

Applied Research

Studying red dwarfs helps us create more accurate models for predicting flares on the Sun. This is a crucial practical problem for our technologically-dependent civilization. Solar flares pose a significant threat to communications satellites, navigation systems, space-based scientific instruments, and astronauts both in Earth orbit and on future deep space missions.

The Sun’s activity varies with a period of approximately 11 years. Solar physicists also identify longer cycles of about 100 and possibly 200 years, and radiocarbon analysis suggests a thousand-year cycle. Many climate theories treat solar activity cycles as a significant factor influencing the Earth’s climate, both now and in the past, so studying the cycles of other stars helps clarify their nature. Red dwarfs are the first candidates for such studies: the closest star to the Sun, Proxima Centauri, turned out to have a magnetic activity cycle of 442 days, similar in nature to the Sun’s 11-year cycle.

Red dwarfs, being relatively easy to observe, provide valuable opportunities for studying exoplanets. For instance, the transit method is widely used to search for exoplanets: when a planet passes in front of its star, it causes a detectable dimming of the red dwarf’s light.

The dimming shows up as a drop in the star’s flux, that is, the energy received on Earth per unit time per unit area of the detector.
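
To see why small stars make transits easier to spot, recall that the fractional drop in flux is approximately the square of the planet-to-star radius ratio. The sketch below uses that standard approximation (ignoring limb darkening and grazing transits) and assumes a red dwarf with 0.2 solar radii, a value chosen only for illustration:

```python
R_SUN_KM = 695_700.0   # nominal solar radius
R_EARTH_KM = 6_371.0   # mean Earth radius


def transit_depth(planet_radius_km: float, star_radius_km: float) -> float:
    """Fractional drop in flux while the planet is fully in front of the star."""
    return (planet_radius_km / star_radius_km) ** 2


sun_like = transit_depth(R_EARTH_KM, R_SUN_KM)
red_dwarf = transit_depth(R_EARTH_KM, 0.2 * R_SUN_KM)  # assumed 0.2 R_sun star
print(f"Sun-like star: {sun_like:.2e}, red dwarf: {red_dwarf:.2e}")
```

Under these assumptions, an Earth-sized planet produces a dip 25 times deeper in front of the red dwarf than in front of a Sun-like star.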

Data Astronomers Collect and How They Analyze It

Light curves are one type of data that astronomers analyze. Light curves are obtained by observing the changes in brightness of celestial objects over time. Astronomers use telescopes equipped with sensitive cameras to measure the amount of radiation reaching Earth from the object.

Astronomers can construct detailed light curves by analyzing numerous measurements, essentially creating a “heartbeat” monitor for celestial bodies. However, unlike medical professionals, astronomers don’t always make observations under consistent conditions at fixed time intervals, which makes the data analysis more complex. Light curves help astronomers understand how and why a star’s luminosity changes and provide insight into processes occurring at or near the star’s surface, such as stellar flares, eclipses, and pulsations.
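
As a rough illustration of what unevenly sampled data looks like, the sketch below stores a light curve as parallel lists of times and fluxes and flags points that sit far above the median. The observations are made up, and the median-plus-MAD threshold is a naive baseline chosen for this example, not the method used in the project:

```python
from statistics import median


def robust_outliers(times, fluxes, n_sigma=5.0):
    """Return (time, flux) pairs more than n_sigma robust deviations above the median."""
    baseline = median(fluxes)
    mad = median(abs(f - baseline) for f in fluxes)
    sigma = 1.4826 * mad  # MAD-to-standard-deviation factor for Gaussian noise
    return [(t, f) for t, f in zip(times, fluxes) if f - baseline > n_sigma * sigma]


# Made-up, unevenly spaced observations (days, relative flux) with a brightening near t = 3.0.
times = [0.00, 0.02, 1.01, 1.03, 2.98, 3.00, 3.01, 3.02, 5.40, 5.41, 8.97, 9.00]
fluxes = [1.00, 1.01, 0.99, 1.00, 1.01, 1.80, 1.45, 1.20, 1.00, 0.99, 1.01, 1.00]

gaps = [b - a for a, b in zip(times, times[1:])]
print(f"shortest gap: {min(gaps):.2f} d, longest gap: {max(gaps):.2f} d")
print(robust_outliers(times, fluxes))
```

The gaps between consecutive points range from minutes to days, which is why methods that assume a fixed cadence can't be applied directly.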

Light curve of a red dwarf flare

Modern astronomy deals with massive amounts of data from ground- and space-based telescopes that observe billions of stars and galaxies. Processing and analyzing this data requires powerful computers and sophisticated algorithms to identify patterns and make groundbreaking discoveries.

For example, the Transiting Exoplanet Survey Satellite (TESS) has amassed a data archive of approximately 250 TB, which is the equivalent of streaming 167,000 full HD movies.

Ground-based telescopes are still crucial for observation. This is because the size of a space telescope mirror is limited by the weight capacity of launch vehicles and the need for a rigid, thermally stable structure. Ground-based telescopes don’t have these limitations and can be equipped with larger mirrors, allowing astronomers to see farther into space.

Wide-field surveys play a vital role. Unlike traditional telescopes, which are pointed to observe specific objects, survey telescopes have wide fields of view and regularly scan the sky to collect data on multiple objects at once. This observation method makes it possible to detect rare events (such as supernovae) and observe multiple variable stars.

The Zwicky Transient Facility (ZTF) is a wide-field astronomical survey at Palomar Observatory in California. It’s equipped with a wide-field camera and can survey the entire northern sky every three nights. ZTF collects terabytes of data each night, capturing hundreds of thousands of objects and their changes. With this telescope, astronomers have discovered thousands of supernovae, numerous asteroids and comets, and many other transients — phenomena that last for a limited amount of time.

Astronomy has entered the era of big data, with modern telescopes producing vast amounts of information. As a result, astronomers now employ machine learning methods to classify objects, such as variable stars and galaxies, and to detect rare astronomical events or anomalies. For example, Isolation Forest, a machine learning model designed for anomaly detection, is often used to identify stellar flares.
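
The idea behind Isolation Forest is that anomalies are easier to separate from the rest of the data with random splits, so they end up closer to the root of a random tree. Below is a from-scratch sketch of that idea using only the standard library. It is not the SNAD team's code or any library's implementation, and the synthetic 2D points merely stand in for features extracted from light curves:

```python
import math
import random


def c_factor(n: int) -> float:
    """Average path length of an unsuccessful binary search tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + 0.5772156649) - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Recursively split points on a random feature at a random threshold."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:
        return {"size": len(points)}
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return {
        "dim": dim,
        "split": split,
        "left": build_tree(left, depth + 1, max_depth, rng),
        "right": build_tree(right, depth + 1, max_depth, rng),
    }


def path_length(point, node, depth=0):
    """Depth at which the point lands, plus a correction for leaves left unsplit."""
    if "size" in node:
        return depth + c_factor(node["size"])
    child = node["left"] if point[node["dim"]] < node["split"] else node["right"]
    return path_length(point, child, depth + 1)


def anomaly_scores(points, n_trees=100, sample_size=256, seed=0):
    """Score each point in (0, 1); values closer to 1 are more anomalous."""
    rng = random.Random(seed)
    sample_size = min(sample_size, len(points))
    max_depth = math.ceil(math.log2(sample_size))
    trees = [build_tree(rng.sample(points, sample_size), 0, max_depth, rng)
             for _ in range(n_trees)]
    norm = c_factor(sample_size)
    return [2.0 ** (-sum(path_length(p, t) for t in trees) / (n_trees * norm))
            for p in points]


# Synthetic data: 200 "ordinary" points plus one obvious outlier.
data_rng = random.Random(42)
points = [(data_rng.gauss(0.0, 1.0), data_rng.gauss(0.0, 1.0)) for _ in range(200)]
points.append((8.0, 8.0))
scores = anomaly_scores(points)
print(f"outlier score: {scores[-1]:.2f}")
```

In this toy dataset, the point at (8, 8) is isolated after only a few splits and therefore receives the highest score.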

How the SNAD Team Searched for Flares

In their new research, the scientists set out to find at least 1000 flares from red dwarfs in the ZTF survey. Manually reviewing 100 million pre-selected light curves to identify flares was simply not feasible, so the scientists turned to ML methods.

As is often the case with fundamental research, there was very little data for training ML models. The ZTF data included only 134 known flares found in the team’s previous study. That wasn’t enough to train a binary classifier, so the scientists decided to simulate light curves for flares as a positive class.

The TESS space telescope is currently active in Earth orbit, and although its primary mission is to search for exoplanets, it also collects light curves from nearby stars to help detect red dwarf flares. The SNAD researchers used those light curves to simulate flares on ZTF.

It was critical for us that TESS could detect flares of different shapes, and our classifier was able to classify most of these flare shapes accurately. We modeled the signal errors typically produced by the ZTF equipment into the simulation.

Anastasia Lavrukhina, Sternberg Astronomical Institute of MSU
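
The simulation idea can be sketched roughly as follows: take a flare shape, sample it at irregular times, and add noise. In the project, the flare shapes came from TESS light curves and the noise model from ZTF. The toy version below instead assumes an analytic profile (linear rise, exponential decay) and Gaussian noise, and every parameter value is hypothetical:

```python
import math
import random


def flare_profile(t_min, peak_time, amplitude, rise_min=2.0, decay_min=8.0):
    """Relative flux added by a flare at time t_min (all times in minutes)."""
    if t_min < peak_time - rise_min:
        return 0.0
    if t_min < peak_time:
        return amplitude * (t_min - (peak_time - rise_min)) / rise_min  # linear rise
    return amplitude * math.exp(-(t_min - peak_time) / decay_min)  # exponential decay


def simulate_light_curve(n_obs=60, span_min=120.0, noise=0.02, amplitude=0.0, seed=0):
    """Return (times, fluxes) for a quiescent star, optionally with one injected flare."""
    rng = random.Random(seed)
    times = sorted(rng.uniform(0.0, span_min) for _ in range(n_obs))  # irregular sampling
    peak_time = span_min / 3.0
    fluxes = [1.0 + flare_profile(t, peak_time, amplitude) + rng.gauss(0.0, noise)
              for t in times]
    return times, fluxes


flare_times, flare_fluxes = simulate_light_curve(amplitude=1.5, seed=1)  # positive class
quiet_times, quiet_fluxes = simulate_light_curve(amplitude=0.0, seed=1)  # negative class
```

Because both calls share a seed, the two curves have identical sampling and noise and differ only by the injected flare, which makes it easy to check what the simulation adds.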

To avoid class imbalance, the training sample was split roughly evenly between simulated flare light curves and non-flare light curves. The training dataset included about one million records.

Because the candidates would then be verified by experts, the researchers first optimized precision-related metrics.

Only about 6% of the flare candidates predicted by the first trained classifier turned out to be actual flares. The team wasn’t happy with this conversion rate because it would take approximately 1000 person-hours to find 1000 flares. Some analysis brought to light several reasons for the low conversion rate.
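
The person-hour estimate can be checked with simple arithmetic. The review time per candidate isn't stated in the article, so the sketch below derives the value implied by the two quoted figures instead of assuming one:

```python
def candidates_needed(target_flares: int, conversion_rate: float) -> float:
    """Expected number of candidates to inspect to confirm target_flares real flares."""
    return target_flares / conversion_rate


needed = candidates_needed(1000, 0.06)
minutes_per_candidate = 1000 * 60 / needed  # implied by "1000 person-hours"
print(f"{needed:.0f} candidates, about {minutes_per_candidate:.1f} minutes each")
```

At a 6% conversion rate, finding 1000 flares means inspecting roughly 16,700 candidates, so the quoted effort corresponds to a few minutes of expert attention per candidate.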

Asteroids

When an asteroid within the Solar System passes near a star’s projection on the celestial sphere or when an asteroid’s orbit crosses the sight line to the observed star, the light reflected from the asteroid is added to the light coming from the star. Such an event looks like a flare on the light curve.
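
To get a sense of the size of this effect, note that the fluxes of two unresolved sources add, while magnitudes are logarithmic. The sketch below applies the standard magnitude-flux relation to made-up brightness values:

```python
import math


def combined_magnitude(mag_a: float, mag_b: float) -> float:
    """Magnitude of two unresolved sources whose fluxes add."""
    total_flux = 10 ** (-0.4 * mag_a) + 10 ** (-0.4 * mag_b)
    return -2.5 * math.log10(total_flux)


star_mag = 18.0      # assumed red dwarf brightness
asteroid_mag = 18.5  # assumed asteroid brightness
blended = combined_magnitude(star_mag, asteroid_mag)
print(f"blended magnitude: {blended:.2f} (brightening of {star_mag - blended:.2f} mag)")
```

With these assumed values, the star appears about half a magnitude brighter for as long as the asteroid stays within the same aperture, which on a sparsely sampled light curve can resemble a flare.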

About 15% of the 51,000 candidates produced by the first classifier were asteroids. Regrettably, there were no previously unknown asteroids among them. However, the researchers believe that some undiscovered asteroids must be in the ZTF data, so this data and the created classifier might help people who wish to discover one.

Occultation of a star by a Solar System asteroid

Periodic Changes in Stellar Luminosity

The energy emitted by a star isn’t constant. These variations, sometimes periodic and intense, can mimic the appearance of a stellar flare.

This is a variable star. The classifier marks certain parts of the light curve as a flare.

Artifacts Introduced by the Receiving Equipment and the Telescope

Complex scientific instruments working for a long time periodically glitch. We found that about 20% of the images were taken with the telescope defocused, and another 16% had artifacts introduced by a CCD sensor.

It may look like there are three flares in the graph, but analysis of the source image revealed an artifact caused by a CCD sensor.

Other Reasons

Satellite passes, cosmic rays, and clouds can also produce false flare candidates.

To increase the classifier’s conversion rate, the team’s scientists removed the datasets related to known asteroids and added another classifier (a logistic regression model).

The second classifier was trained on light curve metadata, which couldn’t be used as input for the first classifier due to technical difficulties. The resulting post-filter increased the conversion rate for the desired signal to 40%.
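
A post-filter of this kind can be sketched as a small logistic regression that takes per-candidate metadata and returns the probability that the candidate is a real flare. The article doesn't list the actual metadata fields, so the two features below (an image sharpness measure and the first-stage score) and all values are hypothetical:

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(rows, labels, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(rows, labels):
            err = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def flare_probability(x, weights, bias):
    """Probability that a candidate with metadata x is a real flare."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


# Hypothetical metadata per candidate: (image sharpness 0..1, first-stage score 0..1).
rows = [(0.9, 0.8), (0.8, 0.9), (0.85, 0.7), (0.2, 0.9), (0.3, 0.8), (0.1, 0.6)]
labels = [1, 1, 1, 0, 0, 0]  # 1 = confirmed flare, 0 = false positive

weights, bias = fit_logistic(rows, labels)
kept = [x for x in rows if flare_probability(x, weights, bias) >= 0.5]
print(kept)
```

In this toy example, the filter learns to reject candidates from blurry images even when their first-stage score is high, which is the kind of signal a light-curve-only classifier can't see.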

We didn’t use transformers in this project, and neural networks were used only in the data preprocessing stage for machine learning.

Everything was solved with classic ML algorithms: Random Forest, CatBoost, and logistic regression as a post-filter.

Boris Demkov, Independent Researcher

The implemented classifier is trained on astrophysical and ROCKET/PCA features:

  • Astrophysical features are calculated using analytical formulas based on light curve data.
  • ROCKET applies convolution to generate 10,000 features from the time series, and then the researchers use PCA to reduce the number of features down to 47.
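
The ROCKET part of the feature list can be sketched as follows: each random kernel is convolved with the time series, and two numbers are kept per kernel, the maximum output and the proportion of positive outputs. This simplified version omits the random padding of the original ROCKET method and the subsequent PCA step. It also assumes the light curve has already been resampled to a regular grid, which the article doesn't specify:

```python
import math
import random


def random_kernel(series_len, rng):
    """Draw one ROCKET-style kernel: mean-centered weights, bias, and dilation."""
    length = rng.choice([7, 9, 11])
    weights = [rng.gauss(0.0, 1.0) for _ in range(length)]
    mean = sum(weights) / length
    weights = [w - mean for w in weights]
    bias = rng.uniform(-1.0, 1.0)
    max_exp = math.log2((series_len - 1) / (length - 1))
    dilation = int(2 ** rng.uniform(0.0, max_exp))
    return weights, bias, dilation


def kernel_features(series, kernel):
    """Return (max, proportion of positive values) of the dilated convolution output."""
    weights, bias, dilation = kernel
    span = (len(weights) - 1) * dilation
    outputs = [
        bias + sum(w * series[start + j * dilation] for j, w in enumerate(weights))
        for start in range(len(series) - span)
    ]
    ppv = sum(1 for v in outputs if v > 0) / len(outputs)
    return max(outputs), ppv


def rocket_transform(series, n_kernels=100, seed=0):
    """Concatenate two features per kernel into one flat feature vector."""
    rng = random.Random(seed)
    features = []
    for _ in range(n_kernels):
        features.extend(kernel_features(series, random_kernel(len(series), rng)))
    return features


# Toy regularly sampled series with a flare-like bump starting at index 20.
series = [1.0 + (1.5 * math.exp(-(i - 20) / 5.0) if i >= 20 else 0.0) for i in range(64)]
features = rocket_transform(series)
print(len(features))
```

Because the kernels are random and never trained, the transform is cheap to compute. The downstream classifier decides which of the resulting features matter.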

The final data pipeline required significant computing resources, so the researchers decided to migrate part of the workload to the cloud. The Yandex Cloud Technology Center for Society provided these resources for the project.

This center was created to implement initiatives of social importance using Yandex services and solutions. In such projects, the cloud platform takes on the role of a technology partner, assessing the implementation potential, defining the project’s IT architecture, and providing free access to technology and expert advice.

The general structure of the pipeline looked like this:

General Project Pipeline Structure with Yandex Cloud

Experts from the Technology Center for Society decided to migrate everything related to machine learning and feature extraction to Yandex Cloud, as these are computationally intensive tasks.

  1. All data is stored in the SNAD team’s S3 storage and passed through an endpoint to Yandex DataSphere.
  2. Positive and negative class simulations are extracted from S3, where the negative class represents data without significant variation (no flares). Preprocessing then starts in the cloud using Jupyter notebooks, and a training dataset is formed within Yandex DataSphere.
  3. To make predictions on real data, the light curves that show variation are selected. These curves are also fed from S3 into the trained model.
  4. At the post-filtering stage, HTTP requests are sent to retrieve metadata from several sources: the ZTF services, SNAD, IMCCE, and other astronomical resources. This is where the logistic regression model is used.

The researchers examined a little more than 2000 candidates and got a sample of 1196 flares from the ZTF data.

Below are some of those flares.

On the left is the light curve of a flare on a red dwarf. Each point is a ZTF observation. When a red dwarf is quiescent, its brightness doesn’t change, as shown by the flat part of the light curve. Flares manifest as rapid increases in brightness, followed by an exponential decay back to the original state within about half an hour.

Flares can have a standard shape with a single peak or look more complex. One possible explanation of complexity is that multiple flares overlapped. On the right are images from the ZTF telescope associated with one of the observations.

Maria Pruzhinskaya, The Sternberg Astronomical Institute of MSU

This flare dataset enabled researchers to analyze the distribution of flare frequencies among M-dwarf stars in our Galactic neighborhood. Using the LSDB tool for massive astronomical catalog analysis, they confirmed a hypothesis that the farther an M-dwarf is located from the Milky Way’s equatorial plane, the less flare activity it exhibits.

Since “higher” M-dwarfs are likely older, this finding also supports the theory that younger stars flare more frequently. These insights are crucial for understanding stellar evolution and its impact on planetary habitability.
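
The "height" in question can be approximated from a star's distance and Galactic latitude as z = d · sin(b). The sketch below applies this relation to made-up stars. It neglects the Sun's own small offset from the plane and doesn't use the LSDB API:

```python
import math


def height_above_plane_pc(distance_pc: float, galactic_latitude_deg: float) -> float:
    """Approximate vertical distance from the Galactic plane, in parsecs."""
    return distance_pc * math.sin(math.radians(galactic_latitude_deg))


# Made-up example stars: (distance in parsecs, Galactic latitude in degrees).
for distance, latitude in [(100.0, 5.0), (100.0, 60.0), (300.0, -30.0)]:
    z = height_above_plane_pc(distance, latitude)
    print(f"d = {distance:.0f} pc, b = {latitude:+.0f} deg -> |z| = {abs(z):.0f} pc")
```

Binning stars by |z| and comparing the fraction that flare in each bin is one simple way to look for the trend described above.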

Results

In addition to reaching the primary goal of creating the largest catalog of flaring red dwarf stars based on ZTF data, the researchers were also able to achieve several other goals using our platform:

  • Demonstrate that data from space telescopes can be used to generate a training dataset for ground-based telescopes.
  • Try different approaches to the training process, test several ML models, and select the method and model that best fits the task.
  • Build a dataset that can be used to train other efficient models.
  • Find a rare variable star.

Out of 100 million candidates, we selected only the 0.002% with the best scores. We were able to verify those 2000 objects manually, and more than half of them turned out to be the stellar flares we had been searching for. Imagine how many other exciting objects and discoveries we are yet to uncover in this massive dataset!

Konstantin Malanchev, Carnegie Mellon University
