Deep Learning & Synthetic Data for Asteroid Landings

Using deep learning and synthetic images to find safe landing areas on asteroids.

Taylor Jensen
8 min read · Sep 13, 2022
Bennu Asteroid Global Mosaic

Introduction

The goal of this modeling process is to find rock-free areas of the surface of the asteroid Bennu using only synthetic images of a lunar surface for training. The use of these models could have saved over 18,000 hours of manual image review in the OSIRIS-Rex NASA mission.

Read on if you want to find out how my custom deep learning model outperforms the industry-standard U-Net model on this semantic segmentation task, with less than 5% of the trainable parameters.

If you are curious about the potential scale and impact of this approach, see my previous article about the in-depth value of using these approaches.

The Data & Preprocessing

There are two datasets used for this model creation: Romain Pessia’s and Genya Ishigami’s Artificial Lunar Landscape Dataset (ALLD) and the Bennu Global Mosaic (BGM).

The ALLD is a dataset of synthetically created images of lunar surfaces with pixel-by-pixel labels for boulder, rock, sky, and background classes. This dataset is used for the initial training and testing of the algorithms. There are 9,766 synthetic images of 480x720 resolution in the dataset. Each image has an associated prediction mask, which makes semantic segmentation a natural fit for this modeling problem.

To match the problem I am solving, I've simplified the classes down to a binary label: "rock" and "not rock". I did this because we don't want our spacecraft hitting any rock, regardless of its shape or size. In addition to this simplification, I also preprocess each image by passing it through Sobel edge detection, for later use in the custom models (a sketch of both steps follows the sample images below).

Sample of ALLD with original visual (left) and mask (right) image data.
Sample of preprocessed ALLD with binary class mask (left) and Sobel edge detection (right).
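
The article doesn't include the preprocessing code, but a minimal sketch of the two steps, assuming OpenCV and NumPy, might look like the following. The mask channel test is a placeholder, since the exact ALLD class color coding isn't reproduced here.

```python
import cv2
import numpy as np

def preprocess(image_bgr, mask_rgb):
    """Collapse the ALLD multi-class mask to binary 'rock' vs 'not rock'
    and compute a Sobel edge map of the visual image."""
    # Placeholder encoding: assumes rock/boulder classes appear in specific
    # mask channels; check the actual ALLD color coding before using this.
    rock = (mask_rgb[..., 0] > 0) | (mask_rgb[..., 2] > 0)
    binary_mask = rock.astype(np.float32)

    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    # Gradient magnitude from horizontal and vertical Sobel filters
    gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0, ksize=3)
    gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1, ksize=3)
    edges = cv2.magnitude(gx, gy)
    edges /= edges.max() + 1e-8  # normalize to [0, 1]
    return binary_mask, edges
```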

The BGM is a high-resolution real image of the entire surface of the asteroid, with each pixel covering 5 cm of surface and a total size of 15708x7854 pixels. The BGM is used to test algorithm performance on actual NASA data. I break this image down into individual tiles of the same size as the ALLD images (480x720) for easy processing after model training (a tiling sketch follows the samples below).

Bennu OSIRIS-REx OCAMS Global PAN Mosaic 5cm v1 | USGS Astrogeology Science Center
Zoomed-In 480x720 Sample of the Global Bennu Mosaic
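
A minimal sketch of the tiling step, assuming the mosaic is already loaded as a NumPy array (the function name and edge handling here are my own choices):

```python
import numpy as np

TILE_H, TILE_W = 480, 720  # match the ALLD image size

def tile_mosaic(mosaic: np.ndarray):
    """Split the 15708x7854-pixel Bennu mosaic into 480x720 tiles.
    Edge remainders that don't divide evenly are dropped in this sketch;
    padding or overlapping windows are reasonable alternatives."""
    h, w = mosaic.shape[:2]
    tiles = []
    for y in range(0, h - TILE_H + 1, TILE_H):
        for x in range(0, w - TILE_W + 1, TILE_W):
            tiles.append(mosaic[y:y + TILE_H, x:x + TILE_W])
    return tiles
```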

Evaluation Structure

Model Scoring

To appropriately evaluate each of the semantic segmentation models, we need to have consistent metrics. Each model produces precision and recall metrics for each class across all pixels and images. I use the precision and recall metrics to generate an F1 Score for each model. The F1 Score is the harmonic mean between precision and recall.
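
Concretely, with precision $P$ and recall $R$:

$$F_1 = \frac{2 \cdot P \cdot R}{P + R}$$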

The ALLD dataset is split into 70% training and 30% testing sets. The entire surface of the BGM is reserved as the final validation set. Once model training is complete, the two models with the best F1 scores on the validation set move on to a final evaluation: predicting rocks at the primary landing site, "Nightingale," on the surface of Bennu. As I do not have access to segmentation labels for the real images, this comparison is based on how well each model identifies the mission's two landing areas.

High-Level Model Design

Weighted binary cross-entropy is used as the loss function for all the neural networks. This loss function lets the model focus on predicting one class over the other. In the ALLD training set, negative ("not rock") pixels outnumber positive ("rock") pixels by roughly 10 to 1 on average.

The cost of predicting a flat surface where there is actually a rock (a false negative) is much higher than predicting a rock where the surface is actually flat (a false positive), so the best model balances high precision with high recall. To account for this, the cost of missing a rock is weighted ten times higher than the cost of falsely flagging an empty area.

Each model uses an Adam optimizer with a learning rate of 0.0001 on 480x720 pixel images, and each network uses a sigmoid activation function to produce the final per-pixel class probability.
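
The article doesn't name a framework, but assuming TensorFlow/Keras, the loss and training setup might be sketched as follows; `model` stands in for either of the architectures described in the next section.

```python
import tensorflow as tf

def weighted_bce(pos_weight=10.0):
    """Binary cross-entropy that penalizes a missed rock (false negative)
    pos_weight times more than a false alarm on empty ground."""
    def loss(y_true, y_pred):
        y_pred = tf.clip_by_value(y_pred, 1e-7, 1.0 - 1e-7)
        per_pixel = -(pos_weight * y_true * tf.math.log(y_pred)
                      + (1.0 - y_true) * tf.math.log(1.0 - y_pred))
        return tf.reduce_mean(per_pixel)
    return loss

# `model` is assumed to be one of the Keras models sketched below
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
    loss=weighted_bce(10.0),
    metrics=[tf.keras.metrics.Precision(), tf.keras.metrics.Recall()],
)
```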

Model Architectures

To save some time, we will look at just the two best models: the standard U-Net and my custom Y-Model.

U-Net

The U-Net architecture was originally developed in 2015 for biomedical image segmentation. The architecture's design and rigorous testing in other research make it an excellent baseline for comparison.

The U-Net architecture uses two "paths," called the contracting and expansive paths. Together they deconstruct and reconstruct an image in a way that produces the desired predictions while retaining as much of the original image as possible. U-Net has just over 31 million trainable parameters, making it a relatively complex model compared to the Y-Model. A visual of the U-Net architecture is below.

U-Net architecture as designed by Olaf Ronneberger, Philipp Fischer, and Thomas Brox
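
Below is a minimal Keras approximation of the contracting/expansive design, assuming "same" padding so the 480x720 input resolution is preserved (the original paper used unpadded convolutions, so this sketch is not an exact reproduction):

```python
import tensorflow as tf
from tensorflow.keras import layers

def conv_block(x, filters):
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    return layers.Conv2D(filters, 3, padding="same", activation="relu")(x)

def build_unet(input_shape=(480, 720, 3)):
    inputs = layers.Input(input_shape)
    skips, x = [], inputs
    # Contracting path: halve resolution, double filters at each level
    for f in (64, 128, 256, 512):
        x = conv_block(x, f)
        skips.append(x)
        x = layers.MaxPooling2D(2)(x)
    x = conv_block(x, 1024)  # bottleneck
    # Expansive path: upsample and concatenate the matching skip connection
    for f, skip in zip((512, 256, 128, 64), reversed(skips)):
        x = layers.Conv2DTranspose(f, 2, strides=2, padding="same")(x)
        x = layers.Concatenate()([x, skip])
        x = conv_block(x, f)
    # Per-pixel rock probability via sigmoid
    outputs = layers.Conv2D(1, 1, activation="sigmoid")(x)
    return tf.keras.Model(inputs, outputs)
```

With filter widths running from 64 up to 1024, this configuration lands on the order of the 31 million trainable parameters mentioned above.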

Y-Model

The Y-Model architecture is shaped like a horizontal letter "Y." It attempts to capture useful predictive information from two versions of the input image: the raw image and the same image with Sobel edge detection applied. The model contains about 1.4 million parameters, roughly 4.5% of U-Net's trainable parameter count.

What is distinct about this model is its dedicated branch for ingesting a preprocessed image, in this case one with Sobel edge detection applied. The idea behind pairing Sobel edge detection with the original image is that I can hand the model information I already know is important about the objects being predicted.

This means the model doesn't have to learn on its own that edges matter. I know that for something to be identified as a rock, it usually needs a solid edge. By supplying this knowledge directly, I narrow the feature space the model has to explore.

Custom Y-Model Architecture with a Raw Image Branch (for ingestion of original images) and Processed Image Branch (for ingestion of preprocessed images).
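
The exact Y-Model layers aren't reproduced in this article, so the skeleton below is purely illustrative of the two-branch idea rather than the published architecture (its parameter count will not match the reported 1.4 million). Both inputs are assumed to be single-channel 480x720 images:

```python
import tensorflow as tf
from tensorflow.keras import layers

def branch(x, filters=(16, 32, 64)):
    """Small convolutional encoder; called once per branch, so each
    branch gets its own weights."""
    for f in filters:
        x = layers.Conv2D(f, 3, padding="same", activation="relu")(x)
        x = layers.MaxPooling2D(2)(x)
    return x

def build_y_model(input_shape=(480, 720, 1)):
    raw = layers.Input(input_shape, name="raw_image")
    edges = layers.Input(input_shape, name="sobel_image")
    # The fork of the "Y": merge the raw and edge-detected branches
    merged = layers.Concatenate()([branch(raw), branch(edges)])
    x = layers.Conv2D(64, 3, padding="same", activation="relu")(merged)
    # Decode back up to full resolution
    for f in (64, 32, 16):
        x = layers.Conv2DTranspose(f, 2, strides=2, padding="same")(x)
        x = layers.Conv2D(f, 3, padding="same", activation="relu")(x)
    outputs = layers.Conv2D(1, 1, activation="sigmoid")(x)
    return tf.keras.Model([raw, edges], outputs)
```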

Results

After training for 20 epochs, model performance on the training and test sets is as follows:

  • U-Net F1 training score: 71.4%
  • Y-Model F1 training score: 59.4%
  • U-Net F1 test score: 73.1%
  • Y-Model F1 test score: 63.8%

Friendly reminder: the F1 Score is the harmonic mean of precision and recall.

By using these metrics alone, it appears that U-Net is the better model. However, these metrics are only part of the story.

Visually Reviewing Test Set Images

In addition to the metrics, I get a better sense of how the models perform by reviewing sample images from the dataset. In the samples, the brighter yellow a pixel is, the higher the model's predicted probability that the pixel belongs to a rock. From the ALLD synthetic samples, it is easy to see that U-Net's predictions match the original mask most closely.

The Y-Model prediction mask is messier due to the simpler model structure. The Y-Model also has greater difficulty discerning the background class from rocks, as is clear along the horizon, where the background shows up as a dull green/blue in the prediction mask. The big question is… does this behavior hold when we look at real images?

Original Mask
U-Net Mask Prediction
Y-Model Mask Prediction
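
For readers who want to reproduce this kind of review, a minimal Matplotlib sketch follows; the "viridis" colormap is my assumption, matching the yellow-for-high-probability look described above:

```python
import matplotlib.pyplot as plt

def show_heatmap(image, probs):
    """Show an input tile next to its predicted rock-probability map.
    `probs` is a NumPy array of per-pixel probabilities in [0, 1];
    brighter (yellow) pixels mean higher predicted rock probability."""
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4))
    ax1.imshow(image, cmap="gray")
    ax1.set_title("Input tile")
    ax2.imshow(probs.squeeze(), cmap="viridis", vmin=0.0, vmax=1.0)
    ax2.set_title("Predicted rock probability")
    for ax in (ax1, ax2):
        ax.axis("off")
    plt.show()
```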

Visually Reviewing Validation Set Images

When we compare predictions for the Nightingale landing area (which is the official landing area of the mission), we get a clear sense of the difference in U-Net and Y-Model performance in real life.

If you’ve peeked ahead to the images, you may notice that the model performance seems to have flipped.

We can clearly see that U-Net does not perform as well as the Y-Model. The U-Net model misses a significant portion of the rocks in the landing area, while the Y-Model cleanly identifies the NASA landing area. This is a critical indicator that the U-Net model over-fit on the synthetic images: it learned features from the synthetic data that do not transfer to the real images. Without this knowledge, I might have recommended a poor-performing U-Net model.

Original Image of the Landing Site Nightingale
U-Net Heatmap Prediction of the Landing Site Nightingale
Y-Model Heatmap Prediction of the Landing Site Nightingale

Conclusion

The Y-Model provides satisfactory performance at the landing site Nightingale, while U-Net performs better only on the synthetic data. Had we looked at the synthetic results alone, U-Net could have been misconstrued as the highest-performing model.

In addition to this discovery, it is worth noting that it took NASA volunteers over 18,000 hours to search the surface of the asteroid, while the Y-Model produced results for the entire surface of Bennu in about 2 minutes. For a sense of the scale involved, Bennu is as big as the Empire State Building.

In conclusion, by using a less complex model and doing some of the image preprocessing up front, I was able to build an effective model using only synthetic images that quickly and effectively identifies rock-free areas on asteroids.

As a bonus, below is the resulting image across the entire surface of the asteroid Bennu from the Y-Model. It has landing sites Nightingale (the actual landing site) and Osprey (the backup landing site) labeled. To give you an idea of the scale of some of the rocks, I’ve labeled rock “A” — which is as big as a house.

Y-Model results across the entirety of the Bennu surface.

If you would like an in-depth dive, you can find my full paper here: Autonomous Rock Detection for Asteroid Missions Using Deep Learning and Synthetic Data | Northwestern University

Interested in more data science content? Follow me on Medium or connect with me on LinkedIn.


Taylor Jensen

Data Scientist and dedicated nerd in Chicago. All views are my own. LinkedIn: https://bit.ly/3Mq2DYI