Deep Learning for Geophysical Image Segmentation

Sergei Petrov
6 min read · Dec 19, 2022


This post is about the work I did as a research assistant at Stanford. I was lucky to have Tapan Mukerji as my advisor; his guidance helped me avoid a lot of pitfalls and keep going when I felt stuck.

This post is based on my paper. The task I was working on was seismic facies classification with deep learning, and I’d like to start with a high-level overview of why this is a problem worth solving.

Introduction

Seismic data is used to understand the subsurface structure and, ideally, to quantify properties indicative of where hydrocarbons may be deposited. Figure 1 shows what seismic data may look like. Seismic facies reflect the depositional environment, which in turn tells us where it is more promising to look for hydrocarbons. Different seismic facies are distinguishable by the configuration of seismic reflections (refer to Figure 2), that is, essentially by their visual appearance. This is why computer vision tools are directly applicable to this task.

Figure 1. Seismic data — two sections

Seismic data is the signal reflected from subsurface heterogeneities, and the features of interest are the amplitude, signal frequency, coherency, and dip of reflections. The dataset I’ll be referring to is 3D, so it is a volume rather than a set of independent images.

From an ML perspective, the task can be formulated as an image segmentation problem. For each pixel of the input seismic image, I needed to predict the facies it belongs to. My goal was to try out different architectures and find one applicable at the scale of real-world datasets. A single volume of seismic data can range from hundreds of megabytes to several terabytes, so computational time is extremely important.

Data

I used three different datasets to perform a meaningful comparison: one synthetic and two real ones. I'll describe one in more detail to provide context. The entire dataset was a 3D volume, but to keep things manageable I took a single line for training and another line for testing. Each line is essentially a single image, so for the fully convolutional models I augmented the data by chopping the training line into multiple smaller patches, flipping them horizontally, and adding noise (a minimal sketch is given below). Some conventional augmentation techniques, such as random rotations, zooming in/out, and vertical flipping, are not applicable to seismic data because of its nature: such modifications may produce geologically meaningless data, or even transform it into something completely different and distort the label distribution.
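
Here is a minimal NumPy sketch of that augmentation; the patch size, stride, and noise level are illustrative choices rather than the exact values I used:

```python
import numpy as np

def augment_section(section, patch_h=128, patch_w=128, stride=64,
                    noise_std=0.02, rng=None):
    """Chop a 2D seismic section into patches and add a horizontally
    flipped and a noisy copy of each. Label sections need the same
    chopping and flipping, but no noise."""
    rng = rng if rng is not None else np.random.default_rng(0)
    h, w = section.shape
    patches = []
    for i in range(0, h - patch_h + 1, stride):
        for j in range(0, w - patch_w + 1, stride):
            p = section[i:i + patch_h, j:j + patch_w]
            patches.append(p)
            patches.append(p[:, ::-1])  # horizontal flip; a vertical flip would be geologically meaningless
            patches.append(p + rng.normal(0.0, noise_std * p.std(), p.shape))
    return np.stack(patches)
```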

A huge problem with labels when working with seismic data is that, strictly speaking, the ground truth is unknown. Labels are just the personal interpretation of a particular geoscientist, so model accuracy is not a fully reliable estimate of model performance. Figure 2 shows the manual interpretation of the seismic sections; there are gaps in the interpretation in places I was uncertain about.

Figure 2. Manual interpretation of the seismic sections shown in Figure 1. These are the labels used for training/testing

Metrics

The metric I used for the quantitative assessment of the models was plain pixel accuracy. Another measure, related not to prediction quality but to the practicality of a solution, was computational time. Apart from that, I qualitatively examined the predictions to make sure they made geological sense and looked realistic overall. Qualitative examination in this case was far more important than it usually is.
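
Given the gaps in the labels, accuracy only makes sense over labeled pixels. A minimal sketch, with a hypothetical sentinel value marking the unlabeled ones:

```python
import numpy as np

def masked_accuracy(pred, labels, ignore_value=-1):
    """Pixel accuracy over labeled pixels only. `ignore_value` is a
    hypothetical sentinel marking the gaps in the interpretation."""
    mask = labels != ignore_value
    return (pred[mask] == labels[mask]).mean()
```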

For the loss, I used ordinary cross-entropy. Fully convolutional models require all pixels of their inputs to be labeled, so I filled the gaps in the labels with a constant value and, during training, zeroed out the loss contribution from unlabeled pixels. The dataset was somewhat imbalanced, but training seemed to do fine without addressing it.
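
In PyTorch (the framework choice here is for illustration), this masking is exactly what `ignore_index` in the cross-entropy loss does; the sentinel value and class count below are arbitrary:

```python
import torch
import torch.nn as nn

IGNORE = -1  # constant filled into the label gaps (my choice for this sketch)
criterion = nn.CrossEntropyLoss(ignore_index=IGNORE)  # zeroes out unlabeled pixels

logits = torch.randn(4, 6, 128, 128)         # (batch, n_facies, height, width)
labels = torch.randint(0, 6, (4, 128, 128))  # per-pixel facies ids
labels[:, :10, :] = IGNORE                   # pretend the top rows are unlabeled
loss = criterion(logits, labels)             # unlabeled pixels contribute nothing
```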

Models

I ended up experimenting with the following three architectures: 3D convnet [1], fully convolutional dilated 2D net [2], and a well-known U-Net [3]. Below is a high-level description of the key features of each.

The 3D convolutional model has a conventional architecture: several convolutional layers followed by a fully connected head. Its main highlight is the 3D convolutional layers, which leverage the 3D nature of the input data, while the fully connected head outputs the class probabilities. The model predicts each pixel individually, one at a time: to predict a single point, a small 3D subvolume of data around it is fed in, giving the model the necessary context.

Figure 3. 3D Convolutional net
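
A minimal sketch of such an architecture, in the spirit of MalenoV [1]; the layer sizes, subvolume size, and number of facies are illustrative:

```python
import torch
import torch.nn as nn

class Conv3DNet(nn.Module):
    """Patch classifier: a small 3D subvolume around a voxel goes through
    3D conv layers, and a fully connected head predicts the facies of the
    central voxel."""
    def __init__(self, n_facies=6, cube=32):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(1, 16, 5), nn.ReLU(), nn.MaxPool3d(2),
            nn.Conv3d(16, 32, 3), nn.ReLU(), nn.MaxPool3d(2),
            nn.Conv3d(32, 32, 3), nn.ReLU(),
        )
        with torch.no_grad():  # infer the flattened feature size
            n = self.features(torch.zeros(1, 1, cube, cube, cube)).numel()
        self.head = nn.Sequential(nn.Flatten(), nn.Linear(n, 64), nn.ReLU(),
                                  nn.Linear(64, n_facies))

    def forward(self, x):  # x: (batch, 1, cube, cube, cube)
        return self.head(self.features(x))  # facies logits for the central voxel
```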

The dilated 2D model is fully convolutional, meaning it has no fully connected layers and both its input and output are 2D. Its distinguishing feature is dilated convolutional layers: in the second half of the model (a sort of decoder part), each layer has a dilation factor double that of the previous one. This significantly increases the receptive field of the model's final layers.

Figure 4. Dilated fully convolutional net
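
A sketch of the idea; the depth, channel counts, and dilation schedule shown here are illustrative, not those from the paper:

```python
import torch
import torch.nn as nn

class DilatedFCN(nn.Module):
    """Fully convolutional 2D net whose second half doubles the dilation
    factor at every layer, rapidly growing the receptive field without
    any pooling."""
    def __init__(self, n_facies=6, ch=32):
        super().__init__()
        layers = [nn.Conv2d(1, ch, 3, padding=1), nn.ReLU(),
                  nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU()]
        for d in (1, 2, 4, 8):  # dilation doubles from layer to layer
            layers += [nn.Conv2d(ch, ch, 3, padding=d, dilation=d), nn.ReLU()]
        layers += [nn.Conv2d(ch, n_facies, 1)]  # 1x1 conv to per-pixel logits
        self.net = nn.Sequential(*layers)

    def forward(self, x):   # x: (batch, 1, H, W)
        return self.net(x)  # (batch, n_facies, H, W), same H and W
```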

U-Net is another fully convolutional net, but much deeper than the previous two. Because it is so deep, it has skip connections, which pass encoder feature maps directly to the decoder and allow gradients to flow straight from the decoder back to the encoder.
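
A heavily pared-down sketch showing the skip-connection mechanics; the real U-Net [3] has several more levels and far more channels:

```python
import torch
import torch.nn as nn

def block(cin, cout):
    return nn.Sequential(nn.Conv2d(cin, cout, 3, padding=1), nn.ReLU(),
                         nn.Conv2d(cout, cout, 3, padding=1), nn.ReLU())

class MiniUNet(nn.Module):
    """One-level toy U-Net to show how a skip connection works."""
    def __init__(self, n_facies=6):
        super().__init__()
        self.enc1, self.enc2 = block(1, 32), block(32, 64)
        self.pool = nn.MaxPool2d(2)
        self.up = nn.ConvTranspose2d(64, 32, 2, stride=2)
        self.dec1 = block(64, 32)  # 64 = 32 upsampled + 32 skipped channels
        self.out = nn.Conv2d(32, n_facies, 1)

    def forward(self, x):                # x: (batch, 1, H, W), H and W even
        e1 = self.enc1(x)                # encoder features kept for the skip
        e2 = self.enc2(self.pool(e1))
        d1 = self.up(e2)                 # upsample back to input resolution
        d1 = torch.cat([d1, e1], dim=1)  # skip connection: concatenate encoder maps
        return self.out(self.dec1(d1))
```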

My experiments involved tuning hyperparameters, including the size of the input subvolume used by the 3D architecture.

Results

As expected, the 3D model ended up being the slowest, and its prediction times were prohibitively long. It was robust, though: changes in hyperparameters didn't affect the accuracy much. Its accuracy was relatively high overall, and the prediction shown in Figure 5 looks geologically sound.

Figure 5. Prediction of the 3D convolutional model

The dilated net turned out to be unstable: it produced very different predictions depending on the hyperparameters (Figure 6).

Figure 6. Predictions of the dilated fully convolutional model

U-Net ended up being the fastest and the most robust. Its prediction performance was very good: on par with the dilated net, but much more consistent.

Figure 7. Prediction of the U-Net

Below are the tables summarizing the performance of the architectures under consideration, both in terms of accuracy (Table 1) and computational time (Table 2). Training times are reported for the entire training run, so the number of epochs differs between architectures. What we care about most is prediction time, and that is what I used to assess the models. Clearly, U-Net beats the other models, being both robust and high-performing.

Table 1. Accuracy values of the models obtained on different datasets
Table 2. Performance of the models measured with a Tesla V100 32GB GPU

A fun part is visualizing predictions in 3D. Figure 8 shows channel bodies predicted on multiple consecutive slices. Again, the result looks geologically meaningful, and it took under a minute to produce.

Figure 8. A geobody obtained from U-Net predictions
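
Producing such a geobody is conceptually simple: stack the per-slice predictions into a volume and threshold on the facies of interest. A sketch with stand-in predictions and a hypothetical channel class id:

```python
import numpy as np

# Stand-in predictions; in practice these come from the trained U-Net.
slices = [np.random.randint(0, 6, (256, 512)) for _ in range(64)]
volume = np.stack(slices, axis=0)  # (n_slices, height, width)

CHANNEL_ID = 3                     # hypothetical class id of the channel facies
geobody = volume == CHANNEL_ID     # boolean 3D mask of the channel body

# A surface for rendering can then be extracted, for example with
# skimage.measure.marching_cubes(geobody.astype(float), level=0.5).
```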

A side note

Distribution shift is a problem I ran into, and one especially relevant for seismic data. Seismic data is spatially heterogeneous by nature, as a representation of the subsurface. Therefore, if we produce labels for a particular part of a volume, they won't be representative of the rest of the dataset. On the other hand, trying to capture the entire distribution by labeling multiple parts of a volume is prohibitively expensive. I experienced the issue as a significant training/test accuracy gap.

The issue is an interesting topic in its own right, as it is probably one of the major blockers preventing wide adoption of DL in the industry. I attempted to address it in my further work by designing a semi-supervised pipeline, but that is a separate topic.

References

[1] MalenoV (MAchine LEarNing Of Voxels). https://github.com/bolgebrygg/MalenoV.

[2] Pradhan, A.; Mukerji, T. Seismic inversion for reservoir facies under geologically realistic prior uncertainty with 3D convolutional neural networks. Proceedings of the SEG, 2020.

[3] Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional networks for biomedical image segmentation. Proceedings of the MICCAI, 2015, pp 234–241.
