ML IN SEISMIC INTERPRETATION

Facies analysis

Getting acquainted with data

Stepan Goryachev

Published in

Data Analysis Center

10 min read · Apr 15, 2021

Over the last few decades, machine learning has been taking over all areas of our lives. The expertise of a human specialist alone is no longer enough: superhuman performance is what business needs on a daily basis.

Many routines need to be automated to free up experts’ time for less trivial tasks, since the complexity of the problems the industry faces is growing rapidly. Oil and gas is no exception, especially its upstream segment. Readily available hydrocarbon reservoirs are almost exhausted, and new fields with complex structures introduce challenges never met before.

Source: https://www.gazprom-neft.ru/press-center/infographic

Field development always starts with exploration, and the faster and better it is completed, the greater the benefits. Exploration, in turn, is composed of many important steps, from geological surveys to structural modeling of deposits. One such step is seismic interpretation: detecting different “underground regions of interest”.

The final goal of the whole process is finding porous oil-saturated structures. And when it comes to porosity in a sedimentary context, sand is your best friend. That is why detecting such things as fluvial channels and alluvial fans is essential: they are literally made of sand. And they are collectively called facies.

Porosity is actually not the only predictor of oil presence: there is also permeability. Since the article aims to deliver knowledge on the subject to more of a data-scientist type of guy than a geology expert, I will resort to some simplifications here and there. And it’s also not sand, but sandstone 😊.

However, there is a somber fact of real-world data — uncertainty. The presence of oil in porous regions of the underworld depends on a set of conditions.

If you find sand, there might be oil in it. Or there might not.

Defining those conditions goes beyond the scope of the current article. I’ll just note that after sandy facies are found, and theoretical conditions of oil presence are met in the desired region, exploratory drilling kicks in. That’s the way to verify oil presence.

This article is the first in the Seismic Facies Segmentation series. The name is self-explanatory, I guess. We want to learn how to find facies in seismic data. But there is a long way to go. Before diving into the “stacking layers” routine, we had better get at least a glimpse of the data internals. Let’s start by visualizing and analyzing the inputs and the labels provided by human experts!

Gentle intro

First things first, interpretation starts with a seismic cube in your hands. It is never a proper cube, actually, but a cuboid 🤓. However, the term “cube” is better established in the industry. The data is a set of stacked 1D signals, each representing the amplitudes of an artificially generated seismic wave over a given depth range for a fixed point on the earth’s surface.

If you are completely unfamiliar with the seismic cube concept, I strongly recommend reading our introduction to seismic interpretation.

So, having a cube of amplitudes, we can now start looking for regions of interest, namely facies, in it. But what even is a seismic facies?

Basic data overview

Seismic facies are 3D bodies that differ from their surroundings in a seismic cube volume. Sand facies are usually labeled by experts along horizons, the boundaries between two rock layers. In other words, as labeled, most such objects are composite two-dimensional polygons.

Understanding seismic horizons is essential for further reading!

A horizon is represented by a point cloud in the cube volume, where every (inline, crossline) pair corresponds to no more than one point. Therefore, we can look at such entities “from above” and project them onto a plane orthogonal to the depth axis without losing any information. It’s a useful trick, since combining seismic data and human-made labels in three dimensions simultaneously is rather nontrivial.
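The projection described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the code behind the figures in this article: the function name `horizon_to_depth_map`, the `(inline, crossline, depth)` column order, and the use of NaN for traces without a label are all assumptions made for the example.

```python
import numpy as np

def horizon_to_depth_map(points, n_inlines, n_crosslines):
    """Project a horizon point cloud onto the (inline, crossline) plane.

    `points` is assumed to be an array of (inline, crossline, depth) rows
    with at most one row per (inline, crossline) pair.
    """
    points = np.asarray(points)
    # Traces with no horizon point stay NaN.
    depth_map = np.full((n_inlines, n_crosslines), np.nan)
    inlines = points[:, 0].astype(int)
    crosslines = points[:, 1].astype(int)
    depth_map[inlines, crosslines] = points[:, 2]
    return depth_map

# Toy horizon: three labeled traces on a 2 x 2 grid.
points = np.array([[0, 0, 100], [0, 1, 101], [1, 1, 103]])
depth_map = horizon_to_depth_map(points, n_inlines=2, n_crosslines=2)
```

Because each (inline, crossline) pair holds at most one depth, the 2D matrix stores exactly the same information as the point cloud.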

Let’s load and visualize a horizon labeled by a human expert on one of the seismic cubes from our data storage. Viewed as a flattened plane, it can be visualized in two basic ways:

The first and most obvious is a map of its depths. Color encodes depth here: the greener, the deeper.
The other way is to use horizon depths as a mask to “cut out” amplitude values from the corresponding cube depths.

Note that cube data is min-max normalized in all subsequent figures.
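Both the “cut out” operation and the normalization can be sketched as follows. The snippet assumes a cube stored as a NumPy array with axes `(inline, crossline, depth)`, horizon depths given as integer sample indices, and NaN marking unlabeled traces; the helper names are hypothetical. Whether the figures normalize the whole cube or each map separately is not stated here, so the example simply normalizes the extracted map.

```python
import numpy as np

def amplitudes_along_horizon(cube, depth_map):
    """Pick one amplitude per trace at the depth given by the horizon."""
    valid = ~np.isnan(depth_map)
    inlines, crosslines = np.nonzero(valid)
    depths = depth_map[valid].astype(int)
    result = np.full(depth_map.shape, np.nan)
    result[inlines, crosslines] = cube[inlines, crosslines, depths]
    return result

def min_max_normalize(values):
    """Scale values to the [0, 1] range, ignoring NaNs."""
    low, high = np.nanmin(values), np.nanmax(values)
    return (values - low) / (high - low)

# Toy cube of shape (2, 2, 4) and a horizon with one unlabeled trace.
cube = np.arange(16, dtype=float).reshape(2, 2, 4)
depth_map = np.array([[0.0, 1.0], [np.nan, 3.0]])

amplitude_map = amplitudes_along_horizon(cube, depth_map)
normalized_map = min_max_normalize(amplitude_map)
```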

As I’ve already mentioned, facies are usually labeled by experts along horizons. It means that, from a data-driven perspective (not a geological one!), horizons contain facies.

Armed with this knowledge, let’s now visualize an alluvial fan mask labeled along the horizon.

As can be seen from the figure, the fan mask is quite ambiguous. It has a complex ragged border and a structure that is sometimes poorly discernible in the amplitudes. In some areas, it’s not clear why the expert labeled the fan that way. However, the fan at the given horizon is actually quite high-contrast. Desired facies are often much less visible or occupy smaller areas of the horizon they are contained in.

Mild early signs of task non-triviality

For a moment, you may think that you’ve caught the idea of how to segment regions of interest: simply choose areas from the upper amplitude range, i.e., label the brightest parts! Actually, it depends. First of all, sometimes facies are contained in the lower amplitude range. Just look at the horizon below. The fans here are labeled over dark areas. So that’s the first no-go.

But that’s not the only problem. Let’s mask the subset of the horizon where amplitudes are greater than some handpicked threshold, and inspect the results. Aside from several minor differences between the two masks, there is one important difference: contrary to the human expert’s labels, the lower right area of the horizon has been highlighted by the threshold mask.

The expert did not label that region because it’s not an alluvial fan but a shelf. The shelf just looks similar to the fan in terms of amplitudes, and it is not the only misleading structure. There are actually plenty of amplitude anomalies that have nothing to do with the desired sand facies.
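The thresholding experiment is easy to reproduce on synthetic data. In the sketch below, everything is made up for illustration: a dim background, a bright labeled “fan”, and an equally bright unlabeled “shelf”. The threshold value and array names are assumptions, not values from the real cube.

```python
import numpy as np

rng = np.random.default_rng(0)

# Dim background with amplitudes in [0, 0.4).
amplitudes = rng.uniform(0.0, 0.4, size=(100, 100))

# The expert labels only the fan.
expert_mask = np.zeros((100, 100), dtype=bool)
expert_mask[10:40, 10:40] = True
amplitudes[expert_mask] += 0.5           # bright fan
amplitudes[70:100, 70:100] += 0.5        # bright shelf, not labeled

# Naive segmentation: keep everything above a handpicked threshold.
threshold = 0.45
threshold_mask = amplitudes > threshold

# Compare the two masks.
intersection = np.logical_and(threshold_mask, expert_mask).sum()
union = np.logical_or(threshold_mask, expert_mask).sum()
iou = intersection / union
false_positive = np.logical_and(threshold_mask, ~expert_mask).sum()
```

Even in this idealized, noise-free setting, the threshold mask picks up the shelf in full, so half of the highlighted area is a false positive.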

Therefore, just clipping data by some threshold is nowhere near a good solution for the facies segmentation task. And even skilled human experts can make mistakes while solving it. Actually, there are multiple caveats here:

Poor data quality due to noise and cube assembly errors.
Anomalous tricky structures (that are not actually sand facies, like shelves).
Low reflectivity of some regions of interest (e.g., due to their thinness).

These nuances make double- and triple-checks obligatory, which increases the time needed to complete the task.

Alternative views on data

Seismic cube data is basically all the information that one can work with during interpretation (assuming exploratory drilling data is not yet available). However, various views of that data can be obtained through different transformations. In theory, a view can be almost anything if you are creative enough, as long as it helps to “highlight” regions of interest.

In practice, if one builds a predictive model that takes some data representation as input and aims to obtain reproducible results, that input had better make some sense.

Views of cube data are usually called geological attributes. In data science, these are known as features. The most commonly used ones are instantaneous amplitude and instantaneous phase. Both are obtained via a Hilbert transform of a signal, which yields an array of complex numbers (the analytic signal). The modulus of each number in the array gives the instantaneous amplitude, and its angle gives the instantaneous phase.

In Python, you can do it like this:
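A minimal sketch, assuming SciPy is available: `scipy.signal.hilbert` returns the analytic signal, whose real part is the original signal. The function name `instantaneous_attributes` and the random test signal are illustrative choices.

```python
import numpy as np
from scipy.signal import hilbert

def instantaneous_attributes(signal, axis=-1):
    """Return instantaneous amplitude and phase of a real-valued signal."""
    analytic = hilbert(signal, axis=axis)  # complex analytic signal
    amplitude = np.abs(analytic)           # modulus
    phase = np.angle(analytic)             # angle in radians, within [-pi, pi]
    return amplitude, phase

# A randomly generated signal to try it on.
rng = np.random.default_rng(42)
signal = rng.normal(size=256)
amplitude, phase = instantaneous_attributes(signal)
```

The instantaneous amplitude acts as an envelope: at every sample it is at least as large as the absolute value of the signal itself.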

And this is what the result looks like for a randomly generated signal:

Those not familiar with signal processing might already wonder why we even apply these transforms to our data. Unfortunately, there’s no simple and short explanation for that. So, for now, I’ll limit myself to saying that instantaneous amplitude and phase generalize the concepts of constant amplitude and phase, respectively, to aperiodic signals.

Calculating instantaneous attributes can also be seen as extracting information about local signal properties. That is an example of a feature that makes sense.
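One way to get a feel for this generalization is to check the periodic case. For a plain cosine, the instantaneous amplitude should collapse to the usual constant amplitude, and the instantaneous phase should grow linearly. The sketch below assumes a cosine with a whole number of cycles in the window, so that the FFT-based `scipy.signal.hilbert` has no edge effects; all parameter values are arbitrary.

```python
import numpy as np
from scipy.signal import hilbert

n_samples, n_cycles, a = 512, 8, 2.5
t = np.arange(n_samples)

# Cosine with constant amplitude `a` and a whole number of cycles.
x = a * np.cos(2 * np.pi * n_cycles * t / n_samples + 0.3)

analytic = hilbert(x)
envelope = np.abs(analytic)               # instantaneous amplitude
phase = np.unwrap(np.angle(analytic))     # instantaneous phase, unwrapped
phase_step = np.diff(phase)               # phase increment per sample
```

Here `envelope` stays at `a` for every sample and `phase_step` equals the angular frequency `2 * np.pi * n_cycles / n_samples`. For an aperiodic trace the same two quantities simply vary from sample to sample.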

So, each cube trace can be transformed the same way as the sample signal above. After the transformation, values along the horizon form the following pictures for each attribute. Each attribute changes the visibility of the desired alluvial fan in its own manner. These data representations can’t do all the work for us, but they certainly help.

What is also important is that these representations are interpretable: at each horizon point, the attribute value describes local properties of the trace that goes through this point. It means that one can, in a way, “peek” both above and below the horizon while staying in two dimensions. That is nice.
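Applied to a whole cube, the transformation runs along the depth axis, and the horizon then selects one value per trace. This is a sketch on random data, assuming axes ordered as `(inline, crossline, depth)` and a horizon given as integer sample indices for every trace; the variable names are illustrative.

```python
import numpy as np
from scipy.signal import hilbert

rng = np.random.default_rng(0)
cube = rng.normal(size=(4, 5, 64))              # (inline, crossline, depth)
depth_map = rng.integers(10, 50, size=(4, 5))   # horizon depth index per trace

# Transform every trace at once along the depth axis.
analytic = hilbert(cube, axis=-1)
inst_amplitude = np.abs(analytic)
inst_phase = np.angle(analytic)

# Pick the attribute value at the horizon depth of each trace.
index = depth_map[..., np.newaxis]
amplitude_map = np.take_along_axis(inst_amplitude, index, axis=-1)[..., 0]
phase_map = np.take_along_axis(inst_phase, index, axis=-1)[..., 0]
```

Each value in `amplitude_map` depends on the whole trace through the Hilbert transform, which is what lets a 2D map reflect what happens above and below the horizon.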

Facies gallery

Now that you are familiar with various views of the task data, let’s enrich our dataset a bit and visualize all the features you already know (except for the instantaneous phase, since the visibility of facies on this attribute is weak).

And here come the fluvial channels! I mentioned them earlier, in the introduction. They are also sand facies, but unlike fans, they have a ribbon-like structure. You might notice that their visibility on horizon attributes is sometimes even more questionable than the fans’.

Facies rarity

Now that we have looked at our data in a rather straightforward manner, we can think a bit about its relative properties. For example, let’s see what the attribute distributions look like for fans and channels against the distribution of the same attribute over the horizon they are contained in.
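Such a comparison boils down to two histograms computed over shared bins. The sketch below uses synthetic data in which the facies values are shifted toward the tail of the horizon distribution; the function name `compare_distributions` and all numbers are assumptions for illustration.

```python
import numpy as np

def compare_distributions(attribute_map, facies_mask, bins=50):
    """Histogram an attribute over the whole horizon and over the facies only."""
    labeled = ~np.isnan(attribute_map)
    horizon_values = attribute_map[labeled]
    facies_values = attribute_map[facies_mask & labeled]
    # Shared bin edges make the two histograms directly comparable.
    edges = np.histogram_bin_edges(horizon_values, bins=bins)
    horizon_hist, _ = np.histogram(horizon_values, bins=edges, density=True)
    facies_hist, _ = np.histogram(facies_values, bins=edges, density=True)
    return edges, horizon_hist, facies_hist

rng = np.random.default_rng(1)
attribute_map = rng.normal(0.0, 1.0, size=(200, 200))
facies_mask = np.zeros((200, 200), dtype=bool)
facies_mask[50:100, 50:100] = True
attribute_map[facies_mask] += 1.5   # facies sit in the tail but still overlap

edges, horizon_hist, facies_hist = compare_distributions(attribute_map, facies_mask)
widths = np.diff(edges)
```

With `density=True`, both histograms integrate to one, so their shapes can be compared even though the facies cover only a small part of the horizon.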

The instantaneous phase histogram is mode-centered in all figures below to eliminate the phase shift effect, which otherwise results in “rugged” plots. Also, the range of values is kept non-normalized to emphasize the meaning of the attribute.

You can see that the distributions of both amplitudes and instantaneous amplitudes along the fan tend to stay in the tail of the horizon distribution, yet they cover more than half of the horizon histogram domain. It is exactly this property of the data that leads to the fact we’ve already checked above:
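One plausible reading of mode-centering is sketched below: estimate the most frequent phase from a histogram, subtract it, and wrap the result back into [-π, π). This is an assumption about the procedure, not a description of the exact code behind the figures; the function name and bin count are hypothetical.

```python
import numpy as np

def mode_center_phase(phase, bins=90):
    """Shift phases so that the histogram mode lands at zero, then re-wrap."""
    hist, edges = np.histogram(phase, bins=bins, range=(-np.pi, np.pi))
    k = np.argmax(hist)
    mode = 0.5 * (edges[k] + edges[k + 1])   # center of the fullest bin
    return (phase - mode + np.pi) % (2 * np.pi) - np.pi

# Synthetic phases clustered around pi, so wrapping splits the peak in two.
rng = np.random.default_rng(0)
raw = rng.normal(np.pi, 0.3, size=10_000)
phase = (raw + np.pi) % (2 * np.pi) - np.pi   # wrapped into [-pi, pi)

centered = mode_center_phase(phase)
```

Before centering, the peak is torn between the two ends of the histogram. After centering, it forms a single bump around zero.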

You can’t just clip cube values along the horizon by some threshold and expect the result to be a decent facies mask.

As for the instantaneous phase, the fan’s distribution resembles that of its horizon, just scaled. That doesn’t give us much insight and is related to the fact we’ve already seen above: on the instantaneous phase attribute, the facies are not really visible. And that also makes perfect sense, because while labeling reflective surfaces, experts aim to keep their phase constant. So it’s more of a horizon quality map than a feature highlighting potential facies.

It’s hard to miss that the attribute distributions for fluvial channels are roughly the same as for alluvial fans. But a bit more smeared and uncertain, actually 🤷. Even without knowing yet how computer vision models perform on the given data, one might suspect that segmenting fluvial channels is harder than segmenting fans.

Summary

As shown, seismic facies are not easy to segment with decent quality using a simple “rule of thumb”. The procedure requires a more sophisticated approach, and it is not trivial even for a human expert.

There are no textures here, at least in the common sense. So that’s a no-go for all hopes that ImageNet SOTA models will turn the tables here. Moreover, labels made by experts might sometimes contradict each other. That’s a problem one also needs to deal with to obtain a suitable dataset for training a deep learning model.

One must agree that the task of semantic segmentation on seismic data has little in common with conventional real-world photo segmentation.

But the complexity of the hand-labeling procedure is not the only flaw of manual interpretation. A human expert labels the desired facies along a given horizon, which results in a one-pixel-thick surface. That surface carries no information about the structure’s depth range, which is a real loss because, as mentioned earlier, a seismic facies is a 3D geological body.

To be continued

That’s where machine learning kicks in. Our team came up with a whole set of solutions for facies segmentation. Training the most complex of designed models takes no more than half an hour, and making predictions for every horizon takes just a few seconds. Contrary to the expert’s hand-made labels, the model’s output is a 3D object. Using volumetric facies data in field structural modeling allows more accurate estimates of oil reserves and future drilling parameters.

In the following articles of the series, we will share our approaches to the facies segmentation task itself.

Stay tuned!👋