Automatic lithology description: a history of failure

Alexey Kozhevin
Data Analysis Center
8 min read · Jan 27, 2023


Unfortunately, not every project that gets launched is destined to succeed. Sometimes difficulties arise unexpectedly and make it impossible to solve the problem, at least for the moment. We will walk you through our attempt to build an automatic lithology description and highlight the major difficulties that kept us from finding a solution.

In our opinion, sharing a “failed” project is no less important than shouting about successful ones. Such stories let readers learn about feasible approaches and dive deep into the task without any commitment. Future researchers can thus know in advance what to consider at the very beginning, and this article will help draw conclusions that save time and effort.

Core analysis as a part of oil exploration

Seismic exploration is one of the first steps in field exploration. Then comes the stage of a more detailed study of the field's internal structure. To do this, exploratory wells are drilled to take log measurements and extract core samples.

Example of a core 1 meter long

A core is the only source of ground truth information about the entire field structure, but drilling several wells and conducting such studies is comparable in cost to the entire seismic survey. For this reason, it is important to choose the correct location for the exploration wells and conduct the most accurate analysis of the well data. Lithology description is a part of that analysis.

Lithology description

The problem of automatic core description attracted our attention because it fits ideally into a classification or semantic segmentation task: an image as input, a lithology class as the target. It should be enough to get photos of the core and their expert annotations, then train a vanilla ResNet/UNet. What can go wrong?

We’ve got a dataset of 163 wells from 21 fields. Each well includes daylight (DL) and ultraviolet (UV) photos of the core, a CSV file with depth binding, and a depth-wise lithology description. UV photos are a significant source of information because hydrocarbons glow in ultraviolet light.

DL and UV photos of core
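To make the data layout concrete, here is a rough sketch of how a single well might be loaded; every file name and column name below is an assumption for illustration, not the actual structure of our dataset.

```python
from pathlib import Path

import pandas as pd
from PIL import Image

well_dir = Path("wells/well_001")  # hypothetical folder with one well's data

# DL and UV photos of the same core box
dl_image = Image.open(well_dir / "dl" / "box_01.jpg")
uv_image = Image.open(well_dir / "uv" / "box_01.jpg")

# depth binding: which depth interval each photo covers (assumed columns)
depth_binding = pd.read_csv(well_dir / "depth_binding.csv")  # IMAGE, DEPTH_FROM, DEPTH_TO

# depth-wise lithology description (assumed columns)
description = pd.read_csv(well_dir / "lithology.csv")  # DEPTH_FROM, DEPTH_TO, DESCRIPTION
```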

In total, 10 kilometres of annotated core photos! Sounds great for training a neural network, doesn’t it? Let’s dig in!

Challenge 1: Unstructured description

We expected to see several classes of lithologies (sandstone, mudstone, etc.) and their properties (e.g., color and grain size), but the description looks like this:

Sample from initial unstructured description

Unfortunately, each description is unique and does not follow any structure from which the main lithology and its properties could be extracted. So first we had to figure out how to extract lithology classes. There were two possible ways:

  • develop an algorithm to structure the description
  • structure the description manually

We took the second path, since developing the algorithm could take a significant amount of time and the procedure was unlikely to be repeated. It was decided to extract the formation, its color, and its grain size, and, if present, the same properties for the secondary and tertiary formations.

The same structured description

A vigilant reader may notice that it contains errors (see Color2 in the first line). At first, it is enough to keep only the primary properties to predict (Formation1 and Grainsize1) and to convert meters to centimeters for convenience. We import all descriptions and core photos into a format that our open-source library PetroFlow can work with.

Lithology description in PetroFlow format

We concatenate lithology and grain size to form classes and also merge some small classes. In total, we’ve got 13 classes (wait… maybe that was the reason for the failure?).

Statistics of the transformed annotation, where FORMATION_MERGED is the resulting class label
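For illustration, this preparation step could look roughly like the following pandas snippet; the depth columns and the merging threshold are assumptions, while Formation1, Grainsize1, and FORMATION_MERGED come from the actual data.

```python
import pandas as pd

# structured description; DEPTH_FROM/DEPTH_TO in meters are assumed column names
df = pd.read_csv("structured_description.csv")

# convert meters to centimeters for convenience
df["DEPTH_FROM"] = (df["DEPTH_FROM"] * 100).round().astype(int)
df["DEPTH_TO"] = (df["DEPTH_TO"] * 100).round().astype(int)

# concatenate the primary formation and grain size into a single class label
df["FORMATION_MERGED"] = (
    df["Formation1"].fillna("unknown").str.strip()
    + "_"
    + df["Grainsize1"].fillna("").str.strip()
).str.rstrip("_")

# merge rare classes into a single "other" label (the threshold is illustrative)
counts = df["FORMATION_MERGED"].value_counts()
rare = counts[counts < 100].index
df.loc[df["FORMATION_MERGED"].isin(rare), "FORMATION_MERGED"] = "other"

print(df["FORMATION_MERGED"].value_counts())
```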

Nice, now we can start training. Let’s stack the DL and UV photos into an array of shape height × width × 6, where 6 is the doubled number of RGB channels, and use it as the model input. The task can be formulated in two ways. The first is a classification task, where crops of a fixed size are assigned to one class or another.

Classification task scheme
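A minimal sketch of this input preparation, assuming the DL and UV photos are already aligned and the annotation has been rasterized into a per-row class mask (both helper functions are hypothetical):

```python
import numpy as np

def stack_dl_uv(dl_image, uv_image):
    """Stack DL and UV photos of the same interval into one 6-channel array."""
    dl = np.asarray(dl_image)  # (H, W, 3)
    uv = np.asarray(uv_image)  # (H, W, 3), assumed aligned with the DL photo
    return np.concatenate([dl, uv], axis=-1)  # (H, W, 6)

def crops_with_labels(image6, row_mask, crop_height=256):
    """Cut fixed-size crops along the depth axis and assign the majority class."""
    crops, labels = [], []
    for top in range(0, image6.shape[0] - crop_height + 1, crop_height):
        crop = image6[top:top + crop_height]
        classes, counts = np.unique(row_mask[top:top + crop_height], return_counts=True)
        crops.append(crop)
        labels.append(classes[np.argmax(counts)])
    return np.stack(crops), np.array(labels)
```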

The second is a segmentation task, where each image is classified depth-wise and the target/prediction is a 1D mask with a class label for each depth.

Segmentation task scheme
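A rough sketch of how such a 1D target could be built from the interval annotation, assuming depths are already in centimeters; the one-label-per-centimeter resolution is illustrative.

```python
import numpy as np

def intervals_to_mask(intervals, depth_from_cm, depth_to_cm, class_to_index):
    """Rasterize depth-wise lithology intervals into a 1D class mask.

    `intervals` is an iterable of (from_cm, to_cm, class_name) tuples covering
    the photographed interval [depth_from_cm, depth_to_cm).
    """
    mask = np.full(depth_to_cm - depth_from_cm, fill_value=-1, dtype=np.int64)
    for start, stop, name in intervals:
        lo = max(start, depth_from_cm) - depth_from_cm
        hi = min(stop, depth_to_cm) - depth_from_cm
        mask[lo:hi] = class_to_index[name]
    return mask  # one class index per centimeter; -1 marks unannotated depths
```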

Needless to say, even though both approaches were tested, the results were still very poor (e.g., a macro F1-score of 0.126). The next thing that came to mind was to examine the data, and doing so revealed a number of problems.
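For reference, the macro F1-score averages the per-class F1 values with equal weight, so with 13 imbalanced classes it is a demanding metric. A toy scikit-learn example:

```python
from sklearn.metrics import f1_score

# toy labels: the model ignores the rare class 2 entirely
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 0, 0, 0]

print(f1_score(y_true, y_pred, average="macro"))  # ~0.41, below the 0.5 accuracy
```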

Challenge 2: DL and UV photo mismatch

A good pair of images should correspond to each other like this:

But we noticed a lot of flipped, stretched, or shifted images:

It turned out that we had been training the model on data with significant errors, so the dataset had to be filtered somehow. Since flipped and stretched core photos still look like core photos, all we can detect is a discrepancy between the DL and UV photos.

It is important to note that such filtering does not guarantee that there are no problems in the remaining images. For example, both photos could be flipped relative to the lithology at the same time. However, assuming such cases are relatively rare, even a pessimistic researcher at this stage hopes for a good result.

To find all pairs of DL and UV core photos that do not correspond to each other perfectly, we manually labeled “bad” pairs in a dataset of 768 photo pairs and trained a ResNet18 to classify them. The resulting model allowed us to remove obviously problematic images from the dataset of 26,083 images. We then trained the lithology model again and found that the quality was still very low.
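A minimal PyTorch sketch of such a classifier; feeding the stacked 6-channel pair into a ResNet18 with a binary head is an assumption of this sketch rather than the exact setup we used.

```python
import torch
from torch import nn
from torchvision.models import resnet18

def build_mismatch_classifier():
    """ResNet18 that takes a stacked DL+UV pair and predicts good/bad match."""
    model = resnet18(weights=None)
    # accept 6 input channels (DL + UV) instead of the default 3
    model.conv1 = nn.Conv2d(6, 64, kernel_size=7, stride=2, padding=3, bias=False)
    # two output classes: the pair matches / the pair is flipped, stretched, or shifted
    model.fc = nn.Linear(model.fc.in_features, 2)
    return model

model = build_mismatch_classifier()
logits = model(torch.randn(1, 6, 224, 224))  # dummy stacked photo pair
```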

Challenge 3: Controversial annotation

A hypothesis arose that there could be errors in the lithology description itself. To test it, experts re-labeled 150 correctly classified and 150 incorrectly classified photos. This is what the confusion matrix of the results looked like:

As you can see, two experts can label the same data very differently! At the same time, we had granulometry results at our disposal: measurements of the proportions of different grain sizes in cylinders cut out of the core columns. Based on these numbers, a sample can be attributed to one of the lithology types.

Example of granulometry. Each column is the percentage of a grain-size fraction

There are criteria that make it possible to attribute a sample to one lithology class or another based on granulometry. We projected the granulometry points with t-SNE and got the following plot:

Spatial axes are the 2D t-SNE of granulometry. Colors are lithology labels derived from granulometry.

By construction, the points are perfectly separated into classes. But if one takes the lithology from the initial description instead, the result is much worse:

Spatial axes are the 2D t-SNE of granulometry. Colors are lithology labels from the lithological description.
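For illustration, such a projection takes only a few lines with scikit-learn; the column names below are hypothetical.

```python
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.manifold import TSNE

# assumed layout: one row per sample, grain-size percentages plus a lithology label
granulometry = pd.read_csv("granulometry.csv")
fractions = granulometry.filter(like="FRACTION_")  # percentage columns
labels = granulometry["LITHOLOGY"]

embedding = TSNE(n_components=2, perplexity=30, random_state=42).fit_transform(fractions.to_numpy())

for name in labels.unique():
    idx = (labels == name).to_numpy()
    plt.scatter(embedding[idx, 0], embedding[idx, 1], s=5, label=name)
plt.legend()
plt.show()
```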

This leads to the conclusion that the lithological description diverges significantly from the granulometry data. At this stage, we realized that the historical dataset was not suitable for training a quality model.

The first way out is to develop lithology labeling criteria and have several experts label the data according to them, so that their results can be aggregated into an accurate averaged ground truth. The second is to predict grain size directly and then convert it into a lithology class at sample points using such criteria.
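The second option can be sketched as a simple rule; the fraction names and thresholds below are purely illustrative and not the criteria actually used in core analysis.

```python
def lithology_from_fractions(sand_pct, silt_pct, clay_pct):
    """Toy criterion mapping grain-size percentages to a lithology class."""
    if sand_pct >= 50:
        return "sandstone"
    if clay_pct >= 50:
        return "mudstone"
    return "siltstone"

print(lithology_from_fractions(sand_pct=62.0, silt_pct=25.0, clay_pct=13.0))  # sandstone
```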

Challenge 4: Image resolution

The huge historical dataset consists of low-quality images, approximately 75 dpi. The data is quite old, and the photo quality in more recent projects is much higher. Together with the experts, we decided that the available resolution is not enough to predict a characteristic as detailed as lithology. And if the lithology data has to be labeled again anyway, wouldn't it be better to do it on high-quality photos? The team managed to get access to several kilometres of annotated core with a resolution of more than 200 dpi. We were even able to find experts willing to take over the annotation process. However, it turned out that there were no convenient tools for labeling the core.

Challenge 5: Annotation tools

Such markup can be done either with specialized proprietary software or by manually filling in tables with depths and classes. Then came the realization that it is better to spend some time developing a dedicated tool first, and only then start labeling. Since we already had such experience from our medical projects, the technology stack was clear: a JSX/React frontend + a Python backend.

Interface of core annotator

In addition to a convenient way of setting boundaries on the core, we added the ability to select the formation and grain size from a predefined list of classes, so that a structured lithology description in the PetroFlow well format is produced out of the box.
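Here is just a toy sketch of what the backend side of interval annotation could look like; the endpoints and schema are illustrative assumptions, not the actual tool.

```python
from typing import Optional

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
ANNOTATIONS = {}  # in-memory store: well name -> list of labeled intervals

class Interval(BaseModel):
    depth_from_cm: int
    depth_to_cm: int
    formation: str                  # chosen from a predefined list on the frontend
    grainsize: Optional[str] = None

@app.post("/wells/{well}/intervals")
def add_interval(well: str, interval: Interval):
    ANNOTATIONS.setdefault(well, []).append(interval)
    return {"count": len(ANNOTATIONS[well])}

@app.get("/wells/{well}/intervals")
def list_intervals(well: str):
    return ANNOTATIONS.get(well, [])
```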

Now there are high-quality data and a high-quality annotation tool; what remains is to label the data and return to training models. We hope it happens someday.

Conclusion

When the dataset of annotated core photos came into our hands, we expected that solving the problem of automatically compiling a lithological description would be possible within a year or so. It would only be necessary to experiment with model training procedures: deal with class imbalance, choose model architectures and optimizers, post-process and aggregate predictions, and so on. However, it turned out to be impossible to finish the project, because:

  • the existing description is unstructured and can be subjective
  • there are obviously defective photos in the dataset
  • the image resolution is too low to train an accurate model
  • a new annotated dataset could be created, but there was no convenient tool to do it quickly, uniformly, and accurately

These problems can also be encountered in other tasks. For example, we went through similar stages when solving the problem of seismic interpretation.

A well-known principle says “Garbage in — garbage out”. However, it is not always immediately obvious that there are problems with the input data, especially when their volume is large.

The failure of this case and the success of some others have taught us that it is often better to first spend the time and resources to create a high-quality dataset, and only then jump into model training.
