Oil-Storage Tank Instance Segmentation with Mask R-CNN

Georgios Ouzounis
Jul 19, 2021


Oil-storage tanks in very high resolution satellite imagery

Introduction

Mapping oil and gas related assets is a topic of high interest as it is the first step towards generating business intelligence for a sector that drives the global economy. Geo-politics, armed conflict, climate change, access to resources and other factors are making this sector increasingly volatile and any change, small or big, may have a ripple effect that propagates into production, storage and availability of fuel for the world’s markets.

In this article I will focus on oil-storage tank mapping, but the same technology discussed here can be extended to other relevant assets (oil tankers, offshore platforms, etc.).

The mapping of oil tanks and subsequent measurements (shape, size, tank height and floating roof height, etc.) can help answer questions such as:

  1. how many tanks are in any given geography?
  2. how much oil is being stored in any given geography?
  3. what is the distance between any two tanks? Are safety standards being followed?
  4. where is a suitable location in the oil-storage field to install the next tank in line?
  5. what is the proximity of any given tank to a logistics facility (port, railway, road network, pipeline, etc.)?

An interesting article presenting oil-storage tank capacity estimation can be found here:

Relevant Technology

Oil-storage tanks can be identified in satellite or aerial imagery. Finding oil tanks can be done with object detection algorithms, which provide localization information in addition to classification. Yet it is the extraction (segmentation) of the assets that adds the most value, since precise shape information can be obtained. Typically, answering questions 2–5 above requires accurate tank segmentation as a prerequisite.

Since this is a single-class problem (assuming the type of tank does not matter) and tanks do not overlap in overhead imagery, one may argue that it is a task for semantic segmentation. This is true, though imagery captured at a near-nadir angle cannot always be guaranteed, meaning that overlap may be seen under certain conditions. This is rather unusual and more likely to happen in aerial imagery than in satellite imagery. The only reason I chose to approach this as an instance segmentation problem is to experiment with the most granular form of segmentation.

The solution presented in this article employs the Mask R-CNN algorithm for instance segmentation, an evolution of Faster R-CNN (which was designed for object detection). I will not elaborate on the theory behind Mask R-CNN as this is beyond the scope of this article, but the reader is encouraged to read more, if needed, here:

The Code

The code (Google Colab notebook) presenting the end-to-end solution described in this article can be found in the author’s GitHub repository:

Please note that this is a highly technical article with most of the following being code snippets from the original notebook.

The Data

The data used in this exercise was made available by Airbus Defense and Space Intelligence. It is a sample image data-set titled Airbus Oil Storage Detection Dataset that can be found on Kaggle.com. Please review the License here and the Disclaimer here.

It consists of a number of satellite images (RGB bands only) showing oil-storage tanks in various geographies. The annotation data is given in a CSV file containing the bounding boxes of all tanks that are visually distinctive. Given that training masks are not available, a considerable stretch of this article deals with data (image) engineering before we get to the deep-learning implementation.

Before we start, get a copy of the data-set in your preferred working environment. The notebook shows how to download the contents of this set directly, by connecting to the Kaggle API.
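For reference, a minimal download sketch in Colab is given below; it assumes you have uploaded your Kaggle API token (kaggle.json), and the dataset slug is my assumption, so confirm it on the Kaggle page before running.

# sketch: download the data-set with the Kaggle API (Colab)
# NOTE: the dataset slug below is an assumption; confirm it on Kaggle
!mkdir -p ~/.kaggle && cp kaggle.json ~/.kaggle/ && chmod 600 ~/.kaggle/kaggle.json
!kaggle datasets download -d airbusgeo/airbus-oil-storage-detection-dataset -p /content/data
!unzip -q /content/data/airbus-oil-storage-detection-dataset.zip -d /content/data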

Let us start by reviewing the contents of the annotation file.

code snippet 1: print the head of the original annotation file
the head of the original annotation file
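Reading the file might look roughly like the following sketch; the CSV filename and path are assumptions, so adjust them to match the downloaded archive.

# sketch: load and inspect the annotation file (filename/path are assumptions)
import pandas as pd

annotations = pd.read_csv("/content/data/annotations.csv")
print(annotations.head())
print("total instances:", len(annotations))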

As can be seen, for each tank in the image data-set there is a separate line listing the image ID (the filename without extension) in which the tank is found, the class label (uniformly oil-storage-tank) and the bounds, i.e. the x-y coordinates of the start and end points of the respective bounding box, given inside parentheses. There are a total of 13592 training instances.

Re-format the annotations

The first task is to convert the bounds into a conventional bbox representation (compatible with OpenCV) which is in the form of:

[start_point_x, start_point_y, bbox_width, bbox_height]

For simplicity we create a new data-frame and update it accordingly:

code snippet 2: new dataframe creation
There are a total of: 13592 annotated tank instances
and 6 columns
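A rough sketch of this conversion is shown below; the column names (bounds, bbox) and the annotations data-frame from the earlier sketch are assumptions, not necessarily the notebook’s exact names.

# sketch: convert the (x1, y1, x2, y2) bounds string into [x, y, width, height]
import ast

def bounds_to_bbox(bounds_str):
    x1, y1, x2, y2 = ast.literal_eval(bounds_str)
    return [int(x1), int(y1), int(x2 - x1), int(y2 - y1)]

bbox_df = annotations.copy()
bbox_df["bbox"] = bbox_df["bounds"].apply(bounds_to_bbox)
print("There are a total of:", len(bbox_df), "annotated tank instances")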

This data-frame can be used for object detection exercises (see YOLO for example), and it is compatible with the MS COCO annotation format.

Create a new data-set

Next we want to use this information for creating a new data-set; one in which each tank will be contained in a small image chip and for each chip there will be a binary mask containing a single disk that overlaps with the respective tank. All chips are to be of the same size.

The motivation for doing this is:

  1. to reduce the size of images going into the network without sacrificing spatial resolution, as re-sizing the input would;
  2. to fit multiple chips into GPU memory at once, and thus speed up training.

We will create one more data-frame for this purpose but before doing so let us get some info on the tanks and declare some image parsing functions:

code snippet 3: get the max tank width and height from all instances
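As a sketch, the maxima can be computed directly from the bbox column of the hypothetical bbox_df data-frame introduced above.

# sketch: largest bounding-box width and height across all instances
max_w = bbox_df["bbox"].apply(lambda b: b[2]).max()
max_h = bbox_df["bbox"].apply(lambda b: b[3]).max()
print("max tank (width, height):", (max_w, max_h))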

The max width and height obtained above are (109, 115) thus setting the square chip dimensions to (128, 128) would suffice.

code snippet 4: create the new data-set directory structure

Using the annotations data-frame we will now:

  1. cut-out same-size instance chips from the training images;
  2. create training masks by drawing maximal disks within the bounds of each instance bounding box
  3. create unique names for all chips and store them along with the masks in a new directory
code snippet 5: generate new data set / data frame
example of a new chip and the corresponding mask
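The sketch below illustrates this step for a single annotation row; the directory paths, column names, the .jpg extension and the border handling are all simplifying assumptions.

# sketch: cut a fixed-size chip per tank and draw a maximal disk as its mask
# (paths, column names and the .jpg extension are assumptions; border
# handling is simplified for brevity)
import os
import cv2
import numpy as np

CHIP = 128

def make_chip_and_mask(image_dir, out_img_dir, out_mask_dir, row, idx):
    img = cv2.imread(os.path.join(image_dir, row["image_id"] + ".jpg"))
    x, y, w, h = row["bbox"]
    cx, cy = x + w // 2, y + h // 2              # tank centre in image coordinates
    x0, y0 = cx - CHIP // 2, cy - CHIP // 2      # chip origin
    chip = np.zeros((CHIP, CHIP, 3), dtype=np.uint8)
    crop = img[max(y0, 0):y0 + CHIP, max(x0, 0):x0 + CHIP]
    chip[:crop.shape[0], :crop.shape[1]] = crop
    mask = np.zeros((CHIP, CHIP), dtype=np.uint8)
    cv2.circle(mask, (CHIP // 2, CHIP // 2), min(w, h) // 2, 255, -1)  # filled disk
    name = "tank_{:05d}.png".format(idx)
    cv2.imwrite(os.path.join(out_img_dir, name), chip)
    cv2.imwrite(os.path.join(out_mask_dir, name), mask)
    return name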

Split the image chip set to training and validation sets

Next in line is the splitting of the newly created data-set into a training and a validation subset (9:1 ratio). We use scikit-learn’s train_test_split() function:

code snippet 6: train and validation split
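A minimal sketch of the split, assuming a chips_df data-frame with one row per generated chip:

# sketch: 90/10 split of the chip data-frame (chips_df is an assumed name)
from sklearn.model_selection import train_test_split

train_df, val_df = train_test_split(chips_df, test_size=0.1, random_state=42)
print(len(train_df), "training chips /", len(val_df), "validation chips")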

After completing this, the two sub-directories, train/ and val/, are populated with the selected images, and images/ can be safely deleted.

Annotate the chip masks

In this section we will create an MS-COCO compatible annotation file (.JSON) for each of the two sets.

To do so, please download the relevant code file, which is a slightly altered version of https://github.com/chrise96/image-to-coco-json-converter by chrise96.

# get the relevant file - annotation code
!wget https://raw.githubusercontent.com/georgiosouzounis/instance-segmentation-mask-rcnn/main/annotation/mask2image.py -O /content/mask2image.py
code snippet 7: the annotation function

Let us now create a JSON annotation file for the training set:

code snippet 8: create vector annotations for the training set
Created 12231 annotations for training images

and one more for the validation set:

code snippet 9: create vector annotations for the validation set
Created 1360 annotations for validation images

Model Training

In this section we will configure our data IO (images & masks) driven by the annotation files we created before, create and customize our model, and start training! But before doing so let us set:

ROOT_DIR = "/content/" # in case of the Google Colab notebook

and get:

  1. the modified Mask R-CNN repo (Matterport) for TF2 support, and
  2. the weights of the pre-trained model on the MS COCO data-set for transfer learning.
code snippet 10: get code and weights.
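A rough sketch of these two downloads follows; the fork URL is intentionally left as a placeholder (the notebook uses the author’s modified copy), while the COCO weights come from the Matterport releases page.

# sketch: fetch a TF2-compatible Mask R-CNN fork and the MS COCO pre-trained weights
%cd /content/
# !git clone <url-of-the-TF2-compatible-Mask-RCNN-fork> Mask-RCNN-hacked
!wget https://github.com/matterport/Mask_RCNN/releases/download/v2.0/mask_rcnn_coco.h5 -O /content/mask_rcnn_coco.h5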

Import the relevant libraries:

code snippet 11: libraries relevant to model training

Define the custom data-set reader. Note that we are using modules from the pycocotools library, as they make manipulating the JSON annotation files easier.

code snippet 12: custom data-set reader
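The sketch below shows the general shape of such a reader for Matterport’s Mask R-CNN; the class and method names are illustrative, not necessarily the notebook’s exact ones.

# sketch of a COCO-style data-set reader (class/method names are illustrative)
import os
import numpy as np
from pycocotools.coco import COCO
from pycocotools import mask as maskUtils
from mrcnn import utils

class TankDataset(utils.Dataset):
    def load_tanks(self, annotation_json, image_dir):
        coco = COCO(annotation_json)
        self.add_class("oiltank", 1, "oil-storage-tank")
        for image_id in coco.getImgIds():
            info = coco.loadImgs(image_id)[0]
            self.add_image("oiltank", image_id=image_id,
                           path=os.path.join(image_dir, info["file_name"]),
                           width=info["width"], height=info["height"],
                           annotations=coco.loadAnns(coco.getAnnIds(imgIds=[image_id])))

    def load_mask(self, image_id):
        info = self.image_info[image_id]
        masks, class_ids = [], []
        for ann in info["annotations"]:
            rle = maskUtils.frPyObjects(ann["segmentation"], info["height"], info["width"])
            m = maskUtils.decode(rle)
            if m.ndim == 3:                      # polygons decode to H x W x n
                m = np.any(m, axis=2)
            masks.append(m)
            class_ids.append(1)                  # single foreground class
        return np.stack(masks, axis=-1).astype(bool), np.array(class_ids, dtype=np.int32)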

Next, we load the annotations into memory using the above parser and display a chip sample to confirm the functionality:

code snippet 13: load annotations
an example of a tank chip and its masked annotation

The image above (left) is a random selection from the annotation file. The image on the right is generated by masking the chip contents (gray scale) using the respective binary mask.

Now we need to customize some augmentation so as to enrich the views of the tanks and hopefully reduce the chances of over-fitting:

code snippet 14: image augmentation
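The sketch below shows an imgaug pipeline of the kind that Mask R-CNN’s train() call accepts; the exact operators and probabilities used in the notebook may differ.

# sketch: image augmentation with imgaug (operators are illustrative)
import imgaug.augmenters as iaa

augmentation = iaa.Sometimes(0.5, iaa.OneOf([
    iaa.Fliplr(1.0),                  # horizontal flip
    iaa.Flipud(1.0),                  # vertical flip
    iaa.Affine(rotate=(-45, 45)),     # random rotation
    iaa.Multiply((0.8, 1.2)),         # brightness jitter
]))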

We display the augmentation results on the random chip selected before:

image augmentation example

Next, we customize the configuration of Mask R-CNN. This is one of the most crucial steps: the default backbone options, resnet50 and resnet101, are both complex enough that, for segmenting such simple shapes (disks), they may easily lead to model over-fitting.

code snippet 15: model customization
customization summary
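A minimal sketch of such a customization is given below; the values are illustrative and the notebook’s exact settings may differ, but the single foreground class and the small chip size are the key points.

# sketch: a lighter Mask R-CNN configuration for single-class disk segmentation
from mrcnn.config import Config

class OilTankConfig(Config):
    NAME = "oiltank"
    BACKBONE = "resnet50"            # the smaller of the two default backbones
    NUM_CLASSES = 1 + 1              # background + oil-storage-tank
    IMAGE_MIN_DIM = 128              # chip size
    IMAGE_MAX_DIM = 128
    IMAGES_PER_GPU = 8               # small chips allow larger batches
    STEPS_PER_EPOCH = 500
    DETECTION_MIN_CONFIDENCE = 0.9

config = OilTankConfig()
config.display()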

Time to train the model! Note that training even a basic model can take a few hours. In the following we will run a series of training sessions.

  • In session 1 we will focus on the heads of the model, as we exclude the final-layer COCO weights for the purpose of transfer learning. The loss is expected to be a bit bumpy. We will use twice the LR to accelerate computation.
  • In session 2 we will train the full network to fine-tune it to our data. We will use the original LR. If no over-fitting occurs we expect a smooth and progressive decline of both the training and validation losses.
  • In session 3, aiming to bring the validation loss down to about 0.2, we will continue training with a smaller LR to allow the network to register small changes in the weights.
  • Follow-up sessions can be added if the objective is not met; a sketch of these calls is given after the session summaries below.
code snippet 16: model training: session 1/3
summary of training session 1/3
code snippet 17: model training, session 2/3
summary (selected part) of training session 2/3
code snippet 18: model training session 3/3
summary (selected part) of training session 3/3
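A rough sketch of the three sessions is given below, carrying over the hypothetical names (config, dataset_train, dataset_val, augmentation) from the earlier sketches; the epoch counts are illustrative. In Matterport’s implementation the epochs argument is cumulative across calls, which is what makes chaining sessions like this possible.

# sketch of the three training sessions (epoch counts are illustrative)
import mrcnn.model as modellib

model = modellib.MaskRCNN(mode="training", config=config, model_dir=ROOT_DIR + "log")
model.load_weights("/content/mask_rcnn_coco.h5", by_name=True,
                   exclude=["mrcnn_class_logits", "mrcnn_bbox_fc",
                            "mrcnn_bbox", "mrcnn_mask"])

# session 1: heads only, twice the default learning rate
model.train(dataset_train, dataset_val, learning_rate=2 * config.LEARNING_RATE,
            epochs=5, layers="heads", augmentation=augmentation)
# session 2: full network, default learning rate
model.train(dataset_train, dataset_val, learning_rate=config.LEARNING_RATE,
            epochs=15, layers="all", augmentation=augmentation)
# session 3: full network, reduced learning rate
model.train(dataset_train, dataset_val, learning_rate=config.LEARNING_RATE / 10,
            epochs=25, layers="all", augmentation=augmentation)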

Model training was completed in 4h using the GPU Runtime.

Brief model evaluation

Here the task is to get a glimpse of how the training went with respect to the training and validation losses, and to select the best set of weights.

Let’s first take a quick look at the training history:

code snippet 19: training history
a listing of the training and validation losses per epoch

To make this more user-friendly, let us instead plot the history as a graph:

code snippet 20: loss history graph
Loss history plot
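A minimal plotting sketch, assuming history is a dictionary holding the per-epoch losses collected above:

# sketch: plot training vs validation loss per epoch (history is an assumed name)
import matplotlib.pyplot as plt

plt.figure(figsize=(8, 4))
plt.plot(history["loss"], label="training loss")
plt.plot(history["val_loss"], label="validation loss")
plt.xlabel("epoch")
plt.ylabel("loss")
plt.legend()
plt.show()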

From the plot above one can see a smooth reduction of the training loss, which is a good indication regarding the model’s complexity, i.e. there are no evident signs of over-fitting. Fluctuations in the validation loss can have many causes. Overall, if one were to fit a polynomial between the end points of the orange line, the result would be a curve following the trend of the blue line.

Note that increasing the number of epochs by running an additional training session may not help much, as the model appears to be saturated, i.e. it cannot be improved further. In this case, re-adjustment of the hyper-parameters is needed, followed by a repetition of the training sessions.

Let us now select the best weights and retrieve the corresponding file:

code snippet 21: get the best weights
Best Epoch: 21 0.30374497175216675
Found model /content/Mask-RCNN-hacked/log/oiltank20210712T0647/mask_rcnn_oiltank_0021.h5
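A rough sketch of this selection, assuming the same history dictionary and the log-directory layout shown in the output above:

# sketch: pick the epoch with the lowest validation loss and locate its weights file
import glob
import numpy as np

best_epoch = int(np.argmin(history["val_loss"])) + 1
print("Best Epoch:", best_epoch, history["val_loss"][best_epoch - 1])
pattern = ROOT_DIR + "Mask-RCNN-hacked/log/*/mask_rcnn_oiltank_{:04d}.h5".format(best_epoch)
best_weights = glob.glob(pattern)[0]
print("Found model", best_weights)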

Model Inference

In this section we will create an inference model configured with the best weights identified above and compute some predictions on previously unseen images.

code snippet 22: create the inference model and load the best weights
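A minimal sketch of this step, reusing the hypothetical OilTankConfig and best_weights names from the earlier sketches:

# sketch: inference configuration and model creation with the best weights
class InferenceConfig(OilTankConfig):
    GPU_COUNT = 1
    IMAGES_PER_GPU = 1               # one chip at a time at inference

inference_config = InferenceConfig()
model = modellib.MaskRCNN(mode="inference", config=inference_config,
                          model_dir=ROOT_DIR + "log")
model.load_weights(best_weights, by_name=True)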

Let us now see an inference demo on randomly selected chips from the validation set; ground-truth vs prediction:

code snippet 23: inference demo
results of the inference demo

Lastly, let’s experiment with a small set of test chips to visually assess the performance of our model.

# get the test_chip set
!wget https://github.com/georgiosouzounis/instance-segmentation-mask-rcnn/raw/main/data/test_chips.zip -O /content/test_chips.zip
%cd /content/
!unzip /content/test_chips.zip
%rm /content/test_chips.zip
code snippet 24: process the individual test chips
segmentation results

The results are aggregated in the above figure.

Conclusions

Reviewing the results shows that the model performs rather well, but in some cases it has a hard time accurately separating the oil-tank perimeter from the tank itself. This is referred to as leakage and is primarily due to the very limited information on the background. Recall that all chips are focused on the tank, with some minor background visible at the corners of the bounding box, while the rest of the chip content is black. This calls for a better annotation strategy.

The second image from the right, in both the top and bottom rows, shows examples of leakage.

The successful detections appear to be proper, though it is evident that some additional improvement of the model would be welcome.

Lastly, we see one False Negative (bottom right image) and that is on a type of tank that is not represented sufficiently in our data-set. More training instances of this kind would help.

About the author

Georgios K. Ouzounis is an executive in the high-tech industry with focus on deep-learning and computer vision. He is an invited professor at the Kaunas University of Applied Sciences, Lithuania, delivering courses on Machine/Deep Learning. Georgios has worked in different academic, government and industry settings across 7 countries and built a rich and diverse portfolio of innovative technologies for remote-sensing and medical image analytics. His research interests are in very high resolution satellite- and 3D medical- image segmentation using hybrid algorithms from graph theory and deep learning. His personal interests are in geopolitics, geography, history and philosophy.

Contact Me

If you found this article interesting, please remember to hit ‘like’. You can find me on LinkedIn using the link below. I am happy to discuss this and other related matters in greater detail.
