Building data science capability using research weeks

Thomas Redfern
Aug 27, 2020

This blog was co-authored by Thomas Redfern, Rachel Keay, David Stephens, Cate Seale and Pascal Philipp.

Photo by Chris Pagan on Unsplash

Here at the UKHO, we are always looking for new ways to apply the latest in machine learning and big data research to the marine data domain.

To enable our Data Science team to push the boundaries of what is possible and experiment outside of our current project deliverables, we’ve found success in introducing a designated ‘research week’ each quarter. This research week provides allocated time for our data scientists to try new ideas, techniques and technologies to find innovative solutions when handling marine geospatial data.

In this blog post, five of our team members outline their findings from research week and how these might tie into future projects.

Using weakly supervised machine learning to automate image segmentation label generation: Dr Thomas Redfern

Image segmentation problems require full segmentation masks as labels. However, the creation of these labels is time-consuming and sometimes uncertain, so our team were interested in finding techniques that could help speed up this process.

In this research week, I investigated the use of weakly supervised machine learning for training an image segmentation model. Rather than using a full segmentation mask as a label, I used a single binary label of 0 or 1 (denoting the presence or absence of water) to label satellite images.

I modified the U-Net fully convolutional architecture so that it could be trained with image-level binary 0/1 labels. The final convolutional layer, which would normally provide the sigmoid or softmax class probability estimate across a whole image, was replaced by a flatten layer, and the outputs were passed to several fully connected dense layers with alternating dropout layers. The final dense layer had a sigmoid activation function, and binary cross-entropy was used as the loss function. The model was trained with 5,000 randomly sampled image chips (sampled from 16 Sentinel-2 images); each chip consisted of 12 Sentinel-2 image bands and had a corresponding single-value label: 0 if the chip contained no water, 1 if it contained water.
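A sketch of this modification in Keras; the layer counts and sizes here are illustrative, not the exact architecture used:

```python
from tensorflow.keras import layers, models

def weakly_supervised_encoder(input_shape=(64, 64, 12)):
    """U-Net-style encoder with the per-pixel head replaced by an
    image-level binary classifier (illustrative sketch)."""
    inputs = layers.Input(shape=input_shape)
    # First convolutional block: its learnt features can later be
    # inspected to see what the model "looks at".
    x = layers.Conv2D(32, 3, padding="same", activation="relu", name="conv1")(inputs)
    x = layers.MaxPooling2D()(x)
    x = layers.Conv2D(64, 3, padding="same", activation="relu")(x)
    x = layers.MaxPooling2D()(x)
    # Instead of a final per-pixel sigmoid/softmax layer: flatten, then
    # dense layers with alternating dropout, ending in one sigmoid unit.
    x = layers.Flatten()(x)
    x = layers.Dense(128, activation="relu")(x)
    x = layers.Dropout(0.5)(x)
    x = layers.Dense(64, activation="relu")(x)
    x = layers.Dropout(0.5)(x)
    outputs = layers.Dense(1, activation="sigmoid")(x)  # 1 = chip contains water
    model = models.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="binary_crossentropy")
    return model

model = weakly_supervised_encoder()
```

The whole-image label then trains the same convolutional features that a full segmentation head would use, which is what makes the next step possible.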

Once the model was trained, I used the learnt features of the first convolutional layer in the U-Net architecture to visualise what the model had learnt to “look at” in determining whether an image chip contains water or not. Some features activate when water is present, whilst others are sensitive to land. I processed these activations and, using a thresholding technique, was able to generate an image segmentation mask that closely resembles manually labelled data.
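The thresholding step can be sketched in numpy; the choice of water-sensitive channels and the threshold value are illustrative assumptions, and the activations would come from the trained first convolutional layer:

```python
import numpy as np

def mask_from_activations(activations, water_channels, threshold=0.5):
    """Combine selected first-layer feature maps into a binary water mask.

    activations: array of shape (H, W, C) from the first conv layer.
    water_channels: indices of channels observed to respond to water.
    """
    # Average the water-sensitive channels into one response map
    response = activations[..., water_channels].mean(axis=-1)
    # Normalise to [0, 1] so a single threshold applies across images
    response = (response - response.min()) / (np.ptp(response) + 1e-8)
    return (response > threshold).astype(np.uint8)

# Toy example: one channel "activates" over the left half of the chip
acts = np.zeros((4, 4, 2))
acts[:, :2, 0] = 1.0
mask = mask_from_activations(acts, water_channels=[0])
```

In practice the channel selection and threshold would be tuned against a handful of manually labelled chips.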

In the future, this technique could be used to speed up the labelling process, or to pre-train an image segmentation model on a wide variety of input images for subsequent fine-tuning with more detailed full segmentation masks.

Cloud Detection in Sentinel-2 imagery with random forest and U-Net: Rachel Keay

The challenge with optical satellite imagery is that clouds obstruct or contaminate surface object reflectance, often causing problems with automated land and sea classification and analysis. An average of 67% of the Earth is covered by clouds at any one time, so the ability to identify cloud is an important pre-processing task for any satellite image application. The current method of masking cloud using the European Space Agency (ESA) Sentinel-2 Scene Classification (SC) band has some inaccuracies: bright targets such as sand, snow and open fires are often misclassified as cloud, eliminating those pixels from analysis. So the aim of this research week work was to apply supervised machine learning techniques to detect cloud within Sentinel-2 satellite imagery, replacing the use of the SC band within data pre-processing.

By using eight labelled images from a study by Baetens et al., and downloading the corresponding Sentinel-2 level 1C images, I trained a random forest model and a U-Net model for multiclass classification of low cloud, high cloud, cloud shadow, land, water, snow and no data. Both models use spectral information from the RGB, near infra-red and short-wave infra-red range for classification, but the U-Net also captures spatial information, picking up edges and blobs through its convolutions.

For successful prediction with the random forest, I (1) ran a grid search with cross-validation to optimise hyperparameters, (2) performed feature selection to identify the best features for this particular data set, removing noise and reducing computation time, and (3) balanced the data set so that the classes had similar pixel counts. The U-Net did not undergo any pre-processing to balance the input classes; it was trained with sparse categorical cross-entropy and the Adam optimiser (learning rate 0.001) for 10 epochs.
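Steps (1) and (3) can be sketched with scikit-learn; the grid values, toy data and the downsampling helper below are illustrative, not the ones used in the project:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

def balance_classes(X, y, seed=0):
    """Downsample every class to the size of the rarest one."""
    rng = np.random.default_rng(seed)
    counts = np.bincount(y)
    n = counts[counts > 0].min()
    keep = np.concatenate([
        rng.choice(np.flatnonzero(y == c), size=n, replace=False)
        for c in np.unique(y)
    ])
    return X[keep], y[keep]

# Toy pixel data: 2 spectral features, imbalanced classes
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))
y = np.array([0] * 250 + [1] * 50)
Xb, yb = balance_classes(X, y)

# (1) grid search with cross-validation over a small hyperparameter grid
grid = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [10, 50], "max_depth": [3, None]},
    cv=3,
)
grid.fit(Xb, yb)
```

Downsampling is the simplest balancing strategy; class weights or oversampling are alternatives when discarding pixels is too wasteful.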

So, let’s check out the results on the validation images:


The confusion matrices show the true positive rate (i.e. recall): TP/(TP+FN). High clouds in the random forest results were often misclassified as land or low clouds, whereas the U-Net did much better with high cloud but struggled with cloud shadow, getting it correct only 41% of the time. Both did well with low cloud, scoring above 90%.
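For a confusion matrix whose rows are the true classes and columns the predictions, per-class recall is the diagonal divided by the row sums, i.e. TP/(TP+FN). A minimal numpy sketch with made-up numbers:

```python
import numpy as np

def per_class_recall(cm):
    """Recall (true positive rate) per class from a confusion matrix
    whose rows are true classes and columns are predicted classes."""
    cm = np.asarray(cm, dtype=float)
    return np.diag(cm) / cm.sum(axis=1)

# Toy 2-class example: 90 of 100 'cloud' pixels correctly detected,
# 80 of 100 'clear' pixels correctly detected
cm = [[90, 10],
      [20, 80]]
print(per_class_recall(cm))  # → [0.9 0.8]
```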

Using the random forest and U-Net models to predict on a final test image demonstrates the improved cloud classification in comparison to the Sentinel-2 SC image:


The true positive rate for cloud detection is much higher for the U-Net and random forest than for the SC band (85% and 79% respectively, compared to 63%), which provides evidence that machine learning for cloud detection is likely to result in a better mask for pre-processing.

To improve this research, I would spend more time on the U-Net: trying out data augmentation, feeding in randomised data batches, and running more epochs. I would also want to improve confidence in the model by running predictions on several test images.

Pixel-weighted loss for image segmentation: David Stephens

The aim of semantic image segmentation is to partition an image into one or more different classes. We achieve this using a ‘deep learning’ convolutional neural network such as U-Net, which is trained against pre-labelled images. One problem is that if the segments are very complicated and have intricate boundaries (think of a complicated coastline), it can be hard to get the model to precisely delineate the feature of interest.

During the recent research week, I explored ways to improve model prediction at segment boundaries. The approach is to modify the loss function used in model training. The loss function is some function of the difference between the model prediction and the true values, which we try to minimise in the model fitting process. In this experimentation I used the Sørensen-Dice loss (SDL), which is derived from the coefficient of the same name and is commonly used in segmentation problems. The modification I implemented involves weighting certain pixels when calculating the loss, so pixels towards the edges of a segment contribute more to the overall loss value than pixels at the centre of the segment. This boosts the importance of these pixels and ‘focuses’ the model on reducing the loss of these edge pixels.

I added an adjustable parameter to the loss, theta, which allows the user to specify the weighting given to edge pixels over non-edge pixels. If theta=1 then all pixels have an equal weighting and the result is equivalent to using the standard SDL; theta=2 means that edge pixels contribute twice as much as non-edge pixels to the loss score, and so on.

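As a concrete sketch of the weighting, here is a minimal numpy version of the edge-weighted Sørensen-Dice loss. The real training code would implement this in the deep learning framework’s tensor ops, and the edge_mask input (1 at boundary pixels, 0 elsewhere) is assumed to be precomputed from the labels:

```python
import numpy as np

def weighted_dice_loss(y_true, y_pred, edge_mask, theta=1.0, eps=1e-7):
    """Sørensen-Dice loss with edge pixels weighted by a factor theta.

    y_true, y_pred: arrays of values in [0, 1]; edge_mask: 1 at segment
    edges, 0 elsewhere. theta=1 reduces to the standard Dice loss.
    """
    w = 1.0 + (theta - 1.0) * edge_mask  # edge pixels count theta times
    intersection = np.sum(w * y_true * y_pred)
    denom = np.sum(w * y_true) + np.sum(w * y_pred)
    return 1.0 - 2.0 * intersection / (denom + eps)

# Toy example: the prediction misses one true pixel, and that pixel
# happens to lie on a segment edge.
y_true = np.array([0.0, 1.0, 1.0, 0.0])
y_pred = np.array([0.0, 1.0, 0.0, 0.0])
edges = np.array([0.0, 0.0, 1.0, 0.0])
loss_plain = weighted_dice_loss(y_true, y_pred, edges, theta=1.0)
loss_edge = weighted_dice_loss(y_true, y_pred, edges, theta=2.0)
# The edge-weighted loss penalises the missed edge pixel more heavily,
# so loss_edge > loss_plain.
```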

I trained models for the three values of theta given above. In the results for a single image from the test set, the top left panel is the true label and the other three are the predictions from the models trained using the three theta values. In this example, the model with the higher theta value does a better job of delineating the complex features. In general, the results from my limited experimentation were promising and it is something we will continue to explore within the team.

Improving imperfect labels via bootstrapping: Pascal Philipp

For image segmentation tasks with imperfect labels, it is possible that some of the machine learning model’s predictions turn out to be better than the assigned labels — a human labeller may have missed a patch of the object of interest in a test image, and the well-trained classifier may get it right. If the model’s predictions have become slightly better than the labels, why not use those predictions as labels for the next round of training? To outline this approach, denote the training data by X and the initial labels by y0:

model.fit(X, y0); y1 = model.predict(X);

model.fit(X, y1); y2 = model.predict(X); …

Even if the increase in the quality of labels is only very small at each individual stage, if we iterate the process, we should eventually obtain high-quality labels, right? Well, the problem is that there will also be instances of predictions that are not as good as the labels, and this kind of difference may amplify through the iterations. Overall, it seems like a very delicate process with a risk of spiralling out of control.
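The iteration can be sketched with scikit-learn on a toy two-cluster data set. A k-NN classifier stands in for the segmentation model here; on real imagery the fit/predict pair would be the network’s training and inference steps:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def bootstrap_labels(X, y0, n_rounds=3):
    """Iteratively retrain on the model's own predictions: each round
    fits a model to the current labels, then replaces the labels with
    the model's predictions."""
    y = y0.copy()
    for _ in range(n_rounds):
        model = KNeighborsClassifier(n_neighbors=5)
        model.fit(X, y)
        y = model.predict(X)  # predictions become the next round's labels
    return y

# Two well-separated clusters, with 5 of 100 labels flipped to
# simulate imperfect initial labels.
rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(0, 0.5, (50, 2)), rng.normal(3, 0.5, (50, 2))])
y_true = np.array([0] * 50 + [1] * 50)
y0 = y_true.copy()
flipped = rng.choice(100, size=5, replace=False)
y0[flipped] = 1 - y0[flipped]

y_boot = bootstrap_labels(X, y0)  # isolated label noise gets smoothed out
```

In this easy setting the neighbourhood smoothing corrects the flipped labels; the delicacy described above appears when the model’s errors are correlated, so that wrong predictions reinforce each other across rounds.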

The aim of this research week project was to trial the bootstrapping approach just described for cloud detection on Sentinel-2 imagery. The provided SCL bands were used as the initial labels (this way, an unlimited amount of initial labels is available). The evolution of the labels for two samples is shown in the following:


The top row shows the image (left), the SCL classes (right) and initial binary labels (y0; centre) derived from the SCL classes. The bottom row shows the labels y1, y2, y3.


In the second sample, note how some of the detail that dropped out from y0 to y1 is recovered by y2.

The overall conclusion of this brief trial is that there are some positive aspects, but the approach is very delicate. For example, a small change in one of the parameters may cause detections to grow bigger and bigger or to shrink through the iterations. However, the iterations here were carried out completely unsupervised — no gold standard set was used to check the changes — and incorporating a gold standard set or semi-automatic techniques to guide the process could be promising.

Edge detection-based loss function: Catherine Seale

My research week focused on choosing a loss function for semantic segmentation of satellite imagery.

My current task is detecting the boundary between water and land in Sentinel-2 satellite imagery. This is possible using semantic segmentation, by considering the image pixels to belong to one of two classes, water and land, and labelling individual pixels 0 or 1 accordingly.

I thought there might be a better alternative to the commonly used categorical cross-entropy loss, and I wanted to try a loss function that incorporated edge detection. For this task, I was looking to improve the predictions in areas where we have previously encountered misclassifications:

· The detection of small features on the shoreline (like jetties and groynes)

· Features in the intertidal zone (like rocky platforms and mud banks, which may be wet as the tide goes out)

· Misclassification of shadow in images with a low sun angle (from cliffs, tall buildings and trees) as water

I was inspired by the ‘perceptual loss function’ proposed in this paper on single image super-resolution. The perceptual loss function incorporates VGG19, a pre-trained neural network, to extract feature maps from the predicted and the ground truth images. The feature maps, taken from the last activation layer of VGG19, are used to calculate a loss, defined as the Euclidean distance between the feature map of the predicted image and its ground truth counterpart. This proves to be effective for super-resolution, as optimisation is based on the content of the images as extracted by VGG19: features such as edges, shapes and objects that are more relevant to the task than the values of individual pixels.

With this in mind, I wanted to try to formulate a loss for detecting water boundaries that focused on some aspect of the boundary itself, rather than the error at a pixel level. In this loss function I applied a Sobel filter, commonly used for edge detection, to the predictions and the labels, and returned the Mean Squared Error between the edges detected on each.
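A minimal numpy/scipy version of this Sobel loss; the training version would use the framework’s differentiable ops, with scipy.ndimage standing in here for illustration:

```python
import numpy as np
from scipy import ndimage

def sobel_loss(y_true, y_pred):
    """Mean squared error between Sobel edge maps of label and prediction."""
    def edge_map(img):
        gx = ndimage.sobel(img, axis=0)  # gradient along axis 0
        gy = ndimage.sobel(img, axis=1)  # gradient along axis 1
        return np.hypot(gx, gy)          # gradient magnitude
    return np.mean((edge_map(y_true) - edge_map(y_pred)) ** 2)

# Toy example: the label contains a thin, one-pixel-wide water feature.
y_true = np.zeros((8, 8))
y_true[:, 4] = 1.0           # a thin "river"
y_missed = np.zeros((8, 8))  # a prediction that misses the river entirely

print(sobel_loss(y_true, y_true))    # 0.0 — identical edge maps
print(sobel_loss(y_true, y_missed))  # > 0 — the missed edges are penalised
```

Because a thin feature is almost entirely edge, missing it costs much more under this loss than under a per-pixel loss, which matches the behaviour described below.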

To test this, I trained three basic U-Net models for 5 epochs, with three different loss functions:

· Categorical Cross Entropy

· Sørensen-Dice Coefficient

· The new Sobel loss

Examples from the test set predictions showed that very small, thin features were recognised by the model trained with the Sobel loss. Also shown are the predictions from a previous model we had trained to detect coastline (Coastline v3).


The Sobel loss function was found to penalise misclassification of shadow as water, as in this example where tall trees are casting deep shadows.


Complicated areas, such as deep channels in the intertidal zone, are predicted with more intricate details.


There were cases where the model trained using a Sobel loss function was able to predict thin water features, such as rivers, with apparent improvements.


The Sobel loss function appears successful and is ready to try on larger data sets and with different training regimes!


Developing new techniques to solve complex problems is an important part of being a data scientist, as off-the-shelf algorithms and techniques only get you so far when it comes to solving new data problems.

Knowing where and how to spend time developing new approaches can be difficult, especially when deadlines are looming. But by protecting time each quarter for our data scientists to explore new ideas and techniques without the pressure of immediate project deliverables, we’ve found that our understanding of complex problems has increased, our delivered solutions have improved, and the overall skill and knowledge level of our team has increased.

We’d recommend this approach to other data science teams — no matter your domain.
