Seeing barges through the lens of low-resolution imagery

Counting barges, and its challenges, using low-resolution imagery

Fiona Chow
Bird.i
Oct 4, 2017 · 6 min read


co-authored by Athanasios Sdralias and Fiona Chow

©DigitalGlobe

An essential ingredient for Machine Learning and Deep Learning to work is data. In this context, it is satellite imagery. Landsat images are one of the oldest sources of observation imagery, dating back more than four decades to 1972. Their 30-meter low-resolution (low-res) multispectral imagery forms the backbone of Remote Sensing and Machine Learning studies and applications such as vegetation monitoring.

The very high resolution (high-res) satellite imagery of less than 50 centimeters that we have today is still in its nascent stage: only in recent years did the U.S. government relax the restrictions on commercial sales of such high-res imagery to non-U.S. government customers. This has paved the way for problems such as poverty prediction and disaster relief to be solved using Machine Learning.

Comparison between frequency and resolution for various use cases

A key factor influencing the choice of satellite image resolution is the temporal frequency at which change needs to be perceived.

Whether changes are perceptible in consecutive images of a location depends on the use case. For example, in temporal images of a forest, changes would not be perceptible if the images were one, or maybe even six, months apart. Similarly, changes in crops would not be perceptible in images taken a day apart. However, in very dynamic environments such as parking lots or airports, where cars and airplanes are always on the move, changes are perceptible even an hour apart.

The more dynamic a scene is, the more frequent the required revisits. For example, in the case of crops, changes can be perceived over weeks, so weekly satellite images would be needed for analysis. This criterion limits the level of resolution that can be used, as higher-resolution imagery tends to have longer revisit times. As a result, most studies and applications relating to crop monitoring or yield prediction use low-res imagery.

In our case, we are interested in gleaning insights from the transportation of grain commodities over a river.

Barges, such as the one on the left, are commonly used to haul large-volume goods and heavy, bulky items via waterways.

Through our research, we found that the locations and movements of barges along rivers correlate to some degree with actual grain production numbers. To date, only a handful of countries in the world have published comprehensive information on grain production as open data. Even where it is published, a lag of weeks to months is usually expected before the information is made available to the public. This is a blocker for people, such as commodity traders, who need the information in time. Being able to glean insights from barges could therefore bridge the gap between the availability of this information and its timeliness.

Due to the dynamic nature of barge movements on rivers, we opted to use low-res images to train our models, which can then predict the number of barges at river terminals over time. We tackle this problem with a Machine Learning technique known as Object Detection.

Example of Object Detection in action

Object Detection is one of the key Machine Learning tasks carried out on images. In simple words, it is the process of locating all instances of a specific object (or objects) in a given image. This may sound like a trivial task for a human, but it is rather challenging for a machine, because the number of objects to be detected, their sizes and their aspects are unknown and vary from image to image.

Traditionally this challenge is solved with Computer Vision techniques. A classic approach involves using sliding windows to extract features from the image. To date these techniques are still considered powerful tools, but they are computationally intensive, requiring a significant amount of time to make predictions. They are therefore not an ideal solution for solving problems at scale.

The Machine Learning approach to Object Detection involves the use of an extended Convolutional Neural Network that generates bounding boxes to localise each object and then finds the best fit of the bounding box around the object. A logical extension to Object Detection is to count the number of objects found in the image.
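To make the counting step concrete, here is a minimal sketch of how raw detector output can be turned into an object count. It assumes (as is typical for detection networks, though not stated in our pipeline) that the network emits candidate bounding boxes with confidence scores, so duplicates over the same object must be suppressed before counting. The function names and thresholds are illustrative, not our production code:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / float(area_a + area_b - inter)

def count_objects(detections, score_thresh=0.5, iou_thresh=0.5):
    """Count distinct objects from raw detector output.

    detections: list of (box, score) pairs, box = (x1, y1, x2, y2).
    Keeps high-confidence boxes, greedily suppresses boxes that
    overlap an already-kept box (non-maximum suppression), and
    returns how many objects remain.
    """
    ranked = sorted((d for d in detections if d[1] >= score_thresh),
                    key=lambda d: d[1], reverse=True)
    kept = []
    for box, score in ranked:
        if all(iou(box, k) < iou_thresh for k in kept):
            kept.append(box)
    return len(kept)
```

Two strongly overlapping boxes over the same barge would be merged into one kept detection, while a distinct box elsewhere in the image adds one to the count.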

A model is only as good as the data it is fed. In this case, the data is images with bounding boxes drawn over barges. Producing them is in fact the most time-consuming and difficult part of training.

We rely on the boundaries between objects to draw the boxes. Marking barges with bounding boxes poses an additional level of complexity compared to cars, ships or even airplanes, because barges are found attached together while those other objects are not. Hence, even if lower-resolution images were used, those objects would still be individually distinguishable and could be confidently marked for training.

The following images illustrate this. On the left, we can clearly identify the ships and their boundaries. Despite the extremely low-res image on the right, a human eye can distinguish the airplanes, their shapes and their boundaries.

Low-res image of ship port and airport ©PlanetLabs

In our specific case of counting barges (see below), boundaries between barges are clearly visible in the high-res image (left), but that is not the case for the low-res image (right).

High-res vs Low-res image of the same river terminal ©DigitalGlobe ©PlanetLabs

In the low-res image, the finer details are not captured. With this loss of information, the barges look like nothing more than one large rectangle. This brings about an uncertainty that needs to be addressed: how do we mark individual barges when the human eye struggles to distinguish the boundaries between them?

Well, we know that barges are often found attached together in river tows. So an approach to overcoming this uncertainty is to mark bonded barges as a whole (1, 2, 3, 4 and so on attached together) rather than trying to mark them individually. A flaw in marking attached barges as a whole is that on some occasions the model may mistake reflective rooftops for attached barges. Since barges are only found on the river, land features can be removed by creating a mask over the land portions of the image. This allows the model to focus on locating barges on the river alone.
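The masking step itself is simple. As a minimal sketch (assuming the image is a 2D grid of pixel intensities and the river outline is available as a binary mask, which in practice would come from a waterway shapefile or hand-drawn polygon):

```python
def apply_river_mask(image, river_mask):
    """Zero out land pixels so the model only ever sees the river.

    image: 2D list of pixel intensities.
    river_mask: 2D list of 0/1 flags, 1 where the pixel lies on the river.
    Land pixels (mask 0) are set to 0, removing rooftops and other
    land features before detection.
    """
    return [[px if on_river else 0
             for px, on_river in zip(img_row, mask_row)]
            for img_row, mask_row in zip(image, river_mask)]
```

Because the mask is tied to the geography rather than any one image, it only needs to be drawn once per river terminal and can then be reused for every revisit.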

In addition, a combination of Computer Vision techniques is applied: a classic segmentation method, thresholding, turns non-barge pixels black, followed by a series of morphological operations to discard small residual noise. Given that common barges have a standard size, calculations can then be made on the remaining barge pixels to estimate the count.
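That pipeline can be sketched in a few lines of plain Python. The threshold value, the 3x3 structuring element and the per-barge footprint below are illustrative assumptions; a real implementation would typically use a library such as OpenCV, and the barge footprint in pixels would be derived from the known barge dimensions and the image's ground resolution:

```python
def threshold(image, t):
    """Binary segmentation: 1 where brightness exceeds t (candidate barge)."""
    return [[1 if px > t else 0 for px in row] for row in image]

def erode(binary):
    """3x3 erosion: a pixel survives only if its whole neighbourhood is set."""
    h, w = len(binary), len(binary[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            out[y][x] = int(all(binary[y + dy][x + dx]
                                for dy in (-1, 0, 1) for dx in (-1, 0, 1)))
    return out

def dilate(binary):
    """3x3 dilation: a pixel is set if any neighbour is set."""
    h, w = len(binary), len(binary[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            out[y][x] = int(any(binary[y + dy][x + dx]
                                for dy in (-1, 0, 1) for dx in (-1, 0, 1)))
    return out

def estimate_barge_count(image, t, barge_area_px):
    """Threshold, morphologically open (erode then dilate) to drop small
    noise, then divide the remaining foreground area by the footprint of
    a standard barge to estimate how many are present."""
    opened = dilate(erode(threshold(image, t)))
    foreground = sum(sum(row) for row in opened)
    return round(foreground / barge_area_px)
```

The erode-then-dilate pair (a morphological opening) wipes out isolated bright pixels while restoring larger blobs to roughly their original extent, which is what lets the final area-divided-by-footprint step stay accurate.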

These are just some of the many solutions that can be applied to this class of problems. What we do here at Bird.i is extremely exciting and challenging. We believe in breaking through boundaries and making seemingly impossible, complex problems possible. We have only just begun.

Join us as we continue uncovering the world through our work in Machine Learning and Observation Imagery. So do drop us a message, we would love to hear from you! Alternatively, if you are facing a problem that involves Observation Imagery and scale, we can solve it for you.

We are Bird.i and our mission is to empower everyone to use images from above, and beyond.
