How Deep Learning Can Help Us Detect Fish

The shallows of the North Sea serve as a home for Ammodytes marinus, a species of fish more commonly known as the sandeel. The lives of sandeels revolve around patches of sandy seafloor: they burrow into them at night and form shoals above them in the day.

If you’ve ever eaten fish, chances are that fish ate sandeels. On aquatic farms, fish are fed sandeel-laden fishmeal. In the wild, fish species you may be more culinarily familiar with — cod, haddock, mackerel — rely on sandeels as a crucial food source.

Because of their use in producing fishmeal and other products, sandeels are fished extensively in the North Sea. Sandeel landings have historically been large. Following a peak of over one million tonnes caught in 1997, however, they began a precipitous decline as previous fishing practices were no longer sustainable. Regulators in countries such as Norway have thus established fishing quotas to protect the stock of sandeel.

Officials rely on survey data collected in the North Sea, especially acoustic survey data, to come up with a quota for a given year. Since fish reflect sound waves, a combination of trawling and echo sounding allows officials to identify and gauge the abundance of certain types of fish. In Norway, the Institute of Marine Research collects acoustic survey data on sandeel in the North Sea, which are used to assess the health of sandeel stocks and draft recommendations for fishing quotas.

To reach such conclusions about sandeels from acoustic data, researchers must be able to distinguish between sandeel shoals, other fish, and natural features. You can’t estimate the abundance of sandeels in the North Sea without first knowing where the sandeels are in your data. The problem for researchers, then, is to accurately identify sandeel shoals in an acoustic dataset, a task which, as a group of Norwegian researchers led by Olav Brautaset have investigated, can be accelerated with deep learning methods. We’ll explore their work here.

The Data

Survey data is collected using ships fitted with sonar arrays. The Institute of Marine Research’s survey ships carry multifrequency echosounder systems. Observations of acoustic backscatter (the measure of how much emitted sound is reflected back to the ship) are taken continuously over time at different sound frequencies and represented in a two-dimensional image format. An instance of such data is pictured below, where time is represented on the x-axis and depth on the y-axis.

Figure 1: An example of echosounder data (i.e., an echogram) taken over a span of 20 minutes. Graphs (a), (b), (c), and (d) represent observations taken at the frequencies of 18, 38, 120, and 200 kHz, respectively. Graphs (e) and (f) are labeled data (sandeel are labeled red). Graphs (g) and (h) are predictions. The line at the bottom of each individual graph represents the seabed.

Once the echosounder data has been collected, labels must be assigned to distinguish between sandeels and other species. This typically involves an operator who manually classifies certain values of backscatter as indicative of sandeels or of other species (see Figures 1e and 1f for examples of manually assigned labels). It is here that we might consider a potentially more efficient and accurate method of classification. Manual classification takes time. It might incorporate the personal biases of operators. Why not automate the process?

The Deep Convolutional Neural Network

Imagine you’ve found a blip in your echosounder data. How would you determine if it was a sandeel shoal or not? You might start out by evaluating the image based on various features — backscattering strength or the shape of the blip, for instance. Using these features, you would be able to draw conclusions about the data or train a traditional machine learning algorithm to do it for you.

What you’ve done, however, is create a predefined feature space. You have manually selected features from the data that you believe have predictive power in order to predict whether or not that blip is a shoal of sandeels.

In a deep convolutional neural network, the extraction of features from the data occurs without human intervention; there is no need for a predefined feature space. While features may have to be predefined to train other machine learning algorithms (e.g., support vector machines), a deep convolutional neural network identifies relevant features itself, learning them from the data through successive layers of matrix operations performed on an input image. The structure is loosely inspired by the networks of neurons in the brain, hence the neural in neural network.

Deep convolutional neural networks are composed of layers that perform convolution operations and layers that perform pooling operations. In convolution, chunks of an image (represented by a matrix) are each ‘multiplied’ by a smaller matrix called a kernel in order to extract important features. Convolution produces an output matrix where each element is the sum of all elements in a certain chunk weighted by the elements of the kernel (i.e., the sum of all elements in the Hadamard product between the chunk of the image and the kernel). The elements of a kernel are initialized with random values sampled from some distribution. As the network is trained, a technique called backpropagation finds and updates these kernel weights, so the network learns the filters that are most effective.
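The weighted-sum operation described above can be sketched in a few lines of NumPy. This is a toy illustration, not the researchers’ implementation, and the edge-detecting kernel is an invented example (note that, as in most deep learning frameworks, the kernel is applied without flipping):

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2D convolution: each output element is the sum of all
    elements in an image chunk weighted by the kernel (the sum of
    the Hadamard product between the chunk and the kernel)."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            chunk = image[i:i + kh, j:j + kw]
            out[i, j] = np.sum(chunk * kernel)  # elementwise product, then sum
    return out

# A toy 4x4 "echogram" patch with a sharp vertical boundary,
# and a kernel that responds to exactly that kind of edge.
image = np.array([[0., 0., 1., 1.],
                  [0., 0., 1., 1.],
                  [0., 0., 1., 1.],
                  [0., 0., 1., 1.]])
kernel = np.array([[1., -1.],
                   [1., -1.]])
print(conv2d(image, kernel))  # nonzero only where the edge sits
```

In a trained network, the kernel values would not be hand-picked like this; they would be learned via backpropagation.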

In pooling, the output of a convolutional operation is reduced in size. A matrix is divided into separate chunks of a certain size, and the maximum or average value in each chunk becomes a value in the pooled matrix.
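A minimal max-pooling sketch in NumPy, assuming the input dimensions divide evenly by the pool size (again an illustration, not the researchers’ code):

```python
import numpy as np

def max_pool(x, size=2):
    """Divide x into non-overlapping size-by-size chunks and keep
    the maximum of each chunk."""
    h, w = x.shape
    return x.reshape(h // size, size, w // size, size).max(axis=(1, 3))

x = np.array([[1., 3., 2., 0.],
              [4., 2., 1., 5.],
              [0., 1., 6., 2.],
              [3., 2., 1., 4.]])
print(max_pool(x))  # → [[4. 5.] [3. 6.]]
```

Swapping `.max` for `.mean` would give average pooling instead.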

After enough successive convolution and pooling layers, we reach the final, “fully connected” layer, which outputs a list of probabilities for the possible classifications. A blip thus becomes sandeel, or background, or other.

Implementation and Outcomes

To train a deep convolutional neural network, the data should be split into three categories: training data, validation data, and testing data. Training and validation data are both used in the construction of the network. The process is iterative, with the network repeatedly being trained on the training data, evaluated using the validation data, and adjusted. After this process, the predictive ability of the network is confirmed through assessing its performance on the testing data.
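A generic three-way split might look like the following sketch; the 70/15/15 ratio and the crop count are assumptions for illustration only (the researchers, as described next, split by survey year rather than at random):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000  # hypothetical number of echogram crops
indices = rng.permutation(n)  # shuffle before splitting

# Illustrative 70/15/15 split into training, validation, and test sets
train, val, test = np.split(indices, [int(0.70 * n), int(0.85 * n)])
print(len(train), len(val), len(test))  # → 700 150 150
```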

Figure 4: Adjustment of original annotations to exclude background pixels within annotated portion.

The approach of the Norwegian researchers was to assign acoustic data from 2011–2016 to training and validation, and data from 2007–2010 and 2017–2018 to testing. Input images used in training were crops of the original echograms. The labels from the original data were also modified to exclude background pixels (see Figure 4). Moreover, the training data was adjusted to have an equal distribution of crops containing only the seafloor, crops containing sandeel, crops containing other species, and crops containing various combinations of these features. Predictions produced by the network are shown below.
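The year-based assignment described above can be sketched as follows; the `partition` function and the `(year, image)` crop format are hypothetical, but the year ranges match those the researchers used:

```python
# Years 2011-2016 go to training/validation; 2007-2010, 2017, and
# 2018 are held out for testing, per the split described above.
TRAIN_VAL_YEARS = set(range(2011, 2017))
TEST_YEARS = set(range(2007, 2011)) | {2017, 2018}

def partition(crops):
    """crops: list of (year, image) pairs -> (train_val, test) lists."""
    train_val = [c for c in crops if c[0] in TRAIN_VAL_YEARS]
    test = [c for c in crops if c[0] in TEST_YEARS]
    return train_val, test
```

Splitting by year rather than at random keeps whole surveys out of training, which gives a more honest test of how the network generalizes to unseen conditions.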

Figure 5: From top to bottom — echogram input image, image labels (red marks represent sandeels), predicted sandeels.
Figure 6: An example of a false-positive prediction. Here, the network has mistakenly classified parts of a layer of zooplankton as sandeels.

Conclusion

What Brautaset et al. accomplished was the creation of a deep convolutional neural network for classifying unconventional image data, namely acoustic data in an image format. More generally, their work represents the expansion of data science and machine learning techniques within marine science. These techniques have the potential to expedite previously burdensome tasks in data analysis and make important insights on marine ecosystems more readily available.

Scientists and policymakers alike look to a continuous stream of ecological information — sea conditions, samples, acoustic trawl surveys — to make decisions about the health and regulation of fisheries. They must, however, make sense of that information first. Analytical tools like deep convolutional neural networks occupy a critical niche in this system. In other words: they help ensure that sandeel populations remain abundant, that sandeels continue to feed cod, haddock, and mackerel, and that you will continue to enjoy fish.
