How We Used AI-based Video Analytics to Identify Mosquito Larval Species

Teckwee
Published in DSAID GovTech · 12 min read · Sep 14, 2021

Project Leads: Dr. Chew Kian Hoe (NEA), Zhao Wenxin (NEA), Deng Lu (NEA), Phoebe Tang (NEA), Jazlyn Ang (NEA)

AI Engineer: Dr. Chua Teck Wee (GovTech)

Engagement & Project Manager: Ruth Cheng (GovTech)

Source: “Mosquito Zone” by frankieleon — Under Creative Commons license

Still itching from that mosquito bite? Mosquitoes are not just annoying; they can also be life-threatening, causing more than one million deaths worldwide every year.

Fun fact: Did you know there are more than 180 species of mosquitoes in Singapore [1]? That’s more than the United States, which has 176 mosquito species.

According to the National Environment Agency (NEA), the most common mosquito groups in Singapore are Aedes, Culex and Anopheles, and each group includes species that can transmit diseases. For example, Aedes aegypti and Aedes albopictus can both spread dengue and chikungunya, while Anopheles sinensis and Culex quinquefasciatus can spread malaria and filariasis respectively. Mosquito species identification is an integral part of a mosquito surveillance programme, as it enables targeted pre-emptive intervention measures.

In this article, we share how GovTech and NEA worked together to explore the feasibility of using Artificial Intelligence (AI)-based video analytics to identify mosquito larval species.

Before we dive into the technical details, can you guess how many mosquito species there are in the images below?

Source: National Environment Agency

The answer is four, and to be more specific, here are the species of the mosquito larvae:

where ‘^’ denotes lab-reared samples while ‘*’ denotes field-collected samples.

Note — these are the exact images ‘seen’ by the AI model.

As you can see, mosquito larval species identification is not an easy task. Some species, namely Ae. aegypti and Ae. albopictus, show considerable similarities in their morphological characteristics, which makes differentiating them difficult. The problem is compounded by intraspecies feature variation (size, colour, body texture) and stark differences between lab-reared and field-collected samples, owing to their different breeding habitats.

How is mosquito species identification done today?

Currently, the identification process requires trained analysts to examine taxonomical characteristics of mosquito larvae through a microscope. A mosquito larva observed through a microscope would look like this:

Anatomy of mosquito larvae. (Source: OECD Publication)

A mosquito larva typically has an ovoid head, thorax, and abdomen of nine segments (I-IX). Analysts look for morphological features to identify different species. For example, Ae. aegypti and Ae. albopictus can be differentiated by examining the hooks on the sides of the thorax and VIII segment comb scales.

Comparison of hooks on sides of thorax. Ae. aegypti (left) has strong black hooks, whereas Ae. albopictus (right) has small or no hooks:

Source: Florida Medical Entomology Laboratory

Comparison of VIII-segment comb scales on both species. Ae. aegypti (left) has pitchfork scales; Ae. albopictus (right) has thorn-like scales:

Source: Florida Medical Entomology Laboratory

How do researchers approach the mosquito classification problem?

With recent advances in deep learning, Convolutional Neural Network (CNN) — a specialised type of deep neural network for vision tasks — has been explored for mosquito classification.

Various CNN architectures have been used to classify images of adult mosquitoes. Compared with such image-analysis work on adult mosquitoes, studies on mosquito larvae are relatively scarce. Analysing mosquito larvae remains a challenging task due to the close resemblance between species, especially those within the same group. Even human experts find it difficult to distinguish species with the naked eye, as the subtle differences can only be observed under microscopic scrutiny. Early methods of larval image analysis relied mostly on manual work, which can be labour-intensive and prone to human error. For example, in [2, 4], different body segments of larvae were manually annotated and measured to study the morphology of various species. More recently, Ortiz et al. [5] proposed using a CNN to classify Aedes and non-Aedes species, based on microscopic images of mosquito larvae focusing on the VIII abdominal segment.

How do we approach the problem?

Inspired by several studies [2–4] indicating that different mosquito species may exhibit different movement patterns, the team explored the use of video analytics to classify larvae of various species. The exploratory work focused on classifying the four most common vector species of mosquitoes found in Singapore: a) Aedes aegypti, b) Aedes albopictus, c) Culex quinquefasciatus, and d) Anopheles sinensis, by observing larval movement patterns.

Workflow Overview

We approached the problem as a sequence classification problem. The processing pipeline is as follows:

Larval classification workflow.

Video frames are sampled at fixed intervals, then put through a pre-processor for cropping and straightening. Specifically, the pre-processor contains a detector to detect the larva body, head, and tail. With the detected bounding boxes, the image is rotated such that the head-body-tail position is kept vertical, with the head kept at the top and the tail kept at the bottom. Such normalisation aims to ensure that larval movement can be modelled effectively with a fixed reference position. The rotated image is then cropped to a fixed size of 500x500 pixels. Sequences of cropped images are then fed to the Neural Network model for classification.
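To make the workflow concrete, here is a minimal sketch of how the pipeline could be wired together. The sampling interval, and the `detector` and `normalise_orientation` helpers (the latter sketched in the pre-processing section below), are illustrative assumptions, not the project's actual code.

```python
import cv2

SAMPLE_INTERVAL = 12   # assumed sampling stride over the 120 FPS video
SEQ_LEN = 10           # the classifier consumes sequences of 10 frames

def classify_video(path, detector, classifier):
    """Run the full pipeline on one clip and return the majority-vote class."""
    cap = cv2.VideoCapture(path)
    crops, votes = [], []
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % SAMPLE_INTERVAL == 0:
            # Hypothetical detector returning head and tail bounding boxes.
            head_box, tail_box = detector(frame)
            crops.append(normalise_orientation(frame, head_box, tail_box))
        idx += 1
    cap.release()
    # Classify each fixed-length sequence, then vote across the whole video.
    for i in range(0, len(crops) - SEQ_LEN + 1, SEQ_LEN):
        votes.append(classifier(crops[i:i + SEQ_LEN]))
    return max(set(votes), key=votes.count)
```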

The neural network is based on a CNN-LSTM encoder-decoder architecture. The CNN module acts as an encoder for feature extraction from image sequences, while a Long Short-Term Memory (LSTM) network, a type of Recurrent Neural Network (RNN), acts as a decoder for sequence (motion) modelling. Such a model is suitable for problems with spatial-temporal inputs (e.g. a sequence of video frames).

Illustration of how CNN-LSTM works.

The end-to-end processing pipeline took about 11 seconds to process a 5-second, 120 FPS full-HD video clip using a consumer RTX 2070 GPU. Most of the processing time (88%) was spent on pre-processing, which includes object detection and normalisation, while only 12% was spent on classification. A total of 1285MB of GPU memory was required to run the inference software. Hence, multiple inference processes (roughly six, given the RTX 2070's 8GB of memory) can run in parallel to speed up inference.

Dataset

NEA provided the team with a high-quality video dataset, consisting of 6,799 short video clips of mosquito larval movements. The videos were captured using a consumer-grade mobile phone under a controlled environment with consistent illumination intensity, Angle of View (AOV) and focus distance. The video resolution and frame rate are 1920x1080 pixels and 120 FPS respectively, to preserve maximum image and motion detail. It is worth mentioning that the mobile phone's built-in macro lens can capture a fair amount of morphological detail. The video length varied between 5 and 60 seconds.

Example of full resolution video frame and 100% crop of mosquito larva. (Source: National Environment Agency)

Given the sheer size of the dataset and the tedious sample collection process, the dataset was provisioned progressively in batches. Early batches came from lab-reared samples and later batches from field-collected samples. The table below summarises the distribution of the dataset and its split for AI model development.

About 92% of the dataset belonged to lab-reared samples; only 8% belonged to field-collected samples that were harder to obtain. There are fewer intraspecies feature variations (size, colour, body texture, movement patterns) within lab-reared samples as the larvae are grown in a controlled environment. In contrast, field-collected samples have diverse features that depend on their breeding habitats (flower pots, clogged roof gutters, tree holes, blocked drains, sunlit brackish pools with algae, etc.). The following figure shows some of the differences in our dataset:

Comparison of lab-reared and field-collected larvae. (Source: National Environment Agency)

In addition to the training videos, NEA also provided 80 field samples (20 per class) for a final blind test.

Pre-processing

When recording videos with a mobile phone at the maximum magnification level, we noticed that a mosquito larva typically only occupies about 3% of the video frame. This meant that feeding full resolution image sequences to the neural network model would lead to unnecessary computation. Furthermore, larvae can move in arbitrary directions, which makes the comparison of larval movement difficult. Therefore, we devised a pre-processing pipeline to overcome these challenges.

Firstly, an object detector was trained to localise the body, head and tail of the larvae. A total of 1,378 images (981 from lab-reared samples and 397 from field-collected samples) were annotated, of which 1,067 were used for training, 195 for validation, and 116 for testing. Transfer learning was used to fine-tune the network, initialised with COCO pre-trained weights. Subsequently, the larva body region was cropped, and the cropped image went through an affine transformation such that the head was always directly above the tail. During the transformation, the image border was padded with the background colour. Finally, a fixed 500x500-pixel area centred on the larva was cropped. The cropped image served as the input to the CNN encoder in the later stage of the processing pipeline.

Pre-processing steps.
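As an illustration of the rotation-and-crop step above, the sketch below rotates the frame so the head sits directly above the tail, pads with a background colour, and takes a 500x500 crop. It assumes the detector returns (x, y, w, h) boxes for the head and tail; the project's actual detector output format and border handling may differ.

```python
import cv2
import numpy as np

def normalise_orientation(frame, head_box, tail_box, size=500):
    # Centres of the detected head and tail boxes, given as (x, y, w, h).
    head = np.array([head_box[0] + head_box[2] / 2, head_box[1] + head_box[3] / 2])
    tail = np.array([tail_box[0] + tail_box[2] / 2, tail_box[1] + tail_box[3] / 2])
    centre = (head + tail) / 2
    dx, dy = tail - head
    # Rotation (in OpenCV's angle convention) that maps the head->tail vector
    # onto the downward vertical, i.e. head on top, tail at the bottom.
    angle = np.degrees(np.arctan2(-dx, dy))
    bg = tuple(int(c) for c in frame[0, 0])   # assume a corner pixel ~ background
    M = cv2.getRotationMatrix2D((float(centre[0]), float(centre[1])), angle, 1.0)
    rotated = cv2.warpAffine(frame, M, (frame.shape[1], frame.shape[0]),
                             borderValue=bg)
    # Fixed 500x500 crop centred on the larva (assumes it is clear of the border).
    x, y = int(centre[0] - size / 2), int(centre[1] - size / 2)
    return rotated[y:y + size, x:x + size]
```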

The following figure shows the processed consecutive image sequences for different mosquito larval species during a power stroke cycle. It may not be easy for humans to discern the larval species from these locomotion patterns, due to high intraspecies variability (living habitat, larval developmental stage, and induced movement strength) and low interspecies variability (different species from the same group). Fortunately, with the proposed CNN-LSTM architecture, we were able to model the locomotion by learning from a large number of samples.

Examples of pre-processed input for CNN-LSTM classifier. (Source: National Environment Agency)

Model Architecture and Training

The figure above depicts the CNN-LSTM model architecture for mosquito larva classification. The model takes in a sequence of 10 consecutive images. A ResNet-18 model is used as the CNN encoder, initialised with weights pre-trained on the ImageNet dataset.

Generally, CNN features are more generic (lower-order representations such as edges and corners) in the early layers, and more dataset- or task-specific (higher-order representations such as object parts or whole objects) in the later layers. As the mosquito larva dataset is very different in nature from the ImageNet dataset, some parts of the network have to be retrained, as depicted by the green layers in the Model Architecture diagram. There are close to 9 million (8,953,728) trainable parameters in the CNN encoder. Once the CNN has extracted the visual features, they are fed into the LSTM-based RNN decoder, which has 1,086,084 trainable parameters, all trained from scratch. The final Softmax output layer has four nodes, one for each larva class. As a video consists of multiple short sequences, the entire video is classified by majority voting over the per-sequence results.
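A minimal PyTorch sketch of such a CNN-LSTM classifier is shown below. The ResNet-18 backbone, 10-frame input, and four-class output follow the article; the LSTM hidden size and the exact split between frozen and retrained encoder layers are assumptions, so the parameter counts will not match the project's figures exactly.

```python
import torch
import torch.nn as nn
import torchvision.models as models

class CNNLSTM(nn.Module):
    """Sketch of a CNN-LSTM larva classifier (hidden size and frozen-layer
    split are illustrative assumptions)."""
    def __init__(self, num_classes=4, hidden_size=256):
        super().__init__()
        backbone = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
        # Drop the final FC head; keep everything up to global average pooling.
        self.encoder = nn.Sequential(*list(backbone.children())[:-1])
        # Freeze the early, generic layers; leave the later blocks trainable.
        for layer in list(self.encoder.children())[:6]:
            for p in layer.parameters():
                p.requires_grad = False
        self.decoder = nn.LSTM(input_size=512, hidden_size=hidden_size,
                               batch_first=True)
        self.head = nn.Linear(hidden_size, num_classes)

    def forward(self, x):                             # x: (batch, seq=10, 3, H, W)
        b, t = x.shape[:2]
        feats = self.encoder(x.flatten(0, 1))         # (b*t, 512, 1, 1)
        feats = feats.flatten(1).view(b, t, -1)       # (b, t, 512)
        out, _ = self.decoder(feats)
        return self.head(out[:, -1])                  # logits for the 4 classes
```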

During model training, the Cross Entropy loss function was used to evaluate the model on mini-batches of inputs at each iteration. The equation is given as follows:

$$\mathcal{L} = -\sum_{c=1}^{M} y_{o,c}\,\log(p_{o,c})$$

where $M$ is the number of classes, $\log$ is the natural log, $y_{o,c}$ is the binary indicator (0 or 1) of whether class label $c$ is the correct classification for observation $o$, and $p_{o,c}$ is the predicted probability of observation $o$ falling under class $c$. The mini-batches were generated by randomly applying colour jitters (brightness, contrast, saturation, and hue), as shown below. This was done to improve the robustness of the classifier against camera white-balance and lighting changes. Note that the image transformation must be consistent for all images in the same sequence (i.e. the same colour jitter settings are applied to all ten images).

Random colour jitter is applied to image sequences on-the-fly during CNN-LSTM classifier model training phase.
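One way to keep the jitter consistent across a sequence is to sample the jitter factors once and apply them to every frame, e.g. with torchvision's functional API. The sketch below does exactly that; the jitter ranges are illustrative, not the project's actual settings.

```python
import random
import torchvision.transforms.functional as F

def jitter_sequence(frames, brightness=0.3, contrast=0.3, saturation=0.3, hue=0.05):
    """Apply ONE randomly sampled colour jitter to every frame in a sequence,
    so all 10 images in a training sample are transformed identically."""
    b = random.uniform(1 - brightness, 1 + brightness)
    c = random.uniform(1 - contrast, 1 + contrast)
    s = random.uniform(1 - saturation, 1 + saturation)
    h = random.uniform(-hue, hue)
    out = []
    for img in frames:
        img = F.adjust_brightness(img, b)
        img = F.adjust_contrast(img, c)
        img = F.adjust_saturation(img, s)
        img = F.adjust_hue(img, h)
        out.append(img)
    return out
```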

The training batch size was set to 100. The ADAM optimiser was used with the learning rate set at 0.001. All hyperparameters, including the choice of CNN backbone, the input sequence length, and the number of nodes in the trainable layers, were determined empirically. Training was terminated after 15 epochs, and the model with the best validation accuracy was selected. The training required about 30GB of GPU memory to run.
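Putting the stated settings together, a training loop could look like the sketch below. The data loader interface and checkpointing details are assumptions; only the optimiser, learning rate, loss, epoch count, and best-validation selection come from the article.

```python
import copy
import torch

def train(model, train_loader, val_loader, device="cuda"):
    """Adam at lr 1e-3, cross-entropy loss, 15 epochs, keep best-val weights.
    Loaders (batch size 100) are assumed to yield (sequence, label) pairs."""
    model.to(device)
    criterion = torch.nn.CrossEntropyLoss()
    optimiser = torch.optim.Adam(
        (p for p in model.parameters() if p.requires_grad), lr=1e-3)
    best_acc, best_weights = 0.0, None
    for epoch in range(15):
        model.train()
        for seqs, labels in train_loader:
            seqs, labels = seqs.to(device), labels.to(device)
            optimiser.zero_grad()
            loss = criterion(model(seqs), labels)
            loss.backward()
            optimiser.step()
        # Validation pass: keep the checkpoint with the best accuracy.
        model.eval()
        correct = total = 0
        with torch.no_grad():
            for seqs, labels in val_loader:
                preds = model(seqs.to(device)).argmax(dim=1).cpu()
                correct += (preds == labels).sum().item()
                total += labels.numel()
        acc = correct / total
        if acc > best_acc:
            best_acc, best_weights = acc, copy.deepcopy(model.state_dict())
    model.load_state_dict(best_weights)
    return model
```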

Model Performance

In the early stage of model development, when only lab-reared samples were available, we investigated how well the model performed on lab-reared samples alone. The model consistently achieved more than a 99% weighted-F1 test score across consecutive batches of data. This shows that not only can the model accurately classify species from different groups (Aedes vs. Culex vs. Anopheles), but it also managed to differentiate seemingly lookalike species (Ae. aegypti vs. Ae. albopictus) from the same group. The consistently high performance could be attributed to the homogeneity of the lab samples, even though the dataset was provided in different batches.

Subsequently, the lab-sample-trained model was tested on the field-collected samples to validate its generalisability. To our disappointment, the weighted average F1-score dropped from 99.8% to about 89.6%! There was confusion between the two species from the Aedes group. It was evident that a model trained using only lab samples did not perform equally well when evaluated against field test data.

Our analysis revealed distinct intraspecies variations in the field-collected samples that affected the inference outcomes. For example, lab and field Ae. aegypti looked very different. Despite our efforts to capture larval variability by randomly adding colour jitter, the adjustment was global (i.e. applied to the entire image) and thus unable to capture local colour and pattern differences. Subsequent experiments showed that introducing more field samples into the training/validation data improved the classifier accuracy significantly. The model was therefore retrained by augmenting the existing lab samples with a portion of the field samples. The final model achieved an almost 100% weighted-F1 test score on the final blind test samples from the field.

Colour and pattern differences in thorax and abdomen between lab-reared sample (left) and field-collected sample (right) of Ae. aegypti. (Source: National Environment Agency)

Conclusion

Together with NEA, we demonstrated the viability of using a neural network to classify the four most common mosquito vector larval species, with good results. By training a concatenated CNN-LSTM neural network model on both lab-reared and field-collected data samples, we achieved a close-to-100% weighted-F1 test score in this project. However, running the concatenated model with an elaborate data pre-processing pipeline is computationally expensive. The model proved able to recognise the four mosquito larval species effectively, but we are uncertain how it will handle new species that are not in the training set. The current neural network model will likely have to be retrained each time a new mosquito larval species is added. Nonetheless, we believe that this work has set the stage for developing complex AI models to handle challenging object classification problems.

References

  1. Lam et al., “Mosquitoes (Diptera: Culicidae) of Singapore: Updated Checklist and New Records”, Journal of Medical Entomology, vol. 56, no. 1, pp. 103–119, 2019.
  2. D. Strickman, “Biosystematics of Larval Movement of Central American Mosquitoes and Its Use for Field Identification”, Journal of the American Mosquito Control Association, vol. 5, no. 2, pp. 208–218, 1989.
  3. L. Eleanor, H. Kim, and R. Jeffrey, “Distinct Navigation Behaviors in Aedes, Anopheles, and Culex Mosquito Larvae”, The Journal of Experimental Biology, vol. 223, 2020.
  4. M. Burrows, M. Dorosenko, “Rapid Swimming and Escape Movements in the Aquatic Larvae and Pupae of the Phantom Midge Chaoborus Crystallinus”, Journal of Experimental Biology, vol. 217, pp. 2468–2479, 2014.
  5. A.S.-Ortiz, A.F.-Radilla, A.A.-Jalife, M.C.-Hernandez, M.N.-Miyatake, D.R.-Camarillo, V.C.-Jimenez, “Mosquito Larva Classification Method Based on Convolutional Neural Networks”, in Proc. of International Conference on Electronics, Communications and Computers (CONIELECOMP), pp. 1–6, 2017.
