Instance Segmentation for Tree Species with Low-altitude Drone imagery

Yu Kai Him Otto
Published in Forestree
Jan 16, 2024

In this pilot study, we developed a machine learning instance segmentation model that identifies local tree species in low-altitude drone imagery for ecological surveys. Instance segmentation here covers both individual tree crown delineation and species classification. Drone images of 20 tree species, each labeled with its scientific name, were collected to train the model. To evaluate its accuracy, the output of the deployed ML model is cross-referenced against semi-supervised segmented images.

Introduction

Tree species segmentation can be achieved in two major ways: spectral reflectance pattern analysis and machine learning (ML) modeling. Spectral signature analysis is a common approach in hyperspectral imaging that separates species by the reflectance of their leaf pigments. The machine learning approach instead learns the patterns captured in a large-scale dataset. In general, spectral signature analysis places higher demands on the sensor, and the hyperspectral camera it needs makes it more costly than the machine learning approach.

The drone industry’s rapid development allows forestry remote sensing to capture data at ultra-high spatial resolution (around 5 mm to 10 mm ground sampling distance; the GSD changes with the terrain and flying altitude). This makes machine learning a lower-cost method for tree species identification and segmentation than spectral reflectance pattern analysis. At the same time, Meta has introduced a new approach to image segmentation with a pre-trained model called the Segment Anything Model (SAM), which makes data preparation for custom deep learning models easier and faster. As a result, tree species classification and segmentation in forestry remote sensing has become more of a data-centric task than a fieldwork-based one.

By combining low-altitude drone imaging and machine learning, we have developed a new way to perform instance segmentation on drone images with a deep learning model. The model covers 20 tree species in Hong Kong, including both native and introduced (exotic) species found in our forests, such as Acacia confusa (台灣相思), Leucaena leucocephala (銀合歡), Casuarina equisetifolia (木麻黃) and Ficus microcarpa (細葉榕).

The machine learning model is intended to enhance ecological surveys of the countryside and roadsides in Hong Kong. With machine learning applied to optical remote sensing, more natural features can be quantified and evaluated, enriching our understanding of the ecological profile of our roadsides and countryside.

Terminologies

Instance segmentation

According to IBM, instance segmentation is a deep learning computer vision task that predicts the precise pixel boundaries of each individual object instance (target class) in an image. Instance segmentation is a subset of the larger field of image segmentation that provides more detailed output than traditional object detection algorithms. It combines the process of object detection and segmentation.

Machine learning

Machine learning (ML) here refers to computer vision algorithms that identify target patterns in media such as images and videos. ML is data-centric and case-specific, since an ML model is built from its training samples. In remote sensing, ML with neural networks is widely used for feature detection.

Low-altitude drone imagery

Low altitude is defined here as a flying height between 30 and 90 meters (98 to 295 feet), where the ground sampling distance (GSD) is generally between 7 and 25 mm. In this study, the drone flew at 30 to 55 meters and the images have a GSD of 7 to 18 mm.
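
The link between flying height and GSD can be sketched with the standard pinhole-camera relation for a nadir image over flat ground. This is a minimal illustration only: the sensor width, focal length and image width below are placeholder values, not verified specifications of the drone used in this study, so the printed numbers are not expected to reproduce the GSD figures quoted above.

```python
def ground_sampling_distance_mm(altitude_m, sensor_width_mm, focal_length_mm, image_width_px):
    """GSD in millimetres per pixel for a nadir image over flat ground."""
    if focal_length_mm <= 0 or image_width_px <= 0:
        raise ValueError("focal length and image width must be positive")
    # sensor_width / focal_length is dimensionless, so the result is metres per pixel.
    gsd_m = (sensor_width_mm * altitude_m) / (focal_length_mm * image_width_px)
    return gsd_m * 1000.0


# Placeholder camera parameters (assumptions for illustration only).
SENSOR_WIDTH_MM = 10.0
FOCAL_LENGTH_MM = 8.0
IMAGE_WIDTH_PX = 8000

for altitude in (30, 55, 90):
    gsd = ground_sampling_distance_mm(altitude, SENSOR_WIDTH_MM, FOCAL_LENGTH_MM, IMAGE_WIDTH_PX)
    print(f"{altitude} m -> {gsd:.1f} mm per pixel")
```

The relation is linear in altitude, which is why flying higher trades detail for coverage; real terrain relief makes the GSD vary within a single flight.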

The target species were selected from those common along roadsides and hillslopes in Hong Kong. The pioneer species introduced in the 1960s, such as Acacia confusa, Melaleuca leucadendron, and Pinus elliottii, are now in the decaying stage, so segmenting them out can support slope stability work and the tree risk assessment process. Invasive species such as Leucaena leucocephala, which has spread widely across the hillslopes, are worth identifying and segmenting for ecological surveys and studies. Native species such as Ficus, Schefflera and Macaranga tanarius were also selected for this pilot study.

List of target species

Data capture and processing

Sensor specification

The training samples were captured with a DJI Mini 4 Pro at a flying height of 30–50 m (relative to the on-site take-off point). Most images were taken at nadir (vertical), with the camera gimbal at 90 degrees to the ground. Some sample images were taken off-nadir (oblique), with the camera gimbal at 65 degrees to the ground.

Training sample preprocessing

Before segmentation, the captured drone images were in .jpg format (4:3, 8064 x 6048). To preserve segmentation quality, they were tiled into squares (1:1, 2048 x 2048). The neural network takes a square input, so a rectangular image would otherwise have to be scaled down (sub-sampled) to 2048 x 2048. Tiling the 4:3 image into 1:1 squares is therefore a better option than sub-sampling the full-resolution image, and square tiles also make training more efficient. The trade-off in ML is between training time and resolution: the higher the resolution, the longer the processing time. Tiling is an alternative that reduces the processing time while keeping the original high resolution.
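
The tiling step above can be sketched as follows. The study does not say how the image edges were handled; since 8064 and 6048 are not multiples of 2048, this sketch assumes the last column and row of tiles are shifted back so they overlap their neighbours instead of being padded. The function names are illustrative.

```python
def tile_starts(length, tile):
    """Start offsets along one axis so that tiles of size `tile` cover `length`."""
    if length <= tile:
        return [0]
    starts = list(range(0, length - tile + 1, tile))
    if starts[-1] + tile < length:
        # Shift the final tile back so it ends exactly at the image edge.
        starts.append(length - tile)
    return starts


def tile_boxes(width, height, tile=2048):
    """Return (left, top, right, bottom) pixel boxes for every tile, row by row."""
    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in tile_starts(height, tile)
        for x in tile_starts(width, tile)
    ]


boxes = tile_boxes(8064, 6048)
print(len(boxes), boxes[0], boxes[-1])
```

For an 8064 x 6048 image this yields 4 x 3 = 12 full-resolution tiles. Each box is in the (left, upper, right, lower) order that an image library's crop function typically expects.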

Flowchart of the binary classification from the input drone imagery

Segment targets preparation

Two things must be prepared to train an ML model for tree species segmentation: (1) knowledge of the tree crown and leaf characteristics and (2) crown delineation for individual tree segmentation. For the first, following air photo interpretation (API) and general plant taxonomy, trees are identified by leaf shape, texture, tone, shadow, and surrounding associations. Tree species can generally be classified as broad-leaf (such as Bauhinia spp.) or needle-like (such as Casuarina equisetifolia). Leaf color and tonal variation then allow further interpretation of the leaves and their associated characteristics. For example, Acacia confusa and Casuarina equisetifolia have similar leaf characteristics but differ in leaf shape and arrangement, and Acacia confusa is slightly greener than Casuarina equisetifolia, which is more yellow and lighter in color.

The upper figure shows drone imagery with two target species, Casuarina equisetifolia (left) and Acacia confusa (right). They were classified manually by the tonal variation across the leaves: since both species have a needle-like texture, further interpretation relies on color, tone and zooming into the details. The bottom figure shows the point-seeding contours of Casuarina equisetifolia (left) and Acacia confusa (right).

The API is based on manual interpretation of tree species from the drone images, while the segment outlines and contours were produced by point seeding with Meta’s Segment Anything Model (SAM). Point seeding is one of the segmentation approaches SAM supports: seed points are first placed on the target areas of the image, and the model then automatically outlines those areas based on tone, texture, and color similar to the seed point. The SAM model is also used for tree crown delineation for individual tree segmentation.
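
Once an outline has been produced, it has to be stored in a form a segmentation model can train on. The study does not describe its export step, so the following is only a sketch under the assumption that each outline is available as a list of pixel coordinates and is written in the YOLO segmentation label format (one line per instance: a class index followed by x y pairs normalised to the image size). The class index 3 is a hypothetical example.

```python
def polygon_to_yolo_line(class_id, polygon_px, img_w, img_h):
    """Convert a pixel polygon into one YOLO segmentation label line."""
    if len(polygon_px) < 3:
        raise ValueError("a polygon needs at least three points")
    coords = []
    for x, y in polygon_px:
        # Normalise to [0, 1] and clamp points that fall just outside the tile.
        nx = min(max(x / img_w, 0.0), 1.0)
        ny = min(max(y / img_h, 0.0), 1.0)
        coords.append(f"{nx:.6f} {ny:.6f}")
    return f"{class_id} " + " ".join(coords)


# Hypothetical triangular crown outline on a 2048 x 2048 tile.
outline = [(0, 0), (1024, 0), (1024, 2048)]
print(polygon_to_yolo_line(3, outline, 2048, 2048))
```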

Machine learning model and approaches

The machine learning approach draws on the phenetic classification method from plant taxonomy, which groups plants by overall similarity using data from all available sources (here, drone imagery).

Phenetic classification

Phenetic classification is a method of classifying plants based on their overall similarity. It focuses on quantitative measurements of observable characteristics rather than on specific evolutionary relationships or genetic information. In phenetic classification, a set of plant attributes and characteristics is selected and each is assigned a numerical value. These values are used to compute the similarity (distance) between pairs, for example between an observed plant and the entries in a database. The result is a set of clusters that reflects the overall similarity pattern among the plants.
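
The distance idea behind phenetic classification can be illustrated with a small sketch. The three features and every number below are invented placeholders (they are not measurements from this study), and Euclidean distance is just one possible choice of similarity measure.

```python
import math

# Hypothetical reference table: (leaf width, greenness, texture fineness), each scaled to 0-1.
REFERENCE = {
    "Acacia confusa": (0.20, 0.70, 0.80),
    "Casuarina equisetifolia": (0.10, 0.50, 0.90),
    "Ficus microcarpa": (0.60, 0.80, 0.30),
}


def nearest_species(observed, reference=REFERENCE):
    """Return (species, distance) of the reference entry closest to `observed`."""
    best = min(reference, key=lambda name: math.dist(observed, reference[name]))
    return best, math.dist(observed, reference[best])


species, distance = nearest_species((0.18, 0.68, 0.80))
print(species, round(distance, 3))
```

A deep learning model does not use a hand-built table like this; it learns its own features from the training images, but the underlying notion of grouping by overall similarity is the same.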

Deep learning model deployment

The deep learning model was developed with reference to the phenetic classification method from plant taxonomy. Training has two major targets: (1) segmentation by contour and (2) crown delineation. The ML model is based on the YOLOv8 instance segmentation architecture and inherits the characteristics of YOLO. YOLOv8 is an anchor-free model: it predicts the center of an object directly instead of predicting an offset from a known anchor box. Predicting without anchor boxes reduces the number of box predictions, which speeds up post-processing steps such as Non-Maximum Suppression (NMS).
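
Non-Maximum Suppression, mentioned above, removes duplicate detections of the same object. The following is a minimal greedy version for illustration; it is not the YOLOv8 implementation, and the 0.5 IoU threshold and the example boxes are assumptions.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Return indices of the boxes to keep, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with a better one.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(non_max_suppression(boxes, scores))
```

Every candidate box is compared against the boxes already kept, so fewer candidates means fewer comparisons.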

Crown delineation and segmentation details

Since this study aims to segment individual trees, the crowns in the training data were delineated in a semi-supervised way with Meta SAM, and the ML model was then trained on those segments. Crown delineation can go wrong in three ways: (a) the species is segmented correctly but the tree crown is delineated wrongly, (b) the crown is delineated correctly but the species is wrong, and (c) both the species and the crown delineation are wrong.
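
The three error cases above can be told apart by checking the species label and the crown overlap separately. This is a sketch only: masks are represented as sets of (row, column) pixels, and treating a crown as correctly delineated when its IoU with the reference crown is at least 0.5 is an assumed threshold, not one stated in the study.

```python
def mask_iou(mask_a, mask_b):
    """Intersection over union of two masks given as sets of (row, col) pixels."""
    union = mask_a | mask_b
    if not union:
        return 0.0
    return len(mask_a & mask_b) / len(union)


def classify_outcome(pred_species, true_species, pred_mask, true_mask, iou_threshold=0.5):
    """Return 'correct' or the error case 'a', 'b' or 'c' described in the text."""
    species_ok = pred_species == true_species
    crown_ok = mask_iou(pred_mask, true_mask) >= iou_threshold
    if species_ok and crown_ok:
        return "correct"
    if species_ok:
        return "a"  # right species, wrong crown
    if crown_ok:
        return "b"  # right crown, wrong species
    return "c"      # both wrong


reference = {(r, c) for r in range(10) for c in range(10)}
shifted = {(r, c) for r in range(10) for c in range(8, 20)}
print(classify_outcome("Acacia confusa", "Acacia confusa", shifted, reference))
```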

Model application and API

The model was trained and hosted online via Roboflow. With the Python package, users can run the classification and export the results in JSON or image format.
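
An exported JSON result could then be summarised into a per-species tree count. The layout assumed here (a top-level "predictions" list whose items carry "class" and "confidence" fields) and the sample values are assumptions for illustration; check the actual export before relying on these field names.

```python
import json
from collections import Counter


def count_species(result_json, min_confidence=0.5):
    """Count predicted instances per species at or above a confidence threshold."""
    result = json.loads(result_json)
    counts = Counter()
    for pred in result.get("predictions", []):
        if pred.get("confidence", 0.0) >= min_confidence:
            counts[pred["class"]] += 1
    return counts


# Invented sample result, not real model output.
sample = json.dumps({
    "predictions": [
        {"class": "Acacia confusa", "confidence": 0.82},
        {"class": "Acacia confusa", "confidence": 0.41},
        {"class": "Leucaena leucocephala", "confidence": 0.77},
    ]
})
print(count_species(sample))
```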

Sample case and discussion

The trained ML model can be used for both pioneer and exotic species, and achieves around 80% accuracy on both hillside and roadside (counting predictions above the 50% confidence level). The instance segmentation model has minor problems caused by the limited sample size and by perspective error. The perspective error can be reduced by flying the drone at a relatively higher altitude during the aerial survey (such as 55 to 60 meters, around 24 mm and 27 mm per pixel respectively), while the crown delineation error can be reduced with a larger training dataset.

In instance segmentation, there is a trade-off between segmentation accuracy and the number of target classes: the more target classes, the lower the accuracy. Striking a balance between model accuracy and the diversity of targets is therefore a concern. In addition, the data are site- and species-specific. For instance, some species have a higher confidence level and accuracy because they dominate the training data. Balancing the amount of training data per target species is an issue to address in further development.
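
One simple way to see how unbalanced a training set is, and how much each species would need to be up-weighted or re-sampled, is inverse-frequency weighting. This is a common general technique rather than something this study applied, and the instance counts below are invented placeholders.

```python
def inverse_frequency_weights(instance_counts):
    """Weight each class by total / (number of classes * class count)."""
    total = sum(instance_counts.values())
    n_classes = len(instance_counts)
    return {
        name: total / (n_classes * count)
        for name, count in instance_counts.items()
        if count > 0
    }


# Hypothetical number of labelled crowns per species.
counts = {"Acacia confusa": 300, "Ficus microcarpa": 100}
for name, weight in inverse_frequency_weights(counts).items():
    print(f"{name}: {weight:.2f}")
```

A weight above 1 marks an under-represented species that would benefit most from additional drone images.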

Hillside performance

Generally, the hillside shows high accuracy for the pioneer species in Hong Kong, especially Acacia confusa, and the invasive species Leucaena leucocephala is also segmented well. The overall segmentation accuracy on the hillside is 88% (75.6% average confidence), and the crown delineation has moderate accuracy when the segmented contours are compared with the manually interpreted records (with field observation).

Roadside performance

The overall segmentation accuracy on the roadside is 91.25% (78% average confidence), and the crown delineation has relatively higher accuracy when the segmented contours are compared with the manually interpreted records (with field observation).
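
Figures of this kind can be derived by comparing each predicted instance with its manually interpreted record. The sketch below assumes accuracy is the share of instances whose predicted species matches the record and that the reported confidence is a plain mean; the study does not spell out its exact procedure, and the records shown are invented.

```python
from statistics import fmean


def summarise(records):
    """Return (accuracy, mean confidence) for (predicted, actual, confidence) records."""
    if not records:
        raise ValueError("no records to summarise")
    correct = sum(1 for predicted, actual, _ in records if predicted == actual)
    accuracy = correct / len(records)
    mean_confidence = fmean(conf for _, _, conf in records)
    return accuracy, mean_confidence


# Invented example records, not data from this study.
records = [
    ("Acacia confusa", "Acacia confusa", 0.9),
    ("Acacia confusa", "Casuarina equisetifolia", 0.6),
    ("Ficus microcarpa", "Ficus microcarpa", 0.8),
    ("Leucaena leucocephala", "Leucaena leucocephala", 0.7),
]
accuracy, mean_confidence = summarise(records)
print(f"accuracy {accuracy:.2%}, average confidence {mean_confidence:.1%}")
```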

Accuracy and evaluation

Overall, the ML model accuracy is limited to about 75%, because this study targets 20 classes while only a limited training sample was available for each target species. Instance segmentation with fewer target species and a more balanced sampling strategy could improve the model accuracy in the future. Low-altitude drone images can roughly segment tree species and delineate tree crowns for tree counting on both roadsides and hillsides (forested areas).

Recommendations

Tree species segmentation from low-altitude drone imagery can be scaled up to larger areas and relatively higher altitudes in future machine learning work. Targeting fewer species (such as 5–10) for instance classification should achieve higher accuracy than targeting more.

Acknowledgments

This is a volunteer pilot study by the Forestree team (Remote Sensing and Forestry), carried out to study close-range photogrammetry, image processing and computer vision.

Yu Kai Him Otto
Forestree

Student from Hong Kong, studying Land Surveying and Geo-informatics at PolyU.