Automating Tree Health Monitoring from Images with Machine Learning

Jan Dirk Wegner
Published in EcoVisionETH
Jun 10, 2020
Examples of learned image features (CNN activation maps) for a defoliated tree: (left) input image, (second from left) vertical gradient indicating the distribution of branches, (third from left) separation of tree from sky background, (right) delineation of empty space in the canopy.

Trees seem to be the superheroes of the climate crisis. Forests provide a vast array of ecosystem services, store large amounts of carbon, and by continued sequestration, they strongly contribute to the terrestrial carbon sink (Pan et al., 2011), essential to mitigate global warming. Monitoring tree health is important to manage our forests sustainably and to strengthen their resilience to adapt to local and global climate change.

How can we effectively monitor tree health?

Tree defoliation is one of the main indicators of tree health, with 0% defoliation corresponding to healthy trees and 100% defoliation indicating dead trees. This indicator is usually assessed manually by experts who visually survey tree stands in the field at selected sites across the whole country (e.g., a 16×16 km grid in Switzerland; Dobbertin and Brang, 2001).

Example of different defoliation levels, from left to right: 0%, 10%, 25% and 55%.

Our goal is to automate this process to ease large-scale analysis of tree health; we hope this will help us understand how our forests are threatened by factors such as land-use change, air pollutants, and climate change.

Although much effort has gone into large-scale forest monitoring since the 1980s to understand the impact of environmental pollution and a changing climate on tree health (Lorenz, 1995), most efforts are still based primarily on field surveys. However, manual inspections of field plots are very laborious, leading to data that are sparse in both spatial and temporal resolution.

We thus advocate developing deep learning tools that enable automated, large-scale monitoring of vegetation health.

Tree defoliation estimation from images

Our idea is to automate tree defoliation mapping by using crowd-sourced data, where volunteers acquire images of trees (e.g., while hiking) that can be processed with a mobile app.

Such an app would have a deep learning back end that automatically interprets the images and sends the geographic coordinates and defoliation level to a central database. This database collects all observations and puts them on a map, where the data can be combined with other valuable variables such as a terrain model, climate data, overhead imagery from space-borne and airborne sensors, as well as species distribution models.
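As a minimal sketch of what such a crowd-sourced observation might look like, here is a hypothetical record format that an app could send to the central database; the class and field names are purely illustrative assumptions, not an existing API:

```python
from dataclasses import dataclass, asdict
import json

# Hypothetical observation record (illustrative field names, not a real API):
# the app's deep learning back end fills in the defoliation prediction and
# attaches the phone's GPS position before uploading.
@dataclass
class TreeObservation:
    latitude: float
    longitude: float
    defoliation_percent: float  # 0 = healthy tree, 100 = dead tree
    timestamp: str              # ISO 8601 acquisition time

obs = TreeObservation(46.95, 7.45, 25.0, "2020-06-10T12:00:00Z")
payload = json.dumps(asdict(obs))  # JSON body sent to the central database
print(payload)
```

On the server side, such records could then be aggregated on a map and joined with terrain, climate, and species-distribution layers by location.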

Analyzing trees in the wild is tough

Visual inspection of trees is very challenging due to their vast variation in species, shape, height, texture and color. Although species-specific models could potentially predict tree stress more accurately, we lack sufficient training data per species to train such a model. We thus build a single, species-agnostic model to predict individual tree health from ground-level images.

Designing hand-crafted rules for tree health analysis from images is practically impossible. We therefore apply a deep learning approach, where features and decision rules are learned directly from existing reference data. Convolutional neural networks (CNNs) are well suited to our task of assessing tree defoliation because, by design, they implicitly compensate for the most disturbing effects in our images: lighting variations, off-center objects, and cluttered scenes. We use an adaptation of the ResNet architecture (He et al., 2016), a network proven to perform well across a large variety of tasks.
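To make the idea concrete, here is a toy sketch (not the exact model from the paper) of how a ResNet-style network can be adapted from classification to defoliation regression: a residual block with a skip connection, followed by a single continuous output squashed into the valid defoliation range:

```python
import torch
import torch.nn as nn

# Toy sketch, not the authors' exact architecture: one residual block and a
# regression head instead of the usual classification head.
class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.relu = nn.ReLU()

    def forward(self, x):
        # Skip connection: the block learns a residual on top of its input,
        # the key idea of ResNet (He et al., 2016).
        return self.relu(x + self.conv2(self.relu(self.conv1(x))))

class DefoliationNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.stem = nn.Conv2d(3, 16, 7, stride=2, padding=3)
        self.block = ResidualBlock(16)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.head = nn.Linear(16, 1)  # single regression output

    def forward(self, x):
        h = self.pool(self.block(self.stem(x))).flatten(1)
        # Sigmoid keeps predictions in the valid defoliation range [0, 1].
        return torch.sigmoid(self.head(h)).squeeze(-1)

model = DefoliationNet()
preds = model(torch.randn(4, 3, 224, 224))  # batch of 4 RGB images
print(preds.shape)  # one defoliation value per image
```

A full model would stack many such blocks (as in ResNet-18/50) and be trained with a regression loss such as MAE or MSE against the expert labels.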

Tree Defoliation Data

To evaluate the feasibility of our approach, we rely on reference data collected by professional arborists. This setup also allows us to train the CNN model and validate the performance of our approach by comparing it to the assessment quality of human experts.

Example photos of the dataset showing trees with different defoliation levels.

Human experts visit individual trees in forests distributed all over Switzerland, acquire one photo per tree with an off-the-shelf camera and assign a defoliation value.

Our Swiss dataset contains 2,108 images with assigned defoliation values, acquired between the beginning of July and the end of August. This window matters because deciduous trees lose their leaves in winter, so visual health indicators such as die-back, leaf discoloration, and crown transparency are invalid outside the growing season.

Photos are mostly well centered on a single tree and captured with an appropriate zoom level. However, in dense, complex forest scenes, the tree of interest can be partially occluded, dense forest may appear in its background, and lighting conditions vary widely. In general, samples of high defoliation are rare, whereas most samples show low defoliation.

Examples of difficult cases from left to right: 100%, 100%, 100% and 80% defoliation.

Results

We perform 5-fold cross-validation for our experimental evaluation to avoid any train-test split bias, ensuring a roughly equal distribution of defoliation values across all five folds. We measure the performance of our CNN model on the test data with the mean absolute error (MAE), reporting the average of the five per-fold MAE values along with their standard deviation to indicate the performance spread across the test sets.
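The evaluation protocol can be sketched as follows, with synthetic labels and a trivial placeholder predictor standing in for the CNN (this is an illustration of the protocol, not the authors' code):

```python
import numpy as np

# Synthetic defoliation labels in percent, standing in for the real dataset.
rng = np.random.default_rng(0)
defoliation = rng.uniform(0, 100, size=200)

# Sort by defoliation and deal samples round-robin into 5 folds so that each
# fold receives a roughly equal distribution of defoliation values.
order = np.argsort(defoliation)
folds = [order[i::5] for i in range(5)]

def mae(y_true, y_pred):
    """Mean absolute error between labels and predictions."""
    return np.mean(np.abs(y_true - y_pred))

fold_maes = []
for k in range(5):
    test_idx = folds[k]
    train_idx = np.concatenate([folds[j] for j in range(5) if j != k])
    # Placeholder "model": predict the training mean (a trained CNN goes here).
    pred = np.full(len(test_idx), defoliation[train_idx].mean())
    fold_maes.append(mae(defoliation[test_idx], pred))

# Report the average of the 5 per-fold MAEs and their standard deviation.
print(f"MAE: {np.mean(fold_maes):.1f}% +/- {np.std(fold_maes):.1f}%")
```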

We plot predicted defoliation values versus human assessment for the same images, where the color represents the number of images falling into each interval. Our model achieves a 7.6% mean absolute error with a standard deviation of 0.3%.

Results of 5-fold cross-validation for training and predicting on our dataset of 2,108 images.

Comparison to human expert performance

How well do human experts agree when estimating tree defoliation? To compare the performance of our CNN model to human experts, six different experts repeatedly judged the defoliation of the same trees on a subset of the data by looking at the images.

We plot the individual expert assessments against the average defoliation value across the six experts, which serves as ground truth (right figure below). Similarly, we plot the CNN predictions against this ground truth (left figure below). Astonishingly, the results do not show a big difference between the performance of the CNN and that of the human experts. In fact, the human average mean absolute error of 4.6% is only 0.9 percentage points lower than that of our CNN approach, which yields 5.5%.

(Left) Results with our CNN model: 5-fold cross-validation on a subset of 384 images, compared to (Right) repeated assessments of six different human experts on the same dataset. In both cases, the average defoliation value over all six human expert assessments per tree acts as ground truth. Note that the color bar for human assessments (right) has a different scaling because each image is assessed six times (versus once by the CNN on the left).
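The agreement measure used above can be illustrated with a few hypothetical numbers (invented for this sketch, not taken from the dataset): the per-tree average over the six experts acts as ground truth, and each expert's MAE is computed against that consensus.

```python
import numpy as np

# Hypothetical ratings for illustration only: six experts each assess the
# defoliation (in percent) of the same five trees.
expert_scores = np.array([
    [10, 25, 50,  0, 80],
    [15, 30, 45,  5, 75],
    [10, 20, 55,  0, 85],
    [20, 25, 50,  5, 80],
    [10, 30, 40,  0, 70],
    [15, 25, 60, 10, 90],
], dtype=float)

# The per-tree average over all six experts serves as ground truth.
consensus = expert_scores.mean(axis=0)

# Each expert's MAE against the consensus, then averaged over experts;
# CNN predictions would be scored against the same consensus values.
expert_mae = np.abs(expert_scores - consensus).mean(axis=1)
print(f"average expert MAE: {expert_mae.mean():.1f}%")
```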

How can we put our solution into practice?

The results show that our proposed approach holds a lot of potential. We see two major directions to pursue. First, more high-quality reference data should be collected to improve the CNN's generalization capability. Second, a mobile app-based crowd-sourcing campaign would increase the spatial and temporal coverage of defoliation assessments beyond the limited number of sites that experts can visit.

We hope that our approach will help us better monitor and maintain tree and forest health under a warming climate.

For the interested reader, you can find all the details in our publication:

Kälin, U., Lang, N., Hug, C., Gessler, A., Wegner, J.D.: Defoliation estimation of forest trees from ground-level images, Remote Sensing of Environment, vol. 223, 2019, pp. 143–153.

References:

Dobbertin, M., Brang, P., 2001. Crown defoliation improves tree mortality models. For. Ecol. Manag. 141, 271–284.

He, K., Zhang, X., Ren, S., Sun, J., 2016. Deep residual learning for image recognition. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778.

Lorenz, M., 1995. International co-operative programme on assessment and monitoring of air pollution effects on forests-ICP forests. Water Air Soil Pollut. 85, 1221–1226.

Pan, Y., Birdsey, R., Fang, J., Houghton, R., Kauppi, P., Kurz, W., Phillips, O., Shvidenko, A., Lewis, S., Canadell, J., 2011. A large and persistent carbon sink in the world's forests. Science 333, 988–993.
