Taking the Snow Test
Using ground optical data to validate snow monitoring with satellites
Monitoring snow cover extent is an important activity for the outdoor tourism industry, for estimating snow-water equivalent in the energy sector, for climate change research, and much more. And we can do it at a global scale, using satellite data to classify points on Earth as snow-covered or not. But how do we know if our classification is correct? How can we validate it? To accomplish that, we need to make snow monitoring a more down-to-earth task. Literally.
SLF: Our Earthly Snow Sentinel
Enter SLF, the Swiss Institute for Snow and Avalanche Research. With hundreds of weather stations throughout the Alps, solid research publications, and cutting-edge algorithms for all things snow, they make the perfect partner for validating snow data.
In a collaborative effort with the European Space Agency and WeGaw, SLF has installed new snow monitoring stations with HD cameras to build binary maps of snow presence in the Dischma Valley near Davos, Switzerland. Five such stations are spread throughout the valley in a way that optimizes, among other factors, the combined field of view of the system, allowing SLF to create a comprehensive snow map of the region.
SLF then runs a snow classification algorithm on the HD images to create a binary raster whose pixels indicate the presence or absence of snow, and projects it onto a georeferenced grid. The result is a ground-truth image we can use to compare and validate our own satellite-based snow monitoring product, DeFROST.
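To make the comparison concrete, here is a minimal sketch of how two such rasters can be brought onto a common grid. The file names are made up, we assume both products come as single-band GeoTIFFs with value 1 for snow, and rasterio's reproject is just one common way to do the alignment, not necessarily the pipeline we run in production:

```python
import numpy as np
import rasterio
from rasterio.enums import Resampling
from rasterio.warp import reproject

# Hypothetical file names; both rasters are single-band GeoTIFFs (1 = snow).
with rasterio.open("slf_dischma_snow.tif") as truth_src, \
        rasterio.open("defrost_snow.tif") as pred_src:
    truth = truth_src.read(1)
    predicted = np.zeros_like(truth)
    # Resample the DeFROST raster onto the SLF grid so pixels line up 1:1.
    reproject(
        source=pred_src.read(1),
        destination=predicted,
        src_transform=pred_src.transform,
        src_crs=pred_src.crs,
        dst_transform=truth_src.transform,
        dst_crs=truth_src.crs,
        resampling=Resampling.nearest,  # nearest-neighbour keeps the mask binary
    )

snow_truth = truth == 1      # ground truth: SLF camera-based snow map
snow_pred = predicted == 1   # prediction: DeFROST satellite-based snow map
```

Once the two boolean arrays share a grid, validation boils down to counting pixel-by-pixel agreements and disagreements, which is exactly what the next section describes.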
Validation 101: How we evaluate DeFROST
Attention: nerdy content ahead.
The validation process is very straightforward. We use the binary raster provided by SLF as ground truth for the presence of snow in the valley and compare it to our classification over the same area. We then tally up the matches and mismatches into four categories: true positive and true negative when DeFROST classifies the snow status correctly, and false positive and false negative when it does not (see Table 1). Once we have those counts, we can calculate the metrics we use to evaluate our performance, namely accuracy, precision and recall. They are defined as follows:
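$$
\text{accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad
\text{precision} = \frac{TP}{TP + FP}, \qquad
\text{recall} = \frac{TP}{TP + FN}
$$

As a minimal sketch of the tally, assuming the two snow maps are already co-registered boolean NumPy arrays like the ones above (the function name is ours, for illustration only):

```python
import numpy as np

def snow_validation_metrics(pred: np.ndarray, truth: np.ndarray) -> dict:
    """Compare two co-registered binary snow maps (True = snow)."""
    tp = np.sum(pred & truth)     # snow correctly detected
    tn = np.sum(~pred & ~truth)   # snow-free ground correctly detected
    fp = np.sum(pred & ~truth)    # snow predicted where there is none
    fn = np.sum(~pred & truth)    # snow on the ground that we missed
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
    }
```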
Significance: Why these metrics matter
These ratios are standard ways to measure the performance of binary classification systems, and they help us benchmark our product against well-established algorithms, in this case SLF's snow presence classification.
We can also use these values to fine-tune some parameters in our algorithms, keeping in mind that there is always a trade-off between precision and recall: increasing one will usually decrease the other. To find the right balance between the two, we look at the F1-score, the harmonic mean of precision and recall. The F1-score ranges from zero to one, and a score of one means that both precision and recall are perfect. That is rarely the case, so the F1-score makes a handy gauge when adjusting parameters.
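For reference, the standard definition is:

$$
F_1 = 2 \cdot \frac{\text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}
$$

Because the harmonic mean is dragged down by the smaller of the two values, a high F1-score guarantees that neither precision nor recall was sacrificed for the other.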
In addition, we can use these metrics to compare new algorithms with the same SLF benchmark and see if they outperform our current methods. That way we can make educated decisions when considering different approaches to solve the snow cover problem.
Results: Did DeFROST ace the test?
DeFROST's recall was perfect in nearly all our tests, and its precision was very high most of the time (see Table 3).
With these very high validation rates, we feel at ease knowing that our snow cover extent monitoring system performs outstandingly when benchmarked against well-established algorithms from reputable institutions like SLF.
What’s Next for DeFROST
This initial validation shows that DeFROST is off to a great start. We will keep evaluating the product throughout the year, paying special attention to the in-between seasons, when snow cover is more dynamic. We are now working on evaluating the accuracy and precision of our snow depth product. Snow depth is not a binary classification problem, though, and the meaning of these metrics changes dramatically. But that's the subject for another lesson.