On Data Fusion for Daily Snow Probability Maps

Daniel Meppiel
WeGaw

--

It’s autumn in the Alps: most people understand this time of year as the beginning of the snowfalls that make many trails slippery, hard and therefore dangerous, yet not deep enough for a good descent on skis. For most, conditions are simply too uncertain to make a trip worthwhile. For locals with enough experience, however, autumn remains the best-kept secret for great outdoor activities.

Data on snow today is (1) scarce, (2) too coarse and (3) not delivered to the public in a simple, understandable manner.

At WeGaw, we are solving that problem with global snow maps at 20-meter resolution, generated every day. But not without overcoming important challenges!

Satellite data for snow detection

The European Space Agency unleashed a new era in space data analytics with its Copernicus programme, launching the Sentinel Earth Observation satellite series from 2014 onwards, including optical (Sentinel-2) and radar (Sentinel-1) sensors. Raw data from these satellites is made publicly available free of charge (e.g. via ESA or AWS).

While optical satellites record light reflectance across the spectrum, radar satellites record the energy returned in different radar-frequency bands. These are key data sources for monitoring the Earth and, among many other applications, they allow snow to be detected once several layers of pre- and post-processing algorithms are applied. For example, in the case of optical data:

The Normalized Difference Snow Index (NDSI) is computed from the visible and shortwave-infrared reflectance values.
Light reflectance scene (left) and after NDSI processing (right). The index value is compared against a threshold (usually 0.4) to filter actual snow out of the scene. Clouds on the lower left side obstruct the view.
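Assuming reflectance arrays already calibrated to [0, 1], the index and its thresholding can be sketched in a few lines (the `ndsi` and `snow_mask` helpers are illustrative, not WeGaw's production code):

```python
import numpy as np

def ndsi(green, swir):
    """Normalized Difference Snow Index from green and
    shortwave-infrared (SWIR) reflectance arrays."""
    green = np.asarray(green, dtype=float)
    swir = np.asarray(swir, dtype=float)
    denom = green + swir
    # Guard against division by zero where both bands are dark.
    return np.divide(green - swir, denom,
                     out=np.zeros_like(denom), where=denom > 0)

def snow_mask(green, swir, threshold=0.4):
    """Binary snow mask: True where the NDSI exceeds the threshold."""
    return ndsi(green, swir) > threshold

# Example: a 2x2 scene (reflectance values in [0, 1]).
green = np.array([[0.8, 0.3], [0.6, 0.1]])
swir = np.array([[0.1, 0.2], [0.5, 0.1]])
mask = snow_mask(green, swir)  # only pixel (0, 0) is classified as snow
```

For Sentinel-2, the green and SWIR bands correspond to bands 3 and 11 respectively.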

However, extracting, assembling and packaging such snow data together is challenging mainly because:

  • Different satellites have different revisit times over a given location on Earth. They also have different resolutions, ranging from 20 m to 500 m. How do we ensure a homogeneous, rapid daily update?
  • Satellite operators typically preprocess scenes for atmospheric correction, accurate georeferencing and map projection, but this takes 7 hours on average and up to 24 hours after sensing. Near-real-time data is available in under 3 hours, but that complex preprocessing is then left to the end user.
  • Satellites send back raw data in chunks, called scenes, that need to be merged to form a map of any wider area. Scenes may overlap, and their readings might differ for the same location.
  • Optical satellite sensors cannot find snow under clouds. Radar sensors can, but for now they only detect melting snow reliably.
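The scene-merging problem can be sketched as a simple mosaic that keeps the most recent valid reading per pixel. This is a minimal illustration, not WeGaw's actual pipeline: the `mosaic` helper, the common-grid assumption and the NaN-for-unobserved convention are all assumptions here.

```python
import numpy as np

def mosaic(scenes):
    """Merge overlapping scenes into one map, keeping the most
    recent valid reading per pixel.

    `scenes` is a list of (timestamp, array) pairs on a common grid,
    with NaN marking pixels a scene did not observe (e.g. clouds).
    """
    # Process from oldest to newest so later scenes overwrite earlier ones.
    scenes = sorted(scenes, key=lambda s: s[0])
    result = np.full(scenes[0][1].shape, np.nan)
    for _, data in scenes:
        valid = ~np.isnan(data)
        result[valid] = data[valid]
    return result

old = np.array([[1.0, 0.0], [np.nan, 0.0]])
new = np.array([[np.nan, 1.0], [1.0, np.nan]])
merged = mosaic([(2, new), (1, old)])
# Newest valid reading wins; cloudy gaps fall back to older scenes.
```

A real system would also reproject each scene to the common grid and weight conflicting readings rather than simply overwrite them.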

Probability theory to handle uncertainty

On any given date, we may not be able to get an accurate, deterministic reading from a satellite sensor for a given point on Earth, due to clouds, satellite revisit times, coarser resolutions or borderline NDSI values, as described above.

At WeGaw, we decided to fuse data based on probabilistic logic.

The aim of a probabilistic logic (also probability logic and probabilistic reasoning) is to combine the capacity of probability theory to handle uncertainty with the capacity of deductive logic to exploit structure of formal argument.

DeFROST.io snow probability map for 28th November 2019. Only pixels with a snow probability above 75% are displayed.

Every pixel in our world map is assumed to have a probability of being snow-covered. Multiple data sources push this probability up or down every day. The degree of influence a specific data source has depends on its quality, recency and resolution. This allows us to ingest and fuse data not only from satellite observations but also from weather nowcasting models (e.g. temperature, precipitation) and create a homogeneous snow cover map. Every data source has more value than no data at all!
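One standard way to combine weighted evidence of this kind is a log-odds update, where independent observations add up and a per-source weight scales their influence. The `fuse` function and the weighting scheme below are a hedged sketch of the general technique, not WeGaw's actual model:

```python
import numpy as np

def fuse(prior, observations):
    """Update per-pixel snow probabilities with weighted evidence.

    `prior` is an array of probabilities in (0, 1). Each observation
    is a (probability_array, weight) pair, where the weight in [0, 1]
    stands in for the source's quality, recency and resolution
    (a hypothetical scheme). Fusion happens in log-odds space, so
    independent pieces of evidence simply add up.
    """
    logodds = np.log(prior / (1 - prior))
    for p, w in observations:
        p = np.clip(p, 1e-6, 1 - 1e-6)  # avoid log(0)
        logodds += w * np.log(p / (1 - p))
    return 1 / (1 + np.exp(-logodds))

prior = np.array([0.5, 0.5])
optical = (np.array([0.9, 0.2]), 1.0)  # fresh, fine-resolution scene
weather = (np.array([0.7, 0.7]), 0.3)  # coarse nowcasting model
posterior = fuse(prior, [optical, weather])
```

With a uniform 0.5 prior and weight 1.0, a single observation leaves the probability unchanged; lower weights shrink a source's pull toward the prior.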

The key issues to solve when working out such a probability model are:

  • How is data quality defined in this context, and how significant are the different sources of data?
  • By how much should we increase or decrease a probability value based on the data input quality and significance?
  • How do we handle pixels with equal or very low probability for both snow and no snow values (undetermined pixels)?
  • Where do we draw the line for determining a clear snow or no snow value for our users?

The above points can be tackled with a well-defined ruleset, adapted to the type of data being fused, while extensively validating the output (did someone say Machine Learning?).
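The last two questions, handling undetermined pixels and drawing the snow/no-snow line, amount to a three-way classification of the fused probabilities. The 75% snow threshold matches the one mentioned for the published maps; the no-snow cutoff and the `classify` helper are illustrative assumptions:

```python
import numpy as np

SNOW_THRESHOLD = 0.75     # matches the published maps' 75% cutoff
NO_SNOW_THRESHOLD = 0.25  # hypothetical value for illustration

def classify(prob):
    """Map per-pixel probabilities to 'snow', 'no_snow' or
    'undetermined' (anything between the two thresholds)."""
    labels = np.full(prob.shape, "undetermined", dtype=object)
    labels[prob >= SNOW_THRESHOLD] = "snow"
    labels[prob <= NO_SNOW_THRESHOLD] = "no_snow"
    return labels

prob = np.array([0.9, 0.5, 0.1])
labels = classify(prob)  # snow, undetermined, no_snow
```

Undetermined pixels can then be left unrendered, as in the published maps, or carried forward until a later observation resolves them.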

Conclusions

DeFROST delivers daily global snow data via an API and a Mapping Service, built by thresholding the probabilities of our own snow probability model, which fuses data from 5 different satellites and a weather nowcasting model, with a latency of under 15 hours from satellite sensing to map delivery.

The two daily snow maps delivered by DeFROST.

While we handle probabilities internally, both maps are released in binary form: either snow or no snow in a single color code, with at least 75% and up to 100% probability in all cases. Except for advanced and science users, simplicity is king.

The resolution of these maps can drop to the coarsest resolution among the data sources used, depending on weather conditions and revisit times at any given location. The system we have built will only get better as (1) we keep adding more fine-grained data sources and (2) it learns from high-resolution training data for ML algorithms. We are now pushing the boundaries toward the most up-to-date, comprehensive and rapid global mapping of snow ever built.

Similar to weather forecasts, our data cannot be taken as complete ground truth (for now), but it is a pretty good indicator for outdoor enthusiasts of what to expect out there, and a decision-making tool to better plan each outing. No matter the season.

--

Daniel Meppiel, CTO & Co-Founder at WeGaw