Uncertainty Quantification in Google Earth Engine using Conformal Prediction

geethen singh
12 min read · Jan 22, 2024


In this post, I will cover the need for Uncertainty Quantification (UQ), the benefits and drawbacks of using Conformal Prediction (CP) for UQ, and demonstrate how to use conformal prediction in Google Earth Engine (GEE) to quantify uncertainty for Google's Dynamic World, a near-real-time global land cover dataset provided at 10 m spatial resolution.

JavaScript Code Repo: https://code.earthengine.google.com/?accept_repo=users/geethensingh/conformal

Earth Engine App: https://ee-geethensingh.projects.earthengine.app/view/conformaluq

Research paper: https://arxiv.org/abs/2401.06421

The need for Uncertainty Quantification (UQ)

All (human or machine learning) model predictions contain inherent uncertainty. Communicating this uncertainty benefits both data creators and data users. Data users often rely on model predictions for decision-making; understanding the confidence associated with each prediction helps prevent erroneous decisions and manage risk, by reducing overreliance on low-confidence or low-quality predictions and encouraging analytical thinking about the data-generating process.

Data creators can use patterns in the uncertainty to shed light on systematic errors, bias, and scenarios that the model finds challenging. This may prompt targeted labeling efforts, the correction of mislabeled data, limiting the operational domain of a model, or iterating on models and architectures to improve accuracy and reduce uncertainty.

In Earth Observation data, this uncertainty may stem from sensor noise, systematic errors (for example, the scan line corrector failure in Landsat 7), partial data acquisition (for example, the sparse coverage of GEDI), labeling errors, or the stochastic optimisation stage of the modeling process.

The benefit of conformal prediction to quantify uncertainty

Conformal prediction is a relatively recent Uncertainty Quantification (UQ) framework with numerous benefits over earlier uncertainty quantification techniques. These advantages include:

  • Coverage guarantee (the validity property) — The main benefit of CP is its ability to guarantee, with a user-specified probability, that the produced prediction regions include the correct value. For example, if the user specifies a confidence level (1-alpha) of 0.95, the produced prediction regions will contain the correct/true value with 95% probability, i.e., 95% of the time. This coverage can be verified when evaluating the calibrated conformal predictor and has been shown to hold for spatial data despite spatial autocorrelation.
  • Efficiency — Being able to meet the coverage guarantee by returning very wide prediction regions that include all candidate landcover classes is not useful. Therefore, the produced prediction regions should also be narrow to be informative.
  • Adaptable — Some CP methods can provide wider prediction regions for more difficult prediction instances and narrower prediction regions for easier predictions.
  • Distribution-free — Unlike many other methods, such as Bayesian and some bootstrapping approaches that rely on strict distributional assumptions, conformal prediction requires no distributional assumptions. Moreover, techniques such as quantile regression only satisfy coverage under strict conditions on the data, and commonly used ensemble predictions only capture the epistemic uncertainty attributed to stochastic components of model fitting.
  • Model agnostic — CP supports traditional models and neural networks, including Random Forest, XGBoost, NGBoost, Convolutional Neural Networks, Transformers, or even diffusion models. Moreover, it does not require any changes to the model and can be performed post-hoc.
  • Widely applicable — CP supports an expanding array of tasks, including classification, regression, object detection, generative models, time series forecasting, and reinforcement learning.
  • Simplicity — The most common CP methods are easy to understand, simple to implement, and scalable. Their use has been demonstrated on local to global spatial datasets ranging from fewer than 1,500 instances to more than 110 million instances (refer to the research paper for more details).
  • Actively researched — While research into other UQ methods appears to be slowing down or remains limited, research into CP is ramping up. This suggests that its current drawbacks will be minimised or removed over time; CP will likely become more efficient with new approaches and easier to use as more software packages and frameworks support it.
  • Pixel-wise — CP can provide spatially explicit, pixel-wise uncertainty, unlike design-based area estimation and bootstrapping methods that only provide population-level confidence intervals.

The drawbacks of conformal prediction

Some drawbacks of CP, several of which (marked with *) are the topic of ongoing research efforts:

  • Post-processing — If the CP method relies on model output probabilities that are not recalibrated after post-processing (for example, erosion and dilation to remove isolated pixels in a classification), the prediction regions are likely to be over-conservative and less efficient/informative.
  • *Distribution shifts — CP may fail to meet coverage if there are covariate or label shifts, for example due to changing atmospheric conditions, or when a conformal predictor calibrated in one area or period is applied to a different area or period. The only assumption of CP is exchangeability between the calibration and inference nonconformity scores, which implies that the instances can be swapped without changing their joint distribution.
  • *Data issues — Missing data and labeling errors are some common data problems that may pose challenges to maintaining the validity and efficiency properties of CP.

Conformal prediction in GEE

As part of recent research, my collaborators and I implemented some common CP methods natively in GEE JavaScript and Python for classification and regression that support both image (raster) and feature (vector) collections and have made the code publicly accessible.

What is required for Conformal prediction in GEE?

To quantify uncertainty using CP in GEE, a trained regressor, classifier, or clusterer is required, together with reference data not yet used during model fitting, to calibrate and evaluate a conformal predictor.

During classification, the fitted model is used to output probability-like scores for each instance in the calibration and test sets. These probability-like scores are combined with the reference labels using a nonconformity function to estimate how poorly a test instance conforms to the training data. A simple but popular nonconformity function is the hinge loss, the complement of the probability-like score for the reference class, i.e., 1 — p(reference class).
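
As a rough illustration of the hinge score (a sketch only, not the repository's internal implementation), the per-pixel probability of the reference class can be looked up with an array image. The variable names probs (one probability band per class, in label order) and label (reference class indices) are assumptions for this sketch.

// Sketch only: hinge nonconformity score per pixel.
// 'probs' is assumed to hold one probability band per class (in label order);
// 'label' is assumed to hold the reference class index per pixel.
var probArray = probs.toArray();                  // per-pixel 1-D array of class probabilities
var refProb = probArray.arrayGet(label.toInt());  // probability-like score of the reference class
var hingeScore = ee.Image(1).subtract(refProb);   // nonconformity = 1 - p(reference class)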

During regression, the fitted model provides estimates of the target variable of interest (for example, canopy height or above-ground biomass density) for each instance in the calibration and test sets. This prediction is combined with the expected/reference value to provide a nonconformity measure. A simple but popular nonconformity function for regression tasks is the absolute residual (|reference — prediction|).
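
The regression case is even simpler. Assuming single-band images named reference and prediction (again, just an illustrative sketch), the absolute-residual score is a one-liner:

// Sketch only: absolute-residual nonconformity for regression,
// assuming single-band 'reference' and 'prediction' images.
var absResidual = reference.subtract(prediction).abs();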

There are already many resources (including the research paper) that explain how these conformal predictors work, so this is not covered here; refer to the linked resources below.

YouTube video explaining conformal prediction

Guided Example

In this guided example, I demonstrate the steps involved in quantifying uncertainty for Dynamic World, a near-real-time global landcover dataset with nine landcover categories, produced by Google. This dataset is readily available in GEE; each image has 10 bands: one band for each of the nine candidate landcover classes, containing the output softmax/probability-like scores, and a 'label' band containing the landcover class that corresponds to the highest probability-like score. This example corresponds to the demo examples for the conformal image classifier in the JavaScript repo (accessible here: https://code.earthengine.google.com/?accept_repo=users/geethensingh/conformal).
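
If you want to confirm the band structure for yourself, listing the band names of a Dynamic World image is a quick check (a small illustrative snippet):

// Quick check of the Dynamic World band structure (9 probability bands + 'label').
print(ee.ImageCollection('GOOGLE/DYNAMICWORLD/V1').first().bandNames());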

The Google team has released a validation dataset (reference labels and estimated probability values from the Dynamic World neural network model) that was not used to train the model. I have already uploaded this dataset to GEE and will use it to calibrate and evaluate the conformal predictor.

Step 1: Data preparation

The first step involves preparing the validation dataset for conformal prediction. To prepare the label bands, we first select the reference label band (as opposed to the predicted label band). Next, we remove the unlabelled regions and an extra class (which appears to be an error in the provided validation dataset). Thereafter, we format an image property so that we can join each label image with the probability bands for the same region.

// Data preparation

// Create a single polygon with a global extent
var globalBounds = ee.Geometry.Polygon(
    [-180, 90, 0, 90, 180, 90, 180, -90, 0, -90, -180, -90], null, false);

// List of probability band names
var bands = ee.List(['water', 'trees', 'grass', 'flooded_vegetation',
  'crops', 'shrub_and_scrub', 'built', 'bare', 'snow_and_ice']);

// Reference label tiles
var dwl = ee.ImageCollection('projects/nina/GIS_synergy/Extent/DW_global_validation_tiles');
var dwLabels = dwl
  .select([1], ['label'])  // select the reference label band
  .map(function(img) {  // remove unlabelled areas and the extra class
    return ee.Image(img.updateMask(img.gt(0).and(img.lt(10))).subtract(1).copyProperties(img))
      // hacky method to edit an image property
      .set('joinindex', img.rename(img.getString('system:index'))
        .regexpRename('^[^_]*_', '').bandNames().getString(0));
  })
  .randomColumn({seed: 42});  // add a random column

// Dynamic World probability tiles
var dwp = ee.ImageCollection('projects/ee-geethensingh/assets/UQ/DW_probs');
dwp = dwp.map(function(img) {  // rename bands, mask zero-value pixels
  return ee.Image(img.rename(bands).selfMask()).copyProperties(img)
    // hacky method to edit an image property
    .set('joinindex', img.select([0]).rename(img.getString('id_no'))
      .regexpRename('^[^_]*_', '').bandNames().getString(0));
});

Step 2: Join reference labels and probability bands

We combine the reference label band images with the probability band images based on the common id (‘joinindex’) that we created in the previous step. We will then pass this combined image collection to the calibration and evaluation functions in GEE.

// Join the label collection and probability collection on their 'joinindex' property.
// The propertyName parameter is the name of the property
// that references the joined image.
function indexJoin(collectionA, collectionB, propertyName) {
  var joined = ee.ImageCollection(ee.Join.saveFirst(propertyName).apply({
    primary: collectionA,
    secondary: collectionB,
    condition: ee.Filter.equals({
      leftField: 'joinindex',
      rightField: 'joinindex'
    })
  }));
  // Merge the bands of the joined image.
  return joined.map(function(image) {
    return image.addBands(ee.Image(image.get(propertyName)));
  });
}

var dwCombined = indexJoin(dwLabels, dwp, 'probImage');

Step 3: Calibrate Conformal predictor

To calibrate a conformal image classifier, we need to specify the error/tolerance level (ALPHA). A tolerance level of 0.1 corresponds to a confidence level of 0.9 and ensures that, at prediction time, the produced sets include the correct class for 90% of instances.

We specify a SCALE value of 10, corresponding to the spatial resolution of Dynamic World (each pixel represents a 10 × 10 m area on the ground). A coarser SCALE value reduces the computational overhead but decreases the statistical efficiency of the conformal predictor.

SPLIT — The proportion of the data used to calibrate the conformal predictor; in this case, 80% of the image chips. The remainder is used during the evaluation phase.

LABEL — The name of the reference label band.

The output of the calibration phase is a feature with 'version', 'qLevel', and 'qHat' properties. 'version' is a user-provided string that identifies the run. 'qLevel' is the quantile level after applying a finite-sample correction. 'qHat' is the threshold probability score used to determine whether a class is included in a prediction set.

//import conformal classifier calibration functions
var calFunctions = require('users/geethensingh/conformal:calibrateConformalImageClassifier.js');

// Configuration parameters
var ALPHA = 0.1; // 1-ALPHA corresponds to required coverage. For example, 0.1 for 90% coverage
var SCALE = 10; // Used to compute Eval metrics
var SPLIT = 0.8; // Split used for calibration and test data (for example, 0.8 corresponds to 80% calibration data)
var LABEL = 'label'; //band name for reference label band
// ****************************//

var result = calFunctions.calibrate(dwCombined, bands, ALPHA, SCALE, SPLIT, LABEL, 'demoDW_15112023');
print(result);
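
For intuition, the finite-sample correction behind the 'qLevel' property follows the standard split conformal formula. The sketch below is generic: the calibration-set size n is a made-up number for illustration, not something returned by the calibrate function above.

// Generic split-conformal correction (sketch only; 'n' is a hypothetical
// number of calibration scores, not a value used by the repo functions).
var n = 10000;                                      // assumed calibration-set size
var qLevel = Math.ceil((n + 1) * (1 - ALPHA)) / n;  // e.g. 0.9001 for ALPHA = 0.1
print('Finite-sample corrected quantile level', qLevel);
// qHat is then derived from the qLevel-quantile of the calibration nonconformity scores.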

Step 4: Evaluate the conformal classifier

To evaluate the conformal classifier, we need to specify QHAT by copying the 'qHat' value computed during the calibration stage. This is used to create prediction sets for the evaluation subset and to evaluate the quality of the conformal predictor.

Two metrics are used to evaluate the conformal predictor. The first, empirical marginal coverage, evaluates the coverage (validity guarantee); the second, average prediction set size, evaluates the efficiency. To compute the empirical marginal coverage, each prediction set is checked to see whether it includes the actual reference label. Since we specified an ALPHA value of 0.1, the prediction sets of at least 90% of instances/pixels should include the reference label.

For the average prediction set size (efficiency), a value of 1 is ideal and corresponds to an efficient conformal predictor and a very confident prediction model. Values below 1 and values greater than 1 (closer to 9, the maximum number of classes) both correspond to less confident prediction models and less efficient conformal predictors. Values less than 1 arise when many predictions have empty sets (length 0), i.e., none of the class probability-like scores exceed QHAT, suggesting that the instance does not conform with the training set. Values greater than 1, and closer to 9, mean that many pixels have been assigned more than one landcover class; this may be due to mixed landcover pixels and spectral mixing effects.

//import conformal classifier evaluation functions
var evalFunctions = require('users/geethensingh/conformal:evaluateConformalImageClassifier.js');

// Configuration parameters
var QHAT = 0.06067845112312009;

// Evaluate conformal classifier
var result = evalFunctions.evaluate(dwCombined, bands, QHAT, SCALE, SPLIT, 'demoDW_15112023');
print(result);
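
Conceptually, the two metrics can be computed from a prediction-set image (like the one produced at inference in Step 5) and the matching reference labels. The sketch below is illustrative only; it assumes images named sets (binary class bands plus 'setLength') and label, and an evaluation region evalRegion. The evaluate function handles all of this internally.

// Sketch only (the evaluate function computes these internally).
// 'sets' is assumed to hold one binary band per class plus a 'setLength' band;
// 'label' is assumed to hold the reference class indices (0-8);
// 'evalRegion' is an assumed evaluation geometry.
var containsRef = sets.select(bands).toArray().arrayGet(label.toInt());  // 1 if the reference class is in the set
var metrics = containsRef.rename('empiricalCoverage')
  .addBands(sets.select('setLength'))
  .reduceRegion({
    reducer: ee.Reducer.mean(),  // mean of 0/1 = coverage; mean of setLength = average set size
    geometry: evalRegion,
    scale: SCALE,
    maxPixels: 1e13
  });
print('Empirical marginal coverage and average set size', metrics);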

Step 5: Quantify uncertainty for a new scene (Inference)

The result of the inference step (the code below) is a multiband (10-band) image containing a binary band for each of the nine candidate landcover classes. A value of 1 means that class is included in the pixel's prediction set; 0 means it is excluded. A 'setLength' band is also included and gives the length of the prediction set for each pixel.

// Import conformal classifier inference functions
var infFunctions = require('users/geethensingh/conformal:inferenceConformalImageClassifier.js');

var geometry = ee.Geometry.Polygon(
    [[[-124.58837360718884, 42.24132567361335],
      [-124.58837360718884, 32.1623568470788],
      [-113.95360798218884, 32.1623568470788],
      [-113.95360798218884, 42.24132567361335]]], null, false);

// Data preparation - spatio-temporal filtering
var dwFiltered = ee.ImageCollection("GOOGLE/DYNAMICWORLD/V1")
  .filterDate('2020-01-01', '2021-01-01')
  .filterBounds(geometry)
  .reduce(ee.Reducer.firstNonNull()).aside(print)
  .rename(bands.add('label'));

// Select probability bands
var dwp = dwFiltered.select(bands);
// Select label bands
var dwl = dwFiltered.select('label');

// Uncertainty Quantification
// Apply the conformal inference function to the probability image using the calibrated QHAT
var result = infFunctions.inference(dwp, bands, QHAT);
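
For intuition, the structure of this output can be reproduced by thresholding the probability bands at QHAT, as described above. This is only a conceptual sketch; the repository's inference function is the authoritative implementation and may handle masking and edge cases differently.

// Conceptual sketch of what the inference output contains (not the repo's code).
var setMembership = dwp.gt(QHAT);                                         // binary band per class: 1 where p(class) > qHat
var setLength = setMembership.reduce(ee.Reducer.sum()).rename('setLength');  // prediction set size per pixel
var conceptualResult = setMembership.addBands(setLength);                 // mirrors the 10-band structure of 'result'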

Step 6: Visualise results

An example of uncertainty (set length) for a Dynamic World validation patch: purple corresponds to set lengths of 1, and brighter, yellow colours correspond to higher uncertainty (set lengths closer to 9).

In the code snippet below, we visualise the highest probability prediction from Dynamic World, the set length (uncertainty), and all the pixels that include water in their prediction sets. A legend is also provided.

// Visualise results
// Map.centerObject(geometry, 7);
var palettes = require('users/gena/packages:palettes');
var palette = palettes.matplotlib.viridis[7];
var dwPalette = [
  '419BDF', '397D49', '88B053', '7A87C6', 'E49635', 'DFC35A',
  'C4281B', 'A59B8F', 'B39FE1'
];
var dwVisParams = {
  min: 0,
  max: 8,
  palette: dwPalette
};
// Visualise DW label image
Map.addLayer(dwl.reduce(ee.Reducer.firstNonNull()).clip(geometry),dwVisParams,'DW first non null labels');
// Visualise Set length (Uncertainty)
Map.addLayer(result.select('setLength').clip(geometry),{min:0, max:9, palette: palettes.matplotlib.viridis[7]}, 'Set lengths', false);
// Visualise Potential water pixels for entire year of 2020
Map.addLayer(result.select('water').clip(geometry), {min:0, max:1, bands: ['water']}, 'Potential water set', false, 1);

// Create and display legends(no changes required)
// set position of panel
var legend = ui.Panel({
  style: {
    position: 'bottom-left',
    padding: '8px 15px'
  }
});

// Create legend title
var legendTitle = ui.Label({
  value: 'Dynamic World',
  style: {
    fontWeight: 'bold',
    fontSize: '18px',
    margin: '0 0 4px 0',
    padding: '0'
  }
});

// Add the title to the panel
legend.add(legendTitle);

// Creates and styles 1 row of the legend.
var makeRow = function(color, name) {

  // Create the label that is actually the colored box.
  var colorBox = ui.Label({
    style: {
      backgroundColor: '#' + color,
      // Use padding to give the box height and width.
      padding: '8px',
      margin: '0 0 4px 0'
    }
  });

  // Create the label filled with the description text.
  var description = ui.Label({
    value: name,
    style: {margin: '0 0 4px 6px'}
  });

  // Return the panel.
  return ui.Panel({
    widgets: [colorBox, description],
    layout: ui.Panel.Layout.Flow('horizontal')
  });
};

function ColorBar(palette) {
  return ui.Thumbnail({
    image: ee.Image.pixelLonLat().select(0),
    params: {
      bbox: [0, 0, 1, 0.1],
      dimensions: '200x20',
      format: 'png',
      min: 0,
      max: 1,
      palette: palette,
    },
    style: {stretch: 'horizontal', margin: '0px 8px'},
  });
}

// Make a continuous colour-bar legend.
function makeLegend(low, mid, high, palette, title) {
  // Legend title
  var legendTitle = ui.Label({
    value: title,
    style: {fontWeight: 'bold'}
  });
  var labelPanel = ui.Panel(
    [
      ui.Label(low, {margin: '4px 8px'}),
      ui.Label(mid, {margin: '4px 8px', textAlign: 'center', stretch: 'horizontal'}),
      ui.Label(high, {margin: '4px 8px'})
    ],
    ui.Panel.Layout.flow('horizontal'));
  return ui.Panel({
    widgets: [legendTitle, ColorBar(palette), labelPanel],
    style: {position: 'bottom-left'}
  });
}

// Palette with the colors
var palette = dwPalette;

// name of the legend
var names = [
'Water',
'Trees',
'Grass',
'Flooded vegetation',
'Crops',
'Shrub and scrub',
'Built',
'Bare',
'Snow and ice',
];

// Add colours and names to the legend
for (var i = 0; i < 9; i++) {
  legend.add(makeRow(palette[i], names[i]));
}

// add legend to map (alternatively you can also print the legend to the console)
Map.add(legend);
Map.add(makeLegend(0,5,9, palettes.matplotlib.viridis[7], 'Uncertainty'));

Closing remarks

Keep in mind that reporting uncertainty and reporting error are complementary. Data providers should make class-wise probability outputs available, as this allows data users to quantify uncertainty irrespective of the availability of the underlying models and training data. I look forward to the future of conformal prediction and its wide adoption in Earth Observation (EO). For a more detailed look at uncertainty in EO, have a look at the linked research paper and the references therein.

For more information:

To find a regularly updated list of research papers, tutorials, software tools, and other conformal prediction material, refer to the Awesome Conformal Prediction GitHub repository (https://github.com/valeman/awesome-conformal-prediction).
