Crop classification via satellite image time-series and PSETAE deep learning model

Soumya Kanta Dash
Published in GeoAI
May 10, 2023

In modern agriculture, crop classification plays a crucial role in identifying and mapping different crop types. The task is challenging because of the complexity involved in differentiating between crop types, yet accurate crop classification provides essential information for government authorities and farmers to make informed decisions about crop management. The growing accessibility of satellite imagery with high temporal and spectral information, together with advances in machine learning methods, paves the way for automated monitoring and management of agricultural production and land use at a large scale. Traditional deep learning models like UNet typically use multispectral imagery from a single time period, whereas PSETAE takes into account both the multispectral and temporal features of satellite imagery. By analyzing how crop characteristics change over time, PSETAE can effectively classify different crop types in a satellite image time series.

In this blog, we will focus on using a modified version of the PSETAE architecture, a transformer-based deep learning model originally developed by Garnot et al. (2020), for our task of pixel classification of satellite image time series into different crop types.

Satellite image time series

An Earth observation time series is a collection of satellite images of a location from different time periods, stacked vertically to form a 3D structure. The collection shares a common projection and a consistent timeline. Each location in space-time is a vector of values across the timeline, as shown in Figure 1 [2]. Combining a temporal component with spectral information allows for the detection and classification of complex patterns in various applications such as crop type and condition classification, mineral and land cover mapping, and so on.

Figure 1. Time series of satellite imagery
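To make the 3D structure concrete, here is a minimal NumPy sketch (array sizes are illustrative, not taken from the dataset used in this blog):

```python
import numpy as np

# A toy Earth-observation time series: T images of H x W pixels with
# C spectral bands each, stacked along a common timeline.
T, C, H, W = 9, 7, 4, 4                  # 9 time steps, 7 bands, 4x4 pixels
rng = np.random.default_rng(0)
stack = rng.random((T, C, H, W))

# Each location in space-time is a vector of values across the timeline:
# the temporal profile of one band at one pixel has length T.
temporal_profile = stack[:, 0, 2, 2]     # band 0, pixel (2, 2)
print(temporal_profile.shape)            # (9,)
```

It is these per-pixel temporal profiles, one per band, that carry the crop-specific patterns the model learns from.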

Imagery and labels

The training sample polygons were generated from a cloud-free image sequence of 2018, consisting of nine monthly mean images with seven bands (Blue, Green, Red, NIR, Red edge 4, SWIR 1, SWIR 2) each, acquired using Sentinel-2 satellite imagery at a resolution of 10 meters. The Export Training Data for Deep Learning tool in ArcGIS Pro is used to export training data for the model. The input satellite time series is a composite of rasters or multidimensional rasters for required time periods or time steps. The Create Multidimensional Rasters from a Collection of Images video demonstrates the steps to create a multidimensional raster from a collection of images.

Training labels can be created using the Label Objects for Deep Learning tool, available on the Classification Tools drop-down menu. The Classification Tools menu is found in the Image Classification group on the Imagery tab. All tools described above require Image Analyst and Spatial Analyst licenses. Eleven crop types or class labels were retrieved from the USDA cropland (30m) layer during the labeling process. Pure pixels are labeled into different classes based on the available information from the cropland layer. The labeling of crop types is shown in Figure 2.

Imagery and labels were collected in California due to the wide distribution of crop types.

A project template consisting of multidimensional imagery and labels for training and inferencing is available at https://geosaurus.maps.arcgis.com/home/item.html?id=c8f4d9f6e68549498fa2f059159e246d.

Figure 2. Samples of labels from different classes overlayed on a time-series raster.

Time-series classification using PSETAE

PSETAE was recently added to the arcgis.learn module of ArcGIS API for Python. This blog will explore the workflow showing how the model can be initialized, trained, and deployed.

PSETAE workflow in arcgis.learn

The PSETAE architecture is based on transformers, originally developed for sequence-to-sequence modeling. The architecture encodes a time series of multispectral images. The pixels under each labeled polygon are represented by a spectro-temporal tensor of size T x C x N, where T is the number of temporal observations, C is the number of spectral channels, and N is the number of pixels. The model uses this tensor, with its unique spectral and temporal profiles, to infer the class.
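The tensor layout can be sketched in NumPy, with a simplified stand-in for the pooling step performed by the pixel-set encoder (sizes are illustrative):

```python
import numpy as np

T, C, N = 9, 7, 64          # 9 time steps, 7 bands, 64 sampled pixels
rng = np.random.default_rng(0)
# Spectro-temporal tensor for one labeled polygon: at every time step,
# a set of N pixels, each described by C spectral values.
pixel_set = rng.random((T, C, N))

# The pixel-set encoder summarizes the (unordered) pixel set; pooling the
# mean and std over the pixel axis is one of the available strategies.
pooled = np.concatenate([pixel_set.mean(axis=2), pixel_set.std(axis=2)], axis=1)
print(pooled.shape)         # (9, 14) -> one summary vector per time step
```

In the actual model, an MLP first lifts the C spectral values into a higher-dimensional feature space before pooling; the sketch above only illustrates the tensor shapes involved.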

The architecture of PSETAE consists of a pixel-set encoder, a temporal attention encoder, and a classifier. Refer to How PSETAE model works and Satellite Image Time Series Classification with Pixel-Set Encoders and Temporal Self-Attention for details on the architecture.

Export and prepare training data

First, use the input raster and training labels to export the raster chips in the RCNN Masks metadata format using the Export Training Data for Deep Learning tool. This tool is available in ArcGIS Pro as well as ArcGIS Image Server. Set the following parameters in the tool:

· Input Raster — Add a composite or multi-dimensional raster consisting of a cloud-free image sequence, with nine monthly mean images with seven bands (Blue, Green, Red, NIR, Red edge 4, SWIR 1, SWIR 2) each. Follow the steps in the Create Multidimensional Rasters from a Collection of Images video to create a multidimensional raster from a collection of images.

· Input Feature Class — Add the training labels.

· Tile Size X and Tile Size Y — 256.

· Stride X and Stride Y — 128.

· Meta Data Format — RCNN Masks.

· Environments — Set the optimum cell size and processing extent.

The resulting path from the export tool is provided to the prepare_data function in arcgis.learn:

data = prepare_data(path=r"path/to/exported/data",
                    n_temporal=n_temporal,
                    min_points=min_points,
                    batch_size=batch_size,
                    n_temporal_dates=n_temporal_dates,
                    dataset_type="PSETAE")

where,

· n_temporal — Number of temporal observations, time steps, or composited rasters. This is optional for multidimensional rasters but required for composite rasters.

· n_temporal_dates — The dates of the observations will be used for the positional encoding and should be stored as a list of date strings in YYYY-MM-DD format. This is optional for multidimensional rasters but required for composite rasters.

· min_points — Number of pixels to sample from each labeled region of training data, equal to 64 or a multiple of 64 (for example, 64, 128, and so on). This is optional.

· batch_size — Suggested batch size for this model is around 128. This is optional.

· classes_of_interest — List of classes of interest. Only specified classes will be considered for dataset creation. This is optional.

· channels_of_interest — List of channels or bands. Only specified channels or bands will be considered for dataset creation. This is optional.

· timesteps_of_interest — List of time-steps. Only specified time-steps will be considered for dataset creation. This is optional.

· dataset_type — Type of dataset in sync with the model. This is required.

The size of the tensor is T x C x N, where T is the number of temporal observations, C is the number of spectral channels or bands, and N is the number of pixels.
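To illustrate how acquisition dates such as n_temporal_dates can drive positional encoding, here is a hedged NumPy sketch of sinusoidal date encoding (the day-of-year mapping and the d=32 output size are assumptions for illustration, not necessarily what PSETAE uses internally):

```python
import numpy as np
from datetime import date

def positional_encoding(dates, d=32, T=1000):
    """Sinusoidal encoding of acquisition dates, indexed by day of year.
    T is the encoding period (compare the model's T parameter)."""
    days = np.array([date.fromisoformat(s).timetuple().tm_yday for s in dates])
    i = np.arange(d // 2)
    angles = days[:, None] / (T ** (2 * i / d))[None, :]
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=1)

enc = positional_encoding(["2018-01-15", "2018-02-15", "2018-03-15"])
print(enc.shape)   # (3, 32) -> one encoding vector per observation date
```

The encoding gives the temporal attention encoder a sense of when each observation was acquired, so irregular gaps between acquisitions are represented rather than ignored.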

Visualize temporal profiles of classes from your training data

The resulting visualization is the temporal profile of the seven bands for each crop type. Each crop type can be seen to have a unique temporal profile.

· rows — Optional integer. number of pixels to be sampled for each class.

· spectral_view — Optional Boolean. If set to True, this shows spectral curves of sampled pixels.
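Assuming the data object returned by prepare_data, the profiles can be plotted with show_batch (parameter names follow the list above; the path is a placeholder):

```python
from arcgis.learn import prepare_data

data = prepare_data(path=r"path/to/exported/data", dataset_type="PSETAE")
# Plot sampled pixels per class; spectral_view=True shows their spectral curves.
data.show_batch(rows=4, spectral_view=True)
```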

Train PSETAE model

The following sections describe how to train the model.

Load PSETAE model architecture

Initialize the PSETAE model object as shown below:

model = PSETAE(data, gamma=2, dropout=0.2)

The following model parameters can be passed using keyword arguments:

· gamma — Gamma parameter of the focal loss. By default, it is set to 1. Optional integer.

· mlp1 — Dimensions of the successive feature spaces of MLP1. By default, it is set to [32, 64]. Optional list.

· pooling — Pixel-embedding pooling strategy, chosen from 'mean', 'std', 'max', and 'min', or a combination such as 'mean_std'. By default, it is set to 'mean_std'. Optional string.

· mlp2 — Dimensions of the successive feature spaces of MLP2. By default, it is set to [128, 128]. Optional list.

· n_head — Number of attention heads. By default, it is set to 4. Optional integer.

· d_k — Dimension of the key and query vectors. By default, it is set to 32. Optional integer.

· dropout — Dropout value. By default, it is set to 0.2. Optional float.

· T — Period to use for the positional encoding. By default, it is set to 1000. Optional integer.

· mlp4 — Dimensions of the decoder MLP. By default, it is set to [64, 32]. Optional list.

The model's hyperparameters are set to default values chosen for good performance.
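To see what the gamma parameter does, here is a standalone NumPy sketch of the focal loss (an illustration of the formula, not the arcgis.learn implementation):

```python
import numpy as np

def focal_loss(p_true, gamma=2.0):
    """FL(p) = -(1 - p)^gamma * log(p), where p is the probability the
    model assigns to the true class. gamma=0 recovers cross-entropy;
    larger gamma down-weights well-classified (easy) examples."""
    p = np.asarray(p_true, dtype=float)
    return -((1.0 - p) ** gamma) * np.log(p)

easy, hard = 0.9, 0.3       # confident vs. uncertain prediction
# With gamma=2, the easy example's loss shrinks by roughly 100x compared
# with plain cross-entropy, so training focuses on hard pixels.
print(focal_loss(easy, gamma=0) / focal_loss(easy, gamma=2))   # ~100
```

This is why raising gamma can help with the imbalanced crop classes seen later in the metrics section.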

For more information about the API, see the API reference.

Tune for optimal learning rate

After preparing the data, find an optimal learning rate using lr_find() and store it in the variable lr.

Fit the model

To train the model, use the fit() method. To start, train the model for 100 epochs.

· epochs — Number of cycles of training on the data.

· lr — Learning rate to be used for training the model.

Here, the model is trained for 100 epochs. The initial 20 epochs are shown above.
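The steps above can be sketched as follows, using the data object from the earlier sections (the per-epoch loss table that arcgis.learn prints is omitted):

```python
from arcgis.learn import PSETAE

model = PSETAE(data, gamma=2, dropout=0.2)   # `data` from prepare_data
lr = model.lr_find()                         # suggests a learning rate
model.fit(epochs=100, lr=lr)                 # prints per-epoch loss and accuracy
```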

Visualize results in the validation set

The rows parameter is the number of rows of results to be displayed. This is optional. The resulting visualization shows the classification of temporal profiles into the predicted classes.
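For a trained model, this looks like the following (rows as described above):

```python
# Compare ground-truth and predicted classes for a few validation samples.
model.show_results(rows=5)
```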


Compute model metrics

The performance of the trained model is evaluated across classes by computing per-class metrics. The scores differ because certain crops, such as Other Hay/Non-alfalfa, idle cropland, and winter wheat, are difficult to classify, while others, like grassland, cotton, and almonds, are predicted with high accuracy. Note that since the labels were obtained from the USDA cropland layer, there may be some differences between the labels and the ground truth.
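Per-class precision, recall, and F1 scores can be computed as shown below (assuming the per_class_metrics method exposed by arcgis.learn classification models; the output is a table with one row per crop class):

```python
# Precision, recall, and F1 score for every crop class on the validation set.
model.per_class_metrics()
```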

Save the model

As the metrics and losses look satisfactory, save the trained model as a deep learning package (.dlpk format) for large-scale inferencing. The deep learning package format is the standard format used to deploy deep learning models on the ArcGIS platform.

Use the save() method to save the trained model. By default, it will be saved to the models subfolder inside the training data folder.

model.save(r'psetae_ts_model_e200')

Model inferencing to a larger extent

After training the time-series classification model and saving the weights, use the Classify Pixels Using Deep Learning tool, available in both ArcGIS Pro and ArcGIS Enterprise, for inferencing at scale.

For Input Raster, add a time-series raster: a composite or multidimensional raster created from nine monthly mean images with seven bands each. Follow the steps in the Create Multidimensional Rasters from a Collection of Images video to create a multidimensional raster from a collection of images.

Shown below is the predicted raster by the trained model.

Figure 3. Crops predicted by the trained model

Conclusion

In this post, you have seen how to train the PSETAE deep learning model, available in the arcgis.learn module of ArcGIS API for Python, to classify various types of crops in a satellite image time series.

Detecting and classifying patterns in satellite imagery with good temporal and spectral resolution, such as Sentinel-2 and Landsat, is a challenging task. As demonstrated, time-series classification models like PSETAE are computationally efficient tools for tackling it.

References

See the following to learn more:

· Garnot, Vivien Sainte Fare, Loic Landrieu, Sebastien Giordano, and Nesrine Chehata. “Satellite image time series classification with pixel-set encoders and temporal self-attention.” In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 12325–12334. 2020.

· Simoes, Rolf, Gilberto Camara, Gilberto Queiroz, Felipe Souza, Pedro R. Andrade, Lorena Santos, Alexandre Carvalho, and Karine Ferreira. “Satellite image time series analysis for big earth observation data.” Remote Sensing 13, no. 13 (2021): 2428.

· Data Preparation Methods

· https://nassgeodata.gmu.edu/CropScape/
