Earth Observation 4 Land Degradation Neutrality — Part 2: data characteristics

Elke Hendrix
8 min read · Feb 24, 2020

Image from ESA

Earth observation data comes in many forms: it differs in processing level, resolution, coverage and data format. This blog, part of the Earth Observation 4 Land Degradation Neutrality series, explains the most important characteristics to take into account when searching for suitable earth observation data.

Raster vs vector data

There are two major spatial data types, raster and vector, and each is useful for specific purposes. Raster data, also called grid data, is cell-based: each cell is called a pixel and contains a value. Imagery from satellites or airborne sensors is usually delivered in raster format. The information stored in a raster dataset can represent magnitudes, heights, spectral values or categories.

  • Height data: elevation, aspect, slope and water flow.
  • Categorical data: land use, soil types, vegetation types and land cover.
  • Spectral values: percentage of reflected radiation in the near-infrared, green, blue or red part of the spectrum, or the signal returned to a radar sensor.
  • Magnitudes: air pollution and noise.
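To make the cell-based idea concrete, here is a minimal Python sketch of a categorical raster stored as a plain nested list. The grid values, the class codes, the origin and the cell size are all made up for illustration; real rasters are normally read with a GIS library rather than typed in by hand.

```python
# A tiny categorical raster: every cell (pixel) holds one land cover code.
# Codes, origin and cell size are hypothetical, chosen only for illustration.
LAND_COVER = {1: "forest", 2: "grassland", 3: "water"}

grid = [
    [1, 1, 2, 2],
    [1, 2, 2, 3],
    [2, 2, 3, 3],
]

X_MIN, Y_MAX = 100.0, 500.0   # map coordinates of the upper-left corner
CELL_SIZE = 10.0              # each pixel covers 10 x 10 map units


def value_at(x, y):
    """Return the land cover class of the pixel that contains (x, y)."""
    col = int((x - X_MIN) // CELL_SIZE)
    row = int((Y_MAX - y) // CELL_SIZE)
    if not (0 <= row < len(grid) and 0 <= col < len(grid[0])):
        raise ValueError("coordinate lies outside the raster")
    return LAND_COVER[grid[row][col]]


print(value_at(105.0, 495.0))  # a location inside the upper-left pixel
print(value_at(135.0, 475.0))  # a location inside the lower-right pixel
```

The position of a pixel is never stored explicitly: it follows from the origin, the cell size and the row and column index.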

Vector data is built from coordinate pairs, for example longitude and latitude values. One pair of longitude and latitude values forms a point. Two or more connected points form a line, and when a series of points is connected back to its starting point it forms a closed shape called a “polygon”. Point data can represent the location where a certain species was seen or the location of a supermarket. Line data is often used for road maps or river maps. Polygon data can represent vulnerable areas or the borders of countries and provinces.
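The three vector geometry types can be sketched as plain coordinate lists. The coordinates below are invented for illustration, and the area function applies the shoelace formula to the raw coordinates, so the result is in square degrees; it is only meant to show that a polygon is a closed ring of points.

```python
# Hypothetical features expressed as (longitude, latitude) pairs.
point = (5.18, 52.09)                                   # e.g. a species sighting
line = [(5.10, 52.05), (5.15, 52.08), (5.20, 52.10)]    # e.g. a stretch of river
polygon = [(5.0, 52.0), (5.2, 52.0), (5.2, 52.1), (5.0, 52.1), (5.0, 52.0)]


def is_closed(ring):
    """A polygon ring needs at least 4 positions and must end where it starts."""
    return len(ring) >= 4 and ring[0] == ring[-1]


def shoelace_area(ring):
    """Planar area of a closed ring (shoelace formula), in squared coordinate units."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


print(is_closed(line), is_closed(polygon))
print(round(shoelace_area(polygon), 4))
```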

Temporal coverage and updates

When monitoring a phenomenon it is important to know how often the dataset is updated. For monitoring land use change, for example, monthly or yearly data might be sufficient, but if you are monitoring vegetation growth it might be necessary to measure growth daily. There are datasets that are updated multiple times a day, daily, every 8 days, monthly or yearly. It is also important to think about the temporal coverage: the period of time covered by the consecutive datasets from the same source. For example, when a satellite was launched in 2002 and has been operating ever since, the temporal coverage is 2002–2019. When measuring anomalies (deviations from the mean) you need long time series, but if you are only interested in one growing season a few months of data will do the job.
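The anomaly idea can be illustrated in a few lines. This is a minimal sketch with invented yearly values; it assumes an anomaly is simply the value minus the mean over the whole record, whereas operational products often use a fixed reference period.

```python
from statistics import mean

# Hypothetical yearly vegetation index values for one pixel (2002-2009).
series = {
    2002: 0.50, 2003: 0.52, 2004: 0.48, 2005: 0.51,
    2006: 0.35, 2007: 0.49, 2008: 0.53, 2009: 0.50,
}

long_term_mean = mean(series.values())

# Anomaly = deviation of each year from the long-term mean.
anomalies = {year: value - long_term_mean for year, value in series.items()}

lowest_year = min(anomalies, key=anomalies.get)
print(f"mean = {long_term_mean:.3f}")
print(f"largest negative anomaly in {lowest_year}: {anomalies[lowest_year]:+.3f}")
```

With only two or three years of data the mean itself would be dominated by the unusual year, which is why anomaly analysis needs a long record.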

Spatial resolution

The spatial resolution of a dataset describes the amount of detail in the data. For vector polygon data the spatial resolution is, for example, the municipality level, meaning that all the information within a municipality is aggregated. The spatial resolution of vector point and polyline data can be expressed as precision: when the precision is 10 meters, the “actual” location of the feature can be anywhere within 10 meters of the mapped location. For raster data the spatial resolution is the size of the grid cells, i.e. the area on the ground that one pixel covers. It is tempting to think that the higher the resolution, the better your outcomes are going to be, but this is not always the case. Two drawbacks of high resolution imagery are the computation time and the storage space needed. If you want to analyse air pollution in Europe, a resolution of a few kilometers can give you a great overview of the most polluted areas. Higher resolution imagery is an advantage if you want to analyse crop growth at plot level. Often the choice of spatial resolution is a compromise between processing time and the amount of detail.
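A back-of-the-envelope calculation shows why resolution drives storage and processing time. The sketch below assumes a square study area, a single band and 2 bytes per pixel without compression; all three are illustrative assumptions.

```python
def raster_size(extent_km, resolution_m, bytes_per_pixel=2):
    """Pixel count and uncompressed size in bytes of a square, single-band raster."""
    pixels_per_side = int(extent_km * 1000 / resolution_m)
    n_pixels = pixels_per_side ** 2
    return n_pixels, n_pixels * bytes_per_pixel


for resolution_m in (1000, 100, 10):
    n_pixels, n_bytes = raster_size(extent_km=100, resolution_m=resolution_m)
    print(f"{resolution_m:>5} m: {n_pixels:>12,} pixels, {n_bytes / 1e6:,.2f} MB")
```

Making the pixels ten times smaller multiplies the number of pixels, and therefore the storage and processing load, by a hundred.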

Low resolution to high resolution imagery

Geographic coverage

It is important to check whether a dataset covers the area of interest. Some datasets cover the whole earth while others only cover a specific country or continent. The coverage of satellite imagery is determined by the satellite's orbit around the earth. A satellite in a geostationary orbit (roughly 36,000 km above the equator) moves along with the earth's rotation, so the sensor always views the same part of the earth from the same angle. This allows very frequent measurements, but only of the part of the earth that is in view. Satellites in a polar or near-polar orbit pass over a different strip of the earth on each revolution and can therefore cover (almost) the whole earth, at the cost of less frequent measurements of any one location. A sun-synchronous orbit is a near-polar orbit whose orbital plane slowly rotates, by about one degree per day, so that the satellite passes over a given location at roughly the same local solar time on every pass. Because the sun then lights the surface from a similar angle each time, this orbit is very suitable for passive sensors, which rely on reflected sunlight. Many other orbits exist, for example low-inclination orbits that only cover the region around the equator.
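The altitude of a geostationary orbit follows from requiring that one revolution of the satellite takes exactly one rotation of the earth. A small sketch using Kepler's third law for a circular orbit, with commonly quoted values for the earth's gravitational parameter, sidereal day and equatorial radius:

```python
import math

GM_EARTH = 3.986004418e14      # gravitational parameter of the earth, m^3/s^2
SIDEREAL_DAY = 86164.0905      # one rotation of the earth, in seconds
EQUATORIAL_RADIUS = 6378137.0  # equatorial radius of the earth, in metres

# Kepler's third law for a circular orbit: T^2 = 4 * pi^2 * r^3 / GM
orbit_radius = (GM_EARTH * SIDEREAL_DAY ** 2 / (4 * math.pi ** 2)) ** (1 / 3)
altitude_km = (orbit_radius - EQUATORIAL_RADIUS) / 1000

print(f"geostationary altitude: about {altitude_km:,.0f} km above the equator")
```

The result is a little under 36,000 km, measured from the earth's surface at the equator.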

Sometimes datasets only cover specific areas because they come from commercial parties who sell the data for a specific area. An example is the Satellietdataportaal of the Netherlands Space Office, which buys RapidEye, SPOT6, TripleSat and PlanetScope imagery covering the Netherlands only and makes it freely available to Dutch citizens.

Sun synchronous orbit and geostationary orbit

Processing level

The processing level of earth observation imagery is very important to keep in mind when using raster imagery. Most earth observation imagery needs some processing before it can be used: the lower the processing level, the harder it is to use the data. NASA defined five processing levels that give a systematic overview. Unfortunately not all agencies use the same definitions, so it is always advisable to check what the different processing levels of a product include.

  • Level 0
    Raw satellite data. It is very hard to process and is usually not even distributed by the agencies. A lot of processing steps need to be taken before the data is usable.
  • Level 1
    The raw satellite data is referenced and corrected in the following ways:
    Georeferencing: the imagery is linked to the real location on earth.
    Time referencing: the time on earth at which the measurements were captured is added to the imagery.
    Radiometric correction: the electromagnetic radiation captured by the sensor is disturbed by gases or particles in the atmosphere, slight differences in the angle of the sun and other biases. The imagery is systematically corrected for these biases. This correction is very important for the analysis of time series.
    Geometric correction: because the earth and the sensor are moving during the measurements, geometric distortions occur. During the geometric correction these distortions are removed.
  • Level 2
    The electromagnetic radiation measured in level 1 products is converted into geophysical information, e.g. soil moisture, temperature and height.
  • Level 3
    Level 3 products contain the same information as level 2 products, but they are easier to use because missing values are interpolated and the separate tiles are combined into a single global view.
  • Level 4
    Level 4 products show proxy variables derived from level 2 and level 3 products. This means that models use the geophysical information to derive information that was not directly measured by the sensor.
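As a rough illustration of the gap filling mentioned for level 3 products, the sketch below linearly interpolates missing values in a short series. The values are invented and the function name is hypothetical; agencies use their own, usually more elaborate, gridding and compositing methods.

```python
def fill_gaps(values):
    """Linearly interpolate None entries that have valid neighbours on both sides.

    Gaps at the very start or end of the series are left as None.
    """
    filled = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = values[left] + step * (i - left)
    return filled


# Hypothetical soil moisture values along one row of pixels; None marks a gap
# (for example because of cloud cover).
row = [0.20, None, None, 0.26, 0.30, None, 0.34]
print([round(v, 2) for v in fill_gaps(row)])
```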

Data format

Earth observation data comes in a whole variety of formats. Some datasets can only be viewed, while others can be downloaded for analysis. It is therefore important to determine beforehand whether the data is only needed for visual inspection or also for further analysis. If the data is used for analysis, note that not all formats can be read by every GIS software package. ESRI offers a list with explanations per geographic data format; here only the most frequently used data formats are discussed.

A few data formats are served online, meaning that the data is not stored on the computer that you are working on.

  • Web Map Service (WMS)
    A WMS serves maps from a server somewhere else. This means that the data can only be accessed while your computer is connected to that server. The main advantages of a WMS are that the maps already show a combination of multiple data sources (raster and vector, e.g. land use, rivers, houses, roads) and that the data is automatically updated. The disadvantage of using a WMS is that the map is loaded as a rendered image, so the separate layers behind the map cannot be accessed or analysed individually.
  • Web Feature Service (WFS)
    A WFS is similar to a WMS because the data is hosted on a server. However, a WFS delivers the actual vector features that make up the map, so they can be queried and analysed. The disadvantage of a WFS is that it takes more time to load the different layers from the server, which makes the analysis slower.
  • Web Coverage Service (WCS)
    A WCS is the raster counterpart of a WFS: it delivers the underlying raster data rather than a rendered picture. Again the separate data sources can be loaded, but this comes at the cost of slower loading.
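The difference between asking for a picture (WMS) and asking for the features themselves (WFS) shows up in the request that is sent to the server. The sketch below only builds the request URLs and does not contact any server. The server address and the layer name are hypothetical; the parameter names follow the OGC WMS 1.3.0 and WFS 2.0.0 key-value conventions, and the bounding box is a rough box around the Netherlands.

```python
from urllib.parse import urlencode

# Hypothetical server address and layer name, used only for illustration.
BASE_URL = "https://example.com/ows"
LAYER = "demo:land_use"

# WMS GetMap: the server renders the layers and returns a picture.
wms_params = {
    "SERVICE": "WMS",
    "VERSION": "1.3.0",
    "REQUEST": "GetMap",
    "LAYERS": LAYER,
    "STYLES": "",
    "CRS": "CRS:84",               # longitude/latitude axis order
    "BBOX": "3.3,50.7,7.3,53.6",   # min lon, min lat, max lon, max lat
    "WIDTH": 800,
    "HEIGHT": 600,
    "FORMAT": "image/png",
}

# WFS GetFeature: the server returns the vector features themselves.
wfs_params = {
    "SERVICE": "WFS",
    "VERSION": "2.0.0",
    "REQUEST": "GetFeature",
    "TYPENAMES": LAYER,
    "COUNT": 10,                   # limit the number of returned features
}

wms_url = f"{BASE_URL}?{urlencode(wms_params)}"
wfs_url = f"{BASE_URL}?{urlencode(wfs_params)}"
print(wms_url)
print(wfs_url)
```

In practice GIS software builds these requests for you; you normally only paste the base address of the service.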

The other data formats are stored on your own computer and can therefore be used in offline GIS software.

  • Vector data formats
    The most popular vector data format is the shapefile. A shapefile stores the geographic location, the shape of the feature (polygon, point or line) and the attributes of the feature. The attributes per feature are stored in a table called the attribute table. A shapefile is not a single file but a set of files with the same name and different extensions: .shp, .shx, .dbf, .prj, .xml, .sbn or .sbx. Less used vector data formats, e.g. ArcInfo Coverage, Spatial Database Engine and Digital Line Graph, are explained here.
  • Raster data formats
    Images are raster datasets where the separate pixels store brightness values for the different types of electromagnetic radiation. Images are usually satellite images with a low processing level. Grids are a raster data format in which both discrete features (e.g. buildings, roads, land uses) and continuous phenomena (e.g. rainfall, temperature) are stored. Grid datasets are already processed into tangible information. A grid cell can store information about the land use, for example the class “deciduous forest”. For continuous phenomena a grid cell can store, for example, the amount of rainfall in mm for the month of February. Both images and grids are often stored as .tiff, .img, .hdf, .nc (NetCDF) or .jpg files.
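Because a shapefile is a set of files rather than one file, a common problem is that only the .shp file gets copied or e-mailed. The sketch below checks a list of file names for the three parts that are required (.shp, .shx and .dbf). The function name and the file names are hypothetical.

```python
from pathlib import PurePath

# A shapefile is a set of files sharing one base name. These three
# extensions are required; .prj, .xml, .sbn and .sbx are optional extras.
REQUIRED = {".shp", ".shx", ".dbf"}


def missing_parts(filenames, base_name):
    """Return the required shapefile extensions that are absent for base_name."""
    present = {
        PurePath(name).suffix.lower()
        for name in filenames
        if PurePath(name).stem == base_name
    }
    return sorted(REQUIRED - present)


# Hypothetical folder listing.
files = ["provinces.shp", "provinces.dbf", "provinces.prj", "rivers.shp"]
print(missing_parts(files, "provinces"))
```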
