Dealing with Geospatial Raster Data in Python with Rasterio

Michael Mommert
7 min readSep 16, 2020

--

This article is meant to provide a quick introduction into how to use the Python package Rasterio for common tasks related to geospatial raster data. This is mainly a collection of things that took me too long to figure out myself, so I would like to share them with anyone who might be interested.

Some basics

A map of any kind consists of features (representing, e.g., roads, buildings, different land types, surface water, etc.) that are projected in some way to a (typically planar) surface. These features can be represented as vector features (or generally shapes, e.g., lines describing roads, polygons describing building outlines) based on coordinates provided in some coordinate reference system (crs).

Consider a (badly drawn) raster map within a set of bounds, defining its limits to the left, bottom, right, and top. The bound coordinates are defined in terms of the coordinate reference system (crs). The orientation and scale of the raster map relative to the crs is defined by an affine transformation that consists of two parts: a “rotation matrix” and the origin coordinates of the raster map (see below for details).

raster data can be thought of as a picture of a map in the form of an array of pixels. Every pixel is mapped to a geospatial location based on an affine transformation and the underlying crs; the value of each pixel corresponds to some measure, e.g. the brightness of the area covered by the pixel in a specific wavelength range (as provided by some remote sensing instrument). The edges of the raster map in a given crs, are defined as the bounds of the raster map.

Note that the representation of the raster data, its bounds, underlying transformation and crs are closely interrelated. Typically, if you plan on changing one or more of these parameters, this will affect the others, as well.

Rasterio helps you with this process.

Datasets

GeoTiff is a typical image format for raster data. In case you don’t have a GeoTiff file handy, you can download one here; the following examples are based on this file. We can read in a GeoTiff file into a dataset, rasterio’s main data structure, using the following code:

import rasteriodataset = rasterio.open('sample.tif')

data is called a dataset object, which carries some useful information, including:

  • the underlying coordinate reference system:
dataset.crs
CRS.from_epsg(32631)

EPSG:32631 is a local reference system that is used for parts of Europe. http://epsg.io provides information on most available coordinate reference systems. Crs coordinates are typically labeled x and y; in the case of EPSG:4326 (which is the same as WGS84) coordinates are provided in degrees in the common longitude and latitude definition.

  • the bounds of the raster data included in this dataset:
dataset.bounds
BoundingBox(left=590520.0, bottom=5780620.0, right=600530.0, top=5790630.0)

The numbers provided use the units as the dataset’s crs; in this specific case (EPSG:32631), the numbers define the map’s bounding box in units of meters relative to some origin that is defined by the crs. Other crs definitions may use other units, for instance degrees.

  • the affine transformation used in the current raster map representation:
dataset.transform
Affine(10.0, 0.0, 590520.0,
0.0, -10.0, 5790630.0)

This transformation, implemented as an Affine object, defines how a change of 1 pixel in either direction (row or column) translates into crs coordinate changes using 6 parameters that are (in this order and all in the same units as the transformation’s crs): the change in x as a function of the change in pixel column (+10m for +1 pixel), the change in x as a function of the change in pixel row (0 in this case), the x coordinate origin (590520.0m), the change in y as a function of the change in pixel column (0 in this case), the change in y as a function of the change in pixel row (-10m for +1 pixel), and the y coordinate origin (5790630.0m here). You can consider the transformation to consist of two parts: a “rotation matrix” and the dataset’s crs coordinate origin (see the sketch at the top of this article).

  • the number of bands or channels available for this raster map:
dataset.indexes
(1, 2, 3)

(In this case, 3 different bands are available; each band represents a grayscale map for a specific wavelength region. Note that rasterio starts counting bands at 1, unlike Python, for some reason.)

We can read a specific band from the dataset into an array using

img = dataset.read(i)

where i is one of the indices provided by data.indexes. img is really just a numpy array; note that all available band arrays have the same shape (and thus also orientation and scaling) since they are sharing the same crs, transformation, and bounds.

Let’s plot our example image. We can use the rasterio.plot methods adjust_band, which normalizes the array values to a range from zero to one, and show, which creates and displays matplotlib axes. Note that we provide imgdata with the reverse index (or channel) order since dataset holds BGR channels, but we want to plot RGB:

import numpy as np
from rasterio.plot import show, adjust_band
imgdata = np.array([adjust_band(dataset.read(i)) for i in (3,2,1)])
show(imgdata*3) # factor 3 to increase brightness
Plot of our sample.tif GeoTiff file using RGB colors.

Transformations between image coordinates and crs coordinates

Since we are dealing with geospatial data, we would like to be able to convert image coordinates to crs coordinates, and vice versa.

For single pixels, you can go from image coordinates to crs coordinates using

x, y = dataset.xy(row, column)

where x and y refer to the crs coordinates of the center of the pixel located in rowand column of either of the bands in the dataset (note that row and column are not limited to integer values). Another method would be to simpy multiply your image coordinates with the actual transform of the dataset:

x, y = dataset.transform * (row, column)

Keep in mind that both methods will give you slightly different results: the first method provides pixel center coordinates, whereas the second method gives you coordinates relative to the origin coordinates of the transform.

Finally, you can go from crs coordinates to image coordinates using

row, column = dataset.index(x, y)

Coordinate transformations between different coordinate systems

If you have a set of coordinates in some coordinate system (defined by dataset.crs) and would like to transform them to new coordinate system:

from rasterio.warp import transform
from rasterio.crs import CRS
new_crs = CRS.from_epsg(4326) # standard WGS84 coordinatesnew_coo = transform(dataset.crs, new_crs,
xs=[590530], ys=[5790650])
new_coo
([4.326399074221181], [52.25878168834648])

Transformations of vector features between different coordinate systems

Vector features can be defined as shapely objects. Let’s assume we are interested in defining a square region with an edge lengthedgelencentered around some coordinates x and y (based on the respective crs, its units, and the raster bounds). We define our region of interest (roi) as a shapely Polygon object:

from shapely.geometry import Polygonx, y, edgelen = 597667, 5787216, 1000  # based on EPSG:32631roi = Polygon([(x - int(edgelen / 2), y + int(edgelen / 2)),
(x + int(edgelen / 2), y + int(edgelen / 2)),
(x + int(edgelen / 2), y - int(edgelen / 2)),
(x - int(edgelen / 2), y - int(edgelen / 2))])
print(roi)
POLYGON ((590030 5791150, 591030 5791150, 591030 5790150, 590030 5790150, 590030 5791150))

If you print a shape, it is presented to you using the Well-known-text representation of geometry.

Let’s now assume that we defined roi (i.e., x and y coordinates and edgelen) based on crs1 but we would like to transform the Polygon to crs2:

from rasterio.crs import CRS
from rasterio.warp import transform_geom
crs1 = CRS.from_epsg(32631)
crs2 = CRS.from_epsg(4326)
roi2 = transform_geom(crs1, crs2, roi)
roi2
{'type': 'Polygon',
'coordinates': [[(4.319208814194372, 52.26335783407589),
(4.333857524152045, 52.26319323507531),
(4.3335878545153586, 52.254205111266934),
(4.31894210402294, 52.254369657266636),
(4.319208814194372, 52.26335783407589)]]}

Cropping raster data

In order to crop raster data, rasterio.mask.mask masks those areas of the image that you want removed and then removes them for you (if you set the keyword argument crop=True):

from rasterio.mask import maskcrop_img, crop_transform = mask(dataset, shapes=[roi], crop=True)

shape denotes here a “GeoJSON-like dict or an object that implements the Python geo interface protocol” (see documentation) and defines the area you are interested in. We can simply use our roi polygon here again.

Let’s see which part from the image has been cropped (we use crop_img[::-1] to reverse the original channel order in our sample file, BGR, to RGB):

crop_img = adjust_band(crop_img)
show(crop_img[::-1])
Our cropped image shows some colorful tulip fields.

Resampling raster data

Sometimes you have to resample your raster data, for instance, in order to match the resolutions of different imaging channels. Resampling is best performed on raster data that is available in a file, so it actually does make sense to write any data that you might have in your memory to a file first and the apply the following code (which is adopted from the rasterio documentation):

from rasterio.enums import Resamplingresample_factor = 0.5 # >1: upsample, <1: downsample

with rasterio.open('sample.tif') as dataset:

imgdata = dataset.read(out_shape=(dataset.count,
int(dataset.height * resample_factor),
int(dataset.width * resample_factor)),
resampling=Resampling.bilinear)

transform = dataset.transform * dataset.transform.scale(
(dataset.width / imgdata.shape[-1]),
(dataset.height / imgdata.shape[-2]))

This example code will resample all bands in the dataset and derive a new transform. For more information, please refer to the relevant rasterio documentation. This is the resampled image:

While this image looks very similar to the original sample.tif, you can tell from the axis ticks and labels that its extent is only half the extent of the original image since we downsampled it by a factor of two.

Reprojecting raster data

Reprojection means to find a new representation for your raster data based on a different crs (i.e., other coordinate system), a different transform (e.g., to rotate the raster), or different bounds. This may include rotations and distortions, making reprojections rather complex

The following code simply reprojects the given dataset into a new crs (new_crs) with any alterations to the underlying transformation. The code reads in a dataset, derives a new transformation, raster height and width, creates an array for the reprojected data (new_imgdata) and finally performs the reprojection:

from rasterio.warp import calculate_default_transform, reprojectnew_crs = CRS.from_epsg(4326)  # WGS84with rasterio.open('sample.tif') as dataset:
new_transform, width, height = calculate_default_transform(
dataset.crs, new_crs,
dataset.width, dataset.height, *dataset.bounds)
imgdata = np.array([dataset.read(i) for i in dataset.indexes])
new_imgdata = np.zeros(imgdata.shape)

reproject(source=imgdata,
destination=new_imgdata,
src_transform=dataset.transform,
src_crs=dataset.crs,
dst_transform=new_transform,
dst_crs=new_crs,
resampling=Resampling.nearest)

This is our reprojected sample.tif file:

This is sample.tif reprojected to the WGS84 coordinate reference system. The image is compressed along the vertical axis and slightly rotated, all of which is a result of the reprojection onto different crs.

References

--

--