Understanding Satellite Images for Geospatial Deep Learning

A satellite image (unlike a standard RGB image) can easily be more than 1 GB in size, so we should know how to handle it efficiently without losing out on the advantages it comes with. Here we will play around with a satellite image obtained from the SpaceNet challenge dataset.

Fractal AI Research
May 18, 2022

“Once a photograph of the Earth, taken from outside, is available … a new idea as powerful as any in history will be let loose” — Fred Hoyle, 1948

Introduction

Around mid-August 2020, a series of wildfires (350+) was ignited across central and northern California by 700+ cloud-to-ground lightning strikes. Striking night-time visuals of the forest fires were captured by NASA’s Suomi NPP satellite as it passed over those locations. Satellites have been immensely useful in detecting wildfires; beyond that, they also track the spread of smoke and help determine the extent of the damage.

With billions of activities happening every day, the world is getting increasingly interconnected and we are developing at an unprecedented pace. Satellite images are going to play a critical role in tracking these global changes, and the application of satellite data is limited only by our imagination. The data is being used by both private and government agencies at scale. Monitoring disasters, tracking agricultural developments, analysing shipping activity at ports and observing the growth of urbanisation are just a few of the many applications.

Satellite imagery has a lot of advantages over standard RGB images. It is not limited to the visible spectrum: other EM bands such as infra-red are used to create bands beyond RGB, and with those, imaging can be done even at night or during bad weather. These images are most commonly stored in the GeoTIFF file format, which allows geospatial information to be embedded in the file along with the image data. In this blog we will learn how to work with this file format using an image from the dataset provided for this problem statement link. The goal is to detect building footprints in the provided satellite data. While exploring one of the satellite images, we will aim to answer the following questions:

  • How to explore the different geospatial information embedded inside the TIF file?
  • How to convert the building boundaries from one CRS to another?
  • How to generate a mask from the given image using the building footprints provided in the geojson file?
  • How to prepare satellite images stored in a non-conventional format (TIF) for consumption by AI models — creating tiles out of the image?
  • How to obtain a lower-resolution version of the original image?

Don’t worry if you don’t know any of the terms yet. We will discuss them as we proceed through the blog.

Overview of concepts

Input :

  • A TIF file for the image and a geojson file containing the coordinates of the building footprints.
  • The TIF contains the image data and also metadata for the image, which is stored in the profile attribute.
  • The couple of sample files used in this blog can be found here link

Geojson:

  • A JSON file with three primary keys — type, crs and features
  • The features key contains a list of features, where each feature is a dict containing information on a building footprint, such as the coordinates of the footprint
  • The crs key specifies the CRS used while storing the coordinates of the footprints
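To make the structure concrete, here is a minimal, hand-written geojson with the same shape as the label files (the single feature, its coordinates and its properties below are made up purely for illustration and are not from the dataset):

{
  "type": "FeatureCollection",
  "crs": {"type": "name", "properties": {"name": "urn:ogc:def:crs:OGC:1.3:CRS84"}},
  "features": [
    {"type": "Feature",
     "properties": {"building": "yes"},
     "geometry": {"type": "Polygon",
                  "coordinates": [[[39.300, -5.731], [39.301, -5.731],
                                   [39.301, -5.730], [39.300, -5.731]]]}}
  ]
}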

Libraries used :

Since TIF images are large files and contain important metadata, we will be using libraries designed specifically to handle and manipulate this kind of geospatial data.

  • Rasterio: Read/Write TIFs, accessing profiles and other metadata. Creating tiles and storing them as TIFs
  • Aeronet: Efficiently reading, parsing and storing information in geojson. Reprojecting the annotations from one CRS to another

Geographic vs. Projected CRS

  • Geographic coordinate systems: coordinate systems that span the entire globe (e.g. latitude / longitude).
  • Projected coordinate systems: coordinate systems that are localised to minimise visual distortion in a particular region (e.g. Robinson, UTM, State Plane)

Why multiple CRS?

A CRS is optimised to best represent:

  • shape, and/or
  • scale/distance, and/or
  • area

No single CRS is great at all three; each is optimised for either shape, distance or area. Some CRSs are also optimised for particular regions — for instance the United States, or Europe.

There are numerous formats used to document a CRS, such as Proj4, WKT (Well Known Text) and EPSG codes. EPSG is widely used, and we will learn about it next.
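As a quick illustration, the same CRS can be inspected in more than one of these formats with rasterio (assuming rasterio is installed; the exact strings returned depend on your PROJ/GDAL version):

from rasterio.crs import CRS

crs = CRS.from_epsg(4326)
crs.to_proj4()  # roughly '+proj=longlat +datum=WGS84 +no_defs'
crs.to_wkt()    # the (much longer) Well Known Text representation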

A brief on EPSGs

  • EPSG codes refer to different CRSs described in a format provided by EPSG
  • Each CRS defines the origin it uses, the shape it chooses to describe the earth, and the unit system it uses to refer to a coordinate in that reference system
  • For example, WGS84 (or WGS 1984) uses the intersection of the equator and the prime meridian as the origin and uses degrees (lat/longs) as the unit to describe coordinates
  • EPSG:4326 is an unprojected geographic CRS, while EPSG:32737, used below, is a projected CRS.
  • As the name suggests, an unprojected CRS treats the earth as 3D and doesn’t project it onto a 2D surface, while a projected CRS uses a 3D-to-2D mapping. There are a number of ways in which this 3D-to-2D conversion can be done, and none of them is perfect, so there is always some distortion (or trade-off) in area, shape and scale link
  • Here is a link to convert a coordinate from one CRS to another link
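You can also do this conversion programmatically with rasterio. A minimal sketch (the sample point below is illustrative — it is roughly the centre of the image we use later; note that rasterio expects x = longitude, y = latitude):

from rasterio.crs import CRS
from rasterio.warp import transform

# convert a single (lon, lat) point from geographic EPSG:4326 to projected EPSG:32737
xs, ys = transform(CRS.from_epsg(4326), CRS.from_epsg(32737), [39.3006], [-5.7310])
print(xs[0], ys[0])  # easting/northing in metres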

Rasterio — Processing the satellite image

import os
import json
import random
import rasterio
import numpy as np
import aeronet.dataset as ds
import matplotlib.pyplot as plt
from PIL import Image
from glob import glob
from tqdm import tqdm
from fastcore.all import *
from typing import Union
from rasterio.features import geometry_mask
from rasterio.windows import Window
from rasterio.warp import calculate_default_transform, reproject, Resampling, transform_geom

We are using a sample image from the SpaceNet dataset.

src_path = "buildseg/train_tier_1/train_tier_1/znz/33cae6/33cae6.tif"

Exploring the geospatial information present in the TIF file

> data = rasterio.open(src_path)
> type(data)
[Out] rasterio.io.DatasetReader
  • Rasterio provides a very useful dataset reader. We can easily access the various pieces of geospatial information present in the data. Let’s go through the important ones.

The profile contains information such as:

  • Affine transform: helps in converting from CRS coordinates to pixels and from pixels to CRS coordinates
  • CRS: coordinate reference system
  • Width and height in pixels
  • blockxsize, blockysize and tiled: used to define internal tiling — something we will be implementing
  • nodata: the value used to mark missing pixels in the data
  • count: number of bands

While creating a TIF image, it is important to provide the profile along with the image data. This point will be revisited while creating tiles.

> data.profile
[Out] {'driver': 'GTiff', 'dtype': 'uint8', 'nodata': None, 'width': 37113, 'height': 34306, 'count': 4, 'crs': CRS.from_epsg(32737), 'transform': Affine(0.0774800032377243, 0.0, 531847.0,
0.0, -0.0774800032377243, 9367848.0), 'blockxsize': 512, 'blockysize': 512, 'tiled': True, 'compress': 'jpeg', 'interleave': 'pixel'}
  • Bounds gives us the lower-left-most and upper-right-most coordinates of the image in the given CRS. This image uses EPSG:32737, whose unit is metres
> data.bounds
[Out] BoundingBox(left=531847.0, bottom=9365189.971008927, right=534722.5153601617, top=9367848.0)
  • We can also get the longitude and latitude of the centre of the image
> data.lnglat()
[Out] (39.30061294996244, -5.731031019254337)
  • We can also get the resolution along both the horizontal and vertical axes in metres/pixel
> data.res
[Out] (0.0774800032377243, 0.0774800032377243)
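As a quick sanity check, the width in pixels times the resolution should match the horizontal extent of the bounding box (all values come from the outputs above):

> data.width * data.res[0]                 # 37113 * 0.07748 ≈ 2875.5 m
> data.bounds.right - data.bounds.left     # 534722.5 - 531847.0 ≈ 2875.5 m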

Reducing the resolution of the image

  • The original image has a very high resolution; decreasing the resolution will make it faster to process. Extremely high resolution is not necessary for analysis and only slows down the processing steps.

Now we will create an equivalent TIF with a resolution of 0.1 m/pixel.

> dst_res = 0.1
  • Out of the -co (creation option) parameters below, COMPRESS and TILED are the important ones

> dst_path = "random2.tif"

> command = f"gdalwarp -co COMPRESS=JPEG -co TILED=YES  -co NUM_THREADS=ALL_CPUS -r bilinear " + \
f"-tr {dst_res} -{dst_res} {src_path} {dst_path}"
> os.system(command)
  • The resolution has decreased, so each pixel now covers more area; hence we notice a reduction in height and width
> img = rasterio.open(dst_path)
> img.profile
[Out] {'driver': 'GTiff', 'dtype': 'uint8', 'nodata': None, 'width': 28755, 'height': 26580, 'count': 4, 'crs': CRS.from_epsg(32737), 'transform': Affine(0.1, 0.0, 531847.0,
0.0, -0.1, 9367848.0), 'blockxsize': 256, 'blockysize': 256, 'tiled': True, 'compress': 'jpeg', 'interleave': 'pixel'}
> img.res
[Out] (0.1, 0.1)
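If you prefer to stay inside Python, roughly the same downsampling can be sketched with rasterio’s warp utilities (imported above). This is only a sketch, assuming band-by-band bilinear resampling into a new file is acceptable; gdalwarp is usually faster on very large files, and the output filename here is just an example:

dst_res = 0.1
dst_transform, width, height = calculate_default_transform(
    data.crs, data.crs, data.width, data.height, *data.bounds, resolution=dst_res)

profile = data.profile.copy()
profile.update(transform=dst_transform, width=width, height=height)

with rasterio.open("random2_py.tif", "w", **profile) as dst:
    for band in range(1, data.count + 1):
        reproject(
            source=rasterio.band(data, band),
            destination=rasterio.band(dst, band),
            src_transform=data.transform,
            src_crs=data.crs,
            dst_transform=dst_transform,
            dst_crs=data.crs,
            resampling=Resampling.bilinear)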

Visualising the TIF image

  • We have been provided with four bands. We select the RGB bands and visualise them. The image looks magnificent. We can easily slice the image and look closely, but we will do that later by dividing the image into tiles.
plt.figure(figsize=(22, 22))
plt.imshow(img.read([1,2,3]).transpose(1, 2, 0))
Visualising the RGB channels of the satellite image by plotting the entire TIF with rasterio and matplotlib.

Exploring geojson

  • We will explore the geojson corresponding to the TIF file
  • We will fix the mismatch in EPSG codes between the image and the geojson
  • Using the data present in the geojson, we will create the mask for the new reduced-resolution image
base_dir = os.path.dirname(src_path)
di, name = os.path.split(base_dir)
geojson_dir_name = name + "-labels"
geojson_file_name = name + ".geojson"
geo_path = os.path.join(di, geojson_dir_name, geojson_file_name)

> geo_path
[Out] '/nfs/gpu14_datasets/sat/buildseg/train_tier_1/train_tier_1/znz/33cae6-labels/33cae6.geojson'

Revisiting Geojsons

three primary keys — type, crs and features

type: FeatureCollection

features: a list of features. Features are nothing but geometries (the building footprint polygons) with labels.

FeatureCollection is a class in the aeronet.dataset module

  • It is specially made to read and process geojsons
  • fc is a collection of all the features (building footprints) present in the geojson
  • It also provides other attributes, which we will explore below
  • It is a geojson loader that can handle CRS transformation
  • Remember we discussed that there are multiple CRSs? We need to ensure that the image coordinates and the coordinates in the geojson are represented in the same CRS. If not, we won’t be able to map them together and create the mask
  • Unfortunately, the default loader FeatureCollection.read has the CRS hardcoded in it, so it does not read geojsons accurately
  • To ease the conversion of a geojson from one CRS to another, we have made some changes to the original loader and patched in a new one, FeatureCollection.read_generic
  • patch_to allows us to patch a function onto a given class as shown below

A full explanation of the patching code is beyond the scope of this blog; we shall take it up in the next one.

> fc = ds.FeatureCollection.read_generic(geo_path)
[Out] urn:ogc:def:crs:OGC:1.3:CRS84

Let’s have a look at the default CRS of the geojson against the CRS used in the image.

> fc.crs, img.crs
[Out] ('urn:ogc:def:crs:OGC:1.3:CRS84', CRS.from_epsg(32737))

Fixing the EPSG mismatch and converting coordinates from one CRS to another

  • The problem with having different CRSs is that we end up with different frameworks for representing the same point on the earth
  • For example, the basic unit of EPSG:32737 is metres, while for EPSG:4326 it is degrees; the two also have different reference origins
  • Since metres are easily interpretable, we will convert the coordinates in the geojson to EPSG:32737
  • fc.reproject provides a way to map the geojson from one CRS to another. It’s a gift from the gods.
  • The original geojson is not overwritten, but each feature stored inside the collection is processed; the change can be observed when we print the geometry of each feature as shown below
> fc = fc.reproject(img.crs)
> fc.crs
[Out] CRS.from_epsg(32737)
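Under the hood, this kind of reprojection is essentially what rasterio.warp.transform_geom (imported at the top) does for a single geojson-style geometry. A rough, aeronet-independent sketch — the tiny polygon here is purely illustrative:

# reproject one footprint geometry from the geojson's CRS (CRS84 / lon-lat) to the image CRS
reprojected = transform_geom("EPSG:4326", img.crs, {
    "type": "Polygon",
    "coordinates": [[[39.3006, -5.7310], [39.3010, -5.7310],
                     [39.3010, -5.7306], [39.3006, -5.7310]]],
})
# `reprojected` is the same polygon with its coordinates expressed in metres (EPSG:32737)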

A simpler way is to load the geojson using the new loader, passing the destination CRS directly:

> fc = ds.FeatureCollection.read_generic(geo_path, dst_crs=img.crs)
[Out] urn:ogc:def:crs:OGC:1.3:CRS84

And you get this

> fc.crs
[Out] CRS.from_epsg(32737)
  • Each feature represents a single building footprint
  • fc.features extracts the “properties” key from all the masks (building footprints)
> fc.features[0]

Let’s explore the first feature (building footprint).

The total number of building footprints in the image is:

> len(fc)
[Out] 4439
  • Geometry contains all the coordinates of the polygon of the building footprint
> fc[0].geometry
[Out] {'type': 'Polygon',
'coordinates': (((533700.1052929291, 9367442.727510184),
(533702.2944919249, 9367446.207351241),
(533704.6579756252, 9367445.290231245),
(533707.6459179376, 9367450.135843525),
(533710.2917512723, 9367450.400426859),
(533713.5990429427, 9367449.143656025),
(533715.2526887775, 9367446.828551855),
(533755.0559440283, 9367429.762926837),
(533754.857506527, 9367428.836885171),
(533770.1041211254, 9367420.00641641),
(533766.6645377881, 9367418.352770574),
(533761.3728711164, 9367421.593916412),
(533753.369225275, 9367421.263187245),
(533713.4667512759, 9367440.908499764),
(533708.654641896, 9367433.814359132),
(533702.30464189, 9367437.386234136),
(533703.3629752239, 9367441.222692475),
(533700.1052929291, 9367442.727510184)),)}
  • Area covered by this building footprint
> fc[0].shape.area
[Out] 531.2663747649708
> fc[0].bounds
[Out] (533700.1052929291, 9367418.352770574, 533770.1041211254, 9367450.400426859)
  • Metadata provided
> fc[0].properties
[Out] {'pk_0': 1,
'pk': None,
'pkuid': None,
'building': 'yes',
'unfinished': None,
'problemati': None,
'changeset': None,
'area': None,
'condition': 'Complete'}

Creating mask from geometries present in the geojson

geometry_mask creates the mask for the entire image using the list of geometries present in the geojson file. We need to provide the transform of the image for which we are creating the mask, as it internally combines all the geometries and maps them onto the pixel grid.

> geometries = [f.geometry for f in fc]
  • By setting the shape of the mask and passing the transform of the image, we get the mask as a numpy array
> profile = img.profile.copy()
> profile
[Out] {'driver': 'GTiff', 'dtype': 'uint8', 'nodata': None, 'width': 28755, 'height': 26580, 'count': 4, 'crs': CRS.from_epsg(32737), 'transform': Affine(0.1, 0.0, 531847.0,
0.0, -0.1, 9367848.0), 'blockxsize': 256, 'blockysize': 256, 'tiled': True, 'compress': 'jpeg', 'interleave': 'pixel'}
mask = geometry_mask(geometries, out_shape=(profile["height"], profile["width"]), transform=profile["transform"], invert=True).astype('uint8')

Storing the mask as a tif file & visualising it

> mask_path = "mask.tif"
> type(mask)
[Out] numpy.ndarray
  • We will store the mask data as a TIF so that we can also encode the geospatial information along with it. Hence we need to create a profile for it.
  • The profile follows the same structure as the one present in raster data, shown below.
  • An important key is “transform”: your profile needs to have the transform key. Without it you can’t/shouldn’t save the array as raster data.
> img.profile
[Out] {'driver': 'GTiff', 'dtype': 'uint8', 'nodata': None, 'width': 28755, 'height': 26580, 'count': 4, 'crs': CRS.from_epsg(32737), 'transform': Affine(0.1, 0.0, 531847.0,
0.0, -0.1, 9367848.0), 'blockxsize': 256, 'blockysize': 256, 'tiled': True, 'compress': 'jpeg', 'interleave': 'pixel'}
  • Creating the profile for the mask
  • We will revisit the transform part later while creating tiles
> profile = dict(
    driver="GTiff",
    height=mask.shape[0],
    width=mask.shape[1],
    count=1,
    crs=profile["crs"],
    transform=profile["transform"],
    dtype=mask.dtype,
    NBITS=1,
)
> with rasterio.open(mask_path, "w", **profile) as dst:
>     dst.write(mask, 1)

Let’s visualise the mask.

> plt.figure(figsize=(22, 22))
> plt.imshow(rasterio.open(mask_path).read()[0])
Mask obtained using the geojson file of the TIF image, after projecting the coordinates into the TIF’s CRS.

Creating tiles for images and masks

  • To all the deep learning experts out there: never in your right mind would you feed a 30K x 30K image to a model
  • So we will break the image into tiles and also carefully generate masks for those tiles
  • We will be storing the tiles as TIFs, hence we need to set the profile for each of the tiles
  • For extracting the pixel values of each tile we will be using rasterio’s Window. Not clear? Stay put.
  • Directories to store the tiled images and masks
dst_image_dir = "images"
dst_mask_dir = "masks"
os.makedirs(dst_image_dir, exist_ok=True)
os.makedirs(dst_mask_dir, exist_ok=True)

Tiling the image

Converting pixel coords to CRS coords

Why do we need to understand this conversion for generating tiles?

  • Remember that you need the profile dict for storing a numpy array as a raster? That’s why we need it here
  • We can easily slice the numpy array of the original image using a sliding-window strategy, but we also want to store the tiles as rasters
  • Hence we need to create a profile dict for every tile, which means we also need to create its transform matrix
  • Let’s understand this transform matrix by understanding the pixel-to-CRS conversion
  • We will be using the transform matrix of the parent image to create the transform matrix for each of the tiles
  • The elements of the matrix are addressed using the first six letters of the alphabet in row-wise order (first row: a, b, c)
  • The c and f elements of the matrix denote the top-left-most coordinates in the CRS mentioned in the profile
  • While creating the transform matrix for each tile, we need to compute the tile’s top-left coordinates in the given CRS using the resolution provided
  • For each tile, if we temporarily place the origin at the top-left-most point of the parent image, then the distance of the tile’s top-left corner from that origin along each axis is
  • x axis: x * transform.a
  • y axis: y * transform.e
  • where x and y are the number of pixels traversed along the x and y axes respectively
  • The a and e elements are both the resolution (m/pixel), but with opposite signs
  • Traversing left to right is considered a positive translation; traversing north to south is considered a negative translation
  • Once we have the distance traversed in metres, we add back the CRS coordinates of the parent image’s top-left-most point (transform.c, transform.f) to get the tile’s coordinates
  • For calculating the leftmost coordinate we use: coord_x = transform.c + x * transform.a
  • For calculating the topmost coordinate we use: coord_y = transform.f + y * transform.e
  • These top-left coordinates (coord_x, coord_y) form the c and f elements of the tile’s transform matrix (see the worked example below)
> img.profile
[Out] {'driver': 'GTiff', 'dtype': 'uint8', 'nodata': None, 'width': 28755, 'height': 26580, 'count': 4, 'crs': CRS.from_epsg(32737), 'transform': Affine(0.1, 0.0, 531847.0,
0.0, -0.1, 9367848.0), 'blockxsize': 256, 'blockysize': 256, 'tiled': True, 'compress': 'jpeg', 'interleave': 'pixel'}

> img.transform
[Out] Affine(0.1, 0.0, 531847.0,
0.0, -0.1, 9367848.0)

> img.bounds
[Out] BoundingBox(left=531847.0, bottom=9365190.0, right=534722.5, top=9367848.0)
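Plugging in the numbers from img.transform above: for a tile whose top-left corner sits at pixel (x=1024, y=2048) of the parent image, its c and f elements would be

coord_x = transform.c + x * transform.a = 531847.0 + 1024 * 0.1 = 531949.4
coord_y = transform.f + y * transform.e = 9367848.0 + 2048 * (-0.1) = 9367643.2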

Size of the tile = 1024

size = 1024
  • the following will help in naming the tiles
i = 0
city = src_path.split("/")[-3]
code = src_path.split("/")[-2]
print(city, code)
znz 33cae6

Using a sliding-window (with stride) strategy and storing the tiles

  • Here size = stride, but if we want to alter this we can simply replace size with the required stride in the step argument of range in both for loops (see the sketch below)
  • The boundless argument of the raster’s .read method helps handle the corner cases where a window partially extends beyond the image
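The generate_tiles helper used below is not reproduced in full in this post. Here is a minimal sketch of what it could look like; the per-tile naming convention (city_code_index.tif, using the city and code variables defined above) and the internals are our assumptions, and the actual implementation may differ:

from rasterio.windows import transform as window_transform

def generate_tiles(src, size, dst_dir):
    profile = src.profile.copy()
    i = 0
    for y in range(0, src.height, size):        # stride = size; change the step for a different stride
        for x in range(0, src.width, size):
            window = Window(x, y, size, size)
            # boundless=True zero-pads windows that extend beyond the image edge
            tile = src.read(window=window, boundless=True)
            # per-tile transform: shifts the c and f elements to the tile's top-left corner
            profile.update(width=size, height=size,
                           transform=window_transform(window, src.transform))
            name_index = "{}_{}_{}.tif".format(city, code, str(i).zfill(5))
            with rasterio.open(os.path.join(dst_dir, name_index), "w", **profile) as dst:
                dst.write(tile)
            i += 1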
Similarly, we will split the mask TIF:

> mask_src = rasterio.open(mask_path)
> generate_tiles(mask_src, size, dst_mask_dir) # generating tiles for mask
> generate_tiles(img, size, dst_image_dir) # generating tiles for image.

Visualising random tiles and masks

for index in random.sample(range(754), 5):
    name_index = "{}_{}_{}.tif".format(city, name, str(index).zfill(5))
    img_tile = rasterio.open(os.path.join(dst_image_dir, name_index))
    mask_tile = rasterio.open(os.path.join(dst_mask_dir, name_index))
    plt.figure(figsize=(12, 12))
    plt.subplot(1, 2, 1)
    plt.imshow(img_tile.read().transpose(1, 2, 0))
    plt.subplot(1, 2, 2)
    plt.imshow(mask_tile.read()[0])
Tiled images and corresponding masks. Each tile is 1024 x 1024 and can easily be used to train a deep learning model.

Prestige with deep learning

  • Once we have the tiled images and masks, we can train a Mask R-CNN and build a building segmentation model
  • This detector will take tiles as input and detect building footprints
  • We can run inference on the tiles of a test TIF file and restitch them to visualise the entire mosaic with detections

Written by Kunal Singh and Prakash Jay
