Creating Air Quality Maps Using Kriging

4 min readJul 26, 2022

A lot of everyday maps in air quality (and many other fields) rely on spatial interpolation of concentration levels in the domain of interest using a limited number of observations. A couple of spatial models can be used for this purpose. One of the most common ones is Kriging. For instance, Air Quality Index (AQI) maps are currently calculated by interpolating ground-based observations to a grid using kriging at national, regional, and local spatial scales.

An Example of National Map of AQI from EPA AirNow (July 15th, 2022)

In this article, I show how to use kriging to interpolate air quality data and create air pollution maps on a custom grid. I will use PM2.5 data in Texas for this purpose. Let’s get started!

Tools

R, gstat

Air Quality Data

Daily PM2.5 data can be downloaded from EPA Download Daily Data tool. A sample date was chosen for kriging calculation. The main columns needed for kriging are the daily PM2.5 levels (Daily.Mean.PM2.5.Concentration) in µg/m3, latitude (SITE_LATTITUDE), and longitude (SITE_LONGITUDE) for each site.

It is not uncommon to have multiple instruments measuring PM2.5 at an air quality station. This will cause multiple observations at one location. These duplicate measurements were dropped before any further analysis. Then PM2.5 table was converted to an sf object with a geometry column. The projection was also changed to Universal Transverse Mercator or UTM (EPSG:6340) to have the units in meters. The final step in preparing the PM2.5 table was creating x and y coordinates as separate columns for kriging.

##dropping duplicated observations at the same location
pm25 <- pm25[!duplicated(pm25["Site.Name"]),] ##convert lat/lon to UTM projection
pm25 <- st_as_sf(pm25, coords = c("SITE_LONGITUDE","SITE_LATITUDE"), 
                 crs = 4326)%>% st_transform(crs = 6340) ##create x and y columns for kriging
pm25 <- pm25 %>% mutate(x=st_coordinates(pm25)[,c(1)],
                   y=st_coordinates(pm25)[,c(2)])%>% as.data.frame()

Creating a Grid

The first step of interpolation is to create a grid covering the area of interest. To do this, I first downloaded the shapefile showing the boundary of Texas from this link; then created a regular 20x20 km grid over this domain which was enough for a smooth-looking map. The projection was changed like the previous section. Also, x and y columns were created for the grid centroids which will be used to estimate values for each grid.

#Creating the Grid over Texas 
Texas_boundary<-st_read("./State.shp") %>% st_transform(crs = 6340)grid <- 
  st_make_grid(st_as_sfc(st_bbox(Texas_boundary)),
               cellsize = 20000,what = "polygons",
               square = TRUE,crs = 6340) %>% st_sf()Texas_grid <- st_filter(grid, Texas_boundary, join = st_intersects)##create x and y for kriging
Texas_grid <- Texas_grid %>% mutate(
x=st_coordinates(st_centroid (Texas_grid))[,c(1)], y = st_coordinates(st_centroid(Texas_grid))[,c(2)]
)

Kriging

Kriging in R can be done using the gstat package. It starts with the calculation of sample variograms and fitting a model to them. The variogram describes the degree of spatial dependence of concentration values.

Kriging works the best on normally distributed data which can be usually achieved by taking the logarithm of the values. Variograms were calculated by the following code and a curve function was applied to it. First, a dataframe with x and y of the grid centroids was created and converted to a proper format for gstat (SpatialPointsDataFrame) alongside the PM2.5 table.

#Kriging ##create grid centroid dataframe for kriging estimates
centroids <- Texas_grid %>% select(x,y) %>% as.data.frame()##format data for gstat kriging
coordinates(pm25) <- ~ x+y
coordinates(centroids) <- ~ x+y## calculates sample variogram values 
pm.vgm <- variogram(log(Daily.Mean.PM2.5.Concentration)~1, pm25) ## fit exponential model to variogram
pm.fit <- fit.variogram(pm.vgm, model=vgm("Exp"))
plot(pm.vgm, pm.fit)

Calculated Variograms (Semivariance) and the Fitted Exponential Model

Here I chose an exponential model to fit. You can change the fitting model or manually initialize some of the variogram parameters to fit as you like (look at gstat manual).

This variogram model was used to estimate concentration over the grid centroids using ordinary kriging.

## estimate kriged values on the grid
pm.kriged <- krige(log(Daily.Mean.PM2.5.Concentration) ~ 1, pm25, centroids, model=pm.fit)%>% as.data.frame

The final step was to create a concentration map over Texas. I used ggmap for this purpose.

basemap <- get_map(location = "Brady, TX",zoom=6,maptype = "hybrid")ggmap(basemap) +
  geom_sf(data = Texas_grid, aes(fill = exp(pm.kriged$var1.pred)), col = NA, inherit.aes = FALSE, alpha=.7) +
  geom_sf(data = Texas_boundary, inherit.aes = FALSE,alpha=0)+
  coord_sf(crs = 4326)

Interpolated PM2.5 Concentration (µg/m3) Map for April 20th, 2021

Concluding Remarks

Kriging is a powerful interpolation tool that can be applied to various data configurations. In addition to the steps mentioned here, it is generally a good idea to assess the accuracy of your interpolation with a cross-validation approach. Kriging might become more complicated and not accurate if the data is not normal or has a trend within. While many air quality maps use kriging, some newer approaches blend the interpolation methods with physical models to achieve higher accuracy (see this paper for example).

You can find a detailed description of kriging in R in the gstat user manual or this tutorial. You can also check out my recent paper which utilized kriging to produce air quality maps. The complete code used in this post can be accessed through my github.