Exploring Birmingham’s urban heat island with Python and Facebook Prophet

Published in

CodeX

8 min read · Aug 3, 2021

What is an Urban Heat Island?

It’s probably not surprising that cities are hotter than the rural areas surrounding them. What might surprise you is that this difference occurs mostly later in the day and at night, and so might the reason for it. It’s not just because you’re less likely to find a shady spot in a city. In fact, rural areas are cooler than nearby cities even when there aren’t many trees in either area. The real reason that cities are hotter is that buildings, pavement, and other human-modified parts of the urban landscape absorb solar radiation all day, heat up, and then release that heat into the surrounding air later in the afternoon and at night. In contrast, rural areas with less pavement tend to have daytime temperatures similar to those of more developed urban areas, but are much cooler later in the day and especially at night because the ground and vegetation store and re-radiate far less heat. This phenomenon of cities being hotter than their surrounding areas is known as the Urban Heat Island (or UHI) effect. UHIs typically form in cities with more than about 200,000 people, and can be quite intense in some places. For example, the current US record holder is New Orleans, where the city can be nearly 9 degrees Fahrenheit warmer than the surrounding rural areas. There are lots of resources to explore UHIs, including a very informative page by the folks at Climate Central.

Google Earth image of Birmingham, Alabama, showing the three locations where we placed temperature loggers. Loggers were placed in forested plots at each location, so the sites were as similar as possible except for their position relative to the city.

As part of a study looking at the effects of UHIs on local insects, we placed temperature loggers at three locations along a gradient from the urban landscape out to a more rural site. These loggers recorded air temperature and soil temperature at all three locations on an hourly basis from 2013 to 2015. Given that the population of Birmingham itself hovers just a bit over 200,000, yet the metro area has over 1.4 million, we wondered if we could detect a UHI in the city. The full results will be reported elsewhere, but here I want to show a time series analysis of some of these data using some tools in Python, including the fairly recent forecasting tools made available by Facebook.

Looking at the temperature data

Air and soil temperatures for three sites in Birmingham. Each location had 2 air loggers and 2 soil loggers. There were some missing observations in late March, which led to the straight lines.

The longest period that the temperature loggers were in operation ran from December 2014 through September 2015, so that’s the data we’ll be working with here. We missed some temperatures in late March due to technical issues, but we still had over 6,000 hourly temperature observations from each logger. You can see from the figure above that there are a lot of daily fluctuations in temperature, but that the soil fluctuations are smaller (which makes sense). With all this noise in the data, patterns and differences are hard to see.

There’s still a lot of variation when just looking at three loggers, but some differences can be seen, especially in the winter and early spring, where the urban location’s soil temperatures seem a little higher.

Even when we look at just one soil probe from each location we still see a lot of variation. One way to smooth over some of this variation is by taking a rolling average of temperatures at these sites.

Soil temperatures at three locations on a 24 hour rolling average. This smooths out some of the hourly fluctuations.

Once we smooth over some of the short-term variation, it’s much easier to see that the rolling average temperature is highest at the urban location and lowest at the rural location. The effect also seems to be greatest in the winter and spring. From April through June there don’t seem to be many differences, but in mid- to late summer the urban location seems to have higher highs than the other locations.
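A 24-hour rolling average like the one in the figure takes only a few lines of pandas. The sketch below runs on synthetic stand-in data, since the logger data are not reproduced here, and the names `soil_temp` and `rolling_24h` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic hourly soil temperatures standing in for one logger (not the study data).
rng = np.random.default_rng(0)
hours = pd.date_range("2014-12-01", periods=24 * 14, freq=pd.Timedelta(hours=1))
daily_cycle = 2.0 * np.sin(2 * np.pi * (hours.hour.to_numpy() - 9) / 24)
noise = rng.normal(0.0, 0.3, size=len(hours))
soil_temp = pd.Series(10.0 + daily_cycle + noise, index=hours, name="urban_soil")

# 24-hour rolling mean: each value averages the current hour and the 23 before it.
# This assumes one row per hour; with gaps, a time-based window is safer than a row count.
rolling_24h = soil_temp.rolling(window=24).mean()

# The smoothed series varies far less than the raw one.
print(soil_temp.std(), rolling_24h.std())
```

Because the window spans exactly one day, the daily cycle averages out and only the slower changes remain, which is why differences among sites become easier to see.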

Time series modeling of broader temperature trends

If we set aside the comparisons among the three locations for now, it’s clear from the figures above that temperatures in Birmingham exhibit a few broad and unsurprising trends. Of course, it’s cooler in the winter and warmer in the summer. Temperatures in the fall, while still hot, start to drop after the August peak. Clearly, these temperature data are not stationary, which means that their statistical properties, such as the mean, are not stable over time. But there are other issues with temperature data too.
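One informal way to see non-stationarity is to compare the mean across months, and then to check how one standard remedy, first differencing, behaves. This is a minimal sketch on synthetic data with an assumed linear warming trend. It is not the study data, and it is not a formal stationarity test.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in: 90 days of hourly data with a warming trend plus a daily cycle.
rng = np.random.default_rng(1)
hours = pd.date_range("2015-01-01", periods=24 * 90, freq=pd.Timedelta(hours=1))
trend = np.linspace(8.0, 24.0, len(hours))
daily_cycle = 2.0 * np.sin(2 * np.pi * (hours.hour.to_numpy() - 9) / 24)
temps = pd.Series(trend + daily_cycle + rng.normal(0.0, 0.3, len(hours)), index=hours)

# A stationary series would have roughly the same mean in every month.
monthly_means = temps.groupby(temps.index.month).mean()

# First differencing (the hour-to-hour change) removes the slow drift in the mean.
hourly_change = temps.diff().dropna()
monthly_means_diff = hourly_change.groupby(hourly_change.index.month).mean()

print(monthly_means.round(2))
print(monthly_means_diff.round(3))
```

The raw monthly means climb steadily, while the monthly means of the differenced series all sit close to zero.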

Urban soil temperatures show strong autocorrelation that decays slowly over time.

The first issue is that temperature data are highly autocorrelated, which means that the temperature at one hour is very likely to be similar to the hour before it. These autocorrelations occur across a wide time span, from hours to weeks. This makes a lot of sense, because while there may be some hourly fluctuations, current temperatures tend to be pretty predictable and are very similar to (and dependent on) previous temperatures.
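Autocorrelation at a given lag is just the correlation between the series and a copy of itself shifted by that many hours. Here is a small sketch using pandas’ `Series.autocorr` on synthetic data with an assumed trend and daily cycle, so the numbers are illustrative only.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in: 60 days of hourly data with a slow trend and a daily cycle.
rng = np.random.default_rng(2)
hours = pd.date_range("2015-01-01", periods=24 * 60, freq=pd.Timedelta(hours=1))
trend = np.linspace(8.0, 20.0, len(hours))
daily_cycle = 2.0 * np.sin(2 * np.pi * (hours.hour.to_numpy() - 9) / 24)
temps = pd.Series(trend + daily_cycle + rng.normal(0.0, 0.3, len(hours)), index=hours)

# Correlation between the series and itself shifted by `lag` hours.
lags_in_hours = [1, 12, 24, 24 * 7]
autocorrelations = {lag: temps.autocorr(lag=lag) for lag in lags_in_hours}

for lag, value in autocorrelations.items():
    print(f"lag {lag:>3} h: {value:.3f}")
```

In this toy series the correlation dips at 12 hours, when the daily cycle is in its opposite phase, and recovers at 24 hours. It stays high even a week out because of the slow trend.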

Partial autocorrelation for urban soil temperatures

Another issue with these temperature data is that there is strong partial autocorrelation. The partial autocorrelation function (PACF) controls for autocorrelation at shorter lags, so any large partial autocorrelations that remain point to additional structure in the data. To read this plot, ignore the first spike, since that is just the temperature correlated with itself at lag 0. The positive correlation at lag 1 (the second spike from the left) shows that one hour’s temperature is likely to be very similar to the previous hour’s. The large negative values at lag 2 and beyond suggest that even when we control for the 1-hour lag, there’s still some strong seasonality that would have to be accounted for in subsequent models. If we were to dive deeper into modeling these temperature data, we’d have to apply some differencing and lags to make each series stationary before further modeling and comparisons. For now, I’ll leave that for a future project.
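To make “controlling for shorter lags” concrete: the partial autocorrelation at lag k can be estimated as the coefficient on the value k steps back when the series is regressed on all of its previous k values. The sketch below does this with plain NumPy on a simulated AR(1) series, chosen because the right answer is known in advance. The function name `pacf_ols` is made up here. In practice, `pacf` and `plot_pacf` from statsmodels are the usual tools.

```python
import numpy as np

def pacf_ols(values, max_lag):
    """Estimate partial autocorrelations at lags 1..max_lag by least squares.

    For each k, regress x[t] on x[t-1], ..., x[t-k] and keep the coefficient
    on x[t-k], i.e. the effect of lag k after accounting for shorter lags.
    """
    x = np.asarray(values, dtype=float)
    x = x - x.mean()
    estimates = []
    for k in range(1, max_lag + 1):
        # Columns are x[t-1], ..., x[t-k], aligned with the target x[t] for t >= k.
        lagged = np.column_stack([x[k - j : len(x) - j] for j in range(1, k + 1)])
        target = x[k:]
        coefs, *_ = np.linalg.lstsq(lagged, target, rcond=None)
        estimates.append(coefs[-1])
    return np.array(estimates)

# Simulated AR(1) process: each value is 0.9 times the previous one plus noise.
rng = np.random.default_rng(3)
n, phi = 5000, 0.9
series = np.zeros(n)
for t in range(1, n):
    series[t] = phi * series[t - 1] + rng.normal()

partial = pacf_ols(series, max_lag=5)
print(np.round(partial, 3))
```

For this simulated series only lag 1 should stand out, with the remaining lags near zero. Real temperature data with a daily cycle will show additional spikes, as in the plot above.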

Facebook Prophet (not profit)

It’s clear to everyone that Facebook makes a lot of money. One way they make a lot of money is by recording temporal data and making accurate predictions about future behavior of these time series data. While some may fear our big blue overlords, one could argue that they DO provide a service people want, thus the profits. Facebook has also provided an additional (and, ironically, free) product to the time series modeling community called “Prophet”. You can read an introduction to Prophet here, a quickstart guide here, and a couple of nice Medium posts with example code for applying Prophet here and here. They were all helpful as I used Prophet to explore my temperature data.

There are some quirks to using Prophet (like installing it and setting up the initial data formatting), as well as a lot of parameters that can be tuned. Here, I just wanted to run through a quick application of its out-of-the-box behavior on these temperature data. It’s easiest to work with just one time series at a time, so I chose the urban site soil temperature.
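The main formatting quirk is that Prophet expects a data frame with exactly two named columns: `ds` for the timestamps and `y` for the values. Below is a minimal sketch of that reshaping plus a default fit. The helper names and the synthetic series are assumptions for illustration, not the code used for the figures in this post.

```python
import numpy as np
import pandas as pd

def to_prophet_frame(series):
    """Reshape a datetime-indexed Series into the 'ds'/'y' columns Prophet requires."""
    frame = series.rename("y").rename_axis("ds").reset_index()
    return frame.dropna(subset=["y"]).reset_index(drop=True)

def fit_default_prophet(frame):
    """Fit Prophet with its default settings (no tuning)."""
    # Imported here so the reshaping helper can be used without Prophet installed.
    # Releases before 1.0 were published as `fbprophet` instead of `prophet`.
    from prophet import Prophet
    model = Prophet()
    model.fit(frame)
    return model

# Synthetic stand-in for the urban soil logger (not the study data).
rng = np.random.default_rng(4)
hours = pd.date_range("2014-12-01", periods=24 * 30, freq=pd.Timedelta(hours=1))
urban_soil = pd.Series(
    10.0
    + 2.0 * np.sin(2 * np.pi * (hours.hour.to_numpy() - 9) / 24)
    + rng.normal(0.0, 0.3, len(hours)),
    index=hours,
)
urban_soil.iloc[100:110] = np.nan  # mimic a short logging gap

prophet_frame = to_prophet_frame(urban_soil)
print(prophet_frame.head())

# With Prophet installed, fitting is then a single call:
# model = fit_default_prophet(prophet_frame)
```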

Prophet plots the observed values of the urban soil temperature time series (the black dots), the forecasted values (blue line) and the uncertainty intervals of our forecasts (the blue shaded regions).

The first plot is pretty simple to make. It shows a comparison between the observed data and the hindcast (basically, the predictions made for the period that has observed data). What’s more interesting is the forecast into the future (the darker blue line), along with the associated uncertainty (the lighter blue envelope). Here, I’ve only asked it to project 3 months into the future, from September through November. Its prediction is probably pretty reasonable, but the uncertainty is sizeable and grows rapidly the further out it goes. This makes sense: the model has not been trained on enough data to learn the full yearly seasonality of temperatures, and it has been misled by the fact that temperatures climbed through the summer and only started to decline in the fall. I’m guessing the uncertainty would be far smaller if it were trained on a full year’s worth of data.
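Projecting forward comes down to two Prophet calls: `make_future_dataframe` to extend the timestamps and `predict` to fill in `yhat`, `yhat_lower`, and `yhat_upper`. The helpers below are a hedged sketch with hypothetical names. The small table at the end is invented purely to show how the interval width could be measured, and it is not output from the model in this post.

```python
import pandas as pd

def forecast_ahead(model, days=91):
    """Extend a fitted Prophet model `days` past the end of its training data."""
    # make_future_dataframe returns the training timestamps plus the new ones.
    # The hourly alias is "h" in recent pandas and "H" in older releases.
    future = model.make_future_dataframe(periods=24 * days, freq="h")
    return model.predict(future)

def interval_width(forecast):
    """Width of the uncertainty interval at each timestamp of a Prophet forecast."""
    widths = forecast["yhat_upper"] - forecast["yhat_lower"]
    return pd.Series(widths.to_numpy(), index=pd.DatetimeIndex(forecast["ds"]))

# Hand-made rows in the shape of Prophet's output, only to exercise the helper.
# These numbers are invented and are not results from the study.
toy_forecast = pd.DataFrame(
    {
        "ds": pd.to_datetime(["2015-09-01", "2015-10-01", "2015-11-01"]),
        "yhat": [24.0, 20.0, 15.0],
        "yhat_lower": [22.0, 15.0, 5.0],
        "yhat_upper": [26.0, 25.0, 25.0],
    }
)
print(interval_width(toy_forecast))
```

With a fitted model, `model.plot(forecast_ahead(model))` draws the kind of figure shown above, and plotting `interval_width(...)` over time is a quick way to check how fast the envelope actually widens.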

Overall trends across the full time series

Now for my favorite part of Prophet: the decomposition of the series into components at different timescales. Basically, Prophet models your time series as the sum of a long-term trend and repeating patterns at shorter timescales, somewhat like partitioning variance in ANOVA. In the figure above, the broad trend shows cool winter temperatures, climbing spring and summer temperatures, a peak in August, then the start of a decline. From that point, Prophet projects that the trend will continue downward, but the uncertainty is really high, probably reflecting the issues pointed out above for the forecast as a whole.

The next finer scale in this dataset is looking across days of the week. According to Prophet, Fridays are the coolest and the rest of the workweek is warmer. I don’t know how strong or reliable this pattern is, and I don’t know that I believe it, but it might be something to look into. Maybe this is reflective of temperatures in urban environments?

Daily temperature cycles, as identified by Prophet.

Now this one is more believable: temperatures are coolest just before dawn, climb during the morning, peak between 3:00 and 4:00 in the afternoon, and then decline. Classic! It’s reassuring that this trend popped up as clearly as it does here. Prophet doesn’t do anything that new or magical here, since this decomposition into finer scale components could also be achieved with seasonal_decompose from statsmodels (for example, see this post). But the interface is very nice and easy to work with, once you’ve figured out the initial quirks.
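For a rough sanity check on daily and weekly components, you can skip both Prophet and statsmodels: remove the slow changes with a rolling mean, then average what is left by hour of day and by day of week. This is a cruder approach than Prophet’s, which fits smooth Fourier-series seasonalities, and the data here are synthetic with a daily cycle deliberately set to peak at 3 PM and no weekday effect at all.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in: 12 weeks of hourly data with a daily cycle and no weekly effect.
rng = np.random.default_rng(5)
hours = pd.date_range("2015-01-05", periods=24 * 7 * 12, freq=pd.Timedelta(hours=1))
daily_cycle = 2.0 * np.sin(2 * np.pi * (hours.hour.to_numpy() - 9) / 24)
temps = pd.Series(15.0 + daily_cycle + rng.normal(0.0, 0.3, len(hours)), index=hours)

# Remove slow changes first so they do not leak into the cycle estimates.
# A centered 7-day window spans whole daily and weekly cycles.
baseline = temps.rolling(window=24 * 7, center=True).mean()
detrended = temps - baseline

# Average deviation from the baseline by hour of day and by day of week (0 = Monday).
daily_profile = detrended.groupby(detrended.index.hour).mean()
weekly_profile = detrended.groupby(detrended.index.dayofweek).mean()

print("warmest hour:", daily_profile.idxmax(), "coolest hour:", daily_profile.idxmin())
print(weekly_profile.round(3))
```

Even with no weekday effect built in, the weekly averages are not exactly zero, which is a useful reminder when deciding how much to trust a “cool Friday”. With a fitted Prophet model and its forecast, `model.plot_components(forecast)` draws the trend, weekly, and daily panels directly.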

That’s it! A brief exploration into Birmingham’s UHI, along with some pattern searching using Facebook Prophet. In the future, I will be looking into stationarizing the series, making some statistical comparisons between the three sites, as well as trying to find more interesting patterns in the daily, weekly, and yearly temperature cycles. I’d love to hear your comments and suggestions for the future.

Written by Pete VanZandt