Center Diverging Colors on Leaflet Map

Wendy Wang
IBM Data Science in Practice
13 min readJul 22, 2020

Leaflet is a popular and mature framework for interactive maps. It has a very nice and neat R interface, and has been extensively adopted for when R users need to create an interactive map. Although the existing support for customization is already great in this R package, when it comes to real life, there is never enough pre-built features for customization.

Centering polygon color at a specific value is one of the headaches. R has an easy-to-use color palette system, but unfortunately it doesn’t support a custom centering point for diverging color palettes.

You must be thinking that there has to be some trick to make this happen — well, you are absolutely right, and you will soon see two methods that address this issue. Depending on your need, it may be a couple more lines, or a huge amount of work (not anymore if you choose to take what I shared). An example dashboard can be found here, with code here.

Before diving into this topic, I have to admit that this particular pain point is not necessarily related to Shiny. You may use Leaflet for static plots and you happen to be annoyed by this issue. I simply put them together as Leaflet is used very often with Shiny.

Create A Leaflet Map

Let’s start with a map to illustrate what troubled me for weeks. The Census Bureau hosted a wide range of shapefiles on their website and in this example we will use cb_2018_us_county_5m.

The following piece of code will download and unzip the shapefiles, then load it with readOGR.

download.file('https://www2.census.gov/geo/tiger/GENZ2018/shp/cb_2018_us_county_5m.zip',
'cb_2018_us_county_5m.zip')
unzip('cb_2018_us_county_5m.zip',exdir='cb_2018_us_county_5m')
shapes <- rgdal::readOGR("cb_2018_us_county_5m","cb_2018_us_county_5m")

This set of shapefiles comes with area of land and area of water for each county, in addition to geographical information.

Assume that we want to explore the ratio of these two variables. We will divide area of wate by area of land, so that the larger the number, the more proportion of water body in the region, while 1 means equal amount and 0 means no water at all. When coloring the shapes, we want to have a centering point at 1 to visually differentiate counties which have more water than land from counties which have more land than water.

A Leaflet map with a pre-built diverging color palette can be created by less than 20 lines of code like this:

poly_state <- shapes[shapes@data$STATEFP == 24,]poly_state@data <- poly_state@data %>%
mutate(WATER_LAND_RATIO = as.numeric(as.character(AWATER)) / as.numeric(as.character(ALAND)),
content = paste0(NAME,': ',WATER_LAND_RATIO))
poly_state %>%
leaflet() %>%
addTiles() %>%
addPolygons(data = poly_state,
weight=1, opacity = 1.0,color = 'white',
fillOpacity = 0.9, smoothFactor = 0.5,
fillColor = ~colorBin('RdBu',WATER_LAND_RATIO)(WATER_LAND_RATIO),
label = ~content) %>%
addLegend(
"topright",
pal = colorBin('RdBu', poly_state@data$WATER_LAND_RATIO),
values = poly_state@data$WATER_LAND_RATIO,
opacity = 0.9
)

It picks out Maryland (FIPS = 24), creates the ratio, and plots this variable with the RdBu (red versus blue) color palette.

Apparently, the pre-built color palette tries to find the middle point of the scale and places the diverging colors accordingly.

Since the automatic coloring doesn’t do us the favor, we need to find a way to tweak what is passed to the map. To be more specific, it’s time to roll up your sleeves and build our own palette!

Custom Palette: Easy Version

Like a couple of Stack Overflow threads mentioned, the colorRampPalette function in R can be used to generate continuous monotonic hex color codes, and we can simply generate the two sets of colors separately.

How do we do that?

First, we take the centering value (here 1) from the ratio, so that now negative numbers should be in one color and positive numbers in another. Then we calculate the minimum and maximum value in order for colorRampPalette to know the range in which it needs to generate colors.

poly_state <- shapes[shapes@data$STATEFP == 24,]poly_state@data <- poly_state@data %>%
mutate(WATER_LAND_RATIO = as.numeric(as.character(AWATER)) / as.numeric(as.character(ALAND)),
content = paste0(NAME,': ',WATER_LAND_RATIO))
poly_state@data$WATER_LAND_RATIO <- poly_state@data$WATER_LAND_RATIO - 1minVal <- min(poly_state@data$WATER_LAND_RATIO)
maxVal <- max(poly_state@data$WATER_LAND_RATIO)
domain <- c(minVal,maxVal)

For the negative numbers, as the value goes from minVal to 0, we tell colorRampPalette to map the color from dark red (#b2182b) to white. Similarly, we want it to map the color from white to dark blue (#2166ac) for positive numbers. By concatenating them together, we now have a vector of diverging colors.

colorPal <- c(colorRampPalette(colors = c("#b2182b", "white"), space = "Lab")(abs(minVal)),
colorRampPalette(colors = c("white", "#2166ac"), space = "Lab")(maxVal))

When we plot the map, instead of specifying the name of a pre-built color palette, we pass our color vector as well as the recalculated range that centers at 0. With both the color codes and the range, the mapping of color to value will be automatically figured out. The last change is in the legend part, where we pass the color vector with the original range.

poly_state %>%
leaflet() %>%
addTiles() %>%
addPolygons(data = poly_state,
weight=1,opacity = 1.0,color = 'white',
fillOpacity = 0.9, smoothFactor = 0.5,
fillColor = ~get('colorBin')(colorPal,
domain)(WATER_LAND_RATIO),
label = ~content) %>%
addLegend(
"topright",
pal = colorBin(colorPal, domain = domain+1),
values = domain,
opacity = 0.9
)

The resulting map for Maryland is as follows follows:

Wait a minute, why does it show white for 1.0–1.2? Shoudn’t it have a blue shade?

One thing is that if we print out the output of colorRampPalette, we will see the generated color codes are mapped between the two colors we provided, with both ends included. Because we choose to use the binned color, both ends (white and dark blue/red) are going to cover a bin, in other words a small range of values, which actually is not desired for the white color. We can replace white with light blue/red (for example #e6f2ff and #f7e9ea) and redo the map.

Now it seems to give a correct coloring — but this trick doesn’t solve all the issues. If this “white color bin covers values very close to the centering point” holds, it should have the same effect to both sides, leaving both 0.8–1.0 as well as 1.0–1.2 white. Our observation is that 0.8–1.0 has a nice light red shade as expected and only 1.0–1.2 seems to be annoying. In addition to what we fixed, another issue might be the imprecise mapping of color to values.

Because of this, the fix wouldn’t be effective all the time. If we color a different state with the same method, for example Louisiana, we still see the problem.

Here 1.0–1.5 has a light red shade, although we expect it to be a blue-ish color.

Another issue caused by this mysterious but definitely unexpected color-to-value mapping is that the assigned color changes as the magnitude changes. If we display the ratio as a percentage number by multiplying the value by 100, we see some bins get a different color.

Again we use Louisiana as an example because the previous picture is Louisiana and you can easily scroll up and down to compare.

Magically, when we increase the magnitude, the 100–150 bin which is equivalent to the previous 1.0–1.5 bin, is assigned the correct light blue color.

This magnitude problem is going to be a pain in a dashboard where you intend to allow users to choose their own magnitude. For example, you may want to show the spending on education in each country in the world and let the audience to pick their own currency. No matter what currency they pick, the coloring should not change.

If the above trick already fixes your issue, I’d like to say congratulation to you. Otherwise, you will have to keep digging into it with me.

Custom Palette: Advanced Version

I know people say do not reinvent the wheel. However, sometimes there is just not a wheel to carry your needs (and personally I find it fun.. no you didn’t see this).

The solution seems to be obvious once you realize that addPolygons recognizes a vector of color codes where each represents the color for one data point, and addLegend builds a custom legend reference if you pass a vector of color codes and a vector of labels. You can just bin the values, generate color codes, assign the corresponding color to each data point, and then sit back and enjoy your life.

At least this was what I thought, which made me happy for approximately 5 minutes. After 5 minutes, I looked at the prototype and had to admit that I was deceived, completely. There are actually a lot of questions to be addressed before we can come up with what we want.

Bin Values

We can easily break the range equally, but nobody is going to like a bin of range -24 to -6, or a bin size of 1 that finally yields dozens of bins. How do we automatically find a reasonable bin size with a reasonable number of bins?

The first thing to do is to round up the minimum and maximum numbers. It varies by scenario. For a percentage number, you may round it up to a multiple of 5. For annual household income, you may round it up to $1000. The round_any function from plyr handles this job very well. The interval here is basically the multiple of which number you want to round it up to.

min_val_round <- plyr::round_any(min_val,interval,f=floor)
max_val_round <- plyr::round_any(max_val,interval,f=ceiling)

With the prettified min/max values, we divide the range (max value minus min value) by a set of potential bin sizes which gives the number of bins needed for a particular bin size, and take the smallest one that gives no more than a specified number of maximum bins.

delta <- max_val_round - min_val_roundnum_bin <- ceiling(delta / interval_options)
res_idx <- which(num_bin < max_bin)[1]
interval_plot <- interval_options[res_idx]

If we have a max value of 21, a min value of -63, a centering point of 0, and our bin size options are 5, 10, 15, … (e.g., seq(5,100,5)), and the max bin we’d like to have is 12, plugging in these information we will eventually receive a bin size (I also call it interval) of 10, which will break the values into roughly 9 bins.

If we apply the same code on Louisiana’s water-to-land ratio, like the following:

max_val <- max(poly_state@data$WATER_LAND_RATIO)
min_val <- min(poly_state@data$WATER_LAND_RATIO)

max_bin <- 10
interval <- 10
interval_options <- seq(interval,100,interval)
min_val_round <- plyr::round_any(min_val,interval,f=floor)
max_val_round <- plyr::round_any(max_val,interval,f=ceiling)
delta_positive <- max_val_round
delta_negative <- abs(min_val_round)
num_bin <- ceiling(delta_positive / interval_options) + ceiling(delta_negative / interval_options)
res_idx <- which(num_bin < max_bin)[1]
interval_plot <- interval_options[res_idx]

It returns back a bin size of 60 that will yield 9 bins.

Now let’s see how to form the breaks, or the values on which we are going to draw a line and say this is the boundary between two bins.

Similar to the rounding we did at the beginning, here we need to round the already-rounded min/max value again to a multiple of the decided bin size, then we can make a sequence where values increase by the bin size.

breaks <- c(seq(plyr::round_any(min_val_round,interval_plot,f=floor), 0, interval_plot), 
seq(0,plyr::round_any(max_val_round,interval_plot,f=ceiling), interval_plot)[-1] # remove 0 so no duplicate 0
)

Using the same example of Louisiana, the breaks generated is a sequence of numbers from -120 to 420, increasing by 60.

Last but not least, we need a label for each bin. To concatenate every pair of the adjacent breaks, we use sapply to loop over the index of the vector, and pick out the break value using the index and add back the centering value to get a number on the original scale. The first one will always be a blank string because when the current index points to the very first break value, we cannot find a previous one for it, and hence it needs to be dropped in the end.

breaks_label <- sapply(1:length(breaks), 
function(x) if (x>1) paste0(breaks[x-1]+100,' - ',breaks[x]+100) else '')[-1]

This works, but not always — the reality could be a little more complicated. Rounding up the min/max value for two times extends the practical range beyond the original range, probably to a point that is meaningless. For example, if you are handling spending data in dollar amounts where all numbers originally are positive, a negative minimum value displayed on the label will be very strange. And here for the ratio, we know that the minimum value is 0 and a negative number should not exist either.

We can keep the numeric breaks intact, and floor or ceiling the break value used for concatenation, so that the final bin label will not tell a story that makes no sense. In this example, our first bin label is “-20–40”, and we can simply make it to be “0–40”.

My implementation is a little tedious, so I’m not going to post it here. You can find it in the shared scripts.

Generate Color for Bins

Now that we have the breaks, generating color for each break sounds quite straightforward. And technically speaking, it actually is! You just need to make a decision on one question and take care of another little trouble.

The question that could make a difference is, how do you want to generate color when we don’t have a symmetric scale? In other word, what color do you want when the number of bins on the positive side is not the same as that of the negative side?

The above example gives us 7 positive bins and 2 negative ones. Do we want to map the change evenly between white and the same intensity of different color, one across 7 bins and the other across 2 bins? Or do we want the color intensity to reflect the deviation from the centering point, so that the 2 bins on negative side will effectively all have very light shades?

Here is a code example of the latter. breaks_above and breaks_below are a vector of the breaks on each side including 0, and max_bin_oneside calculates the number of bins (equal to number of breaks including 0 minus 1) on both sides and takes the maximum. Then we ask colorRampPalette to generate max_bin_oneside+1 colors for each side, get rid of the white color, and take the colors we need.

If we have 9 bins on the positive side and 3 bins on the negative side, colorRampPalette will generate 10 colors on both sides, then we drop the white color, and keep all the 9 positive colors while only 3 negative colors that are closest to 0 (or closer to the end of the color code vector) in order to form the final color_val .

breaks_above <- breaks[breaks>=0]
breaks_below <- breaks[breaks<=0]
max_bin_oneside <- max(length(breaks[breaks<=0])-1, length(breaks[breaks>=0])-1)color_val_negative <- colorRampPalette(colors = c("#b2182b", "white"), space = "Lab")(max_bin_oneside+1)[-(max_bin_oneside+1)][(max_bin_oneside - (length(breaks_below)-1) +1):max_bin_oneside]color_val_positive <- colorRampPalette(colors = c("white", "#2166ac"), space = "Lab")(max_bin_oneside+1)[-1][1:(length(breaks_above)-1)]color_val <- c(color_val_negative, color_val_positive)

You can visualize the resulting color vector with scales::show_col(color_val) :

The only trouble you are left with is to handle the conditions where there is no “both positive bins as well as negative bins”. There could be only one side existing, or it could be neither (when the value of all the regions is precisely at the centering point). Some special treatment will come in handy. And again I’m saving this relatively lengthy work to the shared scripts.

Altogether

With the generated color vector and its corresponding label vector, the rest cannot be easier. We will simply combine these two precious vectors into a data frame, create a column in the main data frame that breaks the ratio values into groups named with the bin labels, and map the bin labels to color codes by joining both data frames together.

df_pal <- data.frame(Color_Value = color_val,
Color_Label = breaks_label,
stringsAsFactors = F)
poly_state@data <- poly_state@data %>%
mutate(WATER_LAND_RATIO_Color_Group = cut(WATER_LAND_RATIO, breaks = breaks, labels = breaks_label)) %>%
mutate(WATER_LAND_RATIO_Color_Group = as.character(WATER_LAND_RATIO_Color_Group)) %>%
left_join(df_pal %>%
rename(WATER_LAND_RATIO_Color_Group = Color_Label,
WATER_LAND_RATIO_Color_Value = Color_Value))

In fact, you may want to replace the labels passed to the cut function by the color codes so that this joining is not even needed. Yes you are right. I do it in this way so that it’s easier for me to check if the linkage between bin labels and breaks are correctly mapped, and you are free to skip it.

Finally, with the colors and bin labels completely customized, we pass the color code column to fillColor in addPolygons to tell the map that for each polygon (row) the color code in this column should be used, and also we need to give the generated color vector and its corresponding bin label vector to addLegend in order for the legend to know what to display.

poly_state %>%
leaflet() %>%
addTiles() %>%
addPolygons(data = poly_state,
weight=1,opacity = 1.0,color = 'white',
fillOpacity = 0.9, smoothFactor = 0.5,
fillColor = ~WATER_LAND_RATIO_Color_Value,
label = ~content) %>%
addLegend(
"topright",
colors = df_pal$Color_Value,
labels = df_pal$Color_Label,
opacity = 0.9
)

The full control over colors means more than the capability to color polygons using diverging colors with a custom centering point. For example, we now are able to make a fixed color-to-value mapping that can be applied to maps of different regions, so that one color will always indicate the same range of values no matter what map we are on, which will be good when we need to list a bunch of different regions side by side and compare the value of the variable of interest by the intensity of the color.

I would be very happy to hear if the method I shared solves your problem.

And stay tuned — more blogs about Shiny are on the way!

--

--

Wendy Wang
IBM Data Science in Practice

Machine Learning | Deep Learning | Evolutionary Psychology | Neuroscience