Surprise Maps: Showing the Unexpected

In 1977, Jerry Ehman, an astronomer working with the SETI project to seek out alien life, came across an interesting radio signal, one needle in the haystack of all of the electromagnetic signals that SETI monitors. It was an incredibly strong radio signal, one that matched many of the parameters we’d expect to see if aliens were really trying to communicate with us. He was so impressed with this data that he circled the signal in red ink and wrote “Wow!” in the margin; it’s been called the “Wow!” signal ever since.

As the “Wow!” signal illustrates, often when we analyze data we are not interested in business as usual: what we care about are exceptions to the rule, outliers, and generally the unexpected. Nobody brings out the red ink to circle data when everything is normal. And yet, when we present data visually, exceptions and outliers may get lost in the sea of usual variation. We need a visualization equivalent of Ehman’s “Wow!” annotation.

For geographic data, our proposed solution is called a Surprise Map: a form of heat map that gives more weight to surprising data. The idea behind Surprise Maps is that when we look at data, we often have various models of expectation: things we expect to see, or not see, in our data. If we have these models, we can also measure how far the data deviates from them. This deviation is the unexpected, the data that surprises us. Such surprising data is sometimes important, and at the very least justifies follow-up analysis.

Surprise Maps are useful when the raw numbers, by themselves, don’t tell us much: visual patterns might look complex but convey only statistical noise, or patterns may look simple but hide the really interesting features.

Canadian Mischief

Here’s an example. “Mischief” is a category of property crime: it includes things like vandalism and graffiti, where the intent is neither to steal anything nor to hurt anybody. We might wonder, “Which province or territory of Canada is the safest from mischief?” Well, we have a list of provinces and territories, and a count of mischief incidents for each, and so a common design choice would be a choropleth map (a.k.a. heat map) of the data:

Looking at the map, we notice that Ontario has the most mischief. However, there is a confound. Ontario also has the highest population. More people means more crime. So let’s normalize to the per capita rate of crime:

Now the picture is nearly the opposite of what we saw in our first map. Maybe the Northwest Territories are the most dangerous!? Not so fast… the Northwest Territories are one of the least populous parts of Canada. Fewer than 44,000 people live there, compared to the more than 13 million people who live in Ontario.

Are the Northwest Territories really that dangerous, or are they merely a victim of what Howard Wainer has called “The Most Dangerous Equation”? That is, when populations are low, variation tends to be high. As an extreme example, imagine a province with only two people in it: if person A commits a crime against person B, then that’s a per capita rate of 50%, the highest in the nation! If A and B peacefully coexist, that’s a per capita rate of 0%, the lowest in the nation. Neither case really offers much evidence that this two-person province is the safest or most dangerous place to live.
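To put rough numbers on this, here is a minimal Python sketch. It assumes, purely for illustration, that every resident is an independent coin flip with the same chance of being counted as an incident, so that the spread of an observed per capita rate shrinks with the square root of the population. The function name `rate_std_dev` and the 3.8% rate (the national average quoted later in this post) are choices made for this example; the two populations are the rounded figures mentioned above.

```python
import math

def rate_std_dev(true_rate, population):
    """Standard deviation of an observed per capita rate, assuming each
    resident is an independent coin flip (a simplifying assumption)."""
    return math.sqrt(true_rate * (1 - true_rate) / population)

RATE = 0.038  # 3,800 incidents per 100,000 people, treated as a probability

small = rate_std_dev(RATE, 44_000)       # roughly the Northwest Territories
large = rate_std_dev(RATE, 13_000_000)   # roughly Ontario

print(f"expected spread, small population: {small:.5f}")
print(f"expected spread, large population: {large:.5f}")
print(f"ratio: {small / large:.1f}x")
```

Under these assumptions, the same underlying rate produces a far wider range of plausible observed rates in the smaller population, simply because there are fewer coin flips to average over.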

Here’s a concrete example of how population can make these maps tricky to interpret. Suppose there is a disease that is endemic to the U.S., and I want to find which counties are safest. Here is a map showing the top 10% safest counties in pink (places with the lowest per capita rate of the disease):

Now, let’s speculate about why these counties are safest. Maybe the fresh air of the Great Plains has something to do with it, or maybe there’s something about cities that makes people susceptible. But, when we take a look at the top 10% most dangerous counties (in purple), we get a very similar map:

In reality, there is no geographic pattern whatsoever. All I have done is give each citizen of the U.S. an equal chance of infection and, in effect, flip a coin for each of them. Less populous counties have fewer coin flips, and so the impact of each flip matters more. In a way, they have more room to be “lucky” or “unlucky.” The map of rates across the entire country shows the full story (purple is more dangerous, pinker is safer):

As population increases, variability decreases. So we get “checkerboards” of high and low values in the sparser regions of the country, and a uniform pink everywhere else. Population, not geography, is what drives these spurious “interesting” patterns.
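The same coin-flipping experiment is easy to try at a small scale. The sketch below is not the simulation behind the maps above; it is a toy version with made-up numbers (100 sparse counties of 50 people, 100 dense counties of 5,000 people, and a hypothetical 10% chance of infection for everyone). The helper `simulate_rates` is a name invented for this example.

```python
import random
import statistics

def simulate_rates(population, n_counties, p_infect, rng):
    """Flip one biased coin per resident; return each county's observed rate."""
    rates = []
    for _ in range(n_counties):
        cases = sum(rng.random() < p_infect for _ in range(population))
        rates.append(cases / population)
    return rates

rng = random.Random(0)  # fixed seed so the run is repeatable
P_INFECT = 0.1          # hypothetical: identical risk for every resident

sparse = simulate_rates(50, 100, P_INFECT, rng)    # 100 counties of 50 people
dense = simulate_rates(5000, 100, P_INFECT, rng)   # 100 counties of 5,000 people

# Rank all 200 counties by observed rate and look at both extremes.
ranked = sorted(
    [("sparse", r) for r in sparse] + [("dense", r) for r in dense],
    key=lambda county: county[1],
)
safest = [kind for kind, _ in ranked[:20]]            # lowest 10% of rates
most_dangerous = [kind for kind, _ in ranked[-20:]]   # highest 10% of rates

print("spread of rates, sparse counties:", statistics.pstdev(sparse))
print("spread of rates, dense counties: ", statistics.pstdev(dense))
print("sparse counties among the safest 10%:", safest.count("sparse"))
print("sparse counties among the most dangerous 10%:", most_dangerous.count("sparse"))
```

Even though every simulated resident faces exactly the same risk, the sparse counties crowd both the “safest” and the “most dangerous” ends of the ranking, which is the pattern the two maps show.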

So let’s turn back to our example of Canadian mischief. Instead of coloring regions based on the highest values, let’s color them based on the most surprising values. We’ll describe how we calculate surprise later, but for now, let’s assume two things:

  1. If there are no big geographic differences in mischief, we’d expect each province or territory to have the same per capita rate.
  2. If there are no big geographic differences in mischief, we’d expect variability to increase for smaller populations.

Given these two models of how we expect the data to appear, we can measure deviations from these models. Counter-examples to assumptions 1 and 2 would be a province with a much higher or lower per capita rate than any of the others, or a province with a large population and a high deviation from the average per capita rate. A Surprise Map highlights where these counter-examples occur. The bluish regions are places where we have less mischief than we’d expect, given our models, and the reddish regions are where we have more mischief than we’d expect:

What we get is a map that is somewhere in between the previous two maps we saw: Ontario has a lot of mischief, sure, but it has much less than we’d expect, given its outsized population compared to the rest of Canada (around 500 incidents per 100,000 people, compared to an average of 3,800 incidents per 100,000 people for Canada as a whole). Nunavut and the Northwest Territories have higher per capita rates, but this variability is within reasonable limits given their tiny populations. It is the prairie provinces where we see somewhat unexpectedly high levels of mischief. With the right models to back them up, Surprise Maps suppress noisy and irrelevant patterns in maps, and visually highlight what’s left.

Calculating Surprise

Surprise Maps are driven by a statistic called Bayesian Surprise. Bayesian Surprise is a measurement initially developed by vision researchers to help identify the most salient (interesting) parts of an image or video. The key idea is that it is not the data by itself that drives interest, but how the data shifts our models of expectation. The same data can produce different levels of surprise, based on our underlying beliefs about that data.

Here’s an example. Suppose you are a student taking a class, and you have two different potential expectations:

  1. I am going to pass this class.
  2. I am going to fail this class.

Suppose you have been a good student so far, and so your prior belief that you will pass the class (1) is high, and your belief that you will fail the class (2) is low. Suppose now that you are handed back a midterm test, and you receive an extremely low grade. This new data is very surprising and it forces you to update your beliefs: you are now much less certain that you will pass the class, and more certain you will fail. Now suppose instead that you were pretty sure you were going to fail the class before the midterms were handed back (you haven’t been attending lectures, or the material is out of your depth, or you have some other reason for a strong belief). When you receive your failing midterm score, you have certainly received new information, but this information is not very surprising: you already strongly suspected you were not going to do well. Depending on our prior beliefs, the same (low) midterm score can cause different levels of surprise.
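The midterm example can be worked through numerically. The sketch below assumes that surprise is measured as the Kullback-Leibler divergence from prior to posterior beliefs, which is one common way to define Bayesian Surprise. Every number in it is hypothetical: the priors for the two students and the likelihood of a very low grade under each model were picked only to illustrate the idea.

```python
import math

def posterior(prior, likelihood):
    """Bayes' Theorem over a small, discrete set of models."""
    evidence = sum(prior[m] * likelihood[m] for m in prior)
    return {m: prior[m] * likelihood[m] / evidence for m in prior}

def surprise_bits(prior, post):
    """KL divergence of the posterior from the prior, in bits."""
    return sum(post[m] * math.log2(post[m] / prior[m]) for m in prior if post[m] > 0)

# Hypothetical chance of seeing a very low midterm grade under each model.
likelihood = {"pass": 0.05, "fail": 0.60}

# Hypothetical prior beliefs for the two students described above.
good_student = {"pass": 0.95, "fail": 0.05}
struggling_student = {"pass": 0.10, "fail": 0.90}

surprise_good = surprise_bits(good_student, posterior(good_student, likelihood))
surprise_struggling = surprise_bits(
    struggling_student, posterior(struggling_student, likelihood)
)

print(f"good student:       {surprise_good:.3f} bits")
print(f"struggling student: {surprise_struggling:.3f} bits")
```

With these made-up numbers, the same low grade moves the good student’s beliefs much further than the struggling student’s, so it registers as more surprising.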

Surprise Maps leverage the same intuition: we select an initial set of models and initial beliefs about each of those models. The most surprising information is that which causes the biggest shift in our beliefs about those models: strong evidence for a model we had little belief in, or strong counter-evidence for a model we thought was a sure bet.

The procedure for generating a Surprise Map is then:

  1. Select a set of potential models for the data. Connected with each of these models is an initial belief (a Bayesian prior) about how likely this model is to be true. Initially, our models might be equiprobable: we have no strong initial guess as to what we expect to see.
  2. Compare the expected distribution of data to the actual distribution. This allows us to estimate a likelihood that we would see our real data, if our model(s) were correct.
  3. Using Bayes’ Theorem, calculate the posterior probability of each model. That is, how strongly should we believe in each model, given the real data we just observed? (For example, that we are passing the class, or that Manitoba will have the same rate of mischief as Alberta.)
  4. Calculate surprise as a difference between our prior and posterior probabilities, across all models. High surprise occurs when beliefs shift substantially; low surprise occurs when there is not much change (we already knew we were failing, and this new ‘F’ grade doesn’t change our minds).
  5. Visualize the surprise values. One can plot either total surprise or signed surprise (where the sign shows whether the surprise comes from the data being higher or lower than we expected). Negative surprise is where we see lower quantities than we were expecting; positive surprise is where we see higher quantities than we expected.
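Here is one minimal way the five steps could be sketched in Python. It is an illustration under assumptions made up for this example, not a definitive implementation: the three models (“lower,” “typical,” and “higher” rates relative to a baseline), the Poisson likelihood, the baseline rate, and the three regions with their populations and counts are all hypothetical. Surprise is again taken to be the KL divergence of the posterior from the prior, signed by whether the observed count is above or below the baseline expectation.

```python
import math

def poisson_log_pmf(k, lam):
    """Log-probability of observing k events when lam are expected."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)

def posterior(prior, log_likelihood):
    """Step 3: Bayes' Theorem, done in log space to avoid underflow."""
    log_joint = {m: math.log(prior[m]) + log_likelihood[m] for m in prior}
    peak = max(log_joint.values())
    weights = {m: math.exp(v - peak) for m, v in log_joint.items()}
    total = sum(weights.values())
    return {m: w / total for m, w in weights.items()}

def kl_bits(post, prior):
    """Step 4: how far beliefs moved, as KL divergence in bits."""
    return sum(p * math.log2(p / prior[m]) for m, p in post.items() if p > 0)

def signed_surprise(population, count, baseline_rate, multipliers, prior):
    expected = population * baseline_rate
    # Step 2: likelihood of the observed count under each model.
    log_lik = {m: poisson_log_pmf(count, expected * mult)
               for m, mult in multipliers.items()}
    post = posterior(prior, log_lik)
    sign = 1 if count > expected else -1
    return sign * kl_bits(post, prior)

# Step 1: hypothetical models and prior beliefs about them.
MULTIPLIERS = {"lower": 0.5, "typical": 1.0, "higher": 2.0}
PRIOR = {"lower": 0.25, "typical": 0.5, "higher": 0.25}
BASELINE_RATE = 0.004  # hypothetical incidents per person

# Hypothetical regions: name -> (population, observed incident count).
regions = {
    "Populous": (500_000, 1_100),  # low rate, huge population
    "Tiny": (500, 3),              # highest rate, tiny population
    "Prairie": (100_000, 700),     # high rate, mid-sized population
}

surprise = {name: signed_surprise(pop, count, BASELINE_RATE, MULTIPLIERS, PRIOR)
            for name, (pop, count) in regions.items()}

# Step 5: these signed values would drive the map's color scale.
for name, value in surprise.items():
    print(f"{name:>8}: {value:+.3f} bits")
```

In this toy data set, the region with the highest per capita rate is also the least surprising one: three incidents among 500 people is weak evidence for any of the models, so beliefs barely move. The two larger regions, by contrast, shift beliefs strongly in opposite directions.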

For Surprise Maps, we do not need to choose particularly complex models, so long as the deviation from these models is informative. For instance, the assumption that each Canadian province will have the same rate of mischief (model 1, above) is somewhat naïve: different provinces have different levels of urbanization and poverty and other confounding variables that may impact crime rates. Yet, this initial model is both easy to describe, and has meaningful definitions of counterexamples or deviations. Surprise Maps are intended to work with a small set of simple models, where deviation is meaningful to the analyst. These coarse models function as initial rough guesses as to how the data might appear.

Conclusion

By visualizing surprise, rather than just the data, we make the informed choice to highlight the unexpected, at the expense of the normal. This might not be appropriate for all tasks. For instance, if I really want to know the exact rate of mischief in Ontario, the Surprise Map won’t give me that information. The more traditional choropleth map would be the better choice. Surprise Maps are intended for situations where there are biases and confounding variables at play, and the raw numbers may mislead the viewer. These situations happen quite often in visual analysis: spurious patterns, noise masquerading as signal, and insufficiently strong evidence are all problems that arise when we visualize geographic data. Surprise Maps are also appropriate in situations where we have strong prior beliefs, or strong confounding variables, that we wish to account for. We’ve applied Surprise Maps to bird death data (where seasonal patterns of mortality can drown out interesting signals), natural disasters, and even voting patterns. Surprise Maps allow us to apply techniques of statistical modeling, but in the context of a simple color-coded map.

For more information about Surprise Maps, read our paper, “Surprise! Bayesian Weighting for De-Biasing Thematic Maps,” to be presented at the IEEE VIS conference in October 2016.

This post was authored by Michael Correll, in collaboration with Jeffrey Heer.