Ratings: Deep Dive

Alec Ramsay
Dave’s Redistricting
7 min read · Oct 9, 2021

On the Analyze tab in DRA, we rate five key dimensions of redistricting maps: proportionality, competitiveness, opportunity for minority representation, compactness, and splitting. The ratings use a scale of [0–100] where bigger is always better.

Together, these ratings create a radar (or "spider") chart, the Ratings Diagram, which allows you to quickly characterize the essence of a map.

While these ratings are simple for you to digest — the design goal — they can be complicated to produce. This note explains why we convert these metrics into ratings, what the ratings are & aren’t, and how we rate each dimension.

A Word on Terminology

We use the term rating for the more technical concept of normalization, which is the process of taking disparate measures and putting them on a common scale. Sometimes below, I’ll say normalize instead of rate.

Why Ratings?

Simply put, ratings give you a simple, consistent way to understand the basic characteristics of a redistricting map.

The raw metrics for a map can be very hard to make sense of. For example, how good or bad are Reock and Polsby–Popper compactness measurements of 0.3373 and 0.2462, respectively? And how good or bad is compactness overall for a map with those two measurements? Ratings help you answer those questions.

What Ratings Are & Aren’t

Our ratings map raw measurements like 0.3373 onto a [0–100] scale where bigger is always better, so you can quickly assess how good or bad a map is on a dimension.

There are a few essential characteristics of ratings that are important to keep in mind:

  • Ratings are relative! — or said more verbosely, ratings are meant to be relative. Ratings allow you to compare two maps on a dimension and answer the question “Is map A better or worse than map B on this dimension?”

The absolute numbers don’t mean all that much (more on that below), so you shouldn’t place too much stock in them. The point is to allow you to compare two (or more) maps. Moreover, the ratings aren’t meant to be comparable across states — how good or bad maps can be on each of the dimensions, and what the tradeoffs are between the dimensions, depends on the political geography of each state (and type of map — congressional, state upper, and state lower house).

Aside: You might object to the statement that ratings are relative by pointing to the Very Bad, Bad, OK, Good, Very Good bucket labels below the ratings “thermometer” and you would be right. We added those labels to try to help you categorize maps, but ratings are relative within a state.

  • Rating scales are subjective — There’s nothing magical about our ratings. As you will see below, we had to make a bunch of decisions about what scale to use to normalize raw measurements into [0–100] ratings. All our decisions are well motivated, but, at the end of the day, they were also subjective.

This reinforces the previous point: the important thing about ratings is that they allow you to make relative comparisons.

  • Ratings aren’t right (or wrong), just useful (or not) — This is another way to make the previous point. The important thing isn’t the specific rating but that ratings allow you to compare maps.

The bottom line is that ratings are not a substitute for critical thinking! Use your judgment.

Details

The next five sections describe how we rate each of the dimensions. They are sequenced to show the progression from simplest to most complicated to rate. While the resulting ratings are simple & intuitive on purpose, creating them can be quite complicated conceptually.

Rating Minority Representation

The simplest dimension to rate is minority representation, where the raw measurements are also on a scale where bigger is better:

  • We estimate the probable number of opportunity-to-elect districts for the individual minority groups shown in District Statistics (Hispanic, Black, Asian, Native, and Pacific)
  • We compare that to what would be a proportional number based on statewide demographics
  • That yields a simple percentage, and
  • We just convert that 0–100% to a [0–100] rating

That’s very straightforward, of course.

The only wrinkle here is that we estimate both opportunities-to-elect for the individual racial & ethnic groups and potential coalition districts for all minorities combined in that way, and then combine the results into an overall rating, discounting coalition districts by half.
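Here’s a minimal sketch of that calculation in Python. The [0–100] conversion and the discounting follow the description above, but the cap at 100 and the exact 2:1 blend of the single-group and coalition results are assumptions; the text only says coalition districts are discounted by half.

```python
def rate_fraction(probable: float, proportional: float) -> float:
    """Convert probable vs. proportional opportunity districts into a
    [0-100] rating, capped at 100 (the cap is an assumption)."""
    if proportional <= 0:
        return 100.0  # no districts 'owed' proportionally (assumed edge case)
    return min(probable / proportional, 1.0) * 100.0


def rate_minority_representation(single_probable: float, single_proportional: float,
                                 coalition_probable: float, coalition_proportional: float) -> float:
    """Blend the single-group and coalition ratings, weighting the
    coalition result half as much (one reading of 'discounting
    coalition districts by half')."""
    single = rate_fraction(single_probable, single_proportional)
    coalition = rate_fraction(coalition_probable, coalition_proportional)
    return (2.0 * single + coalition) / 3.0
```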

Rating Competitiveness

Ratings for competitiveness add a twist: an ideally competitive set of districts only reaches ~75% competitiveness, so we have to transform [0–75%] to [0–100].

That, again, is pretty straightforward.
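That transform is just a linear stretch with a clamp. In this sketch, the ~75% ideal comes from the text; clamping values above it to 100 is an assumption:

```python
def rate_competitiveness(raw: float) -> float:
    """Stretch raw competitiveness in [0, 0.75] onto [0, 100]."""
    IDEAL = 0.75  # an ideally competitive map only reaches ~75% (from the text)
    return min(raw / IDEAL, 1.0) * 100.0


print(rate_competitiveness(0.50))  # ~66.7
```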

Rating Compactness

In both of the previous cases, the unrated scales make some sense directly. Rating compactness introduces the complexity of the underlying scale not meaning anything to a layperson: both Reock and Polsby–Popper measure mathematical aspects of shapes. As noted above, that raises the question: how good or bad are Reock and Polsby–Popper compactness measurements of 0.3373 and 0.2462, respectively?

To normalize these raw compactness measurements into [0–100] ratings, we have to establish what the “best” and “worst” thresholds are that should correspond to the maximum (100) and minimum (0) ratings. To do this, we looked at historical data and the values for ideal shapes, like an 8.5 x 11" piece of paper. We set the range for normalizing Reock to [0.25, 0.50] and the range for normalizing Polsby–Popper to [0.10, 0.50]. Notice how they are different because they measure different aspects of shapes!

We then average those two individual ratings into an overall rating.
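Here’s a sketch of that normalization. The [worst, best] ranges are from the text; the linear mapping and the clamping at the ends are assumptions:

```python
def normalize(raw: float, worst: float, best: float) -> float:
    """Linearly map raw onto [0, 100], clamping outside [worst, best]."""
    t = (raw - worst) / (best - worst)
    return max(0.0, min(1.0, t)) * 100.0


def rate_compactness(reock: float, polsby_popper: float) -> float:
    reock_rating = normalize(reock, worst=0.25, best=0.50)       # Reock range [0.25, 0.50]
    pp_rating = normalize(polsby_popper, worst=0.10, best=0.50)  # Polsby-Popper range [0.10, 0.50]
    return (reock_rating + pp_rating) / 2.0


# The measurements quoted earlier rate ~34.9 (Reock) and ~36.6 (Polsby-Popper):
print(rate_compactness(0.3373, 0.2462))  # ~35.7 overall
```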

Rating Proportionality

For the dimensions above, bigger was better on the underlying scales. Rating proportionality introduces the nuance of having to invert a scale: for disproportionality, smaller is better.

As with compactness, we had to set thresholds for what best and worst disproportionality percentages should correspond to 100 and 0 ratings. We chose [0–20%]: no disproportionality (0%) is clearly the best case, and, based on historical data, we chose 20% disproportionality to correspond to a zero rating. Any disproportionality more than that also gets a zero.

The one additional wrinkle is that, as part of the rating process, we adjusted the simple disproportionality to incorporate a two times “winner’s bonus” (like the efficiency gap). In other words, the greater a party’s statewide vote share, the more you expect its seat share to exceed its vote share.
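A sketch under those choices. The 2x winner’s-bonus expectation mirrors the efficiency gap, and the [0–20%] range is from the text; the linear mapping between the thresholds is an assumption:

```python
def rate_proportionality(vote_share: float, seat_share: float) -> float:
    """Rate disproportionality on [0, 100]; smaller disproportionality is better."""
    # Expected seats with a 2x winner's bonus (as in the efficiency gap):
    # e.g., 55% of the vote "expects" ~60% of the seats.
    expected_seats = 0.5 + 2.0 * (vote_share - 0.5)
    disproportionality = abs(seat_share - expected_seats)
    WORST = 0.20  # 20%+ disproportionality rates 0 (from the text)
    return (1.0 - min(disproportionality / WORST, 1.0)) * 100.0


print(rate_proportionality(0.55, 0.60))  # matches expectation -> 100.0
print(rate_proportionality(0.55, 0.75))  # 15% adjusted disproportionality -> 25.0
```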

Rating Splitting

Finally, there’s splitting. County & district splitting is far & away the hardest dimension to rate.

Like (dis)proportionality, smaller is better for county- & district-splitting: the ideal value is 1.0 (no splitting), and bigger values indicate more splitting. So we have to invert the scales for splitting.

For all the other dimensions, the heuristics for transforming raw measurements into [0–100] ratings were static and the same across states. Rating county- & district-splitting introduces the complexity of dynamic scales: your expectation for how much counties & districts must be split in a map to achieve districts with ‘roughly’ equal populations depends on the ratio of counties to districts, but the number of counties and districts varies by state and type of map. So the thresholds for best and worst splitting depend on those too.

There are always more counties than districts for congressional maps, there are generally more districts than counties for lower state house maps, and upper state house maps are a mix, with the ratio sometimes close to one. So our approach to rating splitting needs to work for all those possibilities.

The best values for both county- and district-splitting metrics are 1.0 (no splitting). But not splitting any counties or districts in a map is not generally a realistic possibility. Hence, we use these thresholds instead:

  • County splitting — We assume that if a county is split, it is split only once, 95–5 (a raw score of 1.20). When the number of counties is greater than the number of districts, we assume that the fraction of counties that have to be split equals the ratio of districts to counties, and we assume the other counties are not split. When the number of counties is less than the number of districts, we assume that all counties have to be split.
  • District splitting — Similarly, we assume that if a district is split, it is also split only once, 95–5 (1.20). When the number of districts is less than the number of counties, we assume that all districts have to be split. When the number of districts is greater than the number of counties, we assume that the fraction of districts that have to be split equals the ratio of counties to districts, and we assume the other districts are not split.

We use those heuristics to set the best thresholds, i.e., the raw county- & district-splitting measurements that translate into 100 ratings. After reviewing historical data, we decided to make the worst thresholds ⅓ bigger than the best ones.

As with compactness, we separately rate the individual county- & district-splitting measurements and then average the two to create the combined splitting rating.

Finally, we reserve an overall 100 rating for maps that don’t split any counties or districts — that can occasionally happen in real-world maps. So if this normalization process would otherwise yield a 100 but there’s some splitting, we decrement the rating to 99.
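Putting those pieces together, here’s a minimal sketch. The 1.20 score for a single 95–5 split, the ⅓-bigger worst threshold, the averaging, and the 99 cap come from the text; the linear interpolation between the thresholds and the reading of the split fractions are assumptions:

```python
def best_threshold(n_units: int, n_other: int) -> float:
    """Best-case raw splitting score for n_units (counties or districts),
    given n_other of the other kind, per the heuristics above."""
    SPLIT_ONCE = 1.20  # raw score of a single 95-5 split (from the text)
    if n_units > n_other:
        frac_split = n_other / n_units  # only this fraction needs splitting
    else:
        frac_split = 1.0                # all units have to be split
    return frac_split * SPLIT_ONCE + (1.0 - frac_split) * 1.0


def rate_one_splitting(raw: float, n_units: int, n_other: int) -> float:
    best = best_threshold(n_units, n_other)
    worst = best * (4.0 / 3.0)  # worst threshold is 1/3 bigger than best
    t = (raw - best) / (worst - best)
    return (1.0 - max(0.0, min(1.0, t))) * 100.0  # invert: smaller raw is better


def rate_splitting(county_raw: float, district_raw: float,
                   n_counties: int, n_districts: int) -> float:
    county = rate_one_splitting(county_raw, n_counties, n_districts)
    district = rate_one_splitting(district_raw, n_districts, n_counties)
    overall = (county + district) / 2.0
    if overall >= 100.0 and (county_raw > 1.0 or district_raw > 1.0):
        overall = 99.0  # reserve 100 for maps with no splits at all
    return overall
```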

Appendix: Supplemental Ratings

For experts & power users, we rate (normalize) two additional metrics on the Analyze tab.

Rating Partisan Bias

In the Proportionality section, we rate partisan bias. As described in How We Rate Partisan Bias, we rate seats bias and votes bias separately and then average the two to create an overall [0–100] rating.

Unlike the dimensions above, here we use a non-linear rating function.
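The details are in that note, so this sketch only shows the structure, i.e., rate each bias and average the two. The placeholder curve below is purely hypothetical and is not DRA’s actual non-linear function:

```python
def rate_partisan_bias(seats_bias: float, votes_bias: float) -> float:
    """Average separately rated seats & votes bias (structure from the text)."""
    def rate_bias(bias: float) -> float:
        # Hypothetical non-linear placeholder -- NOT DRA's actual curve:
        # small biases are penalized gently, larger ones increasingly hard.
        WORST = 0.20  # assumed worst-case bias for this placeholder
        t = min(abs(bias) / WORST, 1.0)
        return (1.0 - t ** 2) * 100.0

    return (rate_bias(seats_bias) + rate_bias(votes_bias)) / 2.0
```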

Rating KIWYSI Compactness

In the Compactness section, we rate how people judge compactness. Unlike the dimensions above, where we calculate a value directly using a formula, here we use a PCA model that approximates a large ML model that mimics how people judge compactness, including factors like the orientation of the shape.

As described in Compactness, we take that “know it when you see it” [1–100], smaller-is-better rank of compactness and invert it to a [0–100], bigger-is-better rating.
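One plausible linear inversion of that rank (the exact mapping isn’t specified here, so this is an assumption):

```python
def rate_kiwysi(rank: float) -> float:
    """Invert a [1, 100] smaller-is-better rank into a [0, 100]
    bigger-is-better rating (one plausible linear mapping)."""
    clamped = max(1.0, min(100.0, rank))
    return (100.0 - clamped) * (100.0 / 99.0)  # rank 1 -> 100, rank 100 -> 0
```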
