Measuring County & District Splitting

Alec Ramsay

Published in

Dave’s Redistricting

Jun 3, 2020

This note explains how we adapted Professor Moon Duchin’s metrics for county- & district-splitting in DRA 2020.

The detailed math is fleshed out in the appendix.

After first summarizing the concepts, I explore four issues:

Removing the implicit penalty for whole “splits”
Accommodating the variable ratio of counties to districts across states
Normalizing splitting scores, and
Combining county- and district-splitting scores into one metric

I explain each issue and how we enhanced the basic metrics to account for it.

Concepts

The metrics and the procedure for computing them are straightforward. In summary, the county-splitting metric measures how much districts split counties:

County splitting = Σⱼ wⱼ · Σᵢ √fᵢⱼ

where wⱼ are the county populations divided by the total population and fᵢⱼ are the populations of the district-county combinations (or “splits”) divided by the county populations.

As Professor Duchin notes, this is a “modification of the classical Shannon entropy which measures how much two different partitions cut each other into pieces.” She uses “square roots instead of logs … to substantially penalize small ‘nibbles’ that cut off the corner of a county, whereas Shannon entropy considers a 99–1 split to be negligibly worse than an intact county.”

The district-splitting metric analogously measures how much counties split districts:

District splitting = Σᵢ xᵢ · Σⱼ √gᵢⱼ

where xᵢ are the district populations divided by the total population and gᵢⱼ are the populations of the district-county combinations divided by the district populations.

To get an overall splitting score to drive MCMC, Professor Duchin combines these scores by adding them together:

Overall splitting = County splitting + District splitting
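Here is a minimal Python sketch of the two metrics and their sum, written from the definitions above rather than taken from DRA's code. It assumes the populations of the district-county combinations are held in a matrix `cxd` with one row per district and one column per county; the function names and the toy numbers are hypothetical.

```python
from math import sqrt

def county_splitting(cxd):
    """Sum over counties j of w_j times the sum over districts i of sqrt(f_ij)."""
    total = sum(sum(row) for row in cxd)
    score = 0.0
    for j in range(len(cxd[0])):
        county_pop = sum(row[j] for row in cxd)
        if county_pop == 0:
            continue  # an empty county contributes nothing
        w_j = county_pop / total
        score += w_j * sum(sqrt(row[j] / county_pop) for row in cxd)
    return score

def district_splitting(cxd):
    """Sum over districts i of x_i times the sum over counties j of sqrt(g_ij)."""
    total = sum(sum(row) for row in cxd)
    score = 0.0
    for row in cxd:
        district_pop = sum(row)
        if district_pop == 0:
            continue  # an empty district contributes nothing
        x_i = district_pop / total
        score += x_i * sum(sqrt(pop / district_pop) for pop in row)
    return score

# Hypothetical toy state: 2 districts (rows) by 3 counties (columns).
# Only the middle county is split, 90-10.
cxd = [
    [100, 90, 0],
    [0, 10, 100],
]
county = county_splitting(cxd)
district = district_splitting(cxd)
overall = county + district  # Duchin's combined score is the plain sum
print(round(county, 4), round(district, 4), round(overall, 4))
```

A plan that splits nothing scores exactly 1.0 on both metrics, so any excess over 1.0 is attributable to splitting.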

Removing the Penalties For Whole “Splits”

Those metrics implicitly penalize two desirable redistricting configurations — districts that contain entire counties and districts that consist entirely of one county — because both instances are treated as additional splits in the formulas.

Our approach to removing these penalties is simple. Assuming that a state's districts are indexed by i and its counties by j, and that (Dᵢ ∩ Cⱼ) is the population where district i and county j intersect, then:

Expand the matrix to include dummy 0th counties and districts,
Consolidate all single-county districts within a county into one uber 0th “split,”
Similarly, consolidate all whole counties contained within a district into one uber 0th “split,” and
Then use the same metrics on these consolidated splits.

This is illustrated in the figure below.

Consolidating single-county districts only affects the county-splitting measurement in six states’ most recent congressional districts, because most states don’t carve entire districts out of large counties even though many could. When such districts exist, however, consolidating them as described above reduces the county-splitting score by roughly 15%.

In contrast, consolidating whole counties contained within districts affects the district-splitting measurements in all 43 states with two or more congressional districts, because many entire counties are contained within single districts. Here the impact on the measurements is nearly a 40% reduction in reported splitting.

Hence, we consolidate or “reduce” splits before applying the entropy-based metrics described above.
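The consolidation step can be sketched in Python as follows. This is my reading of the steps above, not DRA's actual implementation: it assumes the splits matrix `cxd` has one row per district and one column per county, treats a district as "single-county" when one cell holds its entire population, and treats a county as "whole" when one cell holds the entire county. All names and numbers are hypothetical.

```python
from math import isclose, sqrt

def reduce_county_splits(cxd):
    """Fold districts lying wholly inside one county into a virtual district (row 0)."""
    district_pops = [sum(row) for row in cxd]
    reduced = [[0] * len(cxd[0])] + [list(row) for row in cxd]
    for i, row in enumerate(cxd):
        for j, pop in enumerate(row):
            if pop > 0 and isclose(pop, district_pops[i]):
                reduced[0][j] += pop   # add to the county's consolidated split
                reduced[i + 1][j] = 0  # rows are shifted down by the virtual row
    return reduced

def reduce_district_splits(cxd):
    """Fold counties lying wholly inside one district into a virtual county (column 0)."""
    county_pops = [sum(row[j] for row in cxd) for j in range(len(cxd[0]))]
    reduced = [[0] + list(row) for row in cxd]
    for i, row in enumerate(cxd):
        for j, pop in enumerate(row):
            if pop > 0 and isclose(pop, county_pops[j]):
                reduced[i][0] += pop   # add to the district's consolidated split
                reduced[i][j + 1] = 0  # columns are shifted right by the virtual column
    return reduced

def county_splitting(m):
    total = sum(map(sum, m))
    cols = [[row[j] for row in m] for j in range(len(m[0]))]
    return sum(sum(sqrt(p * sum(c)) for p in c) for c in cols if sum(c) > 0) / total

def district_splitting(m):
    total = sum(map(sum, m))
    return sum(sum(sqrt(p * sum(r)) for p in r) for r in m if sum(r) > 0) / total

# Hypothetical example: county A (300 people) holds districts 1 and 2 whole plus
# part of district 3; small counties B and C (50 each) sit wholly inside district 3.
cxd = [
    [100, 0, 0],
    [100, 0, 0],
    [100, 50, 50],
]
by_county = reduce_county_splits(cxd)
by_district = reduce_district_splits(cxd)
print(county_splitting(cxd), county_splitting(by_county))
print(district_splitting(cxd), district_splitting(by_district))
```

The two compact scoring helpers use the identity w·√(p/P) = √(p·P)/total, which is the same weighted square-root sum written without intermediate fractions. Each reduction is applied only to its own metric: the county-reduced matrix feeds county splitting, and the district-reduced matrix feeds district splitting.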

Accommodating the Variable Ratio of Counties to Districts

The basic metrics also don’t account for the wide state-by-state variability in the ratio of counties to districts (1.1 to 31.0). The higher the ratio (i.e., the smaller counties are relative to districts), the less splitting you would expect in a state, all else equal.

As the plot below shows, the consolidated county-splitting measurements are significantly correlated with the ratio of counties to districts in states. As expected, the more counties there are relative to districts, the lower the typical county-splitting scores, because more small counties are fully contained within single districts.

Within a state the county-splitting measurements correctly assess the relative county-splitting of various redistricting plans, of course, but the variable ratio of counties to districts makes comparing scores across states problematic.²

We toyed with trying to normalize the splitting scores to take into account the ratio of counties to districts, but it quickly became cumbersome to explain & rationalize the decisions. So, our approach is to only use these county- & district-splitting metrics to compare plans within a state.

Normalizing Overall Splitting Scores

The raw values of these entropy-based splitting scores aren’t all that meaningful to lay people: How good or bad is 1.23? To normalize them so they could be easily interpreted, e.g., [0–100] with bigger better, we needed to establish thresholds for acceptable levels of county- and district-splitting.

The best values for both county- and district-splitting metrics are 1.0, i.e., no splitting. But not splitting any counties or districts is unrealistic. These seemed like more practical ideals:

County splitting: We assumed that if a county is split, it is split only once, and we assumed that the number of counties that are split equals the number of county splits one expects according to Euler’s Theorem (the number of districts minus one). We assumed the other counties are not split. Finally, we assumed that counties that are split would be split 90–10 (1.26), i.e., 90% in one district and 10% in another.
District splitting: Here we assumed that each district is split 90–10 (1.26), i.e., 90% whole counties and 10% a fragment of a county to achieve roughly equal population.
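The arithmetic behind these ideals can be checked with a short Python sketch. The 1.26 figure is just √0.9 + √0.1. Blending split and unsplit counties into one statewide county-splitting ideal requires county weights, which the text above does not specify, so the `practical_county_ideal` helper below assumes equal-population counties purely for illustration; that assumption, the cap on the number of split counties, and all names are mine.

```python
from math import sqrt

def split_score(fractions):
    """Square-root score for one county (or district) cut into the given fractions."""
    return sum(sqrt(f) for f in fractions)

def practical_county_ideal(n_counties, n_districts, split=(0.9, 0.1)):
    """Statewide ideal if (n_districts - 1) counties are split once and the rest are whole.

    ASSUMPTION: every county has the same population, so each gets weight
    1 / n_counties. Real county weights are population shares.
    """
    n_split = min(n_districts - 1, n_counties)  # cannot split more counties than exist
    n_whole = n_counties - n_split
    return (n_whole * 1.0 + n_split * split_score(split)) / n_counties

print(round(split_score((0.9, 0.1)), 2))  # 1.26

# Hypothetical state with 67 counties and 18 congressional districts
print(practical_county_ideal(67, 18))
```

Because each district is assumed to be split 90–10, the district-splitting ideal is simply `split_score((0.9, 0.1))` with no blending.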

Since we normalize analytics scores into the range [0–100] so bigger is always better and there is a consistent dimension, these ideals should translate into scores of 100.

For the other end of the ranges, we had to establish the “worst acceptable splitting” thresholds. In other words, “What raw score (and anything bigger) should translate to a normalized score of zero?”

We decided to use a simple ⅓ worse heuristic, i.e., the practical ideal value multiplied by one plus ⅓. This seems to give a good range of scores for congressional districts.
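As a rough Python sketch of this normalization: the practical ideal maps to 100, the "⅓ worse" threshold maps to 0, and anything beyond either end is clamped. The text fixes only those two endpoints, so the straight-line interpolation between them is an assumption on my part, as are the function and parameter names.

```python
def normalize(raw, ideal, worse_by=1 / 3):
    """Map a raw splitting score onto [0, 100], where bigger is better.

    ASSUMPTION: scores between the ideal and the worst acceptable value
    are interpolated linearly.
    """
    worst = ideal * (1 + worse_by)  # the "1/3 worse" threshold
    if raw <= ideal:
        return 100.0
    if raw >= worst:
        return 0.0
    return 100.0 * (worst - raw) / (worst - ideal)

ideal = 0.9 ** 0.5 + 0.1 ** 0.5        # the 90-10 practical ideal, about 1.26
print(normalize(1.0, ideal))           # better than the practical ideal
print(normalize(ideal * 7 / 6, ideal)) # halfway between ideal and worst
print(normalize(2.0, ideal))           # worse than the worst acceptable value
```

Clamping means a plan cannot earn extra credit for beating the practical ideal, which keeps every normalized score inside [0–100].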

For state legislative plans, we take a similar approach. Because the ratio of counties to districts is much higher than for congressional districts, however, we use a much more stringent splitting ideal. We assume 99–1 (1.09) instead of 90–10 (1.26) ideal splits.

Combining County- and District-Splitting Into One Metric

The final issue is how to combine county- and district-splitting scores into a single metric. In other words, “What should the relative weights of county-splitting and district-splitting be in an overall splitting score?”

Duchin simply adds county- and district-splitting scores to get an overall value. This weights the two components equally and makes the ideal score (no splitting) 2.0.

To make it easier to normalize the combined results, we averaged the two scores so the ideal is 1.0. This keeps the weights equal, which is the simplest and easiest weighting to defend.

Appendix: Procedure for Measuring County & District Splitting

This appendix describes how to compute Professor Duchin’s county- and district-splitting metrics.¹

(1) Sum total population by county and district.

C = An array of total population by county, Cⱼ.

D = An array of total population by district, Dᵢ.

(2) Sum total population by county-district combinations.

The intersection of districts and counties is the same as the intersection of counties and districts. Index these county-district combinations as C[j]D[i], where j indexes the counties and i indexes the districts.
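Steps (1) and (2) amount to three running totals over the same records. A minimal Python sketch, assuming the input is a list of hypothetical (county, district, population) tuples such as one per census block; the `tabulate` name and the sample data are illustrative only.

```python
from collections import defaultdict

def tabulate(blocks):
    """Return (C, D, CxD): county totals, district totals, and the splits matrix.

    CxD has one row per district and one column per county, both in sorted order.
    """
    county_pop = defaultdict(int)
    district_pop = defaultdict(int)
    split_pop = defaultdict(int)
    for county, district, pop in blocks:
        county_pop[county] += pop
        district_pop[district] += pop
        split_pop[(district, county)] += pop  # population of D_i intersect C_j
    counties = sorted(county_pop)
    districts = sorted(district_pop)
    C = [county_pop[c] for c in counties]
    D = [district_pop[d] for d in districts]
    CxD = [[split_pop.get((d, c), 0) for c in counties] for d in districts]
    return C, D, CxD

# Hypothetical blocks: (county, district, population)
blocks = [
    ("Adams", 1, 60), ("Adams", 1, 40),
    ("Baker", 1, 90), ("Baker", 2, 10),
    ("Clark", 2, 100),
]
C, D, CxD = tabulate(blocks)
print(C, D, CxD)
```

The row sums of `CxD` reproduce `D` and the column sums reproduce `C`, which is a handy sanity check before computing any fractions.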

(3) Weight the combinations (“splits”) by county and district size.

The county-weighted split fractions are the county-district combinations divided by the county populations:

fᵢⱼ = (Dᵢ ∩ Cⱼ) / Cⱼ

The district-weighted split fractions are the county-district combinations divided by the district populations:

gᵢⱼ = (Dᵢ ∩ Cⱼ) / Dᵢ

(4) Create county and district weights.

The weight of each county is its fraction of the total state population:

wⱼ = Cⱼ / Σₖ Cₖ

The weight of each district is its fraction of the total state population:

xᵢ = Dᵢ / Σₖ Dₖ

(5) Calculate the county- and district-splitting metrics.

Counties can be split by districts, and districts can be split by counties, so there are two metrics.

For county splitting, see section 6.1.1 How much do the districts split the counties? in Duchin’s Appendix.

This is the split score for one county:

County split score for county j = Σᵢ √fᵢⱼ

And this is the overall county splitting score:

County splitting = Σⱼ wⱼ · Σᵢ √fᵢⱼ

For district splitting, see section 6.1.2 How much do counties split the districts? in Duchin’s Appendix.

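This is the split score for one district:

District split score for district i = Σⱼ √gᵢⱼ

And this is the analogous overall district splitting score:

District splitting = Σᵢ xᵢ · Σⱼ √gᵢⱼ

These two scores (how much districts split counties and vice versa) can be combined into an overall score:

Overall splitting = County splitting + District splitting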

Footnotes

¹ Professor Moon Duchin introduced these metrics in Appendix 6 of her report for the Pennsylvania Supreme Court (link).
² This is similar to problems with measures of geometric compactness (e.g., Reock and Polsby-Popper) where the shapes of states and their counties also make cross-state comparisons difficult.

Acknowledgements

I am indebted to Daniel B. Reeves, PhD, a research fellow at the Fred Hutchinson Cancer Research Center, who helped me sort out the math, and to Ian Wahbe, who helped me render it in LaTeX.