Communicating Disease

Published in The Startup, Jun 10, 2020

How a specific affection for Trump drove the social distancing divide.

On April 15th, small groups began to assemble outside the Michigan capitol building, simultaneously armed with guns, prophylactic masks, and a misunderstanding of the gravity of the public health crisis enfolding them.

Shortly after, on April 17th, President Trump tweeted out his support for the aims of these groups — issuing the rallying cry to liberate Michigan, Minnesota, and Virginia from the tyranny of public health measures imposed by their liberal state governors.

Meanwhile, Gov. Cuomo of New York was gaining significant public plaudits from another slice of the populace for his series of forthright press events that were evidently motivated by input from the scientific medical community and public health experts. Far from aiming at the emotions of an aggrieved set of voters who are notably receptive to anti-elite sentiments, Cuomo’s daily conferences were underwritten with the intent to provide understanding handed down from medical authorities and to contextualize the various restrictions his government was imposing.

Even much earlier than this, prior to the pandemic’s blossoming in the US, it was clear that there was a divide apparent along political lines between elected officials in how to understand and respond to the approaching pandemic. However, the onset of protests began to crystallize the notion that this divide might manifest itself in the actions of citizens of different political stripes.

Divides in political affiliation have analogs in media consumption — those filter bubbles within which we’ve understood we’re trapped for at least the past four years — with conservative and liberal media outlets presenting different information, or the same information differently, with different points of emphasis, different levels of concern, and different takeaways regarding, in this case, preparation and public health.

Therefore, there is reason to suspect that liberal and conservative citizens may exhibit different behaviors during the COVID epidemic, due to different mental models primed by the information presented to them about the nature of the disease and the appropriate safety measures to take in response.

However visible they are, these political-behavioral associations might ultimately be more casual than causal: the ready caricatures of stereotyping minds rather than anything a broad look at the data would support. The appearance of armed protesters wearing protective masks, and often confined to their cars, suggests a level of caution even among those who chafe at the constraints imposed by public health authorities.

Moreover, while appearances suggest the protesters come largely from the political right, apparent relationships don’t always bear out in the data, and the small group of people motivated to protest is probably less important (despite the possibility of super-spreader events) from a public health standpoint than the overall behavior of the bulk of the population.

So, has media messaging had an observable effect? Do conservatives demonstrate less social distancing response than liberals? I’ll run through some descriptive statistics, a regression analysis, and some machine learning techniques which support the notion that areas that supported Trump have demonstrated less social distancing than those that didn’t. I’ll also show that the size of the effect of Trump’s influence on social distancing behavior varies with the political history and climate of each county: that Trump’s influence stands out above and beyond the (recent) historical position of each county on the liberal/conservative axis, and that counties which are more likely to face local political disagreement (i.e. who voted differently than their neighbors) are more sharply divided in their social distancing behavior than counties which are nestled in politically homogeneous enclaves.

You’ve probably seen a million and one COVID-19 maps in the past 12 weeks, and, yes, I definitely have more for you. To start, let’s look at the measured spread of COVID and the adoption of social distancing behaviors over time.

This first map shows a few variables which describe the spread of coronavirus as well as social distancing metrics over time, and it allows one to begin to correlate in their mind the severity of viral spread and the social distancing actions taken in response to that spread with the political map. Coronavirus infection data is from the NYTimes GitHub repository, and social distancing metrics are sourced from Google and Descartes Labs.

We can see that the spread of the virus wasn’t particularly time-varying. Yes, it arrived on the coasts, and in Colorado, first, but there wasn’t a significant amount of time before it had made itself known as a threat in most parts of the country. It does not appear that there was a great time difference between its widespread infection in any one place and its initial appearance in another, and no region of the country has been spared coronavirus infection. By the second half of March, most states had confirmed infections, and Italy, Spain, France, and the UK were rife.

What this means is that there was not a great deal of latitude for one region of the country to learn from the experiences of another part of the country, and thereby devise ways of slowing the virus without requiring distancing. So, to the extent that people took coronavirus seriously as a public health threat worthy of avoidance, distancing was the only real avenue.

To begin building up the regression model, let’s first discuss the policy variable: media consumption. Unfortunately, I don’t have access to a broad, geographically precise and sufficiently disaggregated measure of the types of media being consumed by a broad swath of individuals. However, a group of economists led by Leonardo Bursztyn of the University of Chicago did devise an excellent instrumental variables strategy based on the remaining daylight at the time of day that Hannity versus Tucker Carlson came on Fox News in different locations. Controlling for the ratings of each program and of the Fox News channel generally, they verified through independent observer coding that Hannity’s programming conveyed a lower level of warning relative to Carlson’s, and that this variation was connected to lower social distancing among those more likely to view Hannity than Carlson.

It’s an excellent paper, and later I’ll add support to the notion that a specific affinity for Trump, of the kind that has been well noted between the President and Hannity, is a strong driver of social distancing choices.

Instead of a direct measure of media consumption, I’ll be using the percentage of citizens in each county who voted for Trump in 2016 as a proxy. This has advantages in that it is a fairly broad survey of political opinion in each county, and has a strong correlation to media consumption. While it isn’t a direct measure, and suffers from degradation over time as a measure of current political opinion, its availability in terms of geographic detail makes it an appealing stand-in.

I measure the amount of distancing using data publicly provided by the movement tracking firm Descartes Labs, which has made available time series data on the average movement at a county level beginning on March 1st. Descartes Labs generates its estimates using opt-in cell-phone location tracking. Series are normalized such that a value of 100 represents the average movement levels in a county for the first 6 weeks of 2020.
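
The normalization described above can be illustrated with a small sketch. The function name, the input series, and the baseline length are all hypothetical, and Descartes Labs’ actual procedure may differ in its details; the point is only to show what "100 equals baseline" means for reading the later charts.

```python
from statistics import mean

def movement_index(daily_movement, baseline_days):
    """Rescale a raw daily movement series so its baseline-period average equals 100."""
    baseline = mean(daily_movement[:baseline_days])
    return [100.0 * value / baseline for value in daily_movement]

# Hypothetical raw series: 4 baseline days followed by 3 days of reduced movement.
raw = [8.0, 12.0, 10.0, 10.0, 5.0, 4.0, 6.0]
index = movement_index(raw, baseline_days=4)
print(index)
```

On this scale, a value of 50 means movement at half of the county’s own early-2020 average.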

An interesting add-on to this analysis would be to use something like the Google Trends API to correlate searches for misinformation keywords related to the pandemic with social distancing behaviors.

If you’re uncomfortable making the leap to implicating media messaging, the use of Trump voting answers a slightly different question more directly: how much of the variation in social distancing is driven by political affiliation?

Social distancing behavior in total is the result of many different factors, forces, and circumstances. The situations people are embedded in do a lot to constrain their ability and willingness to forego public activity regardless of the messaging from their political officials and media outlets.

For instance, the more educated are in general more healthy, more wealthy, and may be more receptive to the guidance of scientifically-oriented public health authorities. They are also less likely to vote for and listen to Trump. Wealthier people, by virtue of the ability to store goods, hire delivery people, bunker up in larger homes, and work remotely, in turn ought to be much more able to undertake social distancing.

All of which illustrates the analytical task — isolating the effect of media messaging from other factors related to both media consumption and social distancing patterns.

This map shows the geographic distribution of a large set of variables that show distinct regional patterns and which are likely to be relevant to either health behaviors, political behaviors, or both. For many of the variables, the geographic patterns strongly resemble the political map.

The strength of public health institutions may influence social distancing behaviors, and may be correlated with political choices through a common underlying factor or mechanism such as funding priorities. They are also likely to be connected to the prevalence of chronic disease conditions in an area. So, including broad indicators of public health such as recent-historical chronic disease rates allows us to proxy for the state of these public health institutions. It also gives us a notion of where the most dangerous spots are for coronavirus to hit.

Likewise for economic and demographic data: not all jobs are equally compatible with social distancing, and the composition of jobs varies by region in ways that may be connected to both voting and media consumption. Common factors may underlie this association and are important to control for.

Official response to the pandemic entered a new stage on March 16th, when the White House Coronavirus Task Force issued its first stay-at-home guidelines, while on the same day six Bay Area counties issued stay-at-home orders with full cancellations of public activities. While other regions had begun to partially shut down high-transmission risk activities such as bars and nightclubs, the White House’s action had a national scope, and the potential to break through previously cordoned off media ecosystems. It signaled a date at which the coronavirus needed to be reckoned with on a national scale, and that efforts would be needed to deal with it.

These features also make the date useful for determining the reaction of citizens in counties of varying political stripes using a difference-in-differences design.

The top panel in the above chart shows the average movement over time, relative to baseline, in counties that voted for Trump and Clinton respectively. After March 16th, a clear divergence begins to develop, which is fairly consistent over time, excepting convergences on weekends, especially Easter Sunday.

The bottom panel in the chart shows the result of a regression of movement on county and time fixed effects (excluding an effect for March 16th) and the percentage of votes for Trump in each county interacted with the set of time fixed effects to produce daily estimates of the effect of Trump voting on movement.
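
The regression described above can be written out as follows. The symbols are labels chosen here for illustration and are not necessarily the author’s notation; treating March 16th as the omitted reference day is inferred from the description.

```latex
M_{ct} = \alpha_c + \gamma_t
       + \sum_{\tau \neq \tau_0} \beta_\tau \, \mathrm{Trump}_c \cdot \mathbf{1}[t = \tau]
       + \varepsilon_{ct}
```

Here $M_{ct}$ is the movement index for county $c$ on day $t$, $\alpha_c$ and $\gamma_t$ are the county and time fixed effects, $\mathrm{Trump}_c$ is the 2016 Trump vote share, $\tau_0$ is March 16th, and each $\beta_\tau$ is the daily Trump effect plotted in the bottom panel.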

County fixed effects control for time-constant factors inherent to a county. Given the short two-month horizon of this analysis, and the much longer stretches of time needed to change a county’s economic, demographic, and chronic health conditions, they effectively control for the confounding factors I discussed above. Time fixed effects adjust for the average movement levels in all counties as the coronavirus progressed, such that the regression estimates reflect relative changes associated with voting patterns.

After March 16th, movement was markedly higher in Trump-voting counties relative to their behavior before that date, and the difference was both statistically significant and quite consistent on weekdays across the time sample. These results suggest that estimating the effect of Trump voting across the time sample by simply looking at movement before and after March 16th is valid and stable, and that the difference between counties didn’t experience any reversals or catch-up behavior by Trump-favoring counties within this time frame. Splitting the sample before and after March 16th, the Trump voting coefficient is 53.33 (p-value < 1.1E-12).
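
As a rough illustration of the before/after split, here is a minimal sketch on made-up data. It relies on a standard result: in a balanced panel, the coefficient on Trump share interacted with a post-period indicator (with county fixed effects) equals the slope from regressing each county’s post-minus-pre change in mean movement on its Trump share. The county names, numbers, and function name are hypothetical, and this is not the author’s code.

```python
from statistics import mean

def did_coefficient(movement, trump_share, split_day):
    """Slope of each county's post-minus-pre change in mean movement on its Trump share."""
    counties = sorted(movement)
    change = [mean(movement[c][split_day:]) - mean(movement[c][:split_day]) for c in counties]
    share = [trump_share[c] for c in counties]
    share_mean, change_mean = mean(share), mean(change)
    covariance = sum((s - share_mean) * (d - change_mean) for s, d in zip(share, change))
    variance = sum((s - share_mean) ** 2 for s in share)
    return covariance / variance

# Hypothetical data: two days before and two days after the split date.
movement = {
    "A": [100.0, 100.0, 75.0, 75.0],
    "B": [90.0, 90.0, 75.0, 75.0],
    "C": [110.0, 110.0, 105.0, 105.0],
}
trump_share = {"A": 0.3, "B": 0.5, "C": 0.7}  # vote shares as fractions
coefficient = did_coefficient(movement, trump_share, split_day=2)
print(round(coefficient, 2))
```

Differencing within each county removes its fixed level of movement, which is why counties A, B, and C can start from different baselines without affecting the slope.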

The effect of politically mediated risk messaging on distancing can actually be sharpened in ways that suggest current coverage is related to distancing. That would support the work of Bursztyn et al. and of Anderson (who uses a similar strategy on a different dataset), and it matches results from survey-based studies such as those of Allcott et al. and Joseph van Holm et al., as well as a similar analysis in Brazil. Subtracting from the Trump vote the average vote for Republican candidates in national elections in the 4 years prior to 2016 yields a measure of enthusiasm for Trump above and beyond party-based support, which may be traditional, habitual, generational, or otherwise ingrained. Replacing direct Trump voting with this measure of deviation-from-trend support for Trump in the regression gives a coefficient of 46.10 (p-value 2.94834E-8). This means that the relationship between political partisanship and social distancing is not simply a product of a deeper conservative ideology, but appears to have a durable and powerful connection to a specific ardor for Mr. Trump.
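
The deviation-from-trend measure is a simple subtraction, sketched below. Which prior elections enter the average is not specified in the text, so the two prior shares used here are placeholders, as are the function name and the numbers.

```python
from statistics import mean

def excess_trump_support(trump_2016, prior_republican_shares):
    """Trump's 2016 share minus the county's average recent Republican share."""
    return trump_2016 - mean(prior_republican_shares)

# Hypothetical county: 62% for Trump, 55% and 57% for Republicans in earlier elections.
excess = excess_trump_support(0.62, [0.55, 0.57])
print(round(excess, 3))
```

A positive value marks a county that backed Trump more strongly than its recent partisan habit would predict.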

While the regression estimates above represent the average effect across all counties, an understanding of any variation in that effect would be useful as well. The maps above showed meaningful variation in health conditions and population characteristics like wealth and job composition that are related to the speed and devastation with which COVID may spread. Moreover, as regulations are relaxed or eliminated, the only effective check on transmission will be the behavior of the public. A public that has shown a strong response in terms of social distancing due to messaging informed by medical science is more likely to understand that the relaxation of regulations does not necessarily mean the abatement of the threat of spread, and thus more likely to continue efforts to reduce unnecessary contact and contamination.

From a statistical perspective, the average differences between liberal and conservative counties may still demonstrate some time variation around that March 16th date, in ways that relate to geography. Obtaining estimates local to geographic regions helps eliminate concerns about falsely attributing the difference in social distancing behavior to a political gap when it is really driven by differential timing (although the consistency of the Trump effects through the end of April shown above speaks to the unlikelihood of this result).

This map shows the results of clustering counties based on economic and health characteristics (covariance matrices and PCA suggest the k-means clusters reflect a clear set of underlying factors), and then running the same regression within each cluster. In the interactive map, you can change the number of clusters if you’d like, which will narrow the population groupings. With five clusters, the counties sort into some recognizable entities: metro areas and the coasts (orange), a deep south band (teal), the Bible-Belt/Appalachia (green), the agricultural upper Midwest (blue), and a somewhat catchall category covering the rust belt, non-urban coastal areas, and some sparsely populated counties in the West (red).
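
The cluster-then-regress procedure can be sketched in a few lines. Everything below is illustrative: the two features, the six counties, and the numbers are invented, the starting centers are simply evenly spaced rows (a simplification of the usual random or k-means++ initialization), and the per-cluster regression is reduced to a slope of the before/after change in movement on Trump share.

```python
from statistics import mean

def kmeans(points, k, iterations=20):
    """Lloyd's algorithm with evenly spaced starting centers; returns a label per point."""
    centers = [points[i * len(points) // k] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign each county to its nearest center (squared Euclidean distance).
        labels = [
            min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])))
            for p in points
        ]
        # Move each center to the mean of its members; keep it if the cluster is empty.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                centers[j] = tuple(mean(dim) for dim in zip(*members))
    return labels

def slope(x, y):
    """Ordinary least squares slope of y on x."""
    x_mean, y_mean = mean(x), mean(y)
    numerator = sum((a - x_mean) * (b - y_mean) for a, b in zip(x, y))
    return numerator / sum((a - x_mean) ** 2 for a in x)

def slopes_by_cluster(labels, trump_share, change):
    """Run the same slope regression separately inside each cluster."""
    result = {}
    for label in sorted(set(labels)):
        idx = [i for i, lab in enumerate(labels) if lab == label]
        result[label] = slope([trump_share[i] for i in idx], [change[i] for i in idx])
    return result

# Hypothetical standardized features per county (e.g. income, chronic disease rate).
features = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 5.0), (5.1, 5.2)]
trump_share = [0.2, 0.4, 0.6, 0.5, 0.6, 0.7]
change = [-30.0, -20.0, -10.0, -20.0, -10.0, 0.0]  # post-minus-pre movement

labels = kmeans(features, k=2)
cluster_slopes = slopes_by_cluster(labels, trump_share, change)
print(labels, cluster_slopes)
```

Estimating the slope separately within each cluster is what lets the Trump effect differ between, say, metro counties and the upper Midwest.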

The upper Midwest region carries the most concern about incorrect timing as cases arrived there later and more sparsely, leading to the concern that little difference in terms of social distancing would develop between counties around the March 16th split. Regardless of media bias, a low number of cases provides a rational basis for less concern, especially in largely rural, low-density areas. Another concern for the analysis of this group is the high degree of political homogeneity: most counties in this cluster were pro-Trump, which limits the regression’s ability to parse any apparent differences in social distancing. Nevertheless, the relationship between pro-Trump affiliation and greater mobility is highest in this cluster, and remains high across the different clusterings. The estimate shown, 94.37, implies that shifting a county in this region by one standard deviation in Trump support (15.7%) would result in a nearly 15% change in average mobility.
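
The arithmetic behind that last figure is a single multiplication, assuming the coefficient is expressed per unit of vote share (0 to 1) and movement is on the index where 100 is the county’s baseline:

```python
coefficient = 94.37          # cluster estimate quoted in the text
std_dev_trump_share = 0.157  # one standard deviation of Trump support, as a fraction

implied_change = coefficient * std_dev_trump_share
print(round(implied_change, 1))  # about 14.8 index points, i.e. roughly 15% of baseline
```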

The other region that shows a strong relationship is the Metro/coastal cluster. It appears that standing out as a Trump voting county among these relatively more liberal zones, counties where economic activity is more closely tied to cities, is associated with a greater degree of action related to messaging about the lower severity of the virus coming from Trump.

The only place that doesn’t show a strong and consistent Trump effect is the heavily pro-Trump Bible-Belt cluster — a region which nevertheless displayed a high degree of average movement, but in which the political uniformity was such that the regression could not determine the difference to be related to political affiliation.

Visual inspection of the cluster regression maps alongside the 2016 voting map suggests that there may be an association between the types of “local” political environments counties reside in and the effect of Trump voting on social distancing behavior. The deep south, the Metro/coastal, and the upper Midwest clusters show strong effects, and also contain many areas mottled with red and blue on the voting map. These features suggest that the differences in behavior may be stronger in areas where there is greater local ideological disagreement, and that small differences in Trump voting may be more meaningful in these areas — perhaps because people feel a greater need to demonstrate their ideological affiliation through action in heterogeneous environments, where the assumption of their affiliation is not as statistically straightforward.

To test this, I ran two non-linear models which produce estimates of the Trump voting effect on an individual county basis. Not having an a priori guide to how changes in Trump support might vary, I chose methods which allow the data to dictate the shape of differences in Trump support on social distancing.

The first method uses the same regression strategy as above, but models the Trump effect using a mixture of normal distributions over the range 0–100%. This specification allows me to estimate a flexible function in which the effect of additional Trump support varies depending on the level of Trump support, but which controls for time and county fixed effects in the same way as before.
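
The article does not spell out the functional form, so the following is only one plausible reading: the effect of Trump support is modeled as a weighted sum of Gaussian bumps placed along the 0 to 1 range of vote share, with the weights estimated by the regression. The centers, width, and weights below are hypothetical placeholders rather than fitted values.

```python
import math

def gaussian_basis(share, centers, width):
    """Evaluate one unnormalized Gaussian bump per center at a given vote share."""
    return [math.exp(-0.5 * ((share - c) / width) ** 2) for c in centers]

def flexible_effect(share, weights, centers, width):
    """Weighted sum of the bumps: a smooth, non-linear function of Trump support."""
    return sum(w * b for w, b in zip(weights, gaussian_basis(share, centers, width)))

centers = [0.0, 0.25, 0.5, 0.75, 1.0]
width = 0.2
weights = [5.0, -3.0, 20.0, 12.0, 8.0]  # placeholders; a regression would estimate these

for share in (0.3, 0.5, 0.7):
    print(share, round(flexible_effect(share, weights, centers, width), 2))
```

Because each bump only matters near its center, the estimated effect at 30% support is free to differ in size and even sign from the effect at 50% or 70%.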

The top panel shows the variation in Trump effect across the range of Trump support (with color indicating the density of observations), while the bottom panel plots each county against the average absolute difference in Trump voting percent between each county and its adjacent neighbors — a metric which proxies the degree of difference of opinion each county faces in its local environment.
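
The local-disagreement metric in the bottom panel can be computed as below. The adjacency mapping and vote shares are hypothetical; in practice the neighbor lists would come from a county adjacency file.

```python
from statistics import mean

def neighbor_vote_gap(trump_share, neighbors):
    """Average absolute difference in Trump share between a county and its neighbors."""
    return {
        county: mean([abs(trump_share[county] - trump_share[n]) for n in adjacent])
        for county, adjacent in neighbors.items()
        if adjacent  # skip counties with no recorded neighbors
    }

# Hypothetical three-county strip: A borders B, and B borders C.
trump_share = {"A": 0.30, "B": 0.50, "C": 0.80}
neighbors = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
gaps = neighbor_vote_gap(trump_share, neighbors)
print({county: round(gap, 2) for county, gap in gaps.items()})
```

A county surrounded by like-minded neighbors scores near zero, while one that voted very differently from its surroundings scores high.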

Moving from right to left, the trend line shows that throughout the main density of the data between 50% and 85% support for Trump, both the average vote difference and the marginal effect of Trump voting on movement levels increase as Trump support decreases, rising to a peak at 50% (aside from strong effects estimated on outlying counties). Changes in the effect strength suggest that social distancing behavior isn’t just a matter of individual choice, but instead has a community basis. As the community of like-minded individuals grows and flips from minority to majority, there is an emboldening effect of each person’s contributed opinion on the average behavior.

When the community of Trump voting individuals is smaller, there also appears to be an anti-effect. The negative marginal contribution of Trump support around 30–35% suggests that when there is a sizeable minority of Trump voters in a county, additional Trump support results in increases in social distancing among the population as a whole. It is almost as if there is a counter-response to a materially significant presence of the opposing ideology.

The last method I demonstrate uses machine learning both to produce a flexible non-linear effect estimate for each county and to address the possibility that using Trump voting in the difference-in-differences regression is inapposite. It may be that the before/after difference in social distancing behavior we are attributing to Trump voting bears a greater relationship to other factors that are associated with political affiliation and that better explain the differences in social distancing that developed after March 16th. For instance, income variations may have made a bigger difference in social distancing behavior after the announcement of stay-at-home orders than before: high and low income groups in an area may both have felt safe to continue going to work beforehand, but only wealthier groups had the wherewithal to distance afterward.

To that end, I implement an ensemble of deep neural networks to produce a mapping from a large set of economic, demographic and health variables, alongside both Trump vote and average Republican support, to the before/after average movement level in each county. In mathematical terms, this modifies the regression by adding interactions between these control variables and the time indicator.
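
One way to write the modified specification is shown here. The notation is assumed for illustration, and the function $f$ stands in for whatever the network ensemble learns.

```latex
M_{ct} = \alpha_c + \gamma \, \mathrm{Post}_t
       + f(\mathrm{Trump}_c, \mathrm{Rep}_c, X_c) \cdot \mathrm{Post}_t
       + \varepsilon_{ct},
\qquad
\text{Trump effect}_c = \frac{\partial f}{\partial \mathrm{Trump}_c}
```

Here $M_{ct}$ is movement in county $c$ at time $t$, $\alpha_c$ is a county fixed effect, $\mathrm{Post}_t$ indicates dates after March 16th, $\mathrm{Rep}_c$ is average Republican support, and $X_c$ collects the economic, demographic, and health controls. The simple difference-in-differences is the special case $f = \beta \, \mathrm{Trump}_c$.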

Using this mapping, I can then produce an estimate of the Trump effect via the gradient of the model with respect to Trump voting. Shown below, the resulting Trump effect estimates are, as expected, lower than for the simple single-variable diff-in-diff. Nevertheless, they are still large and correlate strongly with the estimates from the mixture normal diff-in-diff above (r = 0.79).

While this method of estimation does not allow classical inferential tests of statistical significance, we can get a sense of the robustness of each county’s Trump effect in a Monte Carlo fashion. If we make perturbations in the Trump voting percentage for each county, and estimate the gradient resulting from each perturbation, we will produce a range of coefficient estimates. If these estimates are stable across the range of perturbations, then we know that the method is not producing knife-edge results based on an overfit model.

With neural net methods, one concern is that the high number of parameters allows the model to fit individual observations by numerical means that have no bearing on the counterfactual question: what would have happened if the Trump vote were different, and people were less likely to be tuning in to low-risk messaging? When the gradient of the model is consistent across a range of counterfactual scenarios, then we have greater confidence that the model is producing meaningful information about counterfactual scenarios.

Estimating the counterfactual scenarios in this Monte Carlo fashion for each neural network in the ensemble also allows us to produce an estimate of the variation in the counterfactuals across models. When this variation across models is large, our predictions are based on a set of models that are each overfitting the data in a different way. While the average of models across the ensemble might still produce a good fit, we have less confidence that our model is valid out of sample, where our counterfactual scenarios inevitably lie. However, where the variation across models is small, the data is suggesting strongly that each model takes the same opinion with respect to the effect of Trump voting on social distancing behavior. That each component model would do so even though there is significant freedom of parameters to fit the data however it might choose suggests a reliable relationship between Trump voting and social distancing behavior.
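
The robustness check described in the last few paragraphs can be sketched as follows. The three "models" are hand-written stand-ins for trained networks, not neural networks themselves, the gradient is taken by finite differences rather than automatic differentiation, and all names and numbers are hypothetical.

```python
import math
from statistics import mean, pstdev

def gradient(model, trump, other, step=1e-4):
    """Central finite-difference derivative of the prediction w.r.t. Trump share."""
    return (model(trump + step, other) - model(trump - step, other)) / (2 * step)

def perturbed_gradients(model, trump, other, perturbations):
    """Re-estimate the gradient at each counterfactual Trump share."""
    return [gradient(model, trump + delta, other) for delta in perturbations]

def make_model(slope, curvature):
    # Hypothetical stand-in for one trained network in the ensemble.
    return lambda trump, other: slope * trump + curvature * math.tanh(trump - 0.5) + 0.1 * other

ensemble = [make_model(40.0, 1.0), make_model(42.0, 0.5), make_model(38.0, 1.5)]
perturbations = [-0.05, -0.025, 0.0, 0.025, 0.05]

# One county with a 55% Trump share and a single placeholder covariate.
per_model = [perturbed_gradients(m, 0.55, 1.0, perturbations) for m in ensemble]
model_means = [mean(g) for g in per_model]

effect = mean(model_means)                                  # averaged Trump effect
within_model_spread = mean([pstdev(g) for g in per_model])  # stability across perturbations
across_model_spread = pstdev(model_means)                   # agreement across the ensemble
print(round(effect, 2), round(within_model_spread, 4), round(across_model_spread, 2))
```

A small within-model spread says the gradient is not a knife-edge result at one particular vote share, and a small across-model spread says the ensemble members agree with one another.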

In the above map, hovering over a county shows the counterfactual change in the percentage of Trump voters in 2016. The bar heights show the magnitude of the Trump effect on social distancing behavior (averaged across the ANNs in the ensemble) while the bar colors show the precision of the coefficient estimates.

The map shows counties containing urban cores to have negligible effects of Trump sentiment on their social distancing behavior, while suburban and surrounding counties have much larger effects. This indicates the model is isolating outliers, as high density urban cores show low support for Trump and have no effective analog as a basis for behavioral comparison, while suburban counties have a range of political opinions to compare between. Effect sizes for these suburban counties are somewhat smaller than found for the OLS cluster regressions, suggesting that the OLS results were in part driven by non-Trump differences between urban and suburban counties, but that the suburban effect persists when the model is able to set urban counties aside.

Again, this map suggests a relationship between the effect of Trump on social distancing and the degree of heterogeneity in political opinion.

This final panel plots the average absolute difference against the Trump effect from the machine learning model, and again shows a positive relationship. The Trump effect on social distancing behavior is stronger where political disagreement is greater.

Not only is there an observable effect of political partisanship on social distancing behavior, it’s evident that it is widespread both across the country and within broad population segments, and not confined to the active and vocal but small groups of protesters showing up at state capitols. Moreover, this effect seems to be the greatest where partisanship is the most contentious and where people are most likely to come into conflict with or have to negotiate with those who hold differing opinions. And while the effect of media selection cannot be conclusively, causally demonstrated here, it is strongly suggested by the fact that it is affection for Trump, not simply a place on the historical political axis, that influences risky action in the face of a global pandemic. Others have suggested that non-partisan messaging is critical to get around this divide, but it remains unclear what kind of public messaging is resistant to the influence of partisanship in 2020.

Written by Russ Gerber