Dangers in Visualizing Data: Looking at Mother Jones Mass Shootings Database
Visualization empowers data journalists to tell compelling stories, but storytelling requires certain omissions. Assumptions about how to handle the analysis can lead to vastly different conclusions. Sometimes, as an unintended consequence, visualizations can be misleading.
Social media is largely driven by visual imagery, and content is often spread without any detail about its underlying assumptions. I recently came across a visualization modeled after the Mother Jones mass shooting database. This write-up will attempt to test the general thesis of that visualization, and I will explore my own visualizations of the data set.
I understand the sensitive nature of this topic, and in no way do I intend to undermine the horrific situation that America has found itself in. The impact these shootings have had is immeasurable. We cannot quantify the suffering and pain of victims and their families, or the psychological damage we are imposing on our children as they run drills preparing for school shootings. Mass shootings are a systemic problem requiring action. Every day that Congress refuses to act casts a murderous shadow on America.
Looking at Total Injury Count as Measurement of Violence
In a couple of the political groups I frequently visit, I noticed people sharing a visualization that measured the total injury count under each president. The purpose of this graph was to prove that violence has intensified under President Trump. The graph looked like this:
Looking at the above graph, and specifically at the word “average”, one assumes a tremendous increase in violence under Trump. However, the assumptions that underlie this visualization may exaggerate that claim.
As Trump has yet to complete a full term, the analyst decided to forecast how many injuries and deaths there would be if Trump were in office for a full term. On its own, this is a perfectly valid forecasting assumption. President Obama shouldn’t be penalized for being in office almost three times longer than President Trump. However, the analyst has done nothing to address the issue of outliers in the dataset. As seen in the graph below, the Las Vegas shooting was a tremendous outlier.
The Las Vegas shooting left nearly 600 injured, approaching the highest injury total from mass shootings under any president (Obama, over two terms: 616 total injuries). Outlier removal is a very subjective decision. Still, looking back at our original social media graph, not only is this horrific event counted in the average, it is counted nearly twice, because Trump has only been in office for three-fifths of his presidential term and the total is scaled up to a full term. An objective analysis needs to remove this event, or at least impute it with the average.
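To make the "counted nearly twice" point concrete, here is a rough sketch of the extrapolation arithmetic. It uses only the approximate figures quoted above (roughly 600 injured in Las Vegas, three-fifths of a term served). The function name `extrapolate_to_full_term` is my own label for illustration, and I am assuming the original analyst scaled totals linearly; I have not seen their code.

```python
from fractions import Fraction


def extrapolate_to_full_term(observed_total, fraction_served):
    """Linearly scale an observed total up to a full term.

    Assumption: a simple linear projection, i.e. total / fraction served.
    """
    return observed_total / fraction_served


# Approximate figures taken from the text, not from the raw database.
fraction_served = Fraction(3, 5)   # Trump has served about 3/5 of a term
vegas_injuries = 600               # "nearly 600 injured"

scale = 1 / fraction_served
projected = extrapolate_to_full_term(vegas_injuries, fraction_served)

print(f"Each observed injury is weighted by {float(scale):.2f}")
print(f"Las Vegas alone contributes about {float(projected):.0f} projected injuries")
```

Under that assumption, a single event is weighted by 5/3, so one outlier ends up contributing far more to the projected total than it did in reality.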
The graph above on the right shows all mass shootings excluding the Las Vegas shooting. The color coding is by presidency. Neither graph shows a clear linear increase over time in the number of injuries per mass shooting. Visually inspecting the graph on the right, there seem to be larger clusters (more events) in the past decade. Overall, it doesn’t appear that the shootings themselves have become more deadly. Scatter plots don’t give us the information needed to test the theory of an increase in mass shooting violence. To test that theory, we need to aggregate the data by term.
How We Approach Aggregation of Injury Data
Below I graph four different scenarios:
Any of these graphs could be a valid way to present the same data. In the top left graph, we are looking at the unedited data, “Total injuries from mass shootings”. Trump’s term appears rather violent, as does Obama’s. In the top right, we assume that the Las Vegas shooting needs to be excluded, and by this standard the Obama presidency seems to have suffered the most mass shooting violence. On the bottom left, we have a graph that reproduces the original graph I took issue with. We see a substantial increase in injuries during the Trump administration. The bottom right graph is our best bet at a valid graph. It excludes the outlier Las Vegas shooting, but it also attempts to normalize all presidential terms. The bottom right graph shows that mass shooting violence is in fact worse under Trump, and it shows that violence has been building over time.
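The four scenarios come down to two switches: whether the outlier is dropped, and whether totals are normalized by time in office. Here is a minimal sketch of that idea. The helper `total_by_president`, the tuple layout, and the toy events are all hypothetical and made up for illustration; they are not the Mother Jones data and not necessarily how my plots were built.

```python
from collections import defaultdict


def total_by_president(events, exclude=(), terms_served=None):
    """Sum injuries per president under a chosen set of assumptions.

    events: iterable of (president, event_name, injuries) tuples.
    exclude: event names to drop (the outlier-removal switch).
    terms_served: optional {president: number of terms served so far};
        when given, totals are divided by it (the normalization switch).
    """
    totals = defaultdict(float)
    for president, name, injuries in events:
        if name not in exclude:
            totals[president] += injuries
    if terms_served is None:
        return dict(totals)
    return {p: t / terms_served[p] for p, t in totals.items()}


# Synthetic toy data, NOT real figures.
toy_events = [
    ("A", "a1", 10), ("A", "a2", 30),
    ("B", "b1", 5), ("B", "outlier", 100),
]
toy_terms = {"A": 2.0, "B": 0.5}

scenarios = {
    "raw totals": total_by_president(toy_events),
    "outlier excluded": total_by_president(toy_events, exclude={"outlier"}),
    "per term": total_by_president(toy_events, terms_served=toy_terms),
    "per term, outlier excluded": total_by_president(
        toy_events, exclude={"outlier"}, terms_served=toy_terms
    ),
}
for label, result in scenarios.items():
    print(label, result)
```

Even with four made-up events, the ranking of the two presidents flips depending on which switches are on, which is the whole problem with picking one chart and sharing it without its assumptions.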
Comparing Mass Shooting Events By President
Mass Shooting Events are Surely Increasing?
Perhaps, but not conclusively. This is why observational data is so difficult to deal with and draw conclusions from. Shootings are becoming more frequent, but the median injury count per event seems to be decreasing. Seeing this relationship makes me wonder about confounding variables. Today, we are more keenly aware of mass shooting events and therefore more likely to identify them. This likely biases the older data toward only the larger mass shooting events, and thus toward higher injury counts per event. It also means the older data undercounts how often shootings happened, which makes modern-day events look more frequent by comparison. Therefore, it’s difficult to say that the increase in frequency we are seeing is not simply due to an increase in reporting.
To avoid this problem, we need to focus on the Trump and Obama presidencies, where reporting should be similar. To do so, I will break the data down into quarterly aggregates. But at the quarterly level we suffer from a very small sample size for Trump (n=10), so statistical tests are not reliable.
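For readers who want to see what quarterly aggregation looks like, here is a small sketch. The function names and the example dates are hypothetical and synthetic. One design choice matters here: quarters with no events are kept as explicit zeros, which I am assuming is the right treatment, since dropping them would overstate the typical quarter.

```python
from datetime import date


def quarter_of(d):
    """Return (year, quarter) for a date, with quarter in 1..4."""
    return (d.year, (d.month - 1) // 3 + 1)


def quarterly_totals(events, start, end):
    """Sum injuries per calendar quarter between start and end, inclusive.

    events: iterable of (date, injuries) pairs.
    Quarters with no events appear with a total of 0.
    """
    totals = {}
    year, q = quarter_of(start)
    last = quarter_of(end)
    while (year, q) <= last:
        totals[(year, q)] = 0
        q += 1
        if q > 4:
            year, q = year + 1, 1
    for d, injuries in events:
        key = quarter_of(d)
        if key in totals:  # ignore events outside the window
            totals[key] += injuries
    return totals


# Synthetic example, NOT real events.
toy_events = [(date(2000, 1, 25), 3), (date(2000, 3, 1), 2), (date(2000, 11, 5), 7)]
toy_quarters = quarterly_totals(toy_events, date(2000, 1, 1), date(2000, 12, 31))
print(toy_quarters)
```

Keeping the empty quarters is also why the resulting distribution has so many zeros, which matters for the choice of statistical model.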
The Las Vegas shooting makes comparison difficult, so above I plotted the quarterly distribution of mass shooting injuries with and without Las Vegas. The distribution plot (without Vegas) was plotted on the left as well. There does appear to be an increase in volume and average injuries under Trump, but our sample size is very low. We can statistically test the difference in total injuries by running logistic regression or some nonparametric test. I ran both, with the assumption that our distributions are zero-inflated and over-dispersed, and there may be a difference in the means, but given our low sample size it’s difficult to say so conclusively. None of my statistical models attempt to detrend the data, and given the large increase since 2008, doing so seems difficult.
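As one illustration of what a nonparametric comparison can look like, here is a sketch of a two-sided permutation test on the difference in means. To be clear, this is a stand-in I am choosing for illustration, not necessarily the test in my repository, and the quarterly counts below are synthetic. It makes no distributional assumption, but it also does nothing about the time trend.

```python
import random


def mean(values):
    return sum(values) / len(values)


def permutation_test(a, b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in group means.

    Repeatedly reshuffles the pooled values into groups of the original
    sizes and counts how often the shuffled gap is at least as large as
    the observed one. Returns an estimated p-value.
    """
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed - 1e-12:  # small tolerance for float rounding
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_permutations + 1)


# Synthetic quarterly injury totals, NOT real data.
group_one = [0, 0, 3, 0, 12, 0, 5, 0, 0, 7, 0, 2]
group_two = [0, 9, 0, 14, 4, 0, 21, 6, 0, 11]
print(permutation_test(group_one, group_two))
```

With only ten quarters in one group, a test like this has little power, so a large p-value says more about the sample size than about the absence of a difference.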
Overall, there may in fact be some validity to using injury counts as a proxy for violence in mass shootings under Trump. However, compared with the original graph, it is quite clear that the relationship is nowhere near as strong as the initial visualization would have people believe. Once again, there was nothing statistically incorrect about the original graph. It merely relies on different statistical assumptions about how to handle the data. I may ideologically agree with the intended effect that such a graph may elicit, but it serves to highlight how visualization can lead to an exaggerated conclusion.
There is no way that we can account for overall changes in reporting methods. Even if we could, this wouldn’t imply the upswing is caused by the Trump presidency. There does appear to be a significant upward trend in the data, some of which is likely due to better reporting. But claiming the Trump administration is responsible for more shootings would be like claiming it is responsible for an increase in temperature. Yes, it is likely that Trump’s rhetoric is inciting violence, but measuring that effect via mass shootings and injuries from mass shootings isn’t statistically sound.
I really did not mean to sound like Neil deGrasse Tyson; I just thought this was worthy of some discussion.
To see the code for the above article, please visit my GitHub repository.