Hansol Rheem
Published in Human Systems Data
Mar 20, 2017

Old stuff is not always obsolete.

People are fanatical about data visualization nowadays. Data visualization is the field concerned with presenting data in a pictorial or graphical format to attract attention and facilitate understanding of the raw data. By visualizing data in new and unconventional formats, people can identify hidden patterns that old and conventional formats could not reveal. Plus, data visualization works as a brain-teaser that attracts viewers. Thanks to these benefits, many experts have started to depict a future where data visualization is the new norm. Gelman and Unwin (2013) cautiously predicted that data visualization could replace the current conventional graphics of line plots and scatter plots, just as those graphics replaced data tables a hundred years ago. Despite their efforts to speak in a neutral tone, it was not hard for me to read between the lines and figure out that they think the statistical side should depend more on data visualization. However, I do not agree with their idea. I predict that such changes will happen more slowly than they expect. Gelman and Unwin state that the current conventional graphics have been around for about a hundred years, but they don’t explain why these graphics have survived that long (they do mention one reason, but not in that context). So, were statisticians too stupid to recognize the importance of data visualization? That was definitely not the case, according to Cleveland (1985) and Chambers et al. (1983).

To illustrate my point, let’s first take a look at how Gelman and Unwin (2013) distinguish the statistical side from the infovis side. According to Gelman and Unwin, the statistical side is interested in finding effective and precise ways of representing data, whereas the infovis side is interested in attracting the audience’s attention and telling a story. Moreover, they mention that the statistical side assumes fellow researchers as its target readers. This target audience of fellow researchers is one reason why conventional graphics have been around for a long time and will not perish in the near future. The graphics themselves are a kind of convention that aids communication between researchers with relevant knowledge. That is, every detail of a graphic is predefined so that a researcher who knows the type of graphic and the data can easily extract information from it. For example, let’s assume that we are viewing a regression plot that describes the relationship between two variables. If we were statisticians, we would know that the x-axis represents the predictor (the presumed cause) and the y-axis represents the response (the presumed effect). We can retrieve this kind of information in the blink of an eye. Therefore, when Gelman and Unwin said that conventional graphics lack communicative function, they were taking the “average” audience into account. However, how likely is it that a fisherman will read a research paper on “Physical properties of carbon nanotubes”? In reality, conventional graphics are effective methods of communication among domain-specific audiences. Gelman and Unwin also mention graphics that are not boring. Yes, it does not hurt for a graphic to look interesting. However, such interesting graphics often lack the core information that should be communicated, and thus will only confuse the audience. In this sense, I believe that Gelman and Unwin overgeneralized from the audience of infographics to the audience of statistical graphics, even though they tried hard not to give that impression.

Another reason why conventional graphics will be around longer can be found in a post Gelman wrote on his blog in 2016. In that post, he criticized p-hacking practices. P-hacking refers to the use of data to uncover patterns that can be presented as statistically significant without a prior hypothesis about the underlying causality. P-hacking practices include testing more participants than planned and trying various methods until one produces significant results. However, isn’t taking various perspectives on the data the fundamental part of data visualization? I do not know why Gelman made two contradictory arguments. What I do know is that, given current concerns about p-hacking, data visualization can play only the limited role of providing insights and alternative perspectives on the statistical side. This indicates that insights from data visualization are highly unlikely to contribute to the main point of a research paper unless the visualization was planned in advance. Yet data visualization is most beneficial when there are no restrictions on when and how it can be performed.
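To make the first of those p-hacking practices concrete, here is a minimal simulation sketch of testing more participants than planned. It is my own illustration, not something taken from Gelman’s post. It assumes a simple two-sided z-test with a known standard deviation of 1 and data with no true effect at all; the function names and all the numbers (20 planned participants, up to 100, checked every 10) are hypothetical choices.

```python
import math
import random


def z_test_p_value(sample):
    """Two-sided p-value for H0: mean = 0, assuming a known standard deviation of 1."""
    n = len(sample)
    z = (sum(sample) / n) * math.sqrt(n)
    return math.erfc(abs(z) / math.sqrt(2))


def false_positive_rate(peek, n_experiments=2000, n_start=20, n_max=100,
                        step=10, alpha=0.05, seed=0):
    """Share of null experiments (true effect is zero) that end up 'significant'.

    peek=False: test once, at the planned sample size n_start.
    peek=True:  test at n_start, then keep adding `step` participants and
                re-testing until p < alpha or n_max is reached.
    """
    hits = 0
    for i in range(n_experiments):
        # One generator per experiment, so both strategies see the same
        # first n_start participants.
        rng = random.Random(seed + i)
        sample = [rng.gauss(0.0, 1.0) for _ in range(n_start)]
        significant = z_test_p_value(sample) < alpha
        while peek and not significant and len(sample) < n_max:
            sample.extend(rng.gauss(0.0, 1.0) for _ in range(step))
            significant = z_test_p_value(sample) < alpha
        hits += significant
    return hits / n_experiments


planned = false_positive_rate(peek=False)
peeking = false_positive_rate(peek=True)
print(f"planned sample size only:        {planned:.3f}")
print(f"adding participants until p<.05: {peeking:.3f}")
```

Under these assumptions, the planned analysis should report a false positive in roughly 5% of experiments, while the strategy of adding participants until the test turns significant reports one noticeably more often, even though there is nothing to find in the data.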

At this point, I would like to make it clear that I am NOT skeptical about data visualization. Certainly, data visualization has many benefits and has helped us reveal important findings that conventional graphics failed to reveal. I also agree with Gelman and Unwin’s distinction between the statistical side and the infovis side, and that they each pursue different but important objectives. What I don’t agree with is the argument that the new ways of representing data (influenced by the infovis side) will replace the conventional graphics of the statistical side within a few decades. Such an argument is based on the assumption that there are problems and limitations in the conventional methods that the statistical side uses. An attractive look and easy interpretability can certainly benefit a graphic to some extent, and conventional graphics lack these components. But do these limitations imply that conventional graphics are obsolete, and are they reason enough for those graphics to change within a few decades? I say no, for the reasons I described above.

References

Chambers, J. M., Cleveland, W. S., Kleiner, B., & Tukey, P. A. (1983). Graphical methods for data analysis. New York: Chapman and Hall.

Cleveland, W. S. (1985). The elements of graphing data (pp. 135–143). Monterey, CA: Wadsworth Advanced Books and Software.

Gelman, A. (2016, September 21). What has happened down here is the winds have changed [Web log post]. Statistical Modeling, Causal Inference, and Social Science. Retrieved March 8, 2017, from http://andrewgelman.com/2016/09/21/whathashappeneddownhereisthewindshavechanged/

Gelman, A., & Unwin, A. (2013). Infovis and statistical graphics: different goals, different looks. Journal of Computational and Graphical Statistics, 22(1), 2–28.
