Week 4 Reading Post — The Truthful Art: Chapter 6 and Chapter 7
Chapter 6: Exploring Data with Simple Charts
In this chapter, Cairo explains the value of exploring data to find patterns and trends, and introduces statistical analysis as a way to observe those patterns. Stories can be found both in the normal (smooth) patterns and in the deviations from them. The example used in the book is Ideb, an index used in Brazil to measure the quality of education in schools in every state and district. By calculating various measures of central tendency (the mode, median, and mean), one can discover trends and outliers in the data. Cairo notes that it is important to choose which statistic to calculate based on the distribution of the data and on any outliers that may skew it; the mean, for instance, is pulled toward extreme values, while the median is much less affected. He also points out that taking the mean of a set of means can misrepresent the data, because it gives equal weight to groups of very different sizes (e.g., schools with different numbers of students). Another way to examine patterns and trends in a data set is to look at its shape, using histograms, geographical maps, lollipop charts, strip plots (these are super cool), or violin plots, to name a few. The main point of this chapter is the need to explore data using a variety of statistical measures and visual representations. No single method suffices: you must examine the data at different levels of detail and with different methods to find all the relevant patterns and trends.
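To make the central-tendency and mean-of-means points concrete, here is a minimal Python sketch using the standard library's `statistics` module. The scores and school sizes below are invented for illustration; they are not Ideb data and not Cairo's own example.

```python
from statistics import mean, median, multimode

# Hypothetical index scores for ten schools (made-up numbers, not Ideb data)
scores = [4.1, 4.3, 4.3, 4.5, 4.6, 4.8, 4.9, 5.0, 5.2, 9.8]

print(mean(scores))       # pulled upward by the single 9.8 outlier
print(median(scores))     # middle of the sorted values, barely affected by the outlier
print(multimode(scores))  # most frequent value(s)

# Hypothetical schools as (average score, number of students)
schools = [(6.0, 50), (4.0, 950)]

# Mean of means: each school counts equally, regardless of its size
unweighted = mean(avg for avg, _ in schools)

# Weighted mean: each student counts equally
total_students = sum(n for _, n in schools)
weighted = sum(avg * n for avg, n in schools) / total_students

print(unweighted, weighted)
```

The one extreme value drags the mean above most of the scores while the median stays near the middle. Averaging the two school means gives the small school as much influence as the large one, so the unweighted figure (5.0) sits well above the student-weighted one (4.1).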
Chapter 7: Visualizing Distributions
In this chapter, Cairo delves more deeply into the statistical measures used to examine data distributions. The first of these is the standard deviation, the square root of the variance, which measures how far the data typically spread from the mean. From there, one can calculate the standard score (z-score), which expresses how far an individual raw score lies from the mean in units of that dataset's standard deviation. With these values, you can transform the data to examine deeper trends or patterns, both between data points within a data set and across different data sets. Frequency charts can be a useful alternative to histograms, both for exploring a data set on your own and for revealing it to viewers. Cairo also notes again in this chapter that standardizing and averaging scores can be misleading for certain data sets. I think this point matters a great deal for science communication, where the goal is to reveal data patterns without distorting or hiding parts of the data. For example, estimates for small populations show more variation than estimates for larger populations simply because they are based on fewer observations, not necessarily because anything unusual is happening in those populations.
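A short Python sketch of the standard deviation and z-score calculations described above. The data are made up, and the sketch assumes the whole list is treated as a population (dividing the squared deviations by n); the standard library's `statistics.stdev` would instead give the sample version, which divides by n - 1. I am not claiming this is the exact formula variant Cairo uses.

```python
from statistics import mean, pstdev

# Hypothetical raw scores for one dataset (made-up numbers)
scores = [2, 4, 4, 4, 5, 5, 7, 9]

mu = mean(scores)

# Population variance: the average squared distance from the mean
variance = sum((x - mu) ** 2 for x in scores) / len(scores)

# Standard deviation: square root of the variance, back in the data's original units
sigma = variance ** 0.5

# Cross-check against the library's population standard deviation
assert abs(sigma - pstdev(scores)) < 1e-12

# z-score: how many standard deviations each raw score lies from the mean
z_scores = [(x - mu) / sigma for x in scores]

print(mu, sigma)
print(z_scores)
```

Here the mean is 5 and the standard deviation is 2, so a raw score of 9 becomes a z-score of 2.0 (two standard deviations above the mean). Because z-scores are unitless, they let you compare positions across data sets measured on different scales.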