Golden Methods to Visualize Skewed Data
Key Techniques to Unveil Hidden Patterns in Clustered Data
Skewed data is data with a highly uneven distribution: when a variable is displayed as a histogram, the bulk of the data points either cluster on the left side with a long tail stretching to the right (right-skewed), cluster on the right with a long tail to the left (left-skewed), or follow a more complex skewed pattern. The long tail (sparse data) dominates the plotting space and squeezes the majority of the data points into a corner of the plot, making it difficult to discern the underlying pattern in the clustered majority.
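To make the squeezing effect concrete, here is a minimal base R sketch. It uses simulated log-normal data (an assumption for illustration, not the data behind this article's figures) and measures how little of the axis range the lower 90% of the points occupy. The variable names `bulk_limit`, `axis_share`, and `skewness` are my own.

```r
# Simulated right-skewed variable (hypothetical data, not the article's dataset)
set.seed(42)
x <- rlnorm(1000, meanlog = 0, sdlog = 1.5)

# Upper bound of the lower 90% of observations
bulk_limit <- unname(quantile(x, 0.90))

# Fraction of the full axis range that those 90% of points occupy
axis_share <- (bulk_limit - min(x)) / (max(x) - min(x))

# Sample skewness (third standardized moment); positive means right-skewed
skewness <- mean((x - mean(x))^3) / sd(x)^3

# Histogram on the raw scale: most bars pile up at the far left
hist(x, breaks = 50, main = "Simulated right-skewed data", xlab = "x")
```

A small `axis_share` means that most of the horizontal axis is spent on the sparse tail, which is exactly the situation the strategies in this article try to address.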
In this article, I summarize seven powerful strategies for visualizing data with a skewed distribution.
The graphics in this article are created with R, with most source code available here, unless otherwise specified.
Use Transparency, Open Circles, Colors, and Marginal Distributions to Unveil Clustered Data Patterns
In the following figure, the scatterplot on the left displays the relationship between housing sales and prices. The sparse data points at higher sales squeeze the majority of the data to the left…
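As a rough illustration of the techniques named in this section's heading, the base R sketch below combines transparency, open circles, and a simple marginal distribution drawn as rug marks. The `sales` and `price` values are simulated stand-ins (an assumption, not the housing data used in the figure), and this is not necessarily how the article's own graphics were produced.

```r
# Hypothetical simulated data standing in for housing sales and prices
set.seed(1)
n <- 2000
sales <- rlnorm(n, meanlog = 4, sdlog = 1.2)          # right-skewed sales
price <- 150000 + 40 * sales + rnorm(n, sd = 30000)   # loosely related prices

# Semi-transparent color: overlapping points build up into darker regions
pt_col <- adjustcolor("steelblue", alpha.f = 0.3)

# pch = 1 draws open circles, which hide fewer neighbors than filled points
plot(sales, price, pch = 1, col = pt_col,
     xlab = "Sales", ylab = "Price")

# Rug marks along each axis act as a simple marginal distribution
rug(sales, side = 1, col = pt_col)
rug(price, side = 2, col = pt_col)
```

Rug marks are the lightest-weight way to show marginal distributions in base R; marginal histograms or density curves would convey the same information with more visual weight.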