Golden Methods to Visualize Skewed Data

Key Techniques to Unveil Hidden Patterns in Clustered Data

Bo Yuan, Ph.D.
6 min read · Jun 20, 2024
Image by author. Example data visualization for skewed data.

Skewed data is data with a highly uneven distribution. When a skewed variable is displayed as a histogram, the bulk of the data points cluster on the left side with a long tail stretching to the right (right-skewed), cluster on the right side with a long tail stretching to the left (left-skewed), or follow a more complex skewed pattern. The long tail of sparse data dominates the plotting area and squeezes the majority of the data points into a corner of the plot, making it difficult to discern the underlying pattern in the clustered majority.
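To make the squeezing effect concrete, here is a minimal R sketch. It assumes simulated log-normal data rather than any dataset from this article, and the variable names (`x`, `range_share`) are illustrative only. It measures how little of the axis range the lowest 90% of the points occupy, then draws the histogram.

```r
# Simulated right-skewed data (an assumption for illustration, not the article's data)
set.seed(42)
x <- rlnorm(10000, meanlog = 0, sdlog = 1.2)

# In right-skewed data the long tail pulls the mean above the median
skew_gap <- mean(x) - median(x)

# Share of the full axis range occupied by the lowest 90% of the points
range_share <- unname((quantile(x, 0.9) - min(x)) / (max(x) - min(x)))

# On a linear axis, most bars pile up at the far left and the tail takes the rest
hist(x, breaks = 50, main = "Simulated right-skewed data", xlab = "x")
```

A small `range_share` means that most of the plot width is spent on the sparse tail, which is the problem the strategies in this article try to address.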

In this article, I summarize seven powerful strategies for visualizing data with a skewed distribution.

The graphics in this article are created with R, with most source code available here, unless otherwise specified.

Use Transparency, Open Circles, Colors, and Marginal Distribution to Unveil Clustered Data Pattern

The following scatterplot on the left displays the relationship between housing sales and prices. The sparse data points at higher sales squeeze the majority of data to the left…
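As a rough illustration of the techniques named in this section's heading, the sketch below combines open circles, transparency, color, and a simple marginal display in one ggplot2 scatterplot. It is not the author's original code. It assumes ggplot2's built-in `txhousing` dataset as a stand-in for the housing data, with its `sales` and `median` (price) columns, and it uses rug marks as a lightweight marginal distribution.

```r
library(ggplot2)

# Assumption: ggplot2's built-in txhousing data stands in for the article's housing data
d <- subset(txhousing, !is.na(sales) & !is.na(median))

p <- ggplot(d, aes(x = sales, y = median)) +
  # shape = 1 draws open circles; alpha adds transparency,
  # so heavily overlapping regions appear darker than sparse ones
  geom_point(aes(color = year), shape = 1, alpha = 0.3) +
  # Rug marks on the bottom and left axes act as a simple marginal distribution
  geom_rug(sides = "bl", alpha = 0.05) +
  labs(x = "Monthly sales", y = "Median sale price", color = "Year") +
  theme_minimal()

print(p)
```

Fuller marginal histograms or density curves could be added with a helper such as `ggExtra::ggMarginal`, which is left out here to keep the example dependent on ggplot2 alone.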

Bo Yuan, Ph.D.

Postdoctoral Fellow at Harvard University | Chem Bio Scientist | Data Scientist | Founder of DataBrewer.co.