Gratuitous DAX and Senseless Violins
There’s an ever-growing pool of really useful custom visuals for Power BI, and I’ve been excited to contribute to it recently.
Even after a few years, there are still some gaps, and I’ve been looking to expand my knowledge by finding and developing high-utility charts that might still be needed. They take a while, so I’m not exactly trucking them out, but it’s time for a new challenger to enter the arena.
Enter the Violin Plot
For the uninitiated: if you want to understand more about the composition of your data, you might have used a box plot. They’re very handy, but whilst they show the distribution of your data through quartiles, they don’t show the shape of that distribution, so you would need to do further analysis elsewhere.
A violin plot is essentially a way of showing that shape as well.
It’s a combination of a box plot and a sideways kernel density estimation (KDE) plot, which, if you want an ELI5-style answer (and that five-year-old has done basic statistics), is analogous to a “smooth histogram”.
They can sometimes cause an initial chuckle or two (or seemingly never-ending ones if you’re Mrs. M-P), but they can be pretty handy for these distribution and statistical analysis use cases.
I’ve attempted to produce a violin plot that is pretty good for casual use but also allows some fine-tuning, both for those cases where data is… well, data, and for those who want to see how these variables can affect the plot and possibly (hopefully) gain more meaningful insights.
This wouldn’t have been possible without some really great work by Mike Bostock on how to plot KDE using d3.js (and d3.js of course), as well as Andrew Sielen for showing it can be done.
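If you’re curious what the KDE half of the plot actually computes, here’s a minimal Python sketch. It is not the visual’s code (that’s built on d3.js); it only illustrates the general idea: place a small kernel “bump” on every data point, scale it by the bandwidth, and average the bumps. The two kernels shown (Gaussian and Epanechnikov), the sample data and the function names are illustrative assumptions, not necessarily what the visual uses.

```python
import math


def gaussian(u):
    """Standard normal density, the usual default kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)


def epanechnikov(u):
    """Parabolic kernel that is zero outside [-1, 1]."""
    return 0.75 * (1 - u * u) if abs(u) <= 1 else 0.0


def kde(data, x, bandwidth, kernel=gaussian):
    """Estimated density at x: one scaled kernel per data point, averaged."""
    return sum(kernel((x - xi) / bandwidth) for xi in data) / (len(data) * bandwidth)


# Hypothetical bimodal sample with clumps around 1.5 and 5.
data = [1.0, 1.2, 1.9, 2.1, 5.0, 5.3]
step = 0.5
grid = [i * step for i in range(15)]  # 0.0, 0.5, ..., 7.0
curve = [kde(data, x, bandwidth=0.5) for x in grid]

# A violin draws this curve sideways, mirrored about the category axis.
outline = [(x, -d, d) for x, d in zip(grid, curve)]
```

Where a histogram would show two blocky clumps, `curve` traces one smooth line through both, and because each kernel integrates to 1 the whole curve does too. The box plot is then drawn inside the mirrored outline.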
Tuning Your Violin
There are a few factors that can affect how this plot looks:
Kernel Function Applied
The kernel is the weighting function used in the smoothing calculations. The visual offers the following kernels:
Kernel Bandwidth
The bandwidth is a parameter that controls how much the plot is smoothed. Too low, and it may pick out too many features; too high, and it will smooth too much and miss some.
By default, the violin plot visual applies a rule-of-thumb calculation to derive a suitable bandwidth. This can work well for normal distributions, but your mileage may vary for multimodal or skewed distributions. Fortunately, you can override this with a manual value, e.g.:
The visual has a tooltip properties menu that lets you choose which statistics to include. The KDE bandwidth is not shown by default, but you can enable it if you want to know what value the visual is estimating. If you specify the bandwidth manually, the estimate is still included for reference but shows as N/A, just in case you wish to compare the two values.
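The post doesn’t say which rule of thumb the visual uses, so as an assumption this sketch uses Silverman’s rule, a common default for Gaussian kernels, and shows a manual value taking precedence over the estimate. `rule_of_thumb_bandwidth` and `choose_bandwidth` are hypothetical names, not part of the visual.

```python
import statistics


def rule_of_thumb_bandwidth(data):
    """Silverman's rule of thumb: 0.9 * min(sd, IQR / 1.34) * n ** (-1/5).

    It assumes roughly normal data, so it tends to oversmooth multimodal
    or heavily skewed samples.
    """
    n = len(data)
    sd = statistics.stdev(data)
    q1, _, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    # If the IQR collapses to zero (lots of ties), fall back to the sd alone.
    spread = min(sd, iqr / 1.34) if iqr > 0 else sd
    return 0.9 * spread * n ** (-1 / 5)


def choose_bandwidth(data, manual=None):
    """A manual bandwidth, if supplied, overrides the rule-of-thumb estimate."""
    if manual is None:
        return rule_of_thumb_bandwidth(data)
    if manual <= 0:
        raise ValueError("bandwidth must be positive")
    return manual


data = [1.0, 1.2, 1.9, 2.1, 5.0, 5.3]  # hypothetical bimodal sample
estimated = choose_bandwidth(data)
overridden = choose_bandwidth(data, manual=0.25)
```

The `n ** (-1/5)` term means the estimate shrinks slowly as you add data, which is why the same rule can feel too smooth on a small, lumpy sample and a manual value becomes useful.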
Sampling Resolution
Sampling resolution is analogous to the number of bins you might assign to a histogram. Because everyone’s axes can be different, we attempt to make this a bit more user-friendly with three settings, e.g.:
As with bandwidth, the resolution may affect the features displayed in your data. If more features are displayed than desired, even at standard resolutions, you can offset this by increasing the bandwidth.
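To see how resolution and bandwidth interact, this sketch samples a Gaussian KDE at a chosen number of evenly spaced points and counts the local maxima, as a rough proxy for the “features” you’d see. The post doesn’t say how the visual’s three settings map to point counts, so the resolutions (10 and 500), the three-bandwidth padding and all function names here are hypothetical.

```python
import math


def kde(data, x, bandwidth):
    """Gaussian kernel density estimate at a single point x."""
    norm = len(data) * bandwidth * math.sqrt(2 * math.pi)
    return sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data) / norm


def sample_curve(data, bandwidth, resolution, pad=3.0):
    """Evaluate the KDE at `resolution` evenly spaced points covering the data
    plus `pad` bandwidths either side (the padding is an assumption)."""
    if resolution < 2:
        raise ValueError("resolution must be at least 2")
    lo = min(data) - pad * bandwidth
    hi = max(data) + pad * bandwidth
    step = (hi - lo) / (resolution - 1)
    xs = [lo + i * step for i in range(resolution)]
    return xs, [kde(data, x, bandwidth) for x in xs]


def count_peaks(ys):
    """Count interior local maxima in the sampled curve."""
    return sum(1 for i in range(1, len(ys) - 1) if ys[i - 1] < ys[i] >= ys[i + 1])


data = [1.0, 1.2, 1.9, 2.1, 5.0, 5.3]  # hypothetical sample
for bandwidth in (0.05, 5.0):
    for resolution in (10, 500):
        _, ys = sample_curve(data, bandwidth, resolution)
        print(f"bandwidth={bandwidth}, resolution={resolution}: {count_peaks(ys)} peaks")
```

With the tiny 0.05 bandwidth, 500 samples resolve a separate peak for every data point while 10 samples catch only two of them; raising the bandwidth to 5.0 collapses everything into a single hump at high resolution, which is the “increase the bandwidth” remedy described above.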
I Want It!
Good! You can get this visual from AppSource or using the Power BI visuals marketplace within the application. There’s a sample workbook available to showcase some of the features.
The visual is also open source, if you want to have a look around.
I hope you find the visual useful, and if you have any feature requests or issues to report, please head over to the support page for more details. While this is a product of my free time, I’ll do my best to keep it truckin’.