Basic Statistics on Power BI

Using Statistical Measures in Power BI with and without DAX

Douglas Rocha
9 min readSep 7, 2022

Hi! I’m Douglas and I’m a Data Science enthusiast trying to learn everything I think I need to become a Data Scientist.

In this learning path toward becoming a Data Science, I have been introduced to Statistics in Power BI and the importance of this subject in the whole of Data Analysis. Looking at charts, visuals, and graphs is a good form of summarizing data, but often not the best. Statistical measures can also give a really good insight into data, especially when numeric values are more understandable than images.

More specifically I learned all of that in the eleventh chapter of Microsoft Power BI for Data Science, a course taught by Data Science Academy. Thus, I shall exhibit here some of my learnings from that chapter. For one thing, I shall use as an example here the same example used in the course: a Car Sales dataset from an imaginary car selling company from Brazil. To be clearer, I will analyze the sale value of cars from that company.

Before I start there is another thing I want to make clear: this is not a statistics summary but rather a simple tutorial on how to use common statistical measures in Power BI.

How to Create a Measure in Power BI

So, first of all, I should go into how to create a measure in Powe BI. A measure is a value calculated over the data you have. It is useful to have values such as the mean, median, or other calculated values ready to use and constantly updated if anything should happen o the data. To create them in Power BI you must find the New measure icon on your menu.

Location on Report View
Location on Data View

And a typing bar will show up where you can write the DAX expression that will give you your measure. Yes, you do need to learn some amount of DAX to properly use measures. Can you do statistics in Power BI without DAX? Yes, you can, you can do it without measures as well and I will teach you how at the end of this tutorial, but for now, we will take a look at how to do it with DAX.

DAX typing bar

After typing the expression you want you can click on the “Ok” icon or just press Enter and the measure will be created and available on your right-hand side in the Fields section.

Mean or Average, Median, and Mode

The most basic (and yet powerful) statistical measure one can think of is definitely the average (also called mean). It is calculated by summing over all of the values and then dividing by the number of values summed over. The mathematical expression is:

Average or Mean of n values of x

And to have that turned into a measure you first perform the first step and create the measure you will call “Mean” and will store the result of the following expression:

Mean = AVERAGE(CarSales[Sale Value])

The Mean is one of the statistical measures Power BI has a ready DAX function for, but we will meet some that aren’t so easy to use. Median is not one of those. That is the number in a sorted list of the data (or, rather, the mean of the middle two numbers if there is an even number of values) and it is easily calculated with the corresponding native DAX function and stored in a “Median” Power BI measure.

Median = MEDIAN(CarSales[Sale Value])

Last but definitely not least is the Mode. Even though one can assume it is the simpler of all statistical measures since it is only the most common value in the dataset, there isn’t a native DAX function to calculate it. Thankfully, the Power BI community is great and has built a repository with the expression that calculates most statistical measures, it is called DAX Patterns — Statistical Patterns. In there they have an expression (or, as they call it, a pattern) to calculate the mode, and it is as follows:

Mode =MINX (    TOPN(        1,        ADDCOLUMNS(            VALUES (CarSales[Sale Value]),            "Frequency", CALCULATE(COUNT(CarSales[Sale Value]))
),
[Frequency], 0 ), CarSales[Sale Value])

Variance and Standard Deviation

On the matter of evaluating the variation in the data, the most commonly used measures are these two. The Variance is the average of the squared deviations, that is, the average of the squared differences between the values of the variable and its mean.

Variance of n values of x

It is denoted as s² because the Standard Deviation, which is fairly more used than the variance is actually the square root of the variance. It is then denoted s while the variance is called s².

Standard Deviation of n values of x

Both of them have native DAX functions and can easily be called through:

Variance = VAR.P(CarSales[Sale Value])

and:

Standard_Deviation = STDEV.P(CarSales[Sale Value])

Both have a .P and .S versions. They differ in the matter of making the calculation considering it is the whole population or a sample. We are not dwelling on that matter here.

Percentiles and Quartiles

Understanding Percentiles is far easier than it is explaining them. A simple definition is: the P percentile is the value such that P percent of the data lies below it. Two examples that may help you understand are:

In a list of numbers from 1 to 100, the 80th percentile is 80 itself, once 80% of the data is equal to or smaller than 80.

In a list of numbers from 1 to 200, the 80th percentile is 160, once 80% of the data is equal to or smaller than 160.

But, nevertheless, it may be better for you to seek a statistics book/article/paper explaining it properly. In the matter of what we are trying to explain here, you can find the P percentile of your data in Power BI using the native DAX expression for it:

Percentile80 = PERCENTILE.INC(CarSales[Sale Value], 0.8)

In this case, I used the 80th percentile as an example, but to change it to any P value one needs only to change the second argument of the function to P/100 as I have here.

The Quartiles are only a small group of relevant percentiles that statisticians wanted to give a different name to. They are:

  • The 1st Quartile or Q1 is the first quarter of the data or the 25th Percentile;
  • The 2nd Quartile or Q2 is the middle of the data (yes, equal to the median) or the 50th Percentile;
  • The 3rd Quartile or Q3 is the third quarter of the data or the 75th Percentile.

There isn’t a native DAX function to calculate them, but you can simply use the percentile function to do so:

Quartile25 = PERCENTILE.INC(CarSales[Sale Value], 0.25)

Building the Dashboard

Having all of the measures ready and calculated is really good, but it isn’t worth much if your boss has to look into your Data view to see them. That is where building a dashboard with this information comes into place.

One thing you use is a Column Chart where you can see the Mean, Median, Mode, and Standard Deviation. It can be an odd form of visualizing these measures but it can be rather effective in some cases.

This already gives you the information that the average sale value is way higher than the middle and most common values, which indicates the existence of Outliers, points that are too far from the rest of the curve like (as is the case for this dataset) cars sold for over 180 thousand Reais. That information is also confirmed by the high Standard Deviation, showing there is really a large variation between values in the dataset.

Furthermore, something interesting you can do is visualize these metrics segmented by some other variable, just plug that variable in the X-axis and Power BI will do the rest. You can look at these metrics by state:

By manufacturer:

And even by color:

One may for some reason ask why I have not included the Variance in the chart. I hope this answers:

The may reason we use the Standard Deviation instead of the Variance is this: the Variance’s scale is quadratically proportional to the scale of the real values. Thus, taking the square root of it (the Standard Deviation) gives a more interesting measure to observe.

The percentiles and quartiles aren’t so easy to turn into visualizations. Or, rather, they are, through the use of the Boxplot, but Power BI doesn’t offer a native way of creating one. Instead, one could have the weird idea of plotting the 3 quartiles in a Column chart to kind off simulate what a Boxplot would inform.

Statistical Dashboards Without DAX

As I have said at the beginning of the article, there is another way to find these statistical measures without DAX or the Measure feature. It is also far more simple, actually, but more limited as well. To show it, I will use the Matrix as an example.

When I plug the “Sale Value” column in a chart, the value that appears is really the sum of that column. In reality, it rarely makes sense to show all of the values in a column with continuous values, so Power BI automatically chooses to give us a summary of that column, usually the sum. But we can change that summary into a statistical measure like so:

When you click that down arrow beside the name you can see the menu with some options, among which are the possible summaries one can use. You can change the summary each time you plug the column into the chart and have this:

And now you can do the same thing you did with the DAX measures and plug other variables in the chart to drive your conclusions from them.

Statistical Measures by State
Statistical Measures by Manufacturer

Obviously, with this approach, you are limited to this limited set of measures. Using DAX expressions gives you more freedom and power to use the metrics you want and even develop more intricate ones that better suit your specific case.

This is a quick and simple tutorial on using the most basic statistical measures in Power BI and should be looked at as that. I do not and did not expect to explain statistical concepts or dive into more theoretical affairs. My goal is fulfilled if you now know how to use Power BI’s help to find some statistics of your dataset. Hope I’ve helped you in some way and see you next time!

--

--

Douglas Rocha

Software Engineer | Working Java, React, SQL and Python | Writing Best Coding Practices, Clean Code and Software Engineering