How to Calculate Median the Right Way in PostgreSQL

Let me show you 2 different ways!

Sebastian Blue
Learning SQL
Jul 11, 2022

Intro

Calculating the median has never seemed straightforward in PostgreSQL. There is no built-in median() function that calculates it directly, the way Python's statistics.median does. So today let me share with you how to calculate the median in PostgreSQL.

What is the median

The median is “the middle” value separating the higher half from the lower half of a data sample sorted in ascending order. It is usually thought of as a “typical” value (like the mean), and for some datasets it is a better representation than the mean, because the median is less likely to be skewed by a small proportion of extremely large or small values.

How to calculate the median

Unfortunately, calculating the median is not straightforward. We have to consider 2 cases, depending on whether a dataset has an odd or even number of observations.

CASE1:

If dealing with an odd number of observations, the median is literally “the middle value” of the dataset. This is easy.

Ex. dataset = 2, 4, 6, 8, 10, 12, 14

Median = 8

CASE2:

If dealing with an even number of observations, there is no single value that sits in “the middle”. So the median is defined as the arithmetic mean of the two middle values, which means we have to find both of them and average them.

Ex. dataset = 2, 4, 6, 8, 10, 12, 14, 16

Median = (8 + 10) / 2 = 9
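The two cases can be written as one formula. This is just a compact restatement of the rules above; the symbols are my own notation, assuming the n observations are sorted so that x_1 ≤ x_2 ≤ … ≤ x_n.

```latex
\[
\operatorname{median}(x) =
\begin{cases}
x_{(n+1)/2}, & \text{if } n \text{ is odd} \\[4pt]
\dfrac{x_{n/2} + x_{n/2+1}}{2}, & \text{if } n \text{ is even}
\end{cases}
\]
```

For Case 1, n = 7 gives x_4 = 8. For Case 2, n = 8 gives (x_4 + x_5) / 2 = (8 + 10) / 2 = 9.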

How to calculate the median in PostgreSQL

OK, we understand what the median is and how it is calculated. Now let’s talk about how to get it in PostgreSQL. For now, let’s use the dataset from Case 2, so we created a table with 8 records (an even number of observations) in PostgreSQL. There are mainly 2 ways to get the median from this table.
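If you want to follow along, a table like this can be set up with a few statements. This is a minimal sketch: the table name median_demo and the column name val are hypothetical choices of mine, not necessarily the names used for the table in this post.

```sql
-- Hypothetical names: median_demo (table) and val (column) are assumptions.
DROP TABLE IF EXISTS median_demo;

CREATE TABLE median_demo (
    val integer NOT NULL
);

-- The 8 observations from Case 2 (an even number of rows).
INSERT INTO median_demo (val)
VALUES (2), (4), (6), (8), (10), (12), (14), (16);

-- Check the contents in ascending order.
SELECT val
FROM median_demo
ORDER BY val;
```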

No.1 Use PERCENTILE_CONT(0.5)

PERCENTILE_CONT() is an ordered-set aggregate function that calculates a percentile based on a continuous distribution of the column; the column to sort by is given in a WITHIN GROUP (ORDER BY ...) clause. If you specify 0.5 as its argument, it returns the median: it interpolates between the two adjacent middle values if the number of observations is even, and returns just the middle value if it is odd.
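Here is a minimal sketch of what such a query can look like. To keep it self-contained, it reads the Case 2 values from an inline VALUES list; the names sample and val are hypothetical and stand in for your own table and column.

```sql
-- Hypothetical names: sample and val stand in for a real table and column.
WITH sample(val) AS (
    VALUES (2), (4), (6), (8), (10), (12), (14), (16)
)
SELECT
    -- 0.5 asks for the 50th percentile, i.e. the median.
    PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY val) AS median
FROM sample;
-- median = 9
```

With these 8 values, the two middle values are 8 and 10, so the interpolated result is 9, the same as the hand calculation in Case 2.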

No.2 Write from scratch using Window functions!

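Just like with PERCENTILE_CONT(0.5), we can get the same median value by using a combination of window functions. The key is the filter WHERE rn_asc BETWEEN ct/2.0 AND ct/2.0 + 1, where rn_asc is each row's position in ascending order and ct is the total number of rows. For an even number of observations it keeps the two middle rows so that they can be averaged (with 8 rows the bounds are 4 and 5, so rows 4 and 5 are kept); otherwise it keeps just the middle row (with 7 rows the bounds are 3.5 and 4.5, so only row 4 is kept).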
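One way to write such a query is sketched below. It is a reconstruction under assumptions, not necessarily the exact query from this post: rn_asc and ct are taken from the WHERE clause quoted above, while sample, ranked and val are hypothetical names, and the values are again supplied inline so the query runs on its own.

```sql
-- Hypothetical names: sample, ranked and val are assumptions.
WITH sample(val) AS (
    VALUES (2), (4), (6), (8), (10), (12), (14), (16)
),
ranked AS (
    SELECT
        val,
        ROW_NUMBER() OVER (ORDER BY val) AS rn_asc,  -- position in ascending order
        COUNT(*) OVER ()                 AS ct       -- total number of rows
    FROM sample
)
-- Average whatever the filter keeps: two rows if ct is even, one row if odd.
SELECT AVG(val) AS median
FROM ranked
WHERE rn_asc BETWEEN ct / 2.0 AND ct / 2.0 + 1;
```

Dividing by 2.0 rather than 2 matters here: integer division would turn 7 / 2 into 3, and the filter would then keep rows 3 and 4 instead of only row 4.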

So which one should you use: PERCENTILE_CONT(0.5) or writing from scratch?

We looked at 2 ways of calculating the median, but which one should you use? My answer is PERCENTILE_CONT(0.5), because it is easier to read, takes only a few lines of code, and seems to execute faster! Please compare the Execution Time reported in the QUERY PLAN output of EXPLAIN ANALYZE for each query.

Execution Time is only 0.051 ms for 'Use PERCENTILE_CONT(0.5)'
Execution Time is 0.088 ms for 'Write from scratch'
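If you want to run this kind of comparison yourself, prefixing each query with EXPLAIN ANALYZE prints the plan together with an Execution Time line. The sketch below uses inline values and the hypothetical names sample, ranked and val; the timings you see will differ from the ones above depending on your machine, your PostgreSQL version and the size of your data.

```sql
-- Hypothetical names: sample, ranked and val are assumptions.

-- 1) PERCENTILE_CONT(0.5)
EXPLAIN ANALYZE
WITH sample(val) AS (
    VALUES (2), (4), (6), (8), (10), (12), (14), (16)
)
SELECT PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY val) AS median
FROM sample;

-- 2) Window functions
EXPLAIN ANALYZE
WITH sample(val) AS (
    VALUES (2), (4), (6), (8), (10), (12), (14), (16)
),
ranked AS (
    SELECT
        val,
        ROW_NUMBER() OVER (ORDER BY val) AS rn_asc,
        COUNT(*) OVER ()                 AS ct
    FROM sample
)
SELECT AVG(val) AS median
FROM ranked
WHERE rn_asc BETWEEN ct / 2.0 AND ct / 2.0 + 1;
```

Keep in mind that EXPLAIN ANALYZE actually executes the statement, and that a single run on 8 rows is a very small measurement, so it is worth repeating each query a few times before comparing.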

Final thoughts

In this post, we talked about how PostgreSQL has no straightforward function that directly calculates the median, like Python’s statistics.median function. Instead, we can easily get it by leveraging PERCENTILE_CONT(0.5). I hope you learned something new!

References

  1. P. Pedamkar, PostgreSQL Median, EDUCBA
