Coalesce? Hmm… What’s that?

Sayantani Mitra
CRM Analytics
Published in
7 min readApr 12, 2021

What is Coalesce?

Per Oxford Dictionary, coalesce is described as:

“coalesce (into something) to come together to form one larger group, substance, etc.”

What about coalesce it in terms of SQL or any other SQL based query language?

Oracle says, “COALESCE returns the first non-null expr in the expression list. At least one expr must not be the literal NULL. If all occurrences of expr evaluate to null, then the function returns null.

coalesce

Thus, combining the two definitions above, we find that coalesce can:

  1. Join two or more things or groups or dimensions to form a larger one.
  2. Replace nulls with the first non-null value in the expression.

As part of Tableau CRM, coalesce is an extremely important function that can solve a lot of puzzles easily!

Coalesce in Tableau CRM

Case 1 — Dimensions:

We have all come across scenarios where we need to group the data by more than one stream or datasets. And coalesce makes it easier to do that!

When we take two datasets or two streams from the same dataset (usually filtering), the first question is, now how do we merge them into one?

Simple, easy solution to our rescue! coalesce!!

To show this, we are going to go back to our usual taxi dataset and divide the data into two streams. What we want to find is how the weekdays and weekends of the month stack up against each other. So, Monday in Jan compared to Monday in Feb, so on and so forth.

But before we filter the data into Jan and Feb, we need to prep the data. That is find the day of the week!

Tableau CRM has a very simple formula to find this! day_in_week(toDate(‘Close_Date_sec_epoch’))

It gives us the days as 1,2,3…7 where 1 == Sunday, 2 == Monday …

We start with grouping the taxi data by Year-Month and Day (separately). Then use the formula above with a slight modification to create a column called Weekday and convert it into a string (The result of the below statement is a number!). At this step you can add a case statement and convert the days to Sunday, Monday etc.

day_in_week(toDate(max('Trip_Start_Timestamp_sec_epoch')))

We are using max because:

  1. We are using epoch which is a number. And a non-grouped measure needs to have an aggregate function. In this case always use max or min because it doesn’t matter. All we need is the day of the year. Caution: Don’t use average because it can lead to a non whole number (number followed by decimals) which cannot convert to epoch.
  2. Also, we will divide the data based on Year-Month. Hence we keep the Day separate. This will make it easier for us to divvy the data up.
Day of the Week

Till now, everything we did was in compact mode. But now, we will need to convert this to SAQL or Query Mode.

Click on the Query Mode on the top right of the widget/lens and it looks something like this.

q = load "Taxi_Trips";
result = group q by ('Trip_Start_Timestamp_Year', 'Trip_Start_Timestamp_Month', 'Trip_Start_Timestamp_Day');
result = foreach result generate q.'Trip_Start_Timestamp_Year' + "~~~" + q.'Trip_Start_Timestamp_Month' as 'Trip_Start_Timestamp_Year~~~Trip_Start_Timestamp_Month', q.'Trip_Start_Timestamp_Day' as 'Trip_Start_Timestamp_Day', number_to_string(day_in_week(toDate(max(q.'Trip_Start_Timestamp_sec_epoch'))), "##") as 'Weekday';
result = order result by ('Trip_Start_Timestamp_Year~~~Trip_Start_Timestamp_Month' asc, 'Trip_Start_Timestamp_Day' asc);
result = limit result 2000;

What we will do now is to filter the data based on Month just before the order statement.

q = load "Taxi_Trips";
result = group q by ('Trip_Start_Timestamp_Year', 'Trip_Start_Timestamp_Month', 'Trip_Start_Timestamp_Day');
result = foreach result generate q.'Trip_Start_Timestamp_Year' + "~~~" + q.'Trip_Start_Timestamp_Month' as 'Trip_Start_Timestamp_Year~~~Trip_Start_Timestamp_Month', q.'Trip_Start_Timestamp_Day' as 'Trip_Start_Timestamp_Day', number_to_string(day_in_week(toDate(max(q.'Trip_Start_Timestamp_sec_epoch'))), "##") as 'Weekday';
res_jan = filter result by 'Trip_Start_Timestamp_Year~~~Trip_Start_Timestamp_Month' == "2017~~~01";
res_feb = filter result by 'Trip_Start_Timestamp_Year~~~Trip_Start_Timestamp_Month' == "2017~~~02";

Our two streams in this case are res_jan and res_feb filtered to the Trip_Start_Timestamp Year-Month “2017~~~01” and “2017~~~02” respectively. Notice the “~~~” here. Though the Year-Month shows as “2017 — 01” and “2017–02”, it is being converted by Tableau CRM from “2017~~~01” and “2017~~~02” to these when we use the term ‘Trip_Start_Timestamp_Year~~~Trip_Start_Timestamp_Month’. Hence we filter on these values instead. And we can also replace “~~~” by “-”. That makes it simple.

Next, we will group the res_jan and res_feb dataset by ‘Trip_Start_Timestamp_Year-Trip_Start_Timestamp_Month’ and ‘Weekday’ and use coalesce to do that!

q = load "Taxi_Trips";
result = group q by ('Trip_Start_Timestamp_Year', 'Trip_Start_Timestamp_Month', 'Trip_Start_Timestamp_Day');
result = foreach result generate q.'Trip_Start_Timestamp_Year' + "-" + q.'Trip_Start_Timestamp_Month' as 'Trip_Start_Timestamp_Year-Trip_Start_Timestamp_Month', q.'Trip_Start_Timestamp_Day' as 'Trip_Start_Timestamp_Day', number_to_string(day_in_week(toDate(max(q.'Trip_Start_Timestamp_sec_epoch'))), "##") as 'Weekday';
res_jan = filter result by 'Trip_Start_Timestamp_Year-Trip_Start_Timestamp_Month' == "2017-01";
res_feb = filter result by 'Trip_Start_Timestamp_Year-Trip_Start_Timestamp_Month' == "2017-02";
res_join = group res_jan by ('Trip_Start_Timestamp_Year-Trip_Start_Timestamp_Month', 'Weekday') full, res_feb by ('Trip_Start_Timestamp_Year-Trip_Start_Timestamp_Month', 'Weekday');
res_join = foreach res_join generate coalesce(res_jan.'Weekday', res_feb.'Weekday') as 'Weekday', coalesce(res_jan.'Trip_Start_Timestamp_Year-Trip_Start_Timestamp_Month', res_feb.'Trip_Start_Timestamp_Year-Trip_Start_Timestamp_Month') as 'Year-Month';

The coalesce part of the res_join statement here just says that join the two streams res_jan, res_feb on Weekend. And the result is:

Weekday with Year and Month

Grouping this a second time by Weekday and Year-Month results in:

q = load "Taxi_Trips";
result = group q by ('Trip_Start_Timestamp_Year', 'Trip_Start_Timestamp_Month', 'Trip_Start_Timestamp_Day');
result = foreach result generate q.'Trip_Start_Timestamp_Year' + "-" + q.'Trip_Start_Timestamp_Month' as 'Trip_Start_Timestamp_Year-Trip_Start_Timestamp_Month', q.'Trip_Start_Timestamp_Day' as 'Trip_Start_Timestamp_Day', number_to_string(day_in_week(toDate(max(q.'Trip_Start_Timestamp_sec_epoch'))), "##") as 'Weekday';
res_jan = filter result by 'Trip_Start_Timestamp_Year-Trip_Start_Timestamp_Month' == "2017-01";
res_feb = filter result by 'Trip_Start_Timestamp_Year-Trip_Start_Timestamp_Month' == "2017-02";
res_join = group res_jan by ('Trip_Start_Timestamp_Year-Trip_Start_Timestamp_Month', 'Weekday') full, res_feb by ('Trip_Start_Timestamp_Year-Trip_Start_Timestamp_Month', 'Weekday');
res_join = foreach res_join generate coalesce(res_jan.'Weekday', res_feb.'Weekday') as 'Weekday', coalesce(res_jan.'Trip_Start_Timestamp_Year-Trip_Start_Timestamp_Month', res_feb.'Trip_Start_Timestamp_Year-Trip_Start_Timestamp_Month') as 'Year-Month';
res_join = group res_join by ('Weekday', 'Year-Month');
res_join = foreach res_join generate 'Weekday' as 'Weekday', 'Year-Month' as 'Year-Month';
Final Dimension Group

Case 2— Measures:

Simple, easy to do! As the Oracle document states, when it replaces the null, if present with the very first numerical value. So, let’s say, we have three numbers and we use coalesce on them, these will be the result.

  1. A == 3, B == 5, C == 8; Result: 3 since 3 is the first non null number.
  2. A == null, B == 5, C == 8; Result: 5 since 5 is the first non null number.
  3. A == null, B == null, C == 8; Result: 8 since 8 is the first non null number.

In Tableau CRM, or most programming languages, if we add anything to null, the result is also null. Thus, coalescing the numbers help prevent these scenarios. From the above example,

  1. A + B + C; Result: 16 since there are no nulls.
  2. A + B + C; Result: null since A is null.

To prevent this from happening, we use the following statement:

coalesce(A,0) + coalesce(B,0) + coalesce(C,0)

This results in:

  1. Result: 16 as there are no nulls
  2. Result: 13, since coalescing A,0 makes the null 0.
  3. Result: 8, since coalescing A,0 and B,0 makes the nulls 0.

Let’s put these into use and add two measure columns to our above use case. Number of rides and Trip Total.

Before we start filtering the data into two streams. This is what the query looks like:

q = load "Taxi_Trips";
result = group q by ('Trip_Start_Timestamp_Year', 'Trip_Start_Timestamp_Month', 'Trip_Start_Timestamp_Day');
result = foreach result generate q.'Trip_Start_Timestamp_Year' + "-" + q.'Trip_Start_Timestamp_Month' as 'Trip Start Time (Year-Month)', q.'Trip_Start_Timestamp_Day' as 'Trip_Start_Timestamp_Day', number_to_string(day_in_week(toDate(max(q.'Trip_Start_Timestamp_sec_epoch'))), "##") as 'Weekday', count(q) as 'Rides', sum('Trip_Total') as 'Trip_Total';
Adding Count of Rows (Rides) and Trip Total to the query

Then we use coalesce for the numeric part in addition to the dimensions to receive the next part:

q = load "Taxi_Trips";
result = group q by ('Trip_Start_Timestamp_Year', 'Trip_Start_Timestamp_Month', 'Trip_Start_Timestamp_Day');
result = foreach result generate q.'Trip_Start_Timestamp_Year' + "-" + q.'Trip_Start_Timestamp_Month' as 'Trip Start Time (Year-Month)', q.'Trip_Start_Timestamp_Day' as 'Trip_Start_Timestamp_Day', number_to_string(day_in_week(toDate(max(q.'Trip_Start_Timestamp_sec_epoch'))), "##") as 'Weekday', count(q) as 'Rides', sum('Trip_Total') as 'Trip_Total';
res_jan = filter result by 'Trip_Start_Timestamp_Year-Trip_Start_Timestamp_Month' == "2017-01";
res_feb = filter result by 'Trip_Start_Timestamp_Year-Trip_Start_Timestamp_Month' == "2017-02";
res_join = group res_jan by ('Trip_Start_Timestamp_Year-Trip_Start_Timestamp_Month', 'Weekday') full, res_feb by ('Trip_Start_Timestamp_Year-Trip_Start_Timestamp_Month', 'Weekday');
res_join = foreach res_join generate coalesce(res_jan.'Weekday', res_feb.'Weekday') as 'Weekday', coalesce(res_jan.'Trip_Start_Timestamp_Year-Trip_Start_Timestamp_Month', res_feb.'Trip_Start_Timestamp_Year-Trip_Start_Timestamp_Month') as 'Year-Month', coalesce(sum(res_jan.'Rides'), sum(res_feb.'Rides')) as 'Rides', coalesce(sum(res_jan.'Trip_Total'), sum(res_feb.'Trip_Total')) as 'Trip_Total';
Rides and Trip Total Grouped by Weekdays and Year-Month

Last but not the least, we group this again by Weekday and Year-Month!

q = load "Taxi_Trips";
result = group q by ('Trip_Start_Timestamp_Year', 'Trip_Start_Timestamp_Month', 'Trip_Start_Timestamp_Day');
result = foreach result generate q.'Trip_Start_Timestamp_Year' + "-" + q.'Trip_Start_Timestamp_Month' as 'Trip Start Time (Year-Month)', q.'Trip_Start_Timestamp_Day' as 'Trip_Start_Timestamp_Day', number_to_string(day_in_week(toDate(max(q.'Trip_Start_Timestamp_sec_epoch'))), "##") as 'Weekday', count(q) as 'Rides', sum('Trip_Total') as 'Trip_Total';
res_jan = filter result by 'Trip_Start_Timestamp_Year-Trip_Start_Timestamp_Month' == "2017-01";
res_feb = filter result by 'Trip_Start_Timestamp_Year-Trip_Start_Timestamp_Month' == "2017-02";
res_join = group res_jan by ('Trip_Start_Timestamp_Year-Trip_Start_Timestamp_Month', 'Weekday') full, res_feb by ('Trip_Start_Timestamp_Year-Trip_Start_Timestamp_Month', 'Weekday');
res_join = foreach res_join generate coalesce(res_jan.'Weekday', res_feb.'Weekday') as 'Weekday', coalesce(res_jan.'Trip_Start_Timestamp_Year-Trip_Start_Timestamp_Month', res_feb.'Trip_Start_Timestamp_Year-Trip_Start_Timestamp_Month') as 'Year-Month', coalesce(sum(res_jan.'Rides'), sum(res_feb.'Rides')) as 'Rides', coalesce(sum(res_jan.'Trip_Total'), sum(res_feb.'Trip_Total')) as 'Trip_Total';
res_join = group res_join by ('Weekday', 'Year-Month');
res_join = foreach res_join generate 'Weekday' as 'Weekday', 'Year-Month' as 'Year-Month', sum('Rides') as 'Rides', sum('Trip_Total') as 'Trip_Total';
Final Result

Converting to a Stacked Pyramid chart, it looks like this:

--

--