Let’s Transpose in SAQL!

Sayantani Mitra
CRM Analytics
Published in
5 min readDec 27, 2020

What’s Transpose??

Per dictionary.com, Transpose¹ means “to change the relative position, order, or sequence of; cause to change places”.

But what does that mean when we are dealing with data?

We may have learnt the matrix system in school or college and a big part of that was transposing the matrix. A small refresher on transpose using that may help understand what we are trying to achieve here.

From mathisfun.com², “”Flipping” a matrix over its diagonal. The rows and columns get swapped.”

Transpose

Why transpose?

In the world of data where stacked charts are involved, this becomes immensely useful! We may not be able to stack the bars or columns without transposing the data in the table.

Let’s take an example. We will go back to our taxi data again for this. Say, we want to divide the total amount that the taxis received per week by Fare, Extras, Tips and Tolls. When we start the lens this shows 4 separate columns for each initially.

Initial View

The above image does give us the required details but wouldn’t it be nice to just stack them up together instead of letting each be it’s own component? Let’s see what happens when we try making this a stacked bar/column and a few other chart types.

Stacked Bars Don’t Work

Solution?!

Of course there is, as always! Let’s get our hands dirty with some SAQL magic!

To solve this, we need to transpose the data meaning, instead of Sum of Extras, Sum of Fares etc. being separate columns, we need them to be rows.

CompareTable (Sum of Extras, Sum of Fares as Columns)

To do that we will do the following:

  • Create a new stream for each column from the above.
  • Each of these new will have a new column called Type.
  • The Type will be Extras, Fare, Tips, Tolls.
  • And one column for Charge

SAQL before any transformation!

q = load "Taxi_Trips";
result = group q by 'Trip_Start_Timestamp_Week';
result = foreach result generate q.'Trip_Start_Timestamp_Week' as 'Trip_Start_Timestamp_Week', sum(q.'Extras') as 'A', sum(q.'Fare') as 'B', sum(q.'Tips') as 'C', sum(q.'Tolls') as 'D';
result = order result by ('Trip_Start_Timestamp_Week' asc);
result = limit result 2000;

We will change the column names for better understanding.

q = load "Taxi_Trips";
result = group q by 'Trip_Start_Timestamp_Week';
result = foreach result generate q.'Trip_Start_Timestamp_Week' as 'Trip Start Week', sum(q.'Extras') as 'Extras', sum(q.'Fare') as 'Fare', sum(q.'Tips') as 'Tips', sum(q.'Tolls') as 'Tolls';
result = order result by ('Trip Start Week' asc);
result = limit result 2000;
Updated Column Names

Now, we take each of these columns and create a stream.

q = load "Taxi_Trips";
result = group q by 'Trip_Start_Timestamp_Week';
result = foreach result generate q.'Trip_Start_Timestamp_Week' as 'Trip Start Week', sum(q.'Extras') as 'Extras', sum(q.'Fare') as 'Fare', sum(q.'Tips') as 'Tips', sum(q.'Tolls') as 'Tolls';
res_extras = foreach result generate 'Trip Start Week', "Extras" as 'Type', 'Extras' as 'Charge';
res_fare = foreach result generate 'Trip Start Week', "Fare" as 'Type', 'Fare'
as 'Charge';
res_tips = foreach result generate 'Trip Start Week', "Tips" as 'Type', 'Tips'
as 'Charge';
res_tolls = foreach result generate 'Trip Start Week', "Tolls" as 'Type', 'Tolls'
as 'Charge';
-- result = order result by ('Trip Start Week' asc);
-- result = limit result 2000;

Then, we union these streams, group the new union and then generate the final statement that will give us the desired result.

q = load "Taxi_Trips";
result = group q by 'Trip_Start_Timestamp_Week';
result = foreach result generate q.'Trip_Start_Timestamp_Week' as 'Trip Start Week', sum(q.'Extras') as 'Extras', sum(q.'Fare') as 'Fare', sum(q.'Tips') as 'Tips', sum(q.'Tolls') as 'Tolls';
res_extras = foreach result generate 'Trip Start Week', "Extras" as 'Type', 'Extras' as 'Charge';
res_fare = foreach result generate 'Trip Start Week', "Fare" as 'Type', 'Fare' as 'Charge';
res_tips = foreach result generate 'Trip Start Week', "Tips" as 'Type', 'Tips' as 'Charge';
res_tolls = foreach result generate 'Trip Start Week', "Tolls" as 'Type', 'Tolls' as 'Charge';
res_fin = union res_fare, res_extras, res_tips, res_tolls;

Result of Union causes all the Types to come after one finished. In this case all fares come first. It is not properly grouped by the Trip Start Week and then Type.

Union

So, we group by the union-ed stream

q = load "Taxi_Trips";
result = group q by 'Trip_Start_Timestamp_Week';
result = foreach result generate q.'Trip_Start_Timestamp_Week' as 'Trip Start Week', sum(q.'Extras') as 'Extras', sum(q.'Fare') as 'Fare', sum(q.'Tips') as 'Tips', sum(q.'Tolls') as 'Tolls';
res_extras = foreach result generate 'Trip Start Week', "Extras" as 'Type', 'Extras' as 'Charge';
res_fare = foreach result generate 'Trip Start Week', "Fare" as 'Type', 'Fare' as 'Charge';
res_tips = foreach result generate 'Trip Start Week', "Tips" as 'Type', 'Tips' as 'Charge';
res_tolls = foreach result generate 'Trip Start Week', "Tolls" as 'Type', 'Tolls' as 'Charge';
res_fin = union res_fare, res_extras, res_tips, res_tolls;
res_fin = group res_fin by ('Trip Start Week', 'Type');
res_fin = foreach res_fin generate 'Trip Start Week' as 'Trip Start Week', 'Type' as 'Type', sum('Charge') as 'Charge';
Result of Union followed by Group

One last change we will do is update the Charge column to be Currency using formatting.

Charges formatted as ‘Currency’

Lastly, we change this chart to a stacked bar or column!

Stacked Bars and Columns!

A caveat we should remember in this case is, it works very well when you know what each of the initial columns represent. If that is dynamic, then this process may need binding and a few other tricks up our sleeves.

--

--