Let’s Transpose in SAQL!
What’s Transpose??
Per dictionary.com, Transpose¹ means “to change the relative position, order, or sequence of; cause to change places”.
But what does that mean when we are dealing with data?
We may have learnt the matrix system in school or college and a big part of that was transposing the matrix. A small refresher on transpose using that may help understand what we are trying to achieve here.
From mathisfun.com², “”Flipping” a matrix over its diagonal. The rows and columns get swapped.”
Why transpose?
In the world of data where stacked charts are involved, this becomes immensely useful! We may not be able to stack the bars or columns without transposing the data in the table.
Let’s take an example. We will go back to our taxi data again for this. Say, we want to divide the total amount that the taxis received per week by Fare, Extras, Tips and Tolls. When we start the lens this shows 4 separate columns for each initially.
The above image does give us the required details but wouldn’t it be nice to just stack them up together instead of letting each be it’s own component? Let’s see what happens when we try making this a stacked bar/column and a few other chart types.
Solution?!
Of course there is, as always! Let’s get our hands dirty with some SAQL magic!
To solve this, we need to transpose the data meaning, instead of Sum of Extras, Sum of Fares etc. being separate columns, we need them to be rows.
To do that we will do the following:
- Create a new stream for each column from the above.
- Each of these new will have a new column called Type.
- The Type will be Extras, Fare, Tips, Tolls.
- And one column for Charge
SAQL before any transformation!
q = load "Taxi_Trips";
result = group q by 'Trip_Start_Timestamp_Week';
result = foreach result generate q.'Trip_Start_Timestamp_Week' as 'Trip_Start_Timestamp_Week', sum(q.'Extras') as 'A', sum(q.'Fare') as 'B', sum(q.'Tips') as 'C', sum(q.'Tolls') as 'D';
result = order result by ('Trip_Start_Timestamp_Week' asc);
result = limit result 2000;
We will change the column names for better understanding.
q = load "Taxi_Trips";
result = group q by 'Trip_Start_Timestamp_Week';
result = foreach result generate q.'Trip_Start_Timestamp_Week' as 'Trip Start Week', sum(q.'Extras') as 'Extras', sum(q.'Fare') as 'Fare', sum(q.'Tips') as 'Tips', sum(q.'Tolls') as 'Tolls';
result = order result by ('Trip Start Week' asc);
result = limit result 2000;
Now, we take each of these columns and create a stream.
q = load "Taxi_Trips";
result = group q by 'Trip_Start_Timestamp_Week';
result = foreach result generate q.'Trip_Start_Timestamp_Week' as 'Trip Start Week', sum(q.'Extras') as 'Extras', sum(q.'Fare') as 'Fare', sum(q.'Tips') as 'Tips', sum(q.'Tolls') as 'Tolls';res_extras = foreach result generate 'Trip Start Week', "Extras" as 'Type', 'Extras' as 'Charge';
res_fare = foreach result generate 'Trip Start Week', "Fare" as 'Type', 'Fare' as 'Charge';
res_tips = foreach result generate 'Trip Start Week', "Tips" as 'Type', 'Tips' as 'Charge';
res_tolls = foreach result generate 'Trip Start Week', "Tolls" as 'Type', 'Tolls' as 'Charge';-- result = order result by ('Trip Start Week' asc);
-- result = limit result 2000;
Then, we union these streams, group the new union and then generate the final statement that will give us the desired result.
q = load "Taxi_Trips";
result = group q by 'Trip_Start_Timestamp_Week';
result = foreach result generate q.'Trip_Start_Timestamp_Week' as 'Trip Start Week', sum(q.'Extras') as 'Extras', sum(q.'Fare') as 'Fare', sum(q.'Tips') as 'Tips', sum(q.'Tolls') as 'Tolls';res_extras = foreach result generate 'Trip Start Week', "Extras" as 'Type', 'Extras' as 'Charge';
res_fare = foreach result generate 'Trip Start Week', "Fare" as 'Type', 'Fare' as 'Charge';
res_tips = foreach result generate 'Trip Start Week', "Tips" as 'Type', 'Tips' as 'Charge';
res_tolls = foreach result generate 'Trip Start Week', "Tolls" as 'Type', 'Tolls' as 'Charge';res_fin = union res_fare, res_extras, res_tips, res_tolls;
Result of Union causes all the Types to come after one finished. In this case all fares come first. It is not properly grouped by the Trip Start Week and then Type.
So, we group by the union-ed stream
q = load "Taxi_Trips";
result = group q by 'Trip_Start_Timestamp_Week';
result = foreach result generate q.'Trip_Start_Timestamp_Week' as 'Trip Start Week', sum(q.'Extras') as 'Extras', sum(q.'Fare') as 'Fare', sum(q.'Tips') as 'Tips', sum(q.'Tolls') as 'Tolls';res_extras = foreach result generate 'Trip Start Week', "Extras" as 'Type', 'Extras' as 'Charge';
res_fare = foreach result generate 'Trip Start Week', "Fare" as 'Type', 'Fare' as 'Charge';
res_tips = foreach result generate 'Trip Start Week', "Tips" as 'Type', 'Tips' as 'Charge';
res_tolls = foreach result generate 'Trip Start Week', "Tolls" as 'Type', 'Tolls' as 'Charge';res_fin = union res_fare, res_extras, res_tips, res_tolls;
res_fin = group res_fin by ('Trip Start Week', 'Type');
res_fin = foreach res_fin generate 'Trip Start Week' as 'Trip Start Week', 'Type' as 'Type', sum('Charge') as 'Charge';
One last change we will do is update the Charge column to be Currency using formatting.
Lastly, we change this chart to a stacked bar or column!
A caveat we should remember in this case is, it works very well when you know what each of the initial columns represent. If that is dynamic, then this process may need binding and a few other tricks up our sleeves.