Making Multi-Layer Radial Bar Chart for Duke University Baseball Team with SQL & Tableau
This was an independent study project (outside of the regular curriculum) that I completed while I was a student in the Duke University, Fuqua School of Business’ MQM program. The project involved building a novel multi-layer radial bar chart in Tableau using data from a large MySQL database. The visualization is built using a variety of SQL queries that make use of CTEs and window functions.
1. Objective & Final Output: Multi-Layer Radial Bar Chart
The request from our client, Duke Baseball Team’s coaching staff, was that they wanted to evaluate
how pitch performance is related to a pitch’s “tilt”, a value that measures the direction that the ball spins immediately when it is released by the pitcher. So, for a specific pitch type thrown by a pitcher at a specific date, the coaching staff is interested in examining if different tilt values are associated with different (positive or negative) outcomes.
Because tilts can be shown in a circle shape (e.g. 12:00 direction, 3:00 direction, 6:00 direction, 9:00 direction) and the number of pitches and the measures of performance indicators can be shown in bar lengths, I concluded that the “Radial Bar Chart” is the most effective way of presenting the needed values. And the radial bar chart consists of multiple circles (layers); the inner circle represents the number of pitches and the outer circle represents the performance indicators.
Also, as the client wants to show the combined (inner circle + outer circle) circles for two pitchers at the same dashboard for comparison, I made duplicated worksheets for each circle, made two combined circles, and positioned them in one dashboard. And the final outputs are as follows:
2. Dashboard Creation Journey
There were a lot of problems when making the multi-layer radial bar chart. In this section, I will explain the problems I faced and how I solved each problem during the journey of creating the final output
1) Initial Trial - Stacked Bar Chart
Initially, I came up with the idea of showing a “Stacked Bar Chart” for the needed values. However, as you can see in the below graph, this bar shape doesn’t seem to be effective for our purpose. In fact, because we were not interested in seeing the proportion of different pitch types among the pitches thrown in a specific tilt, the stacked characteristic was not needed.
Therefore, I deleted this initial stacked bar chart, came up with the idea of using a radial bar chart, and started all over again.
2) Wrong positioning/order of tilts: Solved by reverse index
The very first problem I faced was that the order of tilts wasn’t correct. We wanted the tilts to be placed around a circle in the same way the numbers on the clock are placed in order. However, initially, the tilts were placed in the opposite way. Therefore, to fix this, I re-understood the basic concept of Radius.
After studying, I found out that I needed the following things; 3:00 needs a 6.28 radius, 6:00 needs a 4.71 radius, 9:00 needs a 3.14 radius, and 12:00 needs a 1.57 radius. Also, each index I used to fix the positions of tilts was worth 0.13 radius.
Therefore, I made a new calculated field “Reverse_index (57 — index())” so that I can get the appropriate radius for the corresponding tilt.
3) Tilt positions changing according to filters: Solved by SQL Cross-Join Function
By solving the first problem, the tilts were in the right position. BUT ONLY WHEN ALL DATA WAS SHOWN. In other words, if filters for the pitcher, pitch type, and date were used, the index for each tilt changed, thereby completely destroying the positions.
I looked into the reason why this problem was happening and found out the problem was the NULL values in pitches for certain tilts. More specifically, because every pitcher throws only at certain tilts, the tilts at which a pitcher has never thrown a ball contain NULL values for the “pitches” column. And this null problem happened according to not only the pitcher but also the pitch type and the date.
After days of ideation, I came up with the solution; what we need was a combination of all values in the 3 filters (“pitcher”, “pitch type”, “date”) and “tilt” and transformation of all NULL values into 0. And this surely had to be done at the SQL level, not the tableau level.
Step 1: Cross-joining all values in “pitcher”, “pitch type”, “date”, and “tilt” to generate a whole set of possibles combinations. Each “pitcher-pitch type-date” combination will have 48 tilts.
Step 2: To the original dataset I left joined another dataset that contains the data of “Exit Speed”.
Step 3: To the dataset created in Step 1, I left joined the dataset created in Step 2 on “pitcher”, “pitch type”, “date”, and “title” to connect all baseball performance data to each combination.
Step 4: As we know NULL values are causing the index misorder problem, I used the coalesce function to make all NULL pitches into 0.
4) Bar lengths getting out of chart by being too long: Generate a new column “Pitches Proportion”
While the tilt positions are fixed and the number of pitches for each tilt is shown correctly, I faced a new problem regarding the bar length. This problem happened because of the formula I used to calculate the bar length.
Initially, the length of the bar is defined as [NextImp] * [Pitches]. Here, the parameter [NextImp] indicates the number that is multiplied by the number of pitches, and this parameter is needed to draw the bar. However, because certain values in “Pitches” are really high, even having a very small [NextImp] number that is required to make bars appear for tilts with small “Pitches” values can make bars for tilts with huge “Pitches” values too long.
To solve this problem, we came up with the idea of using the “Pitches Proportion” as the base of the bar lengths. Because the proportion cannot be bigger than 100, I can calculate the maximum possible bar length and set the [NextImp], X-axis, and Y-axis accordingly.
round((Pitches / SUM(Pitches) OVER(Partition by Pitcher)), 3) * 100 as Pitches_Proportion
The above query was added, thereby preventing bar length from becoming too long out of the chart range. And with this calculation of pitches proportion, the inner circle was completed to present.
5) “Pitch Result” column showing the random result of the pitch within the group I made in SQL: Fixed by using CASE function
To get the values for the “Total Strike Percentage” and “Swinging Strike Percentage”, I need new columns that count the number of instances of each pitch result (e.g. Strike, Swinging Strike). And the results of each pitch can be found in the “PitchCall Column”.
Initial Query for the Pitch Result
Select Pitcher, Tilt, FinalPitchType as Pitch_type, Date, COUNT(PitchUID) as Pitches, PitchCall as Pitch_Result
From dukepitch
GROUP BY Pitcher, Tilt, FinalPitchType, Date
Initially, I used the above query to get the pitch result for each combination. However, as you can expect, the query returns the random value for the pitch result within the group I made.
This problem was easily solved by using the CASE function in the query. So I replaced the initial “PitchCall as Pitch_Result” query with the CASE query that returns values for each pitch result.
SUM(CASE When PitchCall=’BallCalled’ Then 1 Else 0 End) AS Balls, SUM(CASE When PitchCall=’FoulBall’ Then 1 Else 0 End) AS Fouls, SUM(CASE When PitchCall=’StrikeSwinging’ Then 1 Else 0 End) AS Swinging_Strike, SUM(CASE When PitchCall=’InPlay’ Then 1 Else 0 End) AS InPlay, SUM(CASE When PitchCall=’HitByPitch’ Then 1 Else 0 End) AS HBP
6) Cannot use AVG() function with the CASE function to show different pitch performances: Solved by using LOD expression (or Fixed function)
For the outer circle, I used the “Pitch Performance” parameter to allow the client to choose what pitch performances (Total Strike Percentage, Swinging Strike Percentage, Average Exit Speed, Average IVB) will be shown in the dashboard. To do so, I made a calculated field with the same name “Pitch Performance” that uses CASE function to show different pitch performances according to the client’s choice.
However, the problem was I needed to calculate the AVERAGE of each pitch performance according to the filters (pitcher, pitch type, date). However, the CASE function doesn’t allow us to use the aggregate function with it.
To solve this problem, I made new calculated fields with the LOD expression or Fixed function. By fixing each pitch performance to the Pitcher, Tilt, Date, and Pitch type, I was able to successfully show the correct pitch performances for each tilt according to the filters (pitcher, pitch type, date).
The detailed steps are as follows: