Making Multi-Layer Radial Bar Chart for Duke University Baseball Team with SQL & Tableau

Changhyeon Yoo
8 min readApr 16, 2022

--

Pitching Performance Comparison Dashboard with Multi-Layer Radial Bar Chart

This was an independent study project (outside of the regular curriculum) that I completed while I was a student in the Duke University, Fuqua School of Business’ MQM program. The project involved building a novel multi-layer radial bar chart in Tableau using data from a large MySQL database. The visualization is built using a variety of SQL queries that make use of CTEs and window functions.

1. Objective & Final Output: Multi-Layer Radial Bar Chart

The request from our client, Duke Baseball Team’s coaching staff, was that they wanted to evaluate

how pitch performance is related to a pitch’s “tilt”, a value that measures the direction that the ball spins immediately when it is released by the pitcher. So, for a specific pitch type thrown by a pitcher at a specific date, the coaching staff is interested in examining if different tilt values are associated with different (positive or negative) outcomes.

Because tilts can be shown in a circle shape (e.g. 12:00 direction, 3:00 direction, 6:00 direction, 9:00 direction) and the number of pitches and the measures of performance indicators can be shown in bar lengths, I concluded that the “Radial Bar Chart” is the most effective way of presenting the needed values. And the radial bar chart consists of multiple circles (layers); the inner circle represents the number of pitches and the outer circle represents the performance indicators.

Inner circle (Left) shows the number of pitches thrown at a specific tilt for a specific pitcher, date, and pitch type / Outer circle (Right) shows the performances of pitches thrown at a specific title for a specific pitcher, date, and pitch type

Also, as the client wants to show the combined (inner circle + outer circle) circles for two pitchers at the same dashboard for comparison, I made duplicated worksheets for each circle, made two combined circles, and positioned them in one dashboard. And the final outputs are as follows:

Comparison between Pitcher 9 and 10 regarding the swinging strike percentage of the Slider they’ve thrown
Comparison between pitcher 8 and pitcher 15 regarding the total strike percentages of the Fastball they’ve thrown
Comparison between pitcher 3 and pitcher 15 regarding the Average Exist Speed of the Fastball they’ve thrown

2. Dashboard Creation Journey

There were a lot of problems when making the multi-layer radial bar chart. In this section, I will explain the problems I faced and how I solved each problem during the journey of creating the final output

1) Initial Trial - Stacked Bar Chart

Initially, I came up with the idea of showing a “Stacked Bar Chart” for the needed values. However, as you can see in the below graph, this bar shape doesn’t seem to be effective for our purpose. In fact, because we were not interested in seeing the proportion of different pitch types among the pitches thrown in a specific tilt, the stacked characteristic was not needed.

An initial trial with a stacked bar shape, which apparently failed

Therefore, I deleted this initial stacked bar chart, came up with the idea of using a radial bar chart, and started all over again.

2) Wrong positioning/order of tilts: Solved by reverse index

The very first problem I faced was that the order of tilts wasn’t correct. We wanted the tilts to be placed around a circle in the same way the numbers on the clock are placed in order. However, initially, the tilts were placed in the opposite way. Therefore, to fix this, I re-understood the basic concept of Radius.

https://www.math.csi.cuny.edu/~ikofman/Polking/drg_txt.html Referenced Concept of Radius

After studying, I found out that I needed the following things; 3:00 needs a 6.28 radius, 6:00 needs a 4.71 radius, 9:00 needs a 3.14 radius, and 12:00 needs a 1.57 radius. Also, each index I used to fix the positions of tilts was worth 0.13 radius.

Therefore, I made a new calculated field “Reverse_index (57 — index())” so that I can get the appropriate radius for the corresponding tilt.

With each tilt worth 0.13 radius, the original index has to be reversed (57-original index) so that each tilt has the right radius
Tilts in the right place after adjusting the index and corresponding radius

3) Tilt positions changing according to filters: Solved by SQL Cross-Join Function

By solving the first problem, the tilts were in the right position. BUT ONLY WHEN ALL DATA WAS SHOWN. In other words, if filters for the pitcher, pitch type, and date were used, the index for each tilt changed, thereby completely destroying the positions.

I looked into the reason why this problem was happening and found out the problem was the NULL values in pitches for certain tilts. More specifically, because every pitcher throws only at certain tilts, the tilts at which a pitcher has never thrown a ball contain NULL values for the “pitches” column. And this null problem happened according to not only the pitcher but also the pitch type and the date.

After days of ideation, I came up with the solution; what we need was a combination of all values in the 3 filters (“pitcher”, “pitch type”, “date”) and “tilt” and transformation of all NULL values into 0. And this surely had to be done at the SQL level, not the tableau level.

Step 1: Cross-joining all values in “pitcher”, “pitch type”, “date”, and “tilt” to generate a whole set of possibles combinations. Each “pitcher-pitch type-date” combination will have 48 tilts.

Step 2: To the original dataset I left joined another dataset that contains the data of “Exit Speed”.

Step 3: To the dataset created in Step 1, I left joined the dataset created in Step 2 on “pitcher”, “pitch type”, “date”, and “title” to connect all baseball performance data to each combination.

Step 4: As we know NULL values are causing the index misorder problem, I used the coalesce function to make all NULL pitches into 0.

Using the new dataset generated from the 4 steps, I was able to fix the tilt positions regardless of the filters (pitcher, pitch type, date) that the client will be using

4) Bar lengths getting out of chart by being too long: Generate a new column “Pitches Proportion”

While the tilt positions are fixed and the number of pitches for each tilt is shown correctly, I faced a new problem regarding the bar length. This problem happened because of the formula I used to calculate the bar length.

Initially, the length of the bar is defined as [NextImp] * [Pitches]. Here, the parameter [NextImp] indicates the number that is multiplied by the number of pitches, and this parameter is needed to draw the bar. However, because certain values in “Pitches” are really high, even having a very small [NextImp] number that is required to make bars appear for tilts with small “Pitches” values can make bars for tilts with huge “Pitches” values too long.

If the number of pitches is set to be the base of the bar lengths, there are cases where the bar length becomes too long

To solve this problem, we came up with the idea of using the “Pitches Proportion” as the base of the bar lengths. Because the proportion cannot be bigger than 100, I can calculate the maximum possible bar length and set the [NextImp], X-axis, and Y-axis accordingly.

round((Pitches / SUM(Pitches) OVER(Partition by Pitcher)), 3) * 100 as Pitches_Proportion

The above query was added, thereby preventing bar length from becoming too long out of the chart range. And with this calculation of pitches proportion, the inner circle was completed to present.

While using the same filter values and bar length control parameter values as the above graph, the bars are now within the chart range.

5) “Pitch Result” column showing the random result of the pitch within the group I made in SQL: Fixed by using CASE function

To get the values for the “Total Strike Percentage” and “Swinging Strike Percentage”, I need new columns that count the number of instances of each pitch result (e.g. Strike, Swinging Strike). And the results of each pitch can be found in the “PitchCall Column”.

Initial Query for the Pitch Result

Select Pitcher, Tilt, FinalPitchType as Pitch_type, Date, COUNT(PitchUID) as Pitches, PitchCall as Pitch_Result
From dukepitch
GROUP BY Pitcher, Tilt, FinalPitchType, Date

Initially, I used the above query to get the pitch result for each combination. However, as you can expect, the query returns the random value for the pitch result within the group I made.

The initial query returned a random value for the Pitch Result, which was a mistake

This problem was easily solved by using the CASE function in the query. So I replaced the initial “PitchCall as Pitch_Result” query with the CASE query that returns values for each pitch result.

SUM(CASE When PitchCall=’BallCalled’ Then 1 Else 0 End) AS Balls, SUM(CASE When PitchCall=’FoulBall’ Then 1 Else 0 End) AS Fouls, SUM(CASE When PitchCall=’StrikeSwinging’ Then 1 Else 0 End) AS Swinging_Strike, SUM(CASE When PitchCall=’InPlay’ Then 1 Else 0 End) AS InPlay, SUM(CASE When PitchCall=’HitByPitch’ Then 1 Else 0 End) AS HBP

The new CASE query successfully returns the values for each pitch result

6) Cannot use AVG() function with the CASE function to show different pitch performances: Solved by using LOD expression (or Fixed function)

For the outer circle, I used the “Pitch Performance” parameter to allow the client to choose what pitch performances (Total Strike Percentage, Swinging Strike Percentage, Average Exit Speed, Average IVB) will be shown in the dashboard. To do so, I made a calculated field with the same name “Pitch Performance” that uses CASE function to show different pitch performances according to the client’s choice.

However, the problem was I needed to calculate the AVERAGE of each pitch performance according to the filters (pitcher, pitch type, date). However, the CASE function doesn’t allow us to use the aggregate function with it.

In the CASE expression, you cannot mix the non-aggregate and the aggregate

To solve this problem, I made new calculated fields with the LOD expression or Fixed function. By fixing each pitch performance to the Pitcher, Tilt, Date, and Pitch type, I was able to successfully show the correct pitch performances for each tilt according to the filters (pitcher, pitch type, date).

The detailed steps are as follows:

Step 1. Create a new calculate field that calculates the correct average
Step 2. Created a new calculate field that fixe each performance to the filters
Step 3. To use the parameter, create the calculated field with the same name and use the CASE expression with the fields created in Step 2

--

--

Changhyeon Yoo

Duke University, Fuqua School of Business: Master’s of Quantitative Management, Class of 2022