Tennis Note #24

IBM Keys Unlocked

By Nikita Taparia and Stephanie Kovalchik

The image here is based on the official IBM graphic that shows 3 keys per player: a checklist of goals each player must achieve in a set for general success. IBM calls them the “Keys to the Match.” Back in 2013, Jeff Sackmann, who runs Tennis Abstract, and Carl Bialik wrote their criticisms of the system. When Stephanie tweeted this image, we both immediately agreed that we needed to talk about this again. We set out to take a deep dive into the keys with data, visualization, and audio. Enjoy the data-driven story in real time!

The audio below is a shortened, edited version. We originally recorded our conversation in real time and it ran 1.5 hours! If you would like access to the uncut audio, please send me a tweet or email! We had a lot on our minds and it took me an entire day to figure out what to cut. The audio is a running commentary as we talk through each data visual one by one. Enjoy!
So what is the problem? When you read the IBM display, what is your immediate thought? More keys = set winner, right? But what do we even know about the keys? How many are there? How often do we see certain keys? How is the order determined? There are so many questions to consider and we are going to answer all of them today. [0:00–2:57]
Click the statement above to be redirected to the list of keys. You can use this list to actually figure out which keys are displayed based on the next graph. [2:57–3:30]
There are 156 ‘fill in the blank’ IBM keys. Of these, only a subset appear, some more often than others. Each player has 5 major keys and only 3 appear. Here, we plotted the distribution of these keys. Specifically, the top of the black bar is the total count and the top of the purple bar is the count for the top 3 keys. We split the distribution for ATP and WTA within these top 3 keys to illustrate how often certain keys appear for each tour. The next graphic discusses how these top 3 keys are picked…[3:30–5:55]
As stated above, of the 5 keys, only 3 appear for a player and these 3 keys are determined by key strength. As far as we can tell, key strength has nothing to do with the typical statement that shows up below each key: “Player A won []% of their sets when they achieved this key.” It is a rather mysterious quantity but it determines the rank of these keys. Remember, the order of these keys matters. We can take comfort that a majority of the top 3 keys are centered around ‘very good’…whatever that means. The next graphics focus on the wording of the top 3 keys. [5:56–7:30]
In order to generate keys, you need to set goals, as seen in the generic list we talked about previously. These messages use three major comparison terms: less than, between, and more than. As you can see, a majority of the time, the player must perform above and beyond the goal IBM sets for them. Stephanie’s analogy was that the keys are like a light switch with two states: on and off. Why am I [Nikita] against that? Well, it may have tested nicely with users but it gives zero perspective. Instead, it would be nice to see the averages in wins and in losses so that each goal sits within a range. Now, what exactly are these key messages looking at? Go to the next chart. [7:31–9:17]
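To make the light switch analogy concrete, here is a minimal Python sketch of how a key with one of the three comparison terms could be checked. This is our own illustration and not IBM's implementation: the `Key` class, the `key_met` function, the example stat, the goal of 70, and the choice of which bounds are inclusive are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Key:
    """A hypothetical 'fill in the blank' key: a stat, a comparison term, and a goal."""
    stat: str
    comparison: str               # "less than", "between", or "more than"
    low: Optional[float] = None   # lower goal (used by "more than" and "between")
    high: Optional[float] = None  # upper goal (used by "less than" and "between")


def key_met(key: Key, value: float) -> bool:
    """Return True if the key is 'on' for this value, False if it is 'off'."""
    if key.comparison == "less than":
        return value < key.high
    if key.comparison == "more than":
        return value > key.low
    if key.comparison == "between":
        # Inclusive bounds are an assumption made for this sketch.
        return key.low <= value <= key.high
    raise ValueError(f"unknown comparison term: {key.comparison!r}")


# Invented example: the stat name and the goal of 70 are made up.
example_key = Key("first-serve points won (%)", "more than", low=70.0)
for value in (45.0, 69.9, 70.1):
    print(value, key_met(example_key, value))
```

In this sketch, 45.0 and 69.9 both come out as "off" even though one is a near miss, which is exactly the lack of perspective described above.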

Don’t forget to follow the publication, ‘recommend’ and share!

What is most striking about this distribution: (1) winners and unforced errors seem meaningless for IBM and yet they are often what commentators and fans use to explain why a person won or lost, and (2) only two keys were about break points, and they were mostly about preventing break point opportunities on serve rather than capitalizing on them. [11:03–15:26]
Now, we can compare the findings about the IBM keys to data from the Match Charting Project. Charles Allen made a great interactive graphic in which he compared the averages for winners vs losers in the ATP and WTA across various parameters, similar to the statistics typically tracked during a tennis match. The winners are better than the losers in all aspects, except in two places: 1st serve % and 2nd serve %. Don’t be fooled by the huge gaps in certain areas, either. Go to the interactive for exact numbers. It includes a fantastic feature where you can compare the range of each individual parameter between tours when you hover over it! Just click the image to be redirected or use the link in the caption. [15:26–17:30]
Stephanie presented a new model to predict match win % and it is based on the Pythagorean theorem! Surely you remember it from math class…it has to do with a right triangle: if you know the lengths of the two shorter sides, you can figure out the length of the longest side. She tested 14 different tennis parameters and found that break points won explained win % the best. Compare that to what the IBM data showed earlier about break points in the keys. Check out her two articles to see how poorly certain parameters predicted win %: Converting Clutch into Wins and Are Women as Pythagorean as Men? [17:31–21:41]
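For readers who want to see the shape of such a model, here is a minimal Python sketch. It assumes the model takes the general "Pythagorean expectation" form known from baseball analytics, where a win share is estimated from something a player won and something the opponent won, each raised to a power. The function name, the exponent of 2, and the sample inputs are all assumptions for illustration; Stephanie's articles have the actual model and fitted values.

```python
def pythagorean_win_pct(won: float, lost: float, exponent: float = 2.0) -> float:
    """Estimate win % as won^a / (won^a + lost^a).

    `won` and `lost` stand for a stat such as break points won by the
    player and by the opponent. The exponent `a` is a free parameter;
    2.0 is only a placeholder here, not a fitted value.
    """
    if won < 0 or lost < 0:
        raise ValueError("inputs must be non-negative")
    if won == 0 and lost == 0:
        raise ValueError("at least one input must be positive")
    return won ** exponent / (won ** exponent + lost ** exponent)


# Invented inputs: a player wins 3 break points and the opponent wins 4.
print(pythagorean_win_pct(3, 4))
# Equal inputs give an even split.
print(pythagorean_win_pct(5, 5))
```

With an exponent of 2, the denominator is the same sum of squares that appears in the right triangle rule, which is where the name comes from.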
Overall, by the end of the match, the keys were mostly met by the match winner and mostly unmet by the match loser. The shape of the distribution is symmetric and it appears that this distribution is the same for the set winner. If we were to quantify this, the largest bar is ~30–35%, followed by ~15–20% for the next bar. So what’s the problem? Scroll down. [21:42–24:26]
Consider the number of keys met in the match by the set winner and set loser. In a perfect world, with the philosophy more keys = set winner, you could imagine 100% of the keys met by the set winner and 0% met by the set loser. However, if we look at the data, we see another interesting symmetry in which the set loser’s distribution sits towards the lower end and the set winner’s towards the upper end. There is an overlap in the middle, which explains why the set winner sometimes has fewer keys than the set loser. But let’s dive a little deeper. [24:27–27:31]
Consider two other examples from this year’s US Open in which the set loser has more keys than the set winner. We can look at the above analysis and say that the chance Kukushkin would win his set or the chance that Robson would win her set without meeting any keys is 2–5%. But how often does the set loser meet more keys than the winner in the ATP and WTA? [27:32–29:27]
The chances you will see the loser of the set with more keys than the actual winner are pretty high. If we approximate the likelihood at ~10% and the average number of sets played at ~3302 (based on the absolute min and max), then to put things in perspective: you would expect ~330 sets to exhibit this phenomenon! Please note, this is not a stacked bar graph and the values are absolute, so just look at the top of each bar. [29:28–31:09]
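The back-of-the-envelope estimate above can be reproduced in a few lines of Python. This is a small sketch rather than our analysis code: the function names are our own, and the toy list of sets is invented purely to show how the rate would be counted. Only the ~10% rate and the ~3302 sets come from the discussion above.

```python
def loser_ahead_rate(sets):
    """Share of sets in which the set loser met more keys than the set winner.

    `sets` is a list of (winner_keys_met, loser_keys_met) pairs.
    """
    if not sets:
        raise ValueError("need at least one set")
    loser_ahead = sum(1 for winner_keys, loser_keys in sets if loser_keys > winner_keys)
    return loser_ahead / len(sets)


def expected_sets(total_sets, rate):
    """Expected number of sets showing the pattern, given a rate."""
    return total_sets * rate


# Invented toy data, NOT real match data: (winner keys met, loser keys met).
toy_sets = [(3, 1), (2, 2), (1, 2), (3, 0), (2, 1),
            (0, 1), (3, 2), (2, 0), (1, 1), (2, 1)]
print(loser_ahead_rate(toy_sets))

# The estimate from the text: ~10% of ~3302 sets.
print(round(expected_sets(3302, 0.10)))
```

Ties count as "not ahead" in this sketch, so only sets where the loser strictly met more keys are included.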
May the odds be ever in your favor. [31:10–34:26]
The way Medium works: if you don’t click the heart, then you are not trending. So make sure to click the heart, share with your friends, follow the publication, and send us your comments!

Thanks to Stephanie Kovalchik for jumping on this project! The makings of this idea most likely came from our first conversation. Check out her new blog and make sure to read her posts on the Pythagorean theorem of tennis because it is absolutely fascinating! You can also read about her other work on match format and inconsistency on the Tennis Notebook or on her own blog. Special thanks to Charles Allen for running analysis for the radar graph. Music from Free Music Archive. Actual artists include: Jahzzar, Joao Picoito, Jon Luc Hefferman, and Kevin Macleod.

If you enjoy reading these tennis notes, make sure to follow the publication, ‘Recommend’ and share! Check us out on Facebook! Made a cool observation? Interested in certain topics and writing? Are you a tennis photographer? Comment, add notes, and check out the submission guideline. Cheers!