# Tennis Note #30

--

## Sometimes the Shots Just Don’t Add Up

In Tennis Note #29, I presented a number of visualizations to demonstrate that Set Scores in tennis often fail to give an accurate view of how close a match may have been. In the note, I introduced the use of an “Exploding” Box Plot to show a range of Point Totals for each Set Score (6–0 to 7–6), and a “Points-to-Set” Chart to illustrate the dynamics of 6–0 sets that contain more points than 6–4 sets.

Rally lengths can provide a fresh perspective on deconstructing set scores. Why? Different surfaces, balls, playing style, and even playing conditions [altitude, temperature, humidity] can produce points which vary widely in their tempo and duration. Thus, two sets with an equivalent score and the same number of points played may differ greatly in average rally length per point, sometimes even by a factor of two or more.
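To make the arithmetic concrete, here is a minimal Python sketch of the comparison described above. The function name `shots_per_point` and the two sets of totals are hypothetical, made up purely for illustration; they are not taken from any real match.

```python
def shots_per_point(total_shots: int, total_points: int) -> float:
    """Average rally length: total shots divided by total points in a set."""
    if total_points <= 0:
        raise ValueError("a set must contain at least one point")
    return total_shots / total_points


# Two hypothetical sets with the same score and the same number of points.
quick_set = shots_per_point(total_shots=180, total_points=60)
grinding_set = shots_per_point(total_shots=360, total_points=60)

# How many times longer the average rally was in the second set.
ratio = grinding_set / quick_set

print(f"{quick_set:.1f} vs {grinding_set:.1f} shots per point (ratio {ratio:.1f})")
```

With these invented totals the second set averages twice as many shots per point as the first, even though the scoreboard and the point count would be identical.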

In the graphic above, set scores are color-coded, ranging from yellow (6–0) to blue (7–6). When arranged by points per set vs. shots per set, the sets appear orderly; when arranged by average points per game vs. average shots per point, they look chaotic. However, by drilling down it is possible to compare sets from the same match. Sometimes there are clusters, indicating the same level of play for different score outcomes. In the match between Nadal and Federer depicted below, the 6–2 set (red) had slightly longer rallies per point than the 7–6 set (blue).

Sometimes, there are huge disparities. In the match between Halep and Janković shown below, the 6–0 set contained points with 35% longer rallies than the 6–2 set, 20% longer rallies than the 6–4 set, and an average of 23% more points played per game. Did the final score reflect the intensity?

To most coaches and many fans, none of this is a surprise, and yet it is common to see (and hear) match scores described as containing “bagels”, “double bagels”, “bicycles” and “breadsticks”. People do still seem to marvel that a 6–0 set can be followed by a 7–6 set, or vice versa. Or that two players who face each other in matches separated by only a few weeks can produce drastically different outcomes.

It is certainly possible to pick through professional match stats to compare the number of points that were played in each set, to make a rough assessment of whether a 6–0 set was a blowout or a battle, but it is not often done. It is even possible to get out a calculator and divide the match time by the number of points played to get an average point duration… it is rarer still, but not unheard of. What usually happens is that a single stat or a handful of stats is held up to explain a loss, or a win — depending on the player being subjected to analysis.
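The calculator exercise can be sketched in a few lines of Python. The function name `average_point_duration` and the example figures (a 90-minute match with 150 points) are assumptions for illustration only. Note that match time includes changeovers and the time between points, so the result is elapsed time per point, not the duration of the rallies themselves.

```python
def average_point_duration(match_minutes: float, points_played: int) -> float:
    """Elapsed seconds per point: match time divided by points played."""
    if points_played <= 0:
        raise ValueError("points_played must be positive")
    return match_minutes * 60 / points_played


# Hypothetical match: 90 minutes, 150 points played.
seconds_per_point = average_point_duration(match_minutes=90, points_played=150)
print(f"{seconds_per_point:.1f} seconds per point")
```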

I’m not claiming that it is in any way invalid to search for an explanatory stat, only that short-hand descriptions are usually thought stoppers, and that too much weight is given to set scores in assessments of match outcomes.

My goal with TennisVisuals.com is to use data visualizations to put explanatory stats in a broader context, to make the barrage of numbers more digestible, to make it easier to distinguish between matches that were won and matches that were lost. When it is possible at a glance to see that in fact a 6–0 set stretched beyond the number of points typical of a 6–4 set, or that a 6–1 set had the same number of points-per-game as a 7–6 nail-biter, with slightly longer rallies, it may not be so easy to assume that one player dominated the other; one might be inspired to conclude that more subtle lines of reasoning might be worthy of pursuit.

Yet there is a significant challenge faced by those who delve into such matters: despite the fact that numbers are constantly flung at viewers, rattled off by announcers, and poured into officially sanctioned tables (and even a few bar charts) on websites, very little raw data is readily available. Apart from the Grand Slams, the number of winners, forced errors and unforced errors is almost never provided; point progressions can be found on betting sites, but are not offered by the ATP or WTA; rally lengths are nowhere to be found. IBM’s Slamtracker provided rally lengths for a few years but no longer does. Why?

Until the advent of the Match Charting Project, it was generally necessary for coaches and/or researchers wishing to analyze rally lengths to gather the data themselves, or go begging. Today there are more options for gathering match data, but many of those options still don’t collect data on rallies.

## The Rally Reality

Studies of rallies in tennis tend to be found in articles and books related to sports performance, focused on the physiological impact of varying rally lengths. Averages for different surfaces (different Grand Slams, really) have at times been calculated, and there is no shortage of discussion of tactics within rallies, but there is not much that can be found which investigates how the length of rallies relates to other match statistics.

One article that stands out is Jeff Sackmann’s 2011 article on the persistence of a server’s advantage. Sackmann looked at the percentage chance that a player would win a point, either serving or receiving, if it reached a certain rally length; looking at Djokovic’s numbers he stated, “when he gets his return back in play, he’s more than likely to win the point.” Looking at the RallyTree above, it is clear that the assessment holds for 2015, at least according to the matches available in the MCP data.
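The kind of conditional percentage described above can be sketched as follows. This is not Sackmann’s code or the MCP data format: the `(rally_length, server_won)` tuples, the function name, and the sample points are all hypothetical, and I am assuming rally length counts every shot including the serve.

```python
from typing import Optional

# Each point is (rally_length, server_won). Hypothetical sample data.
Point = tuple[int, bool]


def server_win_rate_when_reached(points: list[Point], min_shots: int) -> Optional[float]:
    """Share of points won by the server among points lasting at least min_shots."""
    reached = [server_won for length, server_won in points if length >= min_shots]
    if not reached:
        return None  # no points reached this rally length
    return sum(reached) / len(reached)


sample_points: list[Point] = [
    (1, True), (1, True), (3, True), (4, False),
    (5, True), (6, False), (8, False), (9, True),
]

overall = server_win_rate_when_reached(sample_points, min_shots=1)
long_rallies = server_win_rate_when_reached(sample_points, min_shots=4)
print(f"server wins {overall:.1%} overall, {long_rallies:.1%} once the rally reaches 4 shots")
```

In this invented sample the server’s share drops as the rally gets longer; whether a real player’s numbers behave that way is exactly what the charted data has to show.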

You can read more about how to interpret the RallyTree visualization at TennisVisuals.com.

## Do the Shots Add Up?

I started this series of articles on Set Scores to support a claim I made a few weeks ago in reference to yet another assertion about the ability of interactive data visualizations to enable critical moments in matches to be identified and further explored. In order to write the articles, I had to enable MCP data to be viewed in several new ways. The scatterplot below is an interactive tool I created which enables sets from MCP matches to be compared by per-set totals for points and shots, per-game averages for points and shots, and per-game averages for points vs. per-point averages for rallies. Selecting a set highlights the other sets in the same match and displays Points-to-Set charts which depict point progressions and the relative length in total points of each set.
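For readers who want to reproduce the three ways of arranging sets, here is a rough Python sketch of the coordinates involved. It is not the code behind the interactive scatterplot; the `SetTotals` structure, the `scatter_views` function, and the example set are hypothetical names and numbers chosen for illustration.

```python
from typing import NamedTuple


class SetTotals(NamedTuple):
    score: str   # e.g. "6-0"
    games: int   # games played in the set
    points: int  # points played in the set
    shots: int   # rally shots played in the set


def scatter_views(s: SetTotals) -> dict[str, tuple[float, float]]:
    """Return (x, y) coordinates for the three comparisons described above."""
    return {
        # per-set totals: points vs. shots
        "per_set": (s.points, s.shots),
        # per-game averages: points vs. shots
        "per_game": (s.points / s.games, s.shots / s.games),
        # points per game vs. shots per point (average rally length)
        "points_per_game_vs_shots_per_point": (s.points / s.games, s.shots / s.points),
    }


# A made-up 6-0 set: 6 games, 48 points, 240 shots.
views = scatter_views(SetTotals(score="6-0", games=6, points=48, shots=240))
for name, (x, y) in views.items():
    print(f"{name}: x={x}, y={y}")
```

The third pairing is the one that looks chaotic in practice, because it removes the effect of set length from both axes.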

The graphic above compares a match with a 6–3 set that is roughly equivalent to a 7–6 set in both total points and total rally shots; the 6–3 set, however, is composed of games where the average rally was roughly 25% longer than the average rally for the 7–6 set. Below is an example of how much two 6–0 sets can vary in both average points-per-game and rally shots-per-point.

Without doubt there are numerous considerations that are not yet captured by the visualizations of sets that I have done thus far. For instance, I believe a set’s average rally length can vary significantly depending on who served first. While the tools are still immature, I think they can be pushed quite a bit further as aids in “drilling down” into the critical aspects of tennis matches, especially as they are linked to other interactive visualizations. I’m particularly excited by the prospects of what may come out of the Tennis Data Storytelling Challenge.

I hope the visuals I have provided make it a bit easier (and more fun) to compare set scores and help you differentiate between bagel flavors!

If you’re inspired to learn more about match statistics in general, I encourage you to try out CourtHive, a free match charting tool available at TennisVisuals.com. You can read more about match charting and why I think it is important for all tennis fans to “give it a go” in my article “How to Chart a Match”.

All of the code and data for the visualizations in this article is available on GitHub and bl.ocks.org. Feel free to contact me with any questions or ideas and to play around with the source, if you’re so inclined.

If you enjoy reading these tennis notes, make sure to follow the publication, ‘Recommend’ and share! Check us out on Facebook! Made a cool observation? Interested in certain topics and writing? Are you a tennis photographer? Comment, add notes, and check out the submission guidelines. Let me know which visuals are good and which are not so great. Cheers!

--

Tennis Parent, Ecological History, #dataviz, #DataVisualization, #sportsanalytics, #d3js