An example of a network for the 15 players with the highest percentage of wins in Grand Slam tournaments, taken from the Tennis Prestige website. The original journal article can be found here; it considers matches from 1968 to 2010. If you have no idea what this means, don’t fret! We are going to discuss it here.

Tennis Note #25

The Complex Networks of Tennis

Nikita Taparia
The Tennis Notebook
10 min read · Nov 1, 2015

Who is the greatest tennis player ever? This is the most common question discussed among commentators, journalists, and fans. As years pass, more individuals seek to visualize the answer to this question. For example, FiveThirtyEight looks at ELO ratings and, similarly, the Financial Times looks at cumulative win percentage. However, there is a completely different system I would like to highlight today called Tennis Prestige, developed by Prof. Filippo Radicchi. I asked the professor a few questions in order to explain how his system works. Before we dive into how the prestige score is calculated, you need to understand how networks work and how to build a tennis network. If you already know the theory, then scroll down for interactive visualizations of prestige scores for all the players in the last 47 years. Let’s dive in!

What is a network?

Networks are everywhere. In order for me to get to the US Open, I take the bus downtown to the light rail station, which connects me to the airport. Depending on the airline, there is a specific route that gets me to my destination airport. From there, I most likely will take whichever mode of transportation gets me to the 7 train, and this will lead me to the Billie Jean King National Tennis Center. Everything I just described is a series of transportation networks, all established within each system but also connected to one another. Other examples include social networks like Facebook or disease spreading through a population. Let’s consider another major example: the Web, a series of pages linked to other pages but not necessarily both ways.

It is essentially a directed graph. Each page is a node and the arrows indicate how the pages link to one another, but how do you find these pages? When I search Rafael Nadal in Google [incognito mode to remove bias], the first three major pages are Wikipedia, Twitter, and ATP. [Unfortunately, Tennis Note 7 and 8 are not quite up there in the ranks.] How is this order picked? The ranking algorithm is called PageRank. There is a fantastic d3 visualization that goes into the mathematics of PageRank, if you want to engage in an interactive exercise. For a more intuitive understanding, one of my favorite textbooks on Networks had a great description:

Think of PageRank as a kind of fluid that circulates through the network…node to node along edges and pooling at nodes that are most important.
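To make the fluid picture concrete, here is a minimal sketch of PageRank computed by repeated redistribution (power iteration) on a made-up four-page web. The page names, the link structure, and the damping factor of 0.85 are illustrative assumptions, not a description of how Google actually runs it.

```python
def pagerank(links, damping=0.85, iterations=100):
    """links maps each page to the list of pages it links to."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}  # start with the fluid spread evenly
    for _ in range(iterations):
        # pages without outgoing links spread their fluid over every page
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1 - damping) / n + damping * dangling / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            for target in targets:
                # each page splits its fluid equally among the pages it links to
                new_rank[target] += damping * rank[page] / len(targets)
        rank = new_rank
    return rank

# Hypothetical toy web: who links to whom.
toy_web = {
    "fan_blog": ["wikipedia", "atp"],
    "wikipedia": ["atp"],
    "atp": ["wikipedia"],
    "twitter": ["wikipedia"],
}
ranks = pagerank(toy_web)
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(f"{page:10s} {score:.3f}")
```

In this toy web the fluid pools at the two pages that others link to, while the pages nobody links to keep only the small baseline share.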

Specifically, what is a tennis network?

Now that we have established the concepts behind networks and rank, let’s apply this to tennis. At the time Prof. Radicchi published his work, he examined all tennis matches from 1968 to Oct. 2010. Consider a single match: it connects two players, a winner and a loser. In many cases, these players have come across each other more than once. Use this to establish a weighted directed edge pointing from the loser to the winner, in which the weight is the number of times the loser has lost to that winner. Here are two examples. The first example looks at our recent 2+ slam winners and their head-to-head records against each other at the slams only.

The cover photo above is a more complete network of the top 15 with high win% at slams and how they connect to one another.
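As a rough sketch of that bookkeeping, the snippet below turns a list of match results into weighted directed edges. The results are invented placeholders (players A, B, and C), and the loser-to-winner direction is my reading of the convention used in the paper.

```python
from collections import Counter

# Hypothetical results as (winner, loser) pairs; these are not real matches.
matches = [("A", "B"), ("A", "B"), ("B", "A"), ("A", "C"), ("C", "B")]

def build_tennis_network(results):
    """Return {(loser, winner): weight}, where weight counts how many
    times the loser has lost to that winner."""
    edges = Counter()
    for winner, loser in results:
        edges[(loser, winner)] += 1
    return dict(edges)

network = build_tennis_network(matches)
for (loser, winner), weight in sorted(network.items()):
    print(f"{loser} -> {winner} (weight {weight})")
```

Two players who have both beaten each other end up with one edge in each direction, which is why the head-to-head diagrams show arrows going both ways.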

Is it starting to make sense? Consider another example — a generic tournament draw [this example was in the paper]. In the end, there is one winner but a draw is essentially a network of winners and losers, which leads ‘the fluid’ to one location — the winner.

Once Prof. Radicchi established the network, he used a modified version of the PageRank algorithm to quantify the importance of each tennis player — their Prestige Score. Embrace the math (or skip to my simple explanation)!
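For readers who want the equation in text form, the modified PageRank update can be written, to the best of my reading of the paper, as below. The notation is a sketch rather than a verbatim copy: P_i is the prestige of player i, w_ji is the number of times player j lost to player i, s_j^out is the total number of losses of player j, N is the number of players, q is a damping parameter between 0 and 1, and δ(x) equals 1 when x = 0 and 0 otherwise.

```latex
P_i = (1-q)\sum_{j} P_j\,\frac{w_{ji}}{s_j^{\text{out}}}
      + \frac{q}{N}
      + \frac{1-q}{N}\sum_{j} P_j\,\delta\!\left(s_j^{\text{out}}\right),
\qquad \sum_{i} P_i = 1
```

The first term passes credit from each loser to the players who beat them, the second hands every player a small equal share, and the third redistributes the credit held by players who never lost.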

What does this all mean? Let’s go back to the analogy of fluid in a network. The connection between pool i and pool j has a slight slope based on the weight, which controls how the fluid flows. Now imagine all this fluid distributed throughout a huge network with connections running both ways and with different slopes. When the system finally settles, which nodes have the most fluid? The share each node holds is its Prestige Score.

I asked Prof. Radicchi to explain Prestige Score in an intuitive manner. He states:

A player has high prestige score based on the quality of the players he has beaten, not just the number of opponents defeated. In this case, it is more intuitive to think that “tennis credit” is flowing in the network, jumping from player to player based on the results of tennis matches. Prestige Score quantifies the percentage of credit that each player owns. Beating a good player (with large credit) thus may be better than beating many other not so good player (low credit).

Clearly, this is done on a computer, but let’s apply it to our tennis draw example. The full math is worked out in the paper, but I have provided only the simplified terms.

Notice that if you total the prestige scores, you get 4(0.05) + 2(0.1) + 1(0.2) + 1(0.4) = 1! Remember, this was a constraint applied to keep everything on a 0–1 scale.
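These totals can be checked numerically. The sketch below iterates a PageRank-style update on an eight-player single-elimination draw. The player labels p1 through p8 are placeholders, the function name is hypothetical, and setting the damping parameter q to 0 is my assumption about which simplification reproduces the 0.05, 0.1, 0.2, and 0.4 values.

```python
def prestige_scores(matches, q=0.0, tol=1e-12, max_iter=10_000):
    """matches: list of (winner, loser). Returns {player: score}, summing to 1."""
    players = sorted({p for match in matches for p in match})
    n = len(players)
    # losses[j][i] = number of times j lost to i (an edge from j to i)
    losses = {p: {} for p in players}
    for winner, loser in matches:
        losses[loser][winner] = losses[loser].get(winner, 0) + 1
    total_losses = {p: sum(losses[p].values()) for p in players}

    score = {p: 1.0 / n for p in players}
    for _ in range(max_iter):
        # credit held by unbeaten players is shared equally among everyone
        unbeaten = sum(score[p] for p in players if total_losses[p] == 0)
        base = q / n + (1 - q) * unbeaten / n
        new_score = {p: base for p in players}
        for loser in players:
            for winner, count in losses[loser].items():
                # a loser passes credit to each conqueror in proportion to losses
                new_score[winner] += (1 - q) * score[loser] * count / total_losses[loser]
        shift = sum(abs(new_score[p] - score[p]) for p in players)
        score = new_score
        if shift < tol:
            break
    return score

draw = [
    ("p1", "p2"), ("p3", "p4"), ("p5", "p6"), ("p7", "p8"),  # round one
    ("p1", "p3"), ("p5", "p7"),                               # semifinals
    ("p1", "p5"),                                             # final
]
scores = prestige_scores(draw)
for player in sorted(scores):
    print(player, round(scores[player], 3))
```

With q = 0 the iteration settles at 0.05 for each first-round loser, 0.1 for each semifinal loser, 0.2 for the finalist, and 0.4 for the champion. A nonzero q hands part of the credit out evenly and flattens these differences.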

This is for a mini-tournament, but imagine doing this for all matches played in the last 47 years! The calculations are up to date on the website I linked earlier. An interesting note about this calculation: it depends on the number of players and the number of tournaments. If you go to the Tennis Prestige website, you can see how these numbers have changed. It seems there were more tournaments and also more players in the early years, but in the current era of tennis the numbers are lower. Keep this in mind for the analysis I do towards the end of the article.

Prestige Rank vs. In-Strength Rank and ATP Rank

Prior to 2011, Prof. Radicchi identified Jimmy Connors as the best player ever, taking into account every single year and the prestige rank, as shown below (Fig A). He compared prestige rank to in-strength rank [number of match wins]. Rafael Nadal sticks out because, while he does not have as many victories, his prestige rank suggests the players he beat in those victories were of higher quality. What was more interesting was how much prestige rank differed from ATP rank (Fig B).

Figure taken from the original paper by Prof. Radicchi. The paper is under CC license. The in-strength rank is based on the number of victories.
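In-strength is the simpler of the two rankings to compute, since it only counts wins. A minimal sketch, using invented results and a hypothetical helper name:

```python
from collections import Counter

# Hypothetical (winner, loser) results; these are not real matches.
matches = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("B", "A"), ("C", "D")]

def in_strength_rank(results):
    """Rank players by number of wins (1 = most wins); ties share a rank."""
    wins = Counter({p: 0 for match in results for p in match})
    wins.update(winner for winner, _ in results)
    ordered = sorted(wins.items(), key=lambda kv: (-kv[1], kv[0]))
    ranks = {}
    for position, (player, count) in enumerate(ordered, start=1):
        # a tied player inherits the rank of the previous player in the list
        if position > 1 and count == ordered[position - 2][1]:
            ranks[player] = ranks[ordered[position - 2][0]]
        else:
            ranks[player] = position
    return wins, ranks

wins, ranks = in_strength_rank(matches)
for player in sorted(ranks, key=ranks.get):
    print(player, "wins:", wins[player], "rank:", ranks[player])
```

Nothing here looks at who was beaten, which is exactly the information the prestige rank adds.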

The actual ATP ranking is based on the amount of points collected by players during the season. Each tournament has an a priori fixed value and points are distributed accordingly to the round reached in the tournament. In our approach differently, the importance of a tournament is self-determined: its quality is established by the level of the players who are taking part of it.

I did a similar analysis for the Top 50 Prestige Rank Players as of 5–25–2015. Note, this is not a square axis [0–50 on Prestige Rank does not mean 0–50 ATP Rank]. In fact, the fitted trend line makes it seem like a player ranked 40 in ATP is ranked 30 in Prestige. If you zoom into the top 10, it maps perfectly linearly with the ATP rank. As stated by the professor, this has more to do with how the ATP ranking itself works.

The outliers are rather interesting. Take Grigor Dimitrov, whose ATP rank dropped but who still has a relatively high prestige rank for the last 52 weeks. Tommy Robredo’s ATP rank is even worse, yet he does better in Prestige Rank. Again, this has more to do with the quality of wins these two have acquired in the annual time span, not the ATP ranking points.

I also asked the Professor how this system would compare with ELO ratings and he states:

The main difference is that the ELO system is not based on the construction of a network. I expect however that ELO and Prestige Score provide highly correlated rankings.
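For contrast, here is a minimal sketch of the textbook Elo update, which only ever looks at the two players in a single match. The K-factor of 32 and the ratings used are illustrative assumptions; whatever tuning FiveThirtyEight applies is not reproduced here.

```python
def expected_score(rating_a, rating_b):
    """Probability that player A beats player B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def elo_update(winner_rating, loser_rating, k=32.0):
    """Return the (winner, loser) ratings after one match."""
    expected_win = expected_score(winner_rating, loser_rating)
    change = k * (1.0 - expected_win)  # bigger reward for an unexpected win
    return winner_rating + change, loser_rating - change

even_match = elo_update(1500.0, 1500.0)
upset = elo_update(1400.0, 1600.0)
print("even match:", even_match)
print("upset:", upset)
```

Beating a higher-rated player moves the ratings more than beating an equal, which echoes the prestige idea of valuing quality wins, but no network is ever built.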

Lastly, Prof. Radicchi stressed:

Players still in activity are penalized in a global ranking. The same is also true for players who started to play before the beginning of the ATP era (e.g., Rod Laver). In this sense, the use of Prestige Score is more fair when based on single years.

In other words, we will not know where our current era lies in comparison to previous eras. Prof. Radicchi will rerun this analysis after this golden era of tennis ends. In the meantime, because Prof. Radicchi has provided the Top 100 prestige scores for each year since 1968, I decided to turn them into an interactive visual story via Tableau. For your convenience, I will highlight a few points through animated GIFs and let you explore on your own!

Prestige Scores from 1968-Present (May 2015)

The first panel of the interactive lets you visualize all the prestige scores vs. prestige rank for each year. You can limit your field of view to a particular rank and year. You can also search for an individual player or group of players.

This first panel illustrates all the prestige scores. One thing I noticed is how the curve changed in the lower ranks relative to one another. The second panel takes the difference in prestige scores between one rank and the next rank. This illustrates the spread of competition. Typically, there is higher spread in the top ranks and it essentially decreases, as expected — the top players dominate a lot more and the rest are fighting for a spot. Let’s zoom in on the top players!

We can take the difference in prestige scores between one rank and the next in order to determine how close one player is to the next. For instance, in May 2015, Djokovic is 1.12% higher in prestige score than Federer, Federer is 1.36% higher than Murray, and so on. What is interesting is the early days: Nadal and Federer dominated so thoroughly that the huge peak was at rank 2, meaning that the top two players were close but the rest of the competition was far away.
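This panel boils down to one subtraction per rank. A minimal sketch, using invented scores rather than the real Tennis Prestige values:

```python
def rank_gaps(scores):
    """Return the gap between each rank and the next one down.
    gaps[0] is rank 1 minus rank 2, gaps[1] is rank 2 minus rank 3, and so on."""
    ordered = sorted(scores, reverse=True)
    return [higher - lower for higher, lower in zip(ordered, ordered[1:])]

# Hypothetical prestige scores for the top five players of one year.
toy_scores = [0.050, 0.048, 0.020, 0.018, 0.017]
gaps = rank_gaps(toy_scores)
peak_rank = gaps.index(max(gaps)) + 1  # rank that sits just above the widest gap
print("gaps:", [round(g, 3) for g in gaps])
print("widest gap is below rank", peak_rank)
```

In this made-up year the widest gap sits below rank 2: the top two are close to each other and well clear of everyone else.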

Top 25 Prestige Scores of All Time

You can almost see the ebb and flow of top players through the years. You will have to go to the Tableau viz to see the entire timeline for the big 4.

After looking at how the spreads change over the years, I identified the top 25 prestige scores of all time and displayed the corresponding players in panel 3 so you can visualize their prestige rank and score in a timeline fashion. The list included players who may not have won slam titles but have had wonderful careers. One player that popped out was Hewitt in 2013. Hewitt ended the year with an ATP rank of 60, but because he had five top 10 wins, his prestige score and rank were much higher.

Prestige Scores and Ages

Who was the most dominant player for each age?

ELO identified certain top tennis players of all time. I took this list and compared their prestige score with an estimated age. You can sort by ages and look at who had the highest score for their age. Nadal’s dominance through his early 20s is the absolute best, while Djokovic dominates in his late 20s. Federer and Connors dominate the later years. Time will tell how this visual will change in the later stages of the big 4’s careers.

Changes in Cumulative Match Win % and Prestige Score

Cumulative match win percentage is typically used to illustrate the growth of players over time. However, if you take the difference between one year and the next, you can see whether there was growth relative to the past. In the image above, this is illustrated through the size of the box, while the prestige score is shown in a similar color scheme.
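As a small worked example of that difference, the sketch below computes a cumulative win percentage season by season and then the change from each year to the next. The seasons are invented, and the function names are hypothetical.

```python
def cumulative_win_pct(seasons):
    """seasons: list of (year, wins, losses) in chronological order.
    Assumes the first season has at least one match."""
    total_wins = total_matches = 0
    series = []
    for year, wins, losses in seasons:
        total_wins += wins
        total_matches += wins + losses
        series.append((year, 100.0 * total_wins / total_matches))
    return series

def yearly_change(series):
    """Change in cumulative win % from each year to the following one."""
    return [(year, pct - prev_pct)
            for (_, prev_pct), (year, pct) in zip(series, series[1:])]

# Hypothetical career: (year, wins, losses).
toy_career = [(2001, 10, 10), (2002, 30, 10), (2003, 20, 20)]
series = cumulative_win_pct(toy_career)
changes = yearly_change(series)
for year, delta in changes:
    print(year, f"{delta:+.2f} points")
```

A positive change means the season pulled the career average up, and a negative one means it dragged the average down even if the player still won half of the matches.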

The last question I asked Prof. Radicchi is for all those who found this entire analysis fascinating: What type of classes does someone need to take in order to learn something like networks?

I am myself teaching a course name “Performance Analytics” whose purpose is to introduce undergraduate students to mathematical and statistical methods used in the analysis of data from professional sports. Unfortunately, there are not many colleges that offers classes on network science. The number of graduate programs in complex systems and networks is, however, growing so I expect it will become easier to find specialistic courses in the near future everywhere. Traditional courses where graph (network) theory is partially covered are in math, statistics, and computer and social sciences.

Luckily for you, I took an actual class entitled Networks in my freshman year and have provided a link to an early draft of the textbook at the very bottom [from my professor’s website]. There are too many stories to tell from all of these charts. Once the year ends, I will update the interactive. I would love to hear your interpretations and observations, so please feel free to leave a response with your findings by clicking the response tab below, and do not forget to check out Tennis Prestige and the original paper!

Special thanks to Prof. Filippo Radicchi for taking the time to talk with me. Did you find the ideas behind networks intriguing and want a more in-depth understanding of this paper? You are in luck. I had the pleasure of taking a class in Networks in Spring 2009 and my professors provided a draft of their textbook online for free [if you want some light reading]!

If you enjoy reading these tennis notes, make sure to follow the publication, ‘Recommend’ and share! Check us out on Facebook! Made a cool observation? Interested in certain topics and writing? Are you a tennis photographer? Comment, add notes, and check out the submission guideline. Cheers!

Nikita Taparia
The Tennis Notebook

Engineer. Scientist. Data Nerd. Cookie/Coffee Addict. Educator. Tennis/WoSo. Photographer. Musician. Artist. Whiteboards. Writer.