Insight into a Tennis Prodigy

Alice Miles
INST414: Data Science Techniques
5 min read · Apr 8, 2024


In the competitive world of tennis, millions of players aspire to be the best, but only a handful reach the top and stay there. One player who has drawn attention for how much he has achieved at a young age is Carlos Alcaraz. At just 20 years old, he has already accomplished what the majority of players can only dream of. I aim to conduct an exploratory analysis of Alcaraz’s achievements compared to those of his top competitors.

Stakeholder and Question

The question this dataset can answer is: what has Carlos Alcaraz achieved so far, and how do his skills compare to those of his competitors? The answer informs an assessment of Alcaraz’s potential for future success relative to his competitors.

Data Description

The data that can answer this question consists of Carlos Alcaraz’s tournament results since he started playing on the professional circuit, along with the titles and achievements he has earned. The relevant fields are tournament name, year, age at the time of the tournament, surface, round reached, and milestones. This data matters for my question because it captures Alcaraz’s overall performance, his accomplishments, and his career potential, all of which can be compared against his fellow competitors.

Data Collection

There is a lot of information on Alcaraz and his statistics across different sources. The source I trust and usually check for player statistics, and the one I used to collect my data, is the ATP Tour player stats website. It lists basic information about Alcaraz, such as his age, the year he turned pro, his height, his weight, and his coach. It also lists all the titles he has won, the tournaments he has played, and the round he reached in each one since turning pro. For Alcaraz’s milestone accomplishments, I also used Wikipedia, which has a detailed description of each milestone and the age at which he achieved it.

Exploratory Data Analysis

In my code, I first read the Excel file that I made for Carlos Alcaraz. The file includes the tournament, the year it took place, Alcaraz’s age at the tournament, the surface (clay, grass, or hard), the round he reached, his rank at the time of the tournament, and any milestone he reached during it. Then I displayed the dataframe so that it is easy to read, with all the columns labeled. Next, I calculated the total number of age milestones Alcaraz has achieved by summing the Age Milestone column. I also counted the number of tournaments Alcaraz has won since joining the professional circuit. In the three code chunks before the graph, I calculated the number of wins Alcaraz has on each court surface. Lastly, I plotted a bar graph of the wins on each surface so that they are easier to understand and compare. All of these calculations can be set against his competitors’ stats to show what his career potential is compared to theirs.
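The steps above can be sketched roughly as follows. This is a minimal illustration, not the exact notebook code: the rows are placeholders rather than Alcaraz’s real results, the filename in the comment is hypothetical, and the column names and the use of "W" to mark a title are assumptions.

```python
import pandas as pd

# Placeholder rows standing in for the spreadsheet. These are NOT
# Alcaraz's real results; column names and the "W" code are assumptions.
df = pd.DataFrame({
    "Tournament": ["Event A", "Event B", "Event C", "Event D", "Event E"],
    "Surface": ["Clay", "Hard", "Grass", "Clay", "Hard"],
    "Round": ["W", "QF", "W", "W", "F"],
    "Age Milestone": [1, None, 1, None, None],  # blank when no milestone
})
# With the real file this would be something like:
# df = pd.read_excel("alcaraz.xlsx")  # hypothetical filename

# Total age milestones: sum() skips the blank (NaN) cells by default.
total_milestones = int(df["Age Milestone"].sum())

# Tournaments won: rows where the round reached is the title ("W").
titles = df[df["Round"] == "W"]
total_titles = len(titles)

# Wins per surface, keeping surfaces with zero wins in the result.
wins_by_surface = (
    titles["Surface"]
    .value_counts()
    .reindex(["Clay", "Grass", "Hard"], fill_value=0)
)

def plot_wins(counts):
    """Draw a bar graph of titles per surface (needs matplotlib)."""
    import matplotlib.pyplot as plt  # imported lazily; only needed to plot
    ax = counts.plot(kind="bar", rot=0)
    ax.set_xlabel("Surface")
    ax.set_ylabel("Titles won")
    plt.show()

print(df)
print(total_milestones, total_titles)
print(wins_by_surface)
# plot_wins(wins_by_surface)  # uncomment to display the bar graph
```

Filtering once and then counting by surface gives the same numbers as three separate per-surface calculations, and the same code could be pointed at a competitor’s spreadsheet for comparison.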

Data Cleaning

When data is first gathered, it cannot be used right away. It has to go through a cleaning process so that it is readable and understandable, and only then can it be used for analysis. When I first looked at the stats on the ATP Tour player webpage, the data was easy to read and in chronological order, but because it lived on a webpage it was not in a format that code could work with easily. So, to make the data clearer and easier to access with code, I organized all of it into an Excel sheet. There, I broke the information into separate columns: the tournament name, the year, the surface the tournament was played on, Alcaraz’s age at the time of the tournament, the round he reached, his rank at the start of the tournament, and the milestone he achieved. The one issue I found after organizing the data was that the milestone column had missing values, because Alcaraz did not achieve a milestone in every tournament he played. I did not fill in these missing values because they did not affect the rest of my data. Besides the missing values, there were no other issues. Since I organized and cleaned the data myself, all the columns are consistent in formatting.
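A quick way to confirm that the missing values are confined to the milestone column, and that they do not change the totals, is sketched below. The rows are placeholders and the column names are assumptions, not the actual spreadsheet.

```python
import pandas as pd

# Placeholder rows (not real results); column names are assumptions.
df = pd.DataFrame({
    "Tournament": ["Event A", "Event B", "Event C"],
    "Surface": ["Clay", "Hard", "Grass"],
    "Age Milestone": [1, None, None],  # blank when no milestone was reached
})

# Count missing values per column to see where the gaps are.
missing_per_column = df.isna().sum()

# sum() ignores NaN by default, so the gaps do not distort the total.
milestone_total = df["Age Milestone"].sum()

# Optional: make "no milestone" explicit by filling the gaps with 0.
filled = df["Age Milestone"].fillna(0).astype(int)

print(missing_per_column)
print(milestone_total)
print(filled.tolist())
```

Leaving the blanks in place works here because pandas skips them when summing; filling them with 0 is only a readability choice.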


These findings on Alcaraz’s wins and achievements provide insight into his strengths and the court surface he excels on. Based on this data, Alcaraz is stronger on clay, while most of his competitors excel on hard courts. But Alcaraz has won titles on all surfaces, which shows his adaptability and makes him a versatile player. Overall, these findings allow us to analyze Alcaraz’s potential for career success in comparison to his competitors.


When it comes to analyzing any kind of data, there are always limitations. One limitation of this dataset is that it might not be complete: the information was gathered from what was already publicly available, so some pieces might be missing. Also, the data does not include other factors that could affect Alcaraz’s performance, such as coaching changes, injuries, and opponent information, which would give more comprehensive insight into his career and potential. Lastly, the data can contain bias and a selective choice of achievements, depending on who gathers it and how they perceive Alcaraz. Despite these limitations, this dataset can give an estimate of how successful Alcaraz’s career could be, given what he has achieved so far.



Carlos Alcaraz: Player activity: ATP tour: Tennis. ATP Tour. (n.d.).

GeeksforGeeks. (2023, September 26). Data Visualization in Jupyter Notebook. GeeksforGeeks.

Wikimedia Foundation. (2024, April 7). Carlos Alcaraz. Wikipedia.

