Do Soccer Competitions Have Any Similarities?

Alejandro Tarazona
INST414: Data Science Techniques
May 6, 2022

There are many different soccer competitions around the world, and most of them are played at the same time. If you ask any soccer fan, they will have a favorite league. Some say that the Premier League is the most competitive, while others prefer the South American leagues. Some people base their preference on their favorite team. I am a big FC Barcelona fan, so I will always prefer the Spanish league. Still, many people who watch soccer would agree that the Premier League is the league most likely to go down to the wire, with title races often decided in the final weeks of the season. It is also unpredictable: Leicester’s historic title run in the 2015–2016 season is a good example.

European Leagues

For this assignment, I focused on the European leagues. This was partly because it was the only data I could pull with my trial, but these were also the leagues I wanted my report to focus on. I grew up with soccer and have watched it for as long as I can remember. My primary source of data was my.sportmonks.com, which launched in 2016 and has been one of the primary sources of football stats online.

Data Collection

I retrieved the data for this project from the my.sportmonks.com API. All the information was gathered using GET requests, and I used Pandas to put the different data categories into a data frame. Since the data available at first was mainly limited to a Scottish league and a Dutch league, I had to create an account and pay a small subscription fee. I also had to start a 14-day trial to gain more access, which opened up the information for the European leagues. My first goal was to compare teams, but I quickly noticed that the team data was a bit too hard to pull, so I decided to look at the leagues themselves. I wanted to run a regression on the exported data.
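To make the collection step concrete, here is a minimal sketch of a GET request whose JSON response is loaded into a Pandas data frame. It is not the exact code used for this project. The endpoint URL, the api_token query parameter, and the "data" key in the response are assumptions about the provider's v2.0 API and should be checked against its documentation. The function names are made up for illustration, and the standard-library urllib is used here only to avoid extra dependencies.

```python
import json
import urllib.parse
import urllib.request

import pandas as pd

# Assumed base URL for the leagues endpoint; verify against the provider's docs.
LEAGUES_URL = "https://soccer.sportmonks.com/api/v2.0/leagues"


def build_leagues_url(api_token: str, base_url: str = LEAGUES_URL) -> str:
    """Return the GET URL with the API token passed as a query parameter."""
    query = urllib.parse.urlencode({"api_token": api_token})
    return f"{base_url}?{query}"


def fetch_leagues(api_token: str) -> dict:
    """Send the GET request and decode the JSON body into a dict."""
    with urllib.request.urlopen(build_leagues_url(api_token), timeout=30) as response:
        return json.load(response)


def leagues_to_frame(payload: dict) -> pd.DataFrame:
    """Build a data frame with one row per league.

    Assumes the leagues are listed under a top-level "data" key.
    """
    return pd.DataFrame(payload.get("data", []))
```

With a valid token, `leagues_to_frame(fetch_leagues("YOUR_TOKEN"))` would return one row per league that the subscription exposes.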

Data Analysis

From the data I was able to gather, I noticed that I could only pull the European leagues; anything beyond that I would have to pay for. Keeping it simple was my goal. Below is a quick preview of all the leagues that I could pull. As you can see, the list includes some well-known big leagues and some that are not so well known. I used the API to pull the most necessary fields: type, legacy_id, country_id, and name. These are the primary data points from this database.
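Narrowing the data frame to the four fields named above can be sketched as follows. The helper name and the sample rows are hypothetical. The rows are placeholders for illustration only and are not values returned by the API.

```python
import pandas as pd

# The four fields named in the article.
KEY_COLUMNS = ["name", "type", "country_id", "legacy_id"]


def select_key_columns(leagues: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of the frame containing only the key fields, in a fixed order."""
    missing = [col for col in KEY_COLUMNS if col not in leagues.columns]
    if missing:
        raise KeyError(f"missing expected columns: {missing}")
    return leagues[KEY_COLUMNS].copy()


# Placeholder rows for illustration only; these are NOT real API values.
sample = pd.DataFrame(
    {
        "id": [1, 2, 3],
        "name": ["League A", "Cup A", "League B"],
        "type": ["domestic", "domestic_cup", "domestic"],
        "country_id": [10, 10, 20],
        "legacy_id": [5, 6, 7],
        "logo_path": ["a.png", "b.png", "c.png"],
    }
)

preview = select_key_columns(sample)
print(preview)
```

Raising an error on a missing column, rather than silently skipping it, makes it obvious when a subscription tier does not return one of the expected fields.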

With the data, I wanted to take a quick look at legacy_id and country_id. Although this does not do much for the point I am trying to prove, you can see some of the summary information, including the mean legacy_id. I treated legacy_id as a unique rating that the API gives each league. It was a quick analysis of what that means for this data.
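A quick summary like the one described here can be produced with Pandas' built-in mean and describe methods. This is only a sketch: the values below are placeholders made up for illustration, not the numbers from the API.

```python
import pandas as pd

# Placeholder values for illustration only; these are NOT real API values.
leagues = pd.DataFrame(
    {
        "name": ["League A", "Cup A", "League B", "League C"],
        "country_id": [10, 10, 20, 30],
        "legacy_id": [4, 6, 8, 10],
    }
)

# Mean of the legacy_id column.
mean_legacy_id = leagues["legacy_id"].mean()

# Count, mean, standard deviation, min, quartiles, and max for both ID columns.
summary = leagues[["legacy_id", "country_id"]].describe()

print(f"mean legacy_id: {mean_legacy_id}")
print(summary)
```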

Using the data frames, I created scatter plots to show this information. On the left-hand side, you can see all the leagues on the list and their country_id. The goal of this graph was to show readers how similar the leagues are: when competitions share the same or a similar country_id, you can notice a trend among the tournaments/leagues from the same country. A scatter plot with a regression line is a good way to show this. There are a couple of outliers because the API did not give me full access to the data, but one can still see how similar some competitions are. The graph on the right plots the rating (legacy_id) against country_id. Although we cannot say that the two are related, there are some similarities, especially among tournaments from the same country.
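One way to draw a scatter plot with a regression line is sketched below, assuming NumPy for the least-squares fit and Matplotlib for the drawing. The plotting library actually used for the graphs is not stated here, so treat this as one possible approach. The function names and sample values are hypothetical placeholders.

```python
import numpy as np
import pandas as pd


def fit_line(x, y):
    """Least-squares fit of y = slope * x + intercept."""
    slope, intercept = np.polyfit(
        np.asarray(x, dtype=float), np.asarray(y, dtype=float), deg=1
    )
    return float(slope), float(intercept)


def plot_with_regression(df: pd.DataFrame, x_col: str, y_col: str, path: str) -> None:
    """Scatter x_col against y_col, overlay the fitted line, and save the figure."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for drawing

    slope, intercept = fit_line(df[x_col], df[y_col])
    xs = np.linspace(df[x_col].min(), df[x_col].max(), 50)

    fig, ax = plt.subplots()
    ax.scatter(df[x_col], df[y_col], label="competitions")
    ax.plot(xs, slope * xs + intercept, color="red", label="least-squares fit")
    ax.set_xlabel(x_col)
    ax.set_ylabel(y_col)
    ax.legend()
    fig.savefig(path)
    plt.close(fig)


# Placeholder values for illustration only; these are NOT real API values.
sample = pd.DataFrame({"country_id": [1, 2, 3, 4], "legacy_id": [3, 5, 7, 9]})
slope, intercept = fit_line(sample["country_id"], sample["legacy_id"])
print(f"slope={slope:.2f}, intercept={intercept:.2f}")
```

Calling `plot_with_regression(sample, "country_id", "legacy_id", "regression.png")` would save the figure to disk. Because both columns are identifiers, the fitted line describes how the IDs happen to line up, which is why the two cannot be called related.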

Problems, Issues, and Bugs

Most of the issues that I ran into while extracting the data came from the minimal amount of data that the free API offers. I wanted to focus on all the European soccer leagues, but as you can see above, a ton of leagues/tournaments were missing from the list. I also had many problems with the different diagrams, trying to figure out which one would portray the information best. At first, I forgot to load the dataframe_image library, so I had to look around to find out how to do it. With the help of Benjamin Lam’s Assignment One breakdown, I was able to understand what I was missing. The last problem I ran into was finding a good API that would work. Some soccer APIs were super limited, and some only gave information on one league. It took a lot of research to find the right one.

Conclusion

It is crazy to see how much the soccer leagues differ. This API database has its own way of tracking every league through the legacy_id. With this, I could get a sense, based on the website’s rating, of how similar the leagues are. Although there are a lot of different leagues, I was still able to see how similar some could be. If you do not watch soccer, you would not know how similar the Premier League and the FA Cup are, or La Liga and the Copa del Rey. The regression graph showed me how many tournaments/leagues are very similar to each other. Although most competitions are not closely related, there are still two or three that look almost identical.

Link to code: https://medium.com/@Atarazona/do-soccer-competitions-have-any-similarities-b5302f5ed577
