Accessing Fantasy Premier League data using Python
How to access data from the FPL API
About the author
I first started playing Fantasy Premier League (FPL) in 2007. I was in high school at the time, and I didn’t really follow the Premier League. However, most of my friends did, and I was tired of missing out when they slacked off discussing their team selections at the back of the classroom. I remember selecting my very first team, which had a midfield consisting of Fabregas, Gerrard, Lampard and Cristiano Ronaldo, plus a bunch of defenders from bottom-of-the-table clubs like Portsmouth and Wigan, as they were the best I could afford with my remaining budget.
As time went on, FPL heightened my interest in the Premier League, while the knowledge I accrued from watching live games benefitted my FPL team. Over the next few seasons, I found myself consistently near the top of my mini leagues. I finished within touching distance of the top 10k back in 2015/16, but for the most part my overall global ranking has been just okay.
During my time at university, I learnt to code using Python, and I now work as a data scientist. I have recently been wondering whether I can use some of my skillset to make my FPL decisions in a more data-driven way. I stumbled upon a Reddit thread that mentioned an API (Application Programming Interface) for FPL, and this article is a beginner’s how-to guide for accessing this API in Python.
Fantasy Premier League API
The good news is that the API is completely free to use (provided you don’t use it for anything that earns you money). This API consists of a number of endpoints which, together, can provide users with a full picture of the FPL game. An endpoint is basically just a URL that responds with some data after you send an HTTP request to it.
A summary of these endpoints is given below:
Bootstrap-url
Let’s start off with the first endpoint in our list: try pasting its URL (https://fantasy.premierleague.com/api/bootstrap-static/) into your browser’s address bar. You should get back a rather messy response that looks something like this:
{"events":[{"id":1,"name":"Gameweek 1","deadline_time":"2020-09-12T10:00:00Z","average_entry_score":50,"finished":true,"data_checked":true,"highest_scoring_entry":4761681,"deadline_time_epoch":1599904800,"deadline_time_game_offset":0,"highest_score":142,"is_previous":false,"is_current":false,"is_next":false,"chip_plays":[{"chip_name":"bboost","num_played":112843},{"chip_name":"3xc","num_played":225426}],"most_selected":259,"most_transferred_in":12,"top_element":254,"top_element_info":{"id":254,"points":20},"transfers_made":0,"most_captained":4,"most_vice_captained":4},{"id":2,"name":"Gameweek 2","deadline_time":"2020-09-19T10:00:00Z","average_entry_score":59,"finished":true,"data_checked":true,"highest_scoring_entry":6234344,"deadline_time_epoch":1600509600,"deadline_time_game_offset":0,"highest_score":165,"is_previous":false,"is_current":false,"is_next":false,"chip_plays":[{"chip_name":"bboost","num_played":94615},{"chip_name":"freehit","num_played":111968},{"chip_name":"wildcard","num_played":494000},{"chip_name":"3xc","num_played":221133}],"most_selected":259,"most_transferred_in":302,"top_element":390,"top_element_info":{"id":390,"points":24},"transfers_made":14637421,"most_captained":4,"most_vice_captained":254} ...
JSON data format
The response you get back from the API is in a format known as JSON. It is very widely used, especially in the world of APIs, so knowing how to work with data in this format is a really useful skill.
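To see what this format maps to in Python, here is a minimal sketch using the standard library’s json module on a trimmed copy of the first gameweek object from the response above (most of its fields have been dropped for brevity):

```python
import json

# A trimmed copy of the first "events" entry shown above (most fields omitted)
raw = '{"id": 1, "name": "Gameweek 1", "finished": true, "top_element_info": {"id": 254, "points": 20}}'

event = json.loads(raw)  # JSON objects become dicts; JSON true becomes Python True

print(event["name"])                        # Gameweek 1
print(event["finished"])                    # True
print(event["top_element_info"]["points"])  # 20
```

In practice you will rarely need json.loads() for API work, because the .json() method in requests does the same parsing for you.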
Luckily, we have a few Python libraries we can use to sift through these data more easily, namely requests, json and pandas.
Using the requests library
To send a request, we use the requests.get() function, and then call the .json() method on the response to parse its JSON body into Python dictionaries and lists.
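A minimal sketch of that request is below. BASE_URL and get_bootstrap_static() are illustrative names rather than anything defined by the API, and the timeout and raise_for_status() call are optional safeguards I have added so that a slow or failed request raises an error instead of hanging or failing later with a confusing parsing error.

```python
import requests

BASE_URL = "https://fantasy.premierleague.com/api/"  # hypothetical constant for the API root


def get_bootstrap_static(timeout: float = 10.0) -> dict:
    """Fetch the bootstrap-static endpoint and return its parsed JSON body."""
    response = requests.get(BASE_URL + "bootstrap-static/", timeout=timeout)
    response.raise_for_status()  # raise an HTTPError for 4xx/5xx responses
    return response.json()       # parse the JSON body into Python dicts and lists


# Usage (needs network access):
# data = get_bootstrap_static()
# print(list(data.keys()))
```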
If you try to print the value of the response, your notebook editor will become dramatically slower as it tries to render the full cell output. At this stage, we are only interested in understanding the schema and don’t need to see every single piece of data printed onto our screen. The pprint module offers a neat solution to this problem, and we will use it to show only the top-level keys in the API’s response.
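pprint’s depth argument is one way to do this: any list or dictionary nested deeper than that level is printed as [...] or {...}. The sketch below assumes data holds the parsed bootstrap-static response from .json(); show_top_level() is a hypothetical helper name.

```python
from pprint import pprint


def show_top_level(data: dict) -> list:
    """Print a collapsed view of a JSON response and return its top-level keys."""
    # depth=1 prints each top-level key, but nested lists/dicts collapse to [...] / {...}
    pprint(data, indent=2, depth=1, compact=True)
    return list(data.keys())


# Usage, assuming `data` holds the parsed bootstrap-static response:
# show_top_level(data)
```

Note that pprint sorts dictionary keys by default, so the printed order may differ from the order in the raw response.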
Now we can see that this endpoint contains a few nested fields. Let’s take a closer look at some of these fields.
Player data
The elements field contains data for each Premier League player in the current season of FPL. We can access it just as we would access the value associated with a key in a dictionary. The value is a list of further dictionaries, one per player.
Let’s get the elements data and then show the information about the first player in the list:
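A short sketch of that lookup follows, again assuming data holds the parsed bootstrap-static response; show_first_player() is a hypothetical helper name.

```python
from pprint import pprint


def show_first_player(data: dict) -> dict:
    """Print how many players there are and the full record of the first one."""
    players = data["elements"]  # a list with one dict per player
    print(f"{len(players)} players found")
    pprint(players[0])          # every field the response holds for this player
    return players[0]


# Usage, assuming `data` holds the parsed bootstrap-static response:
# first = show_first_player(data)
```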
Easier inspection with Pandas
At this stage, it would be more useful to us if these data were in a tabular format. Pandas is a library that was made for exactly this purpose. Loading JSON data into a dataframe can be done with the json_normalize() function:
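A minimal sketch, assuming data holds the parsed bootstrap-static response; players_dataframe() is an illustrative name. pd.json_normalize() has been available at the top level of pandas since version 1.0, and the column names in the usage comment are my assumptions about the FPL player schema.

```python
import pandas as pd


def players_dataframe(data: dict) -> pd.DataFrame:
    """Turn the list of player dicts in data["elements"] into a dataframe."""
    # One row per player dict; nested dicts, if any, become dotted column names
    return pd.json_normalize(data["elements"])


# Usage, assuming `data` holds the parsed bootstrap-static response:
# players = players_dataframe(data)
# players[["id", "web_name", "team", "element_type", "total_points"]].head()
```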
Now we have a table containing information about every player, but we don’t know which teams they play for or what positions they play, as we only have numeric ID values in those columns.
Supporting data
We can get the names and strength ratings for teams by extracting the teams field from the base response into a dataframe:
Similarly, for player positions, we will use the element_types field:
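Both lookup tables can be built the same way as the players table. In the sketch below, supporting_tables() is a hypothetical helper, data is assumed to hold the parsed bootstrap-static response, and the column names in the usage comment (such as strength and singular_name_short) are assumptions about the FPL schema.

```python
import pandas as pd


def supporting_tables(data: dict) -> tuple:
    """Build the teams and positions lookup tables from a bootstrap-static response."""
    teams = pd.json_normalize(data["teams"])              # one row per club
    positions = pd.json_normalize(data["element_types"])  # one row per position
    return teams, positions


# Usage, assuming `data` holds the parsed bootstrap-static response
# (column names below are assumptions about the FPL schema):
# teams, positions = supporting_tables(data)
# teams[["id", "name", "strength"]]
# positions[["id", "singular_name", "singular_name_short"]]
```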
Let’s combine these three tables to get a single view of a player. We will use the merge() pandas function to join tables on their related columns. Players can be joined to teams using the players.team and teams.id columns:
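A sketch of this join using the top-level pd.merge() function is below. It assumes players and teams are the dataframes described above, with teams carrying id and name columns; add_team_names() and the team_name column are illustrative names. Because both tables have an id column, the team’s name is renamed first and the team’s id is dropped after the join, so the player’s own id is kept unchanged.

```python
import pandas as pd


def add_team_names(players: pd.DataFrame, teams: pd.DataFrame) -> pd.DataFrame:
    """Attach each player's team name by matching players.team to teams.id."""
    team_lookup = teams[["id", "name"]].rename(columns={"name": "team_name"})
    merged = pd.merge(
        players,
        team_lookup,
        left_on="team",          # numeric team ID stored on each player
        right_on="id",           # primary key of the teams table
        how="left",              # keep every player, even without a matching team
        suffixes=("", "_team"),  # player's "id" keeps its name; the team's becomes "id_team"
    )
    return merged.drop(columns="id_team")
```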
Then we can join the player positions, too:
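Below is a sketch of the position join written with the method form of merge. It assumes players.element_type holds the position ID and that the positions table has id and singular_name_short columns; add_positions() and the position column are illustrative names. Renaming the lookup columns up front lets us join with a single on= argument instead of separate left_on and right_on.

```python
import pandas as pd


def add_positions(players: pd.DataFrame, positions: pd.DataFrame) -> pd.DataFrame:
    """Attach each player's short position label by matching element_type to positions.id."""
    position_lookup = positions[["id", "singular_name_short"]].rename(
        columns={"id": "element_type", "singular_name_short": "position"}
    )
    # Method form of merge: the calling dataframe is the left table
    return players.merge(position_lookup, on="element_type", how="left")
```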
Note the two different uses of the merge function above. It can be called either as a top-level pandas function (e.g. pd.merge(left_df, right_df, on=...)) or as a method on an existing dataframe (e.g. left_df.merge(right_df, on=...)).
Player gameweek history
Now that we have some basic information for players, teams and positions, let’s get the gameweek points for the current season.
We can do this in two ways:
- For each gameweek GID, get all player data from https://fantasy.premierleague.com/api/event/{GID}/
- For each player PID, get gameweek history from https://fantasy.premierleague.com/api/element-summary/{PID}/
Since we already have all players in one dataframe, let’s go with option 2 and get data on a per-player basis.
The element-summary endpoint contains three fields at the top level:
- fixtures contains upcoming fixture information
- history contains previous gameweek player scores
- history_past provides a summary of the player’s totals from previous seasons
We can define a function called get_gameweek_history(), which takes a single argument, player_id, and returns a dataframe of their scores for all previous gameweeks:
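One possible sketch of get_gameweek_history() is below; the original implementation may differ. It assumes the element-summary response has the three fields listed above, and the timeout value and the column names in the usage comment (round, total_points, goals_scored, assists) are my assumptions.

```python
import pandas as pd
import requests


def get_gameweek_history(player_id: int, timeout: float = 10.0) -> pd.DataFrame:
    """Return one row per gameweek the player has already played this season."""
    url = f"https://fantasy.premierleague.com/api/element-summary/{player_id}/"
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # raise an HTTPError for 4xx/5xx responses
    summary = response.json()    # dict with fixtures, history and history_past
    return pd.json_normalize(summary["history"])


# Usage (needs network access; player IDs can change from season to season):
# history = get_gameweek_history(4)
# history[["round", "total_points", "goals_scored", "assists"]]
```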
Above is an example of calling the function to get the history for the player with ID=4, Pierre-Emerick Aubameyang. We can see he started the season with a goal and an assist in his first two games, before a run of three games without any goal contributions.
Player season history
We can write a similar function to get the previous season’s numbers for a player:
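A sketch of that function, here given the hypothetical name get_season_history(), under the same assumptions as the gameweek version but reading the history_past field instead. The season_name and total_points columns in the usage comment are assumed field names.

```python
import pandas as pd
import requests


def get_season_history(player_id: int, timeout: float = 10.0) -> pd.DataFrame:
    """Return one row per previous season for a player."""
    url = f"https://fantasy.premierleague.com/api/element-summary/{player_id}/"
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()
    # "history_past" holds whole-season totals rather than gameweek-by-gameweek scores
    return pd.json_normalize(response.json()["history_past"])


# Usage (needs network access):
# seasons = get_season_history(player_id)
# seasons[["season_name", "total_points"]]
```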
We can see above that Mesut Ozil’s best season was 2015/16, when he scored 200 points. Sadly, for Arsenal fans, he was not able to replicate this form in later seasons.
Bringing it all together
We will finish off by creating a points table, which contains all gameweek points for all players in the game for this season.
First off, create a single dataframe with player names, teams and positions:
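One way to build that dataframe in a single step is sketched below, assuming data holds the parsed bootstrap-static response. build_players_table() is a hypothetical helper, and web_name, element_type and singular_name_short are assumed FPL field names; the id columns of the lookup tables are renamed so each merge can use a single on= key.

```python
import pandas as pd


def build_players_table(data: dict) -> pd.DataFrame:
    """One row per player with name, team name and short position label."""
    players = pd.json_normalize(data["elements"])
    teams = pd.json_normalize(data["teams"]).rename(columns={"id": "team", "name": "team_name"})
    positions = pd.json_normalize(data["element_types"]).rename(
        columns={"id": "element_type", "singular_name_short": "position"}
    )
    table = (
        players[["id", "web_name", "team", "element_type"]]
        .merge(teams[["team", "team_name"]], on="team", how="left")
        .merge(positions[["element_type", "position"]], on="element_type", how="left")
    )
    return table[["id", "web_name", "team_name", "position"]]
```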
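Next, use the progress_apply() method to apply the get_gameweek_history() function to every player in our players dataframe. Note that progress_apply() does not come with pandas itself: it is added to pandas objects by the tqdm library once tqdm.pandas() has been called, and it behaves like apply() while displaying a progress bar.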
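A sketch of this step follows. build_points_table() is a hypothetical helper that takes the history function as an argument, so it can be tried out without network access. Because progress_apply() is only available once tqdm is installed and tqdm.pandas() has run, the sketch falls back to plain apply() when tqdm is missing; that fallback is my addition. Each call returns a dataframe, so the results are stacked with pd.concat().

```python
import pandas as pd

try:
    from tqdm import tqdm
    tqdm.pandas()        # registers .progress_apply() on pandas objects
    HAS_TQDM = True
except ImportError:      # without tqdm, fall back to plain .apply()
    HAS_TQDM = False


def build_points_table(players: pd.DataFrame, get_history) -> pd.DataFrame:
    """Fetch every player's gameweek history and stack them into one table."""
    ids = players["id"]
    apply_fn = ids.progress_apply if HAS_TQDM else ids.apply
    histories = apply_fn(get_history)  # Series holding one DataFrame per player
    return pd.concat(histories.tolist(), ignore_index=True)


# Usage, assuming the get_gameweek_history() function described above:
# points = build_points_table(players, get_gameweek_history)
```

Since this sends one request per player, expect it to take a while for the full player list.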
Let’s use our new dataframe to find the top 5 points scorers for this season.
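A minimal sketch, assuming points is the points table from the previous step, that each gameweek row stores the player ID in an element column and the score in total_points, and that players has id and web_name columns; top_scorers() is a hypothetical name.

```python
import pandas as pd


def top_scorers(points: pd.DataFrame, players: pd.DataFrame, n: int = 5) -> pd.DataFrame:
    """Sum each player's gameweek points and return the n highest totals."""
    # "element" is assumed to be the player ID column in each gameweek history row
    totals = points.groupby("element", as_index=False)["total_points"].sum()
    named = totals.merge(players[["id", "web_name"]], left_on="element", right_on="id")
    best = named.nlargest(n, "total_points")
    return best[["web_name", "total_points"]].reset_index(drop=True)


# Usage, assuming `points` and `players` from the steps above:
# top_scorers(points, players)
```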
What next?
Hopefully, this should provide you with a decent starting point for your own analyses. I will be publishing a follow-up article to this one and will add a link to it here when I am done.