Accessing Fantasy Premier League data using Python

How to access data from the FPL API

James Leslie
Analytics Vidhya
May 5, 2021


About the author

I first started playing Fantasy Premier League (FPL) in 2007. I was in high school at the time, and I didn’t really follow the Premier League. However, most of my friends did, and I was tired of not being able to join them in slacking off through their team selection discussions at the back of the classroom. I remember selecting my very first team, which had a midfield consisting of Fabregas, Gerrard, Lampard and Cristiano Ronaldo and then a bunch of defenders from bottom-of-the-table clubs like Portsmouth and Wigan, as they were the best I could afford with my remaining budget.

As time went on, FPL heightened my interest in the Premier League, while the knowledge I accrued from watching live games benefitted my FPL team. Over the next few seasons, I found myself consistently near the top of my mini leagues. I finished within touching distance of the top 10k back in 2015/16, but on the whole my global overall rank has been no better than okay.

During my time at university, I learnt to code using Python and now work as a data scientist. I have recently been wondering if I can use some of my skillset to make my FPL decisions in a more data-driven way. I stumbled upon a Reddit thread which mentioned an API (Application Programming Interface) for FPL and this article will provide a beginner’s how-to guide for accessing this API in Python.

Fantasy Premier League API

The good news is that the API is completely free to use (provided you don’t use it for anything that earns you money). This API consists of a number of endpoints which, together, can provide users with a full picture of the FPL game. An endpoint is basically just a URL that responds with some data after you send an HTTP request to it.

A summary of these endpoints is given below:

Bootstrap-url

Let’s start off with the first endpoint in our list. Try pasting its URL (https://fantasy.premierleague.com/api/bootstrap-static/) into your browser address bar. You should get back a rather messy response that looks something like this:

JSON data format

The response you get back from the API is in a format known as JSON. It is very widely used, especially in the world of APIs, so knowing how to work with data in this format is a really useful skill.

Luckily, we have a few Python libraries we can use to sift through these data more easily, namely requests, json and pandas.

Using the requests library

To create a new request, we use the requests.get() function and then call the .json() method on the response to parse it into Python dictionaries and lists.
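A minimal sketch of that request is given here. The helper name fetch_json, the BASE_URL constant and the optional session parameter are illustrative assumptions, not part of the FPL API; only requests.get(), raise_for_status() and .json() come from the requests library.

```python
import requests

# Base URL taken from the bootstrap-static address used in this article
BASE_URL = "https://fantasy.premierleague.com/api/"


def fetch_json(endpoint, session=None, timeout=10):
    """Send a GET request to an FPL endpoint and return the parsed JSON.

    `session` is a hypothetical hook: pass a requests.Session (or any object
    with a compatible .get method). By default the requests module is used.
    """
    http = session if session is not None else requests
    response = http.get(BASE_URL + endpoint, timeout=timeout)
    response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error page
    return response.json()       # decode the JSON body into dicts and lists


# Example call (needs network access), left commented out on purpose:
# r = fetch_json("bootstrap-static/")
```

Wrapping the call in a small function keeps the timeout and error check in one place, which helps once the same pattern is reused for other endpoints.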

If you try to print the full response, your notebook editor will become dramatically slower as it tries to render the whole cell output. At this stage, we are only interested in understanding the schema and don’t need to see every single piece of data printed out onto our screen. The pprint module offers a neat solution to this problem, and we will use it to show only the top-level keys in the API’s response.
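As a rough illustration, the depth argument of pprint collapses everything below the first level. The sample dictionary here is a made-up, heavily trimmed stand-in for the real response, so the snippet runs without network access; the real payload has more keys and far more data.

```python
from pprint import pformat

# Made-up, trimmed stand-in for the parsed bootstrap-static response
sample = {
    "events": [{"id": 1, "name": "Gameweek 1"}],
    "teams": [{"id": 1, "name": "Team A"}],
    "elements": [{"id": 1, "web_name": "Player A"}],
    "element_types": [{"id": 1, "singular_name": "Goalkeeper"}],
}

# depth=1 keeps the top-level keys and replaces nested containers with "..."
summary = pformat(sample, depth=1)
print(summary)

# The keys alone are often enough to understand the schema
print(list(sample.keys()))
```

With the real response, the same call would be pprint(r, depth=1), where r is the parsed JSON.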

Response is in JSON format, very similar to a nested Python dictionary

Now we can see that this endpoint contains a few nested fields. Let’s take a closer look at some of these fields.

Player data

The elements field contains data for each Premier League player in the current season of FPL. We can access these data just like we would access the value associated with a key in a dictionary. That value is a list of further dictionaries, one per player.

Let’s get the elements data and then show the information about the first player in the list:
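A short sketch of that lookup follows. The dictionary r is a made-up stand-in for the parsed response, with only a handful of the fields a real player record carries, and the values are purely illustrative.

```python
from pprint import pprint

# Made-up stand-in for the parsed bootstrap-static response
r = {
    "elements": [
        {"id": 1, "web_name": "Player A", "team": 1, "element_type": 1, "total_points": 50},
        {"id": 2, "web_name": "Player B", "team": 2, "element_type": 3, "total_points": 120},
    ]
}

players = r["elements"]    # a list of dictionaries, one per player
first_player = players[0]  # the first player in the list
pprint(first_player)
```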

Each nested dictionary contains information about a particular player

Easier inspection with Pandas

At this stage, it would be more useful to us if these data were in a tabular format. Pandas is a library that was made for exactly this purpose. Loading JSON data into a dataframe can be done with the json_normalize() function:
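The conversion might look like the sketch below, which assumes pandas 1.0 or later (where json_normalize is available as pd.json_normalize). The player records are made-up sample values so the snippet runs offline.

```python
import pandas as pd

# Made-up player records shaped like entries of the "elements" list
players = [
    {"id": 1, "web_name": "Player A", "team": 1, "element_type": 1, "total_points": 50},
    {"id": 2, "web_name": "Player B", "team": 2, "element_type": 3, "total_points": 120},
]

# Each dictionary becomes a row and each key becomes a column
players_df = pd.json_normalize(players)
print(players_df.head())
```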

Pandas dataframes are useful for showing multiple players in a single view

Now we have a table containing information about every player, but we don’t know which teams they play for or what positions they play, as we only have numeric ID values for those columns.

Supporting data

We can get the names and strength ratings for teams by extracting the teams field from the base response into a dataframe:
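A sketch of that extraction is shown here. The two team records are made up (names and ratings are illustrative only), and each real record carries more fields than the ones listed.

```python
import pandas as pd

# Made-up stand-in for the parsed response; ratings are illustrative values
r = {
    "teams": [
        {"id": 1, "name": "Team A", "short_name": "TMA",
         "strength_attack_home": 1200, "strength_attack_away": 1250,
         "strength_defence_home": 1150, "strength_defence_away": 1180},
        {"id": 2, "name": "Team B", "short_name": "TMB",
         "strength_attack_home": 1050, "strength_attack_away": 1100,
         "strength_defence_home": 1020, "strength_defence_away": 1080},
    ]
}

teams = pd.json_normalize(r["teams"])
print(teams[["id", "name", "strength_attack_home", "strength_defence_away"]])
```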

Each team has strength ratings for attack and defence, home and away — could be useful for analysing friendly upcoming fixtures?

Similarly, for player positions, we will use the element_types field:
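For positions, a possible version keeps only the columns needed to label players. The records below are a trimmed, hand-written stand-in for the element_types list, not a copy of the live response.

```python
import pandas as pd

# Hand-written stand-in for the "element_types" list (trimmed to a few fields)
r = {
    "element_types": [
        {"id": 1, "singular_name": "Goalkeeper", "singular_name_short": "GKP"},
        {"id": 2, "singular_name": "Defender", "singular_name_short": "DEF"},
        {"id": 3, "singular_name": "Midfielder", "singular_name_short": "MID"},
        {"id": 4, "singular_name": "Forward", "singular_name_short": "FWD"},
    ]
}

positions = pd.json_normalize(r["element_types"])

# Keep the ID (for joining) and the human-readable names
positions = positions[["id", "singular_name", "singular_name_short"]]
print(positions)
```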

Let’s combine these three tables to get a single view of a player. We will use the merge() pandas function to join tables on their related columns. Players can be joined to teams using the players.team and teams.id columns:
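The join could be sketched as follows, using tiny made-up tables. The suffixes argument is a choice made for this example: both tables have an id column, so pandas needs a way to tell them apart after the merge.

```python
import pandas as pd

# Tiny made-up tables standing in for the full players and teams dataframes
players = pd.DataFrame({
    "id": [1, 2],
    "web_name": ["Player A", "Player B"],
    "team": [1, 2],
    "element_type": [1, 3],
})
teams = pd.DataFrame({"id": [1, 2], "name": ["Team A", "Team B"]})

# Top-level function form: match players.team against teams.id
df = pd.merge(
    left=players,
    right=teams,
    left_on="team",
    right_on="id",
    suffixes=("_player", "_team"),  # disambiguates the two "id" columns
)
print(df[["web_name", "name"]])
```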

Combined view of players and their respective teams

Then we can join the player positions, too:
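One way to add positions, this time with the dataframe method form of merge. The tables are made-up samples, and renaming the position columns before joining is an assumption made here to keep the resulting column names readable.

```python
import pandas as pd

# Made-up result of the earlier players/teams join
df = pd.DataFrame({
    "web_name": ["Player A", "Player B"],
    "name": ["Team A", "Team B"],
    "element_type": [1, 3],
})
positions = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "singular_name": ["Goalkeeper", "Defender", "Midfielder", "Forward"],
})

# Rename up front so the merged table gets descriptive column names
positions = positions.rename(columns={"id": "id_position", "singular_name": "position"})

# Method form: match df.element_type against positions.id_position
df = df.merge(positions, left_on="element_type", right_on="id_position")
print(df[["web_name", "name", "position"]])
```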

Combined view of players, teams and positions

Note the two different uses of the merge function above. It can be called either as a top-level pandas function (e.g. pd.merge(left_df, right_df, on=...)) or as a method on an existing dataframe (e.g. left_df.merge(right_df, on=...)).

Player gameweek history

Now that we have some basic information for players, teams and positions, let’s get the gameweek points from the current season.

We can do this in two ways:

  1. For each gameweek GID, get all player data from https://fantasy.premierleague.com/api/event/{GID}/
  2. For each player PID, get gameweek history from https://fantasy.premierleague.com/api/element-summary/{PID}/

Since we already have all players in one dataframe, let’s go with option 2 and get data on a per-player basis.

The element-summary endpoint contains three fields at the top level:

  1. fixtures contains upcoming fixture information
  2. history contains previous gameweek player scores
  3. history_past provides a summary of previous season totals

We can define a function called get_gameweek_history() which takes a single argument, player_id, and returns a dataframe of their scores for all previous gameweeks:
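A possible implementation is sketched below. The function name and player_id argument come from the text; the BASE_URL constant, the optional session parameter and the timeout are assumptions added for this example.

```python
import pandas as pd
import requests

BASE_URL = "https://fantasy.premierleague.com/api/"


def get_gameweek_history(player_id, session=None):
    """Return a dataframe with one row per gameweek for the given player.

    `session` is a hypothetical hook for passing a requests.Session or a
    test double; by default the requests module is used directly.
    """
    http = session if session is not None else requests
    response = http.get(f"{BASE_URL}element-summary/{player_id}/", timeout=10)
    response.raise_for_status()
    # "history" holds the gameweek-by-gameweek records for the current season
    return pd.json_normalize(response.json()["history"])


# Example call (needs network access), left commented out on purpose:
# get_gameweek_history(4)
```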

Above is an example of calling the function to get the history for the player with ID=4, Pierre-Emerick Aubameyang. We can see he started the season with a goal and an assist in his first two games, before a run of three games without any goal contributions.

Player season history

We can write a similar function to get a player’s numbers from previous seasons:
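A sketch of such a function reads the history_past field instead of history. The name get_season_history, the BASE_URL constant and the session parameter are assumptions made for this example.

```python
import pandas as pd
import requests

BASE_URL = "https://fantasy.premierleague.com/api/"


def get_season_history(player_id, session=None):
    """Return a dataframe with one row per previous season for the given player.

    `session` is a hypothetical hook for passing a requests.Session or a
    test double; by default the requests module is used directly.
    """
    http = session if session is not None else requests
    response = http.get(f"{BASE_URL}element-summary/{player_id}/", timeout=10)
    response.raise_for_status()
    # "history_past" holds one summary record per completed season
    return pd.json_normalize(response.json()["history_past"])
```

A player with no earlier seasons in the game would simply come back as an empty dataframe.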

We can see above that Mesut Ozil’s best season was 2015/16, when he scored 200 points. Sadly, for Arsenal fans, he was not able to replicate this form in later seasons.

Bringing it all together

We will finish off by creating a points table, which contains all gameweek points for all players in the game for this season.

First off, create a single dataframe with player names, teams and positions:

Create a players dataframe with position and team information

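Next, use the progress_apply() method to apply the get_gameweek_history() function to every row in our players dataframe. progress_apply() is not part of pandas itself: the tqdm library adds it to dataframes and series once tqdm.pandas() has been called, and it behaves like apply() with a progress bar.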
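The step could look roughly like this, assuming the tqdm package is installed. To keep the snippet runnable offline, fake_gameweek_history is a hypothetical stand-in for get_gameweek_history() that returns made-up rows, and the id_player column name is an assumption.

```python
import pandas as pd
from tqdm import tqdm

tqdm.pandas()  # registers progress_apply on pandas Series and DataFrames

# Made-up players table; in practice this is the merged players dataframe
players = pd.DataFrame({"id_player": [1, 2], "web_name": ["Player A", "Player B"]})


def fake_gameweek_history(player_id):
    """Hypothetical offline stand-in for get_gameweek_history(); values are made up."""
    return pd.DataFrame({
        "element": [player_id, player_id],
        "round": [1, 2],
        "total_points": [2, 6],
    })


# One dataframe per player, collected while a progress bar is shown
histories = players["id_player"].progress_apply(fake_gameweek_history)

# Stack the per-player frames, then attach the player names
points = pd.concat(histories.tolist(), ignore_index=True)
points = points.merge(players, left_on="element", right_on="id_player")
print(points)
```

With the real function, each row triggers one HTTP request, so the progress bar gives a useful estimate of how long the full download will take.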

Get all players’ gameweek scores using progress_apply()

Let’s use our new dataframe to find the top 5 points scorers for this season.
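A minimal sketch of that query sums points per player and sorts the totals. The points table is made up, and grouping on the element ID alongside the name is a choice made here so that two players sharing a name are not added together.

```python
import pandas as pd

# Made-up gameweek points table: one row per player per gameweek
points = pd.DataFrame({
    "element": [1, 1, 2, 2, 3, 3],
    "web_name": ["Player A", "Player A", "Player B", "Player B", "Player C", "Player C"],
    "round": [1, 2, 1, 2, 1, 2],
    "total_points": [2, 6, 10, 3, 1, 1],
})

top5 = (
    points.groupby(["element", "web_name"], as_index=False)["total_points"]
    .sum()                                         # season total per player
    .sort_values("total_points", ascending=False)  # highest scorers first
    .head(5)                                       # keep at most five rows
)
print(top5)
```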

What next?

Hopefully, this should provide you with a decent starting point for your own analyses. I will be publishing a follow-up article to this one and will add a link to it here when I am done.

Published in Analytics Vidhya
