Fantasy Premier League (FPL) Analytics

Exploring FPL statistics, visualizing trends, and predicting points

Shaylen Hira
Dec 9, 2022

What is Fantasy Premier League?

Fantasy Premier League (FPL) is a free online game that casts you in the role of a fantasy manager of Premier League players.

The objective of the game is to select a squad of 15 players from the Premier League who score points for your team based on performances for their clubs in Premier League matches.

Prices are given to players based on the number of FPL points they are projected to deliver, with players ranging in cost from £4.0M to £13.0M, and you are limited to a budget of £100M for your 15-man squad.

FPL has been in existence for over 10 years and there are currently over 10M active FPL managers throughout the world. Visit the official FPL website for more information on the game.

Data-driven strategy

Competitive FPL managers rely on both their experience and in-depth knowledge of the game to select players, choose captains, make transfers, and play their chips throughout a Premier League season. While this approach may work for seasoned FPL experts, it is prone to personal bias and favoritism. On the other hand, a data-driven strategy has the potential to help any FPL manager rise up the overall ranks.

Several data-oriented analyses have been conducted by FPL enthusiasts to experiment with the game’s statistics. The outcomes of these analyses highlight the value in using FPL data in innovative ways to optimize FPL points returns. For example, in an article by Dilyan Kovachev¹, an algorithm is used to generate an optimal FPL squad based on a player’s value, which is their FPL points returned per FPL currency spent. This approach was successful, with the data-driven method beating a personal and biased squad selection.

A current challenge for FPL managers is that the official FPL website only offers a basic view of the game’s statistics in a tabular format, making any form of data analysis tedious and inefficient. Given this limitation, there is an opportunity to make FPL data accessible in a form that is more useful for FPL managers so that they can easily visualize trends in player and team performances throughout the season.

In addition, given the variety and depth of FPL data available, there is scope to perform several types of machine learning tasks to enhance the analytics capabilities for FPL. One key opportunity is to enable the ability to predict FPL points for an upcoming game.

Are there existing solutions?

Several sophisticated FPL analytics services are publicly accessible online, with most of them requiring a paid subscription. The popular ones are listed below and offer an incredible range of statistics, interactive features, and blog posts from FPL experts.

  1. Fantasy Football Fix
  2. Fantasy Football Hub
  3. Fantasy Football Scout

Introducing FPL Assist

FPL Assist is a solution that empowers FPL managers with access to enhanced analytics and data visualizations for FPL, serving as an online tool with a series of interactive dashboards. Machine learning forms an integral part of FPL Assist’s capabilities, providing expected points for every player’s completed fixtures and next fixture.

A screenshot from FPL Assist is shown in Figure 1, displaying a summary of Harry Kane’s season after Gameweek 16, showing that he is expected to score 5.7 points against Brentford in his next fixture.

Figure 1: FPL Assist — Harry Kane’s Season Summary

How does FPL Assist work?

FPL Assist is hosted in Google’s Looker Studio as a series of live, interactive dashboards. Figure 2 visualizes the cycle of processes, which is automated using Python scripts and executed after every gameweek to reproduce or update FPL Assist.

Figure 2: FPL Assist Process

1. Data Extraction and Transformation

All FPL data is accessible via the Fantasy Premier League API. FPL Assist has a documented Postman API collection that is publicly accessible. Several schemas are available via the API containing both aggregated and granular data at a player, team, FPL manager, and fixture level.

One limitation of the FPL API is that fixture-level data from previous seasons is not available. However, historical season data is accessible from a public GitHub repository created by Anand Vaastav². The first version of FPL Assist only considers data for the current Premier League season (2022/2023).

The essential statistics for FPL such as points, minutes played, goals, assists, clean sheets, saves, yellow and red cards, BPS (Bonus Points System) points, ICT index (influence, creativity and threat), transfers, player ownership, player cost, and more are all available.

Using a Python script for data extraction and transformation, a consolidated FPL Assist dataset is generated with every row representing a unique player’s FPL statistics for a fixture in the Premier League season (both completed games and future fixtures). This dataset serves as the foundation for all visualizations and machine learning models used in FPL Assist.
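As a rough illustration of the extraction step, the sketch below flattens a season summary payload into one row per player. It assumes the community-documented bootstrap-static endpoint and its elements, teams, web_name, now_cost, total_points, and minutes fields. That endpoint is not officially documented and may change, and the helper names are hypothetical rather than taken from FPL Assist’s own scripts.

```python
import json
from urllib.request import urlopen

# Assumption: unofficial, community-documented endpoint; it may change without notice.
BOOTSTRAP_URL = "https://fantasy.premierleague.com/api/bootstrap-static/"


def fetch_bootstrap(url=BOOTSTRAP_URL):
    """Download the season summary payload (requires network access)."""
    with urlopen(url, timeout=30) as response:
        return json.load(response)


def player_rows(payload):
    """Flatten the 'elements' list into one dict per player."""
    teams = {team["id"]: team["name"] for team in payload["teams"]}
    rows = []
    for element in payload["elements"]:
        rows.append({
            "player": element["web_name"],
            "team": teams[element["team"]],
            # Assumption: now_cost is expressed in tenths of £1M (59 -> £5.9M).
            "cost_millions": element["now_cost"] / 10,
            "total_points": element["total_points"],
            "minutes": element["minutes"],
        })
    return rows


# Hand-written placeholder payload in the assumed shape, so the sketch runs offline.
sample_payload = {
    "teams": [{"id": 1, "name": "Team X"}],
    "elements": [{"web_name": "Player A", "team": 1, "now_cost": 59,
                  "total_points": 40, "minutes": 900}],
}
rows = player_rows(sample_payload)
```

Calling player_rows(fetch_bootstrap()) would run the same transformation against live data, provided the endpoint still returns this shape.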

In FPL, form is associated with a player’s average points scored within a calendar month (total points divided by the number of games played by the player’s team in that period). However, FPL Assist calculates and defines its own form metrics associated with all FPL statistics, which are represented as a rolling average of a statistic from the previous 6 games played. This is necessary to produce a holistic view of a player or team’s form throughout the season. A period of 6 games is specifically selected to ensure that players will have enough minutes captured and will have faced opponents, both home and away, with varying difficulties.
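A rolling form metric of this kind could be computed along the following lines. This is a minimal sketch, not FPL Assist’s code: the row layout and function name are hypothetical, and it assumes that a game counts towards its own window and that early-season rows simply average over the games available so far.

```python
from collections import defaultdict, deque

FORM_WINDOW = 6  # window length stated in the article


def add_form_metric(rows, stat, window=FORM_WINDOW):
    """Attach '<stat>_form': the mean of `stat` over each player's most recent games.

    Rows must already be sorted by gameweek. Assumption: the current game is
    part of its own window, and fewer than `window` games are averaged as-is.
    """
    recent = defaultdict(lambda: deque(maxlen=window))
    enriched = []
    for row in rows:
        history = recent[row["player"]]
        history.append(row[stat])  # deque drops the oldest game automatically
        enriched.append({**row, f"{stat}_form": sum(history) / len(history)})
    return enriched


# Made-up points for a placeholder player across seven gameweeks.
games = [{"player": "Player A", "gameweek": gw, "points": pts}
         for gw, pts in enumerate([2, 6, 1, 9, 2, 4, 10], start=1)]
with_form = add_form_metric(games, "points")
```

The same function can be applied to any numeric statistic (minutes, BPS, threat, and so on), which mirrors the article’s point that form is defined for all FPL statistics rather than points alone.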

Model and Feature Selection

A linear regression model is selected as a baseline for predicting expected FPL points for an upcoming fixture. Given that players in each position score points differently, the FPL Assist dataset is split by player position so that a unique regression model can be created and trained for each player’s position.

The following rules are applied when filtering the FPL Assist dataset for model training and evaluation:

  1. Select completed fixtures
  2. Select players with at least 20 minutes played
  3. Select players with FPL points ranging from 0 to 20

These rules are applied to ensure that outliers are excluded and that players have sufficient minutes played to capture a reliable amount of FPL statistics.
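The three filtering rules and the per-position split can be sketched as follows. The field names (finished, minutes, points, position) and the sample rows are hypothetical; only the thresholds come from the rules above.

```python
from collections import defaultdict

MIN_MINUTES = 20                 # rule 2
MIN_POINTS, MAX_POINTS = 0, 20   # rule 3


def training_rows_by_position(rows):
    """Apply the three filtering rules, then group surviving rows by position."""
    by_position = defaultdict(list)
    for row in rows:
        if not row["finished"]:                              # rule 1: completed fixtures
            continue
        if row["minutes"] < MIN_MINUTES:                     # rule 2: enough minutes
            continue
        if not MIN_POINTS <= row["points"] <= MAX_POINTS:    # rule 3: drop outliers
            continue
        by_position[row["position"]].append(row)
    return dict(by_position)


# Placeholder rows, one for each way a row can be kept or dropped.
sample_rows = [
    {"player": "A", "position": "FWD", "finished": True, "minutes": 90, "points": 8},
    {"player": "B", "position": "FWD", "finished": True, "minutes": 12, "points": 1},
    {"player": "C", "position": "MID", "finished": True, "minutes": 90, "points": 23},
    {"player": "D", "position": "DEF", "finished": False, "minutes": 0, "points": 0},
    {"player": "E", "position": "DEF", "finished": True, "minutes": 45, "points": -1},
    {"player": "F", "position": "DEF", "finished": True, "minutes": 90, "points": 6},
]
groups = training_rows_by_position(sample_rows)
```

Each list in the returned dictionary would then feed the regression model for that position.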

Before an optimal regression model can be selected, a linear regression assumptions test, created by J. Macaluso³, is conducted iteratively to test several features in the baseline model. The following features are identified as the most important:

  1. Home Fixture: Is the game played at home or away?
  2. Team Fixture Difficulty: Index ranging from 1 to 5
  3. Team Goals Scored: The number of goals scored by a player’s team
  4. Team Goals Conceded: The number of goals conceded by a player’s team
  5. Player Minutes: The number of minutes a player plays in a game
  6. Player BPS Total: The BPS score for a player in a game
  7. Player Creativity: The Creativity Index score for a player in a game
  8. Player Threat: The Threat Index score for a player in a game

An example of the test results for the Forwards model, with the latest data after Gameweek 16, is represented in Figure 3.

Figure 3: Linear Regression Assumptions Test Results for the Forwards Model after Gameweek 16

The results of the test suggest that the following linear regression assumptions are satisfied for the selected features:

  1. There is a linear relationship between the target variable (FPL points) and the selected features.
  2. There are no cases of multicollinearity among features.
  3. There is little to no autocorrelation in the dataset.
  4. Residuals have relatively constant variance.

However, one assumption is not completely satisfied: the residuals are not normally distributed, meaning that confidence intervals are likely to be affected. This risk is accepted, but in a future release of FPL Assist it could potentially be mitigated by applying a nonlinear transformation to the features.
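For the autocorrelation assumption, one widely used diagnostic is the Durbin-Watson statistic. The article does not say which statistic the referenced test computes, so the sketch below is an illustration of that general technique rather than FPL Assist’s own check. Values near 2 suggest little autocorrelation, while values towards 0 or 4 suggest positive or negative autocorrelation respectively.

```python
def durbin_watson(residuals):
    """Sum of squared successive differences divided by the sum of squared residuals.

    Residuals must be in the order the observations were recorded.
    """
    numerator = sum((curr - prev) ** 2
                    for prev, curr in zip(residuals, residuals[1:]))
    denominator = sum(r ** 2 for r in residuals)
    return numerator / denominator


# Made-up residuals chosen to show the two extremes.
alternating = durbin_watson([1.0, -1.0, 1.0, -1.0])  # sign flips every step
constant = durbin_watson([1.0, 1.0, 1.0, 1.0])       # identical neighbours
```

Here the alternating series gives 12 / 4 = 3.0 and the constant series gives 0.0.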

A 5-fold cross-validation is then performed on several types of regression models to determine how each model performs throughout the season as more data becomes available.
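The mechanics of k-fold cross-validation can be sketched in plain Python as below. Several things here are assumptions: the score is taken to be R² (the article does not name the metric), folds are contiguous and unshuffled, and a single-feature least-squares line stands in for the real regressors purely to keep the example self-contained.

```python
from statistics import mean


def k_fold_indices(n_rows, k=5):
    """Yield (train_idx, test_idx) pairs for k contiguous, unshuffled folds."""
    fold_sizes = [n_rows // k + (1 if i < n_rows % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        test_idx = list(range(start, stop))
        train_idx = [i for i in range(n_rows) if i < start or i >= stop]
        yield train_idx, test_idx
        start = stop


def r_squared(actual, predicted):
    """Coefficient of determination; assumes `actual` is not constant."""
    avg = mean(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - avg) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


def fit_line(xs, ys):
    """Toy stand-in model: least-squares line through one feature."""
    mx, my = mean(xs), mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return lambda x: slope * x + intercept


def cross_val_scores(xs, ys, fit, k=5):
    """Train on k-1 folds, score on the held-out fold, repeat for every fold."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(xs), k):
        predict = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        predictions = [predict(xs[i]) for i in test_idx]
        scores.append(r_squared([ys[i] for i in test_idx], predictions))
    return scores


# Synthetic, perfectly linear data: every fold should score (close to) 1.
xs = [float(x) for x in range(10)]
ys = [2 * x + 1 for x in xs]
scores = cross_val_scores(xs, ys, fit_line)
```

Repeating this after each gameweek, with the dataset available at that point, would give a score-over-time series of the kind the article plots.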

The cross-validation scores are visualized in Figure 4, where the gradient boosting regressor is identified as the best-performing model. The visualization shows that model performance is unstable over the first few gameweeks, when little data is available, but improves over time until it eventually stabilizes, with relatively high mean cross-validation scores ranging between 0.88 and 0.95.

Figure 4: Model Cross-validation Scores

2. Model Training and Predictions

A Python script is executed to extract sets of feature matrices and target values for each player position from the FPL Assist dataset. When training the regression models, only historical data from prior completed fixtures is used. This guards against data leakage from fixtures that have not yet been played.

When predicting the expected points for a player in an upcoming gameweek, the most recent form metrics associated with the selected features are used. This supplies the model with an approximation of what a player’s performance would be in the next gameweek, based on the player’s historical performance over the previous 6 games played.
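The two ideas above, training only on past completed fixtures and substituting form values for statistics that are not yet known, might look like this in outline. All column names and sample values are hypothetical, and the model itself is left out.

```python
# Hypothetical column names for the in-game statistics among the eight features.
FORM_FEATURES = ["team_goals_scored", "team_goals_conceded", "minutes",
                 "bps", "creativity", "threat"]


def rows_for_training(rows, target_gameweek):
    """Keep only completed fixtures from gameweeks before the one being predicted."""
    return [r for r in rows if r["finished"] and r["gameweek"] < target_gameweek]


def prediction_features(upcoming_fixture, latest_form):
    """Build a feature row for an unplayed fixture.

    Venue and difficulty are known in advance. In-game statistics are not,
    so each is replaced by the player's most recent form value.
    """
    return {
        "home_fixture": upcoming_fixture["home_fixture"],
        "fixture_difficulty": upcoming_fixture["fixture_difficulty"],
        **{stat: latest_form[f"{stat}_form"] for stat in FORM_FEATURES},
    }


# Made-up values, purely to show the shapes involved.
history = [
    {"gameweek": 15, "finished": True},
    {"gameweek": 16, "finished": True},
    {"gameweek": 17, "finished": False},
]
latest_form = {f"{stat}_form": 1.0 for stat in FORM_FEATURES}
fixture = {"home_fixture": 1, "fixture_difficulty": 3}

train = rows_for_training(history, target_gameweek=17)
features = prediction_features(fixture, latest_form)
```

The resulting feature row has the same eight columns the model was trained on, which is what allows a model fitted on in-game statistics to score a fixture that has not been played.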

3. Data Upload and Visualization

The FPL Assist dataset is updated with the latest expected points predictions before being split into several tables that are then uploaded (or updated) in Google BigQuery. The Google Cloud platform is specifically selected as the data storage and processing platform for this project as it offers opportunities to seamlessly connect with Looker Studio, which is where the front-end of the first version of FPL Assist is hosted. The Google Cloud platform also enables the opportunity to continue growing the FPL Assist service over time and to potentially host the front-end on a website with more enhanced functionality in a future version.

FPL Assist currently offers a series of interactive dashboards as seen in the navigation menu in Figure 5.

Figure 5: FPL Assist Navigation Menu

A demonstration of an analysis on each dashboard is used to highlight the value that FPL Assist provides in helping FPL managers adopt a data-driven strategy.

All Players

The All Players dashboard is typically a starting point in the FPL Assist data analysis journey. Figure 6 shows a screenshot of the unfiltered view, showcasing some of the inviting visuals and interactive elements.

Figure 6: All Players Dashboard

The ability to select various filters on the dashboard enables FPL managers to zero in on various player segments to identify key insights. The use of simple horizontal bar charts is particularly helpful for efficiently identifying the ordinal ranking of players for various FPL statistics. The layout of the bar charts offers the ability to compare how players rank against others for every FPL statistic, including the form metrics that were engineered in the FPL Assist dataset.

Figure 7 showcases a more sophisticated visualization, yet an effective one for identifying player segments in FPL.

Figure 7: Price vs. Ownership

The Price vs. Ownership chart is known as a bubble chart, with every bubble representing a player on a two-dimensional plane. The size of each bubble represents a player’s FPL points scoring form. In FPL, there are known player segments referred to as Premium, Differential, and Budget players. Each of these player segments can be visualized on the bubble chart within the areas marked adjacent to the threshold value lines. For example, Kieran Trippier is highly owned with 63.98% ownership, but has a relatively low cost of £5.9M. This means that he is a highly owned budget player. Given that Trippier’s bubble size is relatively large at this point of the season, the visualization clearly shows that Trippier is a high-value FPL asset.

The All Players dashboard also facilitates efficient navigation to a player’s detailed dashboard, with hyperlinks to a pre-filtered view on a Player Detail dashboard (Goalkeepers, Defenders, Midfielders, Forwards).

Player Detail

The Player Detail dashboard is designed for in-depth player analysis. Figure 8 displays what the player summary communicates on the Midfielders dashboard and how the interactive filter can be applied to search for any FPL player.

Figure 8: Gabriel Martinelli on the Midfielders Dashboard

Figure 9 represents one of the most useful visualizations from the Player Detail dashboard. This visualization highlights an example of how expected points can be compared against actual points scored throughout the season for any player. Arsenal’s Gabriel Martinelli has a total of 77 actual points after Gameweek 16, with a total expected points tally of 70.5 points.

Figure 9: Actual vs. Expected Points

The difference between actual and expected points (EP) is referred to as ∆EP or “Delta” EP. When ∆EP is greater than zero, the player is over-performing, and when ∆EP is less than zero, the player is under-performing. Interpreting the metric in this way offers additional insight for FPL managers, as it provides a data-driven way to identify players that are predictable or unpredictable. In this case, Martinelli appears to be performing consistently well this season, as his expected points trend line is not far from the actual points trend line. With a low ∆EP of +6.5 (77 actual points minus 70.5 expected points), Martinelli is fairly predictable and is performing better than expected.
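The ∆EP calculation is simple enough to express directly. The function names below are hypothetical; the inputs are the season totals quoted above for Martinelli after Gameweek 16.

```python
def delta_ep(actual_points, expected_points):
    """Return actual minus expected points."""
    return actual_points - expected_points


def describe(delta):
    """Label a player by the sign of their delta EP."""
    if delta > 0:
        return "over-performing"
    if delta < 0:
        return "under-performing"
    return "as expected"


martinelli_delta = delta_ep(77, 70.5)
martinelli_label = describe(martinelli_delta)
```

This gives a ∆EP of 6.5 and the label "over-performing", matching the interpretation in the text.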

The Player Detail dashboard also displays other useful time-series trends showing key FPL statistics throughout the season. As shown in Figure 10, these visualizations offer an efficient means for identifying shifts in a player’s form, transfer activity, and minutes played. These insights are extremely valuable for any FPL manager that needs to understand a player’s historical performance (or recent form) in one view.

Figure 10: Additional Time-series Visualizations

Figure 11 shows how the Player Detail dashboard can be used to see historical gameweek statistics as well as upcoming fixtures and their difficulty ratings.

Figure 11: Tabular visualizations on the Player Detail Dashboards

Set Pieces

The Set Pieces dashboards offer a simple view of who the primary and secondary set piece takers are for each Premier League team. Figure 12 shows an example of the latest penalty takers.

Figure 12: Penalty Takers

Teams

The Teams dashboards offer a supplementary route for analysis. It is obvious that there is a direct relation between a team’s FPL potential and the individual players within the team. However, it is often a difficult task to decide on whether to double or triple up on player selections from a specific team. The All Teams dashboard is similar to the All Players dashboard in structure and offers an immediate view of how teams compare against each other for all FPL statistics. Figure 13 is particularly useful to highlight teams that are in and out of form, by identifying where they are positioned on the bubble chart grid. In this example, Newcastle is seen as the team with the best clean sheet potential as well as a team that is in excellent form.

Figure 13: Team Goals vs. Clean Sheets

The Team Overview dashboard offers a more visual approach to identifying key players for each team as well as an efficient means for understanding a team’s form trend throughout the season. Figure 14 shows an example of Liverpool’s Team Overview dashboard.

My Squad

The My Squad dashboard tab is unavailable in the first version of FPL Assist. However, there is scope to develop features that retrieve an actual FPL manager’s squad so that more advanced machine learning tasks and analytics can be introduced, offering additions such as transfer recommendations, captain choices, and other useful insights.

The Future of FPL Assist

The first version of FPL Assist is a success. The tool demonstrates the potential that FPL analytics has in being able to transform FPL data so that it becomes more meaningful and useful for FPL managers.

The ability to predict points, visualize trends, and engage with an interactive tool can certainly help drive a more data-driven mindset for FPL managers at any level.

Additionally, as more FPL managers gain the ability to improve their rank, the game will become more competitive and more fun, which is the ultimate “goal and assist”.

The next version of FPL Assist will offer even more value and will be made publicly accessible on a website.

Get the latest FPL Insights

Follow @fpl_assist on Instagram for regular FPL insights and more news and updates on the official launch of the FPL Assist website.

FPL Assist aims to continue growing its community of competitive FPL managers and to create the best digital content that FPL fans love.

References

[1]: Dilyan Kovachev. (January 29 2020). How our AI got Top 10 in the Fantasy Premier League using Data Science https://towardsdatascience.com/how-our-ai-got-top-10-in-the-fantasy-premier-league-using-data-science-ba88b185b354

[2]: Anand Vaastav. (August 2022). FPL Historical Dataset https://github.com/vaastav/Fantasy-Premier-League

[3]: Jeff Macaluso. (May 27 2018). Testing Linear Regression Assumptions in Python https://jeffmacaluso.github.io/post/LinearRegressionAssumptions/
