Fantasy Premier League With Snowflake, Streamlit & Python

Alexander Jaballah
Published in Streamlit · 9 min read · Feb 16, 2023

This project and article have been created by:

Alexander Jaballah — Partner Sales Engineer, Snowflake

Johan Bäcklin — Sales Engineer, Snowflake

Harald Grant — Data Scientist, Schibsted

Go To Application

Fantasy Premier League (FPL) is a game that has become extremely popular. The idea behind FPL is that you get the chance to become a Premier League manager for a season. Last season there were 9.1 million managers playing the game, all hoping to have the winning team at the end of the season. As a manager you must pick a team of 15 players from the current season of the Premier League. Each player in the team gains points for your team based on different achievements in their actual Premier League clubs. As a manager you are restricted by a budget of £100.0m for picking your 15-man squad. To read more about how points are awarded, see Fantasy Premier League.

By leveraging Snowflake, Streamlit and Python, you will be able to become one of the top managers in FPL. But first let us explain Snowflake, Streamlit and the ML models that have been leveraged in this article.

Streamlit

In recent years there has been a big buzz around Data Science, Machine Learning and Data Analytics, and around the programming language that binds all these techniques together: Python. There has also been a massive uptake in the development of applications, especially applications developed using Python.

Streamlit allows users to develop Python applications and deploy machine-learning models in just a few minutes. Streamlit also makes it easy to manage and share the application with your peers or ecosystem in a seamless way.

Streamlit was acquired by Snowflake in 2022 and plays an exciting part in Snowflake’s data platform: where Snowflake helps with everything regarding accessing, managing and sharing data, Streamlit provides native data visualization for the platform.

Python

Python has broken through all the barriers when it comes to programming languages and has now become the most popular programming language, with its number of users still growing. One reason for its popularity is how simple it is to learn and get started with Python. Python also has a great and mature community and provides great flexibility.

Snowflake provides the ability to run Python in Snowpark, taking advantage of Python’s rich ecosystem of open-source packages and libraries accessible to data scientists, data engineers, and application developers. But this is not all: Snowflake has also announced support for Python worksheets (now in private preview). This will enable users to develop pipelines, ML models, and applications directly in Snowsight (Snowflake’s user interface), using Python and Snowpark’s DataFrame API for Python. It will also allow users to streamline development with code auto-complete and to productize custom logic in just seconds.
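As a small taste of the Snowpark DataFrame API, the sketch below queries a table from Python and pushes the work down to Snowflake. It is a minimal sketch: the connection parameters are placeholders, and the FPL_MASTER columns (POSITION, TOTAL_POINTS) are assumptions for illustration.

# Minimal Snowpark for Python sketch; connection parameters and column
# names are placeholders, not this project's exact values.
from snowflake.snowpark import Session
from snowflake.snowpark.functions import col

session = Session.builder.configs({
    "account": "<account_identifier>",
    "user": "<user>",
    "password": "<password>",
    "warehouse": "<warehouse>",
    "database": "<database>",
    "schema": "<schema>",
}).create()

# Filter, sort and limit with DataFrame operations; execution happens in Snowflake
top_forwards = (
    session.table("FPL_MASTER")
    .filter(col("POSITION") == "FWD")
    .sort(col("TOTAL_POINTS").desc())
    .limit(10)
)
top_forwards.show()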

Snowflake

Snowflake is the data platform and the cornerstone of this project; by leveraging Snowflake, the development and the whole process for this project has been very smooth. Snowflake provides an open and agile data platform with huge flexibility. Here, Snowflake is leveraged to process and prepare semi-structured data on a daily basis, as well as to provide a reliable, high-performance data engine for the Streamlit application. In this first state of the application Snowflake is only partly leveraged; there are many more ways to take advantage of the Snowflake platform. To see more about what can be done, please see the “What’s Next” section of this article.

Gathering Data

Fantasy Premier League is a very data-driven game, and there is a great API provided by Fantasy Premier League from which the majority of the data comes. To learn more about the API and its different endpoints, have a look at the blog below. There is also some data fetched from a player database to complete all the data needed.

Fantasy Premier League API Endpoints: A Detailed Guide | by Frenzel Timothy | Medium

Architecture

The architecture is made up of three different building blocks: first AWS, in order to extract the data into JSON files stored in an S3 bucket; then Snowflake, used to load and transform the data to serve the application running in Streamlit Cloud.

AWS Lambda

In this project we chose AWS Lambda to extract data and store it in an S3 bucket. The script creates four different JSON files and runs once every day.
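A minimal sketch of what such a Lambda handler can look like, using only the standard library and boto3; the bucket name and the endpoint list are illustrative, not the project's actual values.

# Sketch of a daily Lambda that fetches FPL API data and lands it as JSON in S3.
import json
import urllib.request
import boto3

BUCKET = "fpl-raw-data"  # placeholder bucket name
ENDPOINTS = {
    # one entry per file; the project produces four JSON files in total
    "players.json": "https://fantasy.premierleague.com/api/bootstrap-static/",
}

s3 = boto3.client("s3")

def lambda_handler(event, context):
    for key, url in ENDPOINTS.items():
        with urllib.request.urlopen(url) as resp:
            body = resp.read()
        json.loads(body)  # sanity-check that the payload really is JSON
        s3.put_object(Bucket=BUCKET, Key=key, Body=body)
    return {"statusCode": 200}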

AWS S3

AWS S3 was chosen to serve as the external stage for this project. The S3 bucket serves as a cloud repository where files are landed and then loaded into Snowflake. It is very easy to set up an S3 bucket, leverage it as a target for AWS Lambda, and then use the bucket as an external stage for Snowflake. To simplify the usage of the S3 bucket as an external stage, a role was created with an associated policy allowing read, write and delete permissions on the bucket.
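Such a policy could look roughly like the snippet below, granting the read, write and delete permissions mentioned above; the bucket name is a placeholder.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
      "Resource": "arn:aws:s3:::fpl-raw-data/*"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": "arn:aws:s3:::fpl-raw-data"
    }
  ]
}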

Python

Optimisation

The idea of creating an optimized team was inspired by other projects and articles out there, mainly (Linearly) Optimising Fantasy Premier League Teams | by Joseph O’Connor | Medium, as well as Five ways to combine Mathematical Optimization and Machine Learning | by Hennie de Harder | Towards Data Science. For the optimisation of the team, a linear programming algorithm is leveraged, using the PuLP library for Python. Linear programming itself is powerful and easy to implement, especially for problems with clear constraints, which the game of fantasy football has.

The basic idea is to pick players to form a team, within the budget and other team constraints, that generates the highest total expected score. Modeling this type of problem is fairly straightforward. Each constraint is added to the model one at a time and can look like the snippet below:

# Constrains the model to pick a total of three forwards
model += sum(decisions[i] + sub_decisions[i] for i in range(num_players) if positions[i] == 4) == 3
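For context, a condensed, self-contained sketch of such a model is shown below. The five-player data set is made up purely for illustration; the real app derives prices, positions and expected values from the Snowflake views and uses the full FPL squad rules.

# Condensed PuLP sketch of the squad optimisation with made-up data.
import pulp

names = ["A", "B", "C", "D", "E"]
prices = [10.5, 8.0, 6.5, 5.0, 4.5]   # price in £m
positions = [4, 4, 3, 2, 1]           # 1=GK, 2=DEF, 3=MID, 4=FWD
expected = [8.2, 6.9, 5.5, 4.1, 3.0]  # "expected"-value per player
num_players = len(names)

model = pulp.LpProblem("FPL", pulp.LpMaximize)
decisions = [pulp.LpVariable(f"x{i}", cat="Binary") for i in range(num_players)]

# Objective: maximise the total expected score of the picked players
model += pulp.lpSum(decisions[i] * expected[i] for i in range(num_players))

# Budget constraint (the real game allows £100.0m for 15 players)
model += pulp.lpSum(decisions[i] * prices[i] for i in range(num_players)) <= 20.0

# Positional constraint, e.g. exactly one forward in this toy example
model += pulp.lpSum(decisions[i] for i in range(num_players) if positions[i] == 4) == 1

model.solve()
print([names[i] for i in range(num_players) if decisions[i].value() > 0.5])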

Modeling “expected”-value

After all constraints have been modeled, the real problem emerges: how to predict an “expected”-value for a player? The expected value is the target the model optimizes towards, so the approach depends on this estimate being fairly accurate. The “expected”-value function is based on a couple of data points matching some principles:

Player points excluding bonuses (Historical player performance).

The idea is to avoid anomalies which get highlighted by bonus-points.

Number of games where the player received at least 5 points (Historical player performance)

To champion players who have a steady output of points and preferably play a lot of games; the cutoff was chosen after looking at the distribution of points gathered.

Expected goals and clean sheets for the player’s team in the X nearest upcoming games (Expected team performance)

Adding this helps ensure that the player will have a good opportunity to continue to play well in the upcoming games. Future fixtures also account for double gameweeks, which are great opportunities to gather a lot of points.

Compiling the custom scoring has been done in a couple of different ways and is still a work in progress. However, as long as all data points are included, the resulting “expected”-value will hopefully be relatively robust. To provide some flexibility in how long-term the player decisions should be, the custom expected-value function has been modeled for either one, three or five weeks forward.
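As a rough illustration, compiling such a score could look like the sketch below; the weights and field names are purely illustrative assumptions, not the app's exact implementation.

# Illustrative sketch of compiling the custom "expected"-value.
def expected_value(player, horizon_weeks=3):
    # Historical performance: points excluding bonus, and consistency
    base = 0.5 * player["points_excl_bonus"]
    consistency = 0.3 * player["games_with_5_plus_points"]
    # Expected team performance over the nearest upcoming fixtures
    fixtures = player["upcoming_fixtures"][:horizon_weeks]
    outlook = 0.2 * sum(f["expected_goals"] + f["expected_clean_sheet"] for f in fixtures)
    return base + consistency + outlook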

As a backup to the custom “expected”-value functions, FPL’s own expected score has been added to the app as a possible option.

Snowflake

Storage integration

The data is stored in an S3 bucket, and the best way to access that data is to set up a storage integration. The storage integration stores the generated identity and access management (IAM) entity for the Amazon S3 bucket. This also allows users to avoid supplying credentials when creating stages or when loading and unloading data.
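A sketch of how such a storage integration and an associated stage can be created; the role ARN, bucket and object names are placeholders.

-- Storage integration holding the IAM entity for the bucket
CREATE STORAGE INTEGRATION FPL_S3_INT
  TYPE = EXTERNAL_STAGE
  STORAGE_PROVIDER = 'S3'
  ENABLED = TRUE
  STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::<account_id>:role/<role_name>'
  STORAGE_ALLOWED_LOCATIONS = ('s3://fpl-raw-data/');

-- External stage on top of the integration; no credentials needed
CREATE STAGE FPL_STAGE
  URL = 's3://fpl-raw-data/'
  STORAGE_INTEGRATION = FPL_S3_INT
  FILE_FORMAT = (TYPE = JSON);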

Serverless tasks

Serverless tasks are used to load data from the S3 bucket into the database. The great thing with serverless tasks compared to regular tasks is that there is no need to assign a dedicated warehouse to the task. Four different serverless tasks are created; they execute every morning at 07:00 CET and load the data into the four different raw tables while also adding the current timestamp of the data load.
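A sketch of what one of these serverless tasks can look like, assuming the stage from the previous section; the table and file names are illustrative.

-- Raw landing table with a single VARIANT column plus a load timestamp
CREATE TABLE IF NOT EXISTS PLAYER_RAW (DATA VARIANT, LOAD_TS TIMESTAMP_NTZ);

-- Omitting the WAREHOUSE parameter makes the task serverless
CREATE TASK LOAD_PLAYER_RAW
  SCHEDULE = 'USING CRON 0 7 * * * Europe/Stockholm'
  USER_TASK_MANAGED_INITIAL_WAREHOUSE_SIZE = 'XSMALL'
AS
  COPY INTO PLAYER_RAW
  FROM (SELECT $1, CURRENT_TIMESTAMP() FROM @FPL_STAGE/players.json);

-- Tasks are created suspended and must be resumed to start running
ALTER TASK LOAD_PLAYER_RAW RESUME;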

Raw Tables

There are 4 different raw tables (EG_RAW, CS_RAW, PLAYER_RAW, GAMEHISTORY_RAW), which all have a single column with the data type VARIANT. Into these tables the JSON data is loaded as-is from the S3 bucket by the serverless tasks. There is no need for any transformation when loading the JSON data into the tables.

Modeled Views

From the 4 different raw tables (EG_RAW, CS_RAW, PLAYER_RAW, GAMEHISTORY_RAW), 3 modeled views (EXPECTED_GOALS, EXPECTED_CS, PLAYER) are created. The two raw tables PLAYER_RAW and GAMEHISTORY_RAW are the sources for the PLAYER view, CS_RAW is the source for EXPECTED_CS, and EG_RAW for the EXPECTED_GOALS view. There is also one final view, FPL_MASTER, which serves as the source view for the Streamlit application and uses the 3 views together with two dimensional tables as its sources.
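As an illustration, a modeled view over one of the VARIANT columns can be built with LATERAL FLATTEN; the JSON paths below are assumptions based on the FPL API payload, not the project's exact definitions.

-- Sketch of a modeled view over the raw player data
CREATE VIEW PLAYER AS
SELECT
    p.value:id::NUMBER           AS player_id,
    p.value:web_name::STRING     AS player_name,
    p.value:element_type::NUMBER AS position,
    p.value:now_cost::NUMBER     AS price,
    r.LOAD_TS
FROM PLAYER_RAW r,
     LATERAL FLATTEN(input => r.DATA:elements) p;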

Dimensional Tables

There are two dimensional tables used in the architecture (TEAM_DIM, TEAM_METADATA); they are used to map each team’s name, team shirt and team goalkeeper shirt, as well as the team badge.

Streamlit Cloud

The application is hosted in Streamlit Cloud which is a free platform for the community to develop and deploy applications easily and swiftly. When the application is deployed it is also very easy to share it with anyone.

Streamlit Cloud has an integration with GitHub which makes it fast and easy to get applications up and running directly from your GitHub repository. Within your Streamlit Cloud account you are able to create a connection to your GitHub account; once the connection is established you can see your GitHub repositories and select the files to use when deploying your application.

It is seamless to set up the connection between Snowflake and Streamlit. The Snowflake connector is available via PyPI, and the only thing you need to do is include the connector in your requirements.txt file. When deploying the application you are able to select the appropriate version of Python for your application and provide any secrets, like your Snowflake account credentials.

# Include in your requirements file
-r https://raw.githubusercontent.com/snowflakedb/snowflake-connector-python/v2.7.9/tested_requirements/requirements_39.reqs

https://docs.streamlit.io/knowledge-base/dependencies/snowflake-connector-python-streamlit-cloud
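Inside the app, the connection can then be created from Streamlit's secrets. The sketch below shows one way to do it; the secrets section name and the query are illustrative.

# Minimal sketch of querying Snowflake from a Streamlit app.
import streamlit as st
import snowflake.connector

# Expects a [snowflake] section in the app's secrets with the connection parameters
conn = snowflake.connector.connect(**st.secrets["snowflake"])

@st.cache_data(ttl=600)  # cache the result so every rerun doesn't hit Snowflake
def load_players():
    cur = conn.cursor()
    cur.execute("SELECT * FROM FPL_MASTER")
    return cur.fetch_pandas_all()

st.dataframe(load_players())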

What’s Next

There are a lot of cool new innovations coming out all the time; this section covers a few things that would be very cool and interesting to leverage for this project.

Data From Snowflake Marketplace

Since Snowflake provides a fantastic public data listing (Snowflake Marketplace), there is a great opportunity to leverage the marketplace to enrich the application’s data from different data providers in a cheap and seamless way.

Streamlit in Snowsight

A very interesting feature for this project is running Streamlit in Snowsight, which is now in Snowflake’s private preview. This will enable users to develop, run, deploy and share an application without having to leave the Snowflake platform.

More, More and More Machine Learning

There are tons of different machine learning models available that one can leverage for different use cases. In this project there have been considerations to expand the results by allowing the “expected”-value to consider things like ball touches during a game, ball touches within the box, injuries, etc. Another obvious addition would be to add ChatGPT, davinci3 or similar to generate a short description of the team and player statistics.

The full code is available on github: https://github.com/harald-page-profile/fpl-optimizer

Enjoy and Good Luck with your Team!

//Alexander, Johan & Harald
