Can Amazon SageMaker AutoPilot be used to predict the outcome of a tennis match?
How does SageMaker AutoPilot work on a real-life use case? How can you take full advantage of SageMaker AutoPilot? Is its performance as good as what a data scientist can achieve? We will answer these questions in this article.
- Introduction to AutoPilot
- The use case: Machine Learning & Tennis
- AutoPilot without any preprocessing
- AutoPilot with processed dataset
I. Introduction to AutoPilot
What is SageMaker AutoPilot?
Auto-ML (AutoPilot) is a tool integrated into SageMaker, announced by AWS at re:Invent 2019, which we were able to attend. The main goal of Auto-ML is to simplify model creation to the point where it can be done by non-technical people: you just provide SageMaker with a dataset and specify the type of problem and the target attribute, and Auto-ML builds models accomplishing the given task.
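As a rough illustration, this is what starting an AutoPilot job looks like through the `CreateAutoMLJob` API. The bucket paths, job name, role ARN and the `winner` target column are illustrative assumptions, not values from our project:

```python
# Hedged sketch of the request body for SageMaker's CreateAutoMLJob API.
# All names and paths below are placeholders for illustration.
job_config = {
    "AutoMLJobName": "tennis-autopilot",
    "InputDataConfig": [{
        "DataSource": {"S3DataSource": {
            "S3DataType": "S3Prefix",
            "S3Uri": "s3://my-bucket/tennis/train.csv",  # hypothetical path
        }},
        "TargetAttributeName": "winner",  # the attribute AutoPilot must predict
    }],
    "OutputDataConfig": {"S3OutputPath": "s3://my-bucket/tennis/output/"},
    "ProblemType": "BinaryClassification",  # optional: AutoPilot can infer it
    "AutoMLJobObjective": {"MetricName": "Accuracy"},
    "AutoMLJobConfig": {"CompletionCriteria": {"MaxCandidates": 50}},
    "RoleArn": "arn:aws:iam::123456789012:role/SageMakerRole",  # placeholder
}

# With valid AWS credentials, the job would be started with:
# import boto3
# boto3.client("sagemaker").create_auto_ml_job(**job_config)
```

The same job can be launched from the SageMaker Studio console without writing any code, which is what makes the tool usable by non-technical people.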
How does SageMaker AutoPilot work?
Auto-ML performs three different actions when an experiment is started:
- Data exploration
- Candidate selection
- Model Tuning
Auto-ML performs some basic analysis of the provided dataset, such as statistical analysis and NaN-value handling. The objective of this step is to understand what the data represents (continuous values, discrete values, categorical values, etc.).
Once SageMaker understands the data, it can select what it calls candidates. Each candidate is a combination of a preprocessing script and a machine learning model. SageMaker selects a set of candidates based on the observations it has made on the data. The machine learning models can be, for example, XGBoost or logistic regression. The preprocessing scripts are combinations of previously developed preprocessing strategies such as RobustImputer (which converts missing values to NaN), ThresholdOneHotEncoder or RobustStandardScaler (all of these scripts can be found on Amazon SageMaker's public GitHub).
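To give an idea of what such a candidate's preprocessing part does, here is a minimal sketch using the stock scikit-learn equivalents (SimpleImputer, StandardScaler); these are analogues we chose for illustration, not AutoPilot's actual RobustImputer or RobustStandardScaler code:

```python
# Sketch of a preprocessing pipeline comparable to an AutoPilot candidate,
# using plain scikit-learn components as stand-ins.
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler

pipeline = Pipeline([
    ("impute", SimpleImputer(strategy="median")),  # fill missing values
    ("scale", StandardScaler()),                   # standardize each feature
])

# Toy feature matrix with missing values (e.g. height, weight of a player).
X = np.array([[180.0, 85.0], [np.nan, 78.0], [175.0, np.nan]])
X_clean = pipeline.fit_transform(X)  # no NaNs remain after imputation
```

A candidate then chains such a pipeline with a model (e.g. XGBoost) and trains the whole thing end to end.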
Now that SageMaker has selected its candidates, it launches fine-tuning tasks on them: it simultaneously runs model trainings with slightly different hyperparameter combinations.
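The tuning stage can be pictured as enumerating hyperparameter combinations and launching one training job per combination in parallel. A minimal illustration, with a hypothetical XGBoost-style search space (the parameter names and values are assumptions for the example):

```python
# Enumerate hyperparameter combinations, as a stand-in for what
# SageMaker's tuning jobs explore in parallel.
import itertools

search_space = {
    "max_depth": [4, 6, 8],  # hypothetical tree-depth candidates
    "eta": [0.1, 0.3],       # hypothetical learning-rate candidates
}

# Cartesian product of the space = the set of trainings to launch.
combos = [dict(zip(search_space, values))
          for values in itertools.product(*search_space.values())]
# Here: 3 x 2 = 6 training jobs, which SageMaker would run simultaneously.
```

In practice AutoPilot uses Bayesian-style hyperparameter optimization rather than an exhaustive grid, but the parallel many-trainings picture is the same.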
One very interesting feature of Auto-ML is model deployment. You can deploy your model in one click, and AutoPilot returns an endpoint reachable via a REST API. You can choose the type of instance you want to use and how many of them. This is where we see the real power of connected AWS services.
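Once deployed, the endpoint is invoked with a CSV payload through the `sagemaker-runtime` client. A sketch, where the endpoint name and feature values are illustrative assumptions:

```python
# Build a text/csv payload for a deployed AutoPilot endpoint.
import csv
import io

features = [3.0, -1.0, 12.0]        # hypothetical match features
buf = io.StringIO()
csv.writer(buf).writerow(features)  # AutoPilot endpoints accept text/csv
payload = buf.getvalue()

# With the endpoint deployed, the call would look like:
# import boto3
# runtime = boto3.client("sagemaker-runtime")
# response = runtime.invoke_endpoint(
#     EndpointName="tennis-autopilot-endpoint",  # placeholder name
#     ContentType="text/csv",
#     Body=payload,
# )
# prediction = response["Body"].read().decode()
```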
II. The use case: Machine Learning & tennis
You can easily find AutoML examples on the web that use really basic datasets, what we commonly call the "Hello World" datasets (Titanic, MNIST, Iris). The question is: how does it work on a real-life use case? At Pronoo.ai we are working on sports results prediction, such as tennis matches.
We scraped a large amount of data on tennis matches from 1970 to 2020, with records on the top 250 tennis players. Today we have around 75,000 match records, compiled with both players' stats.
Here is the dataset we use (see the GitHub repository for the detailed dataset -> https://github.com/matrousseau/AWS-AutoPilot-on-real-life-use-case-):
To evaluate model performance, we train our models on all the matches before 2020 and test them on 2020's matches. With our self-made model and data pipeline, we reach 76.6% accuracy on our test set. The goal here is to see if AutoPilot can beat that score, or at least reach it.
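The time-based split described above can be sketched in a few lines of pandas; the column names here are assumptions for illustration, not our actual schema:

```python
# Train on matches before 2020, test on 2020's matches.
import pandas as pd

# Toy stand-in for the matches dataset.
df = pd.DataFrame({
    "year":      [2018, 2019, 2019, 2020, 2020],
    "diff_rank": [5, -3, 10, -7, 2],   # hypothetical feature
    "p1_won":    [1, 0, 1, 0, 1],      # hypothetical target
})

train = df[df["year"] < 2020]   # everything before 2020
test = df[df["year"] == 2020]   # held-out 2020 matches
```

Splitting by time rather than at random matters here: a random split would leak future matches into training and inflate the score.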
III. AutoPilot without any preprocessing
For this first experiment, we use our raw dataset, without any processing. That way we skip both the feature selection phase and the feature engineering phase.
As you can see, the results are very surprising. The accuracy is almost 100% on the training and validation sets (remember that with our own models we get 76.6% accuracy).
However, when we deploy the model and test it on a test set, the accuracy is 50%. Even worse, when we make predictions on the training data, the accuracy is 45%.
Our interpretation: with a large number of columns and no feature engineering, AutoPilot has difficulty converging its models. It is therefore preferable to do your own feature selection before training.
IV. AutoPilot with processed dataset
Auto-ML did not succeed in understanding our raw dataset. This is mainly because all of the features are duplicated (as there are two players in a match). Let's see if we can help SageMaker understand the problem better.
To help SageMaker, we perform a few hand-made preprocessing steps:
- Remove useless features
- Convert columns to numerical values
- One hot encode categorical columns
- Compute difference between both players on their stats
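The four steps above can be sketched with pandas on a toy frame; the column names (`match_id`, `surface`, `p1_rank`, `p2_rank`) are assumptions for illustration, not our exact schema:

```python
# Hand-made preprocessing: drop, convert, one-hot encode, differentiate.
import pandas as pd

df = pd.DataFrame({
    "match_id": ["a1", "a2"],      # useless identifier
    "surface":  ["clay", "hard"],  # categorical feature
    "p1_rank":  [3, 25],
    "p2_rank":  [7, 4],
})

df = df.drop(columns=["match_id"])               # 1. remove useless features
df["p1_rank"] = pd.to_numeric(df["p1_rank"])     # 2. ensure numeric dtypes
df = pd.get_dummies(df, columns=["surface"])     # 3. one-hot encode categoricals
df["diff_rank"] = df["p1_rank"] - df["p2_rank"]  # 4. difference of players' stats
```

Replacing the duplicated per-player columns with their difference is the step that removes the symmetry that confused AutoPilot.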
We now have a much cleaner dataset:
With this version of the dataset, AutoPilot is able to build models that reach 73% accuracy (when maximizing accuracy) and 74% accuracy (when maximizing the F1-score) on predicting the matches of 2020.
We can now try to add a scaler to our own preprocessing to see if it helps Auto-ML increase its score. The answer is no; it is actually pretty much the opposite. Adding a scaler is not recommended, as Auto-ML will most likely select a candidate that already includes one. Scaling already-scaled data with a different method results in a strange value distribution and bad model performance.
Finally, it appears that AutoPilot is able to understand complex problems pretty well, but hand-crafted preprocessing still has a valuable impact on the results.
The use of AutoPilot also shows how important it is to apply a human eye and business logic to feature selection. For the purely technical part, the tool is extremely powerful and saves a lot of time. It is therefore a tool to use when you have a very short time to market and a very good knowledge of your data.
Co-authored by Jérémie Basso (https://www.linkedin.com/in/j%C3%A9r%C3%A9mie-basso-498144165/)