Git Gud — Chapter 9

matteia
11 min read · Jun 26, 2024

--

Perfectspotting

In the previous chapter, we observed that subtle differences do exist among the patterns of records created by different matchmaking algorithms. The next, and possibly more intriguing, step in investigating the patterns is to attempt to determine which matchmaking algorithm was used on an individual record basis. In other words: ‘Given a record, is it possible to build classifiers that recognise which algorithm was used?’ If so, some of the questions that follow would be: ‘How well does it perform? Can it serve as the primary tool for such a purpose? How complicated (e.g. how deep) would an ideal model be? What are the limitations?’

What are we classifying?

- PerfectMM, NearPerfectMM, StreakMM and Live Data

Initially, there are four classes of matchmaking algorithms that need to be identified by a classifier: PerfectMM, NearPerfectMM, StreakMM and Live Data. We may start by building models that attempt to correctly separate the first three (PerfectMM, NearPerfectMM and StreakMM).

The data used for training will be records of 300 games per player. Each record is loaded into a tensor of 15 rows × 20 columns, similar to how images in previous chapters were 15 pixels high and 20 pixels wide. Each sample is then fed into a neural network model, which must decipher which matchmaking algorithm was used.
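As a sketch of this preprocessing step (the encoding of wins and losses as 1s and 0s is an assumption, and the function name is hypothetical):

```python
import random

def record_to_grid(record, rows=15, cols=20):
    """Reshape a flat win/loss record (1 = win, 0 = loss) into a
    rows x cols grid, analogous to a 15 x 20 one-channel image."""
    if len(record) != rows * cols:
        raise ValueError(f"expected {rows * cols} games, got {len(record)}")
    return [record[r * cols:(r + 1) * cols] for r in range(rows)]

# A hypothetical 300-game record: 1 for a win, 0 for a loss.
record = [random.randint(0, 1) for _ in range(300)]
grid = record_to_grid(record)
print(len(grid), len(grid[0]))  # 15 20
```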

Two widely known families of models will be used for several different purposes: first simple neural networks, then convolutional neural networks, with a few adjustments and variations added for experimentation.

Performance Metrics

- Accuracy, Precision & Sensitivity

The result of each test run of a model will be organised into a confusion matrix, which is then examined to guide further experiments and modelling. The metrics most utilised will be accuracy, precision and sensitivity.

The First Neural Networks

- First Basic NN

This neural network is a simple, basic model, hence the name “First Basic NN”. It has only one hidden layer between the input layer and the output layer. The input size is 300 (= 15 × 20) and its hidden layer has 120 nodes. The ReLU activation function is applied to the output of the hidden layer. For fair comparisons, every basic neural network model will share this structure.
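The forward pass of such a model can be sketched in plain Python. The weights here are random placeholders (the real model would be trained), and the output size of 3 assumes the three-class task:

```python
import random

def relu(v):
    return [max(0.0, x) for x in v]

def linear(x, w, b):
    # w: out x in weight matrix, b: bias vector of length out
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi
            for row, bi in zip(w, b)]

random.seed(0)
IN, HID, OUT = 300, 120, 3  # input, hidden and class-logit sizes

w1 = [[random.uniform(-0.05, 0.05) for _ in range(IN)] for _ in range(HID)]
b1 = [0.0] * HID
w2 = [[random.uniform(-0.05, 0.05) for _ in range(HID)] for _ in range(OUT)]
b2 = [0.0] * OUT

x = [random.randint(0, 1) for _ in range(IN)]  # flattened 15 x 20 record
logits = linear(relu(linear(x, w1, b1)), w2, b2)
print(len(logits))  # 3
```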

- PerfectMM vs NearPerfectMM vs StreakMM

The purpose of this model was to verify if the three artificially made patterns could be correctly identified. The model was trained for various epochs for comparison.

The confusion matrix above shows that the model (First Basic NN, 10 epochs) is 65.89% accurate. However, a closer inspection reveals that this value is misleading. The extremely high precision and sensitivity for StreakMM (0.968 and 0.99, respectively) show that the model deciphers StreakMM patterns remarkably well, which inflates its overall accuracy. In contrast, the model performs poorly when attempting to separate PerfectMM patterns from NearPerfectMM ones.

As the table above shows, the model in question was then trained for 50 epochs in order to check whether a higher number of epochs would change the result. Unfortunately, the overall accuracy fell to 55.6%, as the model somehow lost even its ability to identify StreakMM patterns from the rest. NearPerfectMM and PerfectMM patterns also still seem too similar for the model to tell apart.

The last confusion matrix contains the results of the model when trained for 100 epochs. The overall accuracy rose again to 64.2%, but for the same reasons as in the first confusion matrix. The inability to correctly distinguish NearPerfectMM patterns from PerfectMM ones continues. This is understandable, as we saw in previous chapters that their collective statistics are extremely similar, as they were designed to be.

Accepting this fact naturally leads to curiosity about whether, and how well, our model architecture can pick out the right StreakMM patterns from the rest.

- PerfectMM & NearPerfectMM vs StreakMM

The model in this section was designed to correctly identify StreakMM patterns from the rest. Patterns from PerfectMM and NearPerfectMM were merged into one group and given the same label for the model to train on. Again, the model was trained for various epochs for comparison.

As demonstrated in the confusion matrix above, the model is 98.74% accurate. This is a clear sign that our model can exploit the distinct differences between the two types of records. In addition, precision and sensitivity scores are substantially higher (all above 0.95) than the values obtained when contrasting the three artificial patterns separately. The model was nonetheless trained for more epochs to confirm.

Even after 300 epochs, the model performs in a similar (slightly better) fashion, as every score calculated is approximately 0.99. This suggests that we should combine the two matchmaking algorithms’ samples into one label (class) before continuing with our modelling. The next model should help consolidate this belief.

- PerfectMM vs NearPerfectMM

The purpose of the model in this section was to determine whether the two classes, namely PerfectMM and NearPerfectMM, should be combined into one class. If the model does not even achieve the minimum level of performance, it should be safe to do so.

As observed in the confusion matrix above, the model, trained for 10 epochs, does not seem to be able to coherently decipher anything. The precision and sensitivity scores, along with the overall accuracy suggest that the model may be randomly guessing the class of each sample.

This table also displays almost identical traits. The accuracy is near 50%, where there are only two classes to predict from and an almost equal number of samples for each class. Precision scores have not changed much compared to the values after 10 epochs. For sensitivity, however, the scores almost appear to have swapped places. This may indicate that the current model architecture cannot appropriately handle samples from the two matchmaking algorithms in question.

Therefore, from here on, we will merge samples from PerfectMM and NearPerfectMM into one class named PerfectFam (Perfect Family) and label them accordingly.
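The relabelling step might look like the following (class names are from the text; the encoding itself is illustrative):

```python
# Map the original scheme names onto the merged class labels.
label_map = {"PerfectMM": "PerfectFam",
             "NearPerfectMM": "PerfectFam",
             "StreakMM": "StreakMM",
             "LiveData": "LiveData"}

samples = ["PerfectMM", "StreakMM", "NearPerfectMM", "LiveData"]
merged = [label_map[s] for s in samples]
print(merged)  # ['PerfectFam', 'StreakMM', 'PerfectFam', 'LiveData']
```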

- PerfectFam vs StreakMM vs Live Data

The purpose of the model in this section was to identify the three classes: Live Data, PerfectFam and StreakMM. Again, the model was trained for various epochs for comparison.

The model classifying the three matchmaking algorithms demonstrated 66.86% accuracy. However, this metric is also misleading, for the same reasons we have already experienced: the extremely high number of true positives, along with high sensitivity and precision, when handling samples from StreakMM makes up for the poor performance on samples from Live Data or PerfectFam, whose precision and sensitivity scores are not high either.

As shown in the table above, the model, trained for 100 epochs, does not necessarily perform much better. It is interesting to see that precision and sensitivity scores for Live Data and PerfectFam seem to be in a trade-off relationship, as a rise in one can be observed with a fall in the other. However, it is worth noting that, even when Live Data is added to the mix, the model is still able to consistently separate the StreakMM samples from the rest.

Convolutional Neural Networks (2D)

- PerfectFam vs StreakMM vs Live Data

The convolutional neural network (CNN) used in this section was designed to correctly identify the three classes (Live Data, PerfectFam, StreakMM). Unlike the first basic NN, each sample is fed in not as one dimension (1D, 300) but as two dimensions (2D, 15 × 20). The network has three 2D convolutional layers, two max pooling layers (2 × 2), two fully connected linear layers and one dropout layer. Each convolutional layer has a 3 × 3 kernel. The first two convolutional layers are each followed by a max pooling layer, halving the width and height. The last three layers are the dropout layer followed by the two linear layers. ReLU was mainly used as the activation function.
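Assuming the convolutions are ‘same’-padded (padding is not specified above, so this is an assumption), the spatial size entering the linear layers can be tracked as follows:

```python
def pool_out(h, w, k=2):
    """Spatial size after a k x k max pool with stride k (floor division)."""
    return h // k, w // k

# 'Same'-padded 3 x 3 convolutions leave the spatial size unchanged,
# so only the two pooling layers shrink the 15 x 20 input.
h, w = 15, 20
h, w = pool_out(h, w)  # after first conv + 2 x 2 pool -> 7 x 10
h, w = pool_out(h, w)  # after second conv + 2 x 2 pool -> 3 x 5
print(h, w)            # 3 5  (the third conv leaves this unchanged)
```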

After training for 10 epochs, the accuracy of the model is 67.45%, but a familiar trend is still apparent. However, when dealing with Live Data and PerfectFam, precision and sensitivity have become more stable, as the trade-off relationship seems weaker.

This model, trained for 300 epochs, does not seem to perform any better, as accuracy, precision and sensitivity have all decreased. We may need some adjustments to the model architecture.

Convolutional Neural Networks (1D)

- PerfectFam vs StreakMM vs Live Data

The CNN used in this section was designed for the same task as in the previous section. The difference is that it receives data in one dimension (300), like the first basic NN, instead of two. The five consecutive convolutional layers are also 1D. Each convolutional layer, except the last one, is followed by a 1D max pooling layer of size 2. The last four layers then consist of alternating dropout and fully connected linear layers. ReLU is the main activation function.
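Assuming ‘same’-padded convolutions here as well (an assumption, since padding is not specified), the sequence length shrinks only at the four pooling layers:

```python
# Track the 1D length of a 300-game record through the network:
# 'same'-padded convolutions preserve length; each size-2 pool halves it.
length = 300
for _ in range(4):  # pools after each of the first four conv layers
    length //= 2    # 300 -> 150 -> 75 -> 37 -> 18
print(length)       # 18
```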

The confusion matrices above for this model, trained for 50 epochs and 300 epochs, respectively, show that the current model architecture shares the same trend. However, the precision and sensitivity scores for Live Data and PerfectFam seem to show signs of improvement, as more of them approach the 0.5 mark.

Convolutional Neural Networks (1D) + LSTM

- PerfectFam vs StreakMM vs Live Data

This CNN is almost identical to the one in the previous section but has one LSTM layer in between the two fully connected linear layers.

The first confusion matrix suggests that the model, trained for 10 epochs, does not perform significantly better than the previous models despite its 70.84% accuracy, as sensitivity for Live Data is below 0.4. However, when the model was trained for 100 epochs, that score reached 0.518, apparently at the expense of other scores (Live Data precision, PerfectFam precision and PerfectFam sensitivity).

Interpretation

We may give a record to a classifier for two main reasons, one not necessarily excluding the other. The first is to determine which matchmaking algorithm produced a pattern when this is not yet known. The second is to verify that the classifier identifies correctly when the matchmaking scheme is already known. Although guaranteeing the effectiveness of a model requires all performance metrics to be above a certain threshold, for the former, precision is the metric we should prioritise; for the latter, sensitivity would be our key value.

In the first case, precision for PerfectFam (precision 1) is almost always higher than precision for Live Data (precision 0). This suggests that the model is more confident in its claim when it predicts the class to be PerfectFam. In the second case, sensitivity for PerfectFam (sensitivity 1) is almost always higher than sensitivity for Live Data (sensitivity 0). This may mean that the model knows the characteristics of the Perfect Family better than those of Live Data. In short, the Live Data samples are more confusing to the model, as they sometimes resemble PerfectMM or NearPerfectMM and sometimes do not.

In almost every confusion matrix, the precision score for the class ‘Live Data’ (labelled 0) has proven the hardest value to improve. This is because most models predict a sample to be from that class, when it is in fact from the Perfect Family (PerfectMM or NearPerfectMM), more frequently than is ideal. The sensitivity score for the same class, however, is slightly more feasible to improve: when we do know that a given sample is of the Live Data class, various models will classify it correctly more than 50% of the time.

What does this mean?

When classifying records on an individual basis, it is more challenging to determine which matchmaking algorithm was used. More sophisticated models that require heavier computation may be able to perform better. This is evident, as we have seen that adding adequate layers appropriately enhanced performance in exchange for more training resources.

However, even these models will eventually plateau. The main reason is that any pattern is possible under any matchmaking scheme. For example, StreakMM can theoretically yield a record of 300 games in which wins and losses alternate for its entirety; the only issue is that this is extremely improbable (yet possible). Or, in another inconceivably improbable case, PerfectMM can produce a record of 300 straight wins, despite the probability of this happening being 1/2³⁰⁰, or about 4.9090935×10⁻⁹¹.
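That probability can be checked directly, assuming each game is an independent 50/50 outcome:

```python
from fractions import Fraction

# Probability that a fair 50/50 scheme produces 300 straight wins.
p = Fraction(1, 2) ** 300
print(f"{float(p):.7e}")  # 4.9090935e-91
```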

For this reason, too big a dataset will result in more samples that are identical or almost identical across two or more classes. This will confuse a model during training and may knock it off the right track(s) while the training (or validation) loss is being minimised. If more than one class shares similar traits, as shown when comparing Live Data and the Perfect Family using their collective statistical test values, it may mean that they possess numerous similar patterns. This is somewhat borne out by StreakMM: a scheme whose collective statistical test values clearly separate it from the others is almost always ‘classifiable’ by almost every model.

In addition, and somewhat related to this, it is interesting to note that a higher number of epochs will often degrade the performance of a model. More epochs mean that the model being trained is more exposed to duplicate or near-duplicate samples with different labels. This only creates more confusion, forcing the model to resort to mere random guessing when it sees a sample with no distinct features to judge by.

Next: Chapter 10

Previous: Chapter 8
