Evaluating Vector Predictions — The 2 vs 2 Test

Rohan Saha · Published in Samur.AI · Dec 31, 2021

In this post, we are going to discuss an evaluation metric called the 2 vs 2 test. Although there are no formal prerequisites, it would be helpful to have a basic understanding of how a supervised machine learning model is trained. The easiest way to understand the concept would be to start with a problem.

When to use this metric?

Use it when the predictions from a machine learning model are vectors, i.e., lists of real numbers. It's most commonly used to assess the quality of predicted word vectors (such as those from Word2Vec or GloVe). We will use the Word2Vec model as the running example in this article.

Task

Imagine we have some input data (X) and some output data (Y), and we want to train a supervised machine learning algorithm on X to predict Y.

X can come from any data source, but for simplicity let's say it has N samples and 20 features, so we can represent it as a matrix of dimensions N x 20. The output Y can be of any dimension, but since we are going to use Word2Vec, each output vector is 300-dimensional. So for each input sample of length 20, we have one output sample of length 300.

We divide our input data X into X_train and X_test, and divide Y into y_train and y_test. For our explanation, it really doesn't matter what X_train and X_test are, as long as they are data on which a machine learning model can be trained. For the output y_train, we will use word vectors from the Word2Vec language model. Then we use X_train and y_train to train a machine learning model. We won't go over the training process here, to keep our focus on the metric, but let's imagine we already have the trained model.
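As a rough sketch of that setup (the random stand-in data and the 80/20 split here are illustrative assumptions, not an actual pipeline):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: N samples with 20 input features each,
# and 300-dimensional target vectors (Word2Vec-sized).
N = 100
rng = np.random.default_rng(42)
X = rng.normal(size=(N, 20))
Y = rng.normal(size=(N, 300))

X_train, X_test, y_train, y_test = train_test_split(
    X, Y, test_size=0.2, random_state=42
)
```

For the demonstration below, though, we'll construct y_test and y_preds by hand rather than take them from a trained model.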

So now we have an ML model that is trained on some data X_train to predict Word2Vec word vectors y_train.

To actually demonstrate the 2 vs 2 test, let's obtain a few vectors from the pretrained Word2Vec model. We will call these y_test; they represent the ground truth vectors against which we will evaluate how good our predictions are.

Let’s retrieve the word vectors for the following words: {‘baby’, ‘cup’, ‘google’, ‘apple’}
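A minimal way to do this, assuming the standard pretrained Google News model from gensim's downloader (a large one-time download):

```python
import numpy as np
import gensim.downloader as api

# Load the pretrained Word2Vec model (300-dimensional vectors).
w2v = api.load("word2vec-google-news-300")

# Keys are case-sensitive; if a lowercase word is missing from the
# vocabulary, try its capitalized form.
words = ["baby", "cup", "google", "apple"]
y_test = np.array([w2v[word] for word in words])
print(y_test.shape)  # (4, 300)
```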

I've converted y_test into a NumPy array because we want to use some built-in functions in our 2 vs 2 function.

Also, after the model training process, we evaluate our model performance on X_test, which gives us some vector predictions. We'll call this set of predictions y_preds. Again, for simplicity, let's create some sample predictions using NumPy.
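For example, in place of real model outputs, random vectors of the same shape as y_test will do for demonstrating the test:

```python
import numpy as np

# Stand-in predictions: four random 300-dimensional vectors,
# one for each of our four test words.
rng = np.random.default_rng(0)
y_preds = rng.normal(size=(4, 300))
```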

Now that we have y_test and y_preds, how do we assess the quality of our word vector predictions? This is where the 2 vs 2 test can be useful.

The 2 vs 2 test

Before actually understanding how the 2 vs 2 test works, let’s look at an alternative.

Cosine distance (or cosine similarity) is a popular metric for assessing the quality of our predictions: the lower the cosine distance, the closer our predicted vector is to the original vector in the n-dimensional space. Cosine distance works well for word vectors in particular, because it tells us whether two words are semantically similar (for example, near-synonyms) or not.
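For example, using SciPy's implementation (cosine distance = 1 - cosine similarity), with the y_test and y_preds arrays built above:

```python
from scipy.spatial.distance import cosine

# Cosine distance between a prediction and its ground truth vector:
# 0 means the vectors point in the same direction, 2 means opposite.
print(cosine(y_test[0], y_preds[0]))
```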

But the 2 vs 2 test gives us the ability to turn the cosine distance metric into an accuracy measure, which is more easily interpretable.

Okay, here’s how the test works.

1. Choose two vector predictions and the two corresponding ground truth vectors.

2. Calculate the cosine distance between the matching pairs of vectors and the cosine distance between the non-matching pairs of vectors:

d(y_i, ŷ_i) + d(y_j, ŷ_j) < d(y_i, ŷ_j) + d(y_j, ŷ_i)

Formula for the 2 vs 2 test. The left side of the '<' sign is the sum of the cosine distances d of the matching pairs of vectors, whereas the right-hand side is the sum for the non-matching pairs. 'i' and 'j' are the two choices of ground truth vectors, and the corresponding predictions are represented with a cap on top of the letter 'y'.

3. If the sum of cosine distances of the matching pairs of vectors is less than the sum for the non-matching pairs, the test passes and you get a point; otherwise, the test fails.

4. Repeat steps 1 to 3 for every pair and compute an accuracy measure by adding up all the points (tests that passed) and dividing by the total number of tests (passed and failed).

To make things simpler, here’s a visual of the procedure.

The 2 vs 2 test. Green lines show the calculation of a cosine measure between matching pairs and the red dashed lines show the cosine measure between the non-matching pairs.

So now, putting everything together, let's use our y_preds and y_test to obtain the 2 vs 2 accuracy.

Here's a Python function that implements the 2 vs 2 test; the version below is a minimal sketch that follows the steps above, using SciPy's cosine distance over every unique pair of test samples.
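```python
from scipy.spatial.distance import cosine

def two_vs_two(y_test, y_preds):
    """Run the 2 vs 2 test over all unique pairs of test samples.

    Both arguments are NumPy arrays of shape (n_samples, n_dims).
    Returns (tests passed, total tests, 2 vs 2 accuracy).
    """
    n_samples = y_test.shape[0]
    passed, total = 0, 0
    for i in range(n_samples - 1):
        for j in range(i + 1, n_samples):
            # Sum of cosine distances between the matching pairs.
            matched = cosine(y_test[i], y_preds[i]) + cosine(y_test[j], y_preds[j])
            # Sum of cosine distances between the non-matching pairs.
            mismatched = cosine(y_test[i], y_preds[j]) + cosine(y_test[j], y_preds[i])
            if matched < mismatched:
                passed += 1  # the test passes for this pair
            total += 1
    return passed, total, passed / total
```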

Python code for the 2 vs 2 test. Make sure y_test and y_preds are numpy ndarrays.

The first value returned denotes the number of times the test passed. The second value denotes the total number of tests conducted, and the third value is the ratio between the first and the second (the 2 vs 2 accuracy).
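Calling the function on our four test vectors might look like this:

```python
passed, total, accuracy = two_vs_two(y_test, y_preds)
print(f"{passed}/{total} tests passed, 2 vs 2 accuracy = {accuracy:.2f}")
```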

In the function, we use the cosine distance metric to get a measure of how good our predictions are. If we look carefully at the function, we see that the 2 vs 2 comparison is performed for all possible pairs of vectors. So in our case, the pairs are (1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4), a total of six pairs. Note that we aren't including pairs like (2, 1) and (4, 1) because these are duplicate pairs.
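For instance, Python's itertools enumerates exactly these unique pairs (shown 0-indexed here):

```python
from itertools import combinations

# All unique unordered pairs of four test samples (0-indexed),
# matching the six pairs listed above.
print(list(combinations(range(4), 2)))
# [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
```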

And that's it! A quick introduction to the 2 vs 2 test, a simple yet effective evaluation method.

To conclude, the 2 vs 2 test is a great way to evaluate vector predictions and obtain an accuracy measure; it can be especially useful when working with word vectors from language models such as Word2Vec.

If you like my writing and it’s valuable to you, consider buying me a coffee :)
