Next Word Prediction Using ALBERT

Prerana
3 min read · Aug 30, 2021


Detailed Implementation using ALBERT

ALBERT is not designed for next-word prediction, but I still tried to approach it as a multi-class classification problem. This post is a continuation of the main article, Next Word Prediction using Swiftkey Data.

Feature Development

BERT models expect input in a specific format, consisting of three parts:

  • Encoded Data: the input word sequence is tokenized and encoded using the ALBERT tokenizer. Since this is a classification-based approach, a [CLS] token is also added at the beginning of each sequence.
  • Masked Input Data: when feeding data to any BERT variant, we must specify a max_length for the input; sequences shorter than max_length are padded. The mask allows the model to cleanly differentiate between content and padding: it has the same shape as the encoded data and contains a 1 wherever the encoded data is not padding.
  • Input Type: BERT is commonly used for next sentence prediction, where two sentences are given as input; the non-padded region then contains a 0 or a 1 indicating which sentence a token is part of. Here there is only one sentence, so it is all 0s.

The labels, i.e. the next words, are one-hot encoded.

Code Sample
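The original notebook embeds the full code; below is a minimal sketch of the same preprocessing, assuming the Hugging Face transformers tokenizer. The example sentences, MAX_LEN, and the tiny next-word vocabulary are illustrative, not taken from the notebook.

```python
# Minimal preprocessing sketch, assuming the Hugging Face `transformers`
# ALBERT tokenizer; the sentences and MAX_LEN below are illustrative.
import numpy as np
from transformers import AlbertTokenizer

tokenizer = AlbertTokenizer.from_pretrained("albert-base-v2")
MAX_LEN = 20  # assumed max_length; the notebook may use a different value

contexts = ["i want to", "thank you so"]  # input word sequences
next_words = ["go", "much"]               # labels: the next word

# Encoded data, mask, and input type in one call; the tokenizer prepends
# [CLS] (and appends [SEP]) automatically.
enc = tokenizer(
    contexts,
    padding="max_length",
    truncation=True,
    max_length=MAX_LEN,
    return_tensors="np",
)
input_ids = enc["input_ids"]            # encoded data
attention_mask = enc["attention_mask"]  # 1 for content, 0 for padding
token_type_ids = enc["token_type_ids"]  # all 0s: single-sentence input

# One-hot encode the labels over the next-word vocabulary.
vocab = sorted(set(next_words))
word_to_id = {w: i for i, w in enumerate(vocab)}
labels = np.eye(len(vocab))[[word_to_id[w] for w in next_words]]
```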

Model Development

I fine-tuned the ALBERT base model on my data to turn it into a classification model. I will discuss the architecture in detail in the following sections.

Why did I use ALBERT instead of BERT?

ALBERT Base has far fewer parameters than BERT Base because of cross-layer parameter sharing (and its factorized embedding parameterization). As a result, it needs much less memory, and since I had memory constraints, this was the best option.

Architecture

ALBERT Fine Tuning

The data generated above is fed to the ALBERT model. The pooled_output from ALBERT is passed to a Dense layer with 300 neurons, which in turn feeds the output layer. I then trained the model for 10 epochs.

Code Sample
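A minimal sketch of the fine-tuning head described above, assuming TensorFlow/Keras and TFAlbertModel from Hugging Face, and continuing from the preprocessing sketch (input_ids, attention_mask, token_type_ids, labels, MAX_LEN). The Dense layer size (300) and the 10 epochs come from the text; the activation, optimizer, and learning rate are assumptions.

```python
# Minimal sketch of the fine-tuning head; continues from the preprocessing
# sketch above. Dense(300) and 10 epochs follow the text; the activation,
# optimizer, and learning rate are assumed.
import tensorflow as tf
from transformers import TFAlbertModel

NUM_CLASSES = labels.shape[1]  # size of the next-word vocabulary

albert = TFAlbertModel.from_pretrained("albert-base-v2")

ids = tf.keras.layers.Input(shape=(MAX_LEN,), dtype=tf.int32, name="input_ids")
mask = tf.keras.layers.Input(shape=(MAX_LEN,), dtype=tf.int32, name="attention_mask")
types = tf.keras.layers.Input(shape=(MAX_LEN,), dtype=tf.int32, name="token_type_ids")

# pooler_output is the pooled [CLS] representation ("pooled_output" in the text).
pooled = albert(input_ids=ids, attention_mask=mask, token_type_ids=types).pooler_output
hidden = tf.keras.layers.Dense(300, activation="relu")(pooled)               # Dense layer, 300 neurons
outputs = tf.keras.layers.Dense(NUM_CLASSES, activation="softmax")(hidden)   # output layer

model = tf.keras.Model(inputs=[ids, mask, types], outputs=outputs)
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=2e-5),  # assumed
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)
model.fit([input_ids, attention_mask, token_type_ids], labels, epochs=10)
```

With a real corpus, the softmax layer is as wide as the number of candidate next words, which is what makes this a multi-class classification model rather than a language model.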

GitHub link: https://github.com/kurchi1205/Next-word-Prediction-using-Swiftkey-Data/blob/main/Al-BERT%20Model.ipynb

Test Results

Test loss: 2.88

Test accuracy: 0.18

The test loss is reasonably low, which is good, but when it comes to next-word predictions, the results are not coherent.

Next Word Predictions

ALBERT Predictions
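For reference, this is roughly how such predictions can be generated with the fine-tuned model; model, tokenizer, vocab, and MAX_LEN refer to the sketches above, and top_k is an illustrative parameter.

```python
# Rough sketch of generating top-k next-word predictions; `model`, `tokenizer`,
# `vocab`, and MAX_LEN come from the sketches above.
import numpy as np

def predict_next(text, top_k=3):
    enc = tokenizer(text, padding="max_length", truncation=True,
                    max_length=MAX_LEN, return_tensors="np")
    probs = model.predict([enc["input_ids"], enc["attention_mask"],
                           enc["token_type_ids"]])[0]
    top = np.argsort(probs)[::-1][:top_k]
    return [(vocab[i], float(probs[i])) for i in top]

print(predict_next("i want to"))
```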

I suspect ALBERT is simply not suitable for next-word prediction, so I moved on to GPT. For details, please refer to the main article.
