Detailed Implementation using ALBERT
ALBERT is not designed for next word prediction. Still, I tried to approach it as a multi-class classification model. This is in continuation of the main article Next Word Prediction using Swiftkey Data.
Feature Development
There is a certain format in which BERT models take input. They take in 3 inputs.
- Encoded data: the sequence of words sent as input is tokenized and encoded using the ALBERT tokenizer. Also, as this is a classification-based approach, a [CLS] token is added at the beginning of each sequence.
- Masked input data: when we give data to any version of a BERT model, we need to specify the `max_length` of the input. If the length of the sequence of words is less than `max_length`, we have to pad it. The mask allows the model to cleanly differentiate between the content and the padding. It has the same shape as the encoded data and contains a `1` anywhere the encoded data is not padding.
- Input type: generally, BERT is used for next sentence prediction, where 2 sentences are given as input. So in this input, the non-padded region contains a `0` or a `1` indicating which sentence each token is a part of. But here there is only one sentence, so it will contain all `0`s.
The labels, i.e. the next words, are one-hot encoded.
Code Sample
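The original embedded gist is not reproduced here; the snippet below is a minimal sketch of how the three inputs can be built with the Hugging Face `transformers` `AlbertTokenizer`. `MAX_LEN` and the sample words are illustrative placeholders, and the label space here is the tokenizer vocabulary, whereas the original notebook may use its own word index.

```python
import numpy as np
from transformers import AlbertTokenizer
from tensorflow.keras.utils import to_categorical

MAX_LEN = 12  # hypothetical max_length; the real value depends on the data

tokenizer = AlbertTokenizer.from_pretrained("albert-base-v2")

def encode_sequence(text):
    # encode_plus adds the special tokens ([CLS] at the start, [SEP] at the
    # end), pads/truncates to max_length, and returns all three inputs
    enc = tokenizer.encode_plus(
        text,
        max_length=MAX_LEN,
        padding="max_length",
        truncation=True,
        return_attention_mask=True,
        return_token_type_ids=True,
    )
    return (
        np.array(enc["input_ids"]),       # encoded data
        np.array(enc["attention_mask"]),  # 1 for content, 0 for padding
        np.array(enc["token_type_ids"]),  # all 0s: single-sentence input
    )

ids, mask, token_types = encode_sequence("how are you")

# The label (the next word) is one-hot encoded over the vocabulary
next_word_id = tokenizer.encode("doing", add_special_tokens=False)[0]
label = to_categorical(next_word_id, num_classes=tokenizer.vocab_size)
```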
Model Development
I have fine-tuned the ALBERT base model to make it a classification model using my data. I will discuss the architecture in detail in the following sections.
Why did I use ALBERT instead of BERT?
ALBERT base has far fewer parameters than BERT because of cross-layer parameter sharing. As a result, it takes much less memory, and since I had memory constraints, this was the best option.
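As a quick sanity check on that claim, the snippet below (a sketch assuming the Hugging Face `transformers` library with PyTorch installed) prints both parameter counts; ALBERT base comes in around 12M parameters versus roughly 110M for BERT base.

```python
from transformers import AlbertModel, BertModel

albert = AlbertModel.from_pretrained("albert-base-v2")
bert = BertModel.from_pretrained("bert-base-uncased")

# ALBERT base: ~12M parameters; BERT base: ~110M parameters
print(f"ALBERT base: {albert.num_parameters():,}")
print(f"BERT base:   {bert.num_parameters():,}")
```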
Architecture
The data generated above is sent to the ALBERT model. The `pooled_output` generated from ALBERT is sent to a Dense layer with 300 neurons, which is finally sent to the output layer. I then trained it for 10 epochs.
Code Sample
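The original embedded gist is not reproduced here; below is a minimal sketch of this architecture in TensorFlow/Keras using `TFAlbertModel` from Hugging Face `transformers`. The ReLU activation, Adam learning rate, and `NUM_CLASSES` are illustrative assumptions, not values confirmed by the notebook.

```python
import tensorflow as tf
from transformers import TFAlbertModel

MAX_LEN = 12        # must match the value used during feature development
NUM_CLASSES = 5000  # hypothetical size of the one-hot label space

albert = TFAlbertModel.from_pretrained("albert-base-v2")

input_ids = tf.keras.layers.Input(shape=(MAX_LEN,), dtype=tf.int32, name="input_ids")
attention_mask = tf.keras.layers.Input(shape=(MAX_LEN,), dtype=tf.int32, name="attention_mask")
token_type_ids = tf.keras.layers.Input(shape=(MAX_LEN,), dtype=tf.int32, name="token_type_ids")

# pooled_output is ALBERT's sentence-level representation of the [CLS] token
pooled_output = albert(
    input_ids, attention_mask=attention_mask, token_type_ids=token_type_ids
).pooler_output

x = tf.keras.layers.Dense(300, activation="relu")(pooled_output)       # 300-neuron Dense layer
outputs = tf.keras.layers.Dense(NUM_CLASSES, activation="softmax")(x)  # one unit per candidate next word

model = tf.keras.Model([input_ids, attention_mask, token_type_ids], outputs)
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=2e-5),
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)

# model.fit([train_ids, train_mask, train_types], train_labels, epochs=10)
```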
GitHub Link: https://github.com/kurchi1205/Next-word-Prediction-using-Swiftkey-Data/blob/main/Al-BERT%20Model.ipynb
Test Results
Test Loss: 2.88
Test Accuracy: 0.18
The test loss is quite low, which is good, but when it comes to next word predictions, the results are not coherent.
Next Word Predictions
I suppose ALBERT is not suitable for next word prediction, so I had to move on to GPT for next word predictions. For details, please refer to the main article.