ދެން ބުނާނީ ކީކޭބާ؟ (Dhivehi: “What will be said next?”)

Predicting the next word in a sentence can be a challenge. One common approach is to take a text corpus, identify the most frequent n-grams (e.g. trigrams), and predict the word that most often follows a given context. The same task can be tackled, often with better results, using a machine learning model — in this case an RNN.
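As a toy illustration of the n-gram baseline (not from the original post), a minimal trigram-count predictor records, for every two-word context, which word follows it, and predicts the most frequent one:

```python
from collections import Counter, defaultdict

def train_trigram_model(corpus):
    """For every two-word context, count which word follows it."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.split()
        for a, b, c in zip(words, words[1:], words[2:]):
            counts[(a, b)][c] += 1
    return counts

def predict_next(counts, a, b):
    """Return the most frequent word seen after the context (a, b)."""
    if (a, b) not in counts:
        return None
    return counts[(a, b)].most_common(1)[0][0]

corpus = [
    "the cat sat on the mat",
    "the cat sat on the sofa",
    "the dog sat on the mat",
]
model = train_trigram_model(corpus)
print(predict_next(model, "sat", "on"))  # -> the
```

The obvious weakness, which motivates the RNN approach below, is that an unseen context yields no prediction at all.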

source: https://www.superdatascience.com/the-ultimate-guide-to-recurrent-neural-networks-rnn/

Hochreiter and Schmidhuber introduced Long Short-Term Memory networks, usually called “LSTMs”. An LSTM is a recurrent neural network (RNN) that is trained using backpropagation through time and overcomes the vanishing gradient problem. LSTMs are widely used for speech recognition, language modelling, sentiment analysis and text prediction.

RNNs support a number of different input/output combinations: one-to-one, one-to-many, many-to-one and many-to-many.

source: https://leonardoaraujosantos.gitbooks.io/artificial-inteligence/content/recurrent_neural_networks.html


A Quick Implementation of Keras LSTM TimeDistributed for Thaana Next Word Prediction.

For this I used 10,000 lines of Dhivehi news headlines as the training dataset.
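The post doesn’t include the preprocessing code; here is a minimal sketch of how each headline could be sliced into 3-word-context / next-word training pairs (an English placeholder is used instead of an actual Dhivehi headline, for readability):

```python
def build_sequences(headlines, context_len=3):
    """Slice each headline into (context words, next word) training pairs.

    Mirrors the model's setup: the input is the last `context_len` words
    and the target is the word that follows them.
    """
    pairs = []
    for line in headlines:
        words = line.split()
        for i in range(len(words) - context_len):
            pairs.append((words[i:i + context_len], words[i + context_len]))
    return pairs

# English placeholder instead of a real Dhivehi headline:
for context, target in build_sequences(["heavy rain expected in male today"]):
    print(context, "->", target)
# ['heavy', 'rain', 'expected'] -> in
# ['rain', 'expected', 'in'] -> male
# ['expected', 'in', 'male'] -> today
```

In practice each word would then be mapped to an integer index with a tokenizer before being fed to the network.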

If you want better accuracy, you can use a bigger dataset (data.txt). Since I don’t have that much computing power on my MacBook Pro, I used Google Colab to train the model on TPUs. However, because of the memory limit, I was not able to train on the full dataset.

The other challenge was that Colab only gives 12 free hours of processing per session, so I had to make sure the model could be trained in that time frame or less, meaning the dataset had to be small and the model needed to be optimised. If I ran the training locally (with the full dataset) on the MacBook Pro’s CPU, I guess it would take a few months or more.

The Model Architecture

What we have is basically a sequence learning problem. The model is a multi-layer LSTM in Keras with 4 LSTM layers. The input to the LSTM is the last 3 words, and the target is the next word. The output is a fully connected Dense layer wrapped in TimeDistributed, which applies the same Dense weights at every timestep. This wrapper lets the model support one-to-many and many-to-many architectures, because the output function for each of the “many” outputs is the same function applied at each timestep.

The final layer of the model uses a softmax activation, which outputs a probability for each word in the vocabulary.
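The exact hyperparameters aren’t given in the text; the following is a sketch of the architecture described above (4 stacked LSTM layers followed by a TimeDistributed Dense softmax), with assumed vocabulary size, embedding dimension, and hidden units:

```python
from tensorflow import keras
from tensorflow.keras import layers

VOCAB_SIZE = 5000  # assumed vocabulary size (not stated in the post)
SEQ_LEN = 3        # the last 3 words are the input
EMBED_DIM = 64     # assumed embedding dimension
UNITS = 256        # assumed LSTM units per layer

model = keras.Sequential([
    keras.Input(shape=(SEQ_LEN,)),
    layers.Embedding(VOCAB_SIZE, EMBED_DIM),
    layers.LSTM(UNITS, return_sequences=True),
    layers.LSTM(UNITS, return_sequences=True),
    layers.LSTM(UNITS, return_sequences=True),
    layers.LSTM(UNITS, return_sequences=True),
    # The same Dense softmax is applied at every timestep:
    layers.TimeDistributed(layers.Dense(VOCAB_SIZE, activation="softmax")),
])
model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")
```

With `return_sequences=True` throughout, the model emits a distribution over the vocabulary at each of the 3 timesteps, i.e. an output tensor of shape `(batch, 3, VOCAB_SIZE)`.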


Code — TPU Version
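The original code gist is not reproduced here. As a rough sketch (not the author’s exact code — the post’s Colab setup likely used the older, since-deprecated `keras_to_tpu_model` API), a model like the one above can be placed on a TPU with TF 2’s `tf.distribute` API, falling back to the default strategy when no TPU is attached:

```python
import tensorflow as tf

# Attach to a Colab TPU if one is available; otherwise fall back to the
# default (CPU/GPU) strategy so the same script still runs anywhere.
try:
    resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu="")
    tf.config.experimental_connect_to_cluster(resolver)
    tf.tpu.experimental.initialize_tpu_system(resolver)
    strategy = tf.distribute.TPUStrategy(resolver)
except Exception:  # no TPU attached
    strategy = tf.distribute.get_strategy()

# Building (and later fitting) the model inside the strategy scope places
# its variables on the TPU cores.
with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(3,)),
        tf.keras.layers.Embedding(5000, 64),
        tf.keras.layers.LSTM(256, return_sequences=True),
        tf.keras.layers.TimeDistributed(
            tf.keras.layers.Dense(5000, activation="softmax")),
    ])
    model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")
```

Calling `model.fit(...)` inside or after the scope then distributes the training step across the TPU cores.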

Next Steps

Explore alternate model architectures that allow training on a much larger dataset to produce better results. Some of these strategies are described in the paper “Strategies for Training Large Vocabulary Neural Language Models”: https://arxiv.org/abs/1512.04906

Converting the model to tensorflow.js for web deployment.

Further Reading

TimeDistributed in the Keras API

Understanding LSTM Networks

A Beginner’s Guide to LSTMs and Recurrent Neural Networks

HowTo Start Using TPUs From Google Colab in Few Simple Steps