Making an AI to Generate Text using LSTM Networks

Melanie Cheng
Bucknell AI & CogSci

--

By: David Schwartz, Dylan Zucker, Ryan Pasculano, Melanie Cheng

To explore how Natural Language Processing can be used to reflect human-level intelligence, we decided to build a text generator. An earlier class had introduced us to text generation with Markov chains, and we extended that work by building a neural network to generate text instead. Having been exposed to recurrent neural networks in our AI and Cognitive Science class, we chose a Long Short-Term Memory (LSTM) model for our network. LSTMs allow for selective memory propagation, which makes the model resemble the brain more closely than other neural network architectures do.

For this project, we wanted to build a neural network and train it on a particular author, so that the generated text would ideally mimic that author's word choice and sentence structure. We decided on the two best authors we could think of: J.K. Rowling and God. For training, we used sections of Harry Potter and the Deathly Hallows and The Bible. Since we hope to eventually create a fully functional chatbot, we figured a text generator would be a good first step.

We used Keras with the TensorFlow backend to implement our solution. Originally, we wanted to use a character-based model, but we realized it would be difficult to train well in the time allotted. A character-based model also has to learn grammar, and it can generate sequences of fake words. Character encoding does keep the network's output small, since the output vector only has to account for the 26 letters plus punctuation, but the grammar aspect adds a level of complexity that makes the middle layers much larger. One cool aspect of the character-based model is that you can watch the network learn to spell as training goes on. In the end, the size of the network made training take an inconvenient amount of time relative to our project's time frame, so we decided on a word-based model: every output is a correctly spelled word and training time is reduced. One-hot encoding gave us an exact mapping between words and numbers; each word becomes a single token, which shortens the input sequences, but the network now has to hit one specific entry in a much larger output vector.
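As a rough illustration of this word-level pipeline, here is a minimal sketch of how a corpus can be split into 50-word input sequences with one-hot targets using Keras. This is not our exact preprocessing code; the file name and variable names are placeholders.

```python
# A minimal sketch (not our exact preprocessing code) of word-level
# one-hot encoding with Keras. File name and sequence length are placeholders.
import numpy as np
from keras.preprocessing.text import Tokenizer
from keras.utils import to_categorical

SEQ_LEN = 50  # words of context used to predict the next word

raw_text = open("corpus.txt").read().lower()      # e.g. a section of The Bible
tokenizer = Tokenizer()                           # maps each word to an integer id
tokenizer.fit_on_texts([raw_text])
encoded = tokenizer.texts_to_sequences([raw_text])[0]
vocab_size = len(tokenizer.word_index) + 1        # +1 because ids start at 1

# Slide a window over the corpus: 50 words in, the 51st word out.
X, y = [], []
for i in range(SEQ_LEN, len(encoded)):
    X.append(encoded[i - SEQ_LEN:i])
    y.append(encoded[i])
X = np.array(X)
y = to_categorical(y, num_classes=vocab_size)     # one-hot target vector
```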

We tried several structures for our model, checking whether the output made sense and whether the model was overtrained. Our final model consisted of two LSTM layers followed by two Dense layers of 150 nodes each, trained with a batch size of 128 for 50 epochs. Here is an example of our model generating text based on The Bible. The model requires a 50-word seed sequence to start, so the opening words below are taken directly from The Bible.

And God said, Let there be a firmament in the mids and god said unto him i am the lord tribe and they perished against the lord god not not respect unto him and said i will not do a man and a man said unto me i am the people and i will be my fathers and i will be my servants and he said unto him i am the lord which i have given thee and i will go out of the land of egypt and i will give thee a covenant of the lord and i will be thine and the lord hath said unto me i am the lord and said unto him i am the lord and the lord said unto me i am the lord your god which i command thee this day and the lord said unto moses go not up and i will give thee the land of egypt and i will give thee the land of egypt and i will give thee the land of egypt and i will give thee the lord your god and i will be thine and thy sons
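For concreteness, here is a rough Keras sketch of a model with the shape described above. The Embedding layer, its dimension, and the LSTM unit sizes are assumptions on our part; only the 150-node Dense layers, the batch size of 128, the 50 epochs, and the 50-word input window come from the description. The final layer has to contain one node per vocabulary word so the softmax can pick the next word.

```python
# A rough sketch of a Keras model with the shape described above.
# The Embedding layer and the LSTM unit sizes are assumptions.
from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense

SEQ_LEN = 50          # length of the seed/context window
vocab_size = 5000     # placeholder; really len(tokenizer.word_index) + 1

model = Sequential()
model.add(Embedding(vocab_size, 100, input_length=SEQ_LEN))
model.add(LSTM(150, return_sequences=True))   # first LSTM feeds the second
model.add(LSTM(150))
model.add(Dense(150, activation="relu"))
model.add(Dense(vocab_size, activation="softmax"))  # probability of each next word

model.compile(loss="categorical_crossentropy", optimizer="adam",
              metrics=["accuracy"])
# model.fit(X, y, batch_size=128, epochs=50)  # X, y built as in the earlier sketch
```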

RNNs are useful when there are multiple inputs or outputs that depend on each other. The following picture shows four different model shapes. The first is a regular neural network with an input, a hidden layer, and an output.

Figure 1: Types of RNNs. Image from http://karpathy.github.io/2015/05/21/rnn-effectiveness/

The one-to-many model is good for generating multiple outputs from a single input, such as generating a sentence to describe a single image. The many-to-one model takes in multiple inputs and produces a single output, for example determining the sentiment of a sentence, where each word is its own input and the output is the overall sentiment. The many-to-many model, which we used in our project, takes multiple words as input and then generates multiple words as the network propagates (see the sketch below). RNNs also show up in everyday life: the predictive text feature on your phone learns which words you are most likely to use next and displays them at the top of your keyboard.
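As a loose illustration (the layer sizes here are arbitrary, not from our project), the many-to-one and many-to-many shapes in Figure 1 map onto Keras roughly as follows: an LSTM layer returns only its final state by default, and returns an output at every time step when return_sequences=True.

```python
# Illustrative only: how two of the shapes in Figure 1 look as Keras layers.
from keras.models import Sequential
from keras.layers import LSTM, Dense, TimeDistributed

# Many to one: a whole sequence goes in, a single vector comes out
# (e.g. the sentiment of a sentence).
many_to_one = Sequential([
    LSTM(64, input_shape=(50, 100)),      # returns only the final state
    Dense(1, activation="sigmoid"),
])

# Many to many: the layer emits an output at every time step
# (e.g. a prediction after each input word).
many_to_many = Sequential([
    LSTM(64, return_sequences=True, input_shape=(50, 100)),
    TimeDistributed(Dense(100, activation="softmax")),
])
```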

We used LSTM cells in our model. An LSTM cell has three gates: forget, input, and output. Each gate is controlled by the cell's current inputs as well as its previous activation, making it context dependent. The forget gate is depicted below:

Figure 2: Forget Gate. Image from http://colah.github.io/posts/2015-08-Understanding-LSTMs/

It determines what proportion of the previous context to remember and alters the memory accordingly. That proportion is computed from the cell's previous activation and the current input. The next gate is the input gate.

Figure 3: Input Gate. Image from http://colah.github.io/posts/2015-08-Understanding-LSTMs/

The input gate adds to the cell's current activation, strengthening or inhibiting the influence of the high-level information that the cell represents. The last gate is the output gate.

Figure 4: Output Gate. Image from http://colah.github.io/posts/2015-08-Understanding-LSTMs/

The output gate determines how much context the cell passes on to the rest of the network. It is important to note that all of the connections in the figures are weighted, so the gates behave differently even though they look the same. To summarize, an LSTM cell can forget previous context, add new context, and decide how much information to pass on.

Figure 5: LSTM in action! Gif from http://harinisuresh.com/2016/10/09/lstms/
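To make the three gates concrete, here is a minimal NumPy sketch of a single LSTM cell update, following the standard formulation in the Colah post linked above; the weight layout and variable names are our own.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step. W maps [h_prev, x_t] to the four internal signals;
    b is the matching bias. Shapes: x_t (n_in,), h_prev and c_prev (n,),
    W (4*n, n + n_in), b (4*n,)."""
    z = W @ np.concatenate([h_prev, x_t]) + b
    n = len(h_prev)
    f = sigmoid(z[0*n:1*n])          # forget gate: how much old context to keep
    i = sigmoid(z[1*n:2*n])          # input gate: how much new context to add
    o = sigmoid(z[2*n:3*n])          # output gate: how much context to expose
    c_tilde = np.tanh(z[3*n:4*n])    # candidate new context
    c_t = f * c_prev + i * c_tilde   # updated cell memory
    h_t = o * np.tanh(c_t)           # activation passed to the rest of the network
    return h_t, c_t
```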

We used LSTM cells to handle long-term dependencies. Our model has to generate text, but the last word in a sentence doesn't always determine the next one. For example, consider "I am an athlete so I play sports." Looking only at "I play," a variety of words could come next, such as "video games" or "chess." The presence of the word "athlete" implies that a sport, or a specific sport, should be mentioned. Using LSTM cells allows the network to keep track of which words provide important contextual information in a sentence.

It is currently up for debate who should be credited as the author of the generated writing: the network itself, its developers, or the writers who produced the training data. Our project illustrates the issue. The first network we developed learned how to write by reading Harry Potter novels, picking up both the content of the books and J.K. Rowling's writing style. As such, the network sounds like Rowling and uses her characters. Rowling was the only person who invented the content used to create the new text, so one could claim she should receive credit as the author. At the same time, she had no direct involvement in producing the generated text; we were the only people who put in the effort to generate it, so perhaps we should be the authors, much as another writer can be inspired by Rowling's work without ceding credit to her. However, that argument doesn't address whether the model itself should be credited as the author.

Whether the network can be credited as an author comes down to whether it is viewed as a program or as an independent being. If it is considered just a computer program, then it is no more than a tool constructed by the developers to produce the text, and credit goes to the developers. On the other hand, if the AI is viewed as its own being, then it should receive the same creative rights as a person and be considered the author. The network we developed is very narrow in scope and cannot make independent decisions, so it is not the author. The question of whether the author is Rowling, us, or both doesn't have a concrete answer and is left for the reader to decide.

One issue we ran into while developing the model was that we couldn't control what was written. The network could start in the middle of a sentence and simply continue generating from there. One way we considered addressing this was to provide a start word to the network and then let it run through the generation loop (a sketch of that loop follows below). Recurrent neural networks also function differently from traditional neural networks because LSTM models contain loops by nature. This increases the size of the overall model, and because the input is a sequence rather than a single value, more computational power is needed to train these networks effectively. Overall, our text generator is one step in the right direction toward building a chatbot. We learned a lot about RNNs, how their structure affects the overall accuracy of the model, and current research about memory.
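For reference, the seeded generation loop described above might look roughly like this, reusing the tokenizer, model, and SEQ_LEN from the earlier sketches. Greedy argmax selection of the next word is an assumption here; sampling from the softmax distribution is another common choice.

```python
# A rough sketch of seeded generation, reusing tokenizer, model, and SEQ_LEN
# from the earlier sketches. Greedy argmax sampling is one choice of many.
import numpy as np
from keras.preprocessing.sequence import pad_sequences

def generate(seed_text, n_words):
    """Generate n_words of text starting from a seed string."""
    id_to_word = {idx: w for w, idx in tokenizer.word_index.items()}
    words = seed_text.lower().split()
    for _ in range(n_words):
        # Encode the most recent SEQ_LEN words; pad if the seed is short.
        encoded = tokenizer.texts_to_sequences([" ".join(words[-SEQ_LEN:])])[0]
        encoded = pad_sequences([encoded], maxlen=SEQ_LEN)
        probs = model.predict(encoded)[0]     # distribution over the vocabulary
        next_id = int(np.argmax(probs))       # greedy: take the most likely word
        words.append(id_to_word.get(next_id, ""))
    return " ".join(words)
```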

Feel free to explore our code at: https://gitlab.bucknell.edu/mdc025/AICogSciFinalProject

Disclaimer: This article may or may not have been written by a neural network. (Who really owns this article?)
