It’s Alive! Rescuing Dead Text from those Pesky “Annals of History”
By quickly generating poetry with textgenrnn
More than once, I have waxed philosophic about a future in which it is regular practice for a human and machine to work together to produce writing. I tend to daydream of a human writer tinkering in a shop with several of their mini-robot companions. A mix of Ada Lovelace and Mary Shelley, with a dash of Gepetto, writing and crafting and generating in an alternate universe.
For now, I’ll settle for customizable recurrent neural networks in my (fingers-crossed) disease-free, isolated office. On my not-exactly-cute laptop, created in the omnipresent, invisible cloud.
Simplistic background definitions
For those tl;dr’ers (pronounced tulldurrers) out there, skip to the Do it yourself section.
Recurrent Neural Networks (RNNs)
Recurrent neural networks (RNNs) are a type of neural network that works very well for processing language. The crucial difference between traditional neural networks and RNNs is that vanilla neural networks treat inputs independently, whereas RNNs treat inputs sequentially. That is to say, what an RNN learns from one input will affect what it learns from the next input. Or… an RNN will use its output from each input to help it train on each additional input. It sounds circular because it basically is.
And this makes it ideal for natural language processing tasks because language is indeed sequential. Certain words are more likely to follow other words and certain characters are more likely to follow other characters. An RNN can discover these relationships and probabilities of usage because it learns every step of the way, remembers what it learned before, and utilizes that memory.
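That "remembers what it learned before" loop can be sketched in a few lines. This is a toy vanilla RNN step (random weights, made-up sizes), not anything from a real library — just to show that each step's output feeds into the next:

```python
import numpy as np

# A minimal sketch of one recurrent step, assuming a vanilla RNN cell:
# the new hidden state depends on the current input AND the previous
# hidden state, so earlier inputs influence everything that follows.
rng = np.random.default_rng(0)
hidden_size, input_size = 4, 3
W_xh = rng.normal(size=(hidden_size, input_size)) * 0.1   # input weights
W_hh = rng.normal(size=(hidden_size, hidden_size)) * 0.1  # recurrent weights
b_h = np.zeros(hidden_size)

def rnn_step(x, h_prev):
    """One step: mix the current input with the carried-over memory."""
    return np.tanh(W_xh @ x + W_hh @ h_prev + b_h)

h = np.zeros(hidden_size)                    # start with empty memory
for x in rng.normal(size=(5, input_size)):   # a five-step "sentence"
    h = rnn_step(x, h)                       # each step reuses the last output
```

Note the circularity in the loop: the `h` that comes out of one step goes straight back in as `h_prev` for the next.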
Long short-term memory (LSTM)
In theory, this sequential learning and memory could happen over all of the inputs, but traditional RNNs tend to run into issues known as vanishing and exploding gradients. The exploding case occurs when an RNN learns so much that it literally blows its own mind. Okay, maybe not literally. But basically: over long sequences, error gradients either shrink away to nothing or pile up until the model becomes unstable, and either way the network struggles to connect what it’s seeing now to what it saw many steps ago. But there’s a solution!
LSTMs are a type of RNN that are able to hold information without it constantly affecting everything. The memory within an LSTM is sort of held off to the side and the model can selectively forget and remember what it’s previously learned, based on what it perceives to be important.
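The "selectively forget and remember" idea comes from the LSTM's gates. Here is a toy, numpy-only LSTM step (illustrative shapes and random weights, not textgenrnn's actual implementation) showing the cell state held off to the side:

```python
import numpy as np

def sigmoid(z):
    """Squash values into (0, 1) so gates act like soft on/off switches."""
    return 1.0 / (1.0 + np.exp(-z))

# Toy weights: one matrix per gate, acting on [previous hidden state, input].
rng = np.random.default_rng(1)
hidden_size, input_size = 4, 3
W_f, W_i, W_o, W_c = (rng.normal(size=(hidden_size, hidden_size + input_size)) * 0.1
                      for _ in range(4))

def lstm_step(x, h_prev, c_prev):
    z = np.concatenate([h_prev, x])
    f = sigmoid(W_f @ z)                   # forget gate: keep or drop old memory
    i = sigmoid(W_i @ z)                   # input gate: how much new info to write
    o = sigmoid(W_o @ z)                   # output gate: how much memory to reveal
    c = f * c_prev + i * np.tanh(W_c @ z)  # cell state: the memory "off to the side"
    h = o * np.tanh(c)                     # output is a gated view of that memory
    return h, c

h, c = np.zeros(hidden_size), np.zeros(hidden_size)
for x in rng.normal(size=(5, input_size)):
    h, c = lstm_step(x, h, c)
```

The cell state `c` is updated additively and only peeked at through the output gate, which is what keeps memory from constantly being overwritten at every step.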
But enough of trying to simplify complex topics, let’s get to the fun stuff!
textgenrnn
RNNs have been widely used to predict and generate text, but they can be difficult to set up, and tuning them can be a bit opaque. Luckily, some people in this world are very nice and willing to make it easy for you at zero cost.
Introducing Max Woolf’s textgenrnn: a simple, usable, and re-trainable RNN that can start generating text based off of a source text of your choosing with only a few keystrokes and a bit of time. It’s built on top of Keras and TensorFlow and uses character embeddings to generate text.
Character embeddings
That’s right! You’ve heard of word embeddings and Word2Vec. But did you know that you can apply the same logic to individual characters? I sure didn’t.
textgenrnn creates character embeddings, or high-dimensional (it defaults to 100-dimensional) vectors, and then feeds those vectors into multiple LSTM layers. And yadda-yadda-Attention-Layer-yadda, out comes a probability for each character in the vocabulary of being the next one in the sequence. It uses these probabilities to generate text that is similar to its source, in both formatting and spirit.
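The embedding part is simpler than it sounds. Here is a toy version (random vectors, a made-up mini-vocabulary — the real ones are learned during training) using textgenrnn's default of 100 dimensions:

```python
import numpy as np

# Sketch of character embeddings: every character in the vocabulary gets
# its own 100-dimensional vector. Here they're random; in textgenrnn
# they're learned alongside the rest of the model.
vocab = sorted(set("hello world"))
char_to_idx = {ch: i for i, ch in enumerate(vocab)}
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(len(vocab), 100))  # one row per character

def embed(text):
    """Look up the vector for each character in `text`."""
    return embeddings[[char_to_idx[ch] for ch in text]]

vectors = embed("hello")  # five characters in, five 100-dim vectors out
```

Those per-character vectors are what the LSTM layers actually consume — the model never sees raw letters, only their vector stand-ins.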
Word-level model
It is also possible to train a “word-level model”, although I’m not sure how exactly it differs from the default setting. It’s possible that it tokenizes the source text into words, but at first glance, it appears to simply use a smaller window size (which corresponds to the max_length setting in the model). For example, a max_length of 20–40 is recommended for the character-level model, whereas a max_length of 5–10 is recommended for the word-level model.
A corpus of sufficient size is necessary for this. I tried using a smaller corpus of my own writing and was met with a lot of white space and repetition, which honestly, might say more about my writing than it does about the model.
Do it yourself
So here’s where those simple keystrokes (or clicks) come in. If you’ve used Google Colab before this will be easy as pie. If you’ve used Jupyter Notebooks before, this will be easy as a different pie. And if you’re new to it all, this will be as easy as, you guessed it, ice cream cake. Not that much harder than pie. In fact, quite soft. And delectable.
Navigate to this page (you may have to click Open in Google Colab or something similar), click File and then Save a copy in Drive, then rename the file if you’d like (what monster wouldn’t want to rename the file??).
Cell #1
The first cell is:
%tensorflow_version 1.x
For you Jupyter die-hards, click in there and press Shift+Enter to run the cell. Or you can click the Play button. This cell just makes sure you don’t use TensorFlow 2, which can apparently cause some issues.
Cell #2
Run the next cell to download and import necessary packages. Don’t worry about any TensorFlow warning messages.
Cell #3
The next cell is where you can set parameters. If this is your first time running it, I’d recommend just keeping it how it is. For a second go, I would recommend trying out a word-level model to compare with the default character-level model.
To train a word-level model, set 'word_level': True and 'max_length': 10, and depending on the number of unique words in your corpus, I would consider increasing the value for 'max_words'.
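Put together, the word-level tweaks to the settings cell look something like this. (These key names follow textgenrnn’s config options, but treat this as a hedged excerpt — the notebook’s own comments are the authoritative list, and the other keys in the cell stay at their defaults.)

```python
# Hypothetical excerpt of the Colab settings cell for a word-level run.
model_cfg = {
    'word_level': True,   # tokenize on words instead of characters
    'max_length': 10,     # shorter window recommended for word-level models
    'max_words': 10000,   # vocabulary cap; consider raising for large corpora
}
```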
Everything else can stay the same, though there is a lot to play around with in subsequent runs, and the code is nicely commented, so you can read those for guidance.
Cell #4 (and upload a txt file!)
Here’s where you can adjust the name of your model, as well as point to the txt file you uploaded.
The file you upload is very important and should be formatted how you’d want your ideal generated text to look. If you want to generate scripts, upload a text file of one or more scripts that are formatted to look like scripts; if you want to generate poetry, make sure your text file is formatted like poetry — line breaks, titles, and all. The model is remarkably good at copying your source’s style.
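For poetry, that might mean a file shaped something like this (the file name and poems here are placeholders, written out in Python just so the shape is concrete):

```python
# Hedged sketch of preparing a poetry corpus file ('poems.txt' is a
# placeholder name): titles and line breaks are preserved, because the
# model imitates whatever structure the source shows it.
sample = """The Placeholder Poem

Lines break where the poet broke them,
titles sit on lines of their own,

Another Placeholder

because the model copies whatever
structure it sees in the source.
"""
with open('poems.txt', 'w') as f:
    f.write(sample)
```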
To actually upload a file, go allllll the way to the left of the page and click the little folder icon, then drag the file from your local computer into that Files window. After it uploads, point to it by setting:
file_name = 'your_files_name.txt'
Run the cell!
Cell #5
This is the big one, so you want to make sure that you are connected to a Google Colab GPU. If you see the following in the upper right corner of your window, then you’re good:
If not, it should say Connect in that spot. Click that and it should change to the above image.
Buckle up, and run the cell! It will take a lot less time than running all this on your local computer, and depending on the size of the file you uploaded, it should take less than a half hour. You will see the training’s progress, so if it seems like it’s taking an unusually long time, you can always stop it, fiddle with the parameters or make sure you’re using the GPU, and run it again.
For me, I used roughly 4,300 poems of varying lengths, which took over two hours on my local computer and less than a half hour in the Google Colab. If you start increasing the parameters’ values, training time will most likely increase, but it’s always much more reasonable in the Colab notebook than on your local machine.
Warning!
Dang, should’ve saved that gif…
Word to the wise: Google Colab works through resource-sharing, which means there is always the possibility that your notebook (and the training of your model) can get interrupted if resources get allocated elsewhere. In my experience, this doesn’t happen often. But it’s always a possibility, so keep it in mind without worrying about it too much.
Cell #6
Nothing to change here for the first go round. You can generate very different results by changing the values in the temperature variable. In my experience, values of 1.0 produce the most interesting results, but it always depends on your use case. Values closer to 0 may tend to mimic your source text more closely or even theoretically reproduce it.
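You can see why low temperatures play it safe with a toy calculation (made-up scores, not the real model’s output): dividing the scores by the temperature before the softmax sharpens the distribution near 0 and flattens it at higher values.

```python
import numpy as np

scores = np.array([2.0, 1.0, 0.2])  # pretend next-character scores

def softmax_with_temperature(scores, temperature):
    """Temperature-scaled softmax: low temp = near-greedy, high temp = adventurous."""
    z = scores / temperature
    z = z - z.max()          # subtract max for numerical stability
    p = np.exp(z)
    return p / p.sum()

cautious = softmax_with_temperature(scores, 0.2)  # top score dominates
creative = softmax_with_temperature(scores, 1.0)  # flatter, more surprises
```

At temperature 0.2 the top-scoring character soaks up nearly all the probability, which is why low values drift toward parroting the source.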
In short, this cell generates text and saves it to a file. Run it!
Cell #7
No need to change anything here. This is where you save the actual model itself. Run it!
Using your new model
To use your model on your local machine, and continue generating text, move the files you downloaded in the previous cell into the same folder as a Jupyter notebook, and run the following code (be sure to swap out MODEL_NAME):
from textgenrnn import textgenrnn

textgen = textgenrnn(weights_path='MODEL_NAME_weights.hdf5',
                     vocab_path='MODEL_NAME_vocab.json',
                     config_path='MODEL_NAME_config.json')

textgen.generate_samples(max_gen_length=1000)
textgen.generate_to_file('textgenrnn_texts.txt', max_gen_length=1000)
Keep experimenting and have fun!
BONUS: Here are some choice generations.