Using Machine Learning to (Poorly) Write Video Scripts

Jen Person
7 min read · Sep 15, 2022

All it took was an LSTM text generator on TensorFlow.js.

Machine learning has so many real-world applications. From protecting the Great Barrier Reef, to optical character recognition, to MRI brain segmentation, the possibilities are just about endless! I am grateful to be in a role that allows me to show developers how they can use machine learning for the web to build amazing things. With all that being said, I also get to build fun projects that would be completely pointless if they didn’t help me showcase machine learning. This is one such project. I had a blast and learned a lot, and I hope by the end of this post, you will too.

TECH WOMAN Gets A Stupid IDEA, What Happens Is Shocking

There is something delightful about generative models. I love the results of generating images using GAN technologies. For example, Laurence Moroney’s Stargate script is super inspiring. I decided to try my hand at generating my own text using TensorFlow.js. With TensorFlow.js, you can run existing JavaScript models, convert TensorFlow and TensorFlow Lite models to run in the browser or backend, retrain existing models with your own data, and train your own models.

TensorFlow.js has an amazing LSTM text generation example on GitHub using Node.js. LSTM stands for long short-term memory. It is a type of neural network designed for sequence prediction. It can be used for prediction or generation, and is often used to generate text, as in the TensorFlow.js example linked above. The original TensorFlow.js example started with a list of four text sources from which you could train. Thanks to a codebase contribution from Clemens Wolff, you can now supply your own text to the generator.

So what great works would I use to train my model? There are seemingly endless possibilities for choosing an accomplished author with a body of work large enough to train my model. But of course, as the title of this post suggests, I decided to try to generate a Dhar Mann script. Well, that’s not entirely true. First, I decided to try to generate some Dhar Mann video titles. Here are some examples of real video titles:

Like most content on YouTube, the titles are designed to grab your attention. I won’t go into further detail about Dhar Mann’s videos here, so check them out for yourself and see if you agree that a title generator would be cool. And if you like YouTube commentary, Jarvis Johnson does a fun streaming series called Dhar Mann Bingo.

Long story short, I tried to train a model to generate YouTube video titles. Here’s how that went:

A Developer Relations Engineer TRIES PLAN A, She Instantly Regrets It

First, I had to scrape all of the titles for Dhar Mann’s videos. I used the YouTube Data API for the task and to be honest, I used such a stupid method to get the titles from the data that I don’t want to share it. I’m certain that you could come up with a better way to do it, and if I tell you how I did it, the stupidity of my idea is so potent, it might just dissolve your better knowledge of how to do it.
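For anyone who wants a starting point anyway, here is a minimal sketch of one way to turn API responses into a text file of titles. It is not the method used for this post. It assumes the pages come from the YouTube Data API's playlistItems.list endpoint, where each entry in items carries its title at snippet.title. The extractTitles helper and the placeholder titles are made up for illustration, and writing one title per line is an assumption about how you would want the corpus laid out.

```javascript
// Hypothetical helper: collects video titles from one or more response pages.
// Assumes each page looks like a playlistItems.list response, where every
// entry in `items` has its title at `snippet.title`.
function extractTitles(pages) {
  const titles = [];
  for (const page of pages) {
    for (const item of page.items ?? []) {
      const title = item.snippet?.title?.trim();
      if (title) titles.push(title); // skip missing or empty titles
    }
  }
  // One title per line (an assumed layout for the training text file).
  return titles.join('\n');
}

// Placeholder pages standing in for real API responses.
const mockPages = [
  { items: [{ snippet: { title: ' Placeholder Title One ' } }, { snippet: {} }] },
  { items: [{ snippet: { title: 'Placeholder Title Two' } }] },
];
const titleCorpus = extractTitles(mockPages);
console.log(titleCorpus);
```

Fetching the pages themselves (API key, paging with nextPageToken) is left out on purpose so the sketch stays runnable without network access.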

Once I had all of my titles assembled in a single text file, I got to training. The LSTM model example has a couple of commands you can run to get going. For example:

yarn train dharmann-titles.txt --lstmLayerSize 128,128 --epochs 120 --savePath ./dharmann-titles-model

The above command trains a model using the data in the dharmann-titles.txt file. In addition to the lstmLayerSize and epochs parameters, you can also specify the following:

  • sampleLen: the length of the input sequence to the model, in number of characters.
  • sampleStep: how many characters to skip between one example extracted from the text data to the next.
  • learningRate: Learning rate to be used during training.
  • examplesPerEpoch: Number of examples to sample from the text in each training epoch.
  • batchSize: Batch size for training.
  • validationSplit: Validation split for training.
  • displayLength: Length of the sampled text to display after each epoch of training.
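To make sampleLen and sampleStep a little more concrete, here is a rough sketch of how a text can be cut into training examples: each example is a window of sampleLen characters, the character right after the window is what the model should predict, and the window moves forward sampleStep characters at a time. The sliceExamples helper is hypothetical and is not the example's actual data code, which may differ in its details (for instance, it samples examples per epoch rather than listing them all).

```javascript
// Hypothetical helper, not part of the tfjs-examples code: slices text into
// (input, target) pairs the way sampleLen and sampleStep are described.
function sliceExamples(text, sampleLen, sampleStep) {
  const examples = [];
  // Stop while there is still one character left to serve as the target.
  for (let start = 0; start + sampleLen < text.length; start += sampleStep) {
    examples.push({
      input: text.slice(start, start + sampleLen), // sampleLen characters
      target: text[start + sampleLen],             // the next character
    });
  }
  return examples;
}

const examples = sliceExamples('Boy gets a stupid idea', 8, 4);
console.log(examples.length); // 4
console.log(examples[0]);     // { input: 'Boy gets', target: ' ' }
```

A smaller sampleStep produces more, heavily overlapping examples from the same text, while a larger sampleLen gives the model more context per example.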

As you can see, there are a number of variables that you can manipulate to customize your model training. At first, my loss was a little higher than I would have wanted. I ended up with results like this:

Actually, some of these aren’t half bad!

I found that by changing the learning rate from a default of 0.01 to 0.001 while lowering the batch size, I was able to get a model whose loss dropped below a tenth of a percent! Great, right?

Well, you already know that it wasn’t great since I ended up pivoting to a different text base. I found that any text generated from my model directly matched text found in the sample on which it was trained, even when I set the temperature to 0.99. In this context, temperature is the value of randomness introduced to the text generation, where the value is between 0 and 1. A higher value creates results that deviate further from the original text corpus. Here is an example that my new model produced:

I don’t know what happened with that first generated result. Wow, it’s wild.
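If the temperature setting feels abstract, this sketch shows one common way it is applied when picking the next character: the model's scores are divided by the temperature before being turned into probabilities, so a low temperature sharpens the distribution toward the top choice and a higher one flattens it. The function names and the scores below are made up for illustration, and the example's own sampling code may differ in its details.

```javascript
// Turns raw model scores (logits) into probabilities, scaled by temperature.
function softmaxWithTemperature(logits, temperature) {
  const scaled = logits.map((x) => x / temperature);
  const max = Math.max(...scaled); // subtract the max for numerical stability
  const exps = scaled.map((x) => Math.exp(x - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

// Draws one index according to the probabilities. `random` is injectable so
// the function can be exercised deterministically.
function sampleIndex(probs, random = Math.random) {
  const r = random();
  let cumulative = 0;
  for (let i = 0; i < probs.length; i++) {
    cumulative += probs[i];
    if (r < cumulative) return i;
  }
  return probs.length - 1; // guard against floating-point rounding
}

const logits = [2.0, 1.0, 0.1]; // made-up scores for three candidate characters
const cool = softmaxWithTemperature(logits, 0.25);
const warm = softmaxWithTemperature(logits, 0.99);
console.log(cool.map((p) => p.toFixed(3)));
console.log(warm.map((p) => p.toFixed(3)));
console.log(sampleIndex(warm));
```

With the lower temperature, nearly all of the probability lands on the highest-scoring character, which is why low-temperature output tends to look more repetitive and high-temperature output more chaotic.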

With the exception of the first line, the remaining titles are exactly how they appear in the text. I thought perhaps the issue was that the seed text used to generate new text was too long and was therefore leading to a particular result. I made some ugly modifications to the example code so I could set the seed text to simply “Boy”, thinking it’s a word that shows up in enough titles to possibly generate something more random. It was random, alright!

I couldn’t understand why my model generated either the exact existing text, or complete gibberish. Thankfully, I was able to talk to the creator of the code sample: Google engineer and Deep Learning with JavaScript co-author Shanqing Cai. While Shanqing has since moved to a different team at Google, he was nice enough to meet with me and help with my roadblocks. I think it’s really awesome that the person who literally wrote the book on TensorFlow.js was willing to patiently listen to my queries and respond at my level of understanding when I no doubt sounded like a true novice.

Shanqing concluded that my model was likely overfit. At the time of writing, I had 651 video titles in my text corpus. That isn’t nearly enough for the model to properly train. The good news is that new videos come out every week, so perhaps in a couple of years I can revisit the original project. I can’t wait to compile all the video titles again.
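One rough way to spot this kind of memorization is to count how many generated lines already exist word for word in the training text. The sketch below is only an illustration: fractionMemorized is a hypothetical helper, the sample lines are placeholders rather than real titles, and an exact-match check like this will miss near-copies.

```javascript
// Hypothetical check: what fraction of generated lines appear verbatim
// (after trimming and lowercasing) in the training corpus?
function fractionMemorized(generatedLines, corpusLines) {
  const normalize = (line) => line.trim().toLowerCase();
  const corpus = new Set(corpusLines.map(normalize));
  const candidates = generatedLines.map(normalize).filter((l) => l.length > 0);
  if (candidates.length === 0) return 0; // nothing to compare
  const hits = candidates.filter((l) => corpus.has(l)).length;
  return hits / candidates.length;
}

// Placeholder lines standing in for a corpus and for generated output.
const corpusLines = ['placeholder title a', 'placeholder title b'];
const generatedLines = ['Placeholder Title A ', 'placeholder title b', 'something new'];
const memorized = fractionMemorized(generatedLines, corpusLines);
console.log(memorized);
```

A value close to 1 at a high temperature is a hint that the model is reciting its training text rather than generalizing from it.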

Once I concluded my text sample was too small, I pivoted my idea to a larger body of text: Dhar Mann video scripts.

COMPUTER Lady Tries ANOTHER IDEA. What Happens Next Will Shock You

For Plan B, I first had to gather some scripts from Dhar Mann’s videos. Unsurprisingly, I did this in another stupid way and I’m not going to explain it because it would probably give you a headache. I did learn firsthand that you can expect the data preparation step to take longer than any other part of model training. For this LSTM text generation or any other model, it is unlikely that the data will be ready to process out of the box. Plan on spending a lot of time reformatting and validating your data.
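As an illustration of what that reformatting can involve, here is a small sketch of a cleanup pass over a raw transcript. It is not the process used for this post. The cleanTranscript helper is hypothetical, and the specific rules (dropping bracketed cues like [Music] and timestamps like 0:15) are assumptions about what a raw transcript might contain, so adjust them to whatever your own data looks like.

```javascript
// Hypothetical cleanup for a raw transcript. The rules below are assumptions
// about the input, not a description of any particular transcript format.
function cleanTranscript(raw) {
  return raw
    .replace(/\r\n?/g, '\n')                        // normalize line endings
    .replace(/\[[^\]\n]*\]/g, ' ')                  // drop cues such as [Music]
    .replace(/\b\d{1,2}:\d{2}(?::\d{2})?\b/g, ' ')  // drop 0:15 or 1:02:03
    .split('\n')
    .map((line) => line.replace(/\s+/g, ' ').trim()) // collapse whitespace
    .filter((line) => line.length > 0)               // remove empty lines
    .join('\n');
}

const rawTranscript = '0:15 Hello   there [Music]\r\n\r\n  1:02:03 How are you? ';
const cleaned = cleanTranscript(rawTranscript);
console.log(cleaned);
```

It is worth reading through the cleaned output by eye before training, since a rule that is too greedy can quietly delete real dialogue.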

After gathering several transcripts and formatting their text, I was ready to train again. It took several runs to find the correct parameters to optimize my model. I ended up with so many models that I numbered them by the date. And then I was numbering them by the date and the count of that model for the date.

That JEN PERSON Maybe SORT OF Figures It Out. She Doesn’t Regret It.

Finally, after much tweaking, I ended up with a model that I decided was satisfactory. If you would like to try my model yourself, you can clone the tfjs-examples GitHub repository and download my model from its repository. Add the model to the lstm-text-generation folder, and then run the following command:

yarn gen dharmann-scripts.txt ./dharman-scripts-model/model.json --genLength 250 --temperature 0.6

Then you too can have results like these:

For some reason it’s funny to me that the word sushi made it in there.

As you can see, this model alone doesn’t write a cohesive script, but it did give me a chance to find out what goes into training an LSTM text generator. I’d love to see what body of text you would use for text generation. Please try out the example for yourself using the text of your choice, and then share your results with me.

Reader Thinks PROJECT IS POINTLESS, What Happens Next Is Shocking

Obviously, my generated script isn’t going to win an Oscar. But I still count this project as a success because I made something that brought me delight. If you’re new to the world of web ML, I recommend you also create a pointless project for the sake of enjoyment. Here are some great resources to get you started on your own passion project:

Be sure to share your projects with me in the comments or on Twitter. I can’t wait to see what you build!

Jen Person

Developer Relations Engineer for Google Cloud. Pun connoisseur.