Building Your Own AI Stand-Up Comedian
How to Train a Character-Level RNN to Tell Jokes in the Style of Different Stand-Up Comedians
(Expletive Warning: Because the neural network was trained on transcripts from stand-up comedians, it outputs some expletives)
(Co-written with Sam Greene)
(Image from Vector Stock)
Text generation with neural networks is very widely used. Most of the time when you’re interacting with a chatbot, that bot is using a neural network trained to generate responses based on your inputs. As a fun experiment, we wanted to see if we could train a neural network to be funny and resemble some of our favorite stand-up comedians. We wanted to be able to input the beginning of a sentence and have the “robot comedians” fill it out. To accomplish this, we went through the following steps.
Data Gathering and Cleaning
When training neural networks for text generation, it is extremely important to have a lot of data so that the model can generalize well and learn from the inputs. So the first step was gathering data for our network. Using the Python package BeautifulSoup, we scraped the full transcripts of stand-up performances by seven comedians from scrapsfromtheloft.com. The seven comedians were: Ali Wong, Bill Burr, Dave Chappelle, Hasan Minhaj, Jim Jeffries, Joe Rogan, and John Mulaney. Each transcript consisted of roughly 40,000–50,000 characters, which is on the smaller side for this type of task, but enough to produce reasonable results.
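For reference, the scraping step looks roughly like the sketch below. The URL and the HTML tags are illustrative assumptions, since the exact selectors depend on how scrapsfromtheloft.com lays out its transcript pages.

import requests
from bs4 import BeautifulSoup

def get_transcript(url):
    # fetch a transcript page and join all paragraph text into one long string
    page = requests.get(url)
    soup = BeautifulSoup(page.text, 'html.parser')
    paragraphs = soup.find_all('p')  # assumed tag; depends on the site's actual layout
    return ' '.join(p.get_text() for p in paragraphs)

# hypothetical usage — the exact transcript URLs are placeholders
# ali_text = get_transcript('https://scrapsfromtheloft.com/.../ali-wong-transcript/')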
In terms of data cleaning and processing, we experimented with both cleaned and uncleaned datasets. The cleaned data consisted of the transcripts with all punctuation, numbers, and special characters removed and the entire text lowercased, while the uncleaned data left the transcripts completely untouched. We actually found that the model produced better results with the uncleaned data, because it was able to learn (to a certain extent) where punctuation goes, when to use capital letters, etc.
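The cleaning step can be done with a small helper like the one below; the function name and regular expressions are our own sketch rather than the exact code we used.

import re

def clean_transcript(text):
    # lowercase the text and strip punctuation, numbers, and special characters
    text = text.lower()
    text = re.sub(r'[^a-z\s]', ' ', text)     # keep only letters and whitespace
    return re.sub(r'\s+', ' ', text).strip()  # collapse the extra whitespace left behind

# cleaned vs. untouched versions (the untouched one ended up working better)
# ali_clean = clean_transcript(ali_text)
# ali_unclean = ali_text

The next step was to build and train our neural network.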
Modeling — TextGenRnn
To accomplish our goal of getting our AI to tell jokes in the style of different stand-up comedians, we had to build a character-level recurrent neural network (RNN). We decided to use an RNN for this task because of the model’s ability to handle sequential data. In this case, the inputs to the model were sequences of 40 characters, and the prediction target was the 41st character. So for each comedian, their model was trained on all sequences of 40 characters from their transcripts. Below is an example of two model inputs for the Ali Wong model:
Each unique character is also encoded as a digit so that the neural network can handle it. RNNs are often used for predicting this type of data because of something called “sequential memory”. Essentially, this sequential memory lives in the network’s hidden state, which stores information from past inputs and uses that information to make the prediction for the next input. Thus, the network is continually learning from past inputs and can make predictions over a sequence. Below is an illustration of how an RNN is set up compared to a more traditional feed-forward network.
(Image from AI Wiki)
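To make the sequence setup and character encoding concrete, here is a minimal sketch of how the 40-character inputs and 41st-character targets can be built. The sample text and variable names are placeholders, not actual transcript content.

# build (input, target) pairs: every 40-character window predicts the 41st character
text = 'sample transcript text goes here, repeated a few times so there is something to slide a window over'
max_length = 40

# map each unique character to an integer so the network can consume it
chars = sorted(set(text))
char_to_idx = {c: i for i, c in enumerate(chars)}

inputs, targets = [], []
for i in range(len(text) - max_length):
    seq = text[i:i + max_length]        # 40-character input sequence
    next_char = text[i + max_length]    # the 41st character is the prediction target
    inputs.append([char_to_idx[c] for c in seq])
    targets.append(char_to_idx[next_char])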
To build this RNN we used a package called textgenrnn, which lets you easily build, customize, and train RNNs on any text, and which also has built-in functions for text generation. Using textgenrnn, we trained a character-level RNN, with an embedding layer, stacked LSTM layers, a concatenation layer, and an attention layer, on each comedian’s transcripts. The model summary and the code used to generate it are below (Joe Rogan model):
from textgenrnn import textgenrnn

# create a textgenrnn model for Joe Rogan's transcripts
textgen_joe = textgenrnn(name='joerogan_model')

# max_length sets the 40-character input window; rnn_size and rnn_layers control
# the LSTM stack; num_epochs is the training length, and gen_epochs controls how
# often sample text is printed during training
textgen_joe.train_on_texts(joe_df_unclean,
                           rnn_bidirectional=True,
                           rnn_layers=3,
                           max_length=40,
                           rnn_size=128,
                           dim_embeddings=300,
                           num_epochs=20,
                           gen_epochs=20)

textgen_joe.model.summary()
In this structure, the embedding layer converts each character into a dense vector and passes those vectors to the first LSTM layer, which has 128 neurons. LSTM stands for long short-term memory: rather than carrying all past information forward unchanged, these layers use gates to gradually reduce the influence of older information as newer inputs arrive. The results are then passed to the attention-weighted average layer, which weights the features that matter most for determining the output and averages them together to produce the final character prediction.
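For intuition, here is a rough Keras sketch of that kind of architecture. It is a simplified stand-in rather than textgenrnn’s exact implementation, and the vocabulary size is a placeholder.

import tensorflow as tf
from tensorflow.keras import layers, Model

VOCAB_SIZE = 100   # placeholder: number of distinct characters in the transcripts
MAX_LEN = 40       # input sequence length, as described above

inputs = layers.Input(shape=(MAX_LEN,))
emb = layers.Embedding(VOCAB_SIZE, 300)(inputs)                             # character -> dense vector
rnn1 = layers.Bidirectional(layers.LSTM(128, return_sequences=True))(emb)
rnn2 = layers.Bidirectional(layers.LSTM(128, return_sequences=True))(rnn1)
concat = layers.Concatenate()([emb, rnn1, rnn2])                            # concatenation layer

# attention-weighted average over the 40 time steps
scores = layers.Dense(1, activation='tanh')(concat)                         # one score per time step
weights = layers.Softmax(axis=1)(scores)                                    # normalize into attention weights
context = layers.Lambda(lambda t: tf.reduce_sum(t[0] * t[1], axis=1))([concat, weights])

outputs = layers.Dense(VOCAB_SIZE, activation='softmax')(context)           # probability of each next character
model = Model(inputs, outputs)
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')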
We trained this model on the transcripts for each comedian, and were able to save the weights of the models after training. This allowed us to then play around with some text generation, and hopefully get our AI to tell some sensible and funny jokes.
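To reload a trained model in a later session, textgenrnn can be pointed at the files it saved during training; the file names below assume its default naming convention based on the model’s name.

from textgenrnn import textgenrnn

# reload a previously trained model from the files textgenrnn saved during training
textgen_joe = textgenrnn(weights_path='joerogan_model_weights.hdf5',
                         vocab_path='joerogan_model_vocab.json',
                         config_path='joerogan_model_config.json',
                         name='joerogan_model')

textgen_joe.generate(1, prefix='The other day', max_gen_length=100)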
Results — Text Generation
After training, we were able to see our results by building a function that lets us input the start of a sentence and have every robot comedian complete the next X characters. Here’s what our fake comedians had to say about American politics in 100 characters, along with the textgenrnn code used to generate the text (warning: expletive language):
def all_comedians(prefix, max_gen_length):
    # have each comedian's model complete the prompt, up to max_gen_length characters
    print('Ali Wong:')
    textgen_ali.generate(1, prefix=prefix, max_gen_length=max_gen_length)
    print('Dave Chappelle:')
    textgen_dave.generate(1, prefix=prefix, max_gen_length=max_gen_length)
    print('Bill Burr:')
    textgen_bill.generate(1, prefix=prefix, max_gen_length=max_gen_length)
    print('Hasan Minhaj:')
    textgen_hasan.generate(1, prefix=prefix, max_gen_length=max_gen_length)
    print('Jim Jeffries:')
    textgen_jim.generate(1, prefix=prefix, max_gen_length=max_gen_length)
    print('Joe Rogan:')
    textgen_joe.generate(1, prefix=prefix, max_gen_length=max_gen_length)
    print('John Mulaney:')
    textgen_john.generate(1, prefix=prefix, max_gen_length=max_gen_length)

all_comedians('American politics are', 100)
We can see that, for the most part, our AI comedians are not very coherent. They are able to put together some words and fragments of sentences, but they don’t make a lot of sense. Still, their responses somewhat reflect the themes the actual comedians touch on in their bits. AI Dave Chappelle tries to tell a joke about an African-American man and his place in American politics, while AI Bill Burr seems to think of American politics as having a bully-like mentality. How about if we ask them to fill out a sentence about kids today:
We can see that our AI Dave seems to love kids, while AI Bill has the opposite view on kids, and I don’t even want to know what AI Ali is thinking….
Lastly, let’s see what happens when we ask them to start telling us a story beginning with “The other day”:
I guess AI Hasan saw something surprising the other day, AI Joe had a pretty rough day, AI Dave struggled with remembering someone, and AI Bill was frustrated about a movie, so he made himself an interesting sandwich.
Conclusion
There is clearly room for improvement in our models, as state-of-the-art text generation models are able to form complete, coherent sentences and interact with real humans (think chatbots). To further improve the performance of our models, we could use more data, add more RNN layers, experiment with more data processing and cleaning, train the models on longer sequences, increase the size of the LSTM layers, and so on. All in all, though, we were able to see some interesting results thanks to the extremely easy-to-use textgenrnn package, and maybe one day someone will build an AI as funny as the world’s best stand-up comedians. For now, I guess human Dave Chappelle will do…
(Image from Indie Wire)
Code:
References:
- scrapsfromtheloft.com (data source)
- https://www.thepythoncode.com/article/text-generation-keras-python
- https://github.com/minimaxir/textgenrnn (model package)
- https://www.kaggle.com/tientd95/natural-language-processing-with-python
- https://towardsdatascience.com/attention-in-neural-networks-e66920838742
- https://towardsdatascience.com/generating-text-with-recurrent-neural-networks-based-on-the-work-of-f-pessoa-1e804d88692d