When Janelle Shane Trains RNNs…

A collection of her best outputs from the endlessly entertaining blog ‘Lewis & Quark’

Jacob Younan
AI From Scratch
8 min read · Jul 17, 2017


Don’t know who Janelle Shane is? Neither did I until I recently stumbled upon this experiment by John Keefe in QZ.

Apparently, not many other folks know her either (somehow fewer than 2,000 Twitter followers; DECEMBER 2017 CORRECTION: 4,700+ now!), despite her being featured this year by BuzzFeed, NY Mag, Gizmodo, The Atlantic, and The Outline, among many others. Do yourself a favor and follow her now @JanelleCShane.

Why?

You’ve heard of Recurrent Neural Networks (RNNs) before, right? Following Janelle is a good way to see them in action.

She uses two versions of an open-source framework called char-RNN to experiment with text-based training data. The RNN reads batches of characters from the training examples, learning which characters are most likely to follow others. Once trained, it will attempt to mimic the patterns found in the data (e.g. generating a name after being trained on a list of names). Here’s how Andrej Karpathy explains his char-RNN:

This code implements multi-layer Recurrent Neural Network (RNN, LSTM, and GRU) for training/sampling from character-level language models. In other words the model takes one text file as input and trains a Recurrent Neural Network that learns to predict the next character in a sequence. The RNN can then be used to generate text character by character that will look like the original training data.
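To make that concrete, here’s a minimal sketch (my own illustration, not Karpathy’s actual code) of how a character-level model frames the problem: the text is split into characters, and every position becomes a “see this character, predict the next one” training pair.

```python
# Toy illustration of character-level training data, not char-rnn itself.
text = "copper panty\nrose colon\n"        # tiny stand-in for a real training file

chars = sorted(set(text))                   # the "vocabulary" is just the characters seen
char_to_ix = {ch: i for i, ch in enumerate(chars)}

# Every position in the text becomes an (input, target) pair:
# the model sees one character and must predict the one that follows.
inputs = [char_to_ix[ch] for ch in text[:-1]]
targets = [char_to_ix[ch] for ch in text[1:]]

for i, t in zip(inputs[:6], targets[:6]):
    print(f"see {chars[i]!r} -> predict {chars[t]!r}")
```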

Following along with Janelle as she tinkers is a great way to develop some intuition about how these RNNs work. You’ll frequently see her…

  • Stop the model partway through training just to see it struggling to grasp fundamental patterns from an insufficient sample
  • Adjust the ‘temperature’, or creativity, of the model so that rather than always selecting the most probable next letter (which risks outright copying the training examples), it sometimes selects a somewhat less probable one (see the sampling sketch after this list)
  • Tweak the ‘sequence length’, or memory: the chunk of characters the RNN can look at together in a given training example when learning relationships between the characters
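To make the ‘temperature’ knob concrete, here’s a minimal sampling sketch (my own illustration in Python/NumPy, not her actual setup): the model’s raw scores for each possible next character are divided by the temperature before being turned into probabilities, so low temperatures nearly always pick the single most likely letter while high temperatures give less likely letters a real chance.

```python
import numpy as np

def sample_next_char(logits, chars, temperature=1.0):
    """Pick one character from a model's raw output scores (logits).

    Low temperature  -> distribution sharpens toward the most probable letter.
    High temperature -> distribution flattens, so unlikely letters get picked too.
    """
    scaled = np.array(logits, dtype=float) / temperature
    probs = np.exp(scaled - scaled.max())   # numerically stable softmax
    probs /= probs.sum()
    return np.random.choice(chars, p=probs)

# Hypothetical scores over a four-letter vocabulary, just for illustration.
chars = ["a", "e", "r", "z"]
logits = [2.0, 1.5, 0.5, -1.0]

print(sample_next_char(logits, chars, temperature=0.2))  # almost always "a"
print(sample_next_char(logits, chars, temperature=1.5))  # far more adventurous
```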

This all sounds great, but the main reason you should read her stuff is that it’s not just informative…it’s hilarious! I was laughing out loud alone on my couch within minutes of reading her work. Below is a sample of what I’ve enjoyed most.

Greatest Hits

1. Paint Colors (Pt. 2)

You wouldn’t think generating paint color names would be fascinating, but that’s where we’ll start. Some background in case you haven’t read her post:

  1. Each training example consists of both letters (a paint name) and numbers (a color code), so the model produces both a name and a matching color
  2. There are a few different ways to represent color numerically, including RGB, HSV, and LAB; surprisingly, switching between them has little effect on the output
  3. Janelle iterated on her experiment with a part 2 that includes turning the temperature/creativity down a bit and bringing in a much bigger data set of paint names from multiple brands, plus some crowd-sourced from XKCD

This two-part experiment ultimately led to a glorious list Janelle called the Hall of Fame:

Credit: Janelle Shane

What a list! Check out the entries in that last column: flipper, lemon nose, shy bather, spiced rope, dry custard. If there were a paint-naming Turing Test, several outputs from the big-data-set model would pass.

Other personal favorites from the small data set include the accidentally hilarious bull cream, copper panty, rose colon, and shivable peach. For more results, check out her full post.

2. Fortune Cookie Fortunes

These were bound to be great, thanks in part to the often cryptic nature of fortune cookie fortunes to begin with. Combine that with a somewhat limited training set (it’s unclear exactly how many examples), and magic ensues.

My favorite outputs embody a delightful model characteristic Janelle describes as “significantly more pessimistic than your average cookie fortune”:

Now is the time to go ahead and not prepare to live.
Never understand.
Never upset the friends
Love will diss your changes.
Hell! It’s the onset of a friendship
Do not have a peaceful place where you will feel better.
There’s no success and friendship.
You cannot love life until you live the life you don’t good luck.

3. Car Names

Put in 2,000 car names (including brand) and the model can more or less spit out some useful iterations. Partway through training, Janelle notices a “Dracula phase, [where] it begins to try to spell some of the major car makes, but ends up (mostly) sounding sinister instead:”

Morcula Sapira
Mercult Sarii
Deriula Framno
Moroa 3-S
Darcult Sar
Mlrru Pie
Nunsgan Caakes

By the end, the model can churn out some basic names, but these clear misses will tickle you most:

Volkswagen Colon
Buick Shoat
Buick Crapara
Buick Apron
Fiat Deter
Fiat Coma
Fiat S-0-S
Fiat Doug

4. Pickup Lines

Based on Janelle’s description, this was bound to be a dumpster fire from the start, given the limited availability of training data. Of course, dumpster fires can be entertaining, and this was no exception. The list Janelle shared:

Are you a 4loce? Because you’re so hot!
I want to get my heart with you.
You are so beautiful that you know what I mean.
I have a cenver? Because I just stowe must your worms.
Hey baby, I’m swirked to gave ever to say it for drive.
If I were to ask you out?
You must be a tringle? Cause you’re the only thing here.
I’m not on your wears, but I want to see your start.
You are so beautiful that you make me feel better to see you.
Hey baby, you’re to be a key? Because I can bear your toot?
I don’t know you.
I have to give you a book, because you’re the only thing in your eyes.
Are you a candle? Because you’re so hot of the looks with you.
I want to see you to my heart.
If I had a rose for every time I thought of you, I have a price tighting.
I have a really falling for you.
Your beauty have a fine to me.
Are you a camera? Because I want to see the most beautiful than you.
I had a come to got your heart.
You’re so beautiful that you say a bat on me and baby.
You look like a thing and I love you.
Hello.

A great little illustration from one of Janelle’s readers, bob-artist, was introduced like this:

“I can’t not picture these pickup lines being spoken by really pathetic robots. So… I couldn’t resist:”

Melanie Ehrenkranz from Mic decided to take the list of lines out for a spin on Tinder, with some predictably excellent results.

A couple samples of men undeterred:

Credit: Melanie Ehrenkranz at Mic.com
Credit: Melanie Ehrenkranz at Mic.com

5. Knock Knock Jokes

This one sounded challenging for a couple reasons:

  • How big is the sample for knock-knock jokes? 200, apparently
  • Can a model really replicate humor in a unique joke? Barely

With only 200 examples, you can imagine this didn’t really take shape until the model had trained on them for a while. Janelle shows the model struggling to even learn the basic structure of the joke until the very end, when something awesome happens:

And then.

It produced. An actual joke. I checked, and this one most definitely wasn’t in the dataset. There was one about Alec, but the pun was “Alec-tricity”. There was one about knock-knock jokes themselves, but it was “Irish I knew more knock-knock jokes”. And it didn’t produce this just once, but over and over again. I give you possibly the freakiest thing the neural network has done to date:

Knock Knock
Who’s There?
Alec
Alec who?
Alec- Knock Knock jokes.

6. ‘Diseases You Don’t Want to Get’ (Oct 1, 2017 Update)

I needed to update this piece with a recent favorite of mine. It turns out a list of 3,765 common disease names gets you a suitably terrifying output.

The best stuff here is from what Janelle calls the ‘second and third kind of diseases’:

[2nd] This disease doesn’t exist, and sounds reasonably convincing to me, though it would probably have a different effect on someone with actual medical training.

[3rd] Sounds both highly implausible but also pretty darn serious. I’d definitely get that looked at.

Here are ten examples I found either frightening or somewhat like a condition Madam Pomfrey might treat. There’s some overlap in those two categories...

  • Facial Agoricosis
  • Strecting Dissection of the Breath
  • Bacterial Fradular Syndrome
  • Loss Of Consufficiency
  • Hemopheritis
  • Joint Pseudomalabia
  • Hammon Expressive Foot
  • Clob
  • Cancer of the Cancer
  • Horse Stools

These are only six of the many experiments she’s posted in the last few months, but they immediately give you some intuition about the limitations of an RNN given most publicly available datasets.

If you’ve got a ton of data (see the paint names list), you can generate something that could pass for human, especially if you incorporate thousands of crowd-sourced names packed with personality. When you’re working with very few training examples combined with a need for nuance or humor, you end up with spectacularly incoherent pickup lines…that might even work, depending on the recipient.

But the knock-knock example shows we shouldn’t write off something like humor from an AI agent’s capabilities just because we perceive it to be uniquely human. The ‘Alec’ joke may not be a gut-buster, but it works, and it didn’t take much to get there. I’d be genuinely surprised if models trained on conversation histories from messaging apps or transcriptions of comedy specials couldn’t produce truly funny results in the near future.
