How to Train Your AI
You might have seen some of those hilarious Twitter posts where a comedian ‘feeds’ movie scripts to a bot, which then spits out its own amazingly funny and bizarre take. Well, those weren’t quite real (the comedian curates the bot’s actual output and then artistically interprets it), but this is something you can actually do yourself fairly easily! No coding skills required!

AI sample quotes
By the end of this tutorial, you can expect your AI to drop such deep knowledge-morsels as:
The world is an infinite and sensible spider.
When asked about hackers:
Hackers — they’re not terrorists. They’re people who use drugs.
And Javascript for all my coders out there:
Javascript is a cross between a crossword puzzle and a chess match.
I dunno, I’m not that technical, why should I read this?
You should follow this tutorial if you:
- Are willing to learn how to use a couple of new tools like Chrome Developer Tools and VSCode (both of which we’ll use very minimally). It helps if you’ve done a bit of coding and have used Chrome Dev Tools before, but that’s not required if you’re motivated!
- Find the idea of an AI trying its very best to come up with words of wisdom hilarious or interesting
- Want to lead the resistance against our inevitable robot overlords
Check... check, and... uh, check? Great! Here’s what this tutorial offers you:
- An AI overview, specifically of neural networks
- A quick tutorial on finding and formatting a dataset so that you can train an AI with it
- Steps to use Google Colaboratory to train and get results from your AI without having to write any of your own code
- Some extra challenges for you to pursue
This tutorial is light on coding, and you won’t need to write any code of your own. If you’re already familiar with AI and neural networks, you can probably skip this next section.
Note: experienced devs who want to dig a bit deeper should consider heading over to Max Woolf’s repo and taking a look!
Do I need to download/sign up for anything?
Not really, but if you don’t have a code editor/IDE installed, go ahead and grab VSCode for the purposes of this tutorial. We’ll basically only be using it to open up a JSON file and (optionally) manipulate some text.
Oh, and you’ll need a Google account with Google Drive enabled (to use Google Colab), as well as a Kaggle account to download the dataset.
What’s an AI and why does it need my training?
Artificial Intelligence is a hyped-up and loaded term these days, but in essence it’s all about prediction. Generally, the more you train a neural network (up to a point), the better it gets at predicting something, be it the next letter in a word, the next pixel in an image, or even tomorrow’s stock price (YMMV on that last one).
Think of your own brain: if you spent years reading millions of sentences about one topic, you would hopefully start to be able to reproduce some of what you’d read (AKA predict a sequence of words). An AI doesn’t need years to learn, though; with the right hardware it’s more like hours or days. In other words, after some training you could give an AI half a sentence and it might be able to predict the second half in a semi-intelligible way.

Could you inadvertently create Skynet? Probably not, but you can use the power of neural networks to get a computer to try its very best to predict what comes next in a sequence of text, hopefully producing some entertaining results along the way.
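To make "predicting what comes next" a little more concrete, here is a toy sketch in JavaScript (the same language we’ll use in Chrome Dev Tools later). It is not a neural network and not how GPT-2 works internally: it simply counts which character most often follows each character in some training text, then extends a prompt one character at a time using those counts. The function names trainBigrams, predictNext, and generate are made up for illustration.

```javascript
// Toy next-character predictor built from simple counts (NOT a neural network).
// Hypothetical helper names, for illustration only.

// Count, for every character, how often each other character follows it.
function trainBigrams(text) {
  const counts = new Map(); // char -> Map(nextChar -> count)
  for (let i = 0; i < text.length - 1; i++) {
    const cur = text[i];
    const next = text[i + 1];
    if (!counts.has(cur)) counts.set(cur, new Map());
    const followers = counts.get(cur);
    followers.set(next, (followers.get(next) || 0) + 1);
  }
  return counts;
}

// "Predict" the most frequent follower of a character.
// Ties go to whichever follower was seen first; returns null for unseen characters.
function predictNext(counts, char) {
  const followers = counts.get(char);
  if (!followers) return null;
  let best = null;
  let bestCount = 0;
  for (const [next, count] of followers) {
    if (count > bestCount) {
      best = next;
      bestCount = count;
    }
  }
  return best;
}

// Extend a prompt one character at a time using the toy model.
function generate(counts, prompt, length) {
  let out = prompt;
  for (let i = 0; i < length; i++) {
    const next = predictNext(counts, out[out.length - 1]);
    if (next === null) break; // nothing learned about this character
    out += next;
  }
  return out;
}
```

For example, after `const counts = trainBigrams("abab ab")`, calling `generate(counts, "a", 3)` returns "abab". A neural network replaces this lookup table with learned weights and takes far more context into account than a single previous character, which is why its output reads so much more naturally.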
First things first, we need some data to ‘feed’ to our neural network.
Finding and Formatting your Data
Rather than movie scripts like in those popular tweets, let’s use some simple English quotes from Wikiquote.
Note: there are numerous ways to accomplish the following, and I’d love to hear about it if you’ve got a cool/better way to do it!
1) Download a dataset
There’s a JSON dataset for these over at Kaggle.com that should do nicely. Go ahead and download it to your machine.
2) Open the file
Next, open up the JSON file in VSCode and copy its entire contents:
3) Open up Dev Tools and create a snippet
Now open up Chrome Dev Tools by pressing F12 (or Ctrl + Shift + I on Windows/Linux, Cmd + Option + I on Mac) from any Google Chrome page.

Next, click Sources and then Snippets (you may have to click a little double chevron to see the Snippets option).

Click New snippet.

4) Create and run our snippet
Type the following into your newly minted, blank snippet:
var quotes =
Paste the quotes that you copied from your JSON file directly after the equals sign, like so:

It’ll be huge (like 50,000+ lines huge), so you’ll want to scroll down past all the quotes. Then highlight the code in this box and copy it. Next, paste it at the bottom of your snippet:
Run the snippet by pressing CTRL + Enter or Command + Enter (Mac); you should see the following in your console:

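If you’d like to see the idea behind that conversion spelled out, here is a minimal sketch of a snippet that could do it. It is not necessarily the code from the box mentioned above. It assumes the pasted JSON is a top-level array whose entries are objects with a Quote property; that key name is a guess, so open your JSON file, check the actual key, and pass it as the second argument if it differs. quotesToText is a hypothetical helper name.

```javascript
// Hypothetical helper: turns the pasted JSON array into plain text, one quote per line.
// ASSUMPTION: entries look like { "Quote": "..." }; check the key in your own file.
// Plain string entries are accepted as-is.
function quotesToText(quotes, field = "Quote") {
  return quotes
    .map((entry) => (typeof entry === "string" ? entry : entry[field]))
    // Drop entries that are missing the field or are blank.
    .filter((text) => typeof text === "string" && text.trim() !== "")
    // Collapse internal line breaks so each quote stays on a single line.
    .map((text) => text.replace(/\s*\n\s*/g, " ").trim())
    .join("\n");
}

// In the Dev Tools snippet, `quotes` is the array you pasted after `var quotes =`.
if (typeof quotes !== "undefined") {
  console.log(quotesToText(quotes));
}
```

Logging one big string (rather than the array itself) means the console output is plain text with one quote per line, which is exactly what you want to copy into wikiquotes.txt.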
5) Copy the data and save it as a text file
Hit that handy Copy button at the bottom of the output, paste your newly textified quotes into a new VSCode file, and save it as wikiquotes.txt. Make sure you save it somewhere you can easily find later.
Optional: Shuffle the Quotes
Shuffling the quotes can help if you’re hoping for more random results. The reasoning is that when the file contains large blocks of quotes from one person or on one topic, those blocks tend to share related words and themes, and the neural network will pick up on those patterns. This may or may not be desirable for your purposes.
This step can be done programmatically in Chrome with the JSON object (an exercise I leave to you), or you could add the permute-lines extension to VSCode and run it on your wikiquotes.txt file.
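If you want to take on the shuffling exercise in Chrome, one option is the classic Fisher–Yates shuffle. The sketch below uses a hypothetical shuffle helper that returns a shuffled copy of an array; the optional random parameter is only there so the function can be tested with a predictable number source.

```javascript
// Fisher–Yates shuffle: returns a new array in random order
// without mutating the input.
function shuffle(items, random = Math.random) {
  const result = items.slice(); // copy so the original array is untouched
  for (let i = result.length - 1; i > 0; i--) {
    // Pick a random index from 0..i (inclusive) and swap it into position i.
    const j = Math.floor(random() * (i + 1));
    [result[i], result[j]] = [result[j], result[i]];
  }
  return result;
}
```

In the Dev Tools snippet, you could run `quotes = shuffle(quotes);` right after the pasted data and before the line that logs the text, so the quotes come out in random order. Fisher–Yates gives every ordering an equal chance, which the popular `array.sort(() => Math.random() - 0.5)` trick does not guarantee.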
That’s it for data preparation! 👏

Training your AI
In this section we’ll basically just be hitting run a bunch to execute all the code that’s already there for us. The only change we’ll make is adding our own dataset. Easy peasy! That being said, I’m going to try to go through each step clearly, since this stuff can still be intimidating at first.
You can (optionally) head on over to Max Woolf’s amazing repo to learn a bit about the massive GPT-2 language model that OpenAI released in 2019.

1) Open the notebook
First, you’ll need to open this Google Colab notebook and then click File > Save a copy in Drive. If you don’t make a copy, you won’t be able to save your changes.
If you’re wondering, “What the heck is a notebook?”, don’t worry about it for now; I’ll talk a bit more about notebooks in the inference section below.
I think this notebook is amazing for two reasons:
- Max has made it super easy to train your own AI models; he even included an example dataset in the Colab notebook.
- It’s free to train models in this notebook thanks to Google’s free-tier services. Most cloud training services can get quite expensive.

When you first open the notebook, you should see something like the above screenshot. Max has left some helpful notes describing the different steps if you’re curious what’s going on here, but if you’re thinking, “I’m just here to make the silly AI dance”, then read on!
2) Run the first cell

Go ahead and click within the first cell (box of code), which will cause a ‘play’ button to appear on the left. Let’s click that play button to run the first part of the notebook.

There will be some spinning around the play button, and you may see a warning. Don’t worry about the warning. You’ve just executed the first part of the notebook, which installs the libraries we’ll need to summon a wisdom-spitting, quotable AI.
3) Check the hardware for funsies
The next section is just for verifying the hardware that’s being used by this notebook on Google’s end. Let’s run it and make sure there are no errors.

All systems go! We’re ready to download the pretrained model.
4) Download the pretrained model

This pretrained model basically gives our AI some general language smarts, and we’re going to augment it with our precious training data. First, though, we need to mount Google Drive in the notebook; Drive is how files (like our trained model) get in and out of Colab.
5) Mount Google Drive

When you run this function, you’ll need to click a link, provide some permissions, and then copy an authorization code. Paste the copied code into the field that appears within the output area and press Enter. You’re now all mounted up.
6) Upload the quotes
This is it: the time has come. Crack your knuckles for dramatic effect and follow the instructions in the notebook:

Remember when we saved our wikiquotes.txt file in a handy, easy-to-find place? Smash that upload button and go find it! After you do, it should shortly appear alongside all the files that were already in the directory.

7) Change the file name
Change the filename in the next cell to match our file:


Hit run, and voilà! Our training data is loaded and you’re ready to train your very own glorious AI overlord. Head on down to the Finetune GPT-2 section of the notebook when you’re ready (training might take 4-ish hours, but you’ll get to see some results as it progresses!).
8) Train ALL the data!
After you hit run on the next cell, it’ll take a minute or so to spin up, then you’ll see this:

4 hours later…

You’ve just successfully trained a neural network on 40,000+ quotes!
NICE 👌
At this point I’d recommend following Max’s advice and saving the trained model to your Google Drive. That way you won’t have to retrain it, and it’ll be easy to load if you want to play around with it later.

Time to receive some robot wisdom! By the way, ‘round these parts, asking your model to output a prediction is called inference. And by prediction I mean silly quotes.
Inference: Talking to your AI
A note on notebooks:
It can be a little tricky to export a working AI model and then build an API to access it. Notebooks to the rescue! Notebooks are great for research and for iterative development in general because they let you prototype rapidly while being easy to share. The most popular notebook suite, which Google Colab is based on, is called Jupyter, and it’s most commonly used with Python (for you JavaScript devs out there, check out RunKit!).
Digression aside: we’ve gone to the effort of collecting data and training a state-of-the-art language model, so where’s our funny tweet material? The road to internet stardom awaits.
This part is actually super simple since we’re using Max’s Colab notebook. Scroll on down to the Generate Text From The Trained Model section and run the first cell.

This is it, the moment you jumped through all those hoops for! Your fine-tuned neural network can now predict the next word of a sequence in a surprisingly human-like way. Scary!
But how did I coax mine into generating quotes on a specific topic? Scroll down once again to the next cell and enter your own text on the line that says prefix="". I would also recommend changing the length to something small like 25 (length is counted in tokens, which are roughly words or pieces of words), which will generate something short and quote-like.

We’ve reached peak AI intelligence here; you can now ask it about anything thanks to the fact that GPT-2 was pretrained to be generally proficient in the English language.
What you’ve learned
By finishing this tutorial you’ve learned how to:
- Download a dataset from Kaggle
- Open JSON files in VSCode
- Convert the contents to plaintext via Snippets
- Open and use Google Colab notebooks
- Upload your own dataset to an AI notebook
- Finetune (train) your AI on your own dataset
- Generate random and custom predictions from your AI
Give yourself (and this tutorial 😉) a round of applause; that’s a lot of new stuff! If you’ve got any questions, feedback, or corrections to share, I’d love to hear them. I’d also love to read about your own funny/interesting AI results.
Now go forth and do good with your newfound powers. Or get famous on Twitter. Or both. Good luck and thanks for reading!
Next Steps
If you’d like to continue your AI education/tinkering, here are a few ideas on next steps:
- Find your own dataset and train an AI
- Create an API using TensorFlow.js
- Take the excellent, free fast.ai course (it takes a similar, do-cool-stuff-first approach)
