Jokes at scale using GPT2 and AWS

Ricardo Elizondo
dataroots
May 18, 2021

Creating a real-time inference joke generator with GPT2 and AWS

Example jokes generated by the model. That third one was quite accurate, right? 👌

The goal of this article is to show how you can fine-tune a state-of-the-art NLP model like GPT2 for joke generation and deploy it as an API service for real-time inference, using AWS services as an end-to-end solution.

If those felt like a whole lot of fancy words, don’t worry: buckle up and put on your blue-light coding glasses 🧑‍💻 (are those even a thing?), as we’ll go through each one of them.

TL;DR: Try it live for yourself! (Currently 8:00–20:00 CET, Mon-Fri)

What even is AWS? 🕵🏻

AWS stands for Amazon Web Services. In short, it’s Amazon’s take on on-demand cloud computing services on a pay-as-you-go basis. As of 2020, AWS led the market 🏅 ahead of Microsoft’s Azure and Google’s GCP, with roughly 30%, 18%, and 9% market share respectively.

What are web services?

Glad you asked. AWS offers almost any kind of service you can think of when it comes to the cloud: from basic storage with S3 and computing with EC2, to serverless, scalable functions with Lambda and machine learning-specific services with Sagemaker. We’ll touch upon all of these in this post.

AWS Sagemaker

As mentioned earlier, Sagemaker is AWS’s machine learning platform. It includes many services to prepare, build, train/tune, deploy and manage your machine learning models. For this post, we’ll mainly focus on Sagemaker Studio, training jobs, and endpoints. Here’s a quick overview of each of them:

  • Sagemaker Studio: a full-fledged IDE for machine learning. Here you can connect your Git repo for collaboration/versioning and easily develop on Jupyter notebooks which can be connected to a local kernel or even to an AWS instance (in case you want to prototype on a more powerful machine).
  • Sagemaker Training Jobs & Experiments 👨‍🔬️: training jobs make it incredibly easy to spin up the training of your models. You simply state some parameters, like the type of instance, how many instances, and a time limit, among others, and off you go. Then you can easily track your model’s metrics, as shown in the image below 📈.
    You can even tell Sagemaker that you want to train on spot instances, which can save you quite some 💰; just be aware that your instances may be interrupted, and plan for it. Here’s the official tutorial from AWS, and a minimal sketch follows the caption below.
Logging and tracking your model’s performance is easy with Sagemaker
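For illustration, here’s a minimal sketch of launching such a training job with the Sagemaker Python SDK. The entry point, IAM role, instance type, hyperparameters, and S3 path are all placeholders you’d swap for your own:

```python
from sagemaker.pytorch import PyTorch

# Minimal sketch; the role, paths, and hyperparameters below are hypothetical.
estimator = PyTorch(
    entry_point="train.py",            # your training script
    role="arn:aws:iam::123456789012:role/SagemakerRole",  # hypothetical IAM role
    instance_type="ml.p3.2xlarge",     # GPU instance for fine-tuning
    instance_count=1,
    framework_version="1.8.1",
    py_version="py36",
    use_spot_instances=True,           # train on spot instances to save 💰
    max_run=12 * 3600,                 # hard time limit, in seconds
    max_wait=24 * 3600,                # how long to wait for spot capacity
    hyperparameters={"epochs": 20, "model_name": "gpt2-medium"},
)
estimator.fit({"train": "s3://my-bucket/jokes/train"})  # hypothetical S3 path
```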
  • Sagemaker Endpoints: endpoints are the way to deploy your models in Sagemaker. When creating an endpoint, you specify the model (which can be an artifact of a training job, or you can bring your own model file, e.g. from PyTorch), an entry point (think Docker containers), and an instance type. Sagemaker will then provision this EC2 instance, spin up your model, and expose it as an API for you.
    After your endpoint is created (this may take ~5 min), you can invoke it with a simple function, as shown in the image below.
Example code for invoking a previously created Sagemaker Endpoint
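Roughly, that invocation looks like the sketch below using boto3; the endpoint name and payload format are placeholders that depend on how your inference script parses requests:

```python
import json

import boto3

runtime = boto3.client("sagemaker-runtime")

response = runtime.invoke_endpoint(
    EndpointName="gpt2-joke-generator",    # hypothetical endpoint name
    ContentType="application/json",
    Body=json.dumps({"prompt": "Why did the data scientist"}),
)
print(json.loads(response["Body"].read()))  # the generated joke
```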

AWS Lambda 🧙

Lambda functions are magical. They are serverless functions, meaning you don’t have to worry about any of the nitty-gritty stuff like provisioning/managing servers, creating scaling logic, or managing runtimes. With Lambda, you simply give it the code you want to run (e.g. the image below) and AWS manages the rest for you.

Your Lambda functions can be triggered by almost any of the other AWS services (140 of them, according to AWS). This is the heart of the project: with Lambda we manage the invocation of the endpoint every time a user makes an API request, as well as the scheduling (creating and deleting) of the endpoint.

“Lambda functions are magical”

If all of the above hasn’t convinced you that Lambda functions are magical yet, wait until you hear about the pricing. You get 1M free calls per month; if you somehow manage to need more, it’ll set you back ~$0.20 for another 1M requests. For more detailed specifics, look at the official pricing.

Example of a lambda function used to create a Sagemaker Endpoint
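As a minimal sketch, a Lambda handler like the one below can start the endpoint when triggered on a morning schedule (e.g. by an EventBridge cron rule); the endpoint and config names are placeholders, and the endpoint configuration itself would have been created beforehand:

```python
import boto3

sagemaker = boto3.client("sagemaker")

def lambda_handler(event, context):
    # Spin up the (hypothetically named) endpoint from an existing config.
    sagemaker.create_endpoint(
        EndpointName="gpt2-joke-generator",
        EndpointConfigName="gpt2-joke-generator-config",
    )
    return {"statusCode": 200, "body": "Endpoint creation started"}
```

A sibling function calling sagemaker.delete_endpoint(EndpointName=...) on an evening schedule handles the teardown, which would enforce the 8:00–20:00 availability window mentioned in the TL;DR.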

Using GPT2 💬

GPT2 is a transformer-based language model developed by OpenAI and released in February 2019. The technical details are out of the scope of this article, but if you’re interested I would recommend this post, where author Jay Alammar clearly explains the details with great visuals.

Goal 🎯

GPT2’s goal, as seen in the gif below, is simple: predict the next word, given several previous words as input. It was trained on 8M web pages using the cross-entropy loss, with a vocabulary of ~50,000 byte-pair-encoded tokens.

“Predict the next word, given several previous words as input”

Image source: https://jalammar.github.io/illustrated-gpt2/
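To make this concrete, here’s a minimal sketch of next-word prediction with Huggingface’s transformers library (the prompt and generation settings are illustrative):

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# Give the model several previous words and let it predict what comes next.
input_ids = tokenizer.encode("A man walks into a bar", return_tensors="pt")
output = model.generate(
    input_ids,
    max_length=30,
    do_sample=True,  # sample instead of greedy decoding, for variety
    top_k=50,
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```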

Fine-tuning GPT2

As mentioned earlier, GPT2’s goal ultimately is to produce sentences that make sense in the English language. Our specific goal was to have it create sentences that not only made sense, but were also funny. For this, we decided to fine-tune the pre-trained model, which we got from Huggingface.

Fine-tuning GPT2 for joke-generation

Fine-tuning means taking a model that has been previously trained (and therefore comes with pre-trained weights) and fully retraining all of its layers on a specific dataset with a very small learning rate, so as not to change the original weight values too much.
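In code, one fine-tuning step for GPT2 could look roughly like this sketch; the toy data and the 5e-5 learning rate are illustrative, not our exact training setup:

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")  # start from the pre-trained weights
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)  # very small learning rate

jokes = ["Why did the chicken cross the road? To get to the other side."]  # toy data
model.train()
for joke in jokes:
    batch = tokenizer(joke, return_tensors="pt")
    # For causal language modelling the labels are the input ids themselves;
    # the model shifts them internally to predict the next token.
    outputs = model(input_ids=batch["input_ids"], labels=batch["input_ids"])
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```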

Training & final models

Using Sagemaker’s training jobs, we experimented with Huggingface’s GPT2 small (12 layers, 117M params), medium (24 layers, 345M params), and large (36 layers, 774M params) models.

To help avoid overfitting, we implemented early stopping with a patience of 5 epochs, meaning training stops if the model doesn’t improve for 5 epochs. To measure performance (we won’t go into much detail here), we relied on the BLEU score 🔠, a similarity metric (higher is better) used to compare two sentences, typically for evaluating translations. Specifically, we used PyTorch’s implementation of it. At the end of every epoch, we compared the candidate model’s cross-validation loss and BLEU score against the current best model, and chose it as the new best model only if both metrics were at least equally good or better.
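Schematically, that selection rule looks like the sketch below; evaluate and save_checkpoint are hypothetical helpers standing in for our actual training code:

```python
best_loss, best_bleu, epochs_without_improvement = float("inf"), 0.0, 0

for epoch in range(max_epochs):
    val_loss, val_bleu = evaluate(model)   # hypothetical: returns CV loss and BLEU
    if val_loss <= best_loss and val_bleu >= best_bleu:
        best_loss, best_bleu = val_loss, val_bleu
        save_checkpoint(model)             # hypothetical: persist the new best model
        epochs_without_improvement = 0
    else:
        epochs_without_improvement += 1
    if epochs_without_improvement >= 5:    # early stopping with a patience of 5
        break
```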

When we were happy with our model’s performance, it was really easy to turn the training job’s artifacts into a Sagemaker endpoint to be used later as an API. For more specifics on how to do this, whether you trained your model outside Sagemaker or inside it, refer to this great tutorial 📝.

Best performing GPT2-Large and GPT2-Medium fine-tuned models

For completeness, here are the results in the test set for both models:

  • Medium: loss 1.76, BLEU 0.42
  • Large: loss 1.75, BLEU 0.48 🏆

“We compared the candidate model’s cross-validation loss and BLEU score and chose it as the new best model only if both were at least equally good or better”

Data, data, data 🗃️

Any machine learning model needs data to learn from. Since we’re fine-tuning GPT2 for joke generation, naturally we need some jokes. For this, we used 3 different joke sources:

Visual representation of data-filtering pipeline

As you may imagine, with the majority of jokes coming from Reddit, we ran into one too many NSFW jokes 🙅. We won’t get into much detail about the ETL, as there are a million posts and tutorials on how best to do this. In short, we created a custom-scraped NSFW word list, applied several filters (e.g. min/max tokens, a minimum Reddit score), and topped it all off with Spacy’s profanity filter.

In the end, we went from 238,521 jokes to a final dataset of 46,277 jokes.
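As a rough sketch of that filtering pipeline (column names, thresholds, and file paths are illustrative, not our exact setup):

```python
import pandas as pd

nsfw_words = {"example_bad_word"}  # stand-in for the custom-scraped NSFW word list

df = pd.read_csv("jokes_raw.csv")  # hypothetical dump of the scraped jokes

df = df[df["score"] >= 20]                      # Reddit minimum score filter
token_counts = df["joke"].str.split().str.len()
df = df[token_counts.between(5, 100)]           # min/max token filter
df = df[~df["joke"].str.lower().apply(
    lambda text: any(word in text for word in nsfw_words)
)]                                              # NSFW word list filter

df.to_csv("jokes_clean.csv", index=False)
```

On top of filters like these, a profanity detector catches what a plain word list misses.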

Stitching it all together 🪡

If all you cared about was using AWS Sagemaker to build, train & deploy a fine-tuned model and make it available as a real-time inference API, you can skip this part; that is done with the creation of the Sagemaker endpoint. We wanted to deploy this as a website, so there were a couple more services we needed to use. We’ll skim through these:

  • API Gateway: this is, surprisingly 🤯, one of the few AWS services whose name describes what the service does. It is a fully managed service used to create, publish, monitor, and secure APIs at any scale.
  • CloudFront: AWS’s CDN (content delivery network). Needed whenever you want to deploy any kind of website, application, or API to customers globally with low latency and security.
  • S3: this is one of AWS’s oldest services. S3 stands for Simple Storage Service, and that is exactly what it does 💾. We use it to store not only our static website but also our models, data, and artifacts.
Final AWS Architecture

Conclusion 🍻

If you made it here, thank you for taking the time 🙏. If you’d like to learn more about a specific part of this project, please don’t hesitate to send us a message here or at our email.

If you are, or want to become, a Machine Learning, Data, DevOps, or Cloud Engineer and would like to apply for the best job in Belgium, come join us and apply for a job at Dataroots 🚀.

Ricardo & Ruben
ML Engineers @Dataroots
