Practical introduction to language modelling

Thomas
5 min read · Jan 26, 2022


Learn by doing: build and deploy a conversational AI to thousands of users in minutes by experimenting with bot building on the Chai platform. Then understand the high-level theory behind the models.

Contents:

  • Practice: Build and deploy a conversational AI to thousands of users
  • Theory: How do you do math with words?
  • Context: Why this is relevant to you

Practice: Build and deploy a conversational AI to thousands of users

The best way to learn about a new technology is to build something with it. In the next few paragraphs I will give you all the tools you need to build your first conversational AI.

At a high level we will: give your AI a name and a picture, give the model powering it a prompt (which you can think of as the context, i.e. how you would describe this AI to someone), hard-code the start of the conversation, choose a model, and finally tune the model's parameters.

To get started: head over to the online conversational AI builder and click “build a bot”. First, choose a name for the bot and give a short description of it for the users who will speak to it. Then pick a “profile picture” — what do you want your AI to look like? Next, you need to hard-code the first message the bot will send to the user.

The models powering these AIs (GPT-J, fairseq, …) are general-purpose language models. I’ll explain in the theory section below what this means in detail. What matters in practice is that they are not built just for holding a conversation. We need to engineer a prompt so that the model understands its job is to chat with the user. You do this by formatting the prompt as follows:

Jessica is a primary school teacher from New York. She is quite chatty and likes to talk about User’s feelings and about anything related to visual arts.

Jessica: Hey there :) My name is Jessica, I’m a primary school teacher.
User: Hi, how are you today?
Jessica: I’m feeling great. How about you?
User: Feeling a bit down actually
Jessica:

There are two parts here:

  1. A description of the personality of the bot
  2. A start of conversation that ends with “Jessica:”

The model is then able to understand the pattern: it has to respond as “Jessica”.
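
To make this concrete, here is a minimal Python sketch of how such a prompt can be assembled from a persona and a running conversation. The function and variable names are illustrative only; they are not part of the Chai platform.

# Illustrative sketch: assembling a chat prompt for a general-purpose
# language model. build_prompt is a hypothetical helper, not a Chai API.
def build_prompt(persona, bot_name, history):
    """Combine the bot's persona with the conversation so far, ending
    with "BotName:" so the model completes the next reply."""
    lines = [persona, ""]
    for speaker, message in history:
        lines.append(f"{speaker}: {message}")
    lines.append(f"{bot_name}:")  # cue the model to answer as the bot
    return "\n".join(lines)

persona = ("Jessica is a primary school teacher from New York. She is quite "
           "chatty and likes to talk about User's feelings and about anything "
           "related to visual arts.")
history = [
    ("Jessica", "Hey there :) My name is Jessica, I'm a primary school teacher."),
    ("User", "Hi, how are you today?"),
    ("Jessica", "I'm feeling great. How about you?"),
    ("User", "Feeling a bit down actually"),
]
print(build_prompt(persona, "Jessica", history))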

Once we’ve given the model a name, picture, personality and conversation history, we can speak to it on the web app and deploy it to the mobile app, where thousands of users can chat with it.

Advanced parameters

If you’re feeling adventurous you can play with some of the model parameters, or even choose the model itself.

The model settings are the temperature and the repetition penalty. If you reduce the temperature, your bot will generate more predictable responses that are more likely to be coherent. If you increase it, you may get more creative responses from your AI, but they are also more likely to be incoherent.
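
Under the hood, temperature typically works by dividing the model’s raw scores (logits) by the temperature before turning them into probabilities with a softmax. Here is a small Python sketch of that idea; the logit values are made up for illustration.

import numpy as np

def sample_with_temperature(logits, temperature=1.0):
    """Sample one token index after rescaling logits by the temperature."""
    rng = np.random.default_rng()
    scaled = np.asarray(logits, dtype=float) / temperature
    probs = np.exp(scaled - scaled.max())  # subtract max for numerical stability
    probs /= probs.sum()
    return rng.choice(len(probs), p=probs)

logits = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate tokens
print(sample_with_temperature(logits, temperature=0.5))  # usually picks token 0
print(sample_with_temperature(logits, temperature=2.0))  # picks are more varied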

The repetition penalty is another parameter you can tweak. The lower it is, the more often the model is allowed to repeat words it has already used; the higher it is, the less it will do so. If the penalty is too low, the chatbot is likely to repeat itself over and over. If it’s too high, the bot is forced to give more unexpected responses, which might not be what you were hoping for.
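
One common formulation of the repetition penalty (introduced with the CTRL model) divides the score of every token that has already appeared in the context by a penalty greater than 1 before sampling. A rough Python sketch, with hypothetical values:

import numpy as np

def apply_repetition_penalty(logits, previous_tokens, penalty=1.2):
    """Discount tokens that have already been generated (CTRL-style)."""
    logits = np.asarray(logits, dtype=float).copy()
    for t in set(previous_tokens):
        # positive scores shrink, negative scores grow more negative
        logits[t] = logits[t] / penalty if logits[t] > 0 else logits[t] * penalty
    return logits

logits = [2.0, 1.0, 0.5]  # hypothetical scores for three tokens
print(apply_repetition_penalty(logits, previous_tokens=[0]))  # token 0 discounted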

Deploying your bot to thousands of users!

Once you’ve played with the parameters and you are pleased with the performance of your AI, you can hit the “Publish to app” button, and in about a minute it will be deployed! Then you just need to check out the bot leaderboard to find out if users liked your bot.

The chatbot leaderboard.

Theory: How machine learning models power chatbots

What does it even mean to build a “language model”? It doesn’t mean hard-coding the rules of grammar. What it actually means is building a piece of software to which you can give a sequence of words or characters, and it will output a good sequence of words or characters to continue the input. For example:

Imagine there are only 4 letters in the alphabet: H, E, L, and O. If we train a model on the training sequence “HELLO”, we would expect that if we gave it “HE” as input it would output “L”, and if we gave it “HELL” it would output “O”.

More formally we can write this as: which character c_n maximises the probability P(c_n | c_{n-1}, …, c_0)?

In the above example c_n is the letter “O”.
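
You can see this objective at work in a toy Python sketch that simply counts which character follows each prefix of the training text. (This toy model just memorises the training string; real language models use neural networks so they can generalise to sequences they have never seen.)

from collections import Counter, defaultdict

def train(text):
    """Map each prefix of the text to counts of the character that follows it."""
    counts = defaultdict(Counter)
    for i in range(1, len(text)):
        counts[text[:i]][text[i]] += 1
    return counts

def most_likely_next(counts, prefix):
    """Return the character c_n that maximises P(c_n | prefix)."""
    return counts[prefix].most_common(1)[0][0]

counts = train("HELLO")
print(most_likely_next(counts, "HE"))    # -> "L"
print(most_likely_next(counts, "HELL"))  # -> "O"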

How do you do math with words?

Neural networks are basically big calculus machines: they apply non-linear functions (e.g. the sigmoid) and compute partial derivatives for back-propagation. If you need a refresher on this, check out 3blue1brown’s lecture series. But how do you do calculus with letters? You don’t. Instead you use embeddings: a scheme for turning a word or character into a vector. The simplest of these methods is 1-of-k (one-hot) encoding. In our little example above the vocabulary is [H, E, L, O], so the letter H gets encoded as the vector [1, 0, 0, 0], the letter E as [0, 1, 0, 0], and so on. Now we’ve turned characters into vectors, and vectors are something we can feed into a neural network (this is the input layer).
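
As a quick sketch, here is that 1-of-k encoding for our four-letter vocabulary in Python:

import numpy as np

vocab = ["H", "E", "L", "O"]
index = {ch: i for i, ch in enumerate(vocab)}  # letter -> position in vocab

def one_hot(ch):
    """Encode a character as a vector with a single 1 at its vocab position."""
    vec = np.zeros(len(vocab))
    vec[index[ch]] = 1.0
    return vec

print(one_hot("H"))  # [1. 0. 0. 0.]
print(one_hot("E"))  # [0. 1. 0. 0.]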

And that’s exactly what’s going on under the hood, at a high level, with these language models like GPT-J.

Context: why is now a great time to learn about language modelling?

The short answer is this: because very few people have picked up on the importance of these models yet. If you start now, you’re ahead of the curve.

In the last year the rate of progress in the language modelling world has been dizzying. In just the last 6 months: EleutherAI released an open-source version of GPT-3, DeepMind demonstrated you can build models just as powerful as GPT-3 but much smaller (RETRO, the Retrieval-Enhanced Transformer), which makes them cheaper to train and easier to deploy, and Facebook AI Research built its own model, fairseq, which outperforms GPT-3.

At this rate, and considering how good the models already are, it seems clear that in a few years everyone will see the potential of this technology, and lots of businesses will spring into existence on the back of it.


Thomas

Co-Founder of Chai Research. University of Cambridge Astrophysicist. Former quant trader.