Building blocks of ChatGPT/GPT-3 (an explanation of tokens for non-tech readers)
If you have already played with ChatGPT, you were most probably impressed by its ability to work with text. But how does it do that?
Understanding the technology, at least at an elementary level, can be beneficial. It may help you set realistic expectations, avoid false assumptions about abilities the tool does not have, and use it more efficiently.
What is GPT-3
As mentioned before, GPT-3 is the name of a family of models, including the famous ChatGPT. We already discussed their “family tree” in ChatGPT, GPT-3, DaVinci, Whisper, OpenAI… Who is who?
By its nature, GPT-3 is a Large Language Model: a neural network trained to process and generate human-like text.
In simple words, it is a machine that has read lots of text and, based on that, has learned how to create new text that humans would appreciate.
How does it work
GPT takes the input text and splits it into small pieces called tokens. Then it applies mathematics to these sequences of tokens and calculates the probabilities of the tokens that may follow the input text. (I hope data scientists will forgive me for such an oversimplification.)
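If you want to see tokens with your own eyes, here is a minimal sketch using tiktoken, the open-source tokenizer library OpenAI published for its models. The sample sentence is just my illustration; any English text will do:

```python
# pip install tiktoken
import tiktoken

# r50k_base is the encoding used by the GPT-3 davinci model.
enc = tiktoken.get_encoding("r50k_base")

text = "Coffee in the morning is a good start of the day"
token_ids = enc.encode(text)

print(token_ids)                             # the numeric IDs the model sees
print([enc.decode([t]) for t in token_ids])  # the same IDs as readable pieces
```

Notice that many tokens include a leading space. That is why, in the experiment below, the first token appears as “ usually” rather than “usually”.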
Let’s illustrate this with an example.
I sent some text to the server and asked it to continue. This is what I received (highlighted):
Now let’s check how it calculated the probabilities for the first token added after my text, which happened to be “ usually”:
Let’s change the original text by appending the word “usually” and submit the request again. As you can see, it calculated the subsequent tokens again and arrived at the same result. So, here are the probabilities for the next token:
And so on.
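You can peek at these probabilities yourself. Here is a minimal sketch using the legacy openai Python library (versions before 1.0), through which the GPT-3 models were served; the prompt and API key are placeholders of mine, not the text from my experiment:

```python
# pip install "openai<1.0"
import math
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder

response = openai.Completion.create(
    model="davinci",
    prompt="Coffee in the morning",  # hypothetical stand-in for my text
    max_tokens=1,    # generate only the single next token
    temperature=0,   # always pick the most probable token
    logprobs=5,      # also return the 5 most likely candidates
)

# The API reports log-probabilities; exp() turns them into percentages.
top = response["choices"][0]["logprobs"]["top_logprobs"][0]
for token, logprob in top.items():
    print(f"{token!r}: {math.exp(logprob):.1%}")
```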
In the experiment above, I instructed GPT-3 to select only the tokens with the highest probability, which is why we received consistent results between the steps. But now I will give the machine more freedom and send the original text again with looser parameters.
As you can see, the text became completely different. Why? Because this time it selected a first token with a lower probability, and that substantially changed the whole sequence.
If I submit the same request again, it will make other random choices, and the text will again be different.
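In the API, this “freedom” is controlled mainly by the temperature parameter. A minimal sketch with the same hypothetical prompt as before (temperature 0 corresponds to my first experiment, the higher value to the looser one):

```python
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder

prompt = "Coffee in the morning"  # hypothetical stand-in for my text

# temperature=0: always take the most probable token -> repeatable output.
# temperature=0.9: sample less probable tokens too -> varies on every run.
for temperature in (0.0, 0.9):
    response = openai.Completion.create(
        model="davinci",
        prompt=prompt,
        max_tokens=20,
        temperature=temperature,
    )
    print(temperature, repr(response["choices"][0]["text"]))
```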
Costs of tokens
If you use GPT-3 via the API or the Playground, OpenAI charges you per token. The price depends on the model you use: for example, the davinci model costs $0.02 per thousand tokens.
If you use ChatGPT, this does not apply, as the service is free.
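A back-of-the-envelope calculation, assuming davinci’s $0.02 per thousand tokens (the function name is mine):

```python
PRICE_PER_1K_TOKENS = 0.02  # USD, davinci pricing at the time of writing

def estimate_cost(prompt_tokens: int, response_tokens: int) -> float:
    """Both the prompt and the response count toward the bill."""
    total_tokens = prompt_tokens + response_tokens
    return total_tokens / 1000 * PRICE_PER_1K_TOKENS

# A 3,000-token prompt plus a 1,000-token response costs $0.08.
print(estimate_cost(3000, 1000))
```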
Limitations of tokens
GPT-3 also limits the number of tokens processed in one request. As with costs, the limits depend on the model: for example, davinci allows you to process a maximum of 4,000 tokens per request.
Keep in mind that this number is calculated as the sum of your prompt (the instructions you send to the server) and the response.
For example, if you want to summarize a text of 3,000 tokens and send it as your prompt, you will have only 1,000 tokens left for the response.
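Here is how you could check that budget before sending a request, again with tiktoken (the 4,000-token limit is davinci’s; the prompt is a placeholder):

```python
import tiktoken

MAX_TOKENS = 4000  # davinci's limit: prompt + response combined
enc = tiktoken.get_encoding("r50k_base")

prompt = "Summarize the following text: ..."  # imagine a long document here
prompt_tokens = len(enc.encode(prompt))

# Whatever the prompt uses is no longer available for the response.
available_for_response = MAX_TOKENS - prompt_tokens
print(f"Prompt uses {prompt_tokens} tokens; "
      f"{available_for_response} remain for the response.")
```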
ChatGPT allows you to process more tokens in one request. However, precise limits have not been announced.
How to estimate the number of tokens in a text
OpenAI suggests the following: “as a rough rule of thumb, one token is approximately four characters or 0.75 words for English text.”
Of course, this is a rough approximation, but it works. The ratio will also be different for other languages; we will discuss that in more detail in one of the later articles.
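You can compare the rule of thumb with an exact count. A small sketch (the sample sentence is mine):

```python
import tiktoken

text = "ChatGPT splits the text you send into tokens before processing it."

# Rule of thumb: ~4 characters or ~0.75 words per token for English text.
estimate_by_chars = len(text) / 4
estimate_by_words = len(text.split()) / 0.75

# Exact count with the tokenizer used by GPT-3's davinci model.
enc = tiktoken.get_encoding("r50k_base")
exact = len(enc.encode(text))

print(f"~{estimate_by_chars:.0f} by characters, "
      f"~{estimate_by_words:.0f} by words, {exact} exact")
```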
To be continued
I hope this post helped you understand what GPT-3 is and how it uses tokens. If so, please click the clap button below so I get some feedback :)
The following articles will discuss what GPT-3 is not and what false assumptions many people make about it. So stay tuned and follow me here or on Twitter.