Beyond Few-Shot Learning: Fine-tuning with GPT-3
For this article I’m going to assume you have some background knowledge on GPT-3, the large text generation and classification model created by OpenAI. If not, these articles are a good place to start:
https://www.technologyreview.com/2020/07/20/1005454/openai-machine-learning-language-generator-gpt-3-nlp/
https://medium.com/@colemanhindes/how-to-finesse-openais-gpt-3-api-a-tutorial-c43dc4f59c7c
Much of the hype around GPT-3 comes from its ability to generalize from limited input (https://arxiv.org/abs/2005.14165). After its initial training (on a giant dataset scraped from the internet), GPT-3 is able to generate realistic answers to many questions, create compelling copy from short statements, summarize articles, and more. And this is without the time- and data-intensive task-specific training that these types of problems usually involve. Instead of spending months scraping data, cleaning, training, and tweaking models, an engineer can get impressive results on many language generation and classification problems by writing a carefully phrased input string (known as a prompt) to the API.
However, while the results are certainly impressive given the circumstances (high performance on tasks it was not trained for), the flip side is that the primary API does not support any task-specific training, meaning the model can’t improve its performance no matter how many times it completes the same task. The only optimization available is to tweak the input string, which is limited to roughly 1,500 words. This practice is now known as prompt engineering, and many tips and tricks have been devised for getting the best results from it, including stuffing a few examples into the prompt as mini-training data.
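To make the “stuffing a few examples into the prompt” trick concrete, here is a minimal sketch of few-shot prompt construction. The sentiment-classification framing, example pairs, and formatting are all illustrative, not from OpenAI’s documentation — the point is just that the examples and the query share one string that gets sent as the prompt.

```python
# A minimal sketch of few-shot prompting: labeled examples are formatted
# into the prompt itself, followed by the unanswered query. The model is
# expected to continue the pattern. Example data here is made up.

def build_prompt(examples, query):
    """Stuff labeled examples into the prompt as mini-training data."""
    lines = []
    for text, label in examples:
        lines.append(f"Review: {text}\nSentiment: {label}\n")
    # The query is left unanswered so the completion supplies the label.
    lines.append(f"Review: {query}\nSentiment:")
    return "\n".join(lines)

examples = [
    ("I loved this movie!", "Positive"),
    ("Total waste of two hours.", "Negative"),
]

prompt = build_prompt(examples, "The acting was superb.")
print(prompt)
```

Every example spends tokens from the same limited prompt budget mentioned above, which is exactly the constraint fine-tuning relaxes.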
Understandably this has its flaws, particularly in a production environment. Most use cases are specialized, and GPT-3 can say random, unrelated, and potentially inappropriate things. It can misunderstand the context of a question, change tone abruptly, or deviate from the expected response format. This means you will likely have to do extensive post-processing to clean up the output for your users.
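As a rough sketch of what that post-processing can look like: truncating the raw completion at a stop sequence and tidying whitespace. The stop marker and sample text below are invented for illustration; real cleanup is usually use-case specific.

```python
# A hedged sketch of completion cleanup. The stop marker and the raw
# completion text are made up for illustration.

def clean_completion(raw: str, stop: str = "\n") -> str:
    """Truncate a raw completion at the first stop sequence and trim whitespace."""
    text = raw.split(stop, 1)[0]
    return text.strip()

raw = "  Paris is the capital of France.\nUnrelated rambling follows..."
print(clean_completion(raw))  # -> Paris is the capital of France.
```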
Fine-tuning, which was released as part of the Beta yesterday (July 13, 2021), should help with this. Fine-tuning means you can upload custom, task-specific training data while still leveraging the powerful model behind GPT-3. This means:
- Higher quality results than prompt design
- Ability to train on more examples than can fit in a prompt
- Token savings due to shorter prompts
- Lower latency requests
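Per the Beta documentation, fine-tuning training data is uploaded as a JSONL file where each line is a JSON object with "prompt" and "completion" keys. A minimal sketch of preparing such a file — the example pairs and filename are hypothetical:

```python
import json

# Each line of the training file is one JSON object with "prompt" and
# "completion" keys. The ticket-routing examples and filename below are
# illustrative, not real training data.
training_examples = [
    {"prompt": "Subject: Refund request ->", "completion": " billing"},
    {"prompt": "Subject: App crashes on login ->", "completion": " technical"},
]

with open("train_data.jsonl", "w") as f:
    for example in training_examples:
        f.write(json.dumps(example) + "\n")
```

The resulting file can then be handed to the fine-tunes endpoint (for example via OpenAI’s `openai` command-line tool) to kick off a training run.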
And lastly, an important quote from the release:
“Performance tends to linearly increase with every doubling of the number of examples.”
I predict that this will lead to dramatic improvements in existing GPT-3 applications and unlock a whole new class of problems that can be solved with this new fine-tuning functionality.
If you have a problem (natural language generation, natural language understanding, classification, extraction) that you think could be solved with fine-tuning, or you have any questions or comments about the article, feel free to reach out!
