Hands-On Quickstart to Training Large Language Models

Quickly Build a ChatGPT-style model for your domain of expertise

Ariel Lubonja · 2 min read · May 20, 2023

The publication of pre-trained models on HuggingFace and the advent of LoRA and PEFT allow everyone to build useful models. So let’s do that!
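Why does LoRA make fine-tuning affordable? Instead of updating a full d×d weight matrix, it trains two small matrices of rank r and adds their product to the frozen weights, which shrinks the trainable parameter count enormously. A back-of-the-envelope sketch (the dimensions below are illustrative, not tied to any particular model):

```python
# LoRA parameter-count sketch: instead of updating a frozen W (d x d),
# train B (d x r) and A (r x d) and add the product B @ A to W.
def lora_trainable_params(d: int, r: int) -> int:
    """Parameters trained by a rank-r LoRA adapter on one d x d matrix."""
    return 2 * d * r  # B has d*r entries, A has r*d entries

d = 4096  # hidden size in the ballpark of a 7B-parameter model (illustrative)
r = 8     # LoRA rank (a common default)

full = d * d                        # full fine-tuning of one matrix
lora = lora_trainable_params(d, r)  # LoRA adapter on the same matrix
print(f"full: {full:,}  lora: {lora:,}  ratio: {full // lora}x fewer")
```

At rank 8 on a 4096-wide matrix that is a 256x reduction per adapted matrix, which is why LoRA fine-tuning fits on a free Colab GPU while full fine-tuning does not.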

Start here

  1. Open this Colab notebook [1]. I have added comments and text to guide you. Good luck!

A few things to keep in mind

  1. I highly recommend one of the HuggingFace courses. They have sections that explain most aspects of training.
  2. Decide what kind of NLP task you want to do: Causal_LM, Seq2Seq_LM, SEQ_CLS, TOKEN_CLS:
  • Causal_LM: ChatGPT-like; forms sentences by predicting the next word from the previous words.
  • Seq2Seq_LM: Transforms a sentence (or sentences) into other sentences. Translation is the classic example; summarization and explanation are others.
  • SEQ_CLS: Sequence classification. Examples: sentiment analysis (Is this review positive? What’s the tone of this Tweet?), intent recognition.
  • TOKEN_CLS: Token classification. Example use: Named Entity Recognition. Is the “Bing” in “Chandler Bing” a last name, or a sound?
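To make the Causal_LM idea concrete, here is a toy next-word predictor: a pure-Python bigram counter (nothing to do with any real transformer) that greedily picks the word that most often followed the previous word in its training text — the same loop, scaled up enormously, is what a causal LM runs.

```python
from collections import Counter, defaultdict

def train_bigrams(text: str) -> dict:
    """Count which word follows which in the training text."""
    words = text.lower().split()
    follows = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        follows[prev][nxt] += 1
    return follows

def predict_next(follows: dict, word: str) -> str:
    """Greedy causal-LM step: return the most frequent successor of `word`."""
    return follows[word.lower()].most_common(1)[0][0]

model = train_bigrams("the cat sat on the mat the cat ran")
print(predict_next(model, "the"))  # -> "cat" ("cat" follows "the" twice, "mat" once)
```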

Basic Glossary

  1. Tokenization — converting words into numbers so that models can reason about them
  2. Pretrained — models that someone else has built and trained (usually at great expense — be sure to say thank you!) that you can then improve upon (fine-tune)
  3. Fine-tuning — “adding” to a pretrained model’s knowledge base. You want to do this if you want the model to answer domain-specific questions, e.g. questions about your company’s codebase, HR and hiring, meeting summaries, etc.
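Tokenization from item 1 can be sketched in a few lines. This is a toy word-level tokenizer with a made-up vocabulary; real models use subword schemes like BPE, but the words-to-ids idea is the same:

```python
def build_vocab(corpus):
    """Assign each unique word an integer id (id 0 reserved for unknown words)."""
    vocab = {"<unk>": 0}
    for word in " ".join(corpus).lower().split():
        vocab.setdefault(word, len(vocab))
    return vocab

def tokenize(vocab, sentence):
    """Map words to ids so a model can operate on numbers."""
    return [vocab.get(w, vocab["<unk>"]) for w in sentence.lower().split()]

vocab = build_vocab(["hello world", "hello model"])
print(tokenize(vocab, "hello new world"))  # -> [1, 0, 2] ("new" is unknown)
```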
Mandatory Cool Photo. Credit: https://youtu.be/-cTlH6P_n-E?t=229


[1] https://colab.research.google.com/drive/17VpyJc40y7Oy1d437Rjy7PFcNDZISynF?usp=sharing. I found the original version of this Colab notebook in a YouTube video, but I lost the link. Sorry.



Ariel Lubonja

I am a PhD student in Computer Science at Johns Hopkins University. Area: High Performance Computing, Graph Machine Learning