Model Training Techniques

Pre-training, Continued Pre-training, and Fine-tuning

Bowen Li
Jun 25, 2024

What are foundation models?

Typically, we start with a model that has been pretrained on trillions of tokens to perform next-token prediction, then fine-tuned for a particular task, such as instruction following or chat.
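
To make next-token prediction concrete, here is a minimal sketch using Hugging Face transformers, with GPT-2 standing in as the pretrained base model (any causal LM would do):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (batch, seq_len, vocab_size)

# The distribution over the next token comes from the last position.
next_token_id = logits[0, -1].argmax().item()
print(tokenizer.decode([next_token_id]))  # e.g. " Paris"
```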

What are instruction models?

Instruction-based models are trained and optimized to follow specific directions given in the prompt, while chat-based models are trained and optimized to handle conversational formats over multiple turns, maintaining context and coherence throughout the conversation.
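
The difference shows up in the input format. A sketch below contrasts a single-directive instruction prompt with a multi-turn chat rendered through a model's chat template; the exact template varies by model family, and zephyr-7b-beta is used here only as an example of a model that ships one:

```python
from transformers import AutoTokenizer

# An instruction model typically sees one directive and (optionally) input text:
instruction_prompt = "Summarize the following article in one sentence:\n\n{article}"

# A chat model sees a structured conversation, which the tokenizer renders
# into the model's expected multi-turn template:
tokenizer = AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-beta")
messages = [
    {"role": "user", "content": "What is continued pre-training?"},
    {"role": "assistant", "content": "Further pre-training on domain text."},
    {"role": "user", "content": "Why is it cheaper than pre-training?"},
]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)  # role-tagged turns, ending with a cue for the assistant's reply
```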

Pre-training

Pre-training a model creates a foundation that serves as the base for all downstream tasks. The process includes (a minimal sketch follows the list):

  • Defining the architecture for the model
  • Curating a dataset
  • Training the model
  • Evaluating its performance
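
The sketch below walks through those four steps at toy scale: a small GPT-2-style architecture, a placeholder corpus, a causal language modeling loop, and the loss as a crude evaluation proxy. The configuration and data are illustrative, not a real recipe:

```python
import torch
from transformers import AutoTokenizer, GPT2Config, GPT2LMHeadModel

# 1. Define the architecture (a tiny GPT-2 variant, randomly initialized).
config = GPT2Config(n_layer=4, n_head=4, n_embd=256)
model = GPT2LMHeadModel(config)  # training from scratch, no pretrained weights
tokenizer = AutoTokenizer.from_pretrained("gpt2")
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

# 2. Curate a dataset (placeholder strings standing in for web-scale text).
corpus = ["Unlabeled web text goes here ...", "More raw documents ..."]

# 3. Train with the next-token objective; passing labels = input_ids makes
# the library shift targets internally for the causal LM loss.
model.train()
for text in corpus:
    batch = tokenizer(text, return_tensors="pt")
    loss = model(**batch, labels=batch["input_ids"]).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

# 4. Evaluate (here just the training loss; real runs use held-out perplexity
# and downstream benchmarks).
print(f"loss: {loss.item():.3f}")
```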

Continued Pre-training (CPT)

CPT is a type of fine-tuning that enhances an LLM with deeper domain knowledge in areas such as medicine and finance. More importantly, CPT doesn't require labeled training data, and it is far more cost-effective than pre-training from scratch.
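
A minimal sketch of CPT: the same next-token objective as pre-training, but starting from pretrained weights and running on plain, unlabeled domain text. GPT-2 and the two-sentence corpus below are placeholders:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")  # resume from a checkpoint
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)  # gentler LR than pre-training

# Raw domain documents; no labels or annotations needed.
domain_corpus = [
    "The patient presented with acute myocardial infarction ...",
    "Basis risk arises when the hedge instrument and the exposure diverge ...",
]
model.train()
for text in domain_corpus:
    batch = tokenizer(text, return_tensors="pt")
    loss = model(**batch, labels=batch["input_ids"]).loss  # same causal LM loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```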

Instruction Fine-tuning (IFT)

IFT is used to teach a model how to perform a particular task. It typically requires thousands of examples and can be used for a specific purpose such as improving question answering, extracting key information, or adopting a certain tone.
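
A minimal sketch of one IFT step, using a hypothetical (instruction, response) pair for an information-extraction task. A common pattern is to compute the loss only on the response tokens by masking the prompt positions with -100, the ignore index for the transformers loss:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

examples = [  # hypothetical pair; real IFT datasets contain thousands
    {"instruction": "Extract the date from: 'Invoice issued 2024-06-25.'",
     "response": "2024-06-25"},
]
model.train()
for ex in examples:
    prompt = f"### Instruction:\n{ex['instruction']}\n\n### Response:\n"
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    full_ids = tokenizer(prompt + ex["response"], return_tensors="pt").input_ids
    labels = full_ids.clone()
    labels[:, : prompt_ids.shape[1]] = -100  # mask prompt tokens from the loss
    loss = model(input_ids=full_ids, labels=labels).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

Masking the prompt this way keeps the model from being trained to reproduce the instruction itself, so the gradient signal comes only from the desired response.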


Bowen Li

Casual researcher at RMIT and open-source contributor.