High-Level LLMs Intro

Romy Adi Rotbar
3 min read · Jan 1, 2024


What is an LLM? How does it work? How can you use it for your own needs and applications? In this series, I’ll cover the main concepts behind LLMs, and hopefully, I’ll address your questions!

What is an LLM?

LLM stands for Large Language Model: a deep neural network trained on terabytes of data and containing billions of parameters. These parameters are learned during training and define the model’s behavior. For instance, GPT-3.5, the model behind the well-known ChatGPT (developed by OpenAI), has 175 billion parameters. Various companies have trained different LLMs, using varying amounts of data and parameters to meet specific needs. However, even smaller models with 7 billion parameters are quite expensive to train.
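To get a feel for what “parameters” means in practice, here is a minimal sketch (assuming the Hugging Face transformers library) that loads the small, openly available GPT-2 model and counts its parameters. The billion-parameter models above are the same idea at a vastly larger scale:

```python
# Count the parameters of a small open model (GPT-2).
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params:,} parameters")  # roughly 124 million for GPT-2 small
```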

The key idea is to train a single, large model capable of generating coherent and contextually relevant responses across a broad spectrum of inputs and tasks.

LLM tasks

LLMs can handle a variety of tasks, depending on their architecture (which will be explored further in this series). However, all of them share the same underlying concept: predicting the next word.
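To make “predicting the next word” concrete, here is a minimal sketch using the Hugging Face transformers library and the small GPT-2 model (my choice for illustration; any causal language model would do). The model assigns a score to every token in its vocabulary as a candidate continuation of the prompt, and we print the five highest-scoring ones:

```python
# Minimal next-word prediction sketch with Hugging Face transformers and GPT-2.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (batch, sequence_length, vocab_size)

# Scores for the token that would come right after the prompt.
next_token_logits = logits[0, -1]
top5 = torch.topk(next_token_logits, 5)
for score, token_id in zip(top5.values, top5.indices):
    print(repr(tokenizer.decode(token_id.item())), round(score.item(), 2))
```

Text generation is just this step in a loop: pick a next token, append it to the prompt, and repeat.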

The main tasks are:

  1. Chatbot — the basic task; all the other tasks build on it in one way or another.
  2. Summarization of texts, including conversations (see the sketch after this list)
  3. Writing an essay based on an input (prompt)
  4. Translation — between two languages, or from a natural language to code
  5. Information retrieval

Example of translating natural language into Java code
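For example, the summarization task above takes only a few lines of code. Here is a minimal sketch using the OpenAI Python SDK (my choice for illustration; the model name, prompt, and sample conversation are assumptions, and an API key must be set in the OPENAI_API_KEY environment variable):

```python
# Minimal summarization sketch using the OpenAI Python SDK (v1.x).
# Assumes OPENAI_API_KEY is set; "gpt-3.5-turbo" is one chat-capable model.
from openai import OpenAI

client = OpenAI()

# A made-up conversation to summarize.
conversation = """Alice: Can we move the launch to Thursday?
Bob: Works for me, but QA needs one more day.
Alice: OK, Friday then. I'll update the calendar."""

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "Summarize the conversation in one sentence."},
        {"role": "user", "content": conversation},
    ],
)
print(response.choices[0].message.content)
```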

These capabilities will keep improving, allowing us to build amazing applications on top of them.

Known LLMs:

High-level summary of well-known LLMs and their parameter counts
  1. GPT family (Generative Pre-trained Transformer)
  • Company: OpenAI
  • Number of Parameters: 175 billion (in GPT-3.5)

2. BERT (Bidirectional Encoder Representations from Transformers)

  • Company: Google
  • Number of Parameters: BERT-Base has 110 million parameters and BERT-Large has 340 million.

3. T5 (Text-To-Text Transfer Transformer)

  • Company: Google
  • Number of Parameters: T5 comes in several sizes, from T5-Small (60 million parameters) up to T5-11B (11 billion).

4. XLNet

  • Company: Google, Carnegie Mellon University, and others
  • Number of Parameters: XLNet models vary in size; the larger models can have around 340 million parameters.

5. RoBERTa (Robustly optimized BERT approach)

  • Company: Facebook AI
  • Number of Parameters: RoBERTa models vary in size; for example, RoBERTa-Large has 355 million parameters.

6. ALBERT (A Lite BERT)

  • Company: Google
  • Number of Parameters: ALBERT models come in different sizes; ALBERT-base has about 12 million parameters, and the largest (ALBERT-xxlarge) has about 235 million.

7. ERNIE (Enhanced Representation through kNowledge Integration)

  • Company: Baidu
  • Number of Parameters: ERNIE models come in different sizes; ERNIE 2.0 Large has 340 million parameters.

But… how does it actually work? See you in the next post!

I hope you enjoyed and learned! If you notice an error or have something you’d like to share, please drop a comment below!
