An absolute beginner's guide to making sense of key AI terms in 2023

Thiyagarajan Maruthavan (Rajan)
15 min read · Apr 4, 2023


Thanks to the explosion of ChatGPT, terms once used mainly by AI researchers are getting thrown into mainstream conversations. If you are in one of these conversations and don't know the terms, it can trigger an alienating cocktail of impostor syndrome and FOMO. It is the job of those who understand the field to check their curse of knowledge, but given the pace of change that rarely happens.

This guide helps make sense of some of these commonly used terms so that you can follow and take part in such discussions more confidently.

There are two main sections to this guide. The first section defines things metaphorically and assumes that you have zero knowledge of computers. It is closer to how Steve Jobs described a computer: "a bicycle for the mind". It is the "explain it like I am five years old" (ELI5) version. If you don't prefer the ELI5 version, skip to the second section, which explains the same things as mental models, in a way that anyone 18 or older who is exposed to technology, computers and programming can understand. If you have crossed these two stages and want to reason from first principles, then you must go to the source; the last section refers to key original papers and texts.

Explained as a metaphor

Computers are tools that create enormous leverage for our thinking, which is why they were dubbed a "bicycle for the mind". However, they have always relied on our instructions. They can carry out our instructions super fast, but the instructions must come from us humans. These instructions, which started out as machine language, became more sophisticated over time and grew into higher-level languages. Specialists who understand these languages, also called programmers, package these instructions into easy-to-use recipe books so that normal folks like us can use them as shortcuts to get something done by the computer. These packaged recipe books have the generic name of "applications" or "apps"; they are more familiar through their specific names such as Chrome, MS Office or TikTok. It turns out that the industry of building this tool, improving the languages for instructing it and packaging the recipe books is one of the highest-impact, margin-rich industries. Over the last 10 years, companies in this industry such as Microsoft and Adobe have overtaken oil companies in stock-market value, with the biggest now sitting in the top five.

As early as 70 years ago, a few early designers of this tool dreamt of a different scenario. They asked the question: can this tool learn and formulate its own instructions from observing us humans, just like babies do? If it could, it would radically change how this industry is organized. You would no longer have to instruct, i.e. program, these machines; you would instead ask them to learn. If that learning became sophisticated enough to resemble an average human adult, it would be called intelligence. As a parallel to human intelligence, this is referred to as artificial intelligence.

This dream scenario has gone through its own ups and downs. The vision has led many to dedicate their lives to it and make big breakthroughs. The philosophical implications are explored in several movies: Terminator, Bicentennial Man, The Matrix, A.I., 2001: A Space Odyssey (HAL) and Ex Machina.

In 2023 a big shift has happened: what was once a dream scenario is now seeing mainstream adoption, and that is why terms once used by a small group of people are being broadly discussed.

Let’s look at some key terms used when people chat about this dream scenario.

Machine Learning

Machine learning is the technique, or art, that enables computer systems to automatically learn and improve from experience without being explicitly instructed or programmed. Instead of receiving instructions, the machines are given observations of both inputs and outputs and are expected to learn from them.

Think of machine learning as the process of learning to cook. Just as there are different cuisines and different ways of cooking, there are different models of machine learning.

There are three types of machine learning:

  • Supervised Learning
  • Unsupervised Learning
  • Reinforcement Learning

Supervised Learning

Supervised learning can be thought of as learning how to cook a dish under the guidance of a mentor who provides feedback on the cooking.

Unsupervised Learning

Unsupervised learning can be thought of as a chef experimenting with ingredients to discover new flavor combinations without a recipe or a mentor's guidance.

Reinforcement Learning

Reinforcement learning can be thought of as a chef learning how to cook a dish by repeatedly trying different techniques and receiving rewards or punishments.
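
To make the supervised case above concrete, here is a minimal Python sketch using scikit-learn; the Iris flower dataset and logistic regression are just illustrative choices, not the only way to do this.

```python
# Minimal supervised-learning sketch: the "mentor" is the labeled data.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)            # inputs (measurements) and labels (species)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=1000)    # the "apprentice cook"
model.fit(X_train, y_train)                  # learn from labeled examples

print("accuracy on unseen examples:", model.score(X_test, y_test))
```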

NLP — Natural Language Processing

NLP refers to the subset of machine learning where the ingredients of learning are limited to natural language as used by humans.

Machine Learning Model

Machine learning models are different from machine learning. If machine learning is the art of teaching a computer system how to cook, a machine learning model is a specific recipe for a dish that has been developed and refined over time.

Neural Networks

A neural network is a type of machine learning model that is inspired by the structure and function of the human brain.

Imagine you are trying to teach a group of chefs how to recognize different ingredients by observation. A neural network is like organizing these chefs into several rows, or teams, where each chef is responsible for recognizing specific characteristics. The first row might identify basic features like size, color or shape. The second row uses the first row's findings to recognize more complex features, like texture or fragrance. Each subsequent team identifies even more intricate details based on the previous teams' work, and the final team decides which ingredient they are looking at.

In this cooking analogy, each chef is like a neuron, and their decisions are based on the information they receive from the previous row. Neurons in a neural network receive input, process it, and pass the result to the next layer. The knowledge held by each row is called its weights, and it determines the importance of each input. During the learning process, these weights are adjusted to minimize the error between the network's predictions and the actual outcomes, so each row refines its contribution based on the outputs seen during training.
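
Here is a rough sketch of those "rows of chefs" in plain Python with NumPy, assuming made-up layer sizes and random (untrained) weights; each layer multiplies its input by its weights and passes the result on.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two "rows of chefs" (layers). The weights are each row's accumulated knowledge.
W1 = rng.normal(size=(4, 8))   # first layer: 4 input features -> 8 hidden neurons
W2 = rng.normal(size=(8, 3))   # second layer: 8 hidden neurons -> 3 output classes

def relu(x):
    return np.maximum(0, x)    # a simple "decision rule" each chef applies

def forward(x):
    hidden = relu(x @ W1)      # first row reports its findings
    return hidden @ W2         # final row makes the call

x = rng.normal(size=(1, 4))    # one "ingredient" described by 4 numbers
print(forward(x))              # raw scores for 3 possible ingredients
```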

Deep Learning

Deep learning is a subset of neural networks (a machine learning model) that uses many rows, i.e. layers (hence "deep"), to learn and model more complex features and patterns. If machine learning is cooking, deep learning is advanced cooking. It is believed that almost any complex learning problem can be tackled by increasing the number of rows (layers) in a neural network. Up until 2012 it was believed that this could not be done in a practical way. Researchers Alex Krizhevsky, Ilya Sutskever and Geoffrey Hinton created a breakthrough with a model known as AlexNet, built for the ImageNet competition, that opened the floodgates for machine learning techniques. Their work showed that if the rows or layers are organized in a specific way, and a special type of processor called a GPU is used, then computers can learn to see and recognize objects somewhat like a baby does. This field of asking computers to see is called computer vision, and deep learning created a breakthrough in it.

Transformers

A Transformer is a machine learning model. Think of Transformers as chefs, not just refined recipes. They are designed to understand and process sequences, or combinations, of information. Earlier popular models looked at inputs one element at a time and struggled to keep the wider context in view; the Transformer is the first model of this kind that takes the whole context into account.

For a machine to learn this way, it has to understand the relationships and dependencies between different ingredients and techniques. A Transformer model learns the relationships and dependencies between words or phrases in the text that is fed to it.

This model was introduced in the paper "Attention Is All You Need" and has been the most revolutionary research work of the last five years. It has beaten every other model and works across a wide variety of domains: it can translate languages, answer questions and even handle computer vision tasks. Researchers are still discovering new artificial intelligence tasks that a Transformer can handle. The reason it is so revolutionary is its general-purpose applicability.
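
At the heart of the Transformer is the attention step described in that paper: every token scores how relevant every other token is to it and mixes in information accordingly. Below is a bare-bones NumPy sketch with toy sizes and random values standing in for learned weights.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # Each token's query is compared with every token's key...
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    # ...the scores become "how much should I attend to you" weights...
    weights = softmax(scores, axis=-1)
    # ...and each token's output is a weighted mix of all the values.
    return weights @ V

rng = np.random.default_rng(0)
seq_len, d = 5, 16                       # 5 tokens, 16-dimensional representations
Q = rng.normal(size=(seq_len, d))
K = rng.normal(size=(seq_len, d))
V = rng.normal(size=(seq_len, d))
print(attention(Q, K, V).shape)          # (5, 16): one context-aware vector per token
```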

Back Propagation Algorithm

Backpropagation is the most popular algorithm (a set of step-by-step instructions) used by a computer to train a machine learning model such as a neural network.

Backpropagation is the technique by which the recipe improves, or the chef learns. Using the inputs (ingredients), an output (a dish) is created for diners. After tasting the dish, the diners provide feedback on its quality, such as flavor, texture or presentation.

In a neural network, this feedback comes in the form of a loss function, which measures the difference between the model’s output and the expected output (ground truth).

Based on the feedback, the chef identifies the aspects of the dish that need improvement, such as adjusting ingredient quantities or altering the cooking time.

The chef memorizes this feedback, and that memory is called the weights.

The chef continues to receive feedback and make adjustments to their techniques and recipes until the diners are satisfied with the dish.
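
Here is a toy sketch of that feedback loop in Python: a "recipe" with a single weight, a squared-error loss standing in for the diners' feedback, and repeated small adjustments. Real backpropagation applies the same idea to millions of weights at once using the chain rule.

```python
# Toy example: learn w so that prediction = w * x matches the target y = 3 * x.
data = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]   # (ingredient, desired dish)
w = 0.0                                        # the chef starts with no knowledge
lr = 0.05                                      # how boldly to adjust after feedback

for epoch in range(100):
    for x, y in data:
        pred = w * x                           # cook the dish
        error = pred - y                       # how far off the diners say it is
        grad = 2 * error * x                   # gradient of squared error w.r.t. w
        w -= lr * grad                         # adjust the "weight" (the memory)

print(round(w, 3))                             # approaches 3.0
```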

Token

A token is a unit of text that represents a single meaningful element, such as a word, subword or a punctuation mark. Tokens are created by breaking down sentences or larger text into smaller, more manageable pieces for analysis, processing, or generating new text.

Using the cooking metaphor, tokens can be thought of as the individual ingredients that make up a recipe. Just as a recipe is a combination of various ingredients, a sentence or text is a combination of tokens. Each token contributes to the overall meaning or structure of the text, just as each ingredient contributes to the flavor and texture of a dish.

Tokenization is the process of converting a text into tokens, which involves splitting the text based on specific rules or criteria, such as spaces, punctuation marks, or other delimiters.
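
To make this concrete, here is a deliberately naive tokenizer sketch in Python. Real LLM tokenizers (for example the byte-pair-encoding tokenizers used by GPT models) split text into subword pieces rather than whole words, but the idea of breaking text into small units is the same.

```python
import re

def naive_tokenize(text):
    # Keep runs of word characters as one token; punctuation becomes its own token.
    return re.findall(r"\w+|[^\w\s]", text)

print(naive_tokenize("Transformers aren't magic, they're math."))
# ['Transformers', 'aren', "'", 't', 'magic', ',', 'they', "'", 're', 'math', '.']
```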

Tuning

Tuning, also known as fine-tuning, is a step in the machine learning process where a pre-trained model is further trained on a specific dataset or task to improve its performance and adapt to the nuances of the new data or problem. This process helps the model to become more specialized and proficient in the target task or domain.

Continuing with the cooking metaphor, imagine a chef who has a broad culinary background and general cooking skills. Tuning is like giving that chef special training in a specific cuisine or technique to refine their expertise in that area, like a master chef training further in Italian cuisine.

Tuning is a common practice in machine learning, as it allows researchers and developers to leverage the capabilities of pre-trained models and customize them to address specific tasks or problems, saving time and resources compared to training a new model from scratch.

Transfer Learning

Transfer Learning is like a chef leveraging their existing culinary knowledge and skills to quickly adapt to a new cuisine or cooking style. Building on their foundational knowledge from pre-training, the chef can apply their understanding of techniques, ingredients, and flavor combinations to create dishes in the new cuisine.

Pre-training

Pre-training is like a chef's foundational culinary education and early experiences. During pre-training, a model like GPT learns the general structure of language, grammar, syntax and some factual knowledge by processing a massive corpus of text from diverse sources. The "P" in GPT stands for pre-trained.

Fine-tuning

Fine-tuning is the process of adjusting the chef's techniques, experimenting with new ingredients, or altering cooking times to adapt their skills to the new cuisine.

In machine learning, fine-tuning refers to the process of adapting a pre-trained model's parameters using a smaller, task-specific dataset. This enables the model to generate more accurate and relevant outputs for the new task, building upon the general knowledge it acquired during pre-training.
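
As a rough illustration of what that looks like in code, here is a hedged sketch using the Hugging Face transformers library; the base model (distilbert-base-uncased), the IMDB movie-review dataset and the hyperparameters are illustrative placeholders, not recommendations.

```python
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# A pre-trained "chef" plus a small task-specific dataset (sentiment labels here).
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2)

dataset = load_dataset("imdb")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length")

train_data = dataset["train"].shuffle(seed=42).select(range(2000)).map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="finetuned-model", num_train_epochs=1,
                           per_device_train_batch_size=8),
    train_dataset=train_data,
)
trainer.train()   # adjusts the pre-trained weights for the new, narrower task
```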

Training Dataset

A training dataset is like a recipe book that a chef learns from, providing a structured guide for the chef to follow and learn new techniques and flavor combinations. In machine learning, the training dataset is a set of labeled examples that teach a model how to make accurate predictions by identifying patterns and relationships between the input features and the output labels.

LLM — Large Language Model

LLM stands for Large Language Model. LLMs are deep neural networks (machine learning models) built specifically for language. Due to the popularity of Transformers, most LLMs are based on the Transformer. To continue with the cooking metaphor, an LLM can be thought of as a master chef who has studied and practiced countless recipes, techniques and cuisines. This vast experience enables the master chef to understand and create a wide range of dishes, adapt to new ingredients, and innovate by combining different culinary elements.

So there is the art of cooking (machine learning), advanced cooking (deep learning), refined recipe books (machine learning models), chefs (Transformers) and master chefs (LLMs).

Generative

The general expectation of a machine learning model has been that it will correctly identify what it has learnt from the enormous dataset it was trained on. When a model is able to express things that are not explicitly in the training set, it is said to be generative. It is like a chef who creates a new dish based on their knowledge and understanding of various ingredients and techniques.

GPT

GPT stands for Generative Pre-trained Transformer.

Transformer — In GPT, the Transformer is used to build an LLM, i.e. a large language model, which means it is primarily built for language tasks. An LLM is a deep neural network, and a neural network is a machine learning model. The unique aspect of this model is that it takes context into account when reading input text: it recognises that the position of a word in a sentence changes its meaning. The Transformer is like a master chef, so powerful that it can learn and cook many different cuisines.

Pre-training — refers to learning from a large dataset, much like a chef attending culinary school. During this stage, the chef learns fundamental cooking techniques, basic recipes, and the characteristics of various ingredients.

Generative — When a model is able to express things which are not explicitly in the training set then it is said to be generative. It is like a chef who creates a new dish based on their knowledge and understanding of various ingredients and techniques.

GPT (through ChatGPT) saw unprecedented adoption in the history of technology, reaching 100 million users in less than three months.

GPT, T5, BART

They are all large language models (i.e. master chefs) from different large companies, and they differ slightly in their approach.

GPT is the master chef from the house of OpenAI

BART is the master chef from the house of Facebook.

T5 is the master chef from the house of Google.

The difference among the three is that GPT is a "freestyle" cooking master chef, T5 is a "fusion cooking" master chef and BART is a "precision cooking" master chef. Because of these differences, each is better at some things than the others. GPT is the most popular of them all.

ChatGPT

If GPT is the master chef with broad expertise in text generation, ChatGPT is the specialized personal chef who engages in conversations with customers to cater to their specific needs. Together, they form a team that can produce engaging, contextually relevant, and personalized dialogues for users.

Parameters

This term is often used interchangeably with the term weights. If you look at the backpropagation definition, you will notice that learning involves collecting feedback and making adjustments to the ingredients based on that feedback. The memory of those adjustments is the weights, and the more of this memory a network has, the more it can learn. GPT-1 had 117 million parameters (or memory points), GPT-2 had 1.5 billion and GPT-3 has 175 billion, while GPT-4's parameter count has not been officially disclosed (rumored figures run into the trillions). This is not to be confused with the conversational memory that GPT-3 and GPT-4 have.
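
To see what a parameter count actually counts, here is a tiny PyTorch sketch; the layer sizes are arbitrary and chosen only for illustration.

```python
import torch.nn as nn

# A small network: 784 inputs -> 256 hidden units -> 10 outputs.
model = nn.Sequential(
    nn.Linear(784, 256),
    nn.ReLU(),
    nn.Linear(256, 10),
)

num_params = sum(p.numel() for p in model.parameters())
print(num_params)   # 203530 weights and biases -- this is the model's "parameter count"
```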

I have deliberately left out a few foundational terms such as the perceptron, expert systems, decision trees, support vector machines, recurrent neural networks (RNN), convolutional neural networks (CNN), generative adversarial networks (GAN) and long short-term memory (LSTM), which are important for understanding things from a historical and evolutionary perspective.

Metaphors are never perfect. They are like a rickety sofa in your brain on which you can seat a more complex understanding. Now that you have this rickety sofa for machine learning and AI, go forward and read more to improve your understanding.

Explained as a mental model

Metaphors are good for starting an understanding, but they are rarely actionable. If you are a technologist and want to make choices based on this knowledge, metaphors are not enough; a mental model or a framework is critical. To understand what is happening in the field of machine learning, AI and LLMs, it is useful to think of an LLM as the equivalent of an operating system (OS).

LLM — Large Language Model is like the operating system.

GPT & BART — When comparing LLM to the operating system GPT3 is like Windows NT. while Google BART is like Macintosh. BART is more robust technical but GPT3 has more widespread adoption.

Neural Engine — A neural engine, also known as a neural processing unit (NPU) or AI accelerator, is a specialized hardware component designed to efficiently handle the computational tasks associated with artificial neural networks, deep learning, and machine learning algorithms. Many technology companies have developed their own neural engines or AI accelerators, including Apple’s Neural Engine used in their A-series and M-series chips, Google’s Tensor Processing Units (TPUs), and NVIDIA’s Deep Learning Accelerators (DLAs). Neural engines are optimized for the unique demands of AI workloads, such as performing matrix multiplications, tensor operations, and other complex mathematical functions at high speed and with low power consumption. This makes them more efficient for AI tasks compared to general-purpose processors like CPUs and, to some extent, GPUs.

Transformer — Just as operating systems have architecture designs, LLMs have architecture designs. In the early days of operating systems there was a huge debate over whether an OS should be monolithic or a microkernel. The Transformer is like the microkernel: the winning architecture. Several different LLMs will be built, and every big technology company will want its own, but what is clear is that every one of them will be based on the Transformer going forward.

LLM Applications — End-user applications with user interfaces that are built on top of an LLM can be called LLM applications. ChatGPT is a great example of an LLMApp built on top of the LLMOS called GPT; the latest version of that LLMOS is GPT-4.
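
As a sketch of how thin an LLMApp can be, the snippet below calls the GPT model behind ChatGPT through OpenAI's API as it looked in the pre-1.0 openai Python library of early 2023; the model name, prompt and environment variable are illustrative, and you would need your own API key.

```python
# A tiny LLM application: a user interface could simply wrap this call.
import os
import openai

openai.api_key = os.environ["OPENAI_API_KEY"]

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",                       # the LLM "operating system" underneath
    messages=[
        {"role": "system", "content": "You are a helpful cooking assistant."},
        {"role": "user", "content": "Suggest a quick weeknight pasta dish."},
    ],
)
print(response["choices"][0]["message"]["content"])
```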

LLMStack refers to the various sub-components, organized as layers. These layers are defined so that the components within each layer are independent units that can progress at their own rate. The base layer is the hardware layer, which is now getting specialized as the neural processing unit.

Base LLM — The next layer is the base LLM; it is the equivalent of the kernel in an operating system. Base LLMs go through training on large, generic datasets. As homogenization of models happens, industry players are converging on standardization here, and those with an open mindset are sharing these base models.

LLM Adap layer — Base LLMs are not finished products and should not be used directly. "Adap" stands for the adaptation layer; this is where the model goes through adaptation. The de facto adaptation has been fine-tuning. This layer can also include model patching as well as temporal adjustments for providing real-time outputs.

LLM Eval layer — This layer provides context for the base LLM by offering a means to track progress, understand models, and document their capabilities and biases. The context includes measures of accuracy, fairness, efficiency and environmental impact.

LLM SALI layer — Refers to the safety and alignment layer, which ensures that base LLMs are reliable, robust and interoperable. This is very important when considering the potential real-world applications of these models. It also covers forecasting the emergent behaviors of base LLM models.

LLMOps

LLMOps is what happens when you take an existing state-of-the-art model like GPT-3 and refine it into a model for your specific use case by teaching it with your industry-specific data. For example, your use case might have a desired output style and format (e.g. contract review). Through fine-tuning you can use your proprietary datasets to refine the LLM's ability to produce output that fits that description. This process of training, refining and deploying LLMs is LLMOps.
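
As a hedged illustration of one step in that workflow, the sketch below writes made-up proprietary examples into the JSONL prompt/completion format that fine-tuning services (such as OpenAI's legacy fine-tunes endpoint) expected around this time; the file name and the contract-review examples are invented for illustration.

```python
import json

# Hypothetical proprietary examples: contract clauses paired with the desired review note.
examples = [
    {"prompt": "Clause: Either party may terminate with 5 days notice.\nReview:",
     "completion": " Flag: termination notice period is unusually short."},
    {"prompt": "Clause: Fees are payable within 30 days of invoice.\nReview:",
     "completion": " OK: standard net-30 payment terms."},
]

# JSONL (one JSON object per line) is the usual interchange format for fine-tuning data.
with open("contract_review_finetune.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")

# The file would then be uploaded to the fine-tuning service, and the resulting
# fine-tuned model deployed and monitored -- that operational loop is LLMOps.
```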

"The limits of my language is the limits of my world. " - Ludwig Wittgenstein

With an expanded vocabulary you can imagine newer worlds


Thiyagarajan Maruthavan (Rajan)

Assisting founders in avoiding getting lost in the product-market fit maze in AI SaaS.