Large Language Models: What, How, Why?

Nicholas Wade
MLPurdue
5 min read · Mar 6, 2023


*Some accuracy is sacrificed for simpler explanations; check out the papers and links at the bottom for deeper, more complex explanations.

What Are LLMs?

According to a blog post by NVIDIA, large language models are “deep learning algorithm[s] that can recognize, summarize, translate, predict, and generate text and other content based on knowledge gained from massive datasets.”

Users generally interact with them by inputting text, usually some sort of request or question, and the model generates language in response. This can also happen automatically, as with GitHub Copilot: a code auto-completion tool, powered by OpenAI's Codex model trained on public GitHub code, that runs continuously while you write code in an IDE.

The most popular models currently are OpenAI's GPT-3, Google's LaMDA, Facebook's newly public LLaMA, and the previously mentioned GitHub Copilot. LaMDA isn't available for public use; LLaMA can be downloaded so you can write your own inference scripts; OpenAI's ChatGPT is currently free online, with a paid pro version; and GitHub Copilot is paid, except for students, who get it free.

[Image: ChatGPT]
[Image: Example of GitHub Copilot auto-complete]

How Do LLMs Work?

Before Google's revolutionary paper "Attention Is All You Need" in 2017, most language models used recurrent neural networks (RNNs). The problem with this approach is that the model is fed the input in sequential order, so at each step it only knows the words that came before the current word. Google introduced the transformer, an architecture that pays "attention" to the entire input at all times, essentially giving these models a sense of context. This is where the LLM revolution started.
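The core of that attention mechanism can be sketched in a few lines of NumPy. This is a stripped-down, single-head version for illustration; the actual transformer uses multiple heads, learned projection matrices, and (for language modeling) a causal mask, but the key idea, every position looking at every other position at once, is all here:

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention from "Attention Is All You Need".

    Each output row is a weighted average of ALL value rows, so every
    position can "see" the whole input at once -- unlike an RNN, which
    only sees earlier tokens as it steps through the sequence.
    """
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # similarity of every query to every key
    # Softmax over each row turns scores into attention weights
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

# Toy self-attention: 4 "tokens", embedding dimension 8, random vectors
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
out = attention(x, x, x)  # queries, keys, and values all come from x
print(out.shape)          # (4, 8): one context-aware vector per token
```

Each of the four output vectors now mixes in information from all four input tokens, which is exactly the "context" the RNN was missing.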

[Image: RNN vs. Transformer]

Creating an LLM is roughly a four-step process:

  1. Gather a dataset — usually web scraping
  2. Develop the model architecture — most are similar and use the transformer architecture
  3. Train the model — explained below
  4. Tune the model — e.g., Reinforcement Learning from Human Feedback (RLHF)

When explaining how LLMs are trained, it is easier to think in terms of images rather than language. Diffusion, the method behind models like OpenAI's DALL-E 2 and Stability AI's Stable Diffusion, works by taking an image from the dataset, adding noise to it, then asking the model to work backwards and recreate the original image from the noise. LLMs train analogously: they take chunks of words and predict the next word in the sequence, somewhat like predicting pixels in an image. With enough data (like all of Wikipedia, the New York Times, and Reddit), these models are able to produce output that is almost indistinguishable from real human language.
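The "chunks of words, predict the next one" idea can be made concrete with a tiny sketch. Real LLMs operate on subword tokens and much larger context windows, but the training pairs are built in essentially this way:

```python
# Build next-word training pairs from raw text.
# (Real LLMs use subword tokens and a fixed context window,
# but the objective is the same: given a chunk, predict what follows.)
text = "the cat sat on the mat"
words = text.split()

context_size = 3
examples = []
for i in range(len(words) - context_size):
    context = words[i : i + context_size]  # the chunk the model sees
    target = words[i + context_size]       # the word it must predict
    examples.append((context, target))

for context, target in examples:
    print(" ".join(context), "->", target)
# the cat sat -> on
# cat sat on -> the
# sat on the -> mat
```

Scale this up to terabytes of text, and "predict the next word" turns out to be enough to learn grammar, facts, and style.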

[Image: Example of something that may be asked while training a model]

After the model is trained, tuning is what makes something like ChatGPT possible. Humans give the model the kind of input they expect it to receive upon deployment, rate the output, and supply the "perfect" output (as determined by the human). This allows models to become more conversational, better at neutral explanation, or more creative in word choice and tone. Tuning matters because extremely large models trained on terabytes of internet data can come out too robotic or ill-suited for everyday human use. The difference between ChatGPT and GPT-3 is huge, even though ChatGPT is essentially the GPT-3 model with this tuning.
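Under the hood, those human ratings are used to train a separate reward model. A common formulation, described in the InstructGPT paper (Ouyang et al., 2022, in the references below), is a pairwise loss: the reward model should score the human-preferred answer higher than the rejected one. A minimal sketch of that loss, with made-up scores for illustration:

```python
import math

def reward_model_loss(r_preferred, r_rejected):
    """Pairwise loss for training an RLHF reward model:
    -log(sigmoid(r_preferred - r_rejected)).
    Small when the preferred answer already scores higher,
    large when the ranking is wrong."""
    margin = r_preferred - r_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Hypothetical scores a reward model might assign to two candidate answers
good, bad = 2.0, 0.5
print(reward_model_loss(good, bad))  # small: ranking already correct
print(reward_model_loss(bad, good))  # large: model ranks the answers wrong
```

The language model is then fine-tuned (via reinforcement learning) to produce outputs that this reward model scores highly, which is what nudges it toward the conversational behavior people expect from ChatGPT.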

[Image: Example of RLHF on ChatGPT]

Why Do We Want LLMs?

This seems almost useless to explain after seeing the explosion of ChatGPT. It is nearly impossible not to see your news, Instagram, LinkedIn, and Twitter feeds full of use cases. Nevertheless, many worry about LLMs taking jobs or making humans useless. As of right now, unless your job is using Wikipedia to explain things to other people, your job probably isn’t being taken over by ChatGPT.

It is better to think of these as tools, just like any other software or apps you use to make your workflow more efficient. These models are great for rapid prototyping, code auto-completion, scripting, text summaries, and explanations. This may sound like something that can take people's jobs, but these models are known to "hallucinate" answers to the questions or tasks they are given. Basically, they will give an extremely confident answer that may seem correct to a layman but is not right at all. This is why the user has to be knowledgeable about the task they are using the model for. If a layman uses ChatGPT to write their website, it could very easily be wrong or have plenty of security flaws.

Ok, so right now they may not be taking anybody's jobs or giving correct answers every time, but what does the future hold? As better datasets are created and more finely tuned models are released, we could potentially see LLMs used as financial advisors, legal advisors, personal assistants, teachers, or even life coaches. Of course, this is all far away, but it is exciting to dream about where they may take us!

Papers/References/Articles:

  1. Chen, M., Tworek, J., Jun, H., Yuan, Q., Pinto, H. P. de O., Kaplan, J., Edwards, H., Burda, Y., Joseph, N., Brockman, G., Ray, A., Puri, R., Krueger, G., Petrov, M., Khlaaf, H., Sastry, G., Mishkin, P., Chan, B., Gray, S., … Zaremba, W. (2021, July 14). Evaluating large language models trained on code. arXiv.org. Retrieved February 20, 2023, from https://arxiv.org/abs/2107.03374
  2. Dilmegani, C. (2023, February 3). Large language model training in 2023. AIMultiple. Retrieved February 20, 2023, from https://research.aimultiple.com/large-language-model-training/
  3. Lee, A. (2023, January 30). What are large language models used for and why are they important? NVIDIA Blog. Retrieved February 20, 2023, from https://blogs.nvidia.com/blog/2023/01/26/what-are-large-language-models-used-for/
  4. Ouyang, L., Wu, J., Jiang, X., Almeida, D., Wainwright, C. L., Mishkin, P., Zhang, C., Agarwal, S., Slama, K., Ray, A., Schulman, J., Hilton, J., Kelton, F., Miller, L., Simens, M., Askell, A., Welinder, P., Christiano, P., Leike, J., & Lowe, R. (2022, March 4). Training language models to follow instructions with human feedback. arXiv.org. Retrieved February 20, 2023, from https://arxiv.org/abs/2203.02155
  5. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., & Polosukhin, I. (2017, December 6). Attention is all you need. arXiv.org. Retrieved February 20, 2023, from https://arxiv.org/abs/1706.03762
  6. Google. (2021). LaMDA: Our breakthrough conversation technology. https://blog.google/technology/ai/lamda/
  7. OpenAI. (2022). Introducing ChatGPT. https://openai.com/blog/chatgpt/
  8. Meta AI. (2023). Introducing LLaMA: A foundational, 65-billion-parameter large language model. https://ai.facebook.com/blog/large-language-model-llama-meta-ai/
  9. GitHub. GitHub Copilot. https://github.com/features/copilot



CS @ Purdue | Interested In ML, Autonomy, Reinforcement Learning