Meta Joins the AI Race with its New Large Language Model, LLaMA

The open-source language model collection trained on trillions of tokens

André Ribeiro
Published in Data Is Awesome
5 min read · Feb 25, 2023


Photo by Dima Solomin on Unsplash.

Meta, formerly known as Facebook, has launched its latest and most capable collection of Large Language Models (LLMs), named LLaMA¹.

In the Facebook announcement on Friday, February 24th, 2023, CEO Mark Zuckerberg emphasized the importance of LLMs and their impact on a wide range of natural language applications: “LLMs have shown a lot of promise in generating text, having conversations, summarizing written material, and more complicated tasks like solving math theorems or predicting protein structures. Meta is committed to this open model of research and we’ll make our new model available to the AI research community”².

Meta’s goal is to democratize access to this technology, making it available to researchers who lack access to large amounts of infrastructure. This lack of access has hindered researchers’ ability to gain a deeper understanding of these models, including how they work and how to address issues such as bias, toxicity, and misinformation¹.

LLaMA sets itself apart from systems like OpenAI’s ChatGPT, Microsoft’s Bing chatbot, and Google’s unreleased Bard: it is not designed for general conversation or public use, but rather as a research tool for experts in the field.

LLaMA is the latest addition to a growing list of impressive language models, including GPT-3, Gopher, Chinchilla, and PaLM. The paper³ reports exceptional performance: the largest model, LLaMA-65B, is competitive with the best models available, while LLaMA-13B outperforms GPT-3 on most benchmarks despite being more than 10x smaller.

Zero-shot performance on Common Sense Reasoning tasks³. Additional results are available in the paper³.

Technical advancements

In contrast to Chinchilla, PaLM, and GPT-3, the authors train exclusively on publicly available data, making their approach compatible with open-sourcing³. There are exceptions, such as OPT, GPT-NeoX, BLOOM, and GLM, but none of them are competitive with PaLM-62B or Chinchilla in terms of performance. The authors draw on data sources that have been used to train other LLMs, with the restriction of only using data that is publicly available.

LLaMA pre-training data³: the table lists each source dataset, its sampling proportion, number of epochs, and size. Additional details are available in the paper³.

In line with recent developments in LLMs, the network used by the authors is based on the transformer architecture⁴. They make several modifications to the original architecture, integrating improvements that were subsequently proposed and applied in models such as PaLM. The most notable are the use of pre-normalization (as in GPT-3), the SwiGLU activation function (as in PaLM), and rotary positional embeddings (as in GPTNeo)³.
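To make these changes concrete, below is a minimal, illustrative PyTorch sketch of the three components: RMSNorm-style pre-normalization, a SwiGLU feed-forward block, and rotary positional embeddings. It is not Meta’s code; the class names, dimensions, and the “rotate-half” formulation of RoPE are my own choices for clarity.

```python
# Illustrative PyTorch sketch (not Meta's implementation) of the three
# architectural choices mentioned above: RMSNorm pre-normalization,
# a SwiGLU feed-forward block, and rotary positional embeddings (RoPE).
import torch
import torch.nn as nn
import torch.nn.functional as F


class RMSNorm(nn.Module):
    """Pre-normalization: normalize the input of each sub-layer by its RMS."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        rms = torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)
        return x * rms * self.weight


class SwiGLU(nn.Module):
    """Feed-forward block with the SwiGLU activation, as popularized by PaLM."""
    def __init__(self, dim: int, hidden_dim: int):
        super().__init__()
        self.w_gate = nn.Linear(dim, hidden_dim, bias=False)
        self.w_value = nn.Linear(dim, hidden_dim, bias=False)
        self.w_out = nn.Linear(hidden_dim, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.w_out(F.silu(self.w_gate(x)) * self.w_value(x))


def rotary_embedding(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Apply rotary positional embeddings to a (seq_len, dim) tensor,
    pairing channel i with channel i + dim/2 ('rotate-half' formulation)."""
    seq_len, dim = x.shape
    half = dim // 2
    freqs = base ** (-torch.arange(half, dtype=torch.float32) / half)
    angles = torch.outer(torch.arange(seq_len, dtype=torch.float32), freqs)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[:, :half], x[:, half:]
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)
```

In a standard decoder-only transformer block, the normalization is applied before the attention and feed-forward sub-layers rather than after them, and the rotary embeddings are applied to the query and key vectors inside attention.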

To train their models, the authors implemented several recent optimizations to improve training speed. When training the 65B-parameter model, their code processes approximately 380 tokens/sec/GPU on 2,048 A100 GPUs with 80GB of RAM each. At that rate, training over their dataset of 1.4T tokens takes around 21 days³.
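A quick back-of-the-envelope check, using only the figures quoted above, confirms the reported training time:

```python
# Sanity check of the training-time estimate using the figures quoted above.
tokens_per_sec_per_gpu = 380
num_gpus = 2048
total_tokens = 1.4e12  # 1.4T tokens

cluster_throughput = tokens_per_sec_per_gpu * num_gpus  # ~778,000 tokens/sec
days = total_tokens / cluster_throughput / 86_400       # 86,400 seconds per day
print(f"{days:.1f} days")  # ~20.8 days, consistent with the reported ~21 days
```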

Environmental impact

As mentioned previously, training LLMs often requires substantial computational resources, such as high-end GPUs running for extended periods of time, which results in significant energy consumption. This consumption contributes to the carbon footprint of LLMs, raising concerns about their sustainability and environmental impact. Moreover, as the use of LLMs grows, so does their carbon footprint, making this a pressing issue that needs to be addressed.

Carbon footprint of training different LLMs using the same GPU type and data center³. Additional details are available in the paper³.

The authors aim to make a positive contribution to the reduction of future carbon emissions by releasing their pre-trained models. Their availability can reduce the computational resources needed for training new models, potentially decreasing the energy consumption associated with training LLMs. Additionally, some of the models (LLaMA-7B, LLaMA-13B) are relatively small and can be run on a single GPU, further minimizing the environmental impact related to their use.

Requesting Access to LLaMA

To maintain integrity and prevent misuse, LLaMA is being released under a noncommercial license focused on research use cases. Access to the model will be granted on a case-by-case basis to individual academics and to governmental, non-governmental, educational, and corporate research laboratories.

To apply for access, follow this link and fill in the Google form provided. To use the model, clone the GitHub repository provided by the research team at facebookresearch/llama⁵ and follow the instructions there.

Further Research

LLaMA has generated a lot of buzz because LLaMA-13B outperforms GPT-3 despite being roughly 10 times smaller. The family of underlying models allows for faster inference and ChatGPT-like real-time assistant capabilities, while being efficient enough to run on a single GPU.

However, LLaMA has not been fine-tuned for instruction following with Reinforcement Learning from Human Feedback (RLHF), the training process behind ChatGPT’s conversational capabilities. To fill this gap, ChatLLaMA⁶ has been introduced as the first open-source implementation of an RLHF process that leverages LLaMA.

ChatLLaMA enables the building of a ChatGPT-style service based on pre-trained LLaMA models. Compared to the original ChatGPT, the training process and single-GPU inference are significantly faster and cheaper owing to the smaller size of the LLaMA architectures. The library also supports all LLaMA model architectures (7B, 13B, 33B, 65B), so users can fine-tune the model according to their preferences for training time and inference performance.
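RLHF itself has several stages: supervised fine-tuning, reward-model training on human preference data, and policy optimization against that reward model. As a small illustration of the reward-model stage, the sketch below shows the pairwise ranking loss commonly used for it. It is a generic illustration with a toy model and random tensors, not ChatLLaMA’s actual API.

```python
# Toy illustration of the reward-model stage of RLHF (generic sketch,
# not ChatLLaMA's API): given a response humans preferred ("chosen") and one
# they rejected, train a scalar reward model so the chosen response scores higher.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyRewardModel(nn.Module):
    """Maps a pooled response representation to a scalar reward."""
    def __init__(self, dim: int = 64):
        super().__init__()
        self.scorer = nn.Linear(dim, 1)

    def forward(self, pooled: torch.Tensor) -> torch.Tensor:
        return self.scorer(pooled).squeeze(-1)


reward_model = TinyRewardModel()
optimizer = torch.optim.Adam(reward_model.parameters(), lr=1e-4)

# Stand-ins for pooled embeddings of chosen and rejected responses (batch of 8).
chosen = torch.randn(8, 64)
rejected = torch.randn(8, 64)

# Pairwise ranking loss: push the chosen score above the rejected score.
loss = -F.logsigmoid(reward_model(chosen) - reward_model(rejected)).mean()
loss.backward()
optimizer.step()
```

In a full pipeline, the trained reward model is then used to score the generations of the LLaMA policy during a reinforcement learning step such as PPO.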

Conclusion

The introduction of Meta’s LLaMA marks another significant milestone in the field of Large Language Models. The paper reports exceptional performance, with LLaMA outperforming GPT-3 on most benchmarks and achieving results comparable to the best models currently available. Alongside other impressive language models like Gopher, Chinchilla, and PaLM, LLaMA signals exciting prospects for the future of natural language processing research.

Join my mailing list to get new content as soon as I publish it!

If you enjoy reading stories like these and want to support me as a writer, consider signing up to become a Medium member. It’s $5 a month, giving you unlimited access to Python, Machine Learning and Data science articles. If you sign up using my link, I’ll earn a small commission with no extra cost to you.

References

[1] Meta AI team, “Introducing LLaMA: A foundational, 65-billion-parameter large language model”, https://ai.facebook.com/blog/large-language-model-llama-meta-ai/

[2] Mark Zuckerberg, https://m.facebook.com/story.php?story_fbid=pfbid0tZ4Vt9nT887f1R998hgEP6diLipt8DZKhj8w4QSTygfEPgxvqtfPRJFTGLvYfj9ql&id=4

[3] H. Touvron, T. Lavril, G. Izacard, et al. “LLaMA: Open and Efficient Foundation Language Models”, https://research.facebook.com/file/1574548786327032/LLaMA--Open-and-Efficient-Foundation-Language-Models.pdf

[4] A. Vaswani, N. Shazeer, N. Parmar, et al. “Attention Is All You Need”, arXiv:1706.03762

[5] facebookresearch/llama, “LLaMA”, https://github.com/facebookresearch/llama

[6] nebuly-ai/nebullvm, “chatllama”, https://github.com/nebuly-ai/nebullvm/tree/main/apps/accelerate/chatllama


André Ribeiro

Data scientist with a keen interest in Machine Learning & Artificial Intelligence. https://www.linkedin.com/in/asantos-ribeiro