Advancements in AI Technology Lead to More Compact and Efficient Models

Published in ReadyAI.org · 5 min read · Nov 20, 2023

By: Rooz Aliabadi, Ph.D.

In 2023, excitement about artificial intelligence (AI) skyrocketed. The surge in interest built over the six months after OpenAI introduced ChatGPT in November 2022, which quickly became the internet’s best-known chatbot. Searches for “artificial intelligence” on Google and other search engines nearly quadrupled during this period. By September 2023, a third of respondents to the most recent McKinsey Global Survey reported that their organizations had implemented generative AI.

ChatGPT by OpenAI

As I look toward the new year, I believe the evolution of generative technology is centered on three key areas:

  • The compactness of models
  • The optimization of data usage
  • The expansion of practical applications

Let’s start with size. In recent years, a general belief in AI research has been that larger models are inherently superior. Large language models (LLMs), whose size is measured in billions or even trillions of parameters, have followed that belief: in stark contrast to the trend of computers becoming smaller and more powerful, LLMs have continued growing. GPT-4, the LLM that powers the enhanced version of ChatGPT, required over 16,000 specialized GPU chips for training runs that spanned several weeks and cost over $100 million. Chipmaker Nvidia recently reported that for models deployed at scale, the cost of inference (using trained models to answer user queries) has begun to outstrip the cost of training them.

LLM Parameters

As AI models evolve into marketable products, there is an increasing emphasis on preserving their performance while shrinking their size and increasing their speed. One strategy is to train a more compact model on a larger volume of data. A notable example is Chinchilla, an LLM created by Google DeepMind in 2022: it outperforms OpenAI’s GPT-3 despite being only a quarter of its size, thanks to being trained on four times as much data. Another method is to lower the numerical precision of the model’s parameters. Recent research from the University of Washington demonstrates that it is feasible to compress a model equivalent in size to Chinchilla onto a single GPU chip without significantly affecting its performance. Crucially, these smaller models are far cheaper to run: some are efficient enough to operate on personal devices like laptops or smartphones.
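To make the precision-lowering idea concrete, here is a minimal sketch of quantization in Python. The symmetric int8 scheme below is purely illustrative; published methods (including the 4-bit techniques behind results like the one above) are considerably more refined, but the core trade is the same: store low-precision integers plus a scale factor, accepting a small approximation error in exchange for a large memory saving.

```python
import numpy as np

# Toy symmetric int8 quantization: store each weight as an 8-bit
# integer plus one float scale, instead of a 32-bit float.

def quantize_int8(weights):
    scale = np.abs(weights).max() / 127.0          # map largest |weight| to 127
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.randn(1024, 1024).astype(np.float32)  # stand-in weight matrix
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

print(f"memory: {w.nbytes / 1e6:.1f} MB -> {q.nbytes / 1e6:.1f} MB")  # 4.2 -> 1.0
print(f"mean absolute error: {np.abs(w - w_hat).mean():.5f}")
```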

AI models, which are essentially predictive tools, become more capable as they are given larger data sets to learn from. However, the emphasis is gradually shifting from quantity to quality. This shift is particularly relevant given the increasing difficulty of sourcing new, high-quality training data, whose availability may diminish significantly in the coming years. Relying on model-generated outputs to train subsequent models can degrade their capabilities, reducing the internet’s value as a source of training data. Volume isn’t the only factor, either: determining the optimal mix of training data is more an art than a precise science. There is also a growing trend of training models on diverse data types, including natural language, computer code, images, and even videos, which broadens their range of abilities.

Regarding potential new applications, there is a notable capability overhang in AI: the technology has advanced faster than its practical use. The challenge has shifted from demonstrating AI’s potential to understanding and implementing its applications. The most significant developments will likely come not from improving the models themselves but from mastering more effective ways to use them.

Currently, there are three main methods for utilizing AI models:

  • The first approach, prompt engineering, involves using models as they are but feeding them carefully crafted prompts: specific input phrases or questions designed to steer the model toward the intended output (a minimal sketch follows this list).
  • The second method is to fine-tune a model to make it more proficient at a particular task by giving an existing model additional training on a specialized dataset. For example, an LLM could be fine-tuned on medical journal content to respond more effectively to health-related inquiries (also sketched below).
  • The third strategy involves integrating LLMs into a larger, more sophisticated framework. An LLM functions similarly to an engine; to apply it for a specific purpose, one needs to construct the entire system, or car, around it.
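As a toy illustration of prompt engineering, the snippet below assembles a few-shot prompt for sentiment classification. The task, examples, and wording are all invented for illustration; in practice, the assembled string would be sent to whatever LLM you are using.

```python
# Toy few-shot prompt construction for sentiment classification.
# The instruction, examples, and labels are illustrative placeholders.

FEW_SHOT_EXAMPLES = [
    ("The movie was a waste of two hours.", "negative"),
    ("An instant classic, beautifully shot.", "positive"),
]

def build_prompt(review):
    """Assemble an instruction, worked examples, and the new input."""
    lines = ["Classify each movie review as positive or negative.", ""]
    for text, label in FEW_SHOT_EXAMPLES:
        lines += [f"Review: {text}", f"Sentiment: {label}", ""]
    lines += [f"Review: {review}", "Sentiment:"]  # the model completes this line
    return "\n".join(lines)

print(build_prompt("I couldn't stop smiling the whole time."))
```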
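And here is a hedged sketch of fine-tuning using the Hugging Face transformers library. The small model choice and the medical_abstracts.txt file are assumptions made for illustration; the pattern, though, is the general one the second bullet describes: load a pretrained model, then continue training it on domain text.

```python
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)

# Hypothetical domain corpus: one medical abstract per line.
model_name = "gpt2"  # a small stand-in; larger causal LMs follow the same pattern
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

dataset = load_dataset("text", data_files={"train": "medical_abstracts.txt"})
dataset = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="llm-medical", num_train_epochs=1,
                           per_device_train_batch_size=4),
    train_dataset=dataset["train"],
    # mlm=False -> plain next-token (causal) language modeling
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False))
trainer.train()  # fine-tuned weights land in ./llm-medical
```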

An example of integrating LLMs into a broader system is a technique known as retrieval-augmented generation (RAG). This approach combines an LLM with additional software and a specialized knowledge database on a specific subject, reducing the likelihood of the model generating incorrect information. When asked a question, the system first searches its database; if it finds relevant information, it forwards the query together with that factual material to the LLM, instructing it to base its response on the provided information. Surfacing sources this way enhances user confidence in the accuracy of the answers. It also enables personalization, as in Google’s NotebookLM, which lets users supply their own knowledge databases to tailor the model’s outputs. A toy version of the loop is sketched below.
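This is a deliberately minimal sketch of that loop, with the moving parts stubbed out: retrieval here is naive word overlap (real systems typically use vector embeddings), the two-document knowledge base is invented for illustration, and llm() stands in for a call to an actual model.

```python
# Toy retrieval-augmented generation: retrieve, then generate from context.

KNOWLEDGE_BASE = [  # stand-in documents; a real system would index many more
    "Chinchilla was trained by DeepMind on roughly four times GPT-3's data.",
    "GPT-4 is the LLM that powers the enhanced version of ChatGPT.",
]

def retrieve(question, k=1):
    """Rank documents by how many words they share with the question."""
    q_words = set(question.lower().split())
    ranked = sorted(KNOWLEDGE_BASE,
                    key=lambda doc: len(q_words & set(doc.lower().split())),
                    reverse=True)
    return ranked[:k]

def llm(prompt):
    """Stub for a real model call (API request or local inference)."""
    return f"[completion grounded in the {len(prompt)}-character prompt]"

def answer(question):
    context = "\n".join(retrieve(question))
    prompt = ("Answer using only the context below.\n"
              f"Context:\n{context}\n\n"
              f"Question: {question}\nAnswer:")
    return llm(prompt)

print(answer("Which model powers the enhanced version of ChatGPT?"))
```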

Retrieval-augmented generation

Despite the growing emphasis on commercial applications of AI, the quest for artificial general intelligence (AGI) remains ongoing. Current technologies like large language models and other forms of generative AI could be components of, or steps toward, that goal, but they are unlikely to be the definitive solution. Today’s designs are almost certainly not the ultimate neural architecture; further advances and discoveries are still possible, and one of them may be what ultimately leads to AGI.

This article was written by Rooz Aliabadi, Ph.D. (rooz@readyai.org). Rooz is the CEO (Chief Troublemaker) at ReadyAI.org.

To learn more about ReadyAI, visit www.readyai.org or email us at info@readyai.org.


ReadyAI is the first comprehensive K-12 AI education company to create a complete program to teach AI and empower students to use AI to change the world.