Has Generative AI Already Peaked?

The computational limitations of training bigger AI models — Computerphile

Rohan Kotwani
Lazy by Design
5 min read · May 31, 2024


https://youtu.be/dDUC-LqVrPU

Achieving General Intelligence?

Generative AI can be used to generate new sentences that describe images. The goal is to understand images by analyzing pairs of images and text, allowing the model to distill the essence of an image into language. The argument is that with enough images and a sufficiently large network, we will eventually achieve general intelligence, or at least an extremely effective AI that works across all domains.

In the tech sector, particularly among big tech companies, there’s a prevalent belief that by continuously adding more data or creating larger models, or a combination of both, we can move beyond simple tasks like recognizing cats to achieving much more complex capabilities.

Showing a model enough examples of cats and dogs does not imply that it will recognize an elephant.

Using the paper “No ‘Zero-Shot’ Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance” as an example, there is a notion that AI performance will continuously improve with more data and larger models.

The paper, however, finds that achieving general zero-shot performance on new tasks would require an impractically vast amount of data. That said, this is just one paper, and the results may vary depending on resources like GPU power.

Suppose we have an image, a big vision transformer, and a big text encoder that processes text strings, similar to the ones in large language models. These components share an embedding space, a numerical representation of the meaning of the image and the text. They are trained on vast numbers of image/caption pairs so that an image and its descriptive text produce matching outputs in this space.
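To make the setup concrete, here is a minimal sketch of a CLIP-style contrastive training objective in PyTorch. The batch size, embedding dimension, and temperature are illustrative assumptions, not the exact recipe of any particular model:

```python
import torch
import torch.nn.functional as F

def contrastive_loss(image_emb, text_emb, temperature=0.07):
    # Normalize so that dot products become cosine similarities.
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)

    # Similarity between every image and every caption in the batch.
    logits = image_emb @ text_emb.T / temperature

    # Matching image/caption pairs sit on the diagonal.
    targets = torch.arange(len(image_emb), device=image_emb.device)

    # Symmetric cross-entropy: pull matching pairs together in the
    # shared embedding space, push mismatched pairs apart.
    loss_images = F.cross_entropy(logits, targets)    # image -> text
    loss_texts = F.cross_entropy(logits.T, targets)   # text -> image
    return (loss_images + loss_texts) / 2

# Toy batch: 8 image/caption pairs with 512-dimensional embeddings,
# standing in for the vision transformer and text encoder outputs.
image_emb = torch.randn(8, 512)
text_emb = torch.randn(8, 512)
print(contrastive_loss(image_emb, text_emb))
```

Trained this way over enormous numbers of image/caption pairs, the two encoders learn to map an image and its description to nearby points in the shared space.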

This system can be used for various tasks, such as classification and image recall. Streaming services like Spotify or Netflix use these kinds of recommender systems to suggest content based on what you and other people have watched. The results may vary, but the system is impressive given the difficulty of the task.
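As a concrete illustration of the classification use case, here is what zero-shot labelling looks like with a publicly released CLIP model via the Hugging Face transformers library. The image path and candidate labels are placeholders:

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Candidate labels phrased as captions; the image path is a placeholder.
labels = ["a photo of a cat", "a photo of a dog", "a photo of an elephant"]
image = Image.open("my_photo.jpg")

inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# Similarity of the image to each caption, turned into probabilities.
probs = outputs.logits_per_image.softmax(dim=-1)[0]
for label, p in zip(labels, probs):
    print(f"{label}: {p:.3f}")
```

Nothing here was trained on these specific labels; the model simply ranks the captions by how close they land to the image in the shared embedding space.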

The paper discusses the limitations of applying downstream tasks like classification and recommendation to difficult problems without massive amounts of data. It argues that applying these tasks effectively to complex problems, beyond simple ones like identifying cats and dogs, requires substantial amounts of data.

Limitations of Generative AI Models

Generative AI models have difficulty handling specific and complex tasks, such as identifying subspecies of trees or making difficult medical diagnoses. While generative AI is effective for broad problems, it may not be suitable for more nuanced tasks because the relevant data is insufficient.

The paper defines around 4,000 core concepts, ranging from simple to complex, and examines how prevalent these concepts are in pretraining datasets. It then tests how well AI models perform on tasks like zero-shot classification and recommendation, plotting performance against the amount of data available for each concept. A graph illustrates the relationship between the number of training examples and how effectively the model handles each concept.

In that graph, performance is on the y-axis and the number of training samples per concept is on the x-axis.
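The paper's actual measurement pipeline is far more involved, but the core idea of counting how often each concept appears in pretraining captions can be sketched in a few lines. The captions and concept list below are toy stand-ins:

```python
from collections import Counter

# Toy stand-ins for a web-scale caption corpus and the paper's ~4,000 concepts.
captions = [
    "a tabby cat sleeping on a sofa",
    "a dog running on the beach",
    "an old oak tree in autumn",
    "a cat chasing a laser pointer",
]
concepts = ["cat", "dog", "elephant", "oak tree"]

counts = Counter()
for caption in captions:
    for concept in concepts:
        if concept in caption.lower():
            counts[concept] += 1

for concept in concepts:
    print(f"{concept}: {counts[concept]} matching captions")
# Zero-shot performance per concept is then plotted against these
# counts, typically with the count axis on a log scale.
```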

The ideal scenario for an all-powerful AI is for performance to improve rapidly as scale grows. This is the AI-explosion argument: that we are on the cusp of something about to happen, whatever that may be, where the scale will be such that the system can do just about anything.

Then there’s the perhaps slightly more reasonable, shall we say, pragmatic interpretation, which is like, let’s call it balanced, right? But there’s a sort of linear movement, The idea is to add many examples to improve performance. By continuously adding examples, performance will keep getting better. The goal is to reach a point where the system can accurately identify any image under any circumstance.

One goal of generative AI is to create large language models that can write accurately about a wide range of topics and generate photorealistic images from prompts with minimal coaxing.

There might also be limitations in current machine learning architectures: despite adding more examples and increasing model sizes, we may soon hit a performance plateau, as sketched below. Such a plateau would imply diminishing returns on investment, since training runs cost millions of dollars.
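The three trajectories discussed above, explosive take-off, steady linear gains, and a logarithmic plateau, can be sketched side by side. The curve shapes here are stylized assumptions for illustration, not the paper's measured results:

```python
import numpy as np
import matplotlib.pyplot as plt

# Training examples per concept, log-spaced from 10 to 10 million.
n = np.logspace(1, 7, 200)

# Three stylized scaling hypotheses (illustrative shapes only).
explosive = 1 - np.exp(-n / 1e6)   # rapid take-off past some scale
linear = np.clip(n / 1e7, 0, 1)    # steady gains per example
plateau = np.log10(n) / 7          # diminishing returns: each step up
                                   # needs roughly 10x more data

plt.semilogx(n, explosive, label="explosive take-off")
plt.semilogx(n, linear, label="linear improvement")
plt.semilogx(n, plateau, label="logarithmic plateau")
plt.xlabel("training examples per concept")
plt.ylabel("zero-shot performance")
plt.legend()
plt.show()
```

The paper's evidence points toward the third curve: log-linear scaling, where each constant gain in performance demands exponentially more data.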

New strategies or methods, beyond just adding more data, are needed for significant improvements.

There might also be an issue of uneven data distribution, such as the overrepresentation of certain classes like cats, assuming the data is scraped from the internet. Specific plants or trees are certainly underrepresented in these datasets; often their images are simply labelled “tree.”

So there might be a limit on a model’s ability to recognize or generate less common items. Models perform better on common queries, like identifying an animal or drawing a castle, because these are well represented in the training data. For more obscure items, such as a specific type of cat or tree, or a rare video game artifact, performance declines because they are poorly represented in the training set. This issue is observed in both image generation and large language models.
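A toy illustration of that long tail, assuming concept frequency falls off as 1/rank (a Zipf-like distribution, which web-scraped data roughly follows):

```python
import numpy as np

# ~4,000 concepts ranked by popularity, with an assumed 1/rank fall-off.
ranks = np.arange(1, 4001)
captions_per_concept = 1e7 / ranks

head = captions_per_concept[:10].sum()      # "cat", "dog", ...
tail = captions_per_concept[2000:].sum()    # rare species, obscure artifacts

print(f"top 10 concepts:       {head:,.0f} captions")
print(f"bottom 2,000 concepts: {tail:,.0f} captions")
# Under this assumption, the 10 most common concepts get several times
# more training data than the 2,000 rarest concepts combined.
```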

For example, when you ask something like ChatGPT about an important topic from physics, it usually gives a pretty good explanation, but on more obscure topics it may not. Simply collecting more data is inefficient and may not improve performance on these hard tasks. Instead, companies with more resources might improve models using higher-quality data and human feedback. It will be interesting to see whether future versions, say a ChatGPT 7 or 8, show significant improvements over current ones.
