Learning and performance in the world of generative AI

Has AI learned how to complete tasks well, or can it simply perform them?

MIT Open Learning

Published in MIT Open Learning · 5 min read · Jul 8, 2024

A person wearing a yellow sweater writes on a notebook. — Photo: iStock

By MIT Horizon

Generative AI, in the short time that it has been available, continues to impress. Ask it to write a sonnet about the planet Venus or explain the significance of sorghum to the global economy, and you get a coherent version of what you asked for. If a human had produced this output, you would be impressed by how much skill and knowledge that person has (in poetry, astronomy, economics, agriculture, and persuasive writing). But does this output mean that AI has learned how to do these things well, or simply that it can perform them? While this may sound like a philosophical distinction, it reflects a critical aspect of education and training. To unpack it, let’s start by defining our terms.

Performance

Performance is observable. It refers to the execution of a task or display of a skill in a specific context. For example, when AI generates something a user requests or when a student correctly answers a question on a test, these are instances of performance. While these can be impressive displays, they don’t guarantee what will happen next time or what would happen under different circumstances. To become more confident that a performance is repeatable and valid, we would need to ask a variety of questions, in different ways and at different times, as achievement tests do. Using these measures of performance, we can begin to assess what we are usually more interested in: learning.

Learning

Learning is a long-term change in knowledge or skill that is retained over time and can be flexibly applied to new situations. Learning implies the ability to use acquired knowledge to solve new problems, understand related concepts, and adapt to varying contexts. It refers to the process of developing a deep understanding and capability rather than just reproducing a correct response at a given moment. And it can unlock a certain level of performance that would be impossible without truly having mastered the material, even if that learning is not directly observable. For example, when we watch an expert guitarist improvise a new masterpiece, we feel confident that it reflects an underlying expertise that has been developed over many years.

Boosting performance, or improving learning?

Interestingly, techniques that can help people perform better are not necessarily good for learning. Most of us have experienced what happens when we cram for an exam. Generally speaking, if you study the right information right before you need to repeat it on a test, you will be able to do so. Similarly, if you do very focused practice on very similar problems, you’ll do well when you encounter that exact kind of problem on the assessment. But what happens if you need to remember that information, say, a week later, or if you need to solve a problem that looks a bit different (even if it uses the same process)? It can be as if the material was never learned at all! That is because remembering information over a longer period requires learning it in more robust ways.

For example, as we discussed in an earlier post, learning happens better when it is spaced out and when the information is processed more actively. Interestingly, many of the techniques that facilitate learning can feel harder and more onerous, partly because they don’t help one feel good about one’s performance while studying. For that reason, these factors that improve learning have sometimes been dubbed “desirable difficulties”: things that make it harder to learn initially, but which ultimately produce a positive effect.

Performance and generative AI

When we request something from AI, we are primarily interested in its performance. We want it to generate text or solve problems for us, and we aren’t usually all that interested in probing how well it has truly understood the material. But how can we evaluate its performance? While we can be sure that the AI can generate that essay on sorghum and the economy, how can we tell whether it did a good job? In many cases, it takes a fair amount of learning on our part to gauge whether an AI’s output shows good performance, poor performance, or is just plain wrong. You need some amount of expertise to be able to evaluate its arguments.

What if you are not an expert in the domain you’ve asked for information on? This is where we can get in trouble using AI systems, because generative AI tends to produce results that sound confident. When we evaluate information, our brains are predisposed to weigh how confident the source seems; someone who responds with authority is more likely to be seen as correct than someone who hems and haws about their conclusions. Generative AI’s responses can seem confident and authoritative, so it is important to build in ways to validate them, whether by checking other sources, asking other AI systems, or seeking out the opinion of an actual expert. And, of course, stop and do a bit of thinking yourself to see if the output really makes sense at all (as in this exchange posted by Marc-Oliver Gewaltig on X).

Conclusion

The distinction between learning and performance has always been important, but its significance has grown with the advent of generative AI. Tasks that were once considered markers of learning can now be accomplished swiftly with a few keystrokes. This seamless augmentation of our capabilities can lead us to overlook the intrinsic value of genuine learning. While AI can excel at generating responses or solving problems, true mastery and novelty will come from the deeper understanding and adaptability at which humans can excel. In fact, the pursuit of expertise becomes not just a goal but a necessity in navigating a world in which AI-generated content is only going to become more prevalent.

Originally published at https://horizon.mit.edu.