Suddenly everywhere: What the heck are large language models? Are they risky? Who benefits from them?

Solveig Neseth · Published in daios · 6 min read · Oct 11, 2022

Well, folks, here it is, the final interview of this series. Because this has to end sometime, right? And after all this time we’ve learned quite a lot. One might even think, “geez, another interview? What more could there possibly be to learn?” Admittedly, I, too, bore this sentiment, but then I had a conversation with J.D. Zamfirescu-Pereira, assistant professor in the Critical Studies Program at California College of the Arts. As it turns out, there’s still a lot I don’t know. Go figure.

J.D.’s interest in artificial intelligence was already long-standing when, in 2019, he decided to start a Ph.D. in human-robot interaction. Then, of course, the pandemic reared its ugly head and, as you can imagine, it became decidedly difficult to observe human interactions. That’s when his focus switched to studying people and models, specifically large language models.

“What is a large language model,” you ask? “And why on earth does it sound so menacing?”

J.D. informed me that large language models are a class of models whose distinctive behavior seems to come from their sheer scale and not necessarily anything else. They’re trained in much the same way as regular, smaller models, but they behave in ways we don’t expect — they have what J.D. refers to as “emergent behavior.”

At this point, J.D. regaled me with the specifics and inner workings of large language models. To be clear, I understood very little of it. The main takeaway, though, is that large language models use a large amount of data — like, a LARGE amount — in order to simulate human interactions. These models can then be deployed for a wide variety of purposes, as opposed to smaller models that are often built and trained for one specific function. You’ve probably even heard of some of these models. AI research company OpenAI deploys models like GPT-3, which generates human-like text from a small text input, and DALL-E 2, the model that is putting YouTube visual artists in a panic over its ability to generate images from any prompt.
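To make that a little more concrete, here’s a minimal sketch of the same idea at hobbyist scale. It assumes you have Hugging Face’s transformers library installed and uses GPT-2, a much smaller, openly downloadable predecessor, as a stand-in, since GPT-3 itself is only reachable through OpenAI’s API.

```python
# A minimal sketch (not OpenAI's actual stack): generating text from a short
# prompt with a small open model, GPT-2, via the Hugging Face transformers library.
from transformers import pipeline

# Download a pretrained causal language model and wrap it in a text-generation pipeline.
generator = pipeline("text-generation", model="gpt2")

# Feed it a small text input and let it continue the text,
# much as GPT-3 does at a far larger scale.
prompt = "Large language models are suddenly everywhere because"
outputs = generator(prompt, max_new_tokens=40, num_return_sequences=1)
print(outputs[0]["generated_text"])
```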

“Fundamentally these are all prediction machines,” J.D. told me. “Large language models usually have hundreds of billions of parameters — several orders of magnitude larger than the previous set of models. Given a stretch of text, they then try to predict the text that follows. So, for example, they might be trained on the first half of a Wikipedia page and then have to predict subsequent words on that page. If they’re correct, they get a positive reinforcement signal; if they’re incorrect, they’re given a negative reinforcement signal. Essentially, they’re trained to produce text that simulates human text.”
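For the curious, here’s a toy illustration of that prediction objective, again assuming the transformers library and the small open GPT-2 model rather than anything the size J.D. is describing. The model reads a sentence and is scored on how well it predicts each next token; during real training, that score (the loss) is what gets pushed down.

```python
# A toy illustration of next-token prediction, assuming the transformers library
# and the small open GPT-2 model (a stand-in for the far larger models discussed here).
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

text = "The Eiffel Tower is located in the city of Paris."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    # Passing the same tokens as labels makes the model predict each next token
    # from the ones before it and report the average prediction error (the loss).
    outputs = model(**inputs, labels=inputs["input_ids"])

print(f"Next-token prediction loss: {outputs.loss.item():.2f}")
```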

That all seems simple enough, but it raises the question: what exactly is the application of this? Is society really in need of simulated human publications and interactions? (She wonders, as she contemplates how much easier life would be if her blogs would write themselves…)

“That there is the $64,000 question,” J.D. replied. “Models like these are used for chatbots and for sentiment analysis of things like tweets.” His concern with this technology, though, is how challenging it could become to determine whether an online interaction is real. “This is already a bit challenging. On the surface level it can be quite hard to tell apart text generated by these models and regular, human-generated text. In fact, I just read a paper that shows how, if you instruct these models to seem explicitly human, they will do a better job at being human than people do.” Apparently, there’s now even a subreddit community made up entirely of bots.

If you’re anything like me, this might sound a bit doomsday. A dark future in which being catfished by a computer is an unavoidable aspect of everyday life? No, thank you. And while I have a tendency toward the dramatic, I know there are people out there who are really optimistic about AI systems and their potential. So I asked J.D. whether he thinks the field will need more rules and regulations in order to mitigate risk.

“I do think there are places where regulation will be helpful. AI has begun to enter a lot of different spheres. For example, people are talking about how AI can be helpful in determining whether an incarcerated person is eligible for parole, so now we’re seeing a lot of organizations pitching products and saying, ‘hey, here’s a non-biased way to do that.’ It turns out, though, that they’re not really that unbiased. These systems are only as good as their data. Also, it’s really hard to define bias in this context. Often what ends up happening is that someone who didn’t develop a model ends up employing that model for a purpose for which it wasn’t designed, without validating whether or not that model is appropriate for said deployment. They’re borrowing technology and reapplying it, and making some pretty big assumptions about the transferability of these models.”

This reminded me of my conversation with Qian Yang and brought back the question: how much should laypeople know? If this trend persists, and people keep pushing to use AI systems and machine learning models in their lives and businesses, how do we make sure non-technical people have a higher standard of understanding when deploying models?

“One of the big challenges,” J.D. told me, “is that parts are accessible and parts are not.” As it so happens, large language models are frequently open source, meaning anyone can have access to the code. “There’s a company out there called Hugging Face, for example, that gives you, like, 5 lines of Python code and now you can run one of these models. The trouble with this, though, is that it’s not 5 lines of Python code to evaluate whether that model is appropriate in your context. That’s a lot of data collection and curation, and that is not accessible. You don’t need a Ph.D. to do it, but you do need time and energy and some level of know-how with data science to make sure you’re collecting the kind of data you want and then analyzing the model’s performance. If you’re just doing something for yourself, say, tracking every time a car drives by your house, and it misses every green car, whatever. But if you’re deploying it somewhere else, where it affects people’s lives in a significant manner, you’re probably going to want to be more sure than that. Verification is hard work.”
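J.D.’s point translates almost literally into code. The first half of this sketch is roughly the “5 lines” he’s describing, assuming Hugging Face’s transformers library; the second half is a deliberately tiny taste of the part that isn’t 5 lines: checking the model against labeled examples from your own context (the examples and labels below are invented purely for illustration).

```python
# Roughly the "5 lines" J.D. describes, using Hugging Face's transformers library.
from transformers import pipeline

classifier = pipeline("sentiment-analysis")  # downloads a default pretrained model
print(classifier("I can't believe how easy this was to set up!"))

# The part that is NOT 5 lines: deciding whether the model is appropriate for
# YOUR data. Even a crude check requires collecting and labeling examples from
# your own context. The handful below are invented purely for illustration.
labeled_examples = [
    ("The parole board meeting ran long but went smoothly.", "POSITIVE"),
    ("This tool keeps misclassifying green cars.", "NEGATIVE"),
    ("Honestly, the results were fine, I guess.", "POSITIVE"),
]

correct = 0
for text, expected in labeled_examples:
    prediction = classifier(text)[0]["label"]
    correct += prediction == expected

print(f"Accuracy on our tiny hand-labeled set: {correct}/{len(labeled_examples)}")
```

Even this crude check takes more effort than running the model, and a real validation, with enough representative data to trust the numbers, takes far more. That is the unglamorous work J.D. calls verification.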

Verification. There’s a familiar term from my interview with Adriano Soares Koshiyama. His company, Holistic AI, works to verify models within companies to catch and account for biases in their systems. Since building verification and trust in a system is so paramount, I asked J.D. if it was possible to automate verification processes themselves within a model. Can it become just another cog in the machine learning system?

The general sentiment in response to this: not a great idea. He compared it to street traffic at the dawn of the automobile age. “Back then people rode horses, and horses are really good at not getting into accidents with other horses. Then people started driving cars, and cars had zero ability to avoid car accidents — it was all on the driver. Roads were filled with people and horses and cars and, ultimately, we as a society decided that it was not okay for cars to be hitting people. Then there was a bunch of lobbying and, of course, we dedicated streets to cars. I wouldn’t make that same trade-off now — literally removing people from the equation — that’s a bad idea.”


“Ultimately, I think the challenge isn’t necessarily about the data or the software that lets people evaluate models. People think in abstractions. They don’t always know what they want. And, often, even if you get people to agree on an outcome, they might not agree on the path forward. If humans disagree on what the model should be, how could the model possibly deliver?”

And with that, the complexity of AI systems and their issues remains, because, of course, the complexity of humans remains. We are still in the infancy of this new artificial intelligence era, so it will likely take a lot longer to iron out the kinks in this marriage between human and machine. Or perhaps it’s destined to be Ross-and-Rachel-ing forever. For now, I’m ready to take a step back and contemplate all I’ve learned. See you for the final reflection.
