A beginner’s guide to all things AI

Marina Sánchez Torrón
Published in Unbabel Community
7 min read · Nov 22, 2023

In this article, we’ll take a look at artificial intelligence, machine learning, deep learning and Large Language Models (LLMs), and discuss what they mean for translation.

What is artificial intelligence and where has it come from?

You’ve no doubt heard a lot of talk about artificial intelligence (AI). AI is an umbrella term for intelligent tasks performed by machines. It is not a recent development: it dates back to the 1950s, when it started as an academic discipline. In its early days, AI was synonymous with rule-based systems. You can think of rule-based approaches as a way to model problems that can be expressed as “if X then Y” statements. Chess and solitaire programs are early examples where such rule-based approaches achieved decent results.

Time passed, and with greater computing power and more available data came greater possibilities. Nowadays, AI mostly refers to systems that learn actions from data without the need for a structured set of rules. These systems are far more flexible and powerful, and better simulate human decision-making.

What is machine learning?

Machine learning (ML) is a subdomain of artificial intelligence that comprises systems that learn from data. High-quality data is therefore the foundation of any good ML system: the more relevant, accurate and representative of the phenomenon being modeled the data is, the better the resulting model.

Imagine, for example, an insurance company that wants to predict a house’s flood risk to set fair flood insurance prices, or even to decide not to offer insurance at all. They can use an ML approach for this. Let’s say the company uses the variable “sea level” to try to predict the risk of flooding. They collect historical data on sea levels and on whether houses at those levels have flooded in the past 5 years. The resulting dataset can look like this (each row is an observation for a different house):


+----------------------------+--------------------------------------------+
| Average sea level (metres) | Has the house flooded in the past 5 years? |
+----------------------------+--------------------------------------------+
| 30                         | No                                         |
| 11                         | Yes                                        |
| ...                        | ...                                        |
+----------------------------+--------------------------------------------+

In the above table there are two components: an input (average sea level) and an output (whether the house has flooded). These known input-output pairs can then be used to train a classification system that predicts flood risk from sea level. With that prediction, the insurance company can provide a quote, or even decide not to insure a homeowner at all.
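To make the idea concrete, here is a minimal sketch in Python of what “learning from input-output pairs” could look like. The numbers and the one-variable decision rule are made up for illustration; a real insurer would use many more variables and a proper statistical model.

```python
# Toy training data: (average sea level in metres, flooded in past 5 years?)
# All values are invented for this example.
data = [(30, False), (11, True), (25, False), (8, True), (18, False), (12, True)]

def learn_threshold(examples):
    """'Learn' a decision rule from labeled data: pick the midpoint
    between the highest level that flooded and the lowest that didn't."""
    flooded = [level for level, did_flood in examples if did_flood]
    dry = [level for level, did_flood in examples if not did_flood]
    return (max(flooded) + min(dry)) / 2

def predict_flood_risk(level, threshold):
    """Lower-lying houses are riskier in this toy model."""
    return level <= threshold

t = learn_threshold(data)          # 15.0 with the data above
print(predict_flood_risk(10, t))   # True: predicted flood risk
print(predict_flood_risk(40, t))   # False: predicted safe
```

The “training” here is just picking a threshold, but the shape of the problem is the same as in real ML: known inputs and outputs go in, a rule for predicting new cases comes out.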

What are neural networks?

Neural networks are a type of ML model in which the relevant variables are extracted and learned from the data itself, setting them apart from other ML approaches where variables have to be defined beforehand.

Neural approaches took off around 2016 with the availability of GPUs (Graphics Processing Units), powerful hardware that allows neural network models to be trained relatively quickly and cheaply.

Some problems lend themselves better to neural networks than to other ML techniques. For example, imagine you want to train a system to distinguish whether a picture you show it is of a dog or a cat. Using an ML approach like the flood-risk one described above wouldn’t be a good idea. You’d need to explicitly define what makes a cat a cat and a dog a dog — in itself very challenging, as both cats and dogs have ears, eyes, paws, a tail, etc. — and extract that information from pictures, an even more challenging feat. You would then need to associate that extracted information (input variables) with an output (cat or dog). For a harder problem, like face recognition, the task would become practically impossible.

Instead, you’re better off using a neural approach: you can feed labeled images of cats and dogs to a neural network, which converts the images into internal numerical representations. Through training, the network learns to associate these representations with the corresponding labels (cat or dog), and becomes able to tell what’s a dog and what’s a cat. Again, as with all ML approaches, high-quality data is crucial for the system to be accurate.
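As a toy illustration of that training process, here is a single artificial “neuron” in plain Python. Instead of real images, each example is reduced to two made-up numbers, and the labels are 1 for “dog” and 0 for “cat”; the point is only to show how the weights are nudged, example by example, until the outputs match the labels.

```python
import math

# Pretend numeric representations of images: (feature_a, feature_b) -> label
# (1 = dog, 0 = cat). The numbers are invented for illustration.
examples = [((0.9, 0.2), 1), ((0.8, 0.1), 1), ((0.2, 0.9), 0), ((0.1, 0.8), 0)]

w = [0.0, 0.0]  # weights, learned during training
b = 0.0         # bias, also learned
lr = 0.5        # learning rate: how big each nudge is

def predict(x):
    """Weighted sum of inputs, squashed to (0, 1) by a sigmoid."""
    z = w[0] * x[0] + w[1] * x[1] + b
    return 1 / (1 + math.exp(-z))

# Training loop: nudge weights in the direction that reduces the error.
for _ in range(1000):
    for x, y in examples:
        err = predict(x) - y        # how far off the prediction is
        w[0] -= lr * err * x[0]
        w[1] -= lr * err * x[1]
        b -= lr * err

print(round(predict((0.85, 0.15))))  # 1 -> "dog"
print(round(predict((0.15, 0.85))))  # 0 -> "cat"
```

Real networks have many such units and learn their own representations directly from pixels, but the core mechanism — adjusting weights so outputs match labels — is the same.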

What is deep learning?

The term deep learning (DL) refers to neural networks with many hidden layers. Think of a layer as a set of thousands of units (neurons), each processing inputs and issuing outputs. The outputs of one layer serve as inputs to the next, and so on. Online machine translation (MT) applications are examples of DL systems: to translate new sentences accurately from one language to another, they have to be trained on millions of source-target sentence pairs. As the system processes this data, it acquires translation capabilities.
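That layer-by-layer flow can be sketched in a few lines of Python. The layer sizes and weights below are arbitrary numbers chosen for illustration, not anything a real MT system would learn:

```python
import math

def layer(inputs, weights, biases):
    """One layer: each unit sums its weighted inputs, adds a bias,
    and applies a non-linearity (here, tanh)."""
    return [math.tanh(sum(w * x for w, x in zip(ws, inputs)) + b)
            for ws, b in zip(weights, biases)]

x = [0.5, -1.0]  # a (made-up) numerical representation of some input

# Three tiny layers chained together: 2 -> 3 -> 3 -> 1 units.
# The output of each layer is the input to the next.
h1 = layer(x,  [[0.1, -0.2], [0.4, 0.3], [-0.5, 0.2]],            [0.0, 0.1, -0.1])
h2 = layer(h1, [[0.2, 0.2, 0.2], [-0.3, 0.1, 0.4], [0.5, -0.5, 0.0]], [0.0, 0.0, 0.0])
y  = layer(h2, [[1.0, -1.0, 0.5]],                                 [0.0])

print(y)  # the final layer's output
```

A “deep” network is exactly this pattern repeated many times over, with thousands of units per layer and weights learned from data rather than written by hand.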

When you train a neural network of, say, ten layers, each with thousands of units, you get millions or even billions of parameters. Parameters can be seen as internal configurations that “store” the knowledge learned during training and enable DL models to do all the tasks they do. The more hidden layers a network has, the more parameters it has, and the more complex the tasks it can do. Hidden layers are called “hidden” because, at the moment, we can only interpret what goes into a network during training and what comes out, not what happens in between. There is a lot of research focused on uncovering exactly how learning takes place within DL systems.
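A quick back-of-the-envelope calculation shows where those millions come from. The layer sizes below are hypothetical; in a fully connected network, each layer contributes one weight per input-output pair plus one bias per unit:

```python
# Hypothetical network: ten hidden layers of 2,000 units each,
# plus made-up input and output sizes.
sizes = [1000] + [2000] * 10 + [10]

# weights (n_in * n_out) + biases (n_out) for each pair of adjacent layers
params = sum(n_in * n_out + n_out
             for n_in, n_out in zip(sizes, sizes[1:]))

print(f"{params:,}")  # 38,040,010 -- tens of millions already
```

Even this modest ten-layer sketch lands at roughly 38 million parameters; scaling the layer count and widths up is how models reach the billions.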

What are Large Language Models (LLMs)?

Large Language Models (LLMs) are a type of DL model that can generate text and perform tasks such as translation, error analysis and summarization, to name a few. They are trained on huge amounts of data from the internet and have billions of parameters. Because they’re large and deep, they can be trained on a wide variety of tasks and have the flexibility to solve them. LLMs are also able to use context: the relevant information and dialogue that precedes a specific point in a conversation, like the ones you’ve likely already had with OpenAI’s ChatGPT. This capability helps the models produce content that better matches the user’s needs.

ChatGPT is a closed-source model, which means the source code is owned by its developer, OpenAI. At the moment, open-source models don’t exhibit the full range of capabilities that a model like ChatGPT does, but there’s a lot of research into open models. You can test and rate many open-source and closed-source models in ChatArena.

When ChatGPT launched in November 2022, it gained immense popularity very quickly. With the right prompt — the text you enter in the text box to ask it something — you can get it to respond coherently to a lot of questions. It can give you recipes, code chunks, translations, summaries, poems… You can engage in conversation with it on any topic. Long gone are the mechanical responses of the original chatbot/psychologist Eliza.

If you’ve used ChatGPT regularly since it was introduced, you’ve probably found the sweet spot that maximizes your chances of getting useful responses. Some of the lessons learned along the way can look like this:

  • The more specific you are when prompting, the better (much like when you ask a person a question).
  • Bad grammar and spelling don’t seem to affect the quality of the answers.
  • It tends to be wordy and repetitive, so it helps to include instructions in your prompt to avoid that.
  • It can take some back and forth to reach an answer you’re satisfied with. Sometimes, starting from scratch, or giving up altogether, is a better strategy.
  • It can provide a good starting point for doing a task yourself, even when it can’t do the whole task for you.

The emergence of LLMs in general, and ChatGPT in particular, has lowered the technology entry barrier for many users. A technical background or coding experience is not necessary to leverage the capabilities they offer, which are greater than anything we’ve seen before in natural language applications. But LLMs are not infallible. If you’ve been using ChatGPT with any frequency, you’ve probably noticed that it always has an answer for everything, and of course some of those answers are going to be inaccurate. Provided you stay alert, you can easily see when it gets things wrong on a topic you’re familiar with. Even so, it’s easy to lower your guard when the text it produces sounds so plausible.

A critical approach to using ChatGPT-generated text is therefore crucial. You don’t want to end up being misled like the lawyer who presented a brief full of cases fabricated by ChatGPT.

In fact, LLMs have no concept of what is real and what is not, and by design are prone to hallucinate, that is, to make stuff up. ChatGPT has guardrails intended to prevent it from issuing biased or inaccurate content, but it still produces such responses. For an example of biased content, see how in the screenshot below the translation of “doctor” provided in the first instance is that of a male doctor (Spanish requires gender markers in the translation of “doctor”). After it was given more context, the translation was rectified:

Example of biased ChatGPT translation: “doctor” in English was translated as “doctor” (male) in Spanish. When asked to translate as “doctor” (female), it rectified.

Now for an example of inaccurate content. See below how ChatGPT apologizes for providing an incorrect initial answer, only to provide another incorrect answer (at other times it simply doubles down on the initial incorrect answer):

When asked what country lies between Spain and Portugal, ChatGPT answers “Andorra”. When it’s pointed out that this is incorrect, it apologizes and says it’s “France”.

If you’re using ChatGPT for any purpose, having good old Google searches as a backup plan is always a good idea.

What does the future hold?

Who knows! A quote of unknown origin states that making predictions is hard, especially about the future. Open-source LLMs’ capabilities will no doubt improve, as will the technology that powers them. Let’s buckle up.
