Building Intuition: What’s a Model?

TL;DR: Data + Architecture = Model

Yujian Tang
Plain Simple Software
2 min read · Sep 4, 2023

--

If you get around AI/ML people for long enough, eventually you’ll hear something about a model. They’re not talking about the kind of model blowing clouds on the runway, but the kind that runs in the cloud. I tried really hard to make that sound clever so I hope you appreciate it.

A machine learning model is an umbrella term for the thing that does the learning from the data. It could be a large language model (LLM) like GPT-4, an algorithm like a support vector machine (SVM), a convolutional neural network (ConvNet or CNN) like AlexNet, or any other technique that can learn patterns from data.

For most of what I’m going to be talking about in the Building Intuition series, a machine learning model is a neural network. This includes large language models. A model’s lifecycle has several phases, including training, deploying, serving, inference, and retraining. These can be split into training (training and retraining) and production (deploying, serving, and inference).
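Here’s a rough sketch of that split in code. I’m assuming PyTorch here, and the tiny model, fake data, and file name are all made up just to show where the phases fall; it’s not how you’d build a real system.

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 1)                      # a trivial stand-in model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
X, y = torch.rand(32, 4), torch.rand(32, 1)  # fake training data

# Training side: adjust the weights so the model fits the data.
for _ in range(100):
    loss = nn.functional.mse_loss(model(X), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Production side: deploy the trained weights, then serve inference requests.
torch.save(model.state_dict(), "model.pt")   # deploying
model.load_state_dict(torch.load("model.pt"))
model.eval()
with torch.no_grad():                         # inference on a new input
    prediction = model(torch.rand(1, 4))
```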

The word “model” refers to both the untrained architecture and the architecture after it has been trained. “Architecture” describes the way the perceptrons in the neural network are linked together. As the name suggests, there are many ways to wire networks of neurons together, and therefore many architectures to choose from.
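To make that concrete, here’s what “choosing an architecture” roughly looks like in code. I’m assuming PyTorch, and the class name and layer sizes are purely illustrative.

```python
import torch.nn as nn

class TinyClassifier(nn.Module):  # hypothetical architecture, for illustration only
    def __init__(self, num_features: int, num_classes: int):
        super().__init__()
        # The architecture is this choice of layers and how they connect.
        self.layers = nn.Sequential(
            nn.Linear(num_features, 64),
            nn.ReLU(),
            nn.Linear(64, num_classes),
        )

    def forward(self, x):
        return self.layers(x)

# Instantiating the class gives an *untrained* model: the architecture plus
# randomly initialized weights. Training is what turns it into a model that
# has actually learned something from data.
model = TinyClassifier(num_features=20, num_classes=2)
```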

Once an architecture is chosen, a model is then trained. It is either given a set of data and target outcomes (supervised learning), or just a set of data and some problem to solve (unsupervised learning).
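A quick sketch of the difference, assuming scikit-learn; the data and labels are fabricated just to show the shape of each setup.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.cluster import KMeans

X = np.random.rand(100, 4)           # a set of data
y = (X[:, 0] > 0.5).astype(int)      # target outcomes (labels)

# Supervised learning: the model sees both the data and the targets.
svm = SVC()
svm.fit(X, y)

# Unsupervised learning: only the data and a problem to solve
# (here, "split these points into 2 clusters").
kmeans = KMeans(n_clusters=2, n_init=10)
kmeans.fit(X)
```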

So, this means there are two things that determine what a model learns: the data it’s trained on and the architecture you choose.
