ai like im 5: multiple linear regression part 1, dimensionality, and a brief intro to linear algebra (article 11)

ai like im 5
15 min read · Jan 24, 2024

--

before i start, a couple things:

  1. out of the loop? https://medium.com/@ailikeim5/list/ai-like-im-5-in-order-87ef4064afe8
  2. this content is not aimed at 5 year olds, it is just kept simple
  3. although linear regression is machine learning, i would not consider this artificial intelligence. you can skip it if you want!
  4. however, this model lays a foundation for understanding artificial intelligence and deep learning models!

prerequisite knowledge:

a. basic understanding of data, data features, and datasets

b. good understanding of ai and machine learning

c. good understanding of some data assumptions especially linearity and non-linearity

d. good understanding of training and validating!

e. good understanding of models and parameters

f. good understanding of parameter optimization and loss

g. good understanding of simple linear regression, independent and dependent variables, and more!

ai is like a secret handshake, if you do not know the handshake, you will feel left out in this article! read the articles, they are super simple!

you will notice that as the articles go on, the prerequisite list gets longer and longer!

one more thing

if you are a statistics, math, or ai expert, this article is not for you!!!

if you are looking for a more reputable source, here ya go!

math and ai experts when i try to generalize some of the hardest concepts in the world:

so anyways,

so in the last article, we took a dive into regression, simple linear regression, and focused on the interaction between a single independent and single dependent variable!

and our goal was to fit a straight line with those 2 variables!

in lego terms, this meant predicting the price of a lego set based on the number of pieces!

and this linear relationship allowed us to make the price predictions!

we love linear regression!

simple linear regression is amazing, but it is… simple!

we are limited to 2 data features:

  1. one to make the predictions with (independent variable)
  2. the one we are making the predictions about (dependent variable)

what if we have more data features that can aid our predictions!!!

let’s look at a dataset to visualize this!

we have three independent variables we can use.
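
to make this concrete, here is a tiny made-up version of what a dataset like that could look like in code (the numbers are invented just for illustration; the features are the pieces, year, and rating we will use later, with price as our dependent variable):

```python
import pandas as pd

# a tiny, hypothetical lego dataset (made-up numbers, just for illustration)
lego = pd.DataFrame({
    "pieces": [150, 500, 1200, 300, 750],            # independent variable 1
    "year":   [2018, 2020, 2021, 2019, 2022],        # independent variable 2
    "rating": [4.2, 4.6, 4.8, 4.0, 4.5],             # independent variable 3
    "price":  [14.99, 49.99, 119.99, 29.99, 69.99],  # dependent variable
})

print(lego)
```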

in order to use all three, we are going to introduce simple linear regression’s complicated counterpart:

multiple linear regression

  • multiple linear regression is focused on the relationship between multiple independent variables and a single dependent variable
  • and our goal is still to fit a straight “line” with this relationship

but there has to be a linear relationship between all features!

  • there is “no limit” to the number of independent variables but we will talk more about this later!

this is when things start to get confusing and a lot of people get lost, because they cannot comprehend a key concept:

dimensionality

visualizing the relationship between a single independent and single dependent variable is two dimensional!

the human brain is amazing at visually comprehending two dimensional relationships!

  • this should be very easy to understand, your brain should connect the dots!

what about three dimensional relationships?

or in lego terms:

uh oh…

  • it is much harder to tell if there is a linear relationship!

this visualization is about as useful and understandable as this lady’s sign language is to a deaf person!

and although we have some great tools to represent data visually in three dimensions:

this illustrates an important point:

because our world is only three dimensions, we are limited to visualizing three dimensions!

  • the best tools for visually representing data beyond three dimensions will, funny enough, also be in 3 dimensions

there’s a problem…

if every independent variable or data feature we introduce means adding another dimension, it will be weally weally hard for us to visualize relationships!

  • we will often have to work with data relationships beyond the third dimension! (beyond our visual comprehension)

in plain english: for our lego example, there is no way of visually telling if there is a linear relationship between our dependent variable (price) and our 3 independent variables (pieces, year, rating), we are in the 4th dimension!

:(

and we can visualize adding a data feature like:

  • how are we supposed to visualize the relationships between 1 million data features!

the truth is: conceptualizing multiple linear regression and ai is really about comprehending higher dimensions beyond our visualization abilities!!!

we need to rewire your brain to think in high dimensional spaces!!!

here is a great video of this (please watch it)!

so let me ask you an important question: can we still fit a straight line in these higher dimensions?

  • the answer is yes… well kinda!!!

the terminology has to change a little, but the idea will stay conceptually the same!

  • it’s not that a line can’t exist in multiple dimensions, but we lose important properties in our data by using just a line!
    i.e -> a plane or hyperplane is the better tool for representing the relationships
  • this is a really general statement too, and these complicated properties will be covered later, but i still think this is an important introduction (don’t get mad at me math folks)

when working with ai, you will often see the words line, plane, and hyperplane and maybe get a little confused…

don’t be confused, they all accomplish the same thing!

the key takeaway: we won’t use a line in 100 dimensions… there is a better tool for the job!

  • sometimes i will call planes and hyperplanes a line, just to make it easier on you
    i.e -> speaking in plain english is much easier on your brain!!!

so what does this mean for multiple linear regression?

  • we are predicting a plane or hyperplane instead of a line!!!
  • and this plane/hyperplane is going to be straight!!!

here is an amazing visualization of the contrast:

  • a plane is more of a surface!!!
  • even in 4, 5, and a million dimensions, this key concept will stay the same, but fitting a hyperplane is hard to visualize and is the equivalent of the sketch of this thief

think about a hyperplane just like this guy’s sketch!!!

  • although we cannot visualize a hyperplane entirely, we can still create pretty good visualizations of them!!!

the math of multiple linear regression (important to understanding neural networks)

the math of simple linear regression was simple (slope from grade school)

  • fitting a straight line is very easy
  • all we have to do is solve for our slope/coefficient!

the math of multiple linear regression is going to be a little more complicated! (fitting a plane/hyperplane is a bit harder)

with a 3rd dimension in mathematics fitting a plane looks like this!!!
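
roughly, fitting a plane with two independent variables means solving something like this (β₀ is the bias, β₁ and β₂ are the coefficients for the two independent variables, and ε is the error term):

y = β₀ + β₁x₁ + β₂x₂ + ε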

you have probably never solved this one, and that is okay!!!

and for you math nerds out there, yes, i know slope technically doesn’t exist here!!! it is a generalization!!! and explaining why it is technically wrong would be impossible for the average person to follow!

mathematics is important and so amazing, but this is how i feel about math experts that refuse to generalize really hard concepts!!!

it’s okay to lose some information with a generalization, understanding the concept is a lot more important!

beyond the 3rd dimension is… interesting:

we can still use mathematics in higher dimensions, but the notation and everything gets complicated!

we will take a deep dive into linear algebra in the future, it allows us to do math in these high dimensional spaces!

but, for now, let’s just look at the more basic equation for multiple linear regression:
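
written out, it looks something like this, with one coefficient per data feature (y is the dependent variable, each x is an independent variable, each β is a coefficient, β₀ is the bias, and ε is the error term):

y = β₀ + β₁x₁ + β₂x₂ + … + βₙxₙ + ε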

  • notice, it is still pretty similar, but we just have more coefficients!
  • and our error term has the same properties mentioned in simple linear regression!

let’s say we had a model with 1001 data features!

  • we can’t visualize the hyperplane this creates
  • but train your brain to think about those 1000 dimensions like a line in 2 dimensions or a plane in 3
    remember -> they all accomplish the same goal just in a different dimension!!!

we can mathify this a little and make it a little more pretty and simple for some and ugly for others!
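
as a sketch, the mathified version just squishes all of those coefficient terms into one summation (the Σ just means “add these up for every feature from 1 to n”):

y = β₀ + Σ βᵢxᵢ + ε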

  • remember, understand the plain english of everything, the machine will do the math for us!!!

so what do these coefficients actually mean?

with simple linear regression, it was very simple to understand the coefficient:

what about for multiple linear regression:

  • the coefficients mean the same thing, there are just multiple of them!
  • each coefficient measures how the prediction changes when that one feature changes, while every other feature is held constant… constant, constant, constant… remember that!!!
  • we can do a lot of things with the coefficients and we will talk more about this later on, but just understand that they are measuring the change in the prediction!

the parameters of multiple linear regression!

recall, models are like a cake

a cake takes:

  1. great ingredients
  2. great recipe
  3. and great steps for that recipe!

a great machine learning/ai model takes

  1. great data
  2. great model
  3. and great parameters for that model

if the parameters of simple linear regression were a bias and a coefficient/slope,

then the parameters of multiple linear regression are a bias and multiple coefficients/slopes!

or

the parameters are pretty simple!!!

measuring the predictions

recall that in order to measure the error of our predictions in 2 dimensions, we would look at the distance between the actual price and the prediction

or in lego terms, we can look at two points on the graph and see how good our predicted price was!!!

we called this distance a residual, and it can be better represented by the following formula!!!
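
as a rough sketch, for a single lego set i, the residual is just:

residualᵢ = yᵢ − ŷᵢ   (the actual price minus the predicted price)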

the coolest part about the residual: the calculation remains true regardless of the dimension!

but visualizing it is a bit harder!

  • this distance is purely a measure of the actual dependent variable and its prediction
    i.e -> it would just be height in this visualization!!!
  • this is a great visualization, but there are a lot of points under the plane we cannot see, and i am not going to draw all the residuals

imagine how hard visualization is beyond the 3rd dimension!

  • but if there are 1000 dimensions or data features, there’s a prediction point in that 1000 dimensional space
  • and that residual is still just the distance from the prediction plane!

measuring overall performance and finding optimal parameters!

  • in the last article, i talked about sum of square residuals (ssr) and that it was the key to finding the optimal parameters of our simple linear regression model
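
as a quick refresher, sketched out in symbols, it was just every residual squared and then added up:

ssr = Σ (yᵢ − ŷᵢ)²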

so i want to ask you a pretty simple and easy question: if residuals are mathematically the same idea even in higher dimensions:

can we use the sum of square residuals to measure the performance of our multiple regression model?

yes, we can use ssr regardless of the number of dimensions

  • it remains consistent as we increase the number of features in our models
  • the calculation is the exact same!!!
  • meaning that we can use all the statistical theory and a lot of the math behind it to make everything easier!

the bad news is:

  • the optimization of our loss function and minimizing it behaves much differently in higher dimensions!
  • in plain english: obtaining the optimal parameters is much harder than in simple linear regression!

a visual intro to linear algebra!

  • linear algebra is crucial to making sure our ai’s don’t end up like this:

so what does our equation look like in linear algebra?

  • we now have to introduce two complex mathematical objects: a vector and a matrix!
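
as a sketch, the whole equation gets squished down into something like this, where y is a vector of actual prices, X is a matrix holding all of the independent variables, β is a vector of parameters, and ε is a vector of errors:

y = Xβ + ε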

i’m sure you feel like this after looking at this!

let’s start at the beginning in lego terms:

  • a single value in linear algebra is like a single block:

and so if 1 data point is just a single value, we can think of our data as just a bunch of lego blocks!!!

and we can visualize our dataset like this!

  • data, like lego blocks, comes in all shapes and sizes!

so what’s a vector?

  • a collection of single values
  • but they are arranged in a specific order!!!

think of a vector like stacking the blocks in 1 direction!

vectors have a lot of important properties

here are two important ones to understand!

we have a special notation for representing vectors in math!

and most importantly, we will turn a single data feature or a single variable into a vector!!!

  • we will do a lot of important math with these data vectors!
  • our name is just a label, we are not interested in it!!!

let’s look at the three vectors in our original equation:

let’s look at our y or dependent variable!!! (the actual prices from our dataset)
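
using the made-up numbers from earlier, that vector could look something like this (one actual price per lego set, stacked into a single column):

y = [14.99, 49.99, 119.99, 29.99, 69.99]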

let’s look at the parameter vector (what we are going to be solving for)

  • 1 parameter per dimension!!!!
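
with our three lego features, that parameter vector would look something like this (the bias, plus one coefficient each for pieces, year, and rating):

β = [β₀, β₁, β₂, β₃]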

and let’s look at the error vector, explaining how bad our predictions really are and more!!!

hopefully this gives you better insight into how we can approach multiple linear regression with math that is much better suited for higher dimensions

let’s look at the next component in linear algebra and our equation:

a matrix!

  • i like to view a matrix as a collection of vectors!
  • matrixes have two things: rows and columns
    i.e -> they are two dimensional!
  • so in lego terms this means

and so you will often see matrixes in this form!

  • but they will have values instead of legos!

that means we can turn our multiple independent variables or the rest of our dataset into a matrix!!!

so what does this look like in our original equation!!!
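
as a sketch using the made-up lego numbers from earlier, X could look something like this: each row is one lego set, each column is one independent variable, and one common convention is to add a column of 1s that pairs up with the bias:

X =
[ 1    150   2018   4.2 ]
[ 1    500   2020   4.6 ]
[ 1   1200   2021   4.8 ]
[ 1    300   2019   4.0 ]
[ 1    750   2022   4.5 ]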

  • matrixes are even more complex than vectors!
  • i cannot wait to cover them in depth!!!
  • but for now, just understand they are perfect for 2 dimensional data like representing multiple independent variables!!!

now that you hopefully get the gist of linear algebra, we can get into why we love it!

can i use the linear algebra formula for simple linear regression?

  • of course you can!!!

and it looks like this!
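
as a sketch, it is the exact same y = Xβ + ε, just with a much skinnier X: one column of 1s for the bias and one column for our single independent variable (pieces), and a parameter vector with only two entries:

X =
[ 1    150 ]
[ 1    500 ]
[ 1   1200 ]
[ 1    300 ]
[ 1    750 ]

β = [β₀, β₁]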

  • because the key laws and theory of linear algebra can be extended to as many dimensions as we want, they also work in the really simple one! (2d)!

solving for the optimal parameters

we can use these vectors and matrixes, and the theory behind linear algebra, calculus, and more to obtain a closed form solution
i.e -> all that math wumbo jumbo you just saw allows us to solve for the optimal parameters!

and that formula looks like this!
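
in standard linear algebra notation it is:

β̂ = (XᵀX)⁻¹ Xᵀ y

  • the little hat on β just means “our best estimate of the parameters”
  • the ᵀ and ⁻¹ are the transpose and inverse operations i am about to show you!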

don’t know how to do the math? that’s okay, a machine will do it for you, and i will teach you in the future!

but here’s how to visually comprehend it!

the transpose operation:

this is a complex idea but it can best be visualized like this!
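
a rough sketch: transposing just flips a matrix over, so its rows become columns and its columns become rows:

A =
[ 1   2   3 ]
[ 4   5   6 ]

Aᵀ =
[ 1   4 ]
[ 2   5 ]
[ 3   6 ]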

the inverse operation:

this one is a bit complex

so for now, just think of it as a uno reverse card!
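
a rough sketch of the uno reverse card idea: multiplying a matrix by its inverse “undoes” it and leaves you with the identity matrix i (the matrix version of the number 1), just like multiplying 5 by 1/5 gives you 1:

A⁻¹A = I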

linear algebra is complicated, it is hard and difficult, sometimes it has you acting like this dude

but once things start to click, you will end up like this dude:

the math is important, yes, but the most important thing is that you understand

  • once again, we can find the amazing parameters of our multiple linear regression model without making a single prediction!
  • and although this process does involve complex linear algebra, it is still just minimizing a measurement of predictions that never actually happened (oh the inception!!!)

in plain english: this means our model will start out smart, there is an instant jump from dumb to smart

  • so even if we, hypothetically, had a million data features, we could find amazing parameters
  • this is more of a hypothetical because as we increase the number of features, challenges arise and our computational resources are limited

you can understand one of the key reasons we love linear regression

  • there is a fast, quick, and reliable way of finding the optimal parameters!

remember the machine will do all the math for us!!!
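
here is a minimal sketch of the machine doing that math for us, using the made-up lego numbers from earlier (numpy spelling out the closed form formula, and scikit-learn doing the same job in a couple of lines; the numbers are my own invention):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# made-up lego data: pieces, year, rating -> price
features = np.array([
    [150,  2018, 4.2],
    [500,  2020, 4.6],
    [1200, 2021, 4.8],
    [300,  2019, 4.0],
    [750,  2022, 4.5],
])
prices = np.array([14.99, 49.99, 119.99, 29.99, 69.99])

# closed form: add a column of 1s for the bias, then beta_hat = (X^T X)^-1 X^T y
X = np.column_stack([np.ones(len(prices)), features])
beta_hat = np.linalg.inv(X.T @ X) @ X.T @ prices
print("bias + coefficients:", beta_hat)

# the same thing, with scikit-learn doing all the math for us
model = LinearRegression().fit(features, prices)
print("bias:", model.intercept_, "coefficients:", model.coef_)
```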

one more thing:

this method is also called ordinary least squares (ols)

i know, this is very confusing…

why have the same name for an entirely different process with different complexities?

there is probably a boring mathematical and statistical answer

but the answer i like is that both of them have the same goal and obtain it the same way

the goal: find the parameters that minimize the sum of square residuals!!!

there is going to be a part 2 to this article covering transforming non-linear relationships, assumptions, and more!

but anyways,

this is a really interesting article because i was originally going to introduce linear algebra in the future, because it can be weally weally hard for a lot of people, but i realized that this is the perfect starting point. getting to visualize the practical applications of linear algebra before you learn it makes everything click so much easier in the future. i promise you, if you can see it in lego or data form now, the math will be so much easier for you!!! i wish i saw it like this first.

even if you don’t enjoy mathematics, machine learning and ai are really about learning how to visualize and rewire your brain to think in these high dimensional spaces!

my human moment of the day goes to a very special video in my heart and what i consider the greatest piece of content to ever touch the internet! an hour long video about the fast pass system at disney!

this is peak humanity, this is what i want to show the aliens when they come to invade us

have a beautiful day and

godspeed!
