Simple Linear Regression with basic R

鄧昱辰
軟體與金融的小小白故事
3 min read · Jun 4, 2022

This post shows how to fit a simple linear regression model in base R and how to interpret its coefficients.

Basic concepts of simple linear regression

Simple linear regression uses paired data (x_i, y_i) to estimate the relationship between two variables:

y = α + βx + u

u is the random error, the part of y that cannot be explained by x.

Simple Linear Regression (SLR)

  • x is the explanatory variable, y is the dependent variable
  • α is the intercept, β is the slope
  • β = ∂E(y|x)/∂x: when x increases by 1 unit, the expected value of y changes by β units
  • α = E(y|x=0): the expected value of y when x is 0
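To make the interpretation of α and β concrete, here is a minimal sketch on simulated data. The true values alpha = 2 and beta = 0.5, the sample size, and the noise level are made up for illustration and have nothing to do with the wage data used later in this post.

```r
# Simulate paired data from y = alpha + beta * x + u
# (alpha, beta, n and the error sd are illustrative assumptions)
set.seed(1)
alpha <- 2
beta  <- 0.5
x <- runif(500, min = 0, max = 10)
u <- rnorm(500, mean = 0, sd = 1)   # random error, drawn independently of x
y <- alpha + beta * x + u

fit <- lm(y ~ x)
coef(fit)   # the estimates should land close to alpha = 2 and beta = 0.5

# beta as a one-unit change: fitted E(y|x) at x = 4 minus fitted E(y|x) at x = 3
pred <- predict(fit, newdata = data.frame(x = c(3, 4)))
slope_from_pred <- unname(diff(pred))
all.equal(slope_from_pred, unname(coef(fit)["x"]))

# alpha as E(y|x = 0): the fitted value at x = 0 is the estimated intercept
unname(predict(fit, newdata = data.frame(x = 0)))
```

The difference between the two fitted values equals the estimated slope, which is the "one more unit of x" reading of β.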

Population Regression Line

The population regression line is given by E(y|x) = α + βx

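Assumptions of the linear regression model

  1. Linear in parameters

  2. Random sampling

  3. Sample variation in the explanatory variable: the values of x_i in the random sample cannot all be identical.

  4. Exogeneity: given the explanatory variable, the conditional expectation of the error term is zero, E(u|x) = 0.

  5. Homoskedasticity: given the explanatory variable, the conditional variance of the error term is constant, Var(u|x) = σ².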
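Two of these assumptions are easy to look at in R. The sketch below uses simulated data (all numbers are assumptions chosen for illustration): it shows what `lm()` returns when the explanatory variable has no sample variation, and draws a residual plot that is commonly used as an informal check of constant error variance. It is an eyeball check, not a formal test.

```r
set.seed(2)
n <- 200
y <- rnorm(n)

# Sample variation violated: every x_i is the same, so the slope cannot be estimated
x_const <- rep(5, n)
coef(lm(y ~ x_const))          # the coefficient on x_const is reported as NA

# Sample variation satisfied: x takes different values in the sample
x  <- runif(n, min = 0, max = 10)
y2 <- 1 + 0.3 * x + rnorm(n)   # errors have constant variance by construction
fit <- lm(y2 ~ x)
var(x) > 0

# Informal look at homoskedasticity: the vertical spread of the residuals
# should look roughly the same across the range of fitted values
plot(fitted(fit), resid(fit), xlab = "fitted values", ylab = "residuals")
abline(h = 0, lty = 2)
```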

Method of Ordinary Least Squares (OLS)

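We set the model as

y_i = α + βx_i + u_i,  i = 1, ..., n

We estimate α and β by OLS and denote the OLS estimators by \hat α and \hat β. The residual is

\hat u_i = y_i - \hat α - \hat β x_i

The goal of OLS is to choose \hat α and \hat β so that the residual sum of squares is as small as possible:

RSS = Σ_i \hat u_i^2 = Σ_i (y_i - \hat α - \hat β x_i)^2

Setting the partial derivatives of RSS with respect to \hat α and \hat β to zero gives

\hat β = Σ_i (x_i - \bar x)(y_i - \bar y) / Σ_i (x_i - \bar x)^2

\hat α = \bar y - \hat β \bar x

where \bar x and \bar y are the sample means of x and y.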
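As a quick check, the OLS estimates can be computed by hand and compared with `lm()`. The sketch below uses the standard closed-form solution (slope = sum of cross-deviations divided by the sum of squared deviations of x, intercept = mean of y minus slope times mean of x) on simulated data; the data-generating numbers and the helper `rss()` are assumptions made up for this example.

```r
set.seed(3)
x <- rnorm(100, mean = 12, sd = 3)
y <- 1 + 0.8 * x + rnorm(100)

# Closed-form OLS estimators
beta_hat  <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
alpha_hat <- mean(y) - beta_hat * mean(x)

# Residual sum of squares for any candidate intercept a and slope b
# (hypothetical helper, defined only for this illustration)
rss <- function(a, b) sum((y - a - b * x)^2)

fit <- lm(y ~ x)
c(alpha_hat, beta_hat)   # hand-computed estimates
coef(fit)                # lm() estimates; these should agree with the line above

# Moving away from the OLS solution makes the RSS larger
rss(alpha_hat, beta_hat) < rss(alpha_hat + 0.1, beta_hat)
rss(alpha_hat, beta_hat) < rss(alpha_hat, beta_hat + 0.01)
```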

Applying R to the SLR model

We estimate the effect of years of education on wage.

step 1. read data

data <- read.csv("/Users/abnerteng/Downloads/wagedata.csv")  # load the wage data set
head(data)  # preview the first rows

step 2. set up the SLR model

slr <- lm(wage ~ educ, data = data)  # regress wage on years of education
summary(slr)  # coefficient table, standard errors, R-squared
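The pieces of the `summary()` output can also be pulled out one at a time. Since the wage file is not available here, the sketch below builds a simulated stand-in with the same column names (`wage`, `educ`); the object names `sim_data` and `slr_sim` and all numbers are hypothetical, so the printed estimates say nothing about the real data.

```r
# Hypothetical stand-in for the wage data: same column names, made-up values
set.seed(4)
educ <- sample(8:18, 300, replace = TRUE)
wage <- 5 + 1.2 * educ + rnorm(300, sd = 4)
sim_data <- data.frame(wage = wage, educ = educ)

slr_sim <- lm(wage ~ educ, data = sim_data)

coef(slr_sim)                   # (Intercept) is the estimate of alpha, educ the estimate of beta
confint(slr_sim, level = 0.95)  # 95% confidence intervals for both coefficients
summary(slr_sim)$coefficients   # estimates, standard errors, t values, p-values
summary(slr_sim)$r.squared      # share of the variance in wage explained by educ
```

The `educ` coefficient is read as the estimated change in expected wage for one additional year of education, and the intercept as the expected wage at zero years of education.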

step 3. Draw a scatter plot

plot(data$educ, data$wage,
     main = 'Regression for education level and wage',
     xlab = 'education level',
     ylab = 'wage')
abline(slr, col = 'blue', lwd = 2)  # add the fitted regression line
  • The blue line is the fitted SLR line, \hat α + \hat β x.
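Points on the fitted line can be read off with `predict()`. The sketch below again uses a simulated stand-in for the wage data (the names `sim_data`, `slr_sim`, `new_obs` and all values are hypothetical) and marks the predicted wage at 12 and 16 years of education on the scatter plot.

```r
# Hypothetical stand-in data with the same column names as the wage data
set.seed(5)
sim_data <- data.frame(educ = sample(8:18, 300, replace = TRUE))
sim_data$wage <- 5 + 1.2 * sim_data$educ + rnorm(300, sd = 4)
slr_sim <- lm(wage ~ educ, data = sim_data)

# Predicted wage at two education levels, with 95% confidence bounds
new_obs <- data.frame(educ = c(12, 16))
pred <- predict(slr_sim, newdata = new_obs, interval = "confidence")
pred   # columns: fit (point on the line), lwr and upr (confidence bounds)

# Scatter plot, fitted line, and the two predicted points in red
plot(sim_data$educ, sim_data$wage, xlab = "education level", ylab = "wage")
abline(slr_sim, col = "blue", lwd = 2)
points(new_obs$educ, pred[, "fit"], col = "red", pch = 19)
```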
