Simple Linear Regression with basic R
This page shows how to fit a Simple Linear Regression model in basic R and how to interpret its coefficients.
Basic concepts of simple linear regression
Use paired data (x_i, y_i) to find the relationship between two variables.
Simple Linear Regression (SLR)
y = α + βx + u
- u is the random error: the part of y that cannot be explained by x
- x is the explanatory variable, y is the dependent variable
- α is the intercept, β is the slope
- β = ∂E(y|x)/∂x: when x increases by one unit, the expected value of y changes by β units
- α = E(y|x=0): the expected value of y when x equals zero
Population Regression Line
Given E(u|x) = 0, the population regression line is E(y|x) = α + βx
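As a minimal sketch of what the population regression line means, the simulation below draws data from a model with assumed values α = 1 and β = 2 and compares the average of y at each value of x with α + βx. The parameter values, the sample size and all variable names are illustrative assumptions and have nothing to do with the wage data used later.

```r
set.seed(1)                            # make the simulation reproducible
alpha <- 1                             # assumed intercept (illustrative)
beta  <- 2                             # assumed slope (illustrative)
n     <- 10000
x <- sample(0:4, n, replace = TRUE)    # explanatory variable with five distinct values
u <- rnorm(n, mean = 0, sd = 1)        # random error drawn independently of x
y <- alpha + beta * x + u

# sample analogue of E(y|x): the mean of y within each value of x
cond_mean <- as.numeric(tapply(y, x, mean))
pop_line  <- alpha + beta * (0:4)      # population regression line at x = 0, ..., 4
print(round(cbind(x = 0:4, cond_mean = cond_mean, pop_line = pop_line), 2))
```

The two columns should be close to each other; the remaining gap is sampling noise, which shrinks as n grows.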
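Assumptions of the linear regression model
1. linear in parameters
2. random sampling
3. sample variation in the explanatory variable
In the random sample, the values of x_i must not all be identical.
4. exogeneity
Given the explanatory variable, the conditional expectation of the error term is zero: E(u|x) = 0
5. homoskedasticity
Given the explanatory variable, the conditional variance of the error term is constant: Var(u|x) = σ²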
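Some of these assumptions can be inspected informally on a sample. The sketch below uses simulated data to check sample variation in x and to compare the residual variance for low and high values of x as a rough look at homoskedasticity. The data, the variable names and the split at the median are illustrative assumptions, and this is a quick diagnostic rather than a formal test.

```r
set.seed(2)
n <- 500
x <- runif(n, 0, 10)
u <- rnorm(n, sd = 2)          # constant error variance: homoskedastic by construction
y <- 3 + 0.5 * x + u
fit <- lm(y ~ x)

# assumption 3: x must not take the same value for every observation
has_variation <- var(x) > 0

# OLS residuals average to zero whenever an intercept is included,
# so this is a property of the fit, not evidence for exogeneity
mean_resid <- mean(resid(fit))

# assumption 5 (informal): residual variance should be similar for low and high x
low       <- x <= median(x)
var_ratio <- var(resid(fit)[!low]) / var(resid(fit)[low])

print(has_variation)
print(round(var_ratio, 2))
```

A ratio far from 1 would suggest that the error variance depends on x. Exogeneity itself cannot be verified from the residuals, because OLS makes them uncorrelated with x by construction.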
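Method of Ordinary Least Squares (OLS)
We set the model as
y_i = α + β x_i + u_i
If we estimate α and β by OLS, we denote the OLS estimators by \hat α and \hat β and define the residual as
\hat u_i = y_i - \hat α - \hat β x_i
The goal of OLS is to minimize the residual sum of squares
Σ_i \hat u_i^2 = Σ_i (y_i - \hat α - \hat β x_i)^2
Solving the first-order conditions, we get \hat α and \hat β
\hat β = Σ_i (x_i - \bar x)(y_i - \bar y) / Σ_i (x_i - \bar x)^2
\hat α = \bar y - \hat β \bar x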
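To make the estimators concrete, here is a short sketch that computes the standard closed-form OLS solution by hand, \hat β = Σ(x_i - \bar x)(y_i - \bar y) / Σ(x_i - \bar x)^2 and \hat α = \bar y - \hat β \bar x, and compares it with the result of lm(). The data are simulated, and the names x, y, beta_hat and alpha_hat are illustrative.

```r
set.seed(3)
n <- 200
x <- rnorm(n, mean = 12, sd = 3)
y <- 2 + 0.8 * x + rnorm(n)            # assumed true model (illustrative)

# closed-form OLS estimators
beta_hat  <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
alpha_hat <- mean(y) - beta_hat * mean(x)

# residual sum of squares at the hand-computed estimates
rss <- sum((y - alpha_hat - beta_hat * x)^2)

# the same model fitted with lm() for comparison
fit <- lm(y ~ x)
print(c(alpha_hat = alpha_hat, beta_hat = beta_hat))
print(coef(fit))
```

Both printouts should agree up to floating-point precision, which is a handy sanity check that lm() is doing nothing more than minimizing the residual sum of squares.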
Applying R to the SLR model
We estimate the effect of years of education on wage.
step 1. read data
data <- read.csv("/Users/abnerteng/Downloads/wagedata.csv")  # load the wage data
head(data)                                                   # preview the first six rows
step 2. set up the SLR model
slr <- lm(wage ~ educ, data = data)  # regress wage on years of education
summary(slr)
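Since the page is about interpreting coefficients, the following sketch shows one way to pull the estimates out of a fitted model and read them. Because wagedata.csv is not reproduced here, it uses a hypothetical simulated stand-in with the same column names (wage, educ); the numbers it prints are not results from the real data.

```r
set.seed(4)
# hypothetical stand-in for the wage data: 300 people with 8 to 18 years of education
sim <- data.frame(educ = sample(8:18, 300, replace = TRUE))
sim$wage <- -1 + 0.6 * sim$educ + rnorm(300, sd = 2)   # assumed relationship (illustrative)

slr_sim <- lm(wage ~ educ, data = sim)
b  <- coef(slr_sim)       # named vector with "(Intercept)" and "educ"
ci <- confint(slr_sim)    # 95% confidence intervals by default

# slope: estimated change in expected wage for one more year of education
cat("One more year of education changes expected wage by", round(b["educ"], 3), "\n")
# intercept: estimated expected wage at educ = 0 (an extrapolation here, since educ >= 8)
cat("Estimated expected wage at educ = 0:", round(b["(Intercept)"], 3), "\n")
print(ci)
```

The same calls, coef(slr) and confint(slr), should work on the model fitted to the real data. The Estimate column of summary(slr) reports the same two numbers.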
step 3. draw a scatter plot
plot(data$educ, data$wage,
main = 'Regression for education level and wage',
xlab = 'education level',
ylab = 'wage')
abline(slr, col = 'blue', lwd = 2)
- the blue line is the fitted regression line from the SLR model