This week’s reading is on linear regression, taken from An Introduction to Statistical Learning (James et al., 2013), and I am going to focus my discussion on simple linear regression (SLR). Linear regression is a tool used in many fields, such as mathematics, engineering, science, health care, and machine learning, and it seeks to predict a quantitative response. We as students in the Human Systems Engineering program use it for statistics and data analytics, and we were first exposed to it in algebra. Most of us know the regression equation as y = mx + b, which shows both the slope m (the average change in y for a one-unit increase in x) and the intercept b (the value of y when x = 0). Simple linear regression looks for a straight-line relationship between a dependent variable y and a single independent variable x, and the fitted line is then used to predict y from x. Simple linear regression is illustrated by the graph below.
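To connect the algebra form to the way the textbook writes it, here is a short sketch of the same line in regression notation. The symbols \beta_0, \beta_1, and \epsilon follow the textbook's notation and do not appear in my discussion above; the hats mark values estimated from data.

```latex
\begin{align*}
y &= mx + b
  && \text{algebra form: slope } m,\ \text{intercept } b \\
Y &= \beta_0 + \beta_1 X + \epsilon
  && \text{regression model: } \beta_0 \leftrightarrow b,\ \beta_1 \leftrightarrow m,\ \epsilon \text{ is random error} \\
\hat{y} &= \hat{\beta}_0 + \hat{\beta}_1 x
  && \text{fitted line used for prediction}
\end{align*}
```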
The data can be fit with the least squares line, which chooses the coefficient estimates (intercept and slope) that minimize the sum of squared differences between the observed and predicted values of y. With real-world data, the true coefficients are unknown, so the true relationship, called the population regression line, is never observed directly; the least squares line computed from our sample is our estimate of it. In R, plot() draws a scatter plot of the variables, lm() fits the linear regression, and abline() adds the fitted (best fit) line to an existing plot. By using various arguments in these R commands, we can enhance our plots and lines (for example, colors and point styles). We may be practicing some of these in our R Practicums this semester.
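As a minimal sketch of the R commands mentioned above, the example below fits a least squares line with lm() and draws it with plot() and abline(). The data are simulated and purely hypothetical (a true intercept of 2 and a true slope of 3 were chosen for illustration), and the variable names are my own. The slope and intercept are also computed by hand from the usual least squares formulas to show what lm() is estimating.

```r
# Simulated (hypothetical) data: true intercept 2, true slope 3, plus noise
set.seed(1)
x <- runif(50, min = 0, max = 10)
y <- 2 + 3 * x + rnorm(50, sd = 2)

# Fit the least squares line: y is the response, x is the predictor
fit <- lm(y ~ x)
coef(fit)  # estimated intercept and slope

# The same estimates from the least squares formulas
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)  # slope
b0 <- mean(y) - b1 * mean(x)                                     # intercept

# Scatter plot of the data, then overlay the fitted line
plot(x, y, pch = 19, col = "steelblue",
     xlab = "x (independent variable)", ylab = "y (dependent variable)",
     main = "Simple linear regression")
abline(fit, col = "red", lwd = 2)
```

Because the data are simulated with random noise, the estimates should land near 2 and 3 but will not match them exactly, which is the difference between the least squares line and the population line.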
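James et al. (2013) describe the standard error of an estimate as telling us the average amount by which the estimate differs from the actual value. Standard errors are used to calculate confidence intervals (CIs). A 95% confidence interval is a range, with lower and upper limits computed from the data, built so that about 95% of intervals constructed this way would contain the true value; for the slope it is approximately the estimate plus or minus two standard errors. Standard errors are also used to perform hypothesis tests. In simple linear regression the question is whether there is a relationship between x and y. The null hypothesis (H0) states that the slope is equal to zero, meaning there is no relationship between x and y. The alternative hypothesis (H1) states that the slope is not zero, meaning there is some relationship between x and y. We reject H0 when the slope estimate is far enough from zero relative to its standard error, which is measured by a t-statistic and its p-value.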
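The standard errors, confidence intervals, and hypothesis test described above can all be read from a fitted model in R. The sketch below again uses simulated, hypothetical data (true slope of 3) and my own variable names; summary() and confint() are standard R functions for lm objects.

```r
# Simulated (hypothetical) data, same setup as a basic lm() fit
set.seed(1)
x <- runif(50, min = 0, max = 10)
y <- 2 + 3 * x + rnorm(50, sd = 2)
fit <- lm(y ~ x)

# Coefficient table: estimate, standard error, t value, p-value
coefs <- summary(fit)$coefficients
coefs

# 95% confidence intervals for the intercept and the slope
ci <- confint(fit, level = 0.95)
ci

# Test of H0: slope = 0. The t statistic is estimate / standard error
t_slope <- coefs["x", "Estimate"] / coefs["x", "Std. Error"]
p_slope <- coefs["x", "Pr(>|t|)"]  # a small p-value is evidence against H0
```

If the 95% interval for the slope does not contain zero, that agrees with rejecting H0 at the 5% level.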
As we know, simple linear regression is important to research because it shows us the relationship between two variables, x and y, the independent and dependent variables. We need to know this relationship in research to predict the outcome variable. To me, linear regression does not seem so simple, since math and statistics are not my best subjects. However, it is a necessary tool that has been used for many years, is used today, and will continue to be used in the future.
James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). An Introduction to Statistical Learning: with Applications in R. Springer Texts in Statistics. DOI: 10.1007/978-1-4614-7138-7_3.