# Least Absolute Shrinkage and Selection Operator (LASSO): MATLAB, R and Python codes. All you have to do is prepare a data set (very simple, easy and practical)

I am releasing MATLAB, R and Python codes for the Least Absolute Shrinkage and Selection Operator (LASSO). They are very easy to use: prepare a data set and run the code. The LASSO model and the prediction results for new samples are then obtained. Very simple and easy!

You can buy each code from the URLs below.

#### MATLAB

https://gum.co/czarD

Please download the supplemental zip file (this is free) from the URL below to run the LASSO code.

http://univprofblog.html.xdomain.jp/code/MATLAB_scripts_functions.zip

#### R

https://gum.co/IhBEm

Please download the supplemental zip file (this is free) from the URL below to run the LASSO code.

http://univprofblog.html.xdomain.jp/code/R_scripts_functions.zip

#### Python

https://gum.co/hUzpu

Please download the supplemental zip file (this is free) from the URL below to run the LASSO code.

http://univprofblog.html.xdomain.jp/code/supportingfunctions.zip

### Procedure of LASSO in the MATLAB, R and Python codes

To perform LASSO appropriately, the MATLAB, R and Python codes follow the procedure below after the data set is loaded.

**1. Autoscale the objective variable (Y) and the explanatory variables (X)**

Autoscaling means centering and scaling. In centering, the mean of each variable is subtracted from that variable, so that its mean becomes zero. In scaling, each variable is divided by its standard deviation, so that its standard deviation becomes one.
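Autoscaling can be sketched in a few lines of NumPy. This is only an illustration, not the released code: the function name `autoscale`, the toy data and the choice of the sample standard deviation (`ddof=1`) are assumptions.

```python
import numpy as np

def autoscale(data, ddof=1):
    """Center each column to mean 0 and scale it to standard deviation 1.

    ddof=1 (sample standard deviation) is an assumption of this sketch.
    """
    mean = data.mean(axis=0)
    std = data.std(axis=0, ddof=ddof)
    return (data - mean) / std, mean, std

# Hypothetical toy data: 4 samples, 2 explanatory variables
X = np.array([[1.0, 10.0], [2.0, 25.0], [3.0, 30.0], [4.0, 45.0]])
X_scaled, X_mean, X_std = autoscale(X)
```

The mean and standard deviation are returned as well because the same values are needed again when new samples are predicted.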

**2. Determine candidates of lambda**

Lambda is the regularization parameter that controls the balance between the accuracy and the sparseness of the LASSO model. A larger lambda sets more regression coefficients to exactly zero.

For example, the candidates of lambda are 0, 0.001, 0.002, 0.003, …, 0.699, 0.7.

**3. Estimate objective variable (Y) with cross-validation (CV) for each lambda candidate**

Leave-one-out CV is well known, but when the number of training samples is large it tends to favor overfitted models, so 5-fold or 2-fold CV is better. First, the training samples are divided into 5 or 2 groups. Second, one group is handled as test samples and a model is built with the other group(s). This is repeated 5 or 2 times until every group has been handled as test samples. The result is the estimated Y, which is the value predicted for each sample by a model built without that sample, as opposed to the calculated Y, which is the value fitted by a model built with all training samples.
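The grouping of samples can be illustrated as follows. This is a minimal sketch: the helper `kfold_indices`, the random shuffling and the sample count of 12 are assumptions, and the released codes may split the samples differently.

```python
import numpy as np

def kfold_indices(n_samples, n_folds=5, seed=0):
    """Shuffle the sample indices and split them into n_folds nearly equal groups."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n_samples), n_folds)

folds = kfold_indices(n_samples=12, n_folds=5)
for k, test_idx in enumerate(folds):
    # All groups other than group k are used to build the model
    train_idx = np.concatenate([f for j, f in enumerate(folds) if j != k])
    print(f"fold {k}: test={sorted(test_idx.tolist())}, n_train={len(train_idx)}")
```

Every sample appears in exactly one test group, so each sample receives exactly one estimated Y value.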

**4. Calculate Root-Mean-Squared Error (RMSE) between actual Y and estimated Y with CV for each lambda candidate**

**5. Decide the optimal lambda with the minimum RMSE value**

**6. Construct LASSO model with the optimal lambda**
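Steps 1 to 6 might look like the sketch below in Python. It assumes scikit-learn, whose `Lasso` parameter `alpha` plays the role of lambda under scikit-learn's own scaling of the objective function, which may differ from the MATLAB and R codes. The synthetic data, the coarser lambda grid and the random seed are all hypothetical, and the released Python code may be organized differently.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import KFold, cross_val_predict

# Hypothetical synthetic data: only the first 2 of 10 variables affect Y
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 10))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=60)

# Step 1: autoscale X and Y (sample standard deviation is an assumption)
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
y_scaled = (y - y.mean()) / y.std(ddof=1)

# Step 2: lambda candidates. A coarser grid than in the text keeps the sketch
# fast, and 0 is left out because scikit-learn advises against alpha=0 in Lasso.
lambdas = np.linspace(0.01, 0.7, 70)

# Steps 3 and 4: estimated Y by 5-fold CV and its RMSE for each candidate
cv = KFold(n_splits=5, shuffle=True, random_state=0)
rmse_cv = []
for lam in lambdas:
    y_est = cross_val_predict(Lasso(alpha=lam), X_scaled, y_scaled, cv=cv)
    rmse_cv.append(np.sqrt(np.mean((y_scaled - y_est) ** 2)))

# Step 5: the optimal lambda has the minimum RMSE
best_lambda = lambdas[int(np.argmin(rmse_cv))]

# Step 6: construct the LASSO model with the optimal lambda on all samples
model = Lasso(alpha=best_lambda).fit(X_scaled, y_scaled)
print("optimal lambda:", best_lambda)
print("standardized regression coefficients:", model.coef_)
```

The same `KFold` object is reused for every candidate so that all lambda values are compared on identical splits.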

**7. Calculate the coefficient of determination and RMSE between actual Y and calculated Y (r2C and RMSEC), and the coefficient of determination and RMSE between actual Y and estimated Y (r2CV and RMSECV), for the optimal lambda**

r2C is the ratio of Y information that the LASSO model can explain.

RMSEC is the average Y error of the LASSO model.

r2CV is the ratio of Y information that the LASSO model can be expected to estimate for new samples.

RMSECV is the expected average Y error for new samples.

Better LASSO models have higher r2CV values and lower RMSECV values.

A large difference between r2C and r2CV, or between RMSEC and RMSECV, means that the LASSO model is overfitted to the training samples.

Caution! r2CV and RMSECV cannot represent the true predictive ability of the LASSO model, since CV is not external validation.
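The four statistics can be computed with two small helper functions, as sketched below. The function names and the example values are hypothetical. The same formulas are applied twice: once to the calculated Y and once to the estimated Y from CV.

```python
import numpy as np

def r2(y_actual, y_model):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    ss_res = np.sum((y_actual - y_model) ** 2)
    ss_tot = np.sum((y_actual - np.mean(y_actual)) ** 2)
    return 1.0 - ss_res / ss_tot

def rmse(y_actual, y_model):
    """Root-mean-squared error."""
    return np.sqrt(np.mean((y_actual - y_model) ** 2))

# Hypothetical values for five training samples
y_actual = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_calculated = np.array([1.1, 1.9, 3.2, 3.9, 5.0])  # model built with all samples
y_estimated = np.array([1.4, 1.6, 3.5, 3.6, 5.3])   # obtained with CV

r2c, rmsec = r2(y_actual, y_calculated), rmse(y_actual, y_calculated)
r2cv, rmsecv = r2(y_actual, y_estimated), rmse(y_actual, y_estimated)
print(f"r2C={r2c:.3f} RMSEC={rmsec:.3f} r2CV={r2cv:.3f} RMSECV={rmsecv:.3f}")
```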

**8. Check plots between actual Y and calculated Y, and between actual Y and estimated Y**

These plots make it possible to check for outliers in the calculated and estimated values.

**9. In prediction, subtract the mean used in the autoscaling of X in step 1 from the X-variables of the new samples, and then divide them by the standard deviation used in the autoscaling of X in step 1**

**10. Estimate Y for the new samples, based on the LASSO model in step 6**

**11. Multiply the estimated Y by the standard deviation used in the autoscaling of Y in step 1, and then add the mean used in the autoscaling of Y in step 1**
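Steps 9 to 11 amount to three lines of arithmetic, sketched below. The function name `predict_original_scale` and all numbers are hypothetical. The sketch also assumes that the model has no intercept term, which holds when both X and Y were autoscaled before fitting.

```python
import numpy as np

def predict_original_scale(X_new, X_mean, X_std, coef, y_mean, y_std):
    """Apply the training-set scaling to new samples and return Y in original units."""
    X_new_scaled = (X_new - X_mean) / X_std   # step 9: reuse the mean and std of step 1
    y_new_scaled = X_new_scaled @ coef        # step 10: autoscaled Y estimated by the model
    return y_new_scaled * y_std + y_mean      # step 11: undo the autoscaling of Y

# Hypothetical values kept from the training stage
X_mean = np.array([2.5, 25.0])
X_std = np.array([1.29, 12.9])
y_mean, y_std = 7.0, 2.0
coef = np.array([0.8, 0.0])  # standardized regression coefficients

X_new = np.array([[3.0, 20.0], [1.0, 35.0]])
y_new = predict_original_scale(X_new, X_mean, X_std, coef, y_mean, y_std)
print(y_new)
```

The mean and standard deviation must come from the training samples, not from the new samples. Otherwise the new samples would be placed on a different scale from the one the model was built on.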

### How to perform LASSO?

#### 1. Buy the code and unzip the file

**MATLAB**: https://gum.co/czarD

**R**: https://gum.co/IhBEm

**Python**: https://gum.co/hUzpu

#### 2. Download and unzip the supplemental zip file (this is free)

**MATLAB**: http://univprofblog.html.xdomain.jp/code/MATLAB_scripts_functions.zip

**R**: http://univprofblog.html.xdomain.jp/code/R_scripts_functions.zip

**Python**: http://univprofblog.html.xdomain.jp/code/supportingfunctions.zip

#### 3. Place the supplemental files in the same directory or folder as the LASSO code.

#### 4. Prepare the data set. For the data format, see the article below.

#### 5. Run the code!

Estimated values of Y for “data_prediction2.csv” are saved in “PredictedY2.csv”. Standard regression coefficients are saved in “RegressionCoefficients.csv”. The variables with non-zero standard regression coefficients are the selected variables.
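Picking out the selected variables from the coefficients can be sketched as below. The variable names and coefficient values are hypothetical. The layout of “RegressionCoefficients.csv” is not described here, so the sketch starts from arrays that are already in memory rather than reading the file.

```python
import numpy as np

# Hypothetical variable names and standard regression coefficients
variable_names = np.array(["x1", "x2", "x3", "x4"])
std_coef = np.array([0.62, 0.0, -0.15, 0.0])

# Variables whose coefficient is not zero are the ones LASSO selected
selected = variable_names[std_coef != 0]
print(selected.tolist())
```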

### Required settings

Please see the article below.

https://medium.com/@univprofblog1/settings-for-running-my-matlab-r-and-python-codes-136b9e5637a1#.paer8scqy