# Least Absolute Shrinkage and Selection Operator: MATLAB, R and Python codes: all you have to do is prepare a data set (very simple, easy and practical)

I have released MATLAB, R and Python codes for Least Absolute Shrinkage and Selection Operator (LASSO). They are very easy to use: you prepare a data set and just run the code! Then the LASSO model and the prediction results for new samples are obtained. Very simple and easy!

You can buy the code for each language from the URLs below.

#### MATLAB

https://gum.co/czarD
Please download the supplemental zip file (this is free) from the URL below to run the LASSO code.
http://univprofblog.html.xdomain.jp/code/MATLAB_scripts_functions.zip

#### R

https://gum.co/IhBEm
Please download the supplemental zip file (this is free) from the URL below to run the LASSO code.
http://univprofblog.html.xdomain.jp/code/R_scripts_functions.zip

#### Python

https://gum.co/hUzpu
Please download the supplemental zip file (this is free) from the URL below to run the LASSO code.
http://univprofblog.html.xdomain.jp/code/supportingfunctions.zip

### Procedure of LASSO in the MATLAB, R and Python codes

After the data set is loaded, the MATLAB, R and Python codes perform LASSO according to the procedure below.

1. Autoscale the objective variable (Y) and the explanatory variables (X)
Autoscaling means centering and scaling. In centering, the mean of each variable is subtracted from that variable, so that its mean becomes zero. In scaling, each variable is divided by its standard deviation, so that its standard deviation becomes one.

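2. Determine the candidates of lambda
Lambda is the regularization parameter that controls the balance between the accuracy and the sparseness of the LASSO model. The larger lambda is, the more regression coefficients tend to become exactly zero, so fewer variables are selected.
For example, the candidates of lambda are 0, 0.001, 0.002, 0.003, …, 0.699, 0.7.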

3. Estimate the objective variable (Y) with cross-validation (CV) for each lambda candidate
Leave-one-out CV is well known, but it can lead to over-fitting when the number of training samples is large. So, 5-fold or 2-fold CV is better. First, the training samples are divided into 5 or 2 groups. Second, one group is handled as test samples and a model is built with the other group(s). This is repeated 5 or 2 times until every group has been handled as test samples. In this way, every sample gets an estimated Y from a model that was built without that sample, as opposed to the calculated Y from a model built with all samples.

4. Calculate the root-mean-squared error (RMSE) between actual Y and CV-estimated Y for each lambda candidate

5. Choose the lambda candidate with the minimum RMSE as the optimal lambda

6. Construct the LASSO model with the optimal lambda, using all training samples

7. For the optimal lambda, calculate the coefficient of determination and RMSE between actual Y and calculated Y (r2C and RMSEC), and between actual Y and estimated Y (r2CV and RMSECV)
r2C is the proportion of the variance of Y that the LASSO model can explain.
RMSEC is the typical size of the Y errors of the LASSO model on the training samples.
r2CV is an estimate of the proportion of the variance of Y that the LASSO model can predict for new samples.
RMSECV is an estimate of the typical size of the Y errors for new samples.
Better LASSO models have higher r2CV values and lower RMSECV values.
A large difference between r2C and r2CV, or between RMSEC and RMSECV, means that the LASSO model is overfitted to the training samples.

*Caution! r2CV and RMSECV do not represent the true predictive ability of the LASSO model, because they come from CV, not from external validation.
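Steps 1 to 7 can be sketched in Python with scikit-learn. This is not the released code; it is a minimal sketch under some assumptions: scikit-learn's `Lasso` is used, whose `alpha` plays the role of lambda (its objective is (1/(2n))·||y − Xw||² + alpha·||w||₁; whether this matches the scaling of lambda in the released codes is an assumption), CV is 5-fold via `KFold` and `cross_val_predict`, and `autoscale`, `r2_rmse` and `select_lambda_and_fit` are hypothetical helper names. The candidate 0 is left out because scikit-learn advises against `alpha=0` in `Lasso` (ordinary least squares should be used instead). The data are synthetic.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import KFold, cross_val_predict


def autoscale(a):
    """Step 1: centre each column to mean 0 and scale it to standard deviation 1."""
    mean = a.mean(axis=0)
    std = a.std(axis=0, ddof=1)  # sample standard deviation (n - 1), like MATLAB std / R sd
    return (a - mean) / std, mean, std


def r2_rmse(actual, estimated):
    """Coefficient of determination (r2) and root-mean-squared error (RMSE)."""
    residual = actual - estimated
    r2 = 1 - np.sum(residual ** 2) / np.sum((actual - actual.mean()) ** 2)
    rmse = np.sqrt(np.mean(residual ** 2))
    return r2, rmse


def select_lambda_and_fit(X, y, lambdas, n_folds=5, seed=0):
    Xs, _, _ = autoscale(X)           # step 1 (X)
    ys, y_mean, y_std = autoscale(y)  # step 1 (Y)
    folds = KFold(n_splits=n_folds, shuffle=True, random_state=seed)

    # Steps 3-4: CV-estimated Y and its RMSE for each lambda candidate
    rmse_cv, estimates = [], []
    for lam in lambdas:
        y_est = cross_val_predict(Lasso(alpha=lam, max_iter=10000), Xs, ys, cv=folds)
        estimates.append(y_est)
        rmse_cv.append(np.sqrt(np.mean((ys - y_est) ** 2)))

    # Step 5: optimal lambda = candidate with the minimum CV RMSE
    best = int(np.argmin(rmse_cv))
    best_lambda = lambdas[best]

    # Step 6: final LASSO model built with all training samples
    model = Lasso(alpha=best_lambda, max_iter=10000).fit(Xs, ys)

    # Step 7: r2C, RMSEC, r2CV and RMSECV on the original scale of Y
    y_calc = model.predict(Xs) * y_std + y_mean
    y_est = estimates[best] * y_std + y_mean
    r2c, rmsec = r2_rmse(y, y_calc)
    r2cv, rmsecv = r2_rmse(y, y_est)
    return best_lambda, model, {"r2C": r2c, "RMSEC": rmsec, "r2CV": r2cv, "RMSECV": rmsecv}


# Synthetic example: only the first two of ten X-variables affect Y
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.5, size=100)

lambdas = np.linspace(0.001, 0.7, 700)  # step 2: 0.001, 0.002, ..., 0.7 (0 omitted)
best_lambda, model, stats = select_lambda_and_fit(X, y, lambdas)
print(best_lambda, stats)
```

The CV estimates for every candidate are kept, so r2CV and RMSECV for the optimal lambda come from the same folds that were used to choose it, without refitting.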

8. Check the plots of actual Y vs. calculated Y and of actual Y vs. estimated Y
These plots show samples whose calculated or estimated values deviate strongly from the actual values (outliers).

9. For new samples, subtract the means used to autoscale X in step 1 from the X-variables, and then divide the X-variables by the standard deviations used to autoscale X in step 1

10. Estimate Y for the new samples with the LASSO model from step 6

11. Multiply the estimated Y by the standard deviation used to autoscale Y in step 1, and then add the mean used to autoscale Y in step 1, to return the estimated Y to its original scale
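Steps 9 to 11 reuse the means and standard deviations of the training data; they are never recomputed from the new samples. Below is a minimal Python sketch assuming scikit-learn's `Lasso`; the helpers `fit_autoscaled_lasso` and `predict_new` and the lambda value 0.01 are hypothetical placeholders, not part of the released codes, and the data are synthetic.

```python
import numpy as np
from sklearn.linear_model import Lasso


def fit_autoscaled_lasso(X, y, lam):
    """Fit LASSO on autoscaled data and keep the training means/stds (steps 1 and 6)."""
    x_mean, x_std = X.mean(axis=0), X.std(axis=0, ddof=1)
    y_mean, y_std = y.mean(), y.std(ddof=1)
    model = Lasso(alpha=lam, max_iter=10000).fit((X - x_mean) / x_std, (y - y_mean) / y_std)
    return model, x_mean, x_std, y_mean, y_std


def predict_new(model, X_new, x_mean, x_std, y_mean, y_std):
    # Step 9: autoscale new X with the TRAINING means and standard deviations
    X_new_scaled = (X_new - x_mean) / x_std
    # Step 10: estimate autoscaled Y with the LASSO model
    y_scaled = model.predict(X_new_scaled)
    # Step 11: multiply by the training std of Y, then add the training mean of Y
    return y_scaled * y_std + y_mean


rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = 5 + 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.5, size=100)

# lambda = 0.01 is a placeholder; in practice use the optimal lambda from CV (step 5)
model, x_mean, x_std, y_mean, y_std = fit_autoscaled_lasso(X, y, lam=0.01)

X_new = rng.normal(size=(20, 5))
y_new_true = 5 + 3 * X_new[:, 0] - 2 * X_new[:, 1]  # noise-free values, for checking only
pred = predict_new(model, X_new, x_mean, x_std, y_mean, y_std)
print(pred[:5])
```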

### How to perform LASSO?

#### 1. Buy the code and unzip the file

MATLAB: https://gum.co/czarD

R: https://gum.co/IhBEm

Python: https://gum.co/hUzpu

#### 2. Prepare a data set. For the data format, see the article below.

https://medium.com/@univprofblog1/data-format-for-matlab-r-and-python-codes-of-data-analysis-and-sample-data-set-9b0f845b565a#.3ibrphs4h

#### 3. Run the code!

Estimated values of Y for “data_prediction2.csv” are saved in “PredictedY2.csv”. Standard regression coefficients are saved in “RegressionCoefficients.csv”. Variables with non-zero standard regression coefficients are the selected variables.
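Variable selection in LASSO amounts to keeping the variables whose standard regression coefficients (the coefficients of the model built on autoscaled X and Y) are non-zero. The sketch below writes them to a file named `RegressionCoefficients.csv`, as mentioned above, but the column layout, the helper `save_coefficients_and_select`, the variable names and lambda = 0.2 are assumptions for illustration only; the released codes may format the file differently.

```python
import csv

import numpy as np
from sklearn.linear_model import Lasso


def save_coefficients_and_select(model, variable_names, path="RegressionCoefficients.csv"):
    """Write standard regression coefficients (assumed layout) and return selected variables."""
    coefficients = model.coef_
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["variable", "standard_regression_coefficient"])
        for name, coef in zip(variable_names, coefficients):
            writer.writerow([name, coef])
    # Variables with non-zero standard regression coefficients are the selected ones
    return [name for name, coef in zip(variable_names, coefficients) if coef != 0]


rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.5, size=100)

# Autoscale X and Y so that the coefficients are standard regression coefficients
Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
ys = (y - y.mean()) / y.std(ddof=1)
names = [f"x{i + 1}" for i in range(X.shape[1])]

# lambda = 0.2 is a placeholder for the optimal lambda chosen by CV
model = Lasso(alpha=0.2).fit(Xs, ys)
selected = save_coefficients_and_select(model, names)
print(selected)
```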

### Required settings

Please see the article below.
https://medium.com/@univprofblog1/settings-for-running-my-matlab-r-and-python-codes-136b9e5637a1#.paer8scqy

### Examples of execution results
