One-Class Support Vector Machine: R and Python codes. All you have to do is prepare the data set (very simple, easy and practical)

I have released R and Python codes for the One-Class Support Vector Machine (OCSVM). They are very easy to use. You prepare the data set and just run the code! You then obtain the OCSVM model and its prediction results for new samples. Very simple and easy!

You can buy each code from the URLs below.

R

https://gum.co/nbjri
 Please download the supplemental zip file (this is free) from the URL below to run the OCSVM code.
 http://univprofblog.html.xdomain.jp/code/R_scripts_functions.zip

Python

https://gum.co/oPLZ
 Please download the supplemental zip file (this is free) from the URL below to run the OCSVM code.
 http://univprofblog.html.xdomain.jp/code/supportingfunctions.zip

Procedure of OCSVM in the R and Python codes

To perform OCSVM appropriately, the R and Python codes follow the procedure below after the data set is loaded.

1. Autoscale the explanatory variables (X)
 Autoscaling means centering and scaling. In centering, the mean of each variable is subtracted from that variable, so its mean becomes zero. In scaling, each variable is divided by its standard deviation, so its standard deviation becomes one.

2. Decide nu
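 nu is the assumed ratio of outlier samples in the data set (more precisely, an upper bound on the fraction of training samples treated as outliers). For example, nu can be set to 0.003, based on the three-sigma rule.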

3. Decide the candidates of gamma
 Gamma is the parameter of the Gaussian kernel, which is one of the most widely used kernel functions.
 For example,

Gamma: 2^-20, 2^-19, ..., 2^9, 2^10.

4. Calculate the Gram matrix of the Gaussian kernel and its variance for each gamma candidate
 If the size of the Gram matrix is 100×100, for example, the matrix is reshaped into a 10000×1 vector and the variance is calculated over those 10000 kernel values.

5. Select the gamma with the maximum variance as the optimal gamma
 A large variance means that the Gram matrix with the optimal gamma has diverse kernel values.

6. Construct OCSVM

7. Estimate whether the training samples are outliers or not, based on the OCSVM constructed in step 6.

8. In prediction, for new samples, subtract the mean used in the autoscaling of X in step 1 from the X-variables, and then divide the X-variables by the standard deviation used in the autoscaling of X in step 1.

9. Estimate whether the new samples are outliers or not, based on the OCSVM constructed in step 6.
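
The nine steps above can be sketched in Python as follows. This is only a minimal illustration, not the code sold at the URLs above: it assumes NumPy and scikit-learn's OneClassSVM in place of the author's implementation, uses synthetic data instead of data.csv, and assumes the sample standard deviation (ddof=1) for autoscaling. The helper names autoscale and select_gamma are hypothetical.

```python
import numpy as np
from sklearn.svm import OneClassSVM


def autoscale(X, mean=None, std=None):
    """Center and scale columns; reuse mean/std when given (steps 1 and 8)."""
    if mean is None:
        mean = X.mean(axis=0)
    if std is None:
        std = X.std(axis=0, ddof=1)  # assumption: sample standard deviation
    return (X - mean) / std, mean, std


def select_gamma(X_scaled, candidates):
    """Pick the gamma whose Gaussian Gram matrix has the largest variance (steps 3-5)."""
    # Squared Euclidean distances between all pairs of samples
    sq_norms = (X_scaled ** 2).sum(axis=1)
    sq_dist = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X_scaled @ X_scaled.T
    sq_dist = np.maximum(sq_dist, 0.0)  # guard against tiny negative round-off
    # Variance over all elements of each Gram matrix exp(-gamma * distance^2)
    variances = [np.exp(-g * sq_dist).var(ddof=1) for g in candidates]
    return candidates[int(np.argmax(variances))]


# Synthetic stand-in data (not the article's data.csv)
rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 2))
X_new = np.vstack([rng.normal(size=(5, 2)), [[8.0, 8.0]]])  # last row lies far from the training data

X_train_scaled, x_mean, x_std = autoscale(X_train)           # step 1
nu = 0.003                                                   # step 2
gamma_candidates = 2.0 ** np.arange(-20, 11)                 # step 3: 2^-20 ... 2^10
best_gamma = select_gamma(X_train_scaled, gamma_candidates)  # steps 4-5

model = OneClassSVM(kernel="rbf", nu=nu, gamma=best_gamma)   # step 6
model.fit(X_train_scaled)
train_labels = model.predict(X_train_scaled)                 # step 7: 1 = normal, -1 = outlier

X_new_scaled, _, _ = autoscale(X_new, x_mean, x_std)         # step 8: reuse training mean/std
new_labels = model.predict(X_new_scaled)                     # step 9
print(best_gamma, new_labels)
```

Note that the new samples are scaled with the mean and standard deviation of the training data, not with their own statistics, so that both sets are expressed on the same scale.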

How to perform OCSVM?

1. Buy the code and unzip the file

R: https://gum.co/nbjri

Python: https://gum.co/oPLZ

2. Download and unzip the supplemental zip file (this is free)

R: http://univprofblog.html.xdomain.jp/code/R_scripts_functions.zip

Python: http://univprofblog.html.xdomain.jp/code/supportingfunctions.zip

3. Place the supplemental files in the same directory or folder as the OCSVM code.

4. Prepare the data set. For the data format, see the article below.

https://medium.com/@univprofblog1/data-format-for-matlab-r-and-python-codes-of-data-analysis-and-sample-data-set-9b0f845b565a#.3ibrphs4h

For OCSVM, data_prediction.csv is required in addition to data.csv, as described in the “Visualization, clustering and data domain estimation” section of that article. The data format of data_prediction.csv is the same as that of data.csv. If you cannot prepare this data set, please copy data.csv and rename the copy as data_prediction.csv.

5. Run the code!

Estimated values for “data.csv” are saved in “CalculstedY.csv”. Estimated values for “data_prediction.csv” are saved in “PredictedY.csv”. “TRUE” or “1” means normal data, and “FALSE” or “-1” means outliers.
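
As a small illustration of how these labels can be read, the sketch below maps each saved value to an outlier flag. The layout of the output files is not described here, so the example works on a plain list of label values rather than on the CSV files themselves. The function name is_outlier and the sample labels are hypothetical.

```python
def is_outlier(label):
    """Return True when a saved label marks an outlier ("FALSE" or "-1")."""
    text = str(label).strip().upper()
    if text in {"TRUE", "1"}:
        return False  # normal data
    if text in {"FALSE", "-1"}:
        return True   # outlier
    raise ValueError(f"unexpected label: {label!r}")


# Hypothetical label values, mixing the two encodings mentioned above
labels = ["TRUE", "FALSE", "1", "-1", "1"]
outlier_positions = [i for i, value in enumerate(labels) if is_outlier(value)]
print(outlier_positions)  # [1, 3]
```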

Required settings

Please see the article below.
 https://medium.com/@univprofblog1/settings-for-running-my-matlab-r-and-python-codes-136b9e5637a1#.paer8scqy

Examples of execution results