Gaussian Mixture Model (GMM): R and Python codes. All you have to do is prepare a data set (very simple, easy and practical)

I have released R and Python codes for the Gaussian Mixture Model (GMM). They are very easy to use. You prepare a data set and just run the code! GMM clustering is then performed. Very simple and easy!

You can buy each code from the URLs below.

R

https://gum.co/pBJhf
 Please download the supplemental zip file (this is free) from the URL below to run the GMM code.
 http://univprofblog.html.xdomain.jp/code/R_scripts_functions.zip

Python

https://gum.co/muEXh
 Please download the supplemental zip file (this is free) from the URL below to run the GMM code.
 http://univprofblog.html.xdomain.jp/code/supportingfunctions.zip

Procedure of GMM in the R and Python codes

To perform GMM appropriately, the R and Python codes follow the procedure below after the data set is loaded.

1. Autoscale each variable (if necessary)
 Autoscaling means centering and scaling. In centering, the mean of each variable is subtracted from that variable, so its mean becomes zero. In scaling, each variable is divided by its standard deviation, so its standard deviation becomes one.

2. Decide the number of clusters (Gaussian distributions)

3. Decide constraints on the covariance matrix
 Constraints include zero covariance (a diagonal covariance matrix), constant variance and so on. You can choose the combination of the number of clusters (Gaussian distributions) and the constraints on the covariance matrix by trying different combinations and selecting the one that minimizes the Bayesian Information Criterion (BIC).

4. Run GMM

5. Visualize clustering result
 Data visualization is performed by principal component analysis (PCA), for example. Clusters are easy to see when each cluster is drawn in a different color in a scatter plot.
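The five steps above can be sketched in Python as follows. This is not the released code, only a minimal illustration that assumes scikit-learn is used. The function names `fit_gmm_by_bic` and `plot_clusters`, the upper limit of 5 clusters, and the synthetic data are all assumptions made for this example. The four covariance constraints are the options scikit-learn offers ("diag" corresponds to zero covariance, "spherical" to one variance per cluster).

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler


def fit_gmm_by_bic(X, max_components=5, random_state=0):
    """Autoscale X, then keep the GMM with the lowest BIC."""
    # Step 1: autoscaling (zero mean, unit standard deviation per variable).
    X_scaled = StandardScaler().fit_transform(X)

    # Steps 2-3: try every combination of cluster count and covariance constraint.
    best_model, best_bic = None, np.inf
    for covariance_type in ("full", "tied", "diag", "spherical"):
        for n_components in range(1, max_components + 1):
            model = GaussianMixture(
                n_components=n_components,
                covariance_type=covariance_type,
                random_state=random_state,
            ).fit(X_scaled)
            bic = model.bic(X_scaled)
            if bic < best_bic:
                best_model, best_bic = model, bic

    # Step 4: assign each sample to its most probable Gaussian distribution.
    labels = best_model.predict(X_scaled)
    return X_scaled, best_model, labels


def plot_clusters(X_scaled, labels):
    """Step 5: scatter plot of the first two PCA scores, colored by cluster."""
    import matplotlib.pyplot as plt  # imported here so fitting works without it

    scores = PCA(n_components=2).fit_transform(X_scaled)
    plt.scatter(scores[:, 0], scores[:, 1], c=labels)
    plt.xlabel("PC1")
    plt.ylabel("PC2")
    plt.show()


# Synthetic two-cluster data, made up purely for this illustration.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=0.0, scale=1.0, size=(100, 2)),
    rng.normal(loc=8.0, scale=1.0, size=(100, 2)),
])
X_scaled, model, labels = fit_gmm_by_bic(X)
print(model.n_components, model.covariance_type)
```

Calling `plot_clusters(X_scaled, labels)` afterwards draws the scatter plot. Note that scikit-learn labels clusters starting from 0, so the numbering may differ from what the released codes write out.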

How to perform GMM?

1. Buy the code and unzip the file

R: https://gum.co/pBJhf

Python: https://gum.co/muEXh

2. Download and unzip the supplemental zip file (this is free)

R: http://univprofblog.html.xdomain.jp/code/R_scripts_functions.zip

Python: http://univprofblog.html.xdomain.jp/code/supportingfunctions.zip

3. Place the supplemental files in the same directory (folder) as the GMM code.

4. Prepare the data set. For the data format, see the article below.

https://medium.com/@univprofblog1/data-format-for-matlab-r-and-python-codes-of-data-analysis-and-sample-data-set-9b0f845b565a#.3ibrphs4h

5. Run the code!

The cluster number for each sample is saved in "ClusterNum.csv".
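As a rough idea of how such a result file can be used, the sketch below writes cluster numbers to a CSV file and reads them back to count the samples in each cluster. The exact layout of "ClusterNum.csv" is not described here, so the two-column layout with a "sample" and "cluster" header is an assumption, and the helper names `save_cluster_numbers` and `count_samples_per_cluster` are hypothetical.

```python
import csv
import os
import tempfile


def save_cluster_numbers(sample_names, cluster_numbers, path):
    """Write one row per sample: its name and its cluster number."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["sample", "cluster"])  # assumed header row
        writer.writerows(zip(sample_names, cluster_numbers))


def count_samples_per_cluster(path):
    """Read the file back and count how many samples fall in each cluster."""
    counts = {}
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            cluster = int(row["cluster"])
            counts[cluster] = counts.get(cluster, 0) + 1
    return counts


# Made-up sample names and cluster numbers, for illustration only.
names = ["sample1", "sample2", "sample3", "sample4"]
clusters = [1, 2, 1, 1]
path = os.path.join(tempfile.mkdtemp(), "ClusterNum.csv")
save_cluster_numbers(names, clusters, path)
counts = count_samples_per_cluster(path)
print(counts)
```

If the real file has a different header or no header at all, only the column names in `count_samples_per_cluster` need to change.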

Required settings

Please see the article below.
 https://medium.com/@univprofblog1/settings-for-running-my-matlab-r-and-python-codes-136b9e5637a1#.paer8scqy

Examples of execution results