Generative Topographic Mapping (GTM): R code. All you have to do is prepare a data set (very simple, easy and practical)

I have released R code for Generative Topographic Mapping (GTM). It is very easy to use. You prepare a data set and just run the code, and a GTM map is obtained. Very simple and easy!

You can buy the code from the URL below.

R

https://gum.co/achdE
 Please download the supplemental zip file (this is free) from the URL below to run the GTM code.
 http://univprofblog.html.xdomain.jp/code/R_scripts_functions.zip

Procedure of GTM in the R code

To perform GTM appropriately, the R code follows the procedure below after the data set is loaded.

1. Decide the map size of GTM
 You decide the number of grids along the columns and along the rows, for example 10 x 10 grids. The total number of grids, which is (the number of columns) x (the number of rows), should be lower than the number of samples.

2. Decide the number of Radial Basis Functions (RBFs)
 A Gaussian function is basically used as the RBF. You decide the number of RBFs along the columns and along the rows, for example 3 x 3 RBFs. These RBFs are placed uniformly on the GTM map. When the number of RBFs is large, the map can follow the data set more flexibly, but it tends to become complicated.

3. Decide the variance of the Gaussian function
 A larger variance produces a smoother GTM map. For example, set it to 1.

4. Decide the parameter lambda in the EM algorithm
 Lambda is the regularization parameter for the weight W. The default is 0. If you cannot obtain the desired GTM map after changing the other parameters, change lambda to a tiny value such as 0.001 or 0.0001.

5. Decide the number of iterations in training GTM
 This number should be large, for example 100.

6. Initialize beta (the inverse of the variance in the data space) and the weight W, based on Principal Component Analysis (PCA)

7. Autoscale each variable (if necessary)
 Autoscaling means centering and scaling. In centering, the mean of each variable is subtracted from the variable so that its mean becomes zero. In scaling, each variable is divided by its standard deviation so that its standard deviation becomes one.

8. Train the GTM map

9. Check the points on the GTM map for each sample
 On the GTM map, each sample is represented as a probability distribution, which means that each grid has a probability that the sample exists there. Of course, the sum of the probabilities is one for each sample. There are two ways to decide the point on the GTM map for each sample. One is the grid where the probability is the highest (the mode). The other is the average of the grid positions weighted by the probability (the mean). Both points should be checked.
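
The nine steps above can be sketched in a few lines of base R. This is only a minimal illustration under stated assumptions, not the code you buy from the URL above. All variable and function names are hypothetical, the built-in iris data stands in for your own data set, the map is assumed to be the square [-1, 1] x [-1, 1], beta is initialized in a simplified way from the third principal component, and lambda is set to 0.001 rather than the default 0 to keep the linear system stable.

```r
# Minimal GTM sketch in base R (illustrative only, not the purchased code).
# Assumed settings for steps 1 to 5 (all names here are hypothetical)
map_size <- c(10, 10)  # grids: columns x rows (100 grids < 150 samples)
rbf_size <- c(3, 3)    # RBFs: columns x rows
rbf_var  <- 1          # variance of each Gaussian RBF
lambda   <- 0.001      # regularization of W (tiny value from step 4)
n_iter   <- 100        # number of EM iterations

# Toy data: the four numeric columns of the built-in iris data set
X <- as.matrix(iris[, 1:4])

# Step 7: autoscaling (subtract the mean, divide by the standard deviation)
X <- scale(X, center = TRUE, scale = TRUE)

# Regular grid on [-1, 1] x [-1, 1]; one row per grid or RBF center
make_grid <- function(size) {
  as.matrix(expand.grid(seq(-1, 1, length.out = size[1]),
                        seq(-1, 1, length.out = size[2])))
}
Z  <- make_grid(map_size)  # grid coordinates on the map
Mu <- make_grid(rbf_size)  # RBF centers, placed uniformly

# Squared Euclidean distances between the rows of A and the rows of B
sq_dist <- function(A, B) {
  pmax(outer(rowSums(A^2), rowSums(B^2), "+") - 2 * A %*% t(B), 0)
}

# Gaussian RBF outputs for every grid, plus a constant bias column
Phi <- cbind(exp(-sq_dist(Z, Mu) / (2 * rbf_var)), 1)

# Step 6 (simplified): PCA-based initial values of W and beta
pca  <- prcomp(X)
Y0   <- Z %*% t(pca$rotation[, 1:2] %*% diag(pca$sdev[1:2]))
W    <- solve(crossprod(Phi) + 1e-6 * diag(ncol(Phi)), crossprod(Phi, Y0))
beta <- 1 / pca$sdev[3]^2

# E-step: probability of each grid (row) for each sample (column)
responsibility <- function(W, beta) {
  logp <- -beta / 2 * sq_dist(Phi %*% W, X)
  p <- exp(sweep(logp, 2, apply(logp, 2, max)))  # subtract column maxima
  sweep(p, 2, colSums(p), "/")                   # each column sums to one
}

# Step 8: EM training
for (i in seq_len(n_iter)) {
  R <- responsibility(W, beta)
  A <- crossprod(Phi, rowSums(R) * Phi) + (lambda / beta) * diag(ncol(Phi))
  W <- solve(A, crossprod(Phi, R %*% X))
  beta <- length(X) / sum(R * sq_dist(Phi %*% W, X))
}

# Step 9: two ways to place each sample on the map
R <- responsibility(W, beta)
mode_pos <- Z[apply(R, 2, which.max), , drop = FALSE]  # highest probability
mean_pos <- t(R) %*% Z                                 # probability-weighted mean
```

Plotting both results, for example with plot(mode_pos) and plot(mean_pos), lets you compare the two kinds of points described in step 9.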

How to perform GTM

1. Buy the code and unzip the file

R: https://gum.co/achdE

2. Download and unzip the supplemental zip file (this is free)

R: http://univprofblog.html.xdomain.jp/code/R_scripts_functions.zip

3. Place the supplemental files in the same directory or folder as the GTM code.

4. Prepare the data set. For the data format, see the article below.

https://medium.com/@univprofblog1/data-format-for-matlab-r-and-python-codes-of-data-analysis-and-sample-data-set-9b0f845b565a#.3ibrphs4h

5. Run the code!

Required settings

Please see the article below.
 https://medium.com/@univprofblog1/settings-for-running-my-matlab-r-and-python-codes-136b9e5637a1#.paer8scqy

Examples of execution results