Radial basis function network

Rahul Kumar · Published in Geek Culture · Jan 6, 2019 · 5 min read

If the classes or patterns are linearly separable, a single-layer perceptron is sufficient; otherwise we need to incorporate hidden layers to introduce non-linearity into the network.

A hidden layer represents a non-linear boundary as a set of piecewise linear boundaries.

In general, an input feature vector that is not linearly separable in its current dimensions may become linearly separable when projected into a higher-dimensional space.

Let's look at the outputs of the OR and XOR gates: the OR output can be separated by a single line, but the same is not true of the XOR output, where two lines are needed to separate the two classes (0 and 1).
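This can be checked directly. Below is a sketch of my own (not from the article) that trains a single-layer perceptron, using the classic perceptron learning rule, on both gates: training converges on OR but never on XOR.

```python
import numpy as np

# Illustrative sketch (my own): a single-layer perceptron converges on the
# linearly separable OR targets but never on the XOR targets.
def train_perceptron(X, y, epochs=100):
    Xb = np.hstack([X, np.ones((len(X), 1))])   # append a bias input
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        errors = 0
        for xi, t in zip(Xb, y):
            pred = 1 if w @ xi > 0 else 0
            w += (t - pred) * xi                # classic perceptron update
            errors += int(pred != t)
        if errors == 0:                         # a clean pass: converged
            return w, True
    return w, False

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
_, or_ok = train_perceptron(X, np.array([0, 1, 1, 1]))   # OR targets
_, xor_ok = train_perceptron(X, np.array([0, 1, 1, 0]))  # XOR targets
```

`or_ok` ends up true and `xor_ok` false, matching the geometric argument above.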

So what does the Radial Basis Function (RBF) do for our problem of non-linearly separable patterns?

The RBF performs a non-linear transformation on the input vector before it is fed to the classifier. The transformation:

a) Imposes a non-linear mapping on the input feature vector.

b) Increases the dimensionality of the feature vector.

In the image above, green and red are feature vectors of two different classes, and it is evident that they are not linearly separable. Once we apply the RBF, it performs a non-linear transformation on the feature vectors that, in effect, compresses them along one direction and stretches them along another, so that they become linearly separable.
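As a concrete sketch of this effect (the centres and sigma below are my own illustrative choices, not from the article), passing the four XOR points through two Gaussian RBFs makes the two classes linearly separable:

```python
import numpy as np

# Illustrative sketch: map the XOR points through two Gaussian RBFs.
# In the transformed (phi1, phi2) space the classes become separable.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0])
centres = np.array([[0, 0], [1, 1]], dtype=float)  # assumed centres
sigma = 1.0                                        # assumed spread

def rbf_features(X, centres, sigma):
    # phi_j(x) = exp(-||x - t_j||^2 / (2 sigma^2))
    d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2 * sigma ** 2))

Phi = rbf_features(X, centres, sigma)
sums = Phi.sum(axis=1)
# Here phi1 + phi2 is about 1.37 for the class-0 points and about 1.21
# for the class-1 points, so one threshold between them separates the classes.
```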

Along with the non-linear transformation, the dimensionality is increased from the original P dimensions to M dimensions, as given by the equations below.

RBF behaviour:

Each RBF has a receptor t, and the value of the function increases or decreases as we move away from the receptor.

r is the distance between the receptor t and an input feature vector X.

The above are the general choices for the RBF, but the Gaussian is the most popular one; we can see that its value decays as we move away from the receptor.
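For reference, the common choices mentioned above are standard textbook forms (the article's own equations appear only as images, so these are the usual versions):

```latex
r = \lVert \mathbf{x} - \mathbf{t} \rVert
\qquad\text{(distance of input } \mathbf{x} \text{ from receptor } \mathbf{t}\text{)}

\text{Gaussian:}\quad \varphi(r) = \exp\!\left(-\frac{r^{2}}{2\sigma^{2}}\right)

\text{Multiquadric:}\quad \varphi(r) = \sqrt{r^{2} + c^{2}}

\text{Inverse multiquadric:}\quad \varphi(r) = \frac{1}{\sqrt{r^{2} + c^{2}}}
```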

NETWORK ARCHITECTURE OF AN RBF NEURAL NETWORK:

Radial basis function (RBF) networks typically have three layers: an input layer, a hidden layer with a non-linear RBF activation function, and a linear output layer. The input can be modeled as a vector of real numbers. The output of the network is then a scalar function of the input vector.

d is the dimensionality of the input feature space, M is the dimensionality of the transformed feature space where the RBFs are applied, and C is the number of classes to be identified.
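A minimal forward pass for this architecture might look as follows (function and variable names are my own; d, M and C follow the notation above):

```python
import numpy as np

# Sketch of the RBF network forward pass: d input features, M Gaussian
# hidden nodes with receptors `centres` and spreads `sigmas`, and a
# linear output layer with weight matrix W of shape (M, C).
def rbf_forward(x, centres, sigmas, W):
    r2 = ((centres - x) ** 2).sum(axis=1)   # squared distances to receptors
    phi = np.exp(-r2 / (2 * sigmas ** 2))   # Gaussian hidden activations
    return phi @ W                          # linear output layer

centres = np.array([[0.0, 0.0], [1.0, 1.0]])  # M = 2 receptors, d = 2
sigmas = np.array([1.0, 1.0])
W = np.eye(2)                  # identity weights just expose the activations
out = rbf_forward(np.array([0.0, 0.0]), centres, sigmas, W)
```

At a receptor the corresponding activation is exactly 1, and it decays with distance, as described above.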

Training for this kind of network proceeds in two phases:

a) Training the hidden layer, which comprises M RBF functions; the parameters to be determined for each RBF are the receptor position t and, in the case of a Gaussian RBF, the spread sigma.

b) Training the weight vectors Wij of the output layer.

Training the Hidden Layer:

There are different approaches to training the hidden layer. Assume for now that we are dealing with Gaussian RBFs, so we need to determine the receptors t and the spread, i.e. sigma. One approach is to randomly select M receptors from the N sample feature vectors, but this is not very principled, so instead we can use a clustering mechanism to determine the receptors ti.

Since we have M nodes in the hidden layer and N samples, clustering requires N > M.

Calculation of receptors:

Let's look at the example above, where M = 3, so we need to determine three t's. Initially we divide the feature-vector space into three arbitrary clusters and take their means as the initial receptors; then we iterate over every sample feature vector and perform the steps below:

a) For the selected input feature vector x, compute its distance to the means (t1, t2, t3) of the three clusters; x is assigned to the cluster whose mean is nearest.

b) After x is assigned to its cluster, the means (t1, t2, t3) are recomputed.

c) Repeat steps a) and b) for all sample points.

Once the iterations finish, we obtain the final receptors t1, t2 and t3.
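The clustering procedure above is essentially Lloyd's k-means algorithm; a minimal sketch (function and variable names are my own):

```python
import numpy as np

# Sketch of the receptor computation via k-means: assign each sample to
# its nearest receptor, recompute each receptor as its cluster mean, repeat.
def kmeans_receptors(X, M, iters=20, seed=0):
    rng = np.random.default_rng(seed)
    # arbitrary initial receptors: M samples picked at random
    t = X[rng.choice(len(X), M, replace=False)].astype(float)
    for _ in range(iters):
        # a) assign every sample to the cluster with the nearest mean
        d2 = ((X[:, None, :] - t[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # b) recompute the means t_i from the new assignments
        for j in range(M):
            if np.any(labels == j):
                t[j] = X[labels == j].mean(axis=0)
    return t

# two well-separated groups of samples; N = 20 > M = 2, as required above
X = np.vstack([np.zeros((10, 2)), np.full((10, 2), 10.0)])
t = kmeans_receptors(X, M=2)
```

On this toy data the receptors settle onto the two group centres.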

Calculation of Sigma:

Once the receptors are calculated, we can use a K-nearest-neighbour approach to calculate sigma; the formula is shown in the image above. We need to choose the value of P.
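Since the article's formula appears only as an image, here is one common form of the heuristic (an assumed version, not necessarily the article's exact formula): sigma_j is the root-mean-square distance from receptor t_j to its P nearest fellow receptors.

```python
import numpy as np

# Assumed form of the P-nearest-neighbour spread heuristic:
# sigma_j = sqrt( (1/P) * sum of squared distances to the P nearest receptors )
def spreads(t, P=1):
    d2 = ((t[:, None, :] - t[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d2, np.inf)             # exclude each receptor itself
    nearest = np.sort(d2, axis=1)[:, :P]     # P smallest squared distances
    return np.sqrt(nearest.mean(axis=1))

sigmas = spreads(np.array([[0.0, 0.0], [3.0, 4.0]]))  # receptors 5 apart
```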

Training the Weight Vectors:

Let the dimensionality of the hidden layer be M and the sample size be N; then we can calculate the optimal weight vector for the network using the pseudo-inverse matrix solution.

Assuming there is no perfect solution, we take the error e and optimise the criterion function J(wj) using a closed-form solution to obtain the optimal Wj.

The above Wj calculation is done for every output node j = 1, 2, …, C to obtain each node's weight vector.
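A minimal sketch of the pseudo-inverse solution (the matrix entries below are made up for illustration): given the N x M hidden-layer output matrix Phi and the N x 1 target vector T for one output node, the weights minimising J(wj) = ||Phi wj - T||^2 are wj = pinv(Phi) T.

```python
import numpy as np

# Closed-form least-squares weights for one output node via the
# pseudo-inverse; Phi holds the hidden-layer activations for N samples.
Phi = np.array([[1.0, 0.2],
                [0.2, 1.0],
                [0.6, 0.6]])        # N = 3 samples, M = 2 hidden nodes
T = np.array([[1.0], [0.0], [0.5]])  # targets for output node j
W = np.linalg.pinv(Phi) @ T          # pseudo-inverse solution for w_j
```

The solution satisfies the normal equations, i.e. the gradient of J(wj) vanishes at W.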

ADVANTAGES:

1) Compared to a multilayer perceptron, the training phase is faster, as no backpropagation learning is involved.

2) The hidden-layer nodes are easier to interpret than those of a multilayer perceptron.

3) The number of hidden layers and the number of nodes in the hidden layer are determined for an RBF network, whereas for a multilayer perceptron there is no analytical approach to deciding the number of hidden layers or the number of nodes in each.

DISADVANTAGES:

Although training is faster in an RBF network, classification is slower than in a multilayer perceptron, because every node in the hidden layer has to compute its RBF function on the input sample vector during classification.

Example: Solving the XOR gate with an RBF Network:

Here the optimal weights are -1, +1, +1 and -1, and at the output we apply a hard-thresholding function: if the value is negative the output is 0; if it is positive the output is 1.
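The described solution can be checked numerically. Below is a sketch with one Gaussian receptor at each input pattern and the weights above; sigma = 0.5 is an illustrative choice, and the sign pattern (and hence the outputs) holds for any sigma.

```python
import numpy as np

# Numerical check of the XOR solution: four Gaussian receptors (one per
# input pattern), output weights (-1, +1, +1, -1), and a hard threshold.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
receptors = X.copy()                      # M = 4 receptors
w = np.array([-1.0, 1.0, 1.0, -1.0])
sigma = 0.5                               # illustrative spread

d2 = ((X[:, None, :] - receptors[None, :, :]) ** 2).sum(axis=2)
phi = np.exp(-d2 / (2 * sigma ** 2))      # hidden-layer activations
outputs = (phi @ w > 0).astype(int)       # hard threshold: +ve -> 1, -ve -> 0
```

The outputs come out as 0, 1, 1, 0 for the four input patterns, i.e. the XOR truth table.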

Architecture diagram for XOR:

Thank You
